Any student of statistics, in fact, any middle school student, has constructed a box plot. A simple box plot (or box-and-whisker chart), like the one above, needs five parameters for each category. These are the minimum and maximum values for that category, the median value (middle value), and the first and third quartiles. In fact, the median can also be called the second quartile. The box plot above is not a meaningless sample of the chart type: it shows the variation in quartiles determined by several different methods. These will be described in excruciating detail in this tutorial.
A useful, if vague, definition of quartile is “one of three values that approximately divide a sorted data set into four parts of equal size”. This division is easy and exact if the number of values in the set is evenly divisible by four. But in the majority of cases, it is less certain.
Many techniques have been put forth for determining quartiles, and mostly they resolve into the handful of methods shown above, which are used by software packages. The techniques give similar, though not exactly the same, results. In this document I will describe these definitions of quartiles in hopes of shedding some light on this topic, which is more widely used than understood.
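To make "similar, though not exactly the same" concrete, here is a minimal Python sketch (Python is used purely for illustration). It relies on the standard library's statistics.quantiles function, which offers two interpolation methods named "exclusive" and "inclusive". Those are Python's names, not necessarily the names used for the methods in this tutorial, and the nine-value data set is an assumed example.

```python
from statistics import quantiles

# Assumed sample data: nine sorted values, chosen only for illustration.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 asks for the three cut points that split the data into four parts.
exclusive = quantiles(data, n=4, method="exclusive")
inclusive = quantiles(data, n=4, method="inclusive")

print("exclusive:", exclusive)  # [2.5, 5.0, 7.5]
print("inclusive:", inclusive)  # [3.0, 5.0, 7.0]
```

Both methods agree on the median, but the first and third quartiles differ by half a unit, even for this very tidy data set.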
I am not a statistician, but I’ve had to understand quartiles for my Box Plot Utility. Many of my users wonder about how quartiles are calculated, so I’ve decided to document my understanding. If you have further questions, or if you find any mistakes, please let me know in the comments.
The median is the central value in a sorted data set. If the values are listed from left to right in order of increasing value, there are as many values to the left of the median as to the right.
Determining the median is easy. If there is an odd number of values, the median is the value in the middle. For example, in this set of nine values, the median is the fifth value (in this case, 5), with four values below it and four above.
If there is an even number of values, the median generally does not correspond to a value in the data set. Instead the median is the average of the largest value in the lower half and the smallest value in the upper half. In this set of eight values, the median separates the bottom four from the top four, so we define it as the average of the fourth and fifth values, in this case, 4.5.
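The two cases above can be sketched in a few lines of Python. This is only an illustration: the function name median_of is my own, and the data sets are assumed to be the integers 1 through 9 and 1 through 8, which match the medians quoted above.

```python
def median_of(values):
    """Return the median of a collection of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the middle value is the median.
        return ordered[mid]
    # Even count: average the largest value of the lower half
    # and the smallest value of the upper half.
    return (ordered[mid - 1] + ordered[mid]) / 2


nine_values = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # assumed example data
eight_values = [1, 2, 3, 4, 5, 6, 7, 8]     # assumed example data

print(median_of(nine_values))   # 5
print(median_of(eight_values))  # 4.5
```

Python's built-in statistics.median function returns the same results for these two sets.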
For a few simple data sets, determining the quartiles is just as easy, but usually it’s more involved. Even when it’s easy, the statistical treatments make it seem harder than it is.
Hinge Techniques for Determining Quartiles
This topic is covered in the companion page Hinges.
Interpolation Methods of Determining Quartiles
This topic is covered in the companion page Quartiles.
Comparison of Values from All Hinge and Quartile Methods
This topic is covered in the companion page Comparison.
Quartiles in the Peltier Tech Box Plot Utility
This topic is covered in the companion page Quartiles in the Peltier Tech Box Plot Utility.
I found innumerable sources for this information about quartiles. Most were either very basic, or not useful at all. The following three are the most useful links I found.
Quartiles in Elementary Statistics
Eric Langford, California State University, Chico
Journal of Statistics Education Volume 14, Number 3 (2006).
This paper had an extensive and highly mathematical discussion of the methods described here, and several others.
Quartiles: How to calculate them?
David Journet, iTSS Wallingford
This short paper provided a summary of the SAS, Minitab, and Excel methods, supporting the information in the first reference.
Calculating Quartiles: Why Computer-Generated Results Don’t Always Agree
Delmar E. Searles, Asbury University
This article was the only place I’d ever seen a number line used to explain the difference between the N-1 and N+1 approaches to percentile definitions. I found this description almost intuitive, and decided to adopt it for all of my descriptions here. We are, after all, visual creatures, and most of us are predominantly visual learners.