Simple Box Plots
by Jon Peltier
Tuesday, September 8th, 2009
Peltier Technical Services, Inc., Copyright © 2010.
Licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.
I’ve written several tutorials about creating box and whisker charts, including Horizontal Box Plots and Vertical Box Plots. I’ve also created a professional Box Plot Utility that generates box and whisker charts from raw observations. These techniques are complicated, since they are built using bar or column charts, with extra sets of series to accommodate data sets with positive and negative numbers. There is a simpler technique that works for vertical box plots with positive and negative values, without loads of column series.
This simple approach uses line charts, and the up-down bars and high-low lines that can be added to line charts. I’ve described these embellishments in Stock Charts and Other Line Chart Tricks. When two or more line chart series are present in a chart, the up-down bars draw vertical bars from the first line series to the last, and the high-low lines are drawn behind the up-down bars (if present) from the lowest to highest line series value. The formatting of up bars (the last series is greater than the first) is distinct from that of down bars (the last series is less than the first).
A stock chart is simply a line chart that uses these embellishments. When data is entered in Open-High-Low-Close order, the up-down bars and high-low lines are used to create candlestick-style stock charts in Excel. Excel 2003 also accepts data in Open-Low-High-Close order, while Excel 2007 insists upon OHLC. Excel 2003 allows the price data to be arranged in columns (as such data is usually provided) or in rows. Excel 2007 insists on price data in columns only. If you start with a line chart and add these graphic features, the arrangement of data is ignored.
Excel 2003 and Earlier
We’ll start with this set of calculations, based on three samples of a larger population that has mean of 25 and standard deviation of 10. The First and Third Quartiles are listed first and last so the up-down bars represent the inner quartiles of the box plot. The Min and Max will be used by the high-low lines, and the intermediate Mean and Median have no effect on the added features.

Here is the line chart produced from the data. (You could instead start with a candlestick stock chart using First Quartile, Min, Max, and Third Quartile data, add the Median and Mean data, and rearrange the series, but I think the way I’ve done it is easier and more straightforward.)

In Excel 2003, the Format Series dialog with any of the lines selected offer Up-Down Bars and High-Low Lines on the Options tab.

Hide all of the series lines, and all markers except for Median and Mean, and we have a serviceable box plot. Here the “Dash” marker is used for median.

If desired, the Min and Max series can also be formatted with a dash marker to provide end caps for the whiskers (high-low lines).

The dash marker is a horizontal line, with a minimum thickness of 2 pixels. The markers above are 14 points wide and still 2 pixels high. At 15 points, the markers expand to 3 pixels. At the size required to match the width of the up-down bars, the dashes are ridiculously thick (and even the two pixel thickness above is too thick).

Particularly ridiculous when used for the end caps too.

Horizontal median lines are relatively easy to obtain. Change the chart type of the median series from Line to XY, and if necessary format it to appear on the primary axes with the line series. Remove its marker, and apply positive and negative horizontal error bars with a value of 0.2.

The default gap width for the up-down bars is 150% of a bar width, the bar is 100% of a bar width, and the gap plus the bar are 250%, so half the bar is 50%/250% or 0.2.
You can add duplicate Max and Min series, convert them to XY series, and format them with no marker and error bars of ± 0.2, to get matching end caps for the whiskers (high-low lines).

Excel 2007 (and Later?)
The protocol in Excel 2007 is similar. Create the line chart.

Add up-down bars and high-low lines using the Chart Tools > Layout tab of the ribbon. Don’t panic if the Mean and Median markers disappear. In Excel 2003 markers always appear in front of the up-down bars, and lines behind them, but in 2007 if the series has lines, then the markers stay behind the bars with the lines.

Remove the series lines, and the markers appear.

Change the markers, and use the dash for Median. The dash gets wider in 2007 than in 2003, so it looks silly sooner.

As in 2003, the median series can be changed to an XY type, moved to the primary axis, and given horizontal error bars. Adding error bars in Excel 2007 is a trick, though. You have to add them using the Chart Tools > layout ribbon tab, and this adds both vertical and horizontal error bars, with the Format Vertical Error Bars dialog showing. Clear the dialog, select the vertical error bars, and press Delete. Then select the horizontal error bars, which have a default length of 1, press Ctrl+1 to open the Format Horizontal Error Bars dialog, change the value to 0.2, and remove the end caps.

Likewise, you can add duplicate Min and Max series with error bars, to provide end caps to the whiskers (high-low lines).

Related Posts:
- Candlestick Alternative: Individually Colored Up-Down Bars
- Simple Waterfall Chart with Up-Down Bars
- Callout Labels with Error Bars
- Box and Whisker Chart Median Options
- Announcing the Box and Whisker Chart Utility
- Add Individual Target Lines To Each Cluster in a Column Chart
- Stock Charts in Excel 2007
- High-Low Line Alternatives in Excel Charts
- Callout Labels with XY Line Segments
- Clustered-Stacked Column Charts with Vertical Separators
Posted: Tuesday, September 8th, 2009 under Chart Types.
Comments: 4
Comments
Comment from Bob
Time: Tuesday, September 8, 2009, 7:44 pm
My 2 Canadian cents,
I think the error bar functionality in Excel 2007 is a defect, masquerading as a feature.
If it were not for the kind souls, like Jon here, we’d all be up the creek without a visible means of propulsion.
Bob
Comment from Matt Healy
Time: Thursday, September 10, 2009, 9:45 pm
Pretty cool use of the stock chart and error bar features. My own habit has typically been to use an XY chart and basically draw everything with lines created by making auxiliary data series to the spreadsheet. I can remember making plots with old fashioned pen plotters where I had to calculate the coordinates of every point in Fortran, so using Excel as a glorified etch-a-sketch comes naturally to me…
Pingback from Charts Are for Sissies (Grumpy Old Man’s Guide to Charts)
Time: Tuesday, December 8, 2009, 2:31 pm
[...] and interpret a box-and-whisker plot then you’re on the right track (extra points if you can do it in Excel). If not, do yourself a favor and find a good entry-level statistics [...]
Pingback from Charts Are for Sissies (Grumpy Old Man’s Guide to Charts)
Time: Monday, February 15, 2010, 9:40 am
[...] and interpret a box-and-whisker plot then you’re on the right track (extra points if you can do it in Excel). If not, do yourself a favor and find a good entry-level statistics [...]



















Write a comment
I welcome comments from my readers. If you have an opinion on this post, if you have a question or if there is anything to add, I want to hear from you. Whether you agree or disagree, please join the discussion.
If you want to include an image in your comment, post it on your own site or on one of the many free image sharing sites, and include a link in your comment. I'll download your image and insert the necessary html to display the image inline.
Read the PTS Blog Comment Policy.