Excel Box and Whisker Diagrams (Box Plots)

Box and Whisker Charts (Box Plots) are commonly used in the display of statistical analyses. Microsoft Excel does not have a built in Box and Whisker chart type, but you can create your own custom Box and Whisker charts, using stacked bar or column charts and error bars. This tutorial shows how to make box plots, in vertical or horizontal orientations, in all modern versions of Excel.

Excel Box and Whisker Diagram

 

In its simplest form, the box and whisker diagram has a box showing the range from first to third quartiles, and the median divides this large box, the “interquartile range”, into two boxes, for the second and third quartiles. The whiskers span the first quartile, from the second quartile box down to the minimum, and the fourth quartile, from the third quartile box up to the maximum.

Sample Data and Calculations

To play along at home in Excel 2007 or 2010, download the workbook Excel_2007_Box_Plot_Workbook.xlsx.

Let’s use the following simple data set for our tutorial. The values were taken from a normally distributed population with a mean of 10 and standard deviation of 5. There are four sets of 20 values.

All of these values are positive. If your data set has mixed positive and negative values, this technique requires major modifications.

Excel Box Plot Data

First, insert a bunch of blank rows, and set up a range for calculations. Only the horizontal version of the box plot uses the last calculated row, “Offset”. It will not hurt to include it in the vertical box plot’s calculations.

Excel Box Plot Calculations

First, compute some simple statistics, such as the count, mean, and standard deviation. The formulas used in column B are shown in column G of the screen shot.

Excel Box Plot Calculations

Now let’s compute the minimum and maximum, median, and first and third quartiles.

Excel Box Plot Calculations

Finally, let’s determine which values we need to plot. Our chart has a box for the second quartile, which shows the difference between median and first quartile calculated above. It has a box for third quartile, which show the difference between the third quartile calculation and the median. The bottom of the lower box rests on the first calculated quartile. The down whisker is as long as the first quartile minus the minimum, and the up whisker is as long as the maximum minus the third quartile.

Excel Box Plot Calculations

The offset values are calculated as follows: In my example, I have four categories, Alpha through Delta. I can divide my horizontal chart into four horizontal strips, numbered from 0 to 4, each containing one box-and-whisker unit. I need to position my average points in the middle of each 1-unit horizontal strip. These will ultimately go onto a secondary vertical axis which I will have conveniently scaled from 0 to 4. Hence the Y values I will need are 0.5, 1.5, 2.5, and 3.5.

Chart Construction

Select the header row of the calculated data, then hold Ctrl while selecting the three rows that include Bottom, 2Q Box, and 3Q Box. This multiple-area range is highlighted in orange below.

Excel Box Plot Chart Data

With this range selected, insert a stacked column chart or a stacked bar chart. Be sure to use the stacked version, and not the 100% stacked version, of the column or bar chart.

Vertical Excel Box Plot (step 1)Horizontal Excel Box Plot (step 1)

The labels in the bar chart go bottom-to-top. To reverse the labels, select the vertical axis, press Ctrl-1 (numeral one) to open the Format Axis dialog, then check the “Categories in Reverse Order” box, then under “Horizontal Axis Crosses”, select “At maximum category”.

Horizontal Excel Box Plot (step 1a)

To add the down whisker, select the Bottom series, then in the Chart Tools > Layout tab, click Error Bars, and select More Error Bar Options from the bottom of the menu. Choose the Minus direction, select Custom for Error Amount, and click on Specify Value. Leave the contents of the Positive Error Value box alone (“={1}”) in the mini dialog that appears, then clear the Negative Error Value box and select the Whisker- row from the table (B14:E14). Click OK and Close to get back to Excel. These “down” error bars (whiskers) extend from the bottom (left) edge of the 2Q Box downward (leftward) into the Bottom series.

Vertical Excel Box Plot (step 2)Horizontal Excel Box Plot (step 2)

To add the up whisker, select the 3Q Box series,then in the Chart Tools > Layout tab, click Error Bars, and select More Error Bar Options from the bottom of the menu. Choose the Plus direction, select Custom for Error Amount, and click on Specify Value. Leave the contents of the Negative Error Value box alone (“={1}”) in the mini dialog that appears, then clear the Positive Error Value box and select the Whisker+ row from the table (B15:E15). Click OK and Close to get back to Excel.

These “up” error bars (whiskers) extend upward (rightward) from the top (right) of the 3Q Box.

Vertical Excel Box Plot (step 2)Horizontal Excel Box Plot (step 2)

Now we can format the boxes. Select the Bottom series, and apply no fill and no border, so it is hidden. Then select each of the 2Q Box and 3Q Box series, and apply a dark border and a light fill.

Vertical Excel Box Plot (step 3) – – Horizontal Excel Box Plot (step 3)

Adding the Mean

To add the mean as a series of markers, select the Mean row in the calculated range (highlighted in blue). If you are making a horizontal box plot, hold Ctrl and also select the Offset row (highlighted in green), so both areas are selected. Copy the selected range.

Excel Box Plot Mean Data

Select the chart, and use Paste Special to add the data as a new series. If you are making a horizontal box and whisker diagram, check the “Category (X Labels) in First Row” box. The “Series Names in First Column” box should already be checked.

The new series is added as another column or bar stacked on top of the existing ones.

Vertical Excel Box Plot (step 4) – – Horizontal Excel Box Plot (step 4)

Select this new series, then on the Chart Tools > Design tab, click on Change Chart Type. If you are making a vertical box plot, choose a Line Chart style. If you are making a horizontal box plot, choose an XY Scatter style.

Vertical Excel Box Plot (step 5) – – Horizontal Excel Box Plot (step 5)

The points in the horizontal box plot are in reverse order. To change the order of points, select the secondary vertical axis (right edge of the chart), press Ctrl-1 (numeral one) to open the Format Axis dialog, then check the “Values in Reverse Order” box.

Horizontal Excel Box Plot (step 5a)

If you’re making a horizontal box plot in Excel 2003, this last process is a little more involved. Excel draws both secondary axes, but the vertical one is hidden behind the primary axis with the text labels (below left). Double click on the secondary horizontal axis (top of chart), and on the scale tab of the Format Axis dialog, check “Value (Y) Axis Crosses at Maximum Value” (below right).

Horizontal Excel 2003 Box Plot (step 5) – – Horizontal Excel 2003 Box Plot (step 5c)

Excel 2003, continued: Double click the secondary vertical axis (right of chart), and on the scale tab, check “Values in Reverse Order” and uncheck “Value (X) Axis Crosses at Maximum Value” (below left). Finally, select the secondary horizontal axis (top) and click Delete; Excel will now plot the XY series on the primary horizontal axis.

Horizontal Excel 2003 Box Plot (step 5d) – – Horizontal Excel 2003 Box Plot (step 5e)

All Versions: Now format the mean series: remove the line, and use an appropriate marker of a contrasting color. If you’ve made a horizontal box plot, hide the secondary Y axis (right edge of the chart) by choosing no tick marks, no tick labels, and no line in the Format Axis dialog.

Vertical Excel Box Plot (finished) – – Horizontal Excel Box Plot (finished)

That was easy and didn’t take too long.

 

Box and Whisker Charts in Peltier Tech Charts for Excel 3.0

Peltier Tech Charts for Excel 3.0

This tutorial shows how to create Box and Whisker Charts (Box Plots), including the specialized data layout needed, and the detailed combination of chart series and chart types required. This manual process takes time, is prone to error, and becomes tedious.

I have created Peltier Tech Charts for Excel 3.0 to create Box Plots (and many other custom charts) automatically from raw data. This utility, a standard Excel add-in, lays out data in the required layout, then constructs a chart with the right combination of chart types.

The utility creates vertical Box Plots …

… horizontal Box Plots …

… and Grouped Box Plots …

Outliers can be shown or hidden, and a number of quartile definition options are available.

This is a commercial product, tested on thousands of machines in a wide variety of configurations, Windows and Mac, which saves time and aggravation.

Please visit the Peltier Tech Charts for Excel 3.0 page for more information.

 

Peltier Tech Chart Utility

Filled Histograms Using Excel XY-Area Charts

I recently showed how to create Histograms Using Excel XY Charts. This technique produces a human-friendly numerical X axis scale, which is easier to read and harder to be deceived by than the bin labels used by column chart histograms.

Histogram Using XY Chart

The drawback of that technique is that it produces histogram bars in outline only, without a fill color. In this post I will show how to extend that technique to fill the bars, using the protocol from Fill Below an XY Chart Series with an XY-Area Combination Chart.

Filled XY Chart Histogram

I started with the data from Histograms Using Excel XY Charts. I added a row at the top of the data for the left edge of the chart, and another row at the bottom for the right edge of the chart. I inserted a column between the Time and Count columns, containing calculated X values for the area chart that will provide the fill color.

The calculated values in the middle column are based on the small table to the right. It contains the minimum and maximum X values (minimum and maximum times) from the XY chart, and the minimum and maximum values I will use in the area chart series. Using 1000 for a chart that’s a few hundred pixels wide is a reasonable resolution, though you could use something larger, like 10,000. The conversion uses this formula (fill in the appropriate cell addresses for your worksheet):

Area Value = Area Scale Min + (Time - Time Min) / (Time Max - Time Min) * (Area Scale Max - Area Scale Min)

or something like this, depending on where the tables are in the worksheet:

=(A2-$F$2)/($F$3-$F$2)*($G$3-$G$2)+$G$2

Data for XY-Area Chart Filled Histogram

We need to use both XY and Area chart types. The XY chart series provides the X axis formatting and border, and the Area chart series provides the fill.

If we just used an area chart, the time points would be equally spaced, so the points at the same time would not occur at the same X position, but at consecutive positions. There would be no vertical lines between bars; instead the bars would be shaped like trapezoids.

Area Chart Histogram

When we convert this axis to a date type, all the times are less than one (dates are whole numbers, remember, and times are fractions). The date axis treats days as whole number values, spaced according to their values, and ignores any times. Since the times occur within the same day (day zero), they occur along the vertical line at day zero, or January 0, 1900. To get the area series to look like bars with vertical sides, we need the times to be some integer multiple, hence our calculated values in the middle column.

Area Chart Histogram

Note: In cases where your bin boundaries are all integers, you can get away with using an area chart series without the XY chart series needed for the borders of the bars and for the axis. Simply use the protocol in Histograms Using Excel XY Charts with an area chart, and format both the fill and border of the area chart series as desired.

So let’s make the chart. Select the Time values and Counts in the first and third columns, not including the added first and last row, and create an XY chart. (Use Ctrl+Click to select multiple regions.) With one exception, these are the Excel 2003 version charts, but the Excel 2007 charts look the same, and with one exception, the steps are the same.

Filled XY Chart Histogram - Step 1

Copy the Area and Counts in the second and third columns, including the first and last row. Select the chart, and use Paste Special to add the data as a new series, categories in first column. Since the new X values are so much larger than the old, the first series is squashed against the left hand axis.

Filled XY Chart Histogram - Step 2

Format the new series so it appears on the secondary axis.

Filled XY Chart Histogram - Step 3

Excel adds only the secondary vertical axis, so we need to add the secondary horizontal axis. In Excel 2007, go to Chart Tools > Layout tab > Axes; in Excle 2003, go to Chart menu > Chart Options > Axes tab. The first axis makes its reappearance.

Filled XY Chart Histogram - Step 4

Convert the second series (red lines above) to an area type (colored in light blue with no border below).

Filled XY Chart Histogram - Step 5

Convert the secondary horizontal axis to a date scale axis. Excel 2007: right click on the axis and select Format Axis; Excel 2003: go to Chart menu > Chart Options > Axes tab. It’s filled above the lines, not below, but we’ll take care of that shortly.

Filled XY Chart Histogram - Step 6

Format the secondary horizontal axis so it is hidden. No line, tick marks, or tick labels.

Filled XY Chart Histogram - Step 7

In Excel 2003 you need to format the secondary vertical axis to make the secondary horizontal axis cross at the bottom. Uncheck the Axis Crosses At Maximum Value box.

Filled XY Chart Histogram - Step 8

Ignore this step in Excel 2007. If you make the date axis cross at the bottom, even if it’s hidden, it will carve out a little space for itself, and push the other horizontal axis away from the data. No matter how you juggle primary and secondary axes, no matter what order you do the steps, no matter which incantations you may use. Speaking of incantations, I have uttered many an epithet over this behavior. It’s only a few pixels, but it looks like crap. They must have been celebrating Geek Liberation Day or something the day they programmed this part of the Excel 2007 charting module. (I don’t know whether Excel 2010 has addressed this issue.)

Filled XY Chart Histogram - Step 8

The Excel 2007 chart is still filled light blue above the line. Select the secondary vertical axis, and press Delete. The Excel 2003 chart is filled below the line, so all we need to do is hide the secondary vertical axis (no line, no tick marks, no tick labels) or delete it.

Filled XY Chart Histogram - Step 9

Format your X axis with human-friendly scale parameters, add a few titles, et voilà! Oh yeah, I used a thicker line when the histogram was unfilled, but with it filled, you can use a thin line for the borders of the bars.

Filled XY Chart Histogram

There is a slight misalignment of area and XY series in the exported Excel 2003 chart, evidenced by the sliver of white in a couple of the bars. This does not appear in the actual chart, only in the exported image. Excel 2007 does not exhibit this problem. If I worried about it, I could probably fudge the formula used to compute the area chart’s X values. Perhaps adding 1 to all the calculated values except in the first and last (highlighted) rows would do the trick, or using a larger scaling factor (10,000 rather than 1000), and nobody’d be any the wiser.

To me this is much less aggravating than the gap at the bottom of the Excel 2007 chart I had a mini-rant about earlier, because there is no teeny fudge factor that makes that misalignment go away, other than the scorched-earth approach of deleting the vertical axis.

 

Peltier Tech Chart Utility

Histograms Using Excel XY Charts

I wrote a tutorial about Histograms long ago, and haven’t used any of that for my own histograms in years. I don’t use regular column charts, unless I need something really quick and dirty. Instead, I use XY charts for various reasons, and I follow the protocol in this post.

Histogram Using XY Chart

A histogram is basically a column chart with value ranges (bins) laid out in numerical order along the horizontal axis, and the height of the columns encodes the number of observations within that value range.

Data for a histogram typically looks like this, with the bin label in the first column and the count (frequency) within each bin in the second column.

Histogram Data in Bins

The typical Excel histogram is a column chart from this data. It shows the data well enough, although having more than a dozen bins squashes the labels together making them hard to read.

Histogram Using Column Chart

Rotating the labels 90° unsquashes them, but they are equally hard to read unless you awkwardly swivel your neck.

Histogram Using Column Chart

Another problem with a plain old column chart is that the bins look numerical, but the axis is categorical, and does not represent the numerical values of the bin boundaries. Unless you can do math in your head, and can force your mathematical brain to ignore what your visual brain sees, you don’t know that the bins are of equal width. In this case they are equal, but I’ve seen histograms which have unequal bin widths represented by equal width bars. This is just another way of distorting the data.

To correct this problem, let’s modify the data, so we place the bin start values in the first column, and the count for the bin starting at each of these values in the second column.

Histogram Data Table

Now the fun begins. This is a similar exercise to that used to generate step charts, but we need not just a vertical line segment connecting adjacent horizontal steps, we need vertical line segments connecting these steps to the horizontal axis at the bottom of the chart. So we need to modify the simple table above.

In the first block below I have stacked up three copies of the data. These are color coded so we can trace the data through the next steps.

In the second block I have indicated two cells that need to be deleted, with cells shifting up to fill the gap. The third block shows the data after deleting these two cells.

Finally I sorted the data by the first column, resulting in the striped last block below.

Data for XY Chart Histogram

The rest is easy. Make an XY chart, using the lines-without-markers option, from this last block of data, and format it to look nice. The lines connecting the XY points trace the sides and tops of each histogram bar.

Histogram Using XY Chart

The advantage of this type of histogram is that the X axis is truly numerical. The axis can be formatted with the minimum, maximum, and spacing that you want. The min, max, and spacing can be natural “human-friendly” values, not values defined by statistical calculations to excessive significant figures. And you can see by inspection that the bins are of equal width.

The disadvantage of the XY chart histogram is that the bars are only present in outline: they are not filled with your favorite colors. In the next article I will show a technique to fill these bars. If you want a head start, you can refer back to the tutorial that shows how to Fill Below an XY Chart Series with an XY-Area Combination Chart.

Filled XY Chart Histogram

 

Peltier Tech Chart Utility

Main Effects and Interaction Plots

In Dating Site Photo Effectiveness I proposed dot plots to show how different topics of profile pictures lead to different success rates of attracting attention from potential dates. The original analysis in The 4 Big Myths of Profile Pictures used bar charts which were potentially confusing because the origin of the bars was not zero, but instead was the average of all the data. Since my earlier post, Nathan wrote Get a Date With Your Online Profile Pic – Myths Debunked in his Flowing Data blog, and I was inspired to write about some simple graphical statistical tools.

The original analysis showed also the effects of facial expressions and eye contact on photo effectiveness. Profile pictures of women making eye contact are more effective than those without eye contact, for all of the facial expressions. Flirty-face pictures with eye contact are the most effective, while flirty-face pictures without eye contact are least effective.

Women's Photo Effectiveness Factors - Original Bar Chart

Bar charts show the data reasonably well. When the bars overlap, they may lead to conclusion, because the front bars partially obscure the back bars, and the back bars may appear smaller than they actually are because of this obstruction. It’s also not as easy to see the relative effects.

Main Effects Plots

When performing a statistical analysis, one of the simplest graphical tools at our disposal is a Main Effects Plot. This plot shows the average outcome for each value of each variable, combining the effects of the other variables as iff all variables were independent. In the chart below, we see that the averages for smiling (with and without eye contact) is highest, the average for not smiling is lowest, and flirty-face lies in between. The average for eye contact is greater than for no eye contact (for all facial expressions combined). We could have estimated these effects from the bar chart above, but it’s helpful to take the time to plot these effects.

Women's Photo Effectiveness Factors - Main Effects Plot

The data is shown below, with ranges shaded to match the color of the plotted points. The two columns to the left of the data are used to produce the two-level category axes in the charts. There is no need to label the series, since the series identification is simplified by these dual axis labels.

Note: I’ve used simple averages from the data in the original article’s charts, but in a real analysis you would have to weight the averages by the proportion of individuals using each level of each factor.

Women's Photo Effectiveness Factors - Main Effects Plot Data

The main effects plot is simple and does not provide a great deal of information. Showing just the main effects of each factor level without accounting for the levels of other factors is simplistic and misleading.

For example, we might assume that the best success would come from smiling and making eye contact, and the worst from not smiling and not making eye contact. In fact, the flirty-face picture accounts for the best success (with eye contact) and the worst (without eye contact).

Interaction Plots 1

Another graphic statistical tools at our disposal is called an Interaction Plot. This type of chart illustrates the effects between variables which are not independent. Such a plot looks like the charts here. There are two versions, to illustrate better the effects of eye contact and of facial expression.

In fact, the original bar chart above shows interactions between factors, but it is more effective to use markers with connecting lines to display the data.

Women's Photo Effectiveness Factors - Interaction Plot Women's Photo Effectiveness Factors - Interaction Plot

The two charts show the effect of expression for the two eye contact categories (left) and the effect of eye contact for the three expressions (right). The charts can be made easily using data with the appropriate arrangement. The chart on the right shows that the flirty-face expression is most effective, with eye contact. Without eye contact, this expression is a loser.

Women's Photo Effectiveness Factors - Interaction Plot Data – – – Women's Photo Effectiveness Factors - Interaction Plot Data

The two charts need independent data ranges. The two columns to the left of the data are used to produce the two-level category axes in the charts. There is no need to label the series, since the series identification is simplified by these dual axis labels.

Interaction Plots 2

An alternative and perhaps more common layout for interaction charts is shown below. As above, the two versions emphasize the effects of facial expression and of eye contact.

Women's Photo Effectiveness Factors - Interaction Plot Women's Photo Effectiveness Factors - Interaction Plot

The series use the same single tier of category labels, and the lower tier of labels has been replaced by data labels on the series themselves. This alternative has the possibility of becoming more cluttered, but it also shows interaction effects more clearly.

For example, the nearly parallel smiling and not smiling lines in the right chart above indicate only a very weak interaction between eye contact and smiling, but the much steeper flirty-face line shows a stronger effect (or an interaction) of eye contact when the facial expression is a flirty face. The nearly parallel segments in the left hand chart show the same weak interaction between smiling and eye contact, while the huge difference between the two flirty-face data points show a strong interaction.

Women's Photo Effectiveness Factors - Interaction Plot Data – – – Women's Photo Effectiveness Factors - Interaction Plot Data

These two charts can use the same data range, using either columns or rows for the series data. Using series names as data labels adjacent to data points effectively identify the data.

Extension

The same approach above provides insights into the photo effectivemess for male subjects.

Here is the original bar chart.

Men's Photo Effectiveness Factors - Original Bar Chart

Main effects plot.

Men's Photo Effectiveness Factors - Main Effects Plot

Interactions plots, both variations.

Men's Photo Effectiveness Factors - Interaction Plot Men's Photo Effectiveness Factors - Interaction Plot

Men's Photo Effectiveness Factors - Interaction Plot – – Men's Photo Effectiveness Factors - Interaction Plot

 

Peltier Tech Chart Utility

Peltier Tech Charts for Excel 3.0

 

Create Excel dashboards quickly with Plug-N-Play reports.