Peltier Technical Services, Inc.
 

Excel Chart Add-Ins | Training | Charts and Tutorials | Peltier Tech Blog


Peltier Tech Chart Utility

 

Excel Dashboards

 

Books at Amazon.com

 

Histograms in Microsoft Excel.


 

A histogram is a column chart which shows the distribution of a population's values over the range of values. There are a few ways to generate histograms in Excel. We'll ignore the trivial case in which you type in and plot the totals directly. I am basing the following example on the following table of 30 values, generated from a normally distributed population with an average of 10 and standard deviation of 2.

{8.0371603, 8.1436440, 10.125382, 8.9247589, 9.0852620, 8.5756267, 9.6140195, 10.197724, 10.110226, 8.3650832, 9.2756920, 8.9988418, 10.678214, 10.042841, 11.715097, 12.481356, 9.7590839, 7.8203231, 9.7956774, 7.4370712, 12.433889, 11.021358, 12.771175, 11.906137, 8.3745556, 7.7164043, 8.6591274, 9.4994391, 6.2430774, 9.6564702}

Using the Analysis Toolpak

Excel comes bundled with an add-in called the Analysis Toolpak (or ATP), accessible through the Data Analysis command on the Tools menu. The Analysis Toolpak provides many statistical functions not available directly through Excel. Depending on your installation, the Analysis Toolpak may or may not be installed.

Installing the Analysis Toolpak (from Microsoft Help Files)

If the Tools menu has a Data Analysis command, the Analysis Toolpak is installed and you can skip this section. Otherwise you must install the Analysis ToolPak.

First, on the Tools menu, click Add-Ins. If Analysis ToolPak is not listed in the Add-Ins dialog box, click Browse and locate the drive, folder name, and file name for the Analysis ToolPak add-in, Analys32.xll usually located in the Microsoft Office\Office\Library\Analysis folder or run the Setup program if it isn't installed.

Select the Analysis ToolPak check box.

Creating an Analysis Toolpak Histogram Without Specifying Bins

The ranges within which you want to sort values are the bins. You can specify the bins into which you want your raw data values placed, or you can let the ATP choose bins. The ATP simply divides the range of data, from the minimum to the maximum, into five equally spaced bins. A value is placed into the first bin with an equal or higher value.

Select Data Analysis from the Tools menu, and choose Histogram from the list of analysis tools. For Input Range, select the range of data. Check the Labels box if you include the header in the selected range. Under Output Options, you can select whether to put the output on a new worksheet ("ply") or in a range on the active sheet. Also you can choose whether to make a Pareto chart, a special histogram with the columns sorted from tallest to smallest, whether to show cumulative percentages in addition to bin totals, and whether to draw a chart. I assume if you're reading this page, you want to check this option.

The output table and chart from this procedure are shown below. The table is clear, although the bin settings are probably not ideal. The first bin has one value, because it was defined for the minimum in the data set. The second bin contains all values greater than the minimum but within the first fifth on the range, an so on up to the last bin, "More", which means the top fifth of the range.

The chart leaves much to be desired. The chart area itself is too small. The plot area is barely larger than the legend, and the margins are too large. But fortunately we know how to reformat charts.

BinFrequency
6.24307741
7.54869711
8.85431688
10.15993612
11.4655563
More5
   

Creating an Analysis Toolpak Histogram With Specified Bins

Select Data Analysis from the Tools menu, and choose Histogram from the list of analysis tools. For Input Range, select the range of data. For Bin Range, select the range containing your bins. Check the Labels box if you include the header in the selected ranges (you must specify Labels for both ranges, or neither range). Under Output Options, you can select whether to put the output on a new worksheet ("ply") or in a range on the active sheet. Also you can choose whether to make a Pareto chart, a special histogram with the columns sorted from tallest to smallest, whether to show cumulative percentages in addition to bin totals, and whether to draw a chart. The bins I've specified are shown here:

{7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0, 10.5, 11.0, 11.5, 12.0, 12.5, 13.0}

The output table and chart from this procedure are shown below. The table is clear, and makes more sense because we have selected bin values according to our data.

As before, the chart is not very appealing or informative. The chart area is too small. The plot area is barely larger than the legend, and the margins are too large. Again we'll have to reformat the chart.

BinFrequency
71
7.51
82
8.54
94
9.53
104
10.54
111
11.51
122
12.52
131
More0
   

The following shows how the output is enhanced for Cumulative Percentages and for Pareto. An extra column is inserted for the percentages, and the table is duplicated with the values sorted from largest to smallest in the right hand half of the table. The chart has grown even more grotesque, with the legend becoming larger than the plot area, so it's going to need some attention.

BinFrequencyCumulative %BinFrequencyCumulative %
713.33%8.5413.33%
7.516.67%9426.67%
8213.33%10440.00%
8.5426.67%15453.33%
9440.00%9.5363.33%
9.5350.00%8270.00%
10463.33%12276.67%
15476.67%12.5283.33%
11180.00%7186.67%
11.5183.33%7.5190.00%
12290.00%11193.33%
12.5296.67%11.5196.67%
131100.00%131100.00%
More0100.00%More0100.00%
   

Dynamic Do-It-Yourself Histograms

There are times when you may want to bypass the ATP and build your own tables and histograms. The ATP calculates its values and pastes them into the output range as values, not as formulas. If the underlying data changes, the ATP output ranges do not change. This may be desired behavior in some cases, but other times it can be inconvenient. This technique places formulas into the output range, so the values adjust to changing data.

Start by placing your bin settings into a column of cells. Select a columnar range of cells, one row taller than the range of bins (if any values are expected to exceed the largest bin), type a formula like this:

=FREQUENCY(DataRange,BinRange)

and make it into an array formula by holding down Ctrl+Shift while pressing Enter. For my example, with the data in $A$2:$A$31 and the bins in $C$2:$C$14, the formula is:

=FREQUENCY($A$2:$A$31,$C$2:$C$14)

When correctly entered as an array with Ctrl-Shift-Enter, Excel places curly braces around the formula like this:

{=FREQUENCY($A$2:$A$31,$C$2:$C$14)}

Notice in the output table below, I have placed a label in the row above the Frequency data, but I have kept the cell above the bins blank; this reminds Excel to use the first row for series names and the first column for category labels. Select the range containing the bins and the frequency output (include the header row, and don't forget the extra cell in the Frequency column), and create a column chart with the Chart Wizard. The chart is the default size, larger than the ATP charts.
 Frequency
71
7.51
82
8.54
94
9.53
104
10.54
111
11.51
122
12.52
131
0
   

Note: For a nice discussion of how array formulas work, check out Using Array Formulas in Excel, a white paper by Excel MVP Bob Umlas.

Embellishing your Histogram

People often expect adjacent columns in their histograms to touch each other, without the gaps. This is an easy setting to fix: Double click on the charted series, click on the Options tab, and change the Gap Width setting to zero.

Rather than using bland columns for your histogram, suppose you want to use an X in the histogram for every occurrence of a value. You can do this with a scatter chart, using a Custom Histogram procedure offered by Excel MVP Debra Dalgleish in the Microsoft Charting news group.

To attain other interesting effects in your histogram, you could also apply any of the column chart tricks on my Custom Series Formatting page. For example you may want to create a pictogram, which uses a symbol to represent a unit of each category.

Better Histograms Using Excel

Mike Middleton's treeplan.com Decision Modeling Add-Ins for Excel web site describes a technique to show more a more descriptive category axis than is usually attainable with the Analysis Toolpak or a normal column chart. He constructs the better category axis by adding an XY Scatter series to the chart, hiding the column series' primary category axis, and showing instead the XY series' secondary X value axis at the bottom of the chart. Click on the chart below to go to Mike's Better Histograms page.

This is a technique I've used and described in this site, an extension of the use of Dummy Series to get desired charting effects, incuding Arbitrary Axis Scales. Mike has nicely shown how to fix up your histograms using the technique.

 

Page copy protected against web site content infringement by Copyscape

Peltier Tech Chart Utility

Create Excel dashboards quickly with Plug-N-Play reports.


Peltier Technical Services, Inc.

Excel Chart Add-Ins | Training | Charts and Tutorials | PTS Blog

Peltier Technical Services, Inc., Copyright © 2014. All rights reserved.
You may link to this article or portions of it on your site, but copying is prohibited without permission of Peltier Technical Services.

Microsoft Most Valuable Professional

Microsoft Most Valuable Professional

My MVP Profile