I wrote a tutorial about Histograms long ago, and haven’t used any of that for my own histograms in years. I don’t use regular column charts, unless I need something really quick and dirty. Instead, I use XY charts for various reasons, and I follow the protocol in this post.
A histogram is basically a column chart with value ranges (bins) laid out in numerical order along the horizontal axis, and the height of the columns encodes the number of observations within that value range.
Data for a histogram typically looks like this, with the bin label in the first column and the count (frequency) within each bin in the second column.
The typical Excel histogram is a column chart from this data. It shows the data well enough, although having more than a dozen bins squashes the labels together making them hard to read.
Rotating the labels 90° unsquashes them, but they are equally hard to read unless you awkwardly swivel your neck.
Another problem with a plain old column chart is that the bins look numerical, but the axis is categorical, and does not represent the numerical values of the bin boundaries. Unless you can do math in your head, and can force your mathematical brain to ignore what your visual brain sees, you don’t know that the bins are of equal width. In this case they are equal, but I’ve seen histograms which have unequal bin widths represented by equal width bars. This is just another way of distorting the data.
To correct this problem, let’s modify the data, so we place the bin start values in the first column, and the count for the bin starting at each of these values in the second column.
Now the fun begins. This is a similar exercise to that used to generate step charts, but we need not just a vertical line segment connecting adjacent horizontal steps, we need vertical line segments connecting these steps to the horizontal axis at the bottom of the chart. So we need to modify the simple table above.
In the first block below I have stacked up three copies of the data. These are color coded so we can trace the data through the next steps.
In the second block I have indicated two cells that need to be deleted, with cells shifting up to fill the gap. The third block shows the data after deleting these two cells.
Finally I sorted the data by the first column, resulting in the striped last block below.
The rest is easy. Make an XY chart, using the lines-without-markers option, from this last block of data, and format it to look nice. The lines connecting the XY points trace the sides and tops of each histogram bar.
The advantage of this type of histogram is that the X axis is truly numerical. The axis can be formatted with the minimum, maximum, and spacing that you want. The min, max, and spacing can be natural “human-friendly” values, not values defined by statistical calculations to excessive significant figures. And you can see by inspection that the bins are of equal width.
The disadvantage of the XY chart histogram is that the bars are only present in outline: they are not filled with your favorite colors. In the next article I will show a technique to fill these bars. If you want a head start, you can refer back to the tutorial that shows how to Fill Below an XY Chart Series with an XY-Area Combination Chart.