Line Chart For Unequal Intervals

Recently Naomi Robbins, author of Creating More Effective Graphs, and I discussed the difficulty of making Excel use appropriate axis tick spacing to chart binned data that has unequal bin widths. The example I’ll use here to describe the problem is taken from page 286 of Naomi’s book, under the heading Do not use equally spaced tick marks for uneven intervals on an arithmetic scale. I’ve optically digitized, that is, eyeballed, some data from her chart for the examples here.

Original Line Chart

The original line chart looked very much like this, showing normalized mortality rates as a function of body mass index (BMI). This is how not to make your chart.

Mortality vs BMI - Original Chart

Data was reported as a single value for each range of BMI as in the table below.

Mortality vs BMI - Data

Improving the Line Chart

The first improvement we can make to the original chart is to rotate all labels into the horizontal orientation, so they are easier to read.

Mortality vs BMI - Horizontal Labels

The next step is to change the horizontal spacing so that the separation between points is proportional to the difference in their X axis values. I’ve inserted the yellow column into the data, and in this column I’ve entered the midpoint of each bin range.

Mortality vs BMI - Proportional Data

I plotted the mortality values against this new BMI column, and converted the chart to an XY Scatter chart so the new X values are recognized as numeric values and not equally spaced text labels. I added markers to show the actual points, but made the markers small so they aren’t emphasized with respect to the connecting lines.

Mortality vs BMI - Proportional Chart

Pretty good, but we have lost the bin range labels. No problem, let’s just add data labels to the series. You can add any type of labels (series name, Y values, whatever) and change the text of the labels manually, or you can use Rob Bovey’s Chart Labeler to do it automatically.

Mortality vs BMI - Proportional Chart with Data Labels

Yuck, that’s too cluttered. It was even worse when I initially had all labels below their respective points.

Let’s keep the points and their proportional spacing, but put the labels along the horizontal axis, as in the original chart. We can’t do that with built-in axis labels, but we can add a second series to the chart, with data labels. I added a column to hold the Y values for the added series; the Y values are equal to the axis minimum of 0.6.

Mortality vs BMI - Proportional Data

Here I’ve highlighted the new series, and used the Chart Labeler (see link above) to add the labels below the points.

Mortality vs BMI - Dummy Series for Axis Labels

I’ve hidden the dummy series.

Mortality vs BMI - Proportional Chart with Axis Labels

We’re still missing something: tick marks. Let’s add short vertical error bars to the dummy series. (See my article Custom Error Bars in Excel Charts for some helpful hints, and for a simply free utility I wrote to make it easier, even in Excel 2003 where it’s already pretty easy.) I placed a value in a cell, and used that cell as my custom error bar source range. My first value was way off, producing huge error bars. But by linking the error bars to the custom value in a cell, I made it easy to adjust the error bars until they looked appropriate, at a custom value of 0.04.

Mortality vs BMI - Proportional Chart with Axis Ticks and Labels

That’s quite an improvement over the original.

Alternative: Step Chart

I decided to make a step chart, in which each horizontal line segment spans the bin range at the Y value for that bin. I split each data point from the original data into two points, one at either end of the bin range. I chose endpoints that extended a reasonable distance beyond their bin cutoffs.

Mortality vs BMI - Step Chart Data

And here’s the step chart. No need for bin range labels, because you can see where the bin edges are along the X axis. And it makes clear that each bin’s value is constant across the bin. Well, it isn’t really constant, but the way that the data was presented implies that it is.

Mortality vs BMI - Step Chart

Alternative: Histogram

Another variation is to make a histogram with variable width bins. Can’t do this with a column chart in Excel: all those columns have to be the same width. This chart was made with the original line chart data, which leads to equal width bins for unequal bin ranges. I did not make the common mistake of starting the vertical axis at the same minimum (0.6) that was suitable for the line charts.

Mortality vs BMI - Histogram

Instead of a column chart, we’ll use an XY chart and draw just the outlines of the columns. To do this we take the step chart data, and insert a row between each pair of duplicate X values. We copy in the X value from the cell above and assign a Y value of zero. This will produce a line from the end of one bin, straight down to the horizontal axis, then straight up to the start of the next bin.

Mortality vs BMI - Pseudo Histogram Data

Plot the data in an XY chart to get the pseudo-histogram. I left the endpoints at their Y values, not at zero, to indicate that these bins were not enclosed, but extended indefinitely. And again, I started the vertical axis at zero.

Mortality vs BMI - Pseudo Histogram

Summary

In her book, Naomi Robbins says Do not use equally spaced tick marks for uneven intervals on an arithmetic scale. Here I’ve demonstrated a few techniques to use proportionally spaced data points. Some are based on the “How Not To” line chart example in her book, and some are more intricate. Of all the options, I think I like the step chart best. It’s clean and uncluttered, and shows real values along both axes.

Comments

Calvin Graham says

Monday, March 14, 2011 at 4:55 am

I often find the question with a problem like this is “why were the spaces uneven to begin with?” Normally you’d expect the start/end bin sizes to be larger to accomodate outliers but there’s a lot of varience here for a planned survey/experiment
Bill McNair says

Monday, March 14, 2011 at 8:26 am

this tour de force of graphic representation is awesome. i like the x-y plot with x-axis labels as a series… but the pseudo-histogram is close behind.

one thing… BMI is so much more than the numbers. I would have included a picture such as the one from this link… http://www.healthyweightforum.org/eng/calculators/bmi-visual-graph/

thanks -bill
Bob says

Monday, March 14, 2011 at 12:19 pm

Hi Jon,

Great post.
Lots to learn with the approaches shown here!

Cheers,

Bob
Jon Peltier says

Monday, March 14, 2011 at 3:15 pm

Bill –

I didn’t get into it here, but BMI is really a poor way to characterize someone’s weight. You ideally should be measuring density, right? Body fat percentage? But BMI measures weight per area, not per volume, and it’s not really per area, but per height squared. No measure at all of width and thickness. Anyone who is at all active is off this curve. If someone does a little weight lifting, this curve and BMI in general is completely irrelevant. But BMI was latched onto years ago and we’re stuck with it.
derek says

Tuesday, March 15, 2011 at 6:09 am

Nice work. I’d go a step further and use a dummy series to replace the equally-spaced tick marks in your final graph with an unevenly-spaced set, where the numbers fall underneath the boundaries between columns.

BMI was invented by Adolpe Quetelet, the Belgian statistician and friend of Florence Nightingale (he also invented the sector area chart she famously used to illustrate Crimean War casualtires). You can see why he needed a transformation, because on the one hand he had access to large sets of weight statistics, but weight alone is not a measure of obesity. Fortunately he had corresponding sets of height data, but you can’t use weight/height either. Could he use height cubed? No, because people’s weights don’t go up in a population as the cube of height, but more like (height^3)^2/3.

(the 2/3 slope is empirical, it could just as easily be 3/5, or 0.69)

The bottom line is no individual person should ever be described as having “a BMI”, because BMI only makes sense, if it makes sense at all, as a comparative measure of whole populations, and you must take care that the populations have similar distributions of height, and similar numbers of keen weightlifters. One person alone can be a population of 100% weightlifters, and thus useless for comparison.

It goes double that no person should ever be taxed with having a BMI that’s “too high”. Any doctor who thinks his patient is in poor health has the tools and the subject available to determine that without recourse to tables of statistics.
Tom Rein says

Friday, May 31, 2013 at 12:48 pm

I came across this post in a google search and it worked out perfectly for what I was doing! (Which had nothing to do with BMI, but with glass structures) Thanks for the great post!
Ciarán F says

Friday, January 31, 2014 at 8:26 am

Hei John, thanks for all your helpful tips!
I have one question (I haven’t managed to find an answer anywhere): How can you modify the y-axis for uneven increments, such as two different patterns? For example, take a y-axis split into 7 equal parts (by equal I mean equal in pixel-length), but with y-values of: 0, 5, 10, 15, 20, 30, 40, 50. Is this possible?
Thanks,
Ciarán
Jon Peltier says

Friday, January 31, 2014 at 7:40 pm

Ciarán –
It’s possible to distort the axis scale like that, but it’s difficult to make it work, and it may confuse people trying to interpret the chart. I would advise not screwing up the scale.
Ciarán F says

Monday, February 3, 2014 at 5:32 am

Ok thanks Jon, it may not be worth all the hassle then, i’ll think of another way of presenting the data.
C

Line Chart For Unequal Intervals

Original Line Chart

Improving the Line Chart

Alternative: Step Chart

Alternative: Histogram

Summary

Comments

Trackbacks

Leave a Reply

Original Line Chart

Improving the Line Chart

Alternative: Step Chart

Alternative: Histogram

Summary

Reader Interactions

Comments

Trackbacks

Leave a Reply