How do you chart a wide range of values? There are numerous solutions to this, each with pros and cons. I’m showing these in Excel charts, of course, but they apply no matter what charting package you may be using.
It’s easy enough to think of an example. The following two charts show monthly sales data. One month had very high sales, while the rest of the months had low sales. (I’ll be illustrating the concepts in this article using both line and column charts, because each has special considerations.
In both charts, the January sales value stands proud, while the other months are hiding in the weeds along the bottom of the chart.
Use a Logarithmic Scale
The first approach to chart a wide range of values was suggested in Logarithmic Scale In An Excel Chart, a tutorial on the MyExcelOnline Excel Blog. The My Excel Online web site is run by my colleague John Michaloudis, and it features lots of great tutorials, podcasts, free training, and paid courses. John’s point was well taken, but it was incomplete. It was a short and sweet review of the technique, but it left out the thought process that should accompany any charting effort.
It’s easy enough to apply a logarithmic scale to a numerical axis in an Excel chart. Select the axis, then press Ctrl+1 to open the Format Axis task pane (or dialog in Excel 2010 and earlier). In the same view that allows formatting of scale parameters like minimum and maximum, you should find a checkbox that says Logarithmic Scale. Check the box, and the scale is adjusted. You can also change the base of the log scale, but it’s usually best to keep it at 10.
Here are my two charts with log scales. The data I’m using here is very similar to the data in the MyExcelOnline post cited above.
Your first thought might be satisfaction that the data are now all shown with a similar magnitude. That’s wrong, of course, because they aren’t of similar magnitude. But that’s secondary, because we are really only comparing extreme values with very little in between.
The other point about the charts above is the scale limits. Excel likes to use zero as one endpoint of an axis, and 1 corresponds to zero on a log scale. However, this choice of scale minimum means that the bottom two-thirds of the chart is blank, all of the data occurs in the top.
We can reduce the white space by entering a better value for axis minimum.
There is still a problem, especially with the bar chart. In a bar chart, our precognitive sense of the values is based on the lengths of the bars, and it’s hard to overcome that. When I look at the bar chart above, the short bars all look about half as tall as the long ones, so I get the mistaken impression that the smaller values are around half of the large value. (The first logarithmic column chart is even worse, since the shorter bars are 80% as tall as the long one.) The actual ratio of values is less than one-tenth.
The log scale also tends to wash out the variability in the shorter values. In the charts below, I show the previous log scale charts, and beside them a linear scale chart with the scale blown up so the mean of the linear scale and log scale are at about the same height. Of course the large value is way off scale, which I’ve tried to indicate by fading the top of the visible indication of the large value.
The variability in the small values is shown best using the linear scale.
I should mention a couple more points about logarithmic scales.
First, as should be clear above, there is no place for log scales in bar or column charts. Bar and column charts need to include zero in the axis scale, but you can’t include zero on a log scale, since log(0) is negative infinity.
Second, most audiences will not be able to properly appreciate a logarithmic scale. Sure, engineers, scientists, and quants may have a strong enough numerical sense to interpret them accurately. But general audiences, and even smart managers and executives, are likely to be misled by such numerical transformations. It’s best to stick to the linear scale that most people are comfortable with.
Show Part of the Data
The charts used above aren’t bad, showing that linear scale charts preserve variation in the data better than logarithmic charts. Maybe we don’t need to actually show the extreme values on the axis scale, if we just indicate that they’re way off scale.
I can adjust the scale a bit more, to center the smaller values in the chart, keeping the scale minimum at zero, and fading the top of the largest value. The large value is so much bigger anyway, maybe we don’t need to actually show it to indicate that it is much larger than the others. This fading is a bit tricky, but it’s important to indicate that the data point isn’t just at the top of the chart, but instead extends far beyond.
Break the Axis Scale
A common approach to chart a wide range of values is to break the axis, plotting small numbers below the break and large numbers above the break. An advantage here is that it generally uses a linear scale. A disadvantage is that it distorts data, and doesn’t really give a sense for the differences in value on either side of the break.
Here are the two original charts, with a break in the vertical axis scale.
Bravo, you got in all the data points. But despite the visual cues that the axis has broken, there is still a strong tendency to mentally interpolate the values: those short bars still look half as tall as the large one, and the faded center and gap in the axis tell my conscious mind but not my precognitive mind that my first impression is wrong.
Sure, it’s easy enough to read the value off the axis scale, corresponding to the data point. But if you have to do that much work, what’s the point of a chart? You might as well just read the data from the cells in the worksheet.
Another drawback to breaking an axis is that it’s hard. You need to hide the real axis, construct two parts of a fake axis with a combination chart and data labels, and change at least some of the data you’re plotting. Nobody understands how to make these charts, and nobody understands the output anyway. So why bother.
The charts in the Show Part of the Data section are more effective.
Use Multiple Charts, or Make a Panel Chart
A lot of people are obsessed with getting all of their data into exactly one chart. Sometimes this is fine, but as we’ve seen above, sometimes charts with all of the data are not very easy to interpret without distorting the relationships within the data.
What’s wrong if we use two charts? We can show the whole range of data while highlighting the larger values (see the first chart of this article), then add the chart from the Show Part of the Data section which highlights the smaller values.
There are some redundant chart elements, so let’s hide the category labels in the top charts and the titles in the bottom charts.
From here it’s not a big stretch to combine both charts. Plot the original data twice, once each on the primary and secondary axes, do some heavy axis formatting, and voila. Here are panel charts, where one panel shows the full extent of the data, and the other zooms in on the smaller data.
In my tutorial Broken Y Axis in an Excel Chart, I explain some of the shortcomings of a broken axis scale, and I give step-by-step instructions for creating exactly this kind of panel chart. Panel charts are a little more complicated to build and maintain, but using a single chart helps with alignment and other formatting.
Panel charts are among my favorite ways to show this kind of data. But we aren’t done yet with alternatives.
Pareto Charts
A Pareto chart is a combination chart that combines a column chart (sorted from largest value to smallest), like the one that led off this article, with a line chart showing the cumulative total.
Here are two versions of a Pareto chart for this data. The Pareto one on the left is sorted by value. The one on the right is sorted by month, so the cumulative line doubles as the cumulative YTD sales.
Below is a combination of Pareto chart and waterfall chart, which I call a “Floating Pareto” chart. The bars show the incremental values and the increasing additive value. The one on the left is sorted by value, while the one on the right is sorted by month, again showing YTD sales.
First and only shameless plug of the entire article: the Advanced Edition of Peltier Tech Charts for Excel 3.0 includes Pareto and Floating Pareto charts. Check it out.
Think Further Out of the Box
You always have to keep in mind why you need to plot this data. If you want to say that our sales team kicked ass in January, then the sales director left for a competitor and took with him his account book and his entire staff, you can even get away with a pie chart.
Sure, you’re thinking, “A PIE chart? Wut? Has he lost his mind? Pie charts suck.” They’re not good at showing numerical data, and blah blah, yada yada, etc etc etc.
Well, I’ve softened a bit in my stance on pie charts, in part because of some actual research into their effectiveness by Robert Kosara of EagarEyes and Tableau (go read An Illustrated Tour of the Pie Chart Study Results, and read the papers that post links to). Seriously, the chart above practically shouts, “What the hell changed between January 31 and February 1?” There really seems little point in examining the month to month variation from February on.
If you want something slightly more quantitative, you can make yourself a nice little stacked column chart. The chart below clearly says that January sales were enormous, more than the rest of the year combined.
Why do you want to chart a wide range of values?
As with any charting exercise, you need to ask some questions. What am I trying to learn from this chart? Who am I making this chart for? What message am I trying to give them? What’s on Netflix tonight?
If your purpose is a quick overview, then something like the Pie Chart or Stacked Column Chart might be all you need. If you’re trying to show something in more detail, the Pareto Chart, the Panel Chart, or a chart from the Show Part of the Data section may be more appropriate. If it’s detailed engineering data or scientific model predictions and your audience is highly literate in mathematics, then consider Logarithmic Scale charts.
Jamie says
Good post. Always a tough subject.
It seems that a lot of people’s default is to go the log axis immediately, with no thought as to the implications. Data such as that in the example here makes no sense on a log axis, which should be reserved for data that fits a log distribution, and an audience (and designer) who knows what that means. :)
As far as the pie chart is concerned – I have also softened my stance somewhat, and I find that they can be useful in certain situations. The example data does not seem to be one of them, to me.
You said “Seriously, the chart above practically shouts, “What the hell changed between January 31 and February 1?” “.
Ok, that’s true.
But so does the initial bar chart!
If the point is to show that something is really out of the ordinary, either because January was abnormally high, or because the other months were abnormally low, then the original chart is precisely how to effectively do that. If that’s not the goal, the pie chart does not improve upon our original problem in any useful way.
What’s the best solution?
¯\_(ツ)_/¯
I think your summary is the important part of this post – it depends on what you’re trying to show in the chart(s), and you should not ever feel constrained to a single chart.
As Stephen Few said:
“If you’re inclined to think that a data visualization should limit itself to a single graph,
it’s time to leave that constraint behind. The best solutions often require multiple graphs.”
Jon Peltier says
Jamie –
Thanks for your thoughtful comment.
H. Alex Huskey says
It’s great to have permission to use more than one chart, haha! The panel charts do seem better than breaking the vertical scale, but it would be super for visualizing software to take a cue from the world of graphic design and offer Sankey-Diagram-like paths to guide us visually to see the smaller values as an inset or subset of the big picture. These indicators could be good looking and wide swaths of color tint evoking the feel of a bracket or large arrow. They wouldn’t have to be intrusive, just elegant and automatic. Eventually, visualizers will allow much more graphic flexibility but for now their developers have their hands full just responding to customer requests to make them code-free.
H says
Hi Jon, thank you for your input on the charts. In my case, i may have some days that could be abnormally high or just higher than normal. With this in mind, the smaller more common values are shown very small. I have used the LOG base 10 to show all the values but like you have advised, the smaller values look like its half of the big values. I guess the best way to do it in my case is to show the larger values off the scale shaded like you have done and show the smaller values (SHOW PART OF THE DATA EXAMPLE fits best i think to show a trend of what really is happening on a day on day basis).
How can i add this shaded part for larger values and ensure the axis is updating correctly ust encase i get a stupidly high value 1 day?
Jon Peltier says
Helal –
Here’s one way to preserve the smaller values’ presence in the chart. The first two columns of the data range below should look familiar. I’ve inserted a cell with an arbitrary cut-off value (D2, shaded blue). Cell D5 has this formula (filled down to D16):
=IF(C5>$D$2,$D$2,NA())
and cell E5 has this formula (filled down to E16):
=IF(C5<=$D$2,C5,NA()) Column D now has only the large values, cropped at 150,000, while column E only has the smaller values, up to 150,000. I'm plotting two different series, so I can format the tops of the large values as fading away.
Select the shaded ranges above (use Ctrl+Select to add subsequent ranges to the first selection), and insert a clustered column chart (top left chart below).
Format the vertical axis scale: explicitly set the minimum to 0 and the max to 150,000, while keeping the spacing at 20,000. This makes 140,000 the highest gridlines, but allows the tall bars to extend beyond this apparent top of the chart (top right chart).
Format the second series to use the same fill color as the first. Also set Overlap to 100% and Gap Width to 50%. Delete the legend (bottom left chart).
Format the large series with a gradient fill. I used a three-step gradient. Step 1 is at position 0%, fill color white, 100% transparent. Step 2 is at position 10%, fill color blue, 0% transparent. Step 3 is at position 100%, fill color blue, 0% transparent. Adjust to suit your own chart (bottom right).
Now when I extend the data, the original chart (top chart below) shows the new months, and any other abnormally large values are also cropped off and fade into the clouds.
Jon Peltier says
Alex –
You mean something like one of these:
The insets, of course, are simply smaller charts placed in front of larger charts.
The trick is showing the data clearly, without the inset obscuring or distorting the data, or even distracting from the data.
George Kuhn says
Thanks for sharing this posts. Some practical yet helpful comparisons. Data visualization is key and it can be frustrating to see data shown in ways when other, more applicable showings exist. Keep fighting the good fight with chart makers! Enjoyed reading through this post.
Kathleen says
Hi! I enjoyed this article. Can you recommend any resources on how to format a bar chart to do the fade-out that you showed here? I can’t quite figure this out. Any help would be great!
Jon Peltier says
On the first page of search results, there are a few pages on the Microsoft website. but they pretty much only say select the gradient fill option.
I did find a more detailed explanation at Office Watch: Gradient Effects and Text in Word – in depth.
I should write a tutorial….
Kathleen says
Thanks, Jon!