If you’re looking for a tutorial on breaking an axis scale, you won’t find it here. Instead you’ll read why breaking an axis is a bad idea, and you’ll get a tutorial in Panel Charts, which are a more effective (and easier) means to show your data.
People frequently ask how to show vastly different values in a single chart. Usually they ask because a few very large values (for instance, Paris in June or Madrid in May in the chart below) overwhelm the other, relatively much smaller, values.
One suggestion is to use a logarithmic scale. For scientific data presented to scientific audiences, this is often an excellent suggestion. For the general public, and for general data, this may not be so useful. Especially in a bar chart, where the length of bars is important to comprehension, not some mathematical abstraction of length.
Another suggestion is to “break” the axis, so that part of the axis shows the small values, then another part of the axis shows the large values, with a section of the axis scale removed. Sounds good, but you’ve lost any correlation between the large and small values. Also our eyes are likely to see the two broken bars in the chart below as only about twice the value of the tallest of the unbroken values (despite our conscious brains “knowing” that the axis has been cut).
Another problem with this approach is that it’s cumbersome to create and nearly impossible to maintain charts like this.
A better suggestion than either a log scale or a broken axis is to plot the data in a panel chart. This chart has two panels, one with an axis that shows all the data, the other with an axis that focuses on the small values. I generally advise strongly against using any kind of gradient in a chart, because the gradients are pretty much meaningless. In this chart, the gradient at the tops of the (truncated) large values are not meaningless, but are intended to show the large values extending high up into the clouds.
Making the Panel Chart (It’s Easy!)
If you want to play along at home, the data is located in BrokenYData.csv.
Here is the data for the chart. Columns E, F, and G have the same data as columns B, C, and D, except the two very large values (>30 million) have been replaced by cut-off values of 7,500,000 (shaded cells).
The first step is to plot all of the data in one chart. By default, all series are plotted on the primary axis.
The second step is to move the three extra series to the secondary axis. They block the primary axis data…
… but if I format the secondary axis series with outlines and no fills, you can see the primary axis data.
Back to solid fill colors. I have rescaled the vertical axes. The primary (left) axis now has a minimum of -40 million and a maximum of +40 million; the secondary (right) axis now has a minimum of 0 and a maximum of 16 million.
Add the secondary horizontal axis. Excel by default puts it at the top of the chart, and the bars hang from the axis down to the values they represent. Pretty strange, but we’ll fix that in a moment.
Format the secondary vertical axis (right of chart), and change the Crosses At setting to Automatic. This makes the added axis cross at zero, at the bottom of the chart.
(The primary horizontal axis also crosses at zero, but that’s in the middle of the chart, since the primary vertical axis scale goes from negative to positive.)
Now we need to apply custom number formats to the vertical axes.
The primary (left) axis gets a format of
0,,"M"; (zero, comma, comma, and capital M within double quotes). Each comma knocks a set of three zeros off the displayed value, making for example 1,000,000 appear as 1. The M will be shown after the number of millions. The semicolon indicates that this format is for positive values, and nothing after the semicolon indicates that negative values are not to be shown. Since no special format is indicated for zero (which would be after a second semicolon), it is shown with the same format as a positive number.
The secondary (right) axis gets the trickier format of
[<8000000]0,,"M"; (less than eight million enclosed in square brackets, zero, comma, comma, and capital M within double quotes). The first format in the string is normally for positive numbers, but square brackets indicate a non-default condition for the first string. This means that any values less than 8 million will appear as the number of millions folloewd by capital M. The semicolon with nothing following means that any other numbers will not be displayed.
Now I’ve cleaned up a bit. I’ve used a medium gray line for the plot area border, and for both horizontal axis lines. I’ve also set the labels of the primary horizontal axis (center of the chart) to No Labels, because they are redundant and clutter up the chart. The primary and secondary axis scales conveniently have the right spacing so that the primary horizontal gridlines work for the secondary axis as well.
Now I’ve applied the same fill colors to the secondary axis columns as are used for the primary axis columns.
Finally I’ve formatted the two large values separately. To format just one point in a series, click once to select the series, then click again to select the particular point (column) to format.
I used a gradient that had white fill at 0%, and column’s regular fill color at 15% and at 100%. This gradient makes the bars extend upward, and fade as they reached into the clouds.
Finally I deleted the duplicate legend entries. To delete an unwanted legend entry, click once to select the legend, then click again to select the particular legend entry, then press the Delete key.
This is the finished panel chart. The top panel shows that the two outlying values are drastically larger than the others, while the bottom panel allows comparison between the smaller values.
The Final Word
I know everybody’s case is special, and everybody knows better than I do about why using improper techniques is correct in their particular situation. Your boss needs it this way, or it’s a specialized scientific chart, or you don’t see how anybody could be confused, or it’s really really important. However, I am under no obligation to share something that I do not want to share. I do not even have the old tutorial, so I cannot send it to you, nor will I recreate a new version of the tutorial.