Chart Axes

Select Meaningful Axis Scales

Thursday, January 26, 2012 by Jon Peltier 4 Comments

Peltier Technical Services, Inc., Copyright © 2023, All rights reserved.

Last week, in You Have 1 New Notification On Klout!, I used social media metrics site Klout to illustrate how choice of axis scales can exaggerate or wash out the variation in a data set. Today I’ll pick on another social media metrics site, Topsy, to show how to pick meaningful axis tick spacing parameters. [Note from the future: according to the Social Media Management Software Graveyard, both Topsy and Klout have been shut down.]

A meaningful axis spacing allows a human viewer to make sense of the numbers in your chart.

Original Topsy Charts

Here is a chart of Twitter mentions of my blog over a one week period. Sorry the chart’s too wide, that’s as small as Topsy would make it and still have text large enough to read. You could right click on it and choose your browser’s equivalent of “Open Image In New Tab” to see it in all its glory.

Topsy graph for one week

Here’s the Topsy graph for two weeks.

Topsy graph for two weeks

Notice anything wrong with these charts? No, they do have the right number of points. But the vertical gridlines and the horizontal axis labels are not aligned with the points. In the 7-day chart (top), there are 8 labels between the axis min and max values. To accommodate this mismatch, some adjacent pairs of tick marks fall within the same day, so a couple of labels are repeated. In the 14-day chart (above), there are 9 labels between the axis min and max value; some days have no tick marks, so dates are left out, but not in a regular pattern.

This kind of unorthodox labeling causes the humans to have to think too much about the chart. Sometimes the choice of incredible charting options like this leads to lack of credibility of the whole chart.

Human-Friendly Axis Spacing

In a 7-day graph, what would be an intuitive axis tick spacing? Let’s try one day, since one week is too wide and one hour too narrow. In general, numbers that are 1, 2, or 5 times a power of ten make good values for axis tick spacing. 1, 20, 500, 0.01, and 0.5 are reasonable choices. If the scale is days, and a spacing of 1 or 2 days results in crowded labels, 7 days is a reasonable choice.
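The 1-2-5 rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `nice_step` and the `max_ticks` parameter are invented here, and no particular charting package is claimed to work this way.

```python
import math

def nice_step(axis_range, max_ticks=10):
    """Return the smallest 1-2-5 step that needs at most max_ticks intervals."""
    if axis_range <= 0 or max_ticks < 1:
        raise ValueError("axis_range must be positive and max_ticks at least 1")
    raw = axis_range / max_ticks            # smallest step that would still fit
    exponent = math.floor(math.log10(raw))  # power of ten at or just below raw
    # Try 1, 2, 5 times that power of ten, then the next power up.
    for e in (exponent, exponent + 1):
        for multiplier in (1, 2, 5):
            step = multiplier * 10.0 ** e
            if step >= raw:
                return step
    raise AssertionError("unreachable: 10 ** (exponent + 1) always exceeds raw")

print(nice_step(6, 7))   # 1.0 -> a span of 6 days gets 1-day ticks
print(nice_step(13, 7))  # 2.0 -> a span of 13 days gets 2-day ticks
```

A rule like this knows nothing about calendars, which is why the text treats 7 days as a separate sensible choice when the unit is days.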

Here I’ve reconstructed Topsy’s 7-day chart with a 1-day axis tick spacing. It’s very natural, the ticks and gridlines are spaced the same as the data points, one day apart. Nobody has to use any excess gray matter to understand the time scale.

One week graph with 'normal' 1-day axis spacing

Here is the 14-day Topsy data plotted with a 1-day axis spacing. It is as easy to read as the 7-day chart with 1-day spacing, which is to say, much easier than the Topsy Turvy spacing in the original chart.

Two week graph with 'normal' 1-day axis spacing

This is really more axis labels than are needed, and some of them are forced to wrap so they don’t overlap. We can fix this by using a 2-day axis spacing. Also easy to read. I’ve helped the viewer by placing small minor tick marks at 1-day intervals.

Two week graph with 'normal' 2-day axis spacing

Intermediate gridlines work as well as intermediate tick marks.

Two week graph with 'normal' 2-day axis spacing and 1-day gridline spacing

Topsy’s Axis Scale Parameters

So what was Topsy thinking? Well, I can’t answer that, but I can estimate the axis tick positioning that they used.

Here is Topsy’s 7-day data. I’ve secretly replaced the regular time scale axis with an XY series that has spacing independent of the actual plotted points. Vertical error bars on the invisible points serve as my gridlines. The X values are based on formulas I can tweak in the worksheet, and I align the custom gridlines to closely resemble the original Topsy alignment. Jan 13 and Jan 15 both appear twice as in the original chart.

One week graph with reconstructed Topsy spacing

To get the spacing right, the first gridline appears at 3:44 pm on January 12, which rounds up to the Jan 13 shown in the label. Each subsequent gridline is 16 hours and 40 minutes after the previous one. I think we can all agree that 16:40:00 is not as intuitive as 24:00:00.
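To see why labels repeat, the reconstructed 7-day tick positions can be replayed in Python. This is a sketch under assumptions: the year 2012 is inferred from the post date, and the labeling rule (round each tick to the nearest calendar day, with noon rounding up) is a guess based on the “rounds up” remark; the helper name `tick_labels` is invented.

```python
from datetime import datetime, timedelta

def tick_labels(first_tick, step, count):
    """Label each tick with the nearest calendar day (noon and later round up)."""
    labels = []
    for i in range(count):
        tick = first_tick + i * step
        nearest_day = (tick + timedelta(hours=12)).date()
        labels.append(nearest_day.strftime("%b %d"))
    return labels

# First gridline and spacing as estimated in the text; year 2012 is assumed.
first = datetime(2012, 1, 12, 15, 44)
step = timedelta(hours=16, minutes=40)

print(tick_labels(first, step, 8))
# ['Jan 13', 'Jan 13', 'Jan 14', 'Jan 15', 'Jan 15', 'Jan 16', 'Jan 17', 'Jan 18']
```

Under these assumptions Jan 13 and Jan 15 each come up twice among the eight labels, because two consecutive ticks land within 12 hours of the same midnight.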

I’ve reproduced the 14-day chart as well. The first gridline appears at 3:20 pm on January 5, which rounds up to Jan 6. Subsequent labels are 33 hours and 20 minutes apart. Again, not so intuitive.

Two week graph with reconstructed Topsy spacing

I can’t really say where these strange tick spacing values came from, but I have a suspicion. 16:40 is 1000 minutes, and 33:20 is 2000 minutes. If the time dimension were plotted in minutes, the two charts would have ranges of 8640 minutes (6 days, from the first to the last of 7 daily points) and 18720 minutes (13 days, for 14 daily points), so in fact 1000 and 2000 are human-friendly numbers. Of course, the data is spaced 1440 minutes apart, so the nice minute-based axis spacing is really irrelevant.
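The unit conversions behind this suspicion are easy to check. The snippet below is plain arithmetic in Python; the helper name `minutes_to_hhmm` is invented for the illustration.

```python
MINUTES_PER_DAY = 24 * 60  # 1440 minutes between daily data points

def minutes_to_hhmm(minutes):
    """Format a whole number of minutes as H:MM."""
    hours, mins = divmod(minutes, 60)
    return f"{hours}:{mins:02d}"

print(minutes_to_hhmm(1000))     # 16:40
print(minutes_to_hhmm(2000))     # 33:20
print(8640 // MINUTES_PER_DAY)   # 6
print(18720 // MINUTES_PER_DAY)  # 13
```

So 8640 and 18720 minutes are whole numbers of days (6 and 13), the distance from the first to the last of 7 and 14 daily points, while a 1000-minute tick is a bit under 0.7 of a day.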

I suspect the charting mechanism has a nice algorithm to calculate the spacing based on the minimum and maximum data values, but it doesn’t consider the actual data spacing, nor does it investigate alternative units. And the algorithm was automated before a human had a chance to look at it and say “Huh??”

You Have 1 New Notification On Klout!

Monday, January 16, 2012 by Jon Peltier 7 Comments


Every so often I get an email with a subject line that’s something like “Jon, you have 1 new notification on Klout!” Wow, another social network thing, to go with all the other ones. Sure, I follow a bunch of people on Twitter every day, and I have a neglected Facebook account and a LinkedIn account that I don’t know what it’s useful for yet. So I also have a Klout account. [Note from the future: according to the Social Media Management Software Graveyard, Klout has been shut down.]

My current Klout ranking?

Klout

I dunno, I guess that’s pretty good. Some of my colleagues and the people I follow on Twitter mostly fall in the range between 25 and 55. So who else has a 43? My fellow Excel chart master, Jorge Camoes, has a 43. An old college buddy, Joel Foner, has a 43. So does fellow Microsoft Excel MVP Ken Puls. Annmariastat has a 43, and she doesn’t even have a Klout account, but she has a Wikipedia entry. So I’m in good company.

There’s some kind of algorithm that determines this score, based on how many people I influence through my internet presence (I guess, how many people follow me), how much I influence them (how often they repeat what I say), and how influential the people I influence are.

The first thing I noticed when I followed the link in the email this morning (I delete most Klout emails without following the link) was this chart showing my score over the past month.

Klout Timeline

Wow, a few days ago, I really got a huge boost in my score.

Then I looked a little more closely, and read the axis labels. Over the past month, my score has ranged from about 42 to about 43.

Klout Timeline with Axis Labels

This is a good example of how the scale of a value axis can exaggerate or play down the variability in a signal. Here I’ve reproduced the Klout timeline in Excel:

Klout Timeline Reproduced in Excel

The total variation in the displayed data is about ±1.8% from the mean. It looks ginormous when the visible Y axis scale is less than 5% of the maximum value. It doesn’t look so substantial when the Y axis scale starts at zero:

Klout Timeline with Full Scale Y Axis

Maybe there’s a reasonable intermediate scale. This scale shows some variation, but it doesn’t look like a tsunami:

Klout Timeline with Reasonable Y Axis Scale
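One way to put a number on this effect is the share of the axis height that the data’s range occupies. The Python sketch below uses hypothetical scores and axis limits chosen only to resemble the situation described; they are not read off the Klout chart, and the function name `fraction_of_axis` is invented.

```python
def fraction_of_axis(data_min, data_max, axis_min, axis_max):
    """Share of the visible axis height spanned by the data's range."""
    if axis_max <= axis_min:
        raise ValueError("axis_max must be greater than axis_min")
    return (data_max - data_min) / (axis_max - axis_min)

# Hypothetical score range and axis limits, for illustration only.
low, high = 42.0, 43.5

print(fraction_of_axis(low, high, 41.5, 43.5))  # 0.75 with a tight axis
print(fraction_of_axis(low, high, 0, 50))       # 0.03 with a zero-based axis
```

The same data fills three quarters of the chart in one case and a sliver in the other; an intermediate scale lands somewhere in between.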

Another funny thing about Klout is the listing of topics that I’m influential about. Data Visualization is rated Strong, and Visualization is rated High. Well, that’s good, those are topics I’d like to think I’m influential about. New England is rated High? Well, in season I will occasionally tweet about the Red Sox, but that’s the extent of my internet discussion about my home region. Statistics, Technology, good to see both of these here.

Klout Topics

But further down the list. Disease? Why am I listed for Disease? Sure, I’m a doctor, but not of medicine. As Randy Pausch’s mother said of him, “He’s a doctor, but not the kind that helps people.”

Jorge Camoes (who also has a Klout score of 43) says Klout must think I’m sick (insert a pun here for influence/influenza). Klout tells Jorge that he is influential for Coffee, from which he abstains in real life.

Microsoft Excel falls far down my list of topics. Not lowest, but as far as the screen shot could capture. Why am I only Medium for Excel? I’m a Microsoft MVP for Excel, and Excel is one thing I talk a lot about.

I guess this Topics of Influence algorithm needs a bit of work.

Posted: Monday, January 16th, 2012 under Chart Axes.
Comments: 7


Broken Y Axis in an Excel Chart

Friday, November 18, 2011 by Jon Peltier 205 Comments


If you’re looking for a tutorial on breaking an axis scale, you won’t find it here. Instead you’ll read why breaking an axis is a bad idea, and you’ll get a tutorial in Panel Charts, which are a more effective (and easier) means to show your data.

Chart with Broken Y Axis

The Problem

People frequently ask how to show vastly different values in a single chart. Usually they ask because a few very large values (for instance, Paris in June or Madrid in May in the chart below) overwhelm the other, relatively much smaller, values.

Chart with Unbroken Y Axis

Logarithmic Scale

One suggestion is to use a logarithmic scale. For scientific data presented to scientific audiences, this is often an excellent suggestion. For the general public, and for general data, this may not be so useful. Especially in a bar chart, where the length of bars is important to comprehension, not some mathematical abstraction of length.

Chart with Logarithmic Y Axis

Broken Axis

Another suggestion is to “break” the axis, so that part of the axis shows the small values, then another part of the axis shows the large values, with a section of the axis scale removed. Sounds good, but you’ve lost any correlation between the large and small values. Also our eyes are likely to see the two broken bars in the chart below as only about twice the value of the tallest of the unbroken values (despite our conscious brains “knowing” that the axis has been cut).

Chart with Broken Y Axis

Another problem with this approach is that it’s cumbersome to create and nearly impossible to maintain charts like this.

Panel Chart

A better suggestion than either a log scale or a broken axis is to plot the data in a panel chart. This chart has two panels, one with an axis that shows all the data, the other with an axis that focuses on the small values. I generally advise strongly against using any kind of gradient in a chart, because the gradients are pretty much meaningless. In this chart, the gradients at the tops of the (truncated) large values are not meaningless, but are intended to show the large values extending high up into the clouds.

Chart with Panels Having Distinct Y Axis Scales

Making the Panel Chart (It’s Easy!)

If you want to play along at home, the data is located in BrokenYData.csv.

Here is the data for the chart. Columns E, F, and G have the same data as columns B, C, and D, except the two very large values (>30 million) have been replaced by cut-off values of 7,500,000 (shaded cells).

Data for Panel Chart
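The extra columns are simple to derive. Here is a minimal Python sketch of the idea, assuming the threshold and cut-off quoted above; the sample numbers are hypothetical and are not the contents of BrokenYData.csv, and the function name `clipped_copy` is invented.

```python
CUTOFF = 7_500_000      # cut-off value quoted in the text
THRESHOLD = 30_000_000  # "very large" threshold quoted in the text

def clipped_copy(values, threshold=THRESHOLD, cutoff=CUTOFF):
    """Copy a series, replacing values above the threshold with the cut-off."""
    return [cutoff if v > threshold else v for v in values]

# Hypothetical values for one series, for illustration only.
series = [2_100_000, 3_400_000, 35_000_000]
print(clipped_copy(series))  # [2100000, 3400000, 7500000]
```

The original series stays untouched for the panel that shows everything, and the clipped copy feeds the panel that focuses on the small values.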

The first step is to plot all of the data in one chart. By default, all series are plotted on the primary axis.

Panel Chart Step 1

The second step is to move the three extra series to the secondary axis. They block the primary axis data…

Panel Chart Step 2

… but if I format the secondary axis series with outlines and no fills, you can see the primary axis data.

Panel Chart Step 3

Back to solid fill colors. I have rescaled the vertical axes. The primary (left) axis now has a minimum of -40 million and a maximum of +40 million; the secondary (right) axis now has a minimum of 0 and a maximum of 16 million.

Panel Chart Step 4

Add the secondary horizontal axis. Excel by default puts it at the top of the chart, and the bars hang from the axis down to the values they represent. Pretty strange, but we’ll fix that in a moment.

Panel Chart Step 5

Format the secondary vertical axis (right of chart), and change the Crosses At setting to Automatic. This makes the added axis cross at zero, at the bottom of the chart.

(The primary horizontal axis also crosses at zero, but that’s in the middle of the chart, since the primary vertical axis scale goes from negative to positive.)

Panel Chart Step 6

Now we need to apply custom number formats to the vertical axes.

The primary (left) axis gets a format of 0,,"M"; (zero, comma, comma, and capital M within double quotes). Each comma knocks a set of three zeros off the displayed value, making for example 1,000,000 appear as 1. The M will be shown after the number of millions. The semicolon indicates that this format is for positive values, and nothing after the semicolon indicates that negative values are not to be shown. Since no special format is indicated for zero (which would be after a second semicolon), it is shown with the same format as a positive number.

The secondary (right) axis gets the trickier format of [<8000000]0,,"M"; (less than eight million enclosed in square brackets, zero, comma, comma, and capital M within double quotes). The first format in the string is normally for positive numbers, but square brackets indicate a non-default condition for the first string. This means that any values less than 8 million will appear as the number of millions followed by capital M. The semicolon with nothing following means that any other numbers will not be displayed.

Panel Chart Step 7
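For readers who think in code, the effect of the two number formats on axis labels can be approximated in Python. This is a rough emulation, not Excel’s formatting engine: the function names are invented, rounding is simplified to nearest whole million, and the second function only covers non-negative values, since the secondary axis here starts at zero.

```python
def millions_label(value):
    # Approximates 0,,"M"; : zero and positives as whole millions, negatives blank.
    if value < 0:
        return ""
    return f"{int(value / 1_000_000 + 0.5)}M"

def millions_label_below(value, limit=8_000_000):
    # Approximates [<8000000]0,,"M"; : values under the limit as whole millions,
    # everything else blank. Negative input is outside this sketch.
    if value < 0:
        raise ValueError("this sketch only covers non-negative values")
    if value >= limit:
        return ""
    return f"{int(value / 1_000_000 + 0.5)}M"

print([millions_label(v) for v in (-20_000_000, 0, 20_000_000, 40_000_000)])
# ['', '0M', '20M', '40M']
print([millions_label_below(v) for v in (0, 4_000_000, 8_000_000, 16_000_000)])
# ['0M', '4M', '', '']
```

The blank strings are what hide the labels in the half of each axis that belongs to the other panel.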

Now I’ve cleaned up a bit. I’ve used a medium gray line for the plot area border, and for both horizontal axis lines. I’ve also set the labels of the primary horizontal axis (center of the chart) to No Labels, because they are redundant and clutter up the chart. The primary and secondary axis scales conveniently have the right spacing so that the primary horizontal gridlines work for the secondary axis as well.

Panel Chart Step 8
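The reason one set of gridlines can serve both axes is that the two scales divide the plot height into the same number of equal intervals. The Python check below assumes major units of 10 million on the primary axis and 2 million on the secondary axis; those units are not stated in the text and are a guess, and the function name is invented.

```python
def gridline_fractions(axis_min, axis_max, major_unit):
    """Positions of major gridlines as fractions of the plot height."""
    count = round((axis_max - axis_min) / major_unit)
    return [i / count for i in range(count + 1)]

# Axis limits from the text; the major units are assumptions.
primary = gridline_fractions(-40_000_000, 40_000_000, 10_000_000)
secondary = gridline_fractions(0, 16_000_000, 2_000_000)

print(len(primary) - 1)      # 8 intervals
print(primary == secondary)  # True
```

With any pair of major units that gives both axes the same interval count, the gridlines coincide.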

Now I’ve applied the same fill colors to the secondary axis columns as are used for the primary axis columns.

Panel Chart Step 9

Finally I’ve formatted the two large values separately. To format just one point in a series, click once to select the series, then click again to select the particular point (column) to format.

I used a gradient that had white fill at 0%, and the column’s regular fill color at 15% and at 100%. This gradient makes the bars extend upward, and fade as they reach into the clouds.

Panel Chart Step 10

Finally I deleted the duplicate legend entries. To delete an unwanted legend entry, click once to select the legend, then click again to select the particular legend entry, then press the Delete key.

Panel Chart Step 11

This is the finished panel chart. The top panel shows that the two outlying values are drastically larger than the others, while the bottom panel allows comparison between the smaller values.

The Final Word

I know everybody’s case is special, and everybody knows better than I do about why using improper techniques is correct in their particular situation. Your boss needs it this way, or it’s a specialized scientific chart, or you don’t see how anybody could be confused, or it’s really really important. However, I am under no obligation to share something that I do not want to share. I do not even have the old tutorial, so I cannot send it to you, nor will I recreate a new version of the tutorial.

Posted: Friday, November 18th, 2011 under Chart Axes.
Tags: Panel Charts.
Comments: 205


Fake Line Chart (Dummy XY Series for X Axis)

Tuesday, August 31, 2010 by Jon Peltier 7 Comments


In Excel, the difference between Line charts and XY charts has nothing to do with formatting the data with or without lines, and everything to do with different behavior of the X axes in the charts. I’ve written about these differences numerous times, in X Axis: Category or Value?, Line Charts vs. XY Charts, Line-XY Combination Charts, and in innumerable forum and newsgroup posts.

Comparison of Line Charts and XY Charts

Essentially, the difference is that Line charts plot X values as nonnumerical categorical values, like {A, B, C}; XY charts treat X values as continuously varying numerical values. Here is a brief comparison of the two chart types:

*Line Charts* | *XY Charts*
Nice date scaling (e.g., first of each month) | No special date scaling
Integer values only: Data plotted directly on category or on integer day numbers (midnight at start of each date) | Continuous values: Data plotted anywhere along axis (e.g., any fractional time of day, not just midnight)
All series use same X values (same dates or categories) | Each series uses independent X values
Identical series formatting: lines or no lines, markers or no markers | Identical series formatting: lines or no lines, markers or no markers
Only vertical (Y) error bars can be applied | Vertical (Y) and horizontal (X) error bars may be applied

A Typical Line Chart

A user is simply trying to create an XY scatter plot in Excel where the X axis values as shown in the example below can be maintained as-is on the final graph’s X axis points:

Line Chart Data

This data is perfectly suited for a line chart.

Line Chart

When you try to use the data in an XY chart, Excel ignores the non-numerical X values, and substitutes counting numbers, 1 for the first category, 2 for the second, etc.

XY Chart Step 1
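The substitution can be modeled in a couple of lines of Python. This is a simplified model of the behavior described above, not Excel’s actual logic; the function name and the sample labels are hypothetical.

```python
def xy_x_values(raw_x):
    """Use numeric X values as-is; otherwise fall back to 1, 2, 3, ..."""
    def is_number(v):
        return isinstance(v, (int, float)) and not isinstance(v, bool)

    if all(is_number(v) for v in raw_x):
        return list(raw_x)
    return list(range(1, len(raw_x) + 1))

print(xy_x_values(["North", "South", "East", "West"]))  # [1, 2, 3, 4]
print(xy_x_values([0.5, 2.0, 7.25]))                    # [0.5, 2.0, 7.25]
```

The labels themselves are discarded in the first case, which is why the axis shows plain numbers.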

For some reason, the user is insisting that a scatter plot be used, and all they want is the X axis to end up exactly as the line graph is formatted. I suspect this is due to a lack of understanding of the axis differences above, but I’ll never know, because this user is someone else’s client.

Making an XY Chart Mimic a Line Chart

An XY chart can be used to display this data, but it is a poor second choice. The tricks that are needed to make an XY chart display the nonnumeric labels like a line chart make the fake labels static, and they must be rebuilt if the amount of data expands, or if rows are inserted or deleted.

But I understand users, so here is the second option, which is what he thinks he wants, not what he needs.

We need to adjust the data. Since the Line chart X values are unsuited for an XY chart, we must insert a column of valid X values. To accommodate the Line chart style axis labels, we will use a dummy XY series along the X axis, which serves as placeholders for data labels which will look like the Line chart labels. The dummy series uses the column of zeros.

XY Chart Data
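The adjusted layout can be sketched as a small Python helper that builds one row per category: the original label, a counting-number X value, the Y value, and a zero for the dummy series. The field names and sample data are hypothetical and only illustrate the shape of the worksheet range.

```python
def build_xy_rows(labels, y_values):
    """Pair each label with a numeric X, its Y value, and a zero for the dummy series."""
    if len(labels) != len(y_values):
        raise ValueError("labels and y_values must have the same length")
    rows = []
    for x, (label, y) in enumerate(zip(labels, y_values), start=1):
        rows.append({"label": label, "x": x, "y": y, "dummy": 0})
    return rows

# Hypothetical categories and values, for illustration only.
for row in build_xy_rows(["North", "South", "East"], [12, 7, 9]):
    print(row)
```

The real series plots y against x, and the dummy series plots the zeros against the same x values, so its points sit on the X axis where the labels should go.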

Select the yellow shaded range, and make an XY chart.

XY Chart Step 2

Hide the standard X axis labels but maintain the margin beneath the axis by using a custom number format of " " (a space character within double quotes). If you simply format the axis with no labels, the plot area will be too close to the bottom of the chart, without leaving room for the dummy labels.

XY Chart Step 3

Use Rob Bovey’s Chart Labeler (a free utility that is indispensable for creating and manipulating custom labels) to label the dummy series. Apply the labels in the first column of the data (not shaded yellow in my example) to the “Label” series.

Each label remains linked to its cell. If the chart stays at 4 points using the same range, the labels will update as these cells are changed. Stretching the range or inserting/deleting rows will force you to rebuild the axis.

XY Chart Step 4

Finally hide the “Label” series by formatting it without markers or lines.

XY Chart

That’s a long way to go to mimic a completely dynamic Line chart with an XY chart which may need subsequent maintenance if the data changes. The wrong wrench, as we say, to hammer in the wrong screw.

Peltier Tech Update

Last week I had the pleasure of attending the Juice Analytics Viva Visualization tour at the Juice Boston Tea Party. They served breakfast, then presented their take on data visualization. Basically, they remind us to send a message, keep the visualization simple, put it in context, and follow good design fundamentals. We’ve heard the message many times before, but it’s worth retelling, especially by these experts who build solutions for big clients. Among the takeaways are:

Don’t let novelty obscure the data.
Don’t let visuals obscure the data.
Choose the right chart type.

There is a lot of great content on the Juice web site and blog, including a guide for choosing the right chart.

The Learning As You Go blog has a nice article about plotting highly skewed data, at Graphing Highly Skewed Data. The article covers use of secondary axes (a bad idea), breaks in the axis (also a bad idea), logarithmic axis scales (okay if users understand log scales), and multiple charts. This article is a response to a discussion started by Chandoo in How do you make charts when you have lots of small values but few extremely large values?

Posted: Tuesday, August 31st, 2010 under Chart Axes.
Tags: dummy series, Line Charts, XY Charts.
Comments: 7


Why Are My Excel Bar Chart Categories Backwards?

Monday, November 23, 2009 by Jon Peltier 45 Comments

I came across a blog post called Is it just me? (software defaults), which asks the age-old question, Why Are My Excel Bar Chart Categories Backwards? The post was in a new blog by Alex Kerin of Data Driven Consulting. Alex works on projects in analytics and dashboarding.

I have been asked this question a number of times, and being a founding member of Chart Busters, of course I know the answer. I’ve answered the question a number of times, but if I answer it here, it will become available for the ages.

I describe the problem and how to correct it. If you are really interested, I finish with an explanation of why this happens.

The Problem

Let’s use some very simple data to illustrate the problem.

Data for bar chart axis order study

Let’s make a simple bar chart.

Bar chart with backwards category labels

The labels were sorted from top down in the worksheet, but they appear from bottom up along the chart axis.

The Fix

It’s easy, if tedious, to correct the order of category axis labels. Select the axis, press Ctrl+1 (numeral one), the universal shortcut in Excel for Format This Object, and in Excel 2003 the following dialog appears.

Excel 2003 Format Axis Scale Dialog

The fix is simple: check the two boxes for Categories in reverse order and Value (Y) axis crosses at maximum category.

The protocol in Excel 2007 is the same, except the dialog looks a little different. You select the same options, but they are located far apart on the dialog.

Excel 2007 Format Axis Scale Dialog

This changes the order of axis labels in our bar chart.

Bar chart with appropriately ordered category labels

If you forget to make the value axis cross at the maximum category, the axis will now appear at the top of the chart. After reversing the order of the categories, the maximum category is at the bottom of the axis.

Bar chart with appropriately ordered category labels but value axis on top

Why Does Excel Do That, Anyway?

If we use the same data to make a column chart (line and area chart, too), the labels go from left to right, as expected.

Column chart with correct category label order

Take another look at the column chart, and note where the origin of the axis system is located. I’ve indicated the origin with a red circle.

Column chart with origin encircled

The values start low (at zero in this case) at the origin and increase in value as they move away from the origin. The category labels start with the first one next to the origin and later labels in the list extend further from the origin.

Now look at the bar chart and consider the origin of its axis system.

Bar chart with origin encircled

The values start low (zero) at the origin and increase in value as they move away from the origin. The category labels start with the first one next to the origin and later labels in the list extend further from the origin. Just like in the column chart.

Perhaps this is better illustrated if we remove the category data from the bar chart. In this case, Excel uses the counting numbers 1, 2, 3, etc. in place of the empty categories.

Bar chart with origin encircled and counting numbers used for category labels

Both axes have low numbers next to the origin and higher numbers further away.

The whole problem arises because Excel follows the same axis ordering scheme for bar chart category axes as for any other axis in any other chart.
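The ordering rule can be written down as a tiny Python model: categories are laid out in list order moving away from the origin, and in a bar chart the origin is at the bottom. The function name and the `reverse_order` flag are invented for this sketch; the flag stands in for the Categories in reverse order checkbox.

```python
def bar_chart_top_to_bottom(categories, reverse_order=False):
    """Order in which category labels read from the top of a horizontal bar chart."""
    # Position along the axis grows with position in the list, starting at
    # the origin (bottom), unless the axis order is reversed.
    bottom_to_top = list(reversed(categories)) if reverse_order else list(categories)
    # Reading the chart from the top means walking that axis backwards.
    return bottom_to_top[::-1]

print(bar_chart_top_to_bottom(["A", "B", "C"]))                      # ['C', 'B', 'A']
print(bar_chart_top_to_bottom(["A", "B", "C"], reverse_order=True))  # ['A', 'B', 'C']
```

The default result is the “backwards” list; reversing the axis restores worksheet order when read from the top.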

This describes the mechanics of axis label ordering. But, 99% of the time, a user expects the axis labels to go in the same order top to bottom as in the data source. Why Are My Excel Bar Chart Categories Backwards? is still a valid question: Why can’t bar chart categories automatically be reversed? Alternatively, why can’t the options for a bar chart’s category axis default to:

Alternate Defaults for Bar Chart Category Axes

Works for me.

Posted: Monday, November 23rd, 2009 under Chart Axes.
Tags: Axis Labels, category axis.
Comments: 45

