In this edition of Chart Busters, I’ve taken on a charting tutorial on the Microsoft web site. I know they are trying to help their customers make better use of their software, and some of the Microsoft tutorials are really pretty good. They have some helpful pages on using pivot tables, for example. But this charting tutorial is an absolute disaster.
The tutorial in question is entitled Charts III: Create a professional-looking chart, and it is the third in a series of four tutorials. The others are Charts I: How to create a chart, Charts II: Choose the right chart type, and Charts IV: Charts for the scientist. I will probably review these other tutorials at some point, but this one is really a job for Chart Busters.
Before I begin, I’d like to thank Jorge Camoes, whose Spiffy Charts: Want to know how to create bad charts? Visit Microsoft Office Online Training blog post pointed me to this horrid tutorial.
I’d also like to acknowledge all of the bloggers and commenters who have bemoaned the fact that “making a professional chart” is really a nice sounding phrase that really means “use as much gratuitous formatting and effects to make your chart pretty, without regard for its effectiveness.”
Tutorial Example Chart 1
Microsoft kicks off their tutorial with a simple column chart that uses the defaults of Excel 2003.
The first thing the Microsoft tutorial suggests is removing the gridlines. This is perfectly reasonable as far as it goes.
But it really doesn’t go far enough. Let’s really clean up the chart (below left).
- Remove the border around the entire chart.
- Move the chart and axis titles close to the edges of the chart area, and stretch the plot area into the freed up space.
- Take away the plot area fill and border.
- Remove the tickmarks from the X axis.
- Move “First Quarter Sales” to below the title, so it’s more of a subtotal. If the X axis has dates, you probably don’t need an axis title.
- Remove the border around the legend, arrange the legend entries horizontally, and move the legend beneath the chart.
And the gridlines wouldn’t be bad if they were lighter. This allows the Y axis line to be removed (below right).
Next the Microsoft tutorial suggests putting all of the data into data labels in the chart. This renders the Y axis redundant, so it can be removed.
It may make sense to add data labels to some points, but putting a label on every point clutters up the chart. It makes one ask, why not just use a table, if the individual numbers themselves are so critical? I like my chart with labeled gridlines and no axis.
The third Microsoft enhancement to the chart is to start adding gratuitous formatting. Add a gradient background to the plot area. Add a different gradient fill to the data bars.
Well, if we add a gradient, we will no longer have a nice uncluttered chart background. So let’s ignore that advice. As far as the pseudo-columnar appearance of the columns, that’s distracting, because folks are going to spend more time thinking about how we accomplished that cool formatting. Se let’s skip that one, too. But there’s nothing wrong with using nicer colors. Here’s my updated chart with axis (below left) or with gridlines.
The fourth enhancement to this simple chart is to remove the border from the legend. Hmm, I think we’ve heard that somewhere before. Also, the tutorial suggests adding a shadow to the chart. However, that means adding back the border, which with the shadow would make the chart even more cluttered.
The best piece of advice in this step is (emphasis mine):
You could add shadows around the chart and axis titles as well, but make sure the chart doesn’t look cluttered.
Good idea.
Chart Axis Sin
The final enhancement Microsoft wants to apply to their chart, after changing the product name, is to change the axis scale. In some cases this is acceptable, even desirable. However, bar and column charts encode their values by their length. When you shorten the axis, you effectively cut the bottoms off of the bars. This makes the longer bars seem much longer in comparison to the shorter bars than they really are.
Why start the axis at 50? Why not at 100? This makes the increases from January to February all the more striking.
This is a a major chart fail. The value axis on a column or bar chart should always include zero. Always. If you want to expand the scale to help resolve the values, then a column chart is not the right chart type.
Update
Scott Ziolko commented below that a further improvement to these charts would be to rotate the Y axis title to make it easier to read. In the chart below left, the axis title has been rotated and moved above the axis, and the plot area was widened (alternatively the chart could have been made narrower with no loss in resolution). In the chart below right, I have incorporated the axis title into the chart subtitle, where it is just as effective.
Chart Type Selection
Speaking of the right chart type, it is generally more effective to show time-based data in a line chart, i.e., in a time series (below left). When the axis is rescaled (below right), there is no loss of meaning or distortion of data.
Tutorial Example Chart 2
The next chart is “a combination chart that needs help.” And it does need help, but I think the “after” chart needs as much help as the “before” chart.
My first thought is about which series was assigned to which chart type. To me it has always made more sense to use columns for the Projected or Target values, and show the Actual values with a line. Alternatively, an area series could show the Projected or Target values.
What also looks better is a chart without all those distracting borders. Removing the fill helps too. I found it a bit hard to look at the green and gray chart above: the clashing gradients made the bars hard to pick out.
I also found it unusual that the Actual values showed a steady progression from month to month, while the Projected jumped around. Maybe they did get the chart types correct, but the labels wrong. Of course, it is just data made up for the benefit of this illustration.
Tutorial Example Chart 3
The final chart in the tutorial is one of my favorite types (cough cough): a 3D column chart.
The tutorial correctly points out that the tall bars in the first chart below obscure the short bars. They spoil this observation by simply moving the tall bars behind the short ones. Does this make it any easier to compare their values?
The strange perspective eliminates any chance of understanding the relative values in this chart. Especially since the vertical axis tick labels in the second chart have been deleted.
The tutorial blindly follows the mantra: If 2D are good, 3D must be better. But for clearly displaying data, 2D rules. In both charts below, the 2D column chart and the line chart (hey, it’s a time series!), make comparisons easy.
Summary
I guess the takeaway here is that no matter how much effort is spent generating pretty tutorials, if the rules and best practices are not followed, the tutorial will not succeed. It illustrates how the producer of a software package may not be the best to advise users how to apply the software.
Keep your wits about you when seeking help, look for multiple sources of advice, and use good judgment when following that advice.
Jon Peltier says
I linked to Jorge’s post in my intro, since it was what led me to the tutorial in the first place.
Chandoo says
Very good ripping of the charting guidelines on MSFT site Jon… Jorge also did the same few days back.. http://www.excelcharts.com/blog/spiffy-charts/
I think together we can scare people away from making such career ending charting choices…
Chandoo says
Oops.. missed it while skimming the intro.. :D
Debra Dalgleish says
When they say “professional looking,” I think they’re referring to the world’s oldest profession. Those charts should be standing on a street corner in a shady part of town.
Martín says
Great post Jon ! it couldn’t be more on time for me !!!
I am currently working on a chart from a Pivot Table, where I have on the Page field the Year, on the Row Area the Line of Business, on the Column area a Category, and 2 sets of data, A and B
I wanted to create a Clustered Column chart with 2 axes, putting A on the primary and B on the secondary, so I have something like LoB 1 – Categories 1, 2 and 3, LoB 2- Categories 1, 2 and 3, LoB 3…..
So far, so good.
Pimped it out as per your and Chandoo’s tips, and looked great.
I need the pagefield to be displayed on the chart, as it has to be dynamic. But (oh, yes, there is a but!) every time I change the year, the whole pimping dissapears !!!
little help, please !!!
Thanks !!
Jon Peltier says
Debra – I laughed so hard I cried. Thanks for your wonderful point of view.
Scott Ziolko says
Only other obvious thing to suggest is making the y-axis titles horizontal.
For ease of reading we don’t rotate the tick mark labels to be parallel to the axis (e.g the “0, 100, 200, 300, 400”) so why do most people accept the default of having the title parallel to the axis.
There are likely cases where a long y-axis title would take up too much space, but for “Cases Sold” you aren’t losing any substantial amount of space by making it a horizontal title and for the particularly space conscious you could have it as a two-line title if you’d like:
Cases
Sold
Just another simple thing to make the graph a quick/effective communication device on a quick read.
Jon Peltier says
Scott –
Good point. I was so busy being indignant about truncating the Y axis scale on the column chart that I neglected the orientation of the chart’s text. Another option would be to edit the subtitle to something like “First Quarter Sales (Cases)”, “First Quarter Cases Sold”, or “Cases Sold, First Quarter”.
I’ve included a couple more charts in the main post to illustrate this improved labeling.
Martín says
Jon,
thanks for your guides.
I’ve created a paralell table, linked to the cell address on the corresponding pivot table, and additionally created the series manually on a separated chart. both ways I have a problem: there are some values missing on some years, so ranges change. and no, I cannot change the data adding a zero value for those categories…..
And even so, the problem of how to change the year value remains. how do I do that?
Rgds.
Jon Peltier says
Martin –
You would change the year by changing the page field in the pivot table. I always keep my charts on the worksheets with the data, so I can browse around, so this is no big deal.
What I’ve done in the past is written routines that rebuild my regular charts when the pivot table updates. I’ve described how this would work on one specific pivot table and chart in Update Regular Chart when Pivot Table Updates. The VBA procedures would have to be adapted to your specific situation.
Jon Peltier says
Martin –
The formatting of pivot tables and especially pivot charts is not sticky. When refreshed, custom formatting tends to get blown away.
Microsoft acknowledges this behavior, and recommends that you record a macro the next time you reapply the nice formatting to your chart. Then when the formatting is wiped, rerun the macro.
If the configuration of the pivot table (i.e., the shape) doesn’t change, but only the numbers change, you could make a regular chart from the pivot table data, as described in Making Regular Charts from Pivot Tables and in Regular Charts from Pivot Tables.
Martín says
Thanks, Jon, that was exactly the same conclusion I arrived. now it works perfectly.
Or at least, applies the format when year changes. The problem will appear if they want to change anything else…… ;)
Rgds.
DaleW says
Jon,
Possibly MSFT might be cut some slack when creating charts of Teatime Biscuits and Marmalade sales. I look forward to your review of IV: Charts for the scientist, their final charting tutorial where they can finally get serious about visualizing data.
Just glancing at it today, I learned a couple of valuable tips:
(1) Named cells. When plotting a complex equation with coefficients (a, b, c, d, e), one can replace the range references with cell names, for simplicity (a, b, d, e, f), to avoid messy & confusing absolute references. (Excel conveniently reserves c, as explained.) The author documented that all but the very first calculated y value used these names; in fact, none of the calculations in the downloadable workbook actually used the named ranges. In science, many seemingly good ideas don’t pan out.
(2) Scientific use of a pie chart. By changing the initial start point of a pie chart to 90° (for 3’clock), one can start pie charts at the conventional starting point on the x-axis of 0°, which was mentioned as standard for trigonometry. (Sweeping around clockwise towards the -y axis with that first slice is not so standard in trig, but perhaps that is why pie charts have limited use in scientific publications?) If one uses the workbook to experiment with different initial angles rather than the x=30 slice shown, one could demonstrate that a circle still contains 330 + x degrees of arc, although strange nonEuclidean pie charts were probably too advanced for this lesson.
I’m sure you can find more.
Jon Peltier says
Dale –
Maybe I’ll ask you to ghost write the review for me. I have yet to read the tutorial, but I think I can’t wait.
By nature of their sheer size and market share, people look to Microsoft for leadership. Thus Microsoft has a certain responsibility to helping their users use their software correctly and effectively. So whether they are plotting nuclear engineering data or sales of biscuits and marmalade, they should do it properly.
Dennis Wallentin says
Jon,
1. Nice post
2. No, I don’t work at MSFT
3. Why don’t You help (as an Excel MVP) MSFT by:
a) writing some articles that can be published at MSFT’s site and/or
b) write some articles at the Excel Team’s blog instead of making fun of their shortcomings
Kind regards,
Dennis
Sean Carmody says
Jon, it’s good to see that your re-formatting suggestions align precisely with the steps I always follow when creating a chart in Excel.
I’d be interested in exploring your thoughts further on one topic, namely axis scales for column charts. You say:
The value axis on a column or bar chart should always include zero. Always.
This line (no pun intended) has solid antecedents, going back at least to Darrell Huff’s 1954 book “How to Lie with Statistics”. In Huff’s view, starting the axis with anything other than zero is simply dishonest. However, in “The Elements of Graphing Data”, William Cleveland expressed an alternative perspective. He says “do not insist that zero always be included on a scale showing magnitude”. He expands on this as follows:
For graphical communication in science and technology assume the viewer will look at the tick mark labels and understand them. Were we not able to make this assumption, graphical communication would be far less useful. If zero can be included on a scale without wasting undue space, then it is reasonable to include it, but never at the expense of resolution.
Maybe a key phrase here is “in science and technology” and, in any event, including zero in the example in your post would not seem to waste “undue space”.
Jon Peltier says
Sean –
“In science and technology” there are very few charts which are not XY charts (I should know: I have a doctorate and worked in R&D for many years). An exception is histograms, but the nature of most distributions forbids an axis minimum greater than zero.
Regardless of what Cleveland says, I find that I can look at the tick mark labels, and still be fooled by the length of bars or columns. I wonder whether this was a blanket statement, or if he was applying it only to certain chart types.
DaleW says
Jon & Sean,
I’ve got Cleveland’s book in front of me, and in his entire text I only found one example of a bar or column chart, and it was intended as a bad example that he redrew as an XY chart (well, line chart with symbols — while keeping the axis at zero because it made sense in this case). His point of contention with Huff was over a line chart.
My impression is that Cleveland advocated dot plots in his book as a substitute for bar charts for categorical data in part because his dot plots were not constrained by convention to start at zero like bar charts are.
Anyway, if you don’t want to start at zero, why would you cause confusion by using a bar or column chart??
Sean Carmody says
Jon and Dale,
It is true that Cleveland has virtually no references to column or bar charts and the context of his discussion on zero on axis scales focuses on scatterplots (x-y) and line charts. The first example he gives is taken from Huff’s book and I have reproduced it on this post. Still, if there is substance to Cleveland’s argument, it would presumably also apply to column charts.
Dale, as to why you would use zero, the main scenario I could envisage is where you have values for a number of non-numeric categories (so x-y plots don’t work) and there is no natural ordering of the categories (so line plots would not really appropriate). While Cleveland would doubtless argue that a dot chart would be the best option, most people are not familiar with dot charts but would have seen countless bar and column charts.
I should also say that I don’t know how I feel about this myself. I’d always been an advocate of starting at zero (I have not actuall read Huff’s book, but somehow his arguments have trickled down to me through others), but I have just been reading the Cleveland book and, for the first time, I am re-thinking this position.
DaleW says
Sean,
Different charts have different ground rules. For example, the categories for a bar chart may or may not be mutually exclusive, but the categories of a pie chart do need to be mutually exclusive (so they sum to 100% of the entire sample, not 50% and not 150%).
True, an Excel XY chart won’t support categorical data, but by formatting an Excel line chart for categorical data to show symbols instead of lines, you can easily create a version of dot plot (without the dot trail) which should be quite easy for anyone to understand. Since such a chart shows a circle or other symbol instead of a bar, very few people will object that they think the scale should always start at zero, as one gets with bar charts or column charts.
BTW, making a dot plot with the categories along the vertical Y axis for easier readability is harder in Excel, but Jon teaches how to do that on this site also.
Sean Carmody says
Dale,
I do like dot charts and have used them myself (although I use R rather than Excel). However, there is an argument that when presenting to an audience that is unfamiliar with dot charts it may be better to stick to a more traditional chart form, so a column chart may be appropriate.
So, my point is that it may sometimes be appropriate to use a column chart and then Cleveland’s arguments against always including zero on a scale could apply. Essentially his arguments are that always including zero can
* waste space
* make it harder to judge variation in data
* viewers can read tick marks
The main counterargument would seem to be Jon’s point, which I tend to agree with, that even experienced users of charts can neglect to carefullly not tick mark values and thus be fooled by the length. This, of course, is the strength of the dot chart as the “length” is not emphasised.
DaleW says
Sean,
Using a traditional chart helps communicate clearly, provided one respects the tradition = ground rules = metaphor of the chart.
Just as for a pie chart communicates the whole to part relationship by its slices, the bar chart or column chart communicates relative magnitudes by its lengths.
Cleveland’s “strive for clarity” rule would not allow him to advocate a chart which violates its own metaphor.
A scatterplot with categorical X instead of ordered X should not shock an audience very much. Cleveland prefers the dot trail to help the audience follow his often numerous categories, but if one adopts the XY plot design but makes one axis categorical with labels, that is also quite acceptable and easy to interpret. Such a design may be slightly less familiar to an audience, but it gives the presenter freedom to avoid starting an axis at zero — in a way that is not confusing and protects the presenter from being credibly accused of trying to mislead.
Sean Carmody says
Dale: would you say then that a vertical axis starting at zero is part of the tradition or metaphor of a column chart?
DaleW says
Sean –
Yes, I believe the bars in bar charts and the columns in column charts should start from zero. Otherwise, they may be accused of being lying SOBs, or, technically, Significantly Offset Bars having Lie Factors (per Edward Tufte) that are noticeably different from an honest figure whose naturally perceived dimensions are directly proportional to the values represented.
The irony here is that I’m pretty much convinced William Cleveland promoted dot plots in part because he concluded bar charts must start from zero to avoid misleading the eye, although you took the opposite message from Cleveland. One of our expert occasional contributors to this blog, charting author Naomi B. Robbins, wrote a nice article (http://www.b-eye-network.com/view/index.php?cid=2468&fc=0&frss=1&ua) that is largely based on Cleveland’s work (all 4 citations excluding her own book) and presents the argument for using dot plots, mentioning that they “do not require a zero baseline as do bar charts.” Unfortunately I don’t have the 1984 American Statistician reference handy to close the case on what Cleveland thought about this.
Over on Junk Charts, there is a nice discussion this month of how the Washington Post mis-charted a surprisingly close reader vote on whether Obama deserved his Nobel Peace Prize. All because a bar chart axis didn’t start at zero.
Would you suggest column charts and bar charts should follow different ground rules?
Sean Carmody says
Dale,
Thanks for the link to the Naomi Robbins article. It is a good one. An extremely relevant point made there is “A dot plot is judged by position along the horizontal axis. Length is not an issue with dot plots.”
I think that this gets to the heart of what I have been grappling with. Although my own preference has always been to start column charts with a zero scale (that’s if I use them at all), after reading the section of Cleveland’s book I referred to I have been trying to work out whether his comments apply to bar/columns charts or not. Of course Cleveland prefers dot plots anyway, so the book has barely any mention of bar/column charts. In fact the terms don’t even appear in the index.
But Robbins point highlights the fact that people probably make judgements of charts such as line charts, scatterplots and dot charts based on position but column charts, bar charts and, most likely, area charts are judged less on position and more on length/area. I think that this is a reasonable basis on which to argue that different ground rules do apply to column/bar charts and so Cleveland’s comments about zero cannot be carried across from the positional domain to the length/area domain.
Of course another important consideration is the intended audience. The broader the audience and the less familiar the audience is with reading charts, the more careful one has to be about possible misreadings of charts. Cleveland is emphatically focusing on charts used for a technical audience and not everything necessarily makes sense for a broader audience. For example, he strongly advocates the use of log scales to better illustrate variation when distributions are skewed. I can’t help thinking that this can lead to misconceptions for a non-technical audience (e.g. deciding that one value is double another when in fact it would be the square). And of course, as Jon points out, even someone as experienced with charts as himself can be caught out by not reading the scale labels.
Finally, I did find a PDF copy of Cleveland’s article. In it the following comment is made about bar charts: “We conjecture that the primary elementary task is judging position along a common scale, but judgments of area and length probably also play a role”.
DaleW says
Sean,
I still suspect that Cleveland’s earlier 1983 manuscript / 1984 paper “Graphical Methods for Data Presentation: Full Scale Breaks, Dot Charts, and Multibased Logging” would reveal what he really thought about bar charts that don’t start at zero. His other 1984 paper that you found does seem to suggest he’d concluded bar length distracted from the judgment of endpoint position relative to scale.
Yes, a general audience is more likely to be mislead by bar charts that don’t start from zero. A technical audience — and especially those who read Chart Busters — is more likely to notice the exception and perhaps wonder if the presenter is trying to mislead them visually or simply doesn’t know how to use Excel and/or chart selection that well.
The pattern of tick marks and powers of ten labeling seems a dead giveaway for a proper log axis. One usually doesn’t have to investigate the numbers (if any!) to notice that a log scale is not a simple linear scale starting from zero. Also, a log scale has clear redeeming social or scientific value, and I’m not sure you can make that case for a bar chart whose bars don’t start from zero.
derek says
It’s simply not true that anything that is said of a line or dot chart must also be true of a bar or column chart. The necessity of starting the latter with zero but not necessarily the former, is because they are charts of two completely different variable types. Jacques Bertin, in the 1960s, categorized graphically-presented variables in two ways: Locational (where they were on the page) and Retinal (how big an impact they had on the eye). Lines and dots are locational, while bars and columns are retinal.
But in order for the bar’s retinal properties to be an accurate depiction of the quantity, you have to show the whole bar. To hide part of the bar beneath the bottom of the chart is to mislead the eye. I’ve stopped trying to convince people to include zero on their bar charts; I just say “fine, leave the zero off, but you still have to show me the whole bar.” It’s just a coincidence that this forces them to include the zero :-)
Note that my “show the whole bar” rule not only short circuits the objections, but also allows the circumstances where you genuinely can have a bar chart with no zero on the scale: when it’s a floating bar chart. Then you only have to go down as far as the bottom of the floating bar, not all the way down to zero.
Sean Carmody says
Dale, Derek:
I decided to bite the bullet and buy the relevant Cleveland article from JSTOR. Here is the most relevant quote (which does indeed support the “bar charts are different” line):
“The bar of a bar chart has two aspects that can be used to visually decode quantitative information–size (length and area) and the relative position of the end of the bar along the common scale. The changing sizes of the bars is an important and imposing visual factor; thus it is important that size encode something meaningful. The sizes of bars encode the magnitudes of deviations from the baseline. If the deviations have no important interpretation, the changing sizes are wasted energy and even have the potential to mislead (Schmid 1983)”.
The reference to Schmid is to a book called “Statistical Graphics”.
For me this brings a bit of closure: I have always been a fanatic for zero for bar charts too, but was thrown by Cleveland’s comments about not getting too hung up on zero. It is clear now that he only applies this comment to charts that encode solely by position (e.g. dot charts, x-y plots, line charts).
So Jon, thanks again for the post…the discussion here has led me to enlightenment!
Jon Peltier says
Sean –
I just write what I write, and I am always happy and surprised when a post generates this much discussion.
Don’t miss Derek’s comments above. He no longer tells people to include zero, he changes emphasis and tells them to include the whole bar. So for a floating column chart you need not include zero, just include the minimum and maximum extent of the bars.
DaleW says
Sean,
Thank you for sharing. I would feel more sense of closure about Cleveland and bar charts at this moment myself . . . except I’ve written myself into a corner.
Ironically, I noticed today that an Excel app that I’d set up years ago is still being used to report results to my organization using bar charts with, well, partial bars. Clearly, I am guilty of an engineering compromise, made long before I’d stumbled onto this website and learned about how to make dot plots using Excel, or read Cleveland to understand why they are better. My coworkers seem quite content with these bar charts — and it’s not like the abstract index they represent is something one can actually count or measure with a ruler, anyway. No harm, no foul?
Suddenly I’m appreciating your shades of gray, Sean!
What to do when one’s karma runs over one’s dogma? I guess that my best way of atoning is to try to update my less than ideal bar charts with a dot plot. If that radical idea isn’t accepted, my fall-back recommendation would be to show the whole bars, even though the underlying data is about 90% locational and 10% length (or retinal position), and showing the whole bars will hurt the resolution somewhat.
Sean Carmody says
Dale:
Love the karma/dogma line…I plan to steal it!
Sean.
Matt H says
Edward Tufte would probably ask, “Do you really need a chart to show six data points?”
Table: Cases of Marmalade Sold
Sean Carmody says
Matt,
Much as I love Tufte’s work, he does base much of what he says on aesthetics and his (considerable) intuition. Another approach, of which Cleveland has been an important pioneer, is to draw conclusions based on empirical testing of which forms of data representation are most effective. In their 1991 paper Displaying Proportions and Percentages, Ian Spence and Stephan Lewandowsky compared the efficacy of bar charts, divided bar charts, pie charts and tables. Their focus was on asking three different types of questions about sets of data containing between four and seven data points (A, B, C,…). These were to determine which was greater A or B, A or B+C and A+B or C+D (note that the data was not necessarily presented in that order). They concluded that subjects found it harder to make all three judgements with a table than with either bar charts or pie charts.
One important point here though is that the subjects of the experiment were deliberately selected from groups that were not scientists, statisticians or experts at reading charts and data. So, these conclusions are only really applicable for presenting information to a general audience. Still, that is an important domain to consider.
Sean Carmody says
I have written up some of my thoughts generated by this discussion on my blog.