In Liquidations – Map Visualization, Tony Rose of DSA Insights reviewed a set of charts created by Business Week (click on the image below for the original full-size graphic). Tony described some shortcomings of these
Bar Charts
Bad Bar Chart Practices, or Send in the Clowns
In Get a Clown Suit, my colleague Jorge Camoes bemoans the overuse of the phrase “professional looking charts” to describe an ever expanding selection of gaudy and distracting visual effects. The particular graphic that set Jorge off was this chart from SmartDraw showing population of the ten most populous countries:
Although this chart used some kind of androids to represent the data, Jorge was envisioning the following chart in his head.
Choice of symbol to encode data
There are a number of things about this chart that should be adjusted. First, the clowns are distracting, perhaps even a little bit more so than the characters in the first chart. Whenever images like this are used, they really fill a rectangle, but the missing parts of the rectangle around the image cause us to misinterpret the true height of the bars. Let’s remedy this by replacing the clowns with rectangles of the same height and width. Notice that even in the chart above, the value axis labels are more concise: 1500 M instead of 1,500,000,000.0 (who needs one decimal place precision on a billion and a half?).
Okay, much better, at least we can better judge the extent of the numbers. There is a lot of overlapping, though. In the first clown chart above, you may have missed the Brazilian clown behind the huge shoe of the Chinese clown. In the rectangle plot above I’ve used white outlines to highlight boundaries of the rectangles which are placed in front of larger ones, but it’s better not to overlap bars.
Now the neat alignment of the axis labels and the bars is lost, with the wide bars pushing the narrower ones aside.
Use of color (brightness and contrast)
Another issue with the SmartDraw chart is that the larger bars are bolder than the smaller bars. They are darker in color and less blurry. This emphasizes the larger values more than dictated by just the heights of the associated figures.
In fact, the larger androids have a thicker, darker border as well as a darker interior. If the intention was to show that China and India are enormous and nothing else matters, well, that was achieved.
Aspect ratio of data points
In addition to the color, the width of each bar distorts the perceived value it encodes. The height of each bar is proportional to the value it represents. So is its width. The area therefore is proportional to the value raised to the second power. The color difference probably makes the whole effect proportional to the value raised to the power of 2.5.
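The exponents above can be checked with a little arithmetic. What follows is a minimal Python sketch, not a model of perception: the function name `perceived_ratio` is made up for illustration, and the 2.5 exponent is only the rough guess given above.

```python
def perceived_ratio(value_ratio: float, exponent: float) -> float:
    """Visual weight of a small symbol relative to a large one when the
    encoding scales as value ** exponent (an illustrative assumption)."""
    return value_ratio ** exponent


# A value one quarter the size of the largest value in the chart.
ratio = 0.25
encodings = [
    ("height only", 1.0),
    ("height x width", 2.0),
    ("plus color (rough guess)", 2.5),
]
for label, exponent in encodings:
    print(f"{label:>26}: {perceived_ratio(ratio, exponent):.4f}")
```

With height alone, a value one quarter as large looks one quarter as large. Once width and color pile on, the same value carries only about 1/32 of the visual emphasis.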
We can improve the clown chart by making each clown the same width.
But that distorts the clowns so that they are hardly recognizable. Just as the clowns themselves distort the data so that it is hardly recognizable. We can address this by retaining the aspect ratio of the clowns, and stacking them to heights that represent the different values.
The poor Brazilian clown has been decapitated. I suppose that’s why he was hiding in the first chart. The partial clowns may be even more distracting than the stretchy clowns above.
Finally, a bar chart
We can eliminate the distraction of the clown symbols by using a plain rectangle.
Sorting order
One thing that would help us to compare values across the chart is replacing the arbitrary alphabetical sort of the country names with a more meaningful sort based on the values being charted. All of these charts use up lots of room, and without the fat clowns, we no longer need so much space.
See, China and India are much bigger than the other countries, but the US still has 20 to 25% of the population of each (not the 1/50 implied by the combined emphasis of length, width, and color).
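Sorting by value rather than by label is easy to do before the data ever reaches the chart. Here is a small Python sketch of the idea. The population figures are rough, made-up round numbers used only for illustration, and `sort_by_value` is a hypothetical helper name.

```python
# Hypothetical populations in millions, for illustration only.
populations = {
    "Brazil": 190,
    "China": 1300,
    "India": 1100,
    "United States": 300,
}


def sort_by_value(data: dict[str, float]) -> list[tuple[str, float]]:
    """Return (label, value) pairs ordered from largest to smallest value."""
    return sorted(data.items(), key=lambda item: item[1], reverse=True)


largest = max(populations.values())
for country, millions in sort_by_value(populations):
    share = millions / largest  # fraction of the largest value
    print(f"{country:<14} {millions:>5} M  {share:6.1%} of largest")
```

Printing each value as a share of the largest one gives the same comparison that a common baseline gives the eye.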
Chart Orientation
The labels overlap because of the shrinking of the chart. This can be remedied in several ways: abbreviating the country names, rotating the labels so the reader must turn the page or tilt his head, or rotating the chart so the labels and the values can be read without contortions. Excel will volunteer to omit some labels, but this is not useful if the data are not numeric and proportionally spaced.
The nice thing about this chart is that it scales nicely if we decide to plot the top 25 countries.
A little embellishment won’t kill you
If you still need to resort to colorful images to draw attention to your chart, you can still add your clown, but a bit to the side so you can still make out the data. Remember that less can be more.
But to paraphrase Professor Tufte, if your data isn’t interesting, you need other data.
Summary of charting badness
Here are some things you should be aware of in your own charting efforts.
- Axis labels are too long, and show excessive precision. All of the text in the original was too small.
- Distracting symbols were used instead of bars.
- Symbols encode data in their length, width, and color, where only length is needed, and where only length should be used.
- Varying width bars overlap each other, partially hiding data.
- Values are sorted arbitrarily, not by value.
Funnel (Tornado) Chart
Peltier Technical Services, Inc., Copyright © 2023, All rights reserved.
Chandoo wrote a tutorial about Sales Funnel Charts in Excel. Technically his protocol was okay, and the result looks, well, like a funnel. I’ve written about Funnel Charts (Bad Graphics) and Stacked Pyramid Charts (Bad Graphics) before, and Chandoo’s doesn’t suffer from a misuse of 3D and shading effects.
The problem with Funnel and Pyramid charts is that they usually suffer from a misapplication of the funnel metaphor, and Chandoo’s is no different. In this incorrect metaphor, a funnel is used to show an ever narrowing amount of something (sales, in Chandoo’s example) the further along the stream the process goes. The quantity decreases because some of it is removed from the stream or blocked from proceeding. In a real funnel, there is no loss of this quantity. In fact, the flux of the quantity past a point must increase to account for the decreased opening size. But nothing is lost.
In fact, a better representation would be a Sankey Diagram (see below), which accounts for the losses through the process. But that’s a discussion for another day.
One can still use a bar chart to represent the quantities at each phase of a process. One improvement is to move the bars to one side of the chart. This gives better resolution to the values (wider difference on the chart for the same difference of the measured quantity). This also makes it easier to compare relative values. Because of the shared baseline along the left edge of the chart, it’s easier to judge that in Phase 2 and in Phase 3 the values are respectively a bit over and a bit under 50% of the value in Phase 1. In the tornado chart above, one could not be so confident in these estimations.
In Tornado Charts and Dot Plots I described a different type of chart, variously called a tornado chart or a funnel chart, which compares two populations by plotting bars either to the left or to the right of the middle of the chart. The symmetry is appealing, but the ability to compare two bars that strike out in opposite directions is compromised. I suggested using dot plots instead. Here is a rudimentary dot plot of Chandoo’s data:
If we are satisfied with the resolution of the original funnel chart, we can still benefit from the common baseline by using left-aligned bars half as wide as those of the funnel chart. This also saves a lot of space in a crowded report.
One could also make a column chart of the data, but the category labels either must be rotated or shortened, reducing legibility in both cases.
At the outset I said that Chandoo’s protocol was “okay”, but he could have used a different bar chart approach to make it better. Basically, a funnel chart is made of two series, one showing the values (orange below) and the other normally hidden (green) which helps to position the value bars.
When I make charts like this, I make floating bar charts, which are stacked bar charts, with a dummy series that pushes the value series into position. Below I’ve offset my green and orange bars to illustrate the construction. I can hide my dummy series by formatting with no borders and no fill.
Chandoo chose instead to use clustered bars, as shown below, with the orange bars including the data plus the (green) offset into a total value. The green bars are then moved in front of the orange bars and hidden by formatting them with a fill color that matches the plot area background.
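The two constructions differ only in what the visible series has to contain. The Python sketch below shows the arithmetic, under the assumption that bars are centered by giving each one a dummy offset of half the difference between the widest bar and itself. The function names and the sample values are hypothetical, and these are not Chandoo’s numbers.

```python
def centered_offsets(values: list[float]) -> list[float]:
    """Dummy (hidden) bar lengths that center each value bar under the
    widest bar, assuming offset = (max - value) / 2."""
    widest = max(values)
    return [(widest - v) / 2 for v in values]


def clustered_totals(values: list[float]) -> list[float]:
    """Clustered-bar variant: each visible bar spans offset + value,
    and the hidden bar in front covers the offset part."""
    offsets = centered_offsets(values)
    return [o + v for o, v in zip(offsets, values)]


# Hypothetical funnel data for four phases.
phases = [1000, 600, 400, 100]
print("stacked:   dummy =", centered_offsets(phases), "value =", phases)
print("clustered: dummy =", centered_offsets(phases), "total =", clustered_totals(phases))
```

In the stacked version the value series holds the real numbers. In the clustered version the visible series holds inflated totals, which is what a reader sees when reading values off the bars.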
As soon as somebody tries to read values off the chart by mousing over the bars, the superiority of the stacked bar approach is evident. Below I’ve moused over one of the value bars, and we see it’s from the Funnel series and the value is 21,180.
If I try to mouse over one of my dummy bars, the cursor can’t see it, because it has no fill, and instead all that is detected is the plot area.
In the clustered bar approach, the exaggerated value of the Total series is displayed (53,593), and if this point is selected, the covered end of the bar is outlined.
The hidden points can also be detected, as in the Dummy point below with a value of 32,408.
Sometimes it helps to use fancy formatting or designs to make a chart more noticeable. But in a business report, if the data is important, it does not need fancy effects to be noticed. Selecting a funnel chart, even using the improved stacked bar approach, is still a compromise which decreases the effectiveness of the chart.
Pie Chart Plotting Deficiency
In the Engineering Windows 7 blog on MSDN, in Windows 7 Energy Efficiency, the Windows team posted a chart showing how energy is consumed in a modern laptop.
No surprise that almost half is spent lighting the display.
But wait, that looks a lot closer to 50% than 43%. Let me see what I get using their numbers.
Looks the same. Let’s do a quick check of the numbers. 43% + 21% + . . . look at that, 90%! Often the numbers are off by 1% or thereabouts, due to rounding effects in the labels (but see Pie Chart Rounding in Excel for a discussion of rounding errors in Excel pie charts). But in this case they’ve obviously left something out.
All the well-intentioned guidelines for charting say to use a pie chart to show how parts make up the whole. And yet, here we have only some of the parts making up 90% of the whole. The basic premise of a pie chart is violated by omitting this 10% of the total energy usage.
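A quick sanity check before drawing a pie is to add up the slices. The Python sketch below does that. `unaccounted_share` is a hypothetical helper, the 1.5 point tolerance for label rounding is an arbitrary choice, and only the 43 and 21 in the sample data come from the chart discussed here; the other slices are placeholders chosen so the total is 90.

```python
def unaccounted_share(percentages: list[float], tolerance: float = 1.5) -> float:
    """Return the share (in percent) missing from a set of pie slices,
    or 0.0 if the slices sum to 100 within a rounding tolerance."""
    missing = 100.0 - sum(percentages)
    return missing if abs(missing) > tolerance else 0.0


# 43 and 21 are from the chart above; 10, 9 and 7 are hypothetical
# placeholders that bring the total to 90.
slices = [43, 21, 10, 9, 7]
gap = unaccounted_share(slices)
if gap:
    print(f"Slices cover only {100 - gap:.0f}% of the whole; {gap:.0f}% is unaccounted for.")
```

If the gap is nonzero, either add an explicit slice for the remainder or pick a chart type that does not imply the parts sum to the whole.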
Pie charts are supposedly the best way to show proportions of this sort, although they are not really so good even at that. People mistake familiarity for effective information display.
The same data is shown as effectively in a bar chart, either with the original values:
or with the missing 10% accounted for:
These charts are not distorted by exclusion of some of the data; no basic assumptions of bar charts have been violated.
In addition, at 320×221 pixels, the original bar chart uses only 54% as much space as my original pie chart (426×306 pixels).
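The 54% figure follows directly from the pixel dimensions. As a worked check in Python (the helper name `pixel_area` is just for illustration):

```python
def pixel_area(width: int, height: int) -> int:
    """Area of a chart image in pixels."""
    return width * height


bar_area = pixel_area(320, 221)  # bar chart dimensions from the text
pie_area = pixel_area(426, 306)  # pie chart dimensions from the text
print(f"bar chart: {bar_area} px, pie chart: {pie_area} px")
print(f"ratio: {bar_area / pie_area:.0%}")
```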
Scary Info Graphic
In a comment to my post How to Make a Donut-Pie Combination Chart, Sjoerd Hoogwater pointed me to Scary Bailout Money Info Graphic. The post features information about the federal bailout of the banking industry, but I don’t know what’s scary, the bailout money or the info graphic. In this scary post, Wade took data from Boing Boing’s Cory Doctorow (Bailout costs more than Marshall Plan, Louisiana Purchase, moonshot, S and L bailout, Korean War, New Deal, Iraq war, Vietnam war, and NASA’s lifetime budget — *combined*!), who got it from Barry Ritholtz, Big Bailouts, Bigger Bucks, and made this outstanding example of why pies are bad, and why two pies are worse than one:
There are a few things to note about this chart.