## The Problem

I encountered a new blog recently called HelpMeViz.com, which was recently established to give the visualization community a forum to discuss their projects, and give and receive advice and feedback about their work.

In Excel: Multiple Lines Across X-Axis Categories, Lars Verspohl described a column chart with columns forming 12-month time series of some variable for nine countries. Lars wanted to plot another variable on this chart as a set of line chart time series. This is his sketch of the chart he envisioned (click on the image to see it full size). Maarten Lambrechts suggested using lines for both variables, dispensing with the original column chart. I agreed, and since the description of how I would do it was much too long to fit into a comment, I have written this tutorial.

### Data

The data I used in my example is shown below. Since I made up the values, I didn’t bother showing all 118 rows of the worksheet, just enough to see the pattern. The blue shaded range has the values we want to plot. There are 12 monthly values for each country, plus a blank in between countries. The blank leaves a gap between the connected points of one country and the next. The pink shaded range contains data for a dummy series which will anchor the country names along the bottom axis. The X values (6.5, 19.5, etc.) are centered within each country’s monthly points (6.5 is between June and July for Spain, etc.). The Y values are zero so the points are located along the bottom of the chart. The green shaded range contains the country names which will be used as data labels for this dummy series.

### The Chart

Select the blue shaded range, and insert an XY Scatter chart, the style with lines and no markers. I’ve already fixed the X axis so it has a tick label (and vertical gridline) every 13 units, from 0 to 9×13=117.

### The Labels

Select and copy the pink shaded range, then select the chart, and use Paste Special to add the data to the chart as a new series, by column, with X values (categories) in the first column and series name in the first row. I’ve formatted this new data as round markers without a line, so you can see how it all works. If you have Excel 2013, select the added points, and add Data Labels, using the Below position, and the Labels from Cells option, selecting the green shaded range with country names for the labels. If you don’t have Excel 2013 (or even if you do) you can use Rob Bovey’s Chart Labeler add-in to add these labels. It’s free, but you’ll need to download and install the Labeler before you can continue. Again, position the labels below the Label series data points, using the labels in the green shaded range.

### Clean Up

Hide the Label series by formatting it with no line and no markers. Choose No Labels for the X axis. Delete the Label legend entry by clicking on it once to select the legend and again to select the legend entry, then pressing Delete. That’s pretty good, and it didn’t take all that long.

### Secondary Panel

Most of the time, the two different variables will have much different magnitudes. It is tempting to stick one of the variables on the secondary axis, and let the two series just lie across each other. But there are many more good reasons not to overlap the data than good or bad reasons to overlap them. So we’ll put the secondary variable into a secondary panel of the chart. We’ll start by formatting the Red series so it is on the secondary axis (ignore the fact that Red and Blue values are not so different). I’ve gotten a head start here by giving the plot area a slightly darker gray border, and giving the horizontal gridlines a lighter gray. Now we need to fix the scaling. The Blue data spans 0 to 0.6, and to fit it into the bottom panel, we will scale the left hand axis from 0 to 1.2. The Red data really spans 0 to 0.4, but I’ll use 0 to 0.6 as its range to make things work out nicely. To make Red fit the top panel, we’ll scale the right hand axis from -0.6 to 0.6. The scales work out nicely, and the primary horizontal gridlines fit both axes. Sometimes this requires more trial and error. Now we need to neaten up the axes. We don’t want the primary axis ticks to extend across the secondary panel, and vice versa, so we’ll use custom number formats to hide the labels we don’t want. The right hand axis is easy. Format the axis so it has a custom number format of `0.0;;0.0;` which has four elements separated by semicolons. The first element tells Excel to use one decimal digit for positive labels. The second element is actually absent, telling Excel not to show negative labels. The third element tells Excel to display zero labels with one decimal digit. The fourth element is also absent, indicating that text labels should not be shown. The left hand axis is a bit trickier. Format the axis so it has a custom number format of `[<=0.6]0.0;` which has two elements separated by semicolons. The first element has a conditional expression in square brackets which tells Excel to use one decimal digit for labels less than or equal to 0.6. The second element is absent, telling Excel to omit labels that do not meet the first condition. Finally, we need a line to separate the top and bottom panels. Let’s use the horizontal axis, which so far has remained at the bottom of the chart. First, I formatted this axis to match the plot area border. Then I formatted the left hand axis so that the horizontal axis crosses at 0.6. Separating the series into distinct panels clarifies the data even when they don’t need different axis scales.

## Keeping the Column/Line Style

In a follow-up comment to the original article cited at the top of this tutorial, Lars Verspohl indicates that he wants to keep the columns, and not convert everything to lines. This is also possible.

### The Data

The column chart data is a simple grid of months by countries, shown in the blue shaded range below. The data for the lines is given in the red shaded range below. I’ll explain the calculation of X values in column N in a moment.

### The Column Chart

The column chart is easy. Select the blue shaded data, and insert a Clustered Column chart. Switch rows and columns, because Excel by default wants to plot fewer series of more points each. The most tedious part of making this chart is formatting all the graduated shades of blue.

### X Values for the Added Lines

I’ll use a subset of the data below to show how to compute the X values. The column chart by default has a gap width of 150%. That means the gap between the clusters of columns is as wide as 1.5 individual columns. This means there is a gap of 0.75 columns to the left and to the right of each cluster, and the entire width of a category is 0.75 columns for the gap on the left + 12 columns for the months + 0.75 columns for the gap on the right, or 13.5 columns. The categories are internally numbered starting at 1 for the first category up to N, the number of categories. These whole numbers are the X values at the center of each cluster (at the vertical yellow lines in the chart below). By extension, the X values between clusters are the halves. Rather than 0 to N, then, the X axis effectively runs from 0.5 to N+0.5, as shown by the numerical labels at the top of the chart below. The first data point for the red line is in the middle of the first column, so it is 0.75 columns for the gap and 0.5 columns for the half of a column to the right of the vertical axis, which itself has an X value of 0.5. We want X to go from 0.5 at the left of the first gap to 1.5 at the right of the first gap. So the first X value is 0.5+(0.75+0.5)/13.5, or 0.59259. The second data point is in the middle of the second column, one column to the right of the first point, so its X value is 0.5+(0.75+0.5+1)/13.5, or 0.66667. The third X value is 0.5+(0.75+0.5+2)/13.5, or 0.74074. Continue to the last X value for the first continuous line, at 0.5+(0.75+0.5+11)/13.5, or 1.40741. There is a blank row between countries, so there will be a gap in the plotted line. The next set of X values is calculated by adding 1 to the first set of values. Continue for the rest of the countries.

When this data is plotted by itself in an XY Scatter plot, the result is shown below. I’ve set the axis scale to 0.5 to 9.5 to match the internal X values in the column chart. A quick comparison of this chart with the column chart indicates that our calculations are correct.

### Adding the Lines

Adding the lines will take a few steps. First select and copy the red shaded range which includes our meticulous calculations. Then select the chart, and use Paste Special to add the data as a new series, by columns, with X values (category labels) in the first column, and series names in the first row. This temporarily scrunches the existing columns way to the left of the chart, and makes all of the columns very thin. Right click on the skinny red columns, choose Change Series Chart Type, and select the XY Scatter style with lines and no markers. Excel automatically places the XY series on the secondary axis. The secondary X axis (top) runs from 0 to 10, so the red lines are not in alignment with the blue columns. The secondary Y axis (right) is also not in alignment with the primary Y axis (left). The last step is to format the XY Scatter series so it is plotted on the primary axes with the Clustered Column series. This brings all the data into alignment.

## Chart Busters: Fix the Heat Map Donut Chart

I reviewed a colorful yet ineffective graphic in Wow! Heat Map Donut Chart!. No critique of a graphical technique is complete without describing alternative techniques which would be more effective. Starting from the heat map donut chart, I’ll work backwards, removing features that negatively affect chart readability, then forwards, changing to features that improve readability.

## Hierarchy of Visual Elements

To guide our redesign of the heat map donut chart, we will refer to Figure 2 of Presentation Graphics by Leland Wilkinson.

This figure shows William Cleveland’s ranking of different graphical features in terms of how effectively they are for encoding and decoding data.

## Heat Map Donut Chart

Here is the heat map donut chart in all of its glory, including intense primary colors and gradients.

This graphic uses color to encode its data, which according to Wilkinson’s chart is the least effective graphical feature for this purpose. The gradients in color between data points only makes the use of colors more ineffective.

## Unsmear Colors

Because of the gradients of color, the data for each category is encoded only along the spoke at the center of that category’s segment of the donut.

To use the entire segment to encode the data, to clearly delineate the segments, and to eliminate potentially misleading intermediate colors between spokes, let’s unsmear the colors.

## Size Wedges Proportionally

Now that we’re not using color to encode data, we can move up in Cleveland’s hierarchy. Let’s skip volume, since we already know (haven’t you been paying attention for all these years?) that 3D effects distort and mislead. Excel doesn’t even offer 3D donut charts, but there are blah blah professional state of the art yada yada business intelligence solutions that do. We’re using a donut chart anyway, so let’s make the area of the wedges proportional to the underlying data.

## Use Color-Vision-Deficiency-Friendly Colors

Those primary colors are still glaring, and there are multiple points using the same colors. Let’s apply Excel’s dull but useful default color scheme.

A companion article, Color Vision Issues with Heat Map Donut Charts, discusses how color vision deficiencies can hinder readability of certain color schemes, such as the scheme used in the heat map donut chart.

## Sort by Size

Since we’re not very good at comparing areas, especially of rotated shapes, we can help our brains rank the data by sorting the points according to value.

## Replace Legend with Data labels

The legend forces us to move our eyes back and forth from the donut to the legend, then remember which color corresponds to which category. This slows comprehension and overloads our feeble short term memory. Let’s remove the legend altogether and apply data labels to the wedges.

## Put the Hole Back Into the Donut

The donut chart uses area to encode data. The figure above lists angle above area, hmmm. The donut also uses angle to an extent, but there is a big hole where the angles come together. Let’s put the hole back into the donut, and use a pie chart, which uses redundant approaches of area and angle.

## Encode Using Length: Bar Chart

Another glance at the figure of hierarchical graphic elements shows that length is more effective than area or angle. How do we get length? In a sense the donut chart uses length, arc length, but the elements not straight, nor do they share a common starting point, so this length isn’t used effectively. Let’s totally scrap the circular chart paradigm, and draw bars proportional in length to the underlying values.

Notice an additional benefit that the bars don’t have to be different colors, and the labels are right there on the axis.

## Encode Using Position Along Scale: Dot Plot

Again we check Cleveland’s list. In this context position along nonaligned scales doesn’t make sense. But position along a common scale? But position along a common scale? We have the common scale from the bar chart. Let’s put a symbol at the end of each bar, then hide the bars.

Like the bars in the previous chart, the dots in this plot need not be different colors. In fact, color can be used here in its most effective role, identifying different sets of dots. We could overlay target and actual values with little confusion, particularly if colors were chosen to minimize the problems of color blindness. Using different shapes for the symbols is a redundancy that helps identify the groups of data.

The bar chart and the dot plot are the two most effective ways to display this data quantitatively and clearly. Not only are these charts easy to read, they are easy to create and maintain.

## What if We Work Backwards?

Suppose we were unfamiliar with Cleveland’s hierarchy of graphical effectiveness. How could we lead ourselves astray starting from such effective graphs as the bar chart and dot plot? To reinforce the lessons from Cleveland and Wilkinson, let’s explore.

We could color code the bars or dots. Not a terrible idea, but also unnecessary.

Now we’re free to lose sight of the effectiveness of the different encoding methods. We can decide color is all we need to show the data, so we remove the variation in lengths of the bars, or the horizontal positions of the dots.  Do you notice the conversion from a quantitative display to a qualitative one?

Let’s label the bars directly, or put the dots in front of the labels.

The last bulleted list could be made easier in the worksheet (or in PowerPoint or Word!) than in an Excel chart.

We can simplify the labeled bullets or labeled bars by simply coloring the text. Simple and totally qualitative, complete with potential red-green color vision issues, like so many color-coded but non-graphical tables of data we’ve endured.

I’ll leave it to the reader, guided by Figure 2 from Wilkinson’s paper, to work backwards from this colored list to a useful quantitative display. Here is the data, if you’re interested:

## Heat Map Donut Charts

My colleague Debra Dalgleish steered me toward an article about a ‘Hot doughnut’ chart in Excel.

Hmm, very interesting. Eye-catching.

You can take the above and combine it with target values in another concentric ring, add a few labels, and make it really pretty. This is from a companion article, How to create a heatmap doughnut chart.

Despite its attractiveness, at first glance I didn’t think it was very effective. You know, donut charts being even less effective than pie charts. But I sat down and went through at least the preliminary steps of recreating the chart.

Note: I apologize for the use of jpeg images. On one of my monitors they look absolutely horrendous, with terrible artifacts everywhere, but on the new monitor they’re okay. I normally use png images for my charts, but some of the images in this article were only available as jpegs.

## How to Make a Heat Map Donut Chart

Here is the data. It looks unsorted, but I’ll describe the unusual sorting order shortly.

Make a nice donut chart (as if there ever were such a thing).

Recolor the wedges based on value (red at the large end of the values, through orange, yellow, and green, to blue at the small end).

Perhaps we need another legend to clarify the sequence of the color codes?

Remove any size data for the slices, using 1 for each data point’s value. Let’s assume we don’t need sizes, since the colors are encoding the values.

In the previous recolored charts I kept a thin white border on the wedges, so adjacent wedges of the same color don’t just look like one larger wedge. In this chart, such adjacent wedges merge into a single wedge.

Now smudge the colors between the centers of adjacent wedges. I didn’t actually do this; below is a screen shot from the original article. The approach I’d take is to divide the wedge into a number of smaller wedges, and gradually change each mini-wedge’s color to simulate a gradient from the center of one wedge to the center of the next. Start with all blue, change to mostly blue plus a little green, then to still mostly blue plus more green, to mixed blue-green, to green with some blue, to green with just a little blue, to all green.

This chart still needs labels for the wedges, and probably a data table so you can see the values which are obscured by the artistic effects.

The last few charts illustrate the unique sorting. This actually took me a while before I noticed it. The smallest point (blue) is at the top and the largest (red) at the bottom. Some of the points go clockwise from the smallest to the largest, and the others go counter clockwise. If you start in one place, the values go from small to large and back to small, like a sine wave. This provides two “continuous” color paths, so that smearing of colors between one wedge and another doesn’t introduce an intermediate color from the scale.

## What Makes the Heat Map Donut Chart Ineffective?

Before diving into this critique, I want to point out that it is important to experiment with visual techniques. We should display our data using a variety of existing approaches to tease insights from the data. We should also apply new methods that may make it easier to find certain patterns or make the data more approachable by a wider audience.

However, we also need to review our attempts honestly, so we can concentrate on approaches that work and shelve those that do not.

There are a number of features of the heat map donut chart that make it ineffective as a data display method.

### Color Gradient

First, the value of each wedge is only encoded by the color in the very center of the wedge, that is, along the spoke that would connect it to the hub of the chart. Gradients in color generally indicate variations in data, but in this case the gradients are gratuitous artwork. Worse than that, the reader may be fooled into thinking there is real data in the spaces between spokes.

### Effectiveness of Encoding Techniques

A more fundamental problem is illustrated by Figure 2 of Presentation Graphics by Leland Wilkinson. This figure shows William Cleveland’s ranking of different graphical features in terms of how effectively they are for encoding and decoding data.

The heat map uses color to encode values. Cleveland’s hierarchy of graphical elements lists color as the least effective encoding means. Color can be effective to indicate different categories (for example, different lines in a chart), but it is not a good choice for displaying continuously variable numerical data.

### Color Vision

Another reason color is a poor choice is that an estimated 8% of the male population (and only about 0.4% of the female population) find it difficult or impossible to distinguish between certain colors. A companion article, Color Vision Issues with Heat Map Donut Charts, uses these heat map donut charts to investigate how color vision deficiencies interfere with color-based data encoding.

## Chart Busters: Fix the Heat Map Donut Chart

No critique of a graphical display is complete without a description of one or more improved ways to display the same data. My improvements are shown in Chart Busters: Fix the Heat Map Donut Chart.

## Funny Conclusions from a Dual Pie Chart

Earlier this year, I saw a BUSINESS INSIDER CHART OF THE DAY story called Microsoft Is Winning More Developers To Its Mobile Platform.  They showed how the proportions of new apps built by mobile developers were distributed among the four mobile platforms for the second quarter of 2011 and of 2012. Here is a reconstruction of their chart.

Yes, they used side-by-side pie charts with a distant legend. The data isn’t so complicated that this chart actually distorted anything, but it made the information hard to assimilate. They got their data from a nearly identical chart in Flurry Blog’s Microsoft May Be Closer Than It Appears in Android’s Rearview Mirror. I find this headline amusing, implying that Windows Phone was right behind Android in app development. Several other online channels picked up the same story from these two sources, and some even came to the conclusion that Windows Phone had not only Android but also iOS on the ropes.

It’s not easy to compare wedge sizes between separate pies (or even within the same pie), so I tried another approach. I built this stacked area chart. It’s easy to see that Windows Phone has gained and iOS has lost share, and that Windows Phone and Blackberry are nearly insignificant compared to Android and iOS. But we can’t really judge any change in Android without reading the data labels.

The Business Insider conclusion was that Microsoft was winning more developers. They could have shown this clearly with a single area chart, though it would have ignored the context of the rest of the industry. Of course, the data labels provide some context: even 4% is a very small market share.

Probably the best way to show this data is with a slopegraph: a line chart with two categories, in this case, two dates, showing the changes in the values. This chart shows that iOS has the major share, more than twice Android in second place, and that Windows

Here we can see a modest decrease in iOS, matched by a similar increase in Windows and a tiny increase in Android. Blackberry, thanks for playing. Since these changes are only a few percentage points, I’d like to see monthly or at least quarterly data during the year between the two endpoints, before drawing any conclusions.