Multiple Line Charts by Category

The Problem

I encountered a new blog recently called HelpMeViz.com, which was recently established to give the visualization community a forum to discuss their projects, and give and receive advice and feedback about their work.

In Excel: Multiple Lines Across X-Axis Categories, Lars Verspohl described a column chart with columns forming 12-month time series of some variable for nine countries. Lars wanted to plot another variable on this chart as a set of line chart time series. This is his sketch of the chart he envisioned (click on the image to see it full size). Multiple Lines Across Column Chart Maarten Lambrechts suggested using lines for both variables, dispensing with the original column chart. I agreed, and since the description of how I would do it was much too long to fit into a comment, I have written this tutorial.

Data

The data I used in my example is shown below. Since I made up the values, I didn’t bother showing all 118 rows of the worksheet, just enough to see the pattern. The blue shaded range has the values we want to plot. There are 12 monthly values for each country, plus a blank in between countries. The blank leaves a gap between the connected points of one country and the next. The pink shaded range contains data for a dummy series which will anchor the country names along the bottom axis. The X values (6.5, 19.5, etc.) are centered within each country’s monthly points (6.5 is between June and July for Spain, etc.). The Y values are zero so the points are located along the bottom of the chart. The green shaded range contains the country names which will be used as data labels for this dummy series. Multiple Line Charts by Category - Data

The Chart

Select the blue shaded range, and insert an XY Scatter chart, the style with lines and no markers. I’ve already fixed the X axis so it has a tick label (and vertical gridline) every 13 units, from 0 to 9×13=117. Multiple Line Charts by Category - Chart 1

The Labels

Select and copy the pink shaded range, then select the chart, and use Paste Special to add the data to the chart as a new series, by column, with X values (categories) in the first column and series name in the first row. I’ve formatted this new data as round markers without a line, so you can see how it all works. Multiple Line Charts by Category - Chart 2 If you have Excel 2013, select the added points, and add Data Labels, using the Below position, and the Labels from Cells option, selecting the green shaded range with country names for the labels. If you don’t have Excel 2013 (or even if you do) you can use Rob Bovey’s Chart Labeler add-in to add these labels. It’s free, but you’ll need to download and install the Labeler before you can continue. Again, position the labels below the Label series data points, using the labels in the green shaded range. Multiple Line Charts by Category - Chart 3

Clean Up

Hide the Label series by formatting it with no line and no markers. Choose No Labels for the X axis. Delete the Label legend entry by clicking on it once to select the legend and again to select the legend entry, then pressing Delete. Multiple Line Charts by Category - Chart 4 That’s pretty good, and it didn’t take all that long.

Secondary Panel

Most of the time, the two different variables will have much different magnitudes. It is tempting to stick one of the variables on the secondary axis, and let the two series just lie across each other. But there are many more good reasons not to overlap the data than good or bad reasons to overlap them. So we’ll put the secondary variable into a secondary panel of the chart. We’ll start by formatting the Red series so it is on the secondary axis (ignore the fact that Red and Blue values are not so different). I’ve gotten a head start here by giving the plot area a slightly darker gray border, and giving the horizontal gridlines a lighter gray. Multiple Line Charts by Category - Chart 5 Now we need to fix the scaling. The Blue data spans 0 to 0.6, and to fit it into the bottom panel, we will scale the left hand axis from 0 to 1.2. The Red data really spans 0 to 0.4, but I’ll use 0 to 0.6 as its range to make things work out nicely. To make Red fit the top panel, we’ll scale the right hand axis from -0.6 to 0.6. Multiple Line Charts by Category - Chart 6 The scales work out nicely, and the primary horizontal gridlines fit both axes. Sometimes this requires more trial and error. Now we need to neaten up the axes. We don’t want the primary axis ticks to extend across the secondary panel, and vice versa, so we’ll use custom number formats to hide the labels we don’t want. The right hand axis is easy. Format the axis so it has a custom number format of 0.0;;0.0; which has four elements separated by semicolons. The first element tells Excel to use one decimal digit for positive labels. The second element is actually absent, telling Excel not to show negative labels. The third element tells Excel to display zero labels with one decimal digit. The fourth element is also absent, indicating that text labels should not be shown. The left hand axis is a bit trickier. Format the axis so it has a custom number format of [<=0.6]0.0; which has two elements separated by semicolons. The first element has a conditional expression in square brackets which tells Excel to use one decimal digit for labels less than or equal to 0.6. The second element is absent, telling Excel to omit labels that do not meet the first condition. Finally, we need a line to separate the top and bottom panels. Let’s use the horizontal axis, which so far has remained at the bottom of the chart. First, I formatted this axis to match the plot area border. Then I formatted the left hand axis so that the horizontal axis crosses at 0.6. Multiple Line Charts by Category - Chart 7 Separating the series into distinct panels clarifies the data even when they don’t need different axis scales.

Keeping the Column/Line Style

In a follow-up comment to the original article cited at the top of this tutorial, Lars Verspohl indicates that he wants to keep the columns, and not convert everything to lines. This is also possible.

The Data

The column chart data is a simple grid of months by countries, shown in the blue shaded range below. The data for the lines is given in the red shaded range below. I’ll explain the calculation of X values in column N in a moment. Multiple Lines and Columns by Category - Data

The Column Chart

The column chart is easy. Select the blue shaded data, and insert a Clustered Column chart. Switch rows and columns, because Excel by default wants to plot fewer series of more points each. Multiple Lines and Columns by Category - Chart 1 The most tedious part of making this chart is formatting all the graduated shades of blue.

X Values for the Added Lines

I’ll use a subset of the data below to show how to compute the X values. The column chart by default has a gap width of 150%. That means the gap between the clusters of columns is as wide as 1.5 individual columns. This means there is a gap of 0.75 columns to the left and to the right of each cluster, and the entire width of a category is 0.75 columns for the gap on the left + 12 columns for the months + 0.75 columns for the gap on the right, or 13.5 columns. The categories are internally numbered starting at 1 for the first category up to N, the number of categories. These whole numbers are the X values at the center of each cluster (at the vertical yellow lines in the chart below). By extension, the X values between clusters are the halves. Rather than 0 to N, then, the X axis effectively runs from 0.5 to N+0.5, as shown by the numerical labels at the top of the chart below. The first data point for the red line is in the middle of the first column, so it is 0.75 columns for the gap and 0.5 columns for the half of a column to the right of the vertical axis, which itself has an X value of 0.5. We want X to go from 0.5 at the left of the first gap to 1.5 at the right of the first gap. So the first X value is 0.5+(0.75+0.5)/13.5, or 0.59259. The second data point is in the middle of the second column, one column to the right of the first point, so its X value is 0.5+(0.75+0.5+1)/13.5, or 0.66667. The third X value is 0.5+(0.75+0.5+2)/13.5, or 0.74074. Continue to the last X value for the first continuous line, at 0.5+(0.75+0.5+11)/13.5, or 1.40741. There is a blank row between countries, so there will be a gap in the plotted line. The next set of X values is calculated by adding 1 to the first set of values. Continue for the rest of the countries.

Multiple Lines and Columns by Category - Chart 2

When this data is plotted by itself in an XY Scatter plot, the result is shown below. I’ve set the axis scale to 0.5 to 9.5 to match the internal X values in the column chart. A quick comparison of this chart with the column chart indicates that our calculations are correct. Multiple Lines and Columns by Category - Chart 3

Adding the Lines

Adding the lines will take a few steps. First select and copy the red shaded range which includes our meticulous calculations. Then select the chart, and use Paste Special to add the data as a new series, by columns, with X values (category labels) in the first column, and series names in the first row. Multiple Lines and Columns by Category - Chart 4 This temporarily scrunches the existing columns way to the left of the chart, and makes all of the columns very thin. Right click on the skinny red columns, choose Change Series Chart Type, and select the XY Scatter style with lines and no markers. Excel automatically places the XY series on the secondary axis. The secondary X axis (top) runs from 0 to 10, so the red lines are not in alignment with the blue columns. The secondary Y axis (right) is also not in alignment with the primary Y axis (left). Multiple Lines and Columns by Category - Chart 5 The last step is to format the XY Scatter series so it is plotted on the primary axes with the Clustered Column series. This brings all the data into alignment. Multiple Lines and Columns by Category - Chart 6

Peltier Tech Chart Utility

Chart Busters: Fix the Heat Map Donut Chart

I reviewed a colorful yet ineffective graphic in Wow! Heat Map Donut Chart!. No critique of a graphical technique is complete without describing alternative techniques which would be more effective. Starting from the heat map donut chart, I’ll work backwards, removing features that negatively affect chart readability, then forwards, changing to features that improve readability.

Hierarchy of Visual Elements

To guide our redesign of the heat map donut chart, we will refer to Figure 2 of Presentation Graphics by Leland Wilkinson.

Color gradient allows data encoding only along spokes

This figure shows William Cleveland’s ranking of different graphical features in terms of how effectively they are for encoding and decoding data.

Heat Map Donut Chart

Here is the heat map donut chart in all of its glory, including intense primary colors and gradients.

Heat map donut chart

This graphic uses color to encode its data, which according to Wilkinson’s chart is the least effective graphical feature for this purpose. The gradients in color between data points only makes the use of colors more ineffective.

Unsmear Colors

Because of the gradients of color, the data for each category is encoded only along the spoke at the center of that category’s segment of the donut.

Color gradient allows data encoding only along spokes

To use the entire segment to encode the data, to clearly delineate the segments, and to eliminate potentially misleading intermediate colors between spokes, let’s unsmear the colors.

Heat map donut chart with gradients removed

Size Wedges Proportionally

Now that we’re not using color to encode data, we can move up in Cleveland’s hierarchy. Let’s skip volume, since we already know (haven’t you been paying attention for all these years?) that 3D effects distort and mislead. Excel doesn’t even offer 3D donut charts, but there are blah blah professional state of the art yada yada business intelligence solutions that do. We’re using a donut chart anyway, so let’s make the area of the wedges proportional to the underlying data.

Donut chart with proportionally sized wedges

Use Color-Vision-Deficiency-Friendly Colors

Those primary colors are still glaring, and there are multiple points using the same colors. Let’s apply Excel’s dull but useful default color scheme.

Donut chart with color vision friendly palette

A companion article, Color Vision Issues with Heat Map Donut Charts, discusses how color vision deficiencies can hinder readability of certain color schemes, such as the scheme used in the heat map donut chart.

Sort by Size

Since we’re not very good at comparing areas, especially of rotated shapes, we can help our brains rank the data by sorting the points according to value.

Donut chart with sorted data points

Replace Legend with Data labels

The legend forces us to move our eyes back and forth from the donut to the legend, then remember which color corresponds to which category. This slows comprehension and overloads our feeble short term memory. Let’s remove the legend altogether and apply data labels to the wedges.

Donut chart with data labels instead of legend

Put the Hole Back Into the Donut

The donut chart uses area to encode data. The figure above lists angle above area, hmmm. The donut also uses angle to an extent, but there is a big hole where the angles come together. Let’s put the hole back into the donut, and use a pie chart, which uses redundant approaches of area and angle.

Pie chart (area and angle)

Encode Using Length: Bar Chart

Another glance at the figure of hierarchical graphic elements shows that length is more effective than area or angle. How do we get length? In a sense the donut chart uses length, arc length, but the elements not straight, nor do they share a common starting point, so this length isn’t used effectively. Let’s totally scrap the circular chart paradigm, and draw bars proportional in length to the underlying values.

Bar chart (length beats area and angle)

Notice an additional benefit that the bars don’t have to be different colors, and the labels are right there on the axis.

Encode Using Position Along Scale: Dot Plot

Again we check Cleveland’s list. In this context position along nonaligned scales doesn’t make sense. But position along a common scale? But position along a common scale? We have the common scale from the bar chart. Let’s put a symbol at the end of each bar, then hide the bars.

Dot plot (position along a common scale trumps all)

Like the bars in the previous chart, the dots in this plot need not be different colors. In fact, color can be used here in its most effective role, identifying different sets of dots. We could overlay target and actual values with little confusion, particularly if colors were chosen to minimize the problems of color blindness. Using different shapes for the symbols is a redundancy that helps identify the groups of data.

The bar chart and the dot plot are the two most effective ways to display this data quantitatively and clearly. Not only are these charts easy to read, they are easy to create and maintain.

What if We Work Backwards?

Suppose we were unfamiliar with Cleveland’s hierarchy of graphical effectiveness. How could we lead ourselves astray starting from such effective graphs as the bar chart and dot plot? To reinforce the lessons from Cleveland and Wilkinson, let’s explore.

We could color code the bars or dots. Not a terrible idea, but also unnecessary.

Color coded bar chart

Color coded dot plot

Now we’re free to lose sight of the effectiveness of the different encoding methods. We can decide color is all we need to show the data, so we remove the variation in lengths of the bars, or the horizontal positions of the dots.  Do you notice the conversion from a quantitative display to a qualitative one?

Length independent bar chart

Position independent dot plot

Let’s label the bars directly, or put the dots in front of the labels.

Labeled colored bars

Labeled colored dots

The last bulleted list could be made easier in the worksheet (or in PowerPoint or Word!) than in an Excel chart.

Colored bulleted list

We can simplify the labeled bullets or labeled bars by simply coloring the text. Simple and totally qualitative, complete with potential red-green color vision issues, like so many color-coded but non-graphical tables of data we’ve endured.

Labeled colored dots

I’ll leave it to the reader, guided by Figure 2 from Wilkinson’s paper, to work backwards from this colored list to a useful quantitative display. Here is the data, if you’re interested:

Heat map donut chart data

Peltier Tech Chart Utility

Wow! Heat Map Donut Chart!

Heat Map Donut Charts

My colleague Debra Dalgleish steered me toward an article about a ‘Hot doughnut’ chart in Excel.

Heat map donut chart

Hmm, very interesting. Eye-catching.

You can take the above and combine it with target values in another concentric ring, add a few labels, and make it really pretty. This is from a companion article, How to create a heatmap doughnut chart.

Heat map donut charts

Despite its attractiveness, at first glance I didn’t think it was very effective. You know, donut charts being even less effective than pie charts. But I sat down and went through at least the preliminary steps of recreating the chart.

Note: I apologize for the use of jpeg images. On one of my monitors they look absolutely horrendous, with terrible artifacts everywhere, but on the new monitor they’re okay. I normally use png images for my charts, but some of the images in this article were only available as jpegs.

How to Make a Heat Map Donut Chart

Here is the data. It looks unsorted, but I’ll describe the unusual sorting order shortly.

Heat map donut chart data

Make a nice donut chart (as if there ever were such a thing).

Initial donut chart

Recolor the wedges based on value (red at the large end of the values, through orange, yellow, and green, to blue at the small end).

Recolored donut chart

Perhaps we need another legend to clarify the sequence of the color codes?

Recolored donut chart with extra legend

Remove any size data for the slices, using 1 for each data point’s value. Let’s assume we don’t need sizes, since the colors are encoding the values.

Recolored donut chart with resized wedges

In the previous recolored charts I kept a thin white border on the wedges, so adjacent wedges of the same color don’t just look like one larger wedge. In this chart, such adjacent wedges merge into a single wedge.

Recolored donut chart with resized wedges and no division between adjacent wedges

Now smudge the colors between the centers of adjacent wedges. I didn’t actually do this; below is a screen shot from the original article. The approach I’d take is to divide the wedge into a number of smaller wedges, and gradually change each mini-wedge’s color to simulate a gradient from the center of one wedge to the center of the next. Start with all blue, change to mostly blue plus a little green, then to still mostly blue plus more green, to mixed blue-green, to green with some blue, to green with just a little blue, to all green.

Heat map donut chart

This chart still needs labels for the wedges, and probably a data table so you can see the values which are obscured by the artistic effects.

The last few charts illustrate the unique sorting. This actually took me a while before I noticed it. The smallest point (blue) is at the top and the largest (red) at the bottom. Some of the points go clockwise from the smallest to the largest, and the others go counter clockwise. If you start in one place, the values go from small to large and back to small, like a sine wave. This provides two “continuous” color paths, so that smearing of colors between one wedge and another doesn’t introduce an intermediate color from the scale.

What Makes the Heat Map Donut Chart Ineffective?

Before diving into this critique, I want to point out that it is important to experiment with visual techniques. We should display our data using a variety of existing approaches to tease insights from the data. We should also apply new methods that may make it easier to find certain patterns or make the data more approachable by a wider audience.

However, we also need to review our attempts honestly, so we can concentrate on approaches that work and shelve those that do not.

There are a number of features of the heat map donut chart that make it ineffective as a data display method.

Color Gradient

First, the value of each wedge is only encoded by the color in the very center of the wedge, that is, along the spoke that would connect it to the hub of the chart. Gradients in color generally indicate variations in data, but in this case the gradients are gratuitous artwork. Worse than that, the reader may be fooled into thinking there is real data in the spaces between spokes.

Color gradient allows data encoding only along spokes

Effectiveness of Encoding Techniques

A more fundamental problem is illustrated by Figure 2 of Presentation Graphics by Leland Wilkinson. This figure shows William Cleveland’s ranking of different graphical features in terms of how effectively they are for encoding and decoding data.

Color gradient allows data encoding only along spokes

The heat map uses color to encode values. Cleveland’s hierarchy of graphical elements lists color as the least effective encoding means. Color can be effective to indicate different categories (for example, different lines in a chart), but it is not a good choice for displaying continuously variable numerical data.

Color Vision

Another reason color is a poor choice is that an estimated 8% of the male population (and only about 0.4% of the female population) find it difficult or impossible to distinguish between certain colors. A companion article, Color Vision Issues with Heat Map Donut Charts, uses these heat map donut charts to investigate how color vision deficiencies interfere with color-based data encoding.

Chart Busters: Fix the Heat Map Donut Chart

No critique of a graphical display is complete without a description of one or more improved ways to display the same data. My improvements are shown in Chart Busters: Fix the Heat Map Donut Chart.

Peltier Tech Chart Utility

Funny Conclusions from a Dual Pie Chart

Earlier this year, I saw a BUSINESS INSIDER CHART OF THE DAY story called Microsoft Is Winning More Developers To Its Mobile Platform.  They showed how the proportions of new apps built by mobile developers were distributed among the four mobile platforms for the second quarter of 2011 and of 2012. Here is a reconstruction of their chart.

Double pie chart

Yes, they used side-by-side pie charts with a distant legend. The data isn’t so complicated that this chart actually distorted anything, but it made the information hard to assimilate. They got their data from a nearly identical chart in Flurry Blog’s Microsoft May Be Closer Than It Appears in Android’s Rearview Mirror. I find this headline amusing, implying that Windows Phone was right behind Android in app development. Several other online channels picked up the same story from these two sources, and some even came to the conclusion that Windows Phone had not only Android but also iOS on the ropes.

It’s not easy to compare wedge sizes between separate pies (or even within the same pie), so I tried another approach. I built this stacked area chart. It’s easy to see that Windows Phone has gained and iOS has lost share, and that Windows Phone and Blackberry are nearly insignificant compared to Android and iOS. But we can’t really judge any change in Android without reading the data labels.

Stacked area chart

The Business Insider conclusion was that Microsoft was winning more developers. They could have shown this clearly with a single area chart, though it would have ignored the context of the rest of the industry. Of course, the data labels provide some context: even 4% is a very small market share.

Area chart, Windows only

Probably the best way to show this data is with a slopegraph: a line chart with two categories, in this case, two dates, showing the changes in the values. This chart shows that iOS has the major share, more than twice Android in second place, and that Windows

Slope graph

Here we can see a modest decrease in iOS, matched by a similar increase in Windows and a tiny increase in Android. Blackberry, thanks for playing. Since these changes are only a few percentage points, I’d like to see monthly or at least quarterly data during the year between the two endpoints, before drawing any conclusions.

Peltier Tech Chart Utility

Chart Busters: The Economist Doesn’t Read Forbes

The Economist showed changing pre-tax profits among banks from 2007 to 2011 in Bank Profits Head East. They chose to use a pair of donut charts for this. Weaknesses of this approach are the separation of the pairs of values into distinct donuts. This forces the reader to jump from side to side, and ultimately skip the charts and read the values in the labels. The combined chart has leader lines to help steer the reader’s eyes from side to side, but this adds clutter, and the labels push the donuts further apart, making visual comparisons more difficult.

The Economist's Donut Chart Showing Changing Bank Pre-Tax Profits

Who you gonna call? Chart Busters!

In Arrow Charts and Other Alternatives to Multiple Pie Charts on the Forbes magazine web site, Naomi Robbins introduced Arrow Charts as a replacement for double pie charts (and double donuts are at least as bad). I wrote a tutorial on my blog that showed How to Make Arrow Charts in Excel. The technique takes a bit of work, but once you’ve made one arrow chart, you can use it as a template for new values.

I took the example from my arrow chart tutorial and swapped in the Economist’s values:

Arrow chart showing changing bank pre-tax profit.

The first thing I learned from this arrow chart, which I missed in the double donut, is that most regions showed little change, but two regions showed major changes: Asia Pacific gained a huge percentage while Western Europe lost a similar amount. This is a great example of the effectiveness of arrow charts.

Peltier Tech Chart Utility

Peltier Tech Chart Utility

 

Create Excel dashboards quickly with Plug-N-Play reports.