## Charting Survey Results

Cole Nussbaumer wrote about displaying survey results Logic in Order in her Storytelling with Data blog last week. You have to love the goal of her blog, which is “to help rid the world of ineffective graphs, one exploding, 3D pie chart at a time.” She writes extensively about communicating effectively with data.

The point of her article was that you should plot your data in some kind of logical order. If the values are plotted by month, sort the months chronologically, not alphabetically. It’s a good article, with a point I know I’ve often tried to make; go read it.

## Stacked Bar Charts

To illustrate her points, she used sanitized results of a survey that asked users whether they were completely satisfied, very satisfied, somewhat satisfied, not very satisfied, or not satisfied at all with a number of features, or whether they used them at all. I’ve recreated Cole’s first chart below. .

I’m going to let Cole go off on her tangent about presenting such data, while I go off on my own.

I thought, first of all, survey responses should be symmetric, meaning equal numbers of favorable and unfavorable responses plus a neutral response. (I’ve abandoned surveys that forced me to choose sides when I really was neutral.) I’ve changed the vaguely satisfied response to “Neutral”, and shortened the others to “Love”, “Like”, “Dislike”, and “Hate”.

Second, “Did you use a feature?” is a different question from “How did you like a feature?” The first has follow-ups to find out why a feature wasn’t used (didn’t need it, couldn’t find the button, didn’t understand the instructions); the second to find out why a feature was liked or disliked (whether it worked as expected, whether the output was acceptable).

I think we should split out the “Did not use” responses, then recompute the other percentages based on the fraction of users who used each features.

The following chart sorts the features in decreasing order of “Not used”.

The stacked bar panel on the right of the chart shows results from negative on the left to positive on the right, like the number line you learned in elementary school, or maybe high school. The only thing is, the central value of zero, between the negative and the positive responses, isn’t centered on the number line.

## Diverging Stacked Bar Charts

Why not stagger the bars left or right until the neutral bar is centered on zero, as shown below:

This is called a Diverging Stacked Bar Chart, which I first read about in Plotting Likert and Other Rating Scales by Naomi Robbins and Richard Heiberger. I have a module in an upcoming version of my Chart Utility which builds these charts. I’ll also post a tutorial soon.

The following combines the diverging stacked chart with a panel showing the “Not used” data.

There doesn’t seem a strong correlation between “Not used” and the ratings, other than the two most widely used features being the most liked, so it was probably wise to separate the data in this way.

Here is the same chart, plotted by the sum of positive responses.

On the other hand, if I’m concerned with making improvements in a product, It might be more informative to sort by sum of negative responses…

… or even by the percentage of strongly negative responses (“Hate”).

The diverging stacked chart shares a drawback with most stacked charts, including the plain stacked chart at the opening of this article. Since the bars are irregularly stacked, they do not begin at a common baseline, so it is not easy to compare individual values.

## “Converging” Stacked Bar Charts

Let’s take the diverging stacked bar chart, and turn it inside out. Instead of the neutral responses straddling the vertical axis, let’s place the strongest responses on either side of the axis, then the weaker responses further out, and finally split the neutral responses on the outside of the stacks. Since it inverts the Diverging Stacked Bar Chart, I’m calling this the “Converging” Stacked Bar Chart.

When comparing these results, we want to see the strong responses (Love and Hate), then more than the weak responses, we’d most likely want to see the cumulative responses (Love + Like as well as Hate + Dislike). The chart above uses the vertical axis as a baseline for the strong responses as well as for the cumulative responses. Without much effort, we see that the chart above is sorted by the sum of positive responses (Love + Like), the dark blue and medium blue bars.

Since the neutral bars are merely padding the ends of the stacks, let’s just leave them off.

The following chart combines the “Not used” responses with the feature ratings, sorted by increasing usage.

Here the chart is sorted by total positive responses (dark plus light blue).

Now we’ve sorted by total negative responses (red plus orange bars) to help focus on the features that people don’t like so much.

Finally, the chart is sorted by strong negative responses, which is substantially the same as above.

These “converging” charts make it easier to rank the responses by eye. However, if we want to directly compare positive and negative responses, we shouldn’t draw the bars in opposite directions. I’ve written about this weakness of tornado diagrams, and suggested ways to improve such charts.

## Clustered-Stacked Bar Charts

Now let’s start with the “converging” chart above, and fold it on its vertical axis so the positive and negative bars both extend to the right. Of course we offset the bars so they are stacked next to each other without one set obscuring the other. I wrote a tutorial about these Clustered and Stacked Column and Bar Charts, and I already have a Chart Utility that cranks them out.

The chart above still has the neutral responses padding both positive and negative stacks. If we leave off the neutral responses, we can reduce the area needed for the chart while reducing clutter. Obviously the chart is sorted by total positive responses. This data set was easy: the positives far outweigh the negatives; if the responses were closer, the side-by-side stacking here would make it easier to tell whether more people liked or hated certain features.

Here I’ve sorted by total negative responses, and to make the negative values stand out, I’ve used a lighter tint for the fill of the positive bars.

Here I’ve sorted by strong negative responses, and lightened the weak negative responses.

For completeness, here I’ve combined the “Not used” panel with the clustered-stacked bars, sorted by feature usage.

The same sorted by total positive responses…

… by total negative responses…

… and by strong negative responses.

## Dot Plots

There is a problem with legibility in clustered-stacked charts when there are a lot of categories and the bars become thin. If you look back at the first couple of clustered-stacked charts in the preceding section, you may feel that the charts are dense, heavy, dark. This is simply because there are many dark bars in close proximity, and such saturation of ink is overwhelming. Not only does the chart feel heavy, it is difficult to make out all of the series of data, unless some of the colors are lightened to emphasize certain other colors (as in the later charts).

Dot plots solve this issue by only placing a marker at the data point itself, rather than a bar spanning the entire distance from the baseline to the data point. This leaves plenty of refreshing white space, while making all of the data clearly visible, particularly the smaller values. You can read about Dot Plots in Dot Plots: A Useful Alternative to Bar Charts, by Naomi Robbins.

The dot plot below shows Strong Negative (e.g., Hate), Total Negative (Hate + Dislike), Strong Positive (Love), and Total Positive (Love + Like). We really don’t need to see the weak positives and negatives, since the totals are more interesting.

These dot plots are sorted by Total Negative (left) and Strong Negative (right).

Dot plots can be combined in a panel chart with the “Not Used” bar chart seen in the other panel charts. This chart is sorted by “Not Used”.

This dot plot panel chart is sorted by Total Positive. It’s much easier in this chart than in the clustered-stacked chart to see how the negatives are increasing while the positives decrease.

For consistency and because I’m paid by the column-inch, here is a dot plot panel chart sorted by Total Negative…

… and here’s one sorted by Strong Negative.

## Summary

There are numerous ways to evaluate survey responses. I took a sample survey and separated out two types of responses: whether users used features, and how much users liked the features they did use. Since these responses asked different questions and lead to different follow-up questions, it is beneficial to consider them separately. In many cases I plotted the “Not used” responses in a separate panel from the user ratings of the features.

Simple stacked bar charts allow us to show survey responses ranked from worst on the left to best on the right.

Diverging stacked bar charts keep the stacking order the same, and align the bars so that positive responses all lie to the right of a vertical axis, and negative responses lie to the left.

“Converging” stacked bar charts keep the alignment of the diverging bars (negative to the left and positive to the right of a reference line), but locate the strongest responses next to the reference line so that both strong responses and total cumulative responses use the line as a baseline.

Clustered-stacked bar charts stack up positive and negative responses to the right of a common reference line, so that positive and negative responses can be directly compared.

Dot plots show the data in the same way as clustered-stacked bar charts, except there are only markers and not entire bars in the chart. This makes the chart more open and allows easier reading of smaller values.

Panel charts can be used to show different types of data, such as the “Features Not Used” and “How Users Liked Features They Used” panels in these charts.

As always, your choice of chart type depends on your immediate requirements. For example, if I wanted to use these results to drive development of a software package, I might simplify the plots, and only plot “Not used” and either strong negative or total negative. This would show the development team which features need better visibility, so more people would use them, and which features need improvement, so fewer people hate them.

## Clustered and Stacked Column and Bar Charts

Excel has built-in chart types for clustered columns and bars, and for stacked columns and bars. One of the commonest charting questions in online Excel forums is, “How do I make a chart that is both clustered and stacked?”

This article demonstrates a protocol that works in both modern versions of Excel, that is, Excel 2003 and earlier and Excel 2007 and later. The technique is a bit convoluted, and it requires an expanded data layout to get the appropriate appearance. And there’s an additional degree of complexity to get the category labels to line up neatly under or beside the clusters.

For those who need to produce many of these charts, and who don’t have 15 minutes to spend on each one, I have created the Peltier Tech Cluster Stack Chart Utility, a commercial Excel add-in that does the heavy lifting at the click of a button.

## Built-In Column and Bar Charts

Let’s start with this simple data set, which compares budget and actual values for three commodities for two quarters of the year. We want to have clusters for each commodity, with stacked actual values next to stacked budget values within each cluster.

Without any effort or thought we can easily create clustered column or bar charts from this data.

Stacked column and bar charts are just as easy.

It’s not obvious how to combine the chart types. The protocol involves inserting blank rows and cells into the data range of a stacked column or bar chart, and values only appear in some of the places in the chart. The proper arrangement will cluster stacks of values with stacks of zeros separating the clusters.

## Starting the Chart

I’ll leave the original data alone (always a good practice) and create a staging data region which is linked to the original data. The easiest way to do this is to copy the original data, then use Paste Special Link to start building the staging area. We’ll make our chart first, then explore how modifying the data layout changes the chart. In practice, we’ll modify the data first and then make the chart, knowing the effects of data layout on chart appearance.

The first step is to make a stacked column or bar chart from the data in B6:E9.  There are no categories selected (i.e., the commodities are not part of the initial chart), so Excel just uses the counting numbers 1, 2, 3.

Since categories always start from the origin, the bar chart’s category labels go from the bottom up, instead of top down as in the sheet. So the vertical axis has to be formatted to make the categories go in reverse order. Also the value (horizontal) axis has to cross at the maximum category, which is at the bottom now, since the order of categories was reversed.

So that’s only stacked. Let’s adjust the data by inserting some rows.

The stacks of columns/bars are now spread out. Not yet what we want.

But lets stagger the budget data by a row, to move the budget data points off the actual data and onto blank slots in the chart.

Unfortunately the staggering doesn’t happen automatically, so we have to go back and tell Excel what data range to use for the chart. Right click on the chart, then select Select Data or Source Data (the command is version-specific). Click in the Chart Data Range box, and select this whole data range.

One more adjustment to the data. Let’s insert a row at the beginning and end so there’s a space outside of the first and last cluster.

Again, we have to explicitly tell the chart about the updated data range. This is almost what we want.

Reduce the gap between columns/bars to give the chart a clustered appearance: select one series of columns, press Ctrl+1 (numeral one) to open the formatting dialog, and in the first screen you see (“Series Options”) change the entry for Gap Width to zero. Color code the data series to make it clearer which data series are associated.

In practice, it is not necessary to create a chart using the compact data and adjust it after every modification to the data. The correct protocol is to adjust the data, and then make the chart shown here, and proceed with adding labels, below.

Almost done. We need to add the category (cluster) labels. We’ll do this by adding a “dummy” series to the secondary axis, and the secondary axis will have the category labels we want. Add a column to the original data range for the dummy axis series (column F in our example).

Select this added data (F1:F4), and hold Ctrl while selecting the column with our labels (A1:A4), so that both areas are highlighted. Make sure you include the blank top cell in the first column. Copy the range, select the chart, and use paste special (Home tab of the ribbon > Paste dropdown > Paste Special) to add this data to the chart as a new series, in columns, with series name in the first row and category labels in the first column. In other words, use these settings:

In Excel 2003 and earlier, the original labels (1, 2, 3, etc.) remain along the axis, but in 2007, the new labels take their place, even if we hadn’t checked “Replace Existing Categories”.

Since zero value bars have zero height or width, they don’t appear in the chart. Just to show where this new series is added, I’ve temporarily replaced the zeroes in column F with values of 500. The series spans only the first three categories.

If you’re making a stacked-clustered column chart, convert this new series to a line chart type. Sometimes Excel 2007 doesn’t expand the legend enough to show the legend entry for Axis, so I’ve stretched it in this chart.

## Manipulating the Axes

Now format the Axis series to place it onto the secondary axis. To do this, select the Axis series. If you can’t see the Axis series, click the dropdown in the top left of the Chart Tools > Layout or Format tab, and choose the Axis series. Then press Ctrl+1 (numeral one) to open the Format Series dialog. On the first tab, choose Secondary Axis.

Now add the secondary category axis, which is secondary horizontal in 2007 column charts and secondary vertical in 2007 bar charts. This command in on the Chart Tools > Layout tab.

We need to reverse the bar chart’s secondary vertical axis (like we did the primary when we first made the bar chart), and at the same time, make the horizontal axis cross in the automatic position, which generally means at zero or at the minimum.

Now we have to switch the position of the category labels. In the column chart, format the left hand vertical axis so the horizontal axis cross at the maximum, and format the right hand vertical axis so the horizontal axis crosses at the automatic (minimum) position.

In the bar chart, format the bottom horizontal axis so the vertical axis cross at the maximum, and format the top horizontal axis so the vertical axis crosses at the automatic (minimum) position.

### Important – Axis Label Alignment

We want only half a slot on the outside of the first and last clusters, not the full slot shown above, to center each cluster on the commodity category labels. Format the top horizontal axis of the column chart, or the right vertical axis of the bar chart, so the crossing axis is positioned on tick marks. The bottom horizontal axis or the left vertical axis of the chart, which contains the labels we just worked so hard to add, should have the crossing axis positioned between tick marks.

Select the secondary value axis, which is scaled from 0 to 1 (the right vertical axis of the column chart, or the top horizontal axis of the bar chart), and delete it.

Format the primary category axis, which is scaled from 1 to 10 (the top horizontal axis of the column chart, or the right vertical axis of the bar chart), and format it so it has no tick marks or tick labels, and no line type.

Finally, select the Axis legend entry. In Excel 2003 be sure to select the text label, not the legend key (the marker and line). Press Delete. In the column chart, format the Axis series to be invisible (no marker, no line).

That wasn’t so hard, was it? Though it did take a very long time.

## Adding a Line to a Clustered-Stacked Column Chart

It’s relatively easy to overlay a line chart series onto the clustered-stacked column chart. Instead of the column of zeros we used to generate our commodity axis labels, put in the values you want to plot, and add a meaningful column header.

When you go through the process above to add your labels and manipulate the axes, you will end up with data points where you want them. Just don’t bother hiding the series at the end of the process. If you want to show the line on a secondary axis, despite my warnings to the contrary, don’t delete the axis, simply scale it appropriately to your data.

Adding a line to a cluster-stack bar chart is much more complicated, so I will not cover it here.

## Clustered-Stacked Charts in Peltier Tech Charts for Excel 3.0

This tutorial shows how to create Clustered-Stacked Charts, including the specialized data layout needed, and the detailed combination of chart series and chart types required. This manual process takes time, is prone to error, and becomes tedious.

I have created Peltier Tech Charts for Excel 3.0 to create Clustered-Stacked Charts (and many other custom charts) automatically from raw data. This utility, a standard Excel add-in, lays out data in the required layout, then constructs a chart with the right combination of chart types. Charts can be made using data in a wide variety of arrangements, in either vertical or horizontal orientation.

This is a commercial product, tested on thousands of machines in a wide variety of configurations, Windows and Mac, which saves time and aggravation.