Colors Aren’t Needed to Make Data Easier to Read

In The Right Colors Make Data Easier To Read (Harvard Business Review), Sharon Lin and Jeffrey Heer report that using “semantically resonant colors” make charts easier to read. They offer this before and after figure to support their claim, where default colors are shown at left, and semantically resonant colors shown at right.

See what they did there? Apples are green (except of course when they’re red), bananas are yellow (except for the green and brown ones in my kitchen), blueberries are blue, etc. Joking aside, this makes intuitive sense at first.

However, I didn’t think the changed colors made the second chart appreciably easier to read. Part of the problem is that the identification of the colored bars requires repeated eye jumps from the chart to the legend, and the seven colors in this chart use up the three to five to seven chunks of short term memory at my disposal.

I usually advise people to replace the chart’s legend with labels closer to the data points. In this case, category labels right on the axis work well. I’ve also rotated the chart to make use of easy to read horizontal labels. The numerical scale also has a greater resolution, since it uses the width of the chart rather than the shorter height. This is another advantage of a rotated chart, given a wider than tall aspect ratio.

Note that no colors are required to give the chart meaning.

Unless you need to look up a particular value by name, it is usually beneficial to sort the data by a factor other than alphabetically by name. In this case, I’ve applied a sort by decreasing value. With only seven points, looking up a particular fruit isn’t much of a burden despite not being sorted alphabetically.

Colors or shades of color can be useful to highlight particular values. If I were writing a brochure about the growing tangerine market, I could use a darker shade of my fill color to easily show where tangerines fit among their peers.

I realize the authors were using the simple bar chart to illustrate their point about semantically resonant colors. However, more basic best practices for simple bar charts tell us to ignore these colors.

If I wanted to differentiate different data sets in a line chart, of course, this use of colors would be helpful.

Note the use of best practices here. There is no legend, instead, data labels have been applied directly to the data. The labels have also been colored to match the data.

Create a Heat Map Using Excel’s Conditional Formatting

A reader of my post Excel 3D Charts: Charts with No Value asked how to arrange his data to display a variable on a grid, thinking I’d acquiesce to his desire for a 3D chart. My initial thought was “No way”, but I first asked what he was plotting. He was doing an agricultural experimental design with a 2×5 grid of plots planted with various plants, and wanted to see whether there was a positional variation to the results which would not have been seen in a standard ANOVA analysis. His output values typically ranged from 1000 to 5000 pounds per acre.

I thought this could be visualized in 2D without the usual 3D issues, using a heat map. Now I’ll construct such a heat map using Excel’s Conditional Formatting Feature.

I set up a 2 column by 5 row grid, and inserted random numbers between 1000 and 5000.

To represent a 2×5 grid of square plots, I adjusted the rows and columns to be the same size, in this case 50 pixels. This was an assumption on my part, but whatever the dimensions of the actual plots, you can fudge the row and column dimensions to represent these dimensions.

I selected this range, and clicked on Conditional Formatting on the Home tab of the Excel 2013 ribbon, and hovered over Color Scales. As far as I recall, this part of the color scales mechanism worked the same in Excel 2007 and 2010.

There are a dozen built-in color scales, and if this isn’t enough you can pick More Rules.

Below are the five built-in options I’d be most likely to use for such a heat map, highlighted in the popup menu and displayed in theworksheet. These are a red-yellow-green 3-color diverging scale, a red-white-green 3-color diverging scale, a red-white-blue 3-color diverging scale, a white-green 2-color sequential scale, and a yellow-green 2-color sequential scale.

I decided to use the red-white-green diverging scale, because I thought it would be the most color-vision-friendly of the three diverging scales.

To change to a better scale, I selected the range and selected Manage Rules at the bottom of the Conditional Formatting dropdown menu, and the Formatting Rules Manager dialog appeared.

If I’d selected a larger range with multiple sets of rules, like the range shown above with five color scales applied, this dialog would show all of the rules.

I selected the color scale I wanted to change, and clicked Edit Rule, to pop up the Edit Formatting Rule dialog. You can see that the options for this type of rule include 2- and 3-color scales, which are useful for heat maps, as well as data bars and icon sets, which are not.

You can select three colors for minimum, midpoint, and maximum, and you can choose to define these points in several ways. Here I’ve kept the default lowest value, 50th percentile (median), and highest value, but all could be defined by percentiles, values, or formulas.

To select my colors, I went to my favorite source for color schemes at ColorBrewer2.org, by Cynthia A. Brewer at Pennsylvania State University. I decided on a color-vision safe, 9-color, purple-to-green diverging scale, and this is what ColorBrewer showed me. Click on the image to visit this selection at ColorBrewer2.org.

Click on the image to visit this selection at ColorBrewer2.org.

ColorBrewer also lets you export the colors in various ways, including as an array of RGB values:

{118, 42, 131; 153, 112, 171; 194, 165, 207; 231, 212, 232; 247, 247, 247; 217, 240, 211; 166, 219, 160; 90, 174, 97; 27, 120, 55}

I used the second (purple) and eighth (green) elements of this color scheme for the min and max colors, and kept the white central value (the ColorBrewer midpoint is about 5% gray).

Below is the purple-white-green 3-color diverging scale (left), a purple-green 2-color sequential variation (center), and a white-green 2-color sequential variation (right). Note that these intermediate colors are not ColorBrewer’s, but are calculated by Excel.

The three-color purple-to-white-to-green scale seems to show the variation better than the two-color purple-to-green scheme, because the latter doesn’t show much variation within the greens or within the purples. The two-color white-to-green might be the best overall choice, especially if the heat map has to be photocopied.

Color Vision Issues with Heat Map Donut Charts

I’ve reviewed a colorful yet ineffective graphic in Wow! Heat Map Donut Chart!. One of the features that makes this graphic ineffective is its use of colors. While unfriendly to those with normal color vision, the colors used are particularly unfriendly to the 8% of men and 0.4% of women with color vision deficiencies.

Color Vision

In the normal human retina, their are four types of photoreceptors, neurons that detect light. One type, called rods, respond to low light levels, producing monochromatic vision in near darkness. In brighter light, three types of cones respond to incoming blue, red, and green light. The mechanisms of color vision are fascinating and miles beyond the scope of this article. For this discussion it is enough to know that our perception of color is a function of how these color receptors, other retina neurons, and the visual cortex of our brains process the signals from these cones.

Color blindness occurs when one of the three types of cones are partially or totally disabled. The most common color vision deficiencies are deuteranomaly and deuteranopia, respectively partial and total loss of function of the green photoreceptors. Protanomaly and protanopia are the partial and total deficiency of the red sensing neurons. Much less common than either of these conditions are tritanomaly and tritanopia, partial and total loss of activity of the blue cones.

Deficiencies in either the red or the green photoreceptors leads to difficulties distinguishing red and green. Ironically, the most common pair of colors used to indicate good and bad are red and green. Think of traffic lights with red and green lights indicating stop and go, and innumerable bad dashboards with arrays of red and green symbols. Better combinations would be read and blue, or orange and cyan (which must be familiar from the hundreds of Tableau graphics we’ve seen over the last few years).

Most mammals only have two types of photoreceptors, and have the type of color vision that protanopes and deuteranopes experience. Early during primate evolution, a mutation split one of these receptors into the green and red cones that provide richer color vision. Some reptiles, birds, and fish may have four, five, or even more types of cones, extending into the ultraviolet and infrared ranges of the spectrum we can only imagine.

Testing Images for Color Vision Issues

There are numerous programs and web sites that simulate the effects of color vision deficiencies on your images. One popular and easy-to-use site is Vischeck, which allows you to upload images, and uses various algorithms to convert them into representations of how they would look under various types of color blindness. The simulations here were generated using Vischeck.

Comparison of Heat Map Donut Chart Images

Here is the original heat map donut chart, in all of its visual glory.

This is how deuteranopia (green cone deficiency) renders the donut.

This is how protanopia (red cone deficiency) renders the donut.

These two conditions result in similar images: a blob of blue in the top right, and a smear of yellowish brown elsewhere. The red-orange-yellow-green gradients are mostly eradicated, rendering the heat map nearly useless.

Comparison of Pie Chart Images

Here is my depiction of the heat map donut data, in a properly sorted pie chart, using the default Excel 2007/2010 color palette. In full color:

Here is the pie as seen by a deuteranope (simulated).

Here is the pie as seen by a protanope (simulated).

As above, the two color vision deficiencies result in similar simulations. Without a gradient, though, the wedge colors are much more readily distinguished. Labeling the wedges themselves does help greatly, as some color pairs are very close; a legend would not be so helpful.

It turns out that the default Excel 2007/2010 color theme is relatively friendly to viewers with color vision deficiencies.

Comparison of Bar Chart and Dot Plot Images

Here is a bar chart of the same data, shown in regular color vision and simulated deuteranopia (protanopia was omitted because it is less common and the effects are similar).

Here is a dot plot of the same data, shown in regular color vision and simulated deuteranopia (protanopia was omitted because it is less common and the effects are similar).

We can see that color vision deficiencies have no effect on the readability of bar charts and dot plots, because neither rely on color to encode quantitative information.

Recommendations

Graph types such as bar charts and dot plots again prove their superiority over pie and donut charts, in this case by not relying on colors to depict numerical data.

Gradients are particularly problematic when combined with color vision issues. In general, the guidelines for gradients are that they should proceed from a dark shade of a color to a light shade of the same color, or from a dark shade of a color through a light shade (close to white) to a dark shade of a distinct second color. The gradient used in the heat map example proceeded through highly saturated chades of several colors.

Colorbrewer is an online tool designed to help select good color schemes for maps and other graphics. You can select from among different styles of color schemes, and you can choose to generate only color-vision-friendly schemes. In conjunction with a colov-vision-deficiency simulator like Vischeck, you have no excuse for producing hostile graphics like the donut heat map.

Excel Chart Color Update

In Using Colors in Excel Charts I discussed a lot of details about Excel colors, predominantly in Excel 2003, and specifically with charting in mind. I mentioned the Color Brewer, a neat little utility for selecting colors and designing color palettes. In ColorBrewer2.org Mark Harrower of Axis Maps announced Color Brewer 2, which updates the 8-year-old Color Brewer (that’s 56 in dog years, and 80 in web years, as Mark says). The new tool is available at ColorBrewer2.org.

VBA Conditional Formatting of Charts by Series Name

I’ve got a couple of tutorial pages on my web site that deal with conditionally formatted charts. The trick is to set up ranges containing formulas for each of the conditions, so that if the condition for that range is met, the range contains the value to plot, otherwise it contains #N/A, which isn’t plotted in a line or XY chart, or just a blank or zero, which is plotted as a zero-thickness bar or column in a bar or column chart. You can read about them here:

As with conditional formatting in the worksheet, the formatting you can achieve is limited only by your ability to construct formulas. Of course, in the chart, you are not limited to three conditional formats as you are in the worksheet (in Excel 2003 at least; the conditional formatting enhancements in Excel 2007 are mind-boggling).

Although I prefer to use the formulaic approach above, sometimes it makes more sense to use VBA to apply formats. Suppose you have a chart with several series, and the series may be in any order. However, you want to color a category according to its name, so that Halliburton always is colored red and Enron is always blue.

The following protocol allows you to color the series in a chart according to colors you’ve reserved for each series. The range below illustrates the data: range A1:A4 contains a list of the series names, with each cell filled with the desired color for that series’ data in a chart. A6:E10 contains data for Chart 1, and A12:E16 contains data for Chart 2. (The colors in A1:A4 were defined using the Color Brewer utility described in my earlier post, Using Colors in Excel Charts.)

The charts made from the two ranges are not remarkable.

There is a simple VBA procedure that will apply the colors from the designated range to the appropriate seriess, based on their names.

Excel 2003 and earlier:

```Sub ColorBySeriesName()
Dim rPatterns As Range
Dim iSeries As Long
Dim rSeries As Range

Set rPatterns = ActiveSheet.Range("A1:A4")
With ActiveChart
For iSeries = 1 To .SeriesCollection.Count
Set rSeries = rPatterns.Find(What:=.SeriesCollection(iSeries).Name, _
LookAt:=xlWhole)
If Not rSeries Is Nothing Then
.SeriesCollection(iSeries).Interior.ColorIndex = _
rSeries.Interior.ColorIndex
End If
Next
End With
End Sub```

Excel 2007 and later:

```Sub ColorBySeriesName()
Dim rPatterns As Range
Dim iSeries As Long
Dim rSeries As Range

Set rPatterns = ActiveSheet.Range("A1:A4")
With ActiveChart
For iSeries = 1 To .SeriesCollection.Count
Set rSeries = rPatterns.Find(What:=.SeriesCollection(iSeries).Name, _
LookAt:=xlWhole)
If Not rSeries Is Nothing Then
.SeriesCollection(iSeries).Format.Fill.ForeColor.RGB = _
rSeries.Interior.Color
End If
Next
End With
End Sub```

Select a chart, and run the procedure above (press Alt+F8 to bring up the macro dialog, select the procedure, and click Run). The bars will change color according to their series names and the colored table in A1:A4.

To format by category label, see VBA Conditional Formatting of Charts by Category Label.
To format by value, see VBA Conditional Formatting of Charts by Value.