## Create a Heat Map Using Excel’s Conditional Formatting

A reader of my post Excel 3D Charts: Charts with No Value asked how to arrange his data to display a variable on a grid, thinking I’d acquiesce to his desire for a 3D chart. My initial thought was “No way”, but I first asked what he was plotting. He was doing an agricultural experimental design with a 2×5 grid of plots planted with various plants, and wanted to see whether there was a positional variation to the results which would not have been seen in a standard ANOVA analysis. His output values typically ranged from 1000 to 5000 pounds per acre.

I thought this could be visualized in 2D without the usual 3D issues, using a heat map. Now I’ll construct such a heat map using Excel’s Conditional Formatting Feature.

I set up a 2 column by 5 row grid, and inserted random numbers between 1000 and 5000.

To represent a 2×5 grid of square plots, I adjusted the rows and columns to be the same size, in this case 50 pixels. This was an assumption on my part, but whatever the dimensions of the actual plots, you can fudge the row and column dimensions to represent these dimensions.

I selected this range, and clicked on Conditional Formatting on the Home tab of the Excel 2013 ribbon, and hovered over Color Scales. As far as I recall, this part of the color scales mechanism worked the same in Excel 2007 and 2010.

There are a dozen built-in color scales, and if this isn’t enough you can pick More Rules.

Below are the five built-in options I’d be most likely to use for such a heat map, highlighted in the popup menu and displayed in theworksheet. These are a red-yellow-green 3-color diverging scale, a red-white-green 3-color diverging scale, a red-white-blue 3-color diverging scale, a white-green 2-color sequential scale, and a yellow-green 2-color sequential scale.

I decided to use the red-white-green diverging scale, because I thought it would be the most color-vision-friendly of the three diverging scales.

To change to a better scale, I selected the range and selected Manage Rules at the bottom of the Conditional Formatting dropdown menu, and the Formatting Rules Manager dialog appeared.

If I’d selected a larger range with multiple sets of rules, like the range shown above with five color scales applied, this dialog would show all of the rules.

I selected the color scale I wanted to change, and clicked Edit Rule, to pop up the Edit Formatting Rule dialog. You can see that the options for this type of rule include 2- and 3-color scales, which are useful for heat maps, as well as data bars and icon sets, which are not.

You can select three colors for minimum, midpoint, and maximum, and you can choose to define these points in several ways. Here I’ve kept the default lowest value, 50th percentile (median), and highest value, but all could be defined by percentiles, values, or formulas.

To select my colors, I went to my favorite source for color schemes at ColorBrewer2.org, by Cynthia A. Brewer at Pennsylvania State University. I decided on a color-vision safe, 9-color, purple-to-green diverging scale, and this is what ColorBrewer showed me. Click on the image to visit this selection at ColorBrewer2.org.

Click on the image to visit this selection at ColorBrewer2.org.

ColorBrewer also lets you export the colors in various ways, including as an array of RGB values:

{118, 42, 131; 153, 112, 171; 194, 165, 207; 231, 212, 232; 247, 247, 247; 217, 240, 211; 166, 219, 160; 90, 174, 97; 27, 120, 55}

I used the second (purple) and eighth (green) elements of this color scheme for the min and max colors, and kept the white central value (the ColorBrewer midpoint is about 5% gray).

Below is the purple-white-green 3-color diverging scale (left), a purple-green 2-color sequential variation (center), and a white-green 2-color sequential variation (right). Note that these intermediate colors are not ColorBrewer’s, but are calculated by Excel.

The three-color purple-to-white-to-green scale seems to show the variation better than the two-color purple-to-green scheme, because the latter doesn’t show much variation within the greens or within the purples. The two-color white-to-green might be the best overall choice, especially if the heat map has to be photocopied.

## Heat Map Donut Charts

My colleague Debra Dalgleish steered me toward an article about a ‘Hot doughnut’ chart in Excel.

Hmm, very interesting. Eye-catching.

You can take the above and combine it with target values in another concentric ring, add a few labels, and make it really pretty. This is from a companion article, How to create a heatmap doughnut chart.

Despite its attractiveness, at first glance I didn’t think it was very effective. You know, donut charts being even less effective than pie charts. But I sat down and went through at least the preliminary steps of recreating the chart.

Note: I apologize for the use of jpeg images. On one of my monitors they look absolutely horrendous, with terrible artifacts everywhere, but on the new monitor they’re okay. I normally use png images for my charts, but some of the images in this article were only available as jpegs.

## How to Make a Heat Map Donut Chart

Here is the data. It looks unsorted, but I’ll describe the unusual sorting order shortly.

Make a nice donut chart (as if there ever were such a thing).

Recolor the wedges based on value (red at the large end of the values, through orange, yellow, and green, to blue at the small end).

Perhaps we need another legend to clarify the sequence of the color codes?

Remove any size data for the slices, using 1 for each data point’s value. Let’s assume we don’t need sizes, since the colors are encoding the values.

In the previous recolored charts I kept a thin white border on the wedges, so adjacent wedges of the same color don’t just look like one larger wedge. In this chart, such adjacent wedges merge into a single wedge.

Now smudge the colors between the centers of adjacent wedges. I didn’t actually do this; below is a screen shot from the original article. The approach I’d take is to divide the wedge into a number of smaller wedges, and gradually change each mini-wedge’s color to simulate a gradient from the center of one wedge to the center of the next. Start with all blue, change to mostly blue plus a little green, then to still mostly blue plus more green, to mixed blue-green, to green with some blue, to green with just a little blue, to all green.

This chart still needs labels for the wedges, and probably a data table so you can see the values which are obscured by the artistic effects.

The last few charts illustrate the unique sorting. This actually took me a while before I noticed it. The smallest point (blue) is at the top and the largest (red) at the bottom. Some of the points go clockwise from the smallest to the largest, and the others go counter clockwise. If you start in one place, the values go from small to large and back to small, like a sine wave. This provides two “continuous” color paths, so that smearing of colors between one wedge and another doesn’t introduce an intermediate color from the scale.

## What Makes the Heat Map Donut Chart Ineffective?

Before diving into this critique, I want to point out that it is important to experiment with visual techniques. We should display our data using a variety of existing approaches to tease insights from the data. We should also apply new methods that may make it easier to find certain patterns or make the data more approachable by a wider audience.

However, we also need to review our attempts honestly, so we can concentrate on approaches that work and shelve those that do not.

There are a number of features of the heat map donut chart that make it ineffective as a data display method.

First, the value of each wedge is only encoded by the color in the very center of the wedge, that is, along the spoke that would connect it to the hub of the chart. Gradients in color generally indicate variations in data, but in this case the gradients are gratuitous artwork. Worse than that, the reader may be fooled into thinking there is real data in the spaces between spokes.

### Effectiveness of Encoding Techniques

A more fundamental problem is illustrated by Figure 2 of Presentation Graphics by Leland Wilkinson. This figure shows William Cleveland’s ranking of different graphical features in terms of how effectively they are for encoding and decoding data.

The heat map uses color to encode values. Cleveland’s hierarchy of graphical elements lists color as the least effective encoding means. Color can be effective to indicate different categories (for example, different lines in a chart), but it is not a good choice for displaying continuously variable numerical data.

### Color Vision

Another reason color is a poor choice is that an estimated 8% of the male population (and only about 0.4% of the female population) find it difficult or impossible to distinguish between certain colors. A companion article, Color Vision Issues with Heat Map Donut Charts, uses these heat map donut charts to investigate how color vision deficiencies interfere with color-based data encoding.

## Chart Busters: Fix the Heat Map Donut Chart

No critique of a graphical display is complete without a description of one or more improved ways to display the same data. My improvements are shown in Chart Busters: Fix the Heat Map Donut Chart.

## Crosstab Heat Map

Heat maps are a popular way to display varying values in a two-dimensional display. The heat map may be an actual geographic map with regions colored differently according to some variable, for example, population density or electoral results. I showed such maps, also called cartograms or choropleths, in Redrawn Electoral Maps and An Undistorted Election Results Map.

Heat maps are not just used for alternate representations of geographical maps. They can be used to map out sectors of a hard drive, or performance of various companies or industries in the stock market. In this post I’ll show a simple heat map showing relative values within a 2D worksheet grid, a crosstab of days of the week and times of day.

In Columbus Web Analytics Wednesday — Feedback Analysis, Tim Wilson of the Gilligan on Data blog surveyed attendees at a Web Analytics meeting to find out what times are good for holding future meetings. It was a simple enough survey, and there were 21 respondents. Tim broke out the results in a bubble-style chart, which I reproduce below. Obviously the respondents were not limited to one choice.

From Tim’s chart, it is evident that Happy Hour and Dinner time slots work best, and Wednesday is the best night of the week, with Monday, Tuesday, and Thursday all at nearly the same distant second place.

A well known problem with bubble charts is that we don’t perceive areas proportionally to the values they encode. Tim even said:

Compare the size of a “1″ to the size of the “14″ — does it look to you like the larger circle is fourteen times as big as the smaller one? It doesn’t to me.

It doesn’t to me, either. We seem to underestimate the differences when areas are used to encode values. It’s the opposite when diameters are used. I think we overestimate the differences to an even greater degree.

I didn’t even label the smallest data points, because the labels would have obliterated the circles.

The purpose of this chart is to get a quick idea of the survey responses, so the bubble approach isn’t really inappropriate (even the bubble chart based on diameter). Its point is readily understood, and since there is a clear winner, that winner stands out. But there are alternatives, and I decided to see how a heat map would work.

## Heat Maps

Heat maps are pretty easy to generate in Excel 2007 using a worksheet range and conditional formatting. Start with a simple worksheet range.

The Conditional Formatting command is accessed from the Home tab on the Excel 2007 ribbon. Hovering the cursor over the button shows the fancy Super Tooltip below the ribbon.

Clicking on the Conditional Formatting button shows a series of built-in formatting styles plus some style management commands at the bottom. Color Scales has the built-in color gradients. For this data, I selected the first one, which goes from red at the low end, through yellow in the midrange, to green at the high end.

The result is much as expected, but I thought the red and yellow were both a bit overstated. In fact, it would seem better to use a single color that ranges from a low saturation at the low value to a high saturation at the high value. A one-color display is simpler to understand, and does not discriminate against those with color-deficient vision.

I went to Manage Rules on the Conditional Formatting dropdown, which opened the Rules manager.

I selected the Graded Color Scale rule, which I just applied, and clicked Edit Rule.

I switched to a two-color scale, kept the green shade at the high end, but selected a very lightly saturated shade of the same hue for the minimum. (I could have chosen white, but the faintest shade of green makes it obvious that the rule is being applied to the entire range.

The result isn’t too bad. Like the bubble chart, it’s reasonably self-explanatory. With little conscious thought, it is apparent light for low values and darker for higher values. The winning value stands out clearly. Another nice feature of this graphic is that is is very compact.