Nathan Yau of FlowingData asks, “Can You Improve this Mediocre Statistical Graphic?”
So what’s wrong with this chart?
1. Image Clarity. Nathan’s screenshot of the original chart was pretty fuzzy, so I went to the source and captured it again, and that’s what I show above. I don’t think a sharper image of this chart is what Nathan had in mind.
2. Chart type? The data points are measured discretely every four years, so a line chart is less imperative than for most time series. If a line chart is used, it should have markers for the points to emphasize their discrete nature, and the lines should be straight, not smoothed. Otherwise the chart implies that the data series are continuous.
3. Horizontal Axis. The 5-year tick spacing along the horizontal axis is confusing and misleading. It must be redrawn to accurately label the years where the measurements were made.
4. Color Scheme. In recent campaigns, the colors blue and red have become synonymous with the Democratic and Republican parties, respectively, so the use of green shades in this chart misses an opportunity to add understanding.
5. Labeling. The chart title and axis titles are misleading or nonexistent. The series are unlabeled.
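The tick spacing problem in point 3 is easy to see with a little arithmetic. The sketch below is only an illustration: the measurement years are an assumption (five primaries, four years apart, with 1996 as the second), and `five_year_ticks` is a hypothetical helper, not anything taken from the original chart.

```python
def five_year_ticks(first_year, last_year):
    """Return ticks at multiples of 5 that span first_year..last_year."""
    start = first_year - first_year % 5   # round down to a multiple of 5
    end = last_year + (-last_year) % 5    # round up to a multiple of 5
    return list(range(start, end + 1, 5))


# Assumed measurement years: five primaries, four years apart.
measured_years = list(range(1992, 2009, 4))

ticks = five_year_ticks(measured_years[0], measured_years[-1])

# Ticks that actually coincide with a year in which data was collected.
matching = [tick for tick in ticks if tick in measured_years]

print("measured years:", measured_years)
print("5-year ticks:  ", ticks)
print("ticks on a measured year:", matching)
```

Under these assumed years, only one of the five-year ticks falls on a year with data, which is why labeling the axis with the measurement years themselves is the clearer choice.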
I’ve produced the following two charts, the first a clustered column chart, the second a line chart. These may not be perfect, but they are a substantial improvement over the chart Nathan wants to fix.
What else is wrong with the chart? National elections are not decided on a county-by-county basis. The statewide percentages of registered Democratic and Republican voters have not changed as much as the county weightings have, and the parties have not switched places in the ranks. Both parties have declined slowly, with Democrats always being around 8-10 percentage points higher. “Other” has remained essentially unchanged, and interestingly enough, “No Answer” has doubled during the time period shown.
What is wrong with the voters? A closer look at the numbers shows that in the majority of the years surveyed, more eligible voters were unregistered than were registered with any one party. The first chart shows actual numbers of voters, and we see that the raw numbers of voters who admit to being registered as either Republican or Democrat have held roughly steady over the five primary seasons. Those unregistered and those declining to answer have steadily increased. There must have been a voter registration effort before the 1996 primary season, when nearly 2.5 million people left the ranks of unregistered voters. (Part of the jump may also be due to a revision in the census of total eligible voters.)
Jorge Camoes has proposed a sparkline chart to show the county majority party data. Jorge’s chart shows the difference between the percentage of Democrat-majority counties and 50%. I find his chart potentially confusing because (a) there’s no indication of vertical scale, (b) the difference from 50% requires extra mental processing, and (c) the mixing of colors for a single series makes it unclear, particularly when one of the colors (red) is often used to denote the party which is not the one shown in the chart.
I made a simple stacked column sparkline, which works better for me. You get a sense of scale, at least, in that the full height of the graphic is the full number of counties. The colors are also the conventional Democrat blue and Republican red. I would not want to stack more than two series, especially in a sparkline. Also, the problem with a sparkline is that a more detailed analysis (as in my updates to this post) requires more series to explain.
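The idea behind the stacked column sparkline can be sketched in plain text, with letters standing in for the two colors. This is only a rough illustration and not how the chart itself was built: the county counts are made up, and `stacked_column` and `text_sparkline` are hypothetical names.

```python
def stacked_column(dem, rep, height):
    """One column, listed top to bottom: Republican cells over Democratic cells.

    The full height stands for all counties (dem + rep), which is what
    gives the sparkline its implicit vertical scale.
    """
    total = dem + rep
    dem_cells = int(height * dem / total + 0.5)  # round half up
    return ["R"] * (height - dem_cells) + ["D"] * dem_cells


def text_sparkline(counts, height=4):
    """Join the columns side by side into rows of text, top row first."""
    columns = [stacked_column(dem, rep, height) for dem, rep in counts]
    return "\n".join(
        "".join(column[row] for column in columns) for row in range(height)
    )


# Hypothetical (Democratic-majority, Republican-majority) county counts
# per election; these are NOT the figures from the original chart.
example_counts = [(30, 10), (20, 20), (10, 30)]
sparkline = text_sparkline(example_counts)
print(sparkline)
```

Because every column has the same total height, the boundary between the two letters can be read directly as a share of all counties, with no reference line at 50% to interpret.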