# Interactive Parallel Coordinates Chart.

In a recent post in the Junk Charts blog, Kaiser cited a New York Times chart comparing the ranks of several NFL quarterbacks in a handful of offensive categories. Kaiser adjusted the NYT rank chart to get a more quantitative comparison of the athletes. I took the same data and made a parallel coordinates chart.

A parallel coordinates chart, or profile plot, is a useful way to compare several sets of observations (here, one set per quarterback) of a combination of different factors (offensive categories). The entire population, or at least a broad set of data, is generally shown in a subdued color (the gray lines in the chart) while one or more individuals are highlighted for inspection (the blue and red lines).

Stephen Few discusses parallel coordinates charts in Multivariate Analysis Using Parallel Coordinates (html) and Multivariate Analysis Using Parallel Coordinates (pdf).

###### The Data

The data is a simple set of raw statistics, shown in the following table. I used negative numbers for interceptions, so that more interceptions resulted in a lower number. Note that rows 1-3 and columns B-F are not shown (yet).

 A G H I J K L 4 pcyt ypa td int int/td td/int 5 Roethlisberger PIT 65.3 7.81 32 -11 -0.34375 2.9 6 Garrard JAC 64 7.72 18 -3 -0.16667 6 7 Garcia TAM 63.9 7.46 13 -4 -0.30769 3.3 8 Hasselbeck SEA 62.6 7.06 28 -12 -0.42857 2.3 9 Rivers SDG 60.2 6.85 21 -15 -0.71429 1.4 10 Manning NYG 56.1 6.31 23 -20 -0.86957 1.2 11 Young TEN 62.3 6.67 9 -17 -1.88889 0.5 12 Collins WAS 63.8 8.46 5 0 0 #DIV/0!

To normalize the data, I first determined the minimum and maximum of each statistic in the first two rows. The formula in G1 is =MAX(G5:G12), the formula in G2 is =MIN(G5:G12), and these formulas were filled across to column K. I ignored the TD/INT statistic in column L.

 A G H I J K L 1 65.3 8.46 32 0 0 2 56.1 6.31 5 -20 -1.88889 3 4 pcyt ypa td int int/td td/int 5 Roethlisberger PIT 65.3 7.81 32 -11 -0.34375 2.9 6 Garrard JAC 64 7.72 18 -3 -0.16667 6 7 Garcia TAM 63.9 7.46 13 -4 -0.30769 3.3 8 Hasselbeck SEA 62.6 7.06 28 -12 -0.42857 2.3 9 Rivers SDG 60.2 6.85 21 -15 -0.71429 1.4 10 Manning NYG 56.1 6.31 23 -20 -0.86957 1.2 11 Young TEN 62.3 6.67 9 -17 -1.88889 0.5 12 Collins WAS 63.8 8.46 5 0 0 #DIV/0!

I normalized the data in columns G-K in columns B-F. The formula in cell B5 is =(G5-G\$2)/(G\$1-G\$2), and this formula was filled across to column F and down to row 12. This formula returns a value between 0 for the lowest to 1 for the highest in that category.

 A B C D E F G H I J K L 1 65.3 8.46 32 0 0 2 56.1 6.31 5 -20 -1.88889 3 4 pcyt ypa td int int/td pcyt ypa td int int/td td/int 5 Roethlisberger PIT 1 0.697674 1 0.45 0.818015 65.3 7.81 32 -11 -0.34375 2.9 6 Garrard JAC 0.858696 0.655814 0.481481 0.85 0.911765 64 7.72 18 -3 -0.16667 6 7 Garcia TAM 0.847826 0.534884 0.296296 0.8 0.837104 63.9 7.46 13 -4 -0.30769 3.3 8 Hasselbeck SEA 0.706522 0.348837 0.851852 0.4 0.773109 62.6 7.06 28 -12 -0.42857 2.3 9 Rivers SDG 0.445652 0.251163 0.592593 0.25 0.621849 60.2 6.85 21 -15 -0.71429 1.4 10 Manning NYG 0 0 0.666667 0 0.539642 56.1 6.31 23 -20 -0.86957 1.2 11 Young TEN 0.673913 0.167442 0.148148 0.15 0 62.3 6.67 9 -17 -1.88889 0.5 12 Collins WAS 0.836957 1 0 1 1 63.8 8.46 5 0 0 #DIV/0!

I put the values 1 and 2 into cells L1 and L2, and in cell A1 I entered this formula: =INDEX(A\$5:A\$12,\$L1). I filled this formula across to column F and down to row 2. Changing the values in L1 and L2 will display different sets of stats in rows 1 and 2.

 A B C D E F G H I J K L 1 Roethlisberger PIT 1 0.697674 1 0.45 0.818015 65.3 8.46 32 0 0 1 2 Garrard JAC 0.858696 0.655814 0.481481 0.85 0.911765 56.1 6.31 5 -20 -1.88889 2 3 4 pcyt ypa td int int/td pcyt ypa td int int/td td/int 5 Roethlisberger PIT 1 0.697674 1 0.45 0.818015 65.3 7.81 32 -11 -0.34375 2.9 6 Garrard JAC 0.858696 0.655814 0.481481 0.85 0.911765 64 7.72 18 -3 -0.16667 6 7 Garcia TAM 0.847826 0.534884 0.296296 0.8 0.837104 63.9 7.46 13 -4 -0.30769 3.3 8 Hasselbeck SEA 0.706522 0.348837 0.851852 0.4 0.773109 62.6 7.06 28 -12 -0.42857 2.3 9 Rivers SDG 0.445652 0.251163 0.592593 0.25 0.621849 60.2 6.85 21 -15 -0.71429 1.4 10 Manning NYG 0 0 0.666667 0 0.539642 56.1 6.31 23 -20 -0.86957 1.2 11 Young TEN 0.673913 0.167442 0.148148 0.15 0 62.3 6.67 9 -17 -1.88889 0.5 12 Collins WAS 0.836957 1 0 1 1 63.8 8.46 5 0 0 #DIV/0!
 Finally I added two Combo Box controls from Excel's Forms toolbar. I formatted them so both used the range A5:A12 as their input ranges, while one used cell L1 and the other used cell L2 as their cell links. In this way, the two combo boxes control the two sets of data in rows 1 and 2.
###### The Chart

The chart is a simple line chart. I selected the range A4:F12, ran the chart wizard, chose Line Chart in step 1, and in step 2 I made sure the series were plotted by row. Below is the chart in all of Excel's default glory.

I made a number of formatting changes. I removed the border and fill of the plot area, I removed the horizontal gridlines, I removed the legend (prematurely), I added vertical gridlines, I set the value (Y) axis maximum to 1, and I set the category (X) axis so the Y axis crossed at the first category, not between categories. I also formatted all series to display with a light gray line and no markers.

To add the highlighted data to the chart, I copied A1:F2, selected the chart, and used Paste Special to add these rows as new series. I formatted one as a thicker blue line and the other as a thicker red line. I left the original light gray lines for these two highlighted series in the chart. I added back the legend so I could display the names of the highlighted athletes.

I deleted all legend entries except for the last two I added. This is done by selecting the legend, then selecting the text part of the legend entry (not the gray line), and pressing Delete. I moved a few chart elements around, and the interactive chart was finished.

One's first thought using Microsoft Excel might be to use a radar chart to display this multivariate data. Each factor is scaled on the spoke of the radar chart. The radar chart below shows why this is a poor option. It is hard to distinguish the circumferential gridlines from the data, but omitting the gridlines removes any value scale at all. Any values of zero are obscured by the jumble at the center of the chart. Besides the overall clutter, it's not clear that the red line goes from about 0.8 at the top (pcyt), to 0.65 (ypa) and 0.5 (td) to the right of the chart, then to above 0.8 to the left (int and int/td). The parallel coordinates chart, on the other hand, allows clear comparisons of the observations along the parallel vertical lines above each category.

A zip file containing my data and chart are found at QuarterbackStats.zip.