Intelligent Excel 2013 XY Charts

While writing last week’s introductory article about Excel 2013 charting, My First Excel 2013 Chart, I discovered a very cool bit of intelligence built into the function that inserts XY charts.

In prior versions of Excel, if you selected a rectangular range with N columns and inserted an XY chart with data in columns, the chart would use the first column for the X values, and the other columns for Y values for N-1 series. If your series didn’t all use the same X values, you had to define each series separately.

While playing with the Insert Chart dialogs, I discovered that Excel 2013 will offer several additional options if the data is arranged a certain way.

I’ll illustrate this coolness with the following data. There is an 8×8 grid of numerical data, with text labels (Greek letter names) in the row above the grid, and text labels (Latin letters) in the column to the left of the grid. Thedata is better suited for a line or bar chart because of the non-numerical categories in the first row and column.

Dumb data to show smart behavior

On the Insert tab, click on the XY Chart icon and you’ll see this dropdown.

XY Chart dropdown on Excel 2013 Insert ribbon tab

Click on “More Scatter Charts” at the bottom, and you’ll see this dialog.

Excel 2013 XY Charts dialog

There are thumbnails for three options, which will be discussed shortly.

If you mouse over any of the options, you’ll see a preview of how that chart would look. Double clicking on the thumbnail or preview inserts the chart into the worksheet.

Excel 2013 XY Charts dialog with mouseover view

The first XY chart option is what you would get in prior versions of Excel. The first column is used for the X values, and the other N-1 columns are the Y values. In this case, the first column contains text labels, which an XY chart cannot accommodate, so it uses the counting numbers 1, 2, 3, etc. instead.

Excel 2013 Default XY Chart

The second option available in this case shows one of the intelligent behaviors. Excel realizes that the first column doesn’t contain numerical values, so it ignores that column, and makes a chart with the number in the rest of the data range. The first numerical column is used for X, and the rest for Y.

Excel 2013 Smart XY Chart

I tested this with one, two, and three leading columns of non-numerical data, and Excel ignored them all. Presumably it will ignore any arbitrary number of leading columns of text.

That’s pretty smart. But the next trick, as we say here in Baaston, is “wicked smaat”.

Many times I’ve been asked how to select a range with alternating X and Y values and create an XY chart with one series for each pair of columns. In Quick Chart VBA Examples I presented code that parses such a range and spits out the desired chart. In that tutorial I showed how to process several different arrangements of X and Y, including alternating X and Y columns.

If your usable numerical data is contained in an even number of columns (either standalone, or with some data ignored as above), Excel will give you the option to treat the data as alternating columns of X and Y data. The result is this Wicked Smaat XY Chaat:

Excel 2013 Wicked Smart XY Chart

When the chart is selected, there is no highlighted data range, because the chart’s data is “too complicated” to display. This is because the series don’t share their X values.

If you select the first series, however, you can see that it uses the first column of the range for X and the second for Y.

Series 1 highlighted data

And if you select the second series, you can see that it uses the third column of the range for X and the fourth for Y. And so on, for the rest of the series i the chart.

Series 2 highlighted data

I discovered this intelligent behavior by accident, using some arbitrary data from a totally different example. But knowing about this intelligence, I can design a data range to take advantage of it.

Here is an even number of columns of numerical data, alternating X and Y values.

Data intentionally laid out for Excel 2013 Smart XY Chart

Here’s my wicked smart XY chart created using this data. The default formatting has markers but no lines.

Excel 2013 Smart XY Chart with markers only

You can use the Change Chart Type feature to change from a markers-only to a lines-and-markers format.

Excel 2013 Smart XY Chart with markers and lines

A lot of people are going to be very happy with this new functionality.

Chart Busters: Fix the Heat Map Donut Chart

I reviewed a colorful yet ineffective graphic in Wow! Heat Map Donut Chart!. No critique of a graphical technique is complete without describing alternative techniques which would be more effective. Starting from the heat map donut chart, I’ll work backwards, removing features that negatively affect chart readability, then forwards, changing to features that improve readability.

Hierarchy of Visual Elements

To guide our redesign of the heat map donut chart, we will refer to Figure 2 of Presentation Graphics by Leland Wilkinson.

Color gradient allows data encoding only along spokes

This figure shows William Cleveland’s ranking of different graphical features in terms of how effectively they are for encoding and decoding data.

Heat Map Donut Chart

Here is the heat map donut chart in all of its glory, including intense primary colors and gradients.

Heat map donut chart

This graphic uses color to encode its data, which according to Wilkinson’s chart is the least effective graphical feature for this purpose. The gradients in color between data points only makes the use of colors more ineffective.

Unsmear Colors

Because of the gradients of color, the data for each category is encoded only along the spoke at the center of that category’s segment of the donut.

Color gradient allows data encoding only along spokes

To use the entire segment to encode the data, to clearly delineate the segments, and to eliminate potentially misleading intermediate colors between spokes, let’s unsmear the colors.

Heat map donut chart with gradients removed

Size Wedges Proportionally

Now that we’re not using color to encode data, we can move up in Cleveland’s hierarchy. Let’s skip volume, since we already know (haven’t you been paying attention for all these years?) that 3D effects distort and mislead. Excel doesn’t even offer 3D donut charts, but there are blah blah professional state of the art yada yada business intelligence solutions that do. We’re using a donut chart anyway, so let’s make the area of the wedges proportional to the underlying data.

Donut chart with proportionally sized wedges

Use Color-Vision-Deficiency-Friendly Colors

Those primary colors are still glaring, and there are multiple points using the same colors. Let’s apply Excel’s dull but useful default color scheme.

Donut chart with color vision friendly palette

A companion article, Color Vision Issues with Heat Map Donut Charts, discusses how color vision deficiencies can hinder readability of certain color schemes, such as the scheme used in the heat map donut chart.

Sort by Size

Since we’re not very good at comparing areas, especially of rotated shapes, we can help our brains rank the data by sorting the points according to value.

Donut chart with sorted data points

Replace Legend with Data labels

The legend forces us to move our eyes back and forth from the donut to the legend, then remember which color corresponds to which category. This slows comprehension and overloads our feeble short term memory. Let’s remove the legend altogether and apply data labels to the wedges.

Donut chart with data labels instead of legend

Put the Hole Back Into the Donut

The donut chart uses area to encode data. The figure above lists angle above area, hmmm. The donut also uses angle to an extent, but there is a big hole where the angles come together. Let’s put the hole back into the donut, and use a pie chart, which uses redundant approaches of area and angle.

Pie chart (area and angle)

Encode Using Length: Bar Chart

Another glance at the figure of hierarchical graphic elements shows that length is more effective than area or angle. How do we get length? In a sense the donut chart uses length, arc length, but the elements not straight, nor do they share a common starting point, so this length isn’t used effectively. Let’s totally scrap the circular chart paradigm, and draw bars proportional in length to the underlying values.

Bar chart (length beats area and angle)

Notice an additional benefit that the bars don’t have to be different colors, and the labels are right there on the axis.

Encode Using Position Along Scale: Dot Plot

Again we check Cleveland’s list. In this context position along nonaligned scales doesn’t make sense. But position along a common scale? But position along a common scale? We have the common scale from the bar chart. Let’s put a symbol at the end of each bar, then hide the bars.

Dot plot (position along a common scale trumps all)

Like the bars in the previous chart, the dots in this plot need not be different colors. In fact, color can be used here in its most effective role, identifying different sets of dots. We could overlay target and actual values with little confusion, particularly if colors were chosen to minimize the problems of color blindness. Using different shapes for the symbols is a redundancy that helps identify the groups of data.

The bar chart and the dot plot are the two most effective ways to display this data quantitatively and clearly. Not only are these charts easy to read, they are easy to create and maintain.

What if We Work Backwards?

Suppose we were unfamiliar with Cleveland’s hierarchy of graphical effectiveness. How could we lead ourselves astray starting from such effective graphs as the bar chart and dot plot? To reinforce the lessons from Cleveland and Wilkinson, let’s explore.

We could color code the bars or dots. Not a terrible idea, but also unnecessary.

Color coded bar chart

Color coded dot plot

Now we’re free to lose sight of the effectiveness of the different encoding methods. We can decide color is all we need to show the data, so we remove the variation in lengths of the bars, or the horizontal positions of the dots.  Do you notice the conversion from a quantitative display to a qualitative one?

Length independent bar chart

Position independent dot plot

Let’s label the bars directly, or put the dots in front of the labels.

Labeled colored bars

Labeled colored dots

The last bulleted list could be made easier in the worksheet (or in PowerPoint or Word!) than in an Excel chart.

Colored bulleted list

We can simplify the labeled bullets or labeled bars by simply coloring the text. Simple and totally qualitative, complete with potential red-green color vision issues, like so many color-coded but non-graphical tables of data we’ve endured.

Labeled colored dots

I’ll leave it to the reader, guided by Figure 2 from Wilkinson’s paper, to work backwards from this colored list to a useful quantitative display. Here is the data, if you’re interested:

Heat map donut chart data

Comparison of Values from All Hinge and Quartile Methods

This is the fourth of a five part series.

Quartiles for Box Plots

This topic is covered in the companion page Quartiles for Box Plots.

Hinge Techniques for Determining Quartiles

This topic is covered in the companion page Hinges.

Interpolation Methods of Determining Quartiles

This topic is covered in the companion page Quartiles.

Comparison of Values from All Hinge and Quartile Methods

Effect of N

First and third quartiles (or first and second hinges) for N=8 through 15 are tabulated below for all of the quartile determination methods described in the previous sections.

-

Here the hinges for the Tukey (inclusionary) and the Moore and McCabe (exclusionary) methods are plotted. We see that for even N, the methods result in the same hinges, while for odd N, Tukey is closer to the median, and M&M is further from the median.

-

Here the CDF is overlaid on the previous plot of Tukey and M&M. For even N, all techniques agree, while for odd N, the CDF sticks with the method that yields a whole number index.

-

This chart plots the quartile indices for the N+1, N, and N-1 Basis approaches. The N-1 quartiles are closer to the median, the N+1 quartiles are further, and the N are in between. This is the pattern we noticed in the number lines in the previous section.

-

It becomes interesting when we overlay the various hinge techniques on the N+1/N/N-1 plot. We see that the Tukey hinges is bounded by the N-1 and N quartiles.

-

The M&M quartiles are bounded by the N and N+1 quartiles.

-

And the CDF hinges are bounded by N+1 and N-1.

-

Finally, since these quartiles are intended for use in box plots, here are box plots comparing the six techniques, one box plot each for N=8, 9, 10, and 11.

Box Plot showing variation in quartile definitions (N=8)

Box Plot showing variation in quartile definitions (N=9)

Box Plot showing variation in quartile definitions (N=10)

Box Plot showing variation in quartile definitions (N=11)

Doubling the Data Set

Before making any recommendations, let’s see how the techniques compare when we double a data set. For example, if we have a data set of {1,2,3,4,5} and another data set with the same values, but two of each, {1,1,2,2,3,3,4,4,5,5}, we would expect to find the same quartiles. Here is what all the techniques predict.

-

Forget staring at a table of numbers, the predictions are plotted in the following charts. Any pairs that are not vertically aligned have different quartiles for the data set and its double. These unmatched cases are drawn in orange in the charts below and also in the table above.

-

Of all the techniques evaluated, only CDF yields the same quartiles for all cases of a data set and its double.

Techniques Used by Software Packages

The following chart rehashes the difference between the N+1 and N-1 techniques for interpolating quartiles. Microsoft Excel’s legacy QUARTILE function uses the N-1 approach, while Minitab, JMP, and other packages use the N+1 approach. Microsoft added two functions to Excel 2010: QUARTILE.INC, which is based on N-1 and is therefore identical to QUARTILE, and QUARTILE.EXC, which is based on N+1. SAS also offers an N+1 option (see below).

-

The quartiles for the N-1 technique are closer to the median, so the interquartile range (IQR) is smaller, and the limit for identifying outliers, 1.5IQR above Q3 and below Q1, are also closer to the median.

This means that there are likely to be more outliers identified by Excel than by Minitab. This difference in behavior was a mystery to me a decade ago when my employer provided us with Minitab in addition to Excel, but now it’s very clear. (Many more things were a mystery to the brilliant minds I was stuck working with, but that’s a story for another day, when we’re killing time and beers in the pub.)

The next chart shows the two SAS quartile options. The default is CDF (SAS option PCTLDEF = 5), which as we have seen yields identical quartiles for a data set and the same data set with two of each value. SAS also offers the N+1 option (PCTLDEF = 4), which is used by Minitab, JMP, and Excel’s QUARTILE.EXC. SAS also offers three more options (PCTLDEF 1, 2, and 3), which often produce asymmetric median and quartile definitions, because they round to the larger or closer of two values instead of averaging.

-

The two sets of results are slightly different, and the CDF quartiles tend to be closer to the median than the N+1 quartiles. As with Excel’s N-1 results, the CDF will have smaller IQR than N+1, leading to identification of more borderline outliers.

Recommendations

So after all this noise, which quartile definition should you use?

The CDF approach is considered by Langford (Quartiles in Elementary Statistics) to be the all-around “best” approach. It is also the default for the powerful software package SAS, though it doesn’t seem to be used in other packages. This may then be the option of choice.

However, an important consideration is consistency. If others you work with are using Minitab or JMP, you should use the N+1 option for compatibility.

Quartiles in the Peltier Tech Box Plot Utility

This topic is covered in the companion page Quartiles in the Peltier Tech Box Plot Utility.

New Survey: What Spreadsheet Programs Do You Use?

Please participate an improved, small, non-scientific survey about spreadsheet version usage.

My previous survey allowed only one version of Excel to be selected for work and for home. I knew that this was limiting, but the survey seemed easier to set up this way. But a few comments corrected my thinking, and a few responses helpfully had multiple versions entered as “Other”, so I’m going to set that one aside, and offer the following survey. As it turns out, it was not really any harder to set it up. Live and learn. You can select any and all spreadsheet versions that you use.

WordPress sometimes does funny stuff with embedded content like this survey. If the survey does not appear when the page is first loaded, wait a few seconds, then refresh the page.

Survey: What Version of Excel Do You Use?

Update: Survey closed, Results posted

This quick and dirty survey has been closed. Thanks to all who participated. Results will be posted in the near future. In addition, a newer and better version of the survey will be/has been posted at New Survey: What Spreadsheet Programs Do You Use?

The original survey (shown below) allowed one response each for version of Excel used at work and home. This ignores those who use multiple versions, for example, developers who need too make sure their spreadsheets will work on any version of Excel.

Excel Version Usage Survey

My main intention in this survey was to get an idea of how many Excel users have upgraded to “New” versions (2007 and 2010), and how many are still using the “Classic” versions (1997 through 2003). The following chart shows my findings.

Breakdown of Excel versions in use

So about 86% of users have upgraded, while 14% are staying pat. The percentage was almost identical for usage at work and at home. I’m not sure if I’m surprised.

A handful of respondents entered multiple versions into the “Other” boxes in the survey; if these responses included both Classic and New, I didn’t count them. The follow-up survey will try to capture this usage more accurately.

Peltier Tech Chart Utilities for Excel Peltier Tech Chart Utility Peltier Tech Waterfall Chart Utility for Mac Peltier Tech Cluster-Stack Chart Utility for Mac

 

Create Excel dashboards quickly with Plug-N-Play reports.

 

Site by: Dawud Miracle, Business Coach & WordPress Websites