Charting Principles

Multiple Rows or Columns as Chart Series Data

Monday, January 22, 2024 by Jon Peltier
Peltier Technical Services, Inc., Copyright © 2023, All rights reserved.

The Problem: Your Y data is in more than a single row or column.

If you try to populate a chart series with 2D data where it isn’t allowed, you’ll encounter this error:

Chart series data error: must be a single cell, row, or column.

The reference is not valid. References for titles, values, sizes, or data labels must be a single cell, row, or column.

This isn’t strictly true. Not one of the objects listed is restricted to a single cell. Any text element in a chart (chart or axis titles, data labels, or shapes) can link to multiple cells, but the linked range must be contiguous and in a single row or column; the same is true for the name of a series. The X values of a chart series can be multiple rows or columns, which produce tiered axis labels such as those shown in LAMBDA Function to Build Three-Tier Year-Quarter-Month Category Axis Labels. The Y values of a chart series must link to data in a single row or column.

The Setup: Y data is in multiple rows or columns.

Here is the problem. The data occupies more than one row (below left) or column (below right), but we want it plotted as a single series. If you select the data and insert a chart, Excel parses the data into two chart series. The series formulas are shown below the charts, with font colors matching the series colors.

Original data in multiple rows or columns produces charts with multiple series.

Let’s try to fix this. First, delete the second series of the chart.

Now try to enter the larger range into the series formula.

Just try to assign a multiple row or column range to a series, I dare you!

Excel rejects the changed formula, with the error message described earlier.

There is an exception to the single row or column rule for Y values. You can specify compound (multiple-area) ranges for Y values, as shown below for our multiple row and multiple column data ranges. The areas in a compound range don’t even need to be all by row or all by column.

Assign compound data ranges to chart series data.

This works pretty well, but I think it’s pretty difficult to understand and maintain.

TOROW and TOCOL to the rescue!

Microsoft has released a plethora of new Dynamic Array functions. Among these are TOROW and TOCOL, which are used to arrange values in a 2D range into a new 1D range, shown below under the data ranges. TOROW and TOCOL produce ranges with the values in the same order, so whether we use one or the other is a matter of preference. There are two series formulas below each chart, showing the ranges produced by TOROW and TOCOL.

There is a problem, however. The charts don’t look the same for original data in rows vs in columns. This is because both TOROW and TOCOL take all the cells in the first row of the original data and append all cells in each successive row. This causes the data to be out of order when performing TOROW or TOCOL on columnar data. We can fix this by transposing the data first.
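To make the ordering issue concrete, here is a minimal Python sketch of the row-by-row scan described above. It is only a model of the behavior, not Excel's implementation; the helper names to_row and transpose and the values 1 through 8 are made up for illustration.

```python
def to_row(grid):
    """Flatten a 2D list row by row (the scan order described above)."""
    return [value for row in grid for value in row]


def transpose(grid):
    """Swap rows and columns of a rectangular 2D list."""
    return [list(column) for column in zip(*grid)]


# Hypothetical series values 1..8 laid out in two rows...
by_rows = [[1, 2, 3, 4],
           [5, 6, 7, 8]]

# ...and the same series laid out in two columns.
by_cols = [[1, 5],
           [2, 6],
           [3, 7],
           [4, 8]]

flat_rows = to_row(by_rows)                   # values stay in series order
flat_cols = to_row(by_cols)                   # values are interleaved
flat_cols_fixed = to_row(transpose(by_cols))  # transposing first restores order

print(flat_rows)
print(flat_cols)
print(flat_cols_fixed)
```

Flattening the columnar layout directly interleaves the two columns, while transposing first yields the same sequence as the row layout.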

And now all of our charts are consistent.

You could also construct more complicated formulas with other Dynamic Array functions. For example, if I wanted to turn a multiple row range into a single row, I would use:

=LAMBDA(x,
  LET(
    rx,ROWS(x),
    cx,COLUMNS(x),
    MAKEARRAY(
      1,rx*cx,
      LAMBDA(r,c,
        INDEX(x,INT((c-1)/cx)+1,MOD(c-1,cx)+1)
      )
    )
  )
)(multi-row range)

To convert a multiple-column range into a single column, I would use:

=LAMBDA(x,
  LET(
    rx,ROWS(x),
    cx,COLUMNS(x),
    MAKEARRAY(
      rx*cx,1,
      LAMBDA(r,c,
        INDEX(x,MOD(r-1,rx)+1,INT((r-1)/rx)+1)
      )
    )
  )
)(multi-column range)

I’m sure people can write more efficient formulas than this, but the TOROW and TOCOL formulas are very concise.
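If the INT and MOD index arithmetic in those two formulas is hard to follow, this Python sketch walks through the same calculation with 1-based positions. The function names flatten_rows and flatten_cols and the sample values are hypothetical; the sketch mirrors the formulas above but is not a substitute for testing them in Excel.

```python
def flatten_rows(x):
    """Mirror of the first formula: read a 2D list row by row."""
    rx, cx = len(x), len(x[0])
    out = []
    for c in range(1, rx * cx + 1):      # c is the 1-based output position
        row = (c - 1) // cx + 1          # INT((c-1)/cx)+1
        col = (c - 1) % cx + 1           # MOD(c-1,cx)+1
        out.append(x[row - 1][col - 1])  # INDEX is 1-based, Python is 0-based
    return out


def flatten_cols(x):
    """Mirror of the second formula: read a 2D list column by column."""
    rx, cx = len(x), len(x[0])
    out = []
    for r in range(1, rx * cx + 1):      # r is the 1-based output position
        row = (r - 1) % rx + 1           # MOD(r-1,rx)+1
        col = (r - 1) // rx + 1          # INT((r-1)/rx)+1
        out.append(x[row - 1][col - 1])
    return out


multi_row = [[1, 2, 3, 4], [5, 6, 7, 8]]
multi_col = [[1, 5], [2, 6], [3, 7], [4, 8]]
print(flatten_rows(multi_row))
print(flatten_cols(multi_col))
```

The integer division picks the row (or column) being read, and the remainder picks the position within it.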

Use Names to keep the worksheet clean

We can implement TOROW and TOCOL in Names rather than in the worksheet, and the Names work just fine in the chart SERIES formulas. Go to Formulas > Define Name; for Name type YrowTOROW; for Scope select the current sheet (Data); and for Refers to enter =TOROW(), put the cursor between the parentheses, and select C2:F3; then press Enter.

The four relevant names are:

Name: YrowTOROW
Refers To: =TOROW(Data!$C$2:$F$3)

Name: YrowTOCOL
Refers To: =TOCOL(Data!$C$2:$F$3)

Name: YcolTOROWT
Refers To: =TOROW(TRANSPOSE(Data!$C$2:$D$5))

Name: YcolTOCOLT
Refers To: =TOCOL(TRANSPOSE(Data!$C$2:$D$5))

These all produce the same values in the same order in either horizontal or vertical arrays. The chart SERIES formulas do not care. Notice that we applied the lesson from before, of transposing the columnar data before using TOROW or TOCOL; I’ve appended a T on these Names.

Same result as before. Using Names keeps the worksheet cleaner, but I don’t mind seeing the actual data I’m plotting in my worksheet.

Neat Trick: Double Unary Minus

Last week I learned a new trick. Well, it was new to me, but apparently it has been around for a long time, predating Dynamic Arrays by decades. My colleague Roberto Mensa was showing me some of his recent charting exercises (check them out at E90E50Charts – Excel Charts Gallery) and he showed me this trick.

You can use a double unary minus, that is, a double minus sign, to force an Excel chart to treat a multiple row or column range as a single array. The double minus is used to convert TRUE and FALSE to 1 and 0, to convert text into numeric values, and in this case to convert a range into an array. When a 2D array is passed to a chart series, it combines all rows of the array into a 1D array.

The double minus must be used in a Name in order to work with chart data. You can define the following names for row-based or column-based data:

Name: YrowMINUS
Refers To: =--Data!$C$2:$F$3

Name: YcolMINUST
Refers To: =--TRANSPOSE(Data!$C$2:$D$5)

Define them in the scope of the worksheet Data, and as before, transpose the columnar data first. Then you can edit the SERIES formulas to use these Names.

This double minus approach doesn’t take precedence over the Names that use TOROW or TOCOL, of course; it’s just another tool for your toolbox.


Good Chart Data

Thursday, November 18, 2021 by Jon Peltier

Tips and Tricks

Everybody loves their Excel tips and tricks. I know I’m famous for my extensive collection of Excel charting tricks. But the most important charting trick is Get The Data Right. The secret of successful Excel charting begins and ends with Good Chart Data. I teach it in all of my charting workshops and seminars, so I was surprised that I did not have a blog post dedicated to good chart data. But now I do.

Good Chart Data

What do I mean by “good” data? Of course, “good” has to do with the quality of the data. Where is the data from? How was the data measured or collected? Is the source reliable and trustworthy?

All of this is important, for any data you consume in Excel or in any other program. But right here, “good” data means data that can easily be rendered in a chart, without having to make excessive adjustments to the chart. Good chart data may not be good display data, but data that has been optimized for display, say in that annual shareholder’s report, is almost guaranteed to be bad for charting.

“Good” data has a layout that makes chart creation easy for you, so Excel knows how to partition the data between the important parts of the chart. X values or categories. Y values. Series names.

“Good” data also has as little formatting as possible. Enough formatting to make it readable, but not so much that it causes retina pain.

You can make good charts from bad data, but it will take longer, and you will have to muck around with the chart for a while to make it work. It’s better to spend five minutes with the data now than to spend five hours trying to clean up a chart later.

TL;DR Good Chart Data

Good chart data has the following characteristics:

The data exists in a contiguous range: no blank rows or columns
The data is minimally and consistently formatted: easy to validate visually
The data is aligned with Y values in columns (shaded blue below)
X values are located in the leftmost column (purple)
Series names are located in the topmost row (red)
The top left cell may provide some magical behavior (gold)
The data may exist in an Excel Table

A Good Tabular Display is not Good Chart Data

What if you want to plot your data but also display it in an optimally formatted way? The good news is, you can. The bad news is, it takes more work. You should have two different data ranges: one arranged for best charting outcomes, the other formatted for visual consumption. Both of these ranges should link to the same original source, to make sure changes are reflected everywhere, and you maintain one version of the data.

Good Chart Data is Contiguous

A good chart data range is contiguous, that is, it is comprised of data in a single block of cells. There are no completely blank rows or columns separating the data into separate areas.

Why is contiguous data important? Because Excel tries to be smart when you click a button. If you select any single cell in the range shown above and click any chart button to insert a chart, Excel tries to determine the extent of your data. Excel will go from the active cell up, down, left, and right until it finds either the edges of the worksheet or a blank row or column, and it will include all of the data within this range for the chart. This is so much easier than having to select the entire data range, especially if the data extends beyond the first visible screen.

Excel does the same hunt for the current region when you convert a range to a Table, or create a Pivot Table, or do any number of other things in Excel.
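The hunt for the current region can be pictured with a small Python sketch. This is a rough model under stated assumptions, not Excel's actual code: the worksheet is a list of lists, None stands for a truly empty cell, and the function name current_region is made up. It also assumes that a filled cell touching the block diagonally pulls the block outward.

```python
def current_region(grid, row, col):
    """Return (top, left, bottom, right), 0-based, of the block around a cell.

    The block grows one row or column at a time until every neighboring
    row and column (corners included) is empty or off the sheet.
    """
    n_rows, n_cols = len(grid), len(grid[0])

    def filled(r, c):
        return 0 <= r < n_rows and 0 <= c < n_cols and grid[r][c] is not None

    top = bottom = row
    left = right = col
    grew = True
    while grew:
        grew = False
        rows = range(top - 1, bottom + 2)   # block rows plus one on each side
        cols = range(left - 1, right + 2)   # block columns plus one on each side
        if any(filled(top - 1, c) for c in cols):
            top -= 1
            grew = True
        if any(filled(bottom + 1, c) for c in cols):
            bottom += 1
            grew = True
        if any(filled(r, left - 1) for r in rows):
            left -= 1
            grew = True
        if any(filled(r, right + 1) for r in rows):
            right += 1
            grew = True
    return top, left, bottom, right


# Hypothetical sheet: a 3x3 block, then a blank column and a blank row.
sheet = [
    ["TLC", "alpha", "beta", None, "other"],
    ["a",   1,       2,      None, 9],
    ["b",   None,    4,      None, 9],
    [None,  None,    None,   None, None],
    ["x",   7,       8,      None, 9],
]
print(current_region(sheet, 1, 1))
```

Starting from the cell holding 1, the block stops at the blank row and blank column, while the single empty cell inside the block does not interrupt it.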

What About Blank Cells?

It’s perfectly fine to have blank cells within the data range, as long as there are no completely blank rows or columns.

Keep in mind that successful charting means that these “blank” cells should be completely empty. A cell with a formula that returns “” is not blank, because it contains a formula. If you have the formula return NA() instead of “”, the cell will not look blank, it will contain the #N/A error. This looks ugly, but a chart treats most #N/A values as blank cells, while it treats “” as text. Text might be plotted as a zero, or it might mess up an axis.

A cell that contains a space character is also not blank, because it contains that space character. Sometimes people get in the habit of typing the space instead of just skipping the cell. This kind of non-blank cell is hard to fix because both the cell and the formula bar appear blank.

Good Chart Data is Lightly Formatted

You should only use as much formatting as you need to help you quickly scan the data.

The range below is a bit over-formatted. The header row is bold and multi-colored, and the red and yellow are distracting and may eventually cause eyestrain. Bold text and light gray fills might be okay for headers, though they are unnecessary if the headers follow one or more blank rows.

The dark borders on all of the cells break up the appearance of the data range and the dark lines compete for attention with the numbers. The default light gray cell borders are sufficient. I’ve programmed some buttons in my general purpose and charting software to apply light and medium gray borders to selected ranges.

The green shading can be distracting. (See my arbitrary pattern? All cells with 3 are shaded, while b and d are shaded because they have no 3 in their rows. I don’t know why.) If you find shading helps to identify certain values, don’t manually color the cells: use Conditional Formatting instead, so the shading goes away if the condition is no longer met.

Good Chart Data is Not Centered

You should avoid centering your data. It might look “nice”, but centering hides important characteristics of the data. By default, text is left-aligned in cells and numbers are right-aligned. A common feature of “bad” data is numbers stored as text, and centering everything hides this distinction.

Here is a data range with all of its cells centered. There are some small triangular flags in some cells that indicate a possible error, but I think most of us have learned to ignore warnings like this.

When we uncenter the data, it’s easy to detect the cells with numbers stored as text, even if we’ve ignored the green flags.

We can select the flagged cells, click the little warning dropdown, and convert them to numbers.

Now that we’ve converted text to numbers, we still have difficulty parsing the numbers, because they have inconsistent numbers of decimal digits. Some numbers look larger than others, but they merely have a longer tail after the decimal point.

When we apply a consistent format, we can tell that the larger numbers are in fact longer.

It’s now very easy to identify the largest numbers in each column. You might want to check that value of 345 in the “beta” column since it’s 100 times as large as the rest of the column.

Good Chart Data is in Columns

Excel charts can work with data in columns or in rows. You can use either arrangement and sometimes one just works better than the other. If a chart’s source data has more rows than columns, Excel creates the chart with series in columns. A pivot chart always plots the pivot table with series in columns.

It is probably good practice to get used to using data in columns, because of the way a database table is structured. A database has fields and records. A field is a variable or measurement, such as date, eye color, serial number. A record is a single instance of a set of values for these fields.

When printed on paper or viewed onscreen (or imported into Excel), a database table is shown as a grid of rows and columns. Each column is a field or a variable, and each row is a record. The first row of the database table is a header row that contains the field label. Usually one column of the table, often the first column, contains a unique identifier or key for that particular record.

In the table above, we have fields for TLC, alpha, beta, and gamma. We have records for a, b, c, d, e, and f.

Good Chart Data Has a Header Row

Like a database table, chart data should have a header row. When data is plotted with series in columns, this header row is used for series names, that is, the labels that appear in the legend of the chart.

Database tables have one header row only. Chart data ranges usually have only one header row, but you can use more, often to good effect.

Good Chart Data Has no Subtotal or Total Rows

Subtotals and totals help to understand data, but they have no place in the source data for a chart. Suppose I have monthly values I want to plot.

If I put subtotal rows into my source data, it breaks up the visual appearance, so it’s hard to scan the individual values for discrepancies.

The quarterly subtotals disrupt the flow of data in the chart, and the much larger magnitude of the subtotals shrinks the monthly values.

Chart Monthly Values and Quarterly Subtotals

If I include yearly totals, the quarterly subtotals shrink, and the monthly values are lost in the weeds.

Chart Monthly Values, Quarterly Subtotals, and Yearly Totals

Note that you can filter the quarterly and yearly categories from the chart in recent versions of Excel. Also note that a pivot chart will only show values from the pivot table and not totals and subtotals.

Good Chart Data Has a Header Column

Just as a database table has a unique key field, the data range for a chart should have a header column identifying each row. This column is generally used for labels or values which are plotted along the X-axis of a chart.

This X-axis column should be the first column, to the left of any Y-axis values. This makes charting easier because Excel looks to the left for X values. But you would be amazed at how many people have trouble plotting their data when they have placed their Y values to the left of their X values. You can always tell Excel which data comes from where, but it is a lot more work, especially if you have to do it repeatedly, again and again, ad nauseam (can you feel the tedium?).

Another trick to help Excel identify your X values is to make the column of X values different from the Y values.

First Column Different: Text

Probably the most common way to make the first column different is by filling it with text. While month names are a component of a date, a list of month names is text, as shown here.

When you select the data range (or one cell inside the data range), Excel uses the text labels in the first column as X-axis (category axis) labels, the other columns as Y-axis (value axis) values, one column per series, and the labels in the header row as series names. Here are line and column charts using this data (and area charts work the same way). Months from the first column are automatically placed along the category axis, headers from the first row are automatically used as legend entries (series names), and each column of Y values is plotted as a distinctly formatted series.

Line and Column Charts Made from Data with Text in First Column

It’s the same with a bar chart, except that the category and value axes are switched. That’s right, in a bar chart, the X-axis is vertical and the Y-axis is horizontal. But the origin of the axes is at the bottom left in all of these charts, so values increase from left to right (like months advance from left to right in a line or column chart). Similarly, months advance from Jan to Jun moving bottom to top in a bar chart (like values increase from bottom to top in a line or column chart).

Bar Chart Made from Data with Text in First Column

Confused? Since the months are listed from top to bottom in the worksheet, it would make sense to plot them from top to bottom in the chart. But there is logic in how Excel does it, and it is also pretty easy to fix when you know how. See Excel Plotted My Bar Chart Upside-Down for the simple technique.

XY Scatter charts are different: they have numerical axes for both X and Y (category and value) axes. When you plug in text for the X values, the chart doesn’t know what to do. Normally text is considered to have a value of zero, but for X values in an XY chart, Excel substitutes the counting numbers 1, 2, 3, up to the number of points. (Excel also does this if no X values have been specified.)

XY Scatter Chart Made from Data with Text in First Column

What if you ever have an XY chart with the numbers 1, 2, 3, etc. along the axis, and you know you selected numerical data for the X values? Check the range of X values: there is probably a number stored as text, or an actual text label, somewhere in the range.

First Column Different: Dates

Using dates in the first column is another way to make the first column different. Excel recognizes the date formatting of the cell and parses the column as X values. An added bonus is that line, column, area, and bar charts have a special date type of axis (as opposed to the text axis shown above) that provides unique formatting options. This range has dates in the first column; notice that the dates are not uniformly spaced, but are taken on the 1st and 8th of each month.

Here are line and column charts made from this data. The line chart looks great and shows some of the enhanced date axis features. The data points are not equally spaced but reflect the non-uniform spacing of the data. Also, there is a tick mark and axis label on the first of each month, despite the nonuniform month lengths.

Line and Column Charts Made from Data with Dates in First Column

The column chart is wacky though. To accommodate the nonuniform data spacing, the chart has a slot for each day, even days without data. So each column chart series has to appear within the slot for its given day, and there are lots of days in between. This column chart should really be called a toothpick chart.

You can always change the axis type to text, and the column chart will look normal. But you lose the non-uniform nature of the dates and the first-of-the-month labels.

Line and Column Charts Made from Data with Dates in First Column, but with Text Axis

An XY scatter chart treats dates for X values just like any numerical X values. You can see the non-uniform spacing of the data.

XY Scatter Chart Made from Data with Dates in First Column

But the axis is not labeled as naturally as the line chart above. Using the value axis algorithm, Excel picked an axis that begins at 44,180 and increases to 44,280 in steps of 20. (Excel stores dates as whole numbers starting on 1 January 1900, so 44,180 is 15 December 2020 and 44,280 is 25 March 2021.) You can format the XY chart’s axis to begin on the 1st of a month, and have a tick label on the 1st of the next month. But for a non-trivial number of months, it’s impossible to repeat this pattern with Excel’s default axis labels.
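The serial numbers quoted above are easy to check outside Excel. Here is a minimal Python sketch, assuming the usual conversion trick of counting days from 30 December 1899, which matches Excel's serial numbers for dates from 1 March 1900 onward (Excel's 1900 date system counts a 29 February 1900 that never existed). The function names are made up for illustration.

```python
from datetime import date, timedelta

# Assumed offset: valid for serial numbers from 1 March 1900 onward.
EXCEL_EPOCH = date(1899, 12, 30)


def serial_to_date(serial):
    """Convert an Excel date serial number to a Python date."""
    return EXCEL_EPOCH + timedelta(days=serial)


def date_to_serial(d):
    """Convert a Python date to an Excel date serial number."""
    return (d - EXCEL_EPOCH).days


print(serial_to_date(44180))  # axis minimum picked by the value axis algorithm
print(serial_to_date(44280))  # axis maximum
print(date_to_serial(date(2021, 1, 1)))
```

A helper like date_to_serial is handy when you want to type axis minimum and maximum values that land on the first of a month.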

First Column Different: Numbers

If you have numerical X values in the first column, they won’t be different from the numerical Y values in the other columns. You know those values in the first column are years, and I know they are years, and the header label even says “Year”, but Excel simply recognizes them as numbers.

This doesn’t matter for an XY Scatter chart: Excel almost always treats the first column of numbers as X values. This XY chart shows years along the X-axis, as intended.

XY Scatter Chart Made from Data with Numbers in First Column

When Excel created this line chart, it saw the numbers in the first column and decided to plot them as Y values. There are two consequences of this: First, there is a series of values near 2000 floating far above the intended Y values in the chart; second, no X values were specified, so Excel simply used the counting numbers 1, 2, 3, etc.

Line Chart Made from Data with Numbers in First Column

You could avoid this by selecting just the Y values when the chart is created, and specifying the X values later. You could also do this with years by entering dates in the first column (1/1/2015, 1/1/2016, etc.) and formatting them using a custom number format of YYYY, which will display just the year numbers. Since the column now is formatted as dates, Excel will plot them the way you want.

You cannot avoid this by formatting the first column of numbers as text. Excel is too smart: it converts the numeric text to numbers, and therefore plots them as Y values.

First Column Different: Top Left Cell

You may have noticed the label “TLC” in the header row of the first column. TLC stands for Top Left Cell, and it is a little piece of magic for Excel chart data.

One way to make the first column different from the rest is to clear the contents of the Top Left Cell, as shown below.

Data with Numbers in First Column and Top Left Cell Blank

We already know that an XY Scatter chart will plot the first column of numbers as X values with a label in the top left cell. But it also works if the top left cell is blank.

XY Scatter Chart Made Using Data with Numbers in First Column and Top Left Cell Blank

The top left cell works its magic in line (and area, column, and bar) charts. This line chart was generated automatically with the column below the blank top left cell as X values, and the columns below actual labels as Y values. Finally, a way to plot numbers as X values in a line chart without worrying about formats (text or dates).

Line Chart Made Using Data with Numbers in First Column and Top Left Cell Blank

Here is an important difference between XY scatter charts and line charts. Just because you can trick Excel into plotting numbers as X-axis values in a line chart, you can’t trick Excel into plotting those X values as numbers. The next section shows a few different sets of data that will help illustrate this difference.

Spacing and Order of X Values: Numbers and Dates

Evenly Spaced Numbers

Evenly spaced X values seem to be similarly plotted in XY and line charts. In the XY chart, the X-axis begins at zero and extends beyond the highest X value. In the line chart, the first X-axis label is the first X value and the last X-axis label is the last X value, without the padding found in a scatter chart’s default axis values.

Unevenly Spaced Numbers

Unevenly spaced X values are plotted differently in XY and line charts. In the XY chart, the X-axis begins at zero and extends beyond the highest X value, and data points are plotted unevenly, reflecting the unevenness in the X values. In the line chart, the first X-axis label is the first X value and the last X-axis label is the last X value, data points and X-axis labels are uniformly spaced regardless of their apparent numerical values, and there are no X-axis labels for missing X values.

It is obvious that the line chart treats the X values as non-numeric text labels, ignoring any apparent numerical values. This is because the default axis for non-date X values is a text axis (below left). We can change this to a date axis (below right), and we see the data points are now plotted according to their unevenly-spaced numerical values. The first X-axis label is still the first X value and the last X-axis label is still the last X value. (Excel treats the numbers as dates with a format of “D”, so only the day shows.)

Different Treatment of Unevenly Spaced Numbers by Text Axis and Date Axis in a Line Chart

Numbers Out of Order

When numbers are out of order, the difference in behavior is even more pronounced. The XY scatter chart draws the points and the lines connect them in order, moving left or right as the X values decrease or increase. The line chart shows the points in the order they appear in the data range: the first X-axis label is still the first X value and the last X-axis label is still the last X value, so the labels are out of order.

When we convert the text axis (below left) to a date axis (below right), we see that not only have the points been spaced according to their non-uniform values, but that the dates have been internally sorted prior to plotting. This internal sorting is a feature of charts with a date axis. The first X-axis label is the smallest X value and the last X-axis label is the largest X value, so the labels are in order and the points are plotted left to right.

Different Treatment of Numbers Out of Order by Text Axis and Date Axis in a Line Chart

Evenly Spaced Dates

Let’s look at the same charts with dates instead of regular numbers. The XY scatter chart (left) and line chart (right) plot evenly spaced dates along the X axis in a similar way. The data is evenly spaced, so the data points are plotted evenly. The XY chart extends its X-axis a little bit below to a little bit above its data range, while the line chart uses the earliest date as the first axis label and the latest date as the last axis label.

Unevenly Spaced Dates

The XY scatter and line charts also plot unevenly spaced dates similarly. The points are spaced unevenly to reflect the pattern in the data. As before, the XY chart extends its X-axis a little bit below to a little bit above its data range, while the line chart uses the earliest date as the first axis label and the latest date as the last axis label.

Dates Out of Order

Out of order dates are plotted in an XY scatter chart in the order they appear in the worksheet. The lines connecting the points start at the first point, and move left or right for earlier or later dates. In the line chart, the lines always connect from left to right, reflecting the internal sorting that takes place. As always, the XY chart extends its X-axis a little bit below to a little bit above its data range, while the line chart uses the earliest date as the first axis label and the latest date as the last axis label.

Problem with Numbers and Dates Out of Order

Perhaps it’s convenient that Excel’s line charts sort by date prior to plotting, and there are a few tricks that rely on this internal sorting. But this sorting can also lead to problems with data labels.

Here is a simple data set, with dates, sorted values, and days of the week corresponding to the dates. The chart uses dates as X values and sorted values as Y values, and the days of the week were used to label the points, using the Value From Cells option. The labels are shown in order from Monday through Friday, as expected.

Data Labels work nicely when dates are sorted in the worksheet.

Below is a data set, with the same information as above, but with the dates out of order. The chart looks the same as above, since the X and Y values are sorted by date before plotting. But look closely at the data labels. These were not sorted prior to plotting, and are no longer in order from Monday to Friday.

Data Labels don't sort when out-of-order dates are sorted in the chart.

In addition to this problem with data labels, it is likely that someone who is using the data for other purposes may misinterpret the data because they don’t notice or understand Excel’s internal sorting. For these reasons, it is a best practice to sort the worksheet data before plotting your data.
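The mismatch is easier to see in a toy model. The Python sketch below assumes, following the description above, that a date axis sorts the X and Y values together while the data labels are attached in worksheet order. It models the observed behavior, not Excel's internals; the function name plot_on_date_axis and the sample rows are made up.

```python
from datetime import date


def plot_on_date_axis(rows):
    """Model a line chart with a date axis.

    rows holds (date, value, label) tuples in worksheet order. The points
    are sorted by date, but the labels keep their worksheet order.
    """
    points = sorted((d, v) for d, v, _ in rows)
    labels = [label for _, _, label in rows]
    return [(d, v, label) for (d, v), label in zip(points, labels)]


# Hypothetical worksheet rows with the dates out of order.
rows = [
    (date(2021, 11, 17), 3, "Wed"),
    (date(2021, 11, 15), 1, "Mon"),
    (date(2021, 11, 19), 5, "Fri"),
    (date(2021, 11, 16), 2, "Tue"),
    (date(2021, 11, 18), 4, "Thu"),
]

unsorted_labels = [label for _, _, label in plot_on_date_axis(rows)]
sorted_labels = [label for _, _, label in plot_on_date_axis(sorted(rows))]
print(unsorted_labels)  # labels no longer match their points
print(sorted_labels)    # sorting the worksheet first fixes the labels
```

In this model Monday's point ends up wearing Wednesday's label, which is why sorting the worksheet data first is the safer habit.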

Thanks to alert reader Jim Chisholm for reminding me of this problem.

Top Left Cell Plus

We’ve seen how a blank top left cell can help Excel to parse data correctly into X and Y values and series names.

A Blank Top Left Cell Helps Excel Parse the Chart Data Range

But this concept of blank cells is more magical than that. Suppose you have two columns of category labels, for years and quarters. Each of these label columns has a blank cell in its header (blank cells are shaded gold). Each year only appears once, next to the first quarter, and there are blank cells next to the other quarters.

Excel sees the two blanks in the top left, and uses both columns as category labels. There are two rows of labels, with quarter labels in the first row and years in the second. Each year label is centered under the corresponding quarter labels, and a vertical tick mark extends from the axis to help delineate the labels.

Multiple Blank Cells in the Category Labels Region of the Chart Data Range Can Produce Multiple-Tier Labels

This is a very nice effect, only available for line, column, area, and bar charts that are using a text style X-axis. It will not work for a date axis or for an XY scatter chart’s X-axis: in those cases you will get the numbers 1, 2, 3, etc.

You may have seen this kind of data arrangement in a Pivot Table, with multiple fields in the rows area, and noticed the multiple-tier category labels in the corresponding Pivot Chart. But you’re not stuck needing a Pivot Table; you can build this data layout yourself. You are not limited to two tiers of labels, either: below you can see a three-tiered axis, and I’ve seen it used for 5 or 6 tiers.

The Multiple-Tier Axis Effect is not Limited to Two Tiers

What if you indicate to Excel that you have two rows of series names? The same effect applies. The North label is combined with the alpha and beta series names, while the South label is combined with the gamma and delta series names, to generate compound names.

Multiple Blank Cells in the Series Names Region of the Chart Data Range Can Produce Compound Series Names

You can combine the effects of multiple-tier category axis labels and compound series names in the same chart, as shown below.

The Multiple-Tier Axis and Compound Series Names Effects Can Be Used Together

Note that these blank cells must be totally blank, and not just look blank like a formula that returns a value of “”. And while a chart can treat #N/A as a blank in a chart’s Y values, #N/A will be treated as a text label, not as a blank, when used in the series name and category labels regions of the data range.

The Select Data Source Dialog Recognizes Good Chart Data

When a chart’s data doesn’t conform to these definitions of “good” data, the Select Data Source dialog shows only a blank for the chart data range. Below the range selection box, you are told “The data range is too complex to be displayed.” This means that the data is irregular. Not all series start and end at the same row, perhaps, or the series have different numbers of points. Perhaps the series names are misaligned from the Y values, or the series are plotted out of order. Anything that prevents Excel from indicating a nice rectangular block of data will trigger this message.

Select Data Source Dialog for Bad Chart Data

When the data does conform to our “good” data definition, the Select Data Source dialog happily displays the address of the data range.

Select Data Source Dialog for Good Chart Data

In fact, the Select Data Source dialog is a bit more forgiving than my rules. If the data is separated by complete blank rows or columns, but otherwise fits within a rectangular range, the dialog shows the addresses of the various areas of the data range.

Select Data Source Dialog for Discontiguous but Otherwise Uniform Rectangular Chart Data

Good Chart Data May Be in a Table

An Excel Table is a special data structure which provides advanced data handling capabilities; I have converted my data into a Table below. The header row has buttons for filtering and sorting of the Table. You can add a total row if desired. There are numerous styles you can apply, some of them approaching hideous; the Table below shows the default style.

There are many benefits to storing your chart data in a Table. The major benefit is that when you add data to the row below or the column to the right of a Table, the Table automatically expands to include the added data. What’s more, any formulas that reference a Table column or row will update automatically if the Table expands or contracts. This includes chart SERIES formulas and the Chart Data Range formula in the Select Data Source dialog. So if your chart uses a Table like this for its source data, when you add rows of data, each series in the chart will add the corresponding points; when you add columns to the Table, your chart will add series. Dynamic charts made easy!

One drawback to using a Table for your chart’s source data is that the header row of a Table can contain no blank cells. This means that a lot of the blank-cell based data parsing, especially the top left cell magic, may not work with a Table. But if your first column(s) are not numerical, Excel will still automatically parse them into X values. And you could still select just part of the Table, create your chart, then manually specify the X values.

Posted: Thursday, November 18th, 2021 under Charting Principles.
Tags: Chart Data.
Comments: 7

Prepare Your Data in a Chart Staging Area

Monday, October 19, 2020 by Jon Peltier

You can spend five minutes fixing up your data, or five hours working on a chart with the wrong data.

A user on the Mr Excel forum asked about creating a chart from unsuited data. He asked, “Is there a way to do this without modifying the data table?”

My reply started with “I know you don’t want to hear this, but your data is in the wrong arrangement.” Fortunately, the data was simple and the arrangement was consistent, so it was easy to create a chart staging area to prepare the data for the chart.

Simple Chart Staging Area

I’ve recreated the data below. Obviously someone went to a lot of trouble laying out the data and formatting it just right. This becomes a problem when people become too attached to their fancy display.

The objective is to plot each product by month, for a single year. The chart needs three lines, one for each product. You can tell that it’s not possible using this data directly.

Original data which needs to be staged for the chart.

In general, one would have to rearrange the data, feed it into a pivot table, and create a pivot chart from that. But sometimes you’re lucky enough to be able to write a few formulas instead.

Here is the chart staging area I was able to construct. I listed the months down the side (N3:N14) and the products across the top (O2 to Q2). I put the year in N2, and highlighted it with light gold so a user knew it was important.

Rather than require the user to type in a new year, and possibly type an invalid year, I set up data validation in cell N2. Click Data Validation on the Data tab, select List from the Allow dropdown, then select the years in the first row of the original data range (C2:L2).

Data validation dialog to create a dropdown list in the chart staging area

The magic formula is in cell O3 of the chart staging area. It’s a relatively simple INDEX formula with a few MATCHes to find the right cell of the original data range.

=INDEX($C$3:$L$38,
       MATCH($N3,$A$3:$A$38,0)+MATCH(O$2,$B$3:$B$5,0)-1,
       MATCH($N$2,$C$2:$L$2,0))
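To make the row arithmetic in that formula easier to follow, here is a rough Python sketch of the same lookup. It assumes the original data is grouped by month with one row per product inside each month, which is what the two row MATCHes imply; the function name `staged_value`, the product names, and all of the numbers are made up for illustration.

```python
def staged_value(months, products, years, values, month, product, year, n_products=3):
    """Mimic INDEX(values, MATCH(month) + MATCH(product) - 1, MATCH(year))."""
    # First row of the month's block (like MATCH($N3,$A$3:$A$38,0)).
    month_row = months.index(month)
    # Offset of the product within one block (like MATCH(O$2,$B$3:$B$5,0)).
    product_offset = products[:n_products].index(product)
    # Column of the selected year (like MATCH($N$2,$C$2:$L$2,0)).
    year_col = years.index(year)
    # Python indexes from 0, so the "-1" in the Excel formula is not needed here.
    return values[month_row + product_offset][year_col]


# Made-up sample: two months, three products, two years.
months = ["Jan", None, None, "Feb", None, None]
products = ["Product A", "Product B", "Product C"] * 2
years = [2019, 2020]
values = [[10, 11], [20, 21], [30, 31],
          [12, 13], [22, 23], [32, 33]]

print(staged_value(months, products, years, values, "Feb", "Product B", 2020))  # 23
```

The lookup works whether the month label is repeated on every row of its block or appears only on the first row, because only the first match is used.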

I created a dynamic title for the chart with this formula in cell N17:

=N2&" Product Sales by Month"

Finally I selected the data range and inserted a line chart.

Line chart created using data in chart staging area

Note that I have included the blank row below the chart staging area. This adds a blank category to the chart with no data points, creating space for the data labels I added to the series. These labels, with font color to match the data points, make it easier to identify the data than the legend, so I deleted the legend.

To get the chart title into the chart, I selected the chart title, typed an equals sign in the formula bar, then clicked on cell N17 and pressed Enter. I also aligned the title with the edge of the plot area.

Create Chart Staging Area with Power Query

Sometimes (most times?) your data will be too complicated or too irregular to use simple INDEX/MATCH formulas to build a staging area. In the old days we would rely on copy and paste and a lot of one-off lookup formulas, but I’ll show how easy it is to stage this data with Power Query.

Rearrange the Data with Power Query

First select your data (or one cell in the data range) and on the Data tab of the ribbon, in the Get & Transform Data group, click From Table/Range. If the data isn’t already in a Table, you will be prompted to create one. Make sure the data is correctly identified, and check the My Data Has Headers box. The Table is created, and the Power Query editor opens up and shows the data.

I often remove the step of the query that assigns variable types, because I take care of that later. I selected the first two columns, right clicked in the headers, and selected Unpivot Other Columns.

This gives me the four-column data arrangement I will need for a Pivot-Table-based chart staging area. Here is where I changed the data types of the columns: Month and Product to text, Attribute to whole number, and Value to decimal number.

Then I renamed Attribute to Year and Value to Sales.

Finally I dragged the Year column to the first column of the table.

I clicked Close & Load and landed the query into a Table on a new worksheet.
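For readers who think in code, the reshaping steps above can be sketched in plain Python. This is not Power Query's M code, just an illustration of what unpivoting does to the data; the function name `unpivot_other_columns` and the sample rows are assumptions made for the example.

```python
def unpivot_other_columns(rows, key_columns):
    """Turn every non-key column into an (Attribute, Value) pair per row."""
    long_rows = []
    for row in rows:
        keys = {name: row[name] for name in key_columns}
        for name, value in row.items():
            if name in key_columns:
                continue
            long_rows.append({**keys, "Attribute": name, "Value": value})
    return long_rows


# Made-up wide data: one column per year, as in the original layout.
wide = [
    {"Month": "Jan", "Product": "Product A", "2019": 10.0, "2020": 11.0},
    {"Month": "Jan", "Product": "Product B", "2019": 20.0, "2020": 21.0},
]

long_rows = unpivot_other_columns(wide, ["Month", "Product"])

# Rename Attribute to Year and Value to Sales, set types, and put Year first.
staged = [
    {"Year": int(r["Attribute"]), "Month": r["Month"],
     "Product": r["Product"], "Sales": float(r["Value"])}
    for r in long_rows
]
for record in staged:
    print(record)
```

Each wide row produces one long row per year column, which is the four-column arrangement a Pivot Table wants.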

It’s always better to start with an orderly Table as your data, and base your chart data and any tabulated displays on this.

Create a Staging Area with a Pivot Table

From the Table above, I created a Pivot Table on a new worksheet, with Year in the Filters area, Month in the Rows area, Product in the Columns area, and Sales in the Values area.
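As a rough illustration of what this Pivot Table arrangement computes, the sketch below filters long-format records by year and sums Sales into a month-by-product grid. The function name `pivot_for_year` and the records are made up, and the real Pivot Table of course does far more than this.

```python
def pivot_for_year(records, year):
    """Month-by-product grid of summed Sales for one selected year."""
    grid = {}
    for rec in records:
        if rec["Year"] != year:  # the Filters-area selection
            continue
        month_row = grid.setdefault(rec["Month"], {})  # Rows area
        product = rec["Product"]                       # Columns area
        month_row[product] = month_row.get(product, 0.0) + rec["Sales"]
    return grid


# Made-up long-format records, one row per year, month, and product.
records = [
    {"Year": 2019, "Month": "Jan", "Product": "Product A", "Sales": 10.0},
    {"Year": 2020, "Month": "Jan", "Product": "Product A", "Sales": 11.0},
    {"Year": 2020, "Month": "Jan", "Product": "Product B", "Sales": 21.0},
    {"Year": 2020, "Month": "Feb", "Product": "Product A", "Sales": 13.0},
]

grid = pivot_for_year(records, 2020)
print(grid)
```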

I created a Pivot Chart, then added a Slicer based on the Year field of the Pivot Table. It’s very easy to select a year, even easier than with the data validation cell dropdown I used in the first approach.

Recreate the Original Data Layout with a Pivot Table

With the data in a well-structured Table, we can create another pivot table to mimic the original data layout. I put Year in the Columns area, Month and Product in the Rows area, and Sales in the Values area. I chose the Repeat All Item Labels in the Report Layout dropdown on the Pivot Table Design tab.

The same effort that went into formatting the original data can be reproduced on this Pivot Table.

Posted: Monday, October 19th, 2020 under Charting Principles.
Tags: data preparation, Power Query.
Comments: none

Bar lengths on a chart, what do they even mean?

Thursday, January 9, 2020 by Jon Peltier 12 Comments

The Misleading Chart

My friend and colleague Patrick Matthews, a former Excel MVP, posted a screenshot of an unusual bar chart on his Facebook page. The chart was taken from What does the public say about impeaching Trump?, the last section of a Washington Post article titled What happens next in the impeachment of President Trump? Patrick’s comment says it all: “Bar lengths on a chart, what do they even mean?”

At the risk of opening a torrent of political comments, I’ve reproduced the chart here.

Take a close look at the bar lengths in the first chart. The 12% bar is over half as long as the 85% bar, whereas in a bar chart with proportional bars, the 12% bar should be about 1/7 as long. But at least the 49% bar is slightly longer than the 47% bar, and they are in between the 12% and 85% bars. The same holds true of the bar lengths in the second chart.

Someone responded to Patrick’s post, wondering how they came up with those bar lengths. After the analysis in the previous paragraph, I replotted the data, set the axis scales to -100% to +100%, and set the vertical axis to cross at -100% on the horizontal axis. Nailed it!

Guess at the derivation of unusual bar lengths.

Well, not exactly. As I sometimes do, I overanalyzed the charts. I’ve stripped most of the text from the WaPo graphic, replaced the outlines of my charts with red lines, and stretched my charts so they overlaid the WaPo plot.

It turns out that the axis minimum was really -92%, so my wild guess of -100% was pretty good. I’ve set the gridline spacing so that 0% and +92% are shown on the chart, and the far right edge of the plot area is at +100%.

Exact determination of unusual bar lengths.

I don’t think the graphic artist really used an axis minimum of -92%. I’m sure they started with 0%, then decided to fill in some white space by dragging the left edges of each bar while keeping the right edge in place. They filled in the space, all right. But by doing so, they obscured the differences between the values.

It’s the same issue that occurs when people start their axis at a value greater than zero, so the differences between values are accentuated. But now the axis and the bars start well below zero, and the differences are minimized.
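The arithmetic behind this is simple enough to check. The sketch below compares the drawn lengths of the 12% and 85% bars for different axis minimums; the function name `bar_length_ratio` is made up, and the +10% minimum is a hypothetical value chosen only to show the opposite distortion.

```python
def bar_length_ratio(small, large, axis_min):
    """Ratio of drawn bar lengths when bars start at axis_min instead of zero."""
    return (small - axis_min) / (large - axis_min)


proportional = bar_length_ratio(12, 85, 0)   # bars start at 0%
stretched = bar_length_ratio(12, 85, -92)    # bars start at -92%
truncated = bar_length_ratio(12, 85, 10)     # hypothetical axis starting at +10%

print(round(proportional, 2), round(stretched, 2), round(truncated, 2))
# prints: 0.14 0.59 0.03
```

With a zero baseline the short bar is about 1/7 the length of the long one; starting at -92% makes it look well over half as long, while starting above zero exaggerates the difference instead.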

Fixing the Chart

My next step was to take my two charts, and set their axis minimum to 0%. These two charts now accurately show the relative percentages.

Bar charts with appropriate bar lengths.

Improving the Chart

Those last two charts were a big improvement. But if we’re expected to compare the values, shouldn’t the bars all be in a single chart? Below I plotted the negative of one set of data, so the bars stretch in opposite directions, the way they do in population pyramids. Let’s call this a diverging bar chart.

Then I remembered why I dislike population pyramids, as I discussed ages ago in Tornado Charts and Dot Plots. It’s hard to compare bars that reach away from each other. It would be easier to compare the values of any two bars if they start at one horizontal position (the vertical axis) and stretch in the same direction (to the right). So I created this clustered bar chart:

An alternative is to plot one set of bars from left to right, and the other from right to left. It’s a converging rather than a diverging bar chart. This makes individual bars more difficult to compare, as in the population pyramid lookalike above. But the white spaces clustered between the colored bars represent the percentages of each category who have no opinion.

What do you think? Not about the topic of the chart, but about the construction of the chart. Do you prefer the diverging bar chart, the clustered bar chart, the converging (stacked) bar chart, or something else entirely?

Posted: Thursday, January 9th, 2020 under Charting Principles.
Tags: Bar Charts.
Comments: 12

Order of Series and Legend Entries in Excel Charts

Tuesday, December 31, 2019 by Jon Peltier 3 Comments

Chart Series and Legends

In Excel charts, series are drawn in a particular order and legend entries are listed in their own particular order, based on series number, series chart type, the axis a series is plotted on, and other features, like axis category order and whether series are stacked. People often ask how to move series within a chart or within a legend. This article should help explain what is even possible.

Single Chart Type

If a chart has one type of series, all on the primary axis, the series are plotted in the order that they are added to the chart. The legend usually lists them in this order as well.

Line Charts

The line charts below illustrate the layering of series. Whether lines with markers or just lines, the second series in red is always plotted in front of the first series in blue. Even to the extent that the red line in the bottom right chart covers the blue markers. The legends all list the blue series first, then the red series.

An interesting inversion occurs if one series has markers and no lines. The markers-only series is plotted in front of the line series (top row of charts below) or in front of the lines with markers series (bottom row). In both right-hand charts, the blue markers of the first series are drawn in front of the red series. The legends still list blue first, then red.

XY scatter charts show the same behavior as line charts, with series plotted in order except in the case where one series has markers and no lines.

Column and Bar Charts

In unstacked horizontal bar charts, legend entries appear in the same order that the series appear in the chart. Below left, red bars and legend entries appear before blue.

Below right, the categories have been reversed in the vertical axis. Now blue bars and legend entries appear before red.

Legend entries in bar charts appear in the same order as the data in the chart. When reversing the axis categories switches the positions of the bars, it also reverses the legend.

This swapping of the legend to match the arrangement of the bars helps with interpretation of the chart. You might expect column charts to show the same alignment of legend entries with plotted points.

In the original chart, below left, the blue series 1 columns and legend entry appear before the red series 2 columns and legend entry. Reversing the categories in the horizontal axis does reverse the positions of the bars, but the legend keeps its original order, so the legend entries are now out of sync with the columns.

Legend entries in column charts don't always appear in the same order as the data in the chart. While reversing the axis categories switches the positions of the bars, it does not reverse the legend.

Effect of Stacking

In stacked charts (area, bar, and column charts), each series is stacked on the previous series, so the chart shows totals of all series at each category. The legends are rearranged to list series in the order they appear in the stack. This helps the viewer interpret the plotted data.

Note: You can also stack line chart series, but it’s not a good idea to do so; stacked line charts can be very confusing.

In the unstacked area chart below left, series 1 in blue is plotted first, and series 2 in red is plotted in front, partially obscuring series 1. The legend lists the series in this order.

In the stacked area chart below right, the blue area is plotted first and the red area is stacked on top. The legend is reversed, so that series 2, plotted at higher Y values than series 1, is also listed higher in the legend.

Order of Stacked Area Chart Legend Matches Order of Stacking

In an unstacked bar chart (below left), the legend lists series in the order they appear: red above blue. In a stacked bar chart (below right), the legend also lists series in the order that they are stacked, red to the right of blue.

Order of Stacked Bar Chart Legend Matches Order of Stacking

In an unstacked column chart (below left), the legend lists series in the order they are plotted: blue before red. In a stacked column chart (below right), the legend lists series in the order that they appear, red above blue.

Order of Stacked Column Chart Legend Matches Order of Stacking

Mixed Chart Types (Combination Charts)

The order in which data is plotted in the chart and listed in the legend becomes more complicated when multiple chart types are used in the same chart. The rules are straightforward, but they aren’t documented anywhere, so people get confused.

Series Order by Chart Type

The order that series are plotted in the chart and listed in the legend follows this order of chart types: Area, Column and/or Bar, Line, and XY Scatter. Changing the plot order (by rearranging series in the Select Data Source dialog or by changing the last argument in the Series Formula) will rearrange series within a type, but will not move series out of their plot type order.

Start with this simple data and insert a clustered column chart (usually the default type). Then right click on any series and select Change Series Chart Type from the pop up menu. Change series “Area 1” to an area, keep “Column 2” as a column, and change “Line 3” to a line. The blue area series is drawn behind the other series and listed first in the legend, the red columns are drawn in front of the blue area and listed in the middle, and the gold line is plotted in front of the others and listed last.

Area series are plotted in back of other series, line series are plotted in front, and bar and column series are plotted in between.

Big deal, you may think, that’s the order that the data was arranged in the worksheet. Reverse all that, and the line will be drawn first, behind the others, while the area will be drawn last, obscuring the rest.

Below is the data in reverse order and the resulting column chart. Again, right click on any series and select Change Series Chart Type. Change “Line 1” to a line, keep “Column 2” as a column, and change “Area 3” to an area. The gold area series is drawn behind the other series and listed first, the red columns are drawn in front of the gold area and listed next, and the blue line is plotted in front of the others and listed last.

So we see that the chart type dictates the order in which series are drawn and listed, regardless of the order of series data within all of the data in the chart.

We can extend this further to show that an XY Scatter series will be plotted in front of all the other series, regardless of where it falls in the chart source data.

Area series are plotted in back of other series, line series are plotted in front (and XY scatter series in front of lines), and bar and column series are plotted in between.

Primary and Secondary Axes

Earlier I wrote that the order of series by chart type was Area, Column and/or Bar, Line, and XY Scatter. I placed Column and Bar together because if your chart contains both column and bar series, they are plotted in the same layer between areas and lines, and they are listed together in the legend. Their precise order depends on which axis each is assigned to.

Below I’ve added a bar chart series to the first combination chart above. A bar chart series cannot be plotted on the same axis group as another chart type, so in the chart below, the area, column, and line series are plotted on the primary axis, and the bar is plotted on the secondary axis, so that the gray bars are in front of the red columns. Note that the legend order is area first, line last, and column and bar in the middle. Because the column series is on the primary axis, it is listed before the bar series.

Next I’ve added a bar chart series to the second combination chart above. I’ve plotted the bar on the primary axis, while the area, column, and line series are plotted on the secondary axis. Note that the gray bars are behind the red columns. The legend order is still area first, line last, and column and bar in the middle. Because the bar series is on the primary axis, it is listed before the column series.

The following chart has no bar series, but area, column, and line series on each of the primary and secondary axes. All areas are listed first, then all columns, then all lines; within each chart type, the primary series are listed before the secondary series.
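The legend ordering described so far can be summarized as a sort: first by chart type layer, then by axis group, then by plot order. The sketch below is only a model of the observed behavior, not Excel's actual logic; the names `TYPE_LAYER`, `AXIS_RANK`, and `legend_order` are made up, and it deliberately ignores stacking and the markers-only line behavior.

```python
# Column and bar share a layer; their relative order depends on the axis group.
TYPE_LAYER = {"area": 0, "column": 1, "bar": 1, "line": 2, "scatter": 3}
AXIS_RANK = {"primary": 0, "secondary": 1}


def legend_order(series):
    """Order (name, chart_type, axis) tuples, given in plot order, as a legend would."""
    # sorted() is stable, so series that tie keep their original plot order.
    ranked = sorted(series, key=lambda s: (TYPE_LAYER[s[1]], AXIS_RANK[s[2]]))
    return [name for name, _, _ in ranked]


# Reversed data with a bar series on the primary axis and the rest on secondary.
chart = [
    ("Line 1", "line", "secondary"),
    ("Column 2", "column", "secondary"),
    ("Area 3", "area", "secondary"),
    ("Bar 4", "bar", "primary"),
]
print(legend_order(chart))  # ['Area 3', 'Bar 4', 'Column 2', 'Line 1']
```

Changing the plot order of the input list only rearranges series that share both a type layer and an axis group, which matches the behavior described for the Select Data Source dialog.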

Series are plotted and listed in the legend in a particular order, based on series type, axis, and other factors.

Legend ordering can be even more intricate. Below are four series of data and a column chart.

In the chart below left, series 2 and 3 have been moved to the secondary axis. Note that the legend lists series 1 and 4 first for the primary axis, then series 2 and 3.

In the chart below right, series 2 and 3 on the secondary axis have been changed to stacked columns. The legend still lists primary axis series 1 and 4 in that order, but lists secondary series 3 and 2 in the order they’re stacked.

Earlier I showed how a line chart series with markers and no line will be plotted in front of a later line chart series with a line (with or without markers). This happens if both series are on the same (e.g., primary) axis.

Below left: red series 2 markers plotted in front of blue series 1 line and markers. Below right: blue series 1 markers plotted in front of red series 2 line and markers.

If the series with markers only is on the primary axis and the series with the line (with or without markers) is on the secondary axis, the primary markers will not be plotted in front of the secondary series.

Below left: blue secondary series 1 plotted in front of red primary axis series 2. Below right: red secondary series 2 plotted in front of blue primary series 1.

Legends with Many Entries

Legends can get complicated when there are many charted series and many entries in the legend.

Legends with Multiple Rows and Columns

The following chart has a legend across the bottom, listing all 8 series in a horizontal row.

If you shrink the width of the chart, eventually the legend will no longer fit. Excel converts the legend to two rows. Shrink the chart further, and the legend will change to three rows, then four.

Excel tries to place the same number of entries into each row: 8 entries in the original legend, then 4+4 entries, then 3+3+2, and finally 2+2+2+2.
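One simple rule that reproduces those counts is to divide the entries by the number of rows, round up, and fill the rows in order. This is an assumption that happens to match the four cases above, not a documented Excel algorithm, and the function name `legend_row_counts` is made up.

```python
import math


def legend_row_counts(entries, rows):
    """Split legend entries into rows of ceil(entries / rows), filling in order."""
    per_row = math.ceil(entries / rows)
    counts = []
    remaining = entries
    while remaining > 0:
        take = min(per_row, remaining)
        counts.append(take)
        remaining -= take
    return counts


for rows in (1, 2, 3, 4):
    print(rows, legend_row_counts(8, rows))
```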

It’s a little different with a vertically aligned legend. Below left is the same chart as above, with the legend listing the series along the right edge of the chart. Shrink the chart so that the legend no longer fits, and Excel does not convert it to two columns; instead it simply drops items off the list.

In its automatic legends, Excel doesn’t like the entries to get too close together, but you can manually change the height of the legend, and more entries will fit. In the chart below left, the chart is the same size, and so is the font, but I’ve slightly stretched the legend and all eight entries appear. You can even shrink the legend, and push the legend entries together. (Shrink it too much, and again, you will lose items off the bottom of the list.)

Alternatively, you can widen the legend, and Excel will add a column of legend entries (below right).

If your series names have different lengths but they fit in one row, Excel will position legend entries so the spaces between them are equal.

When you shrink the chart so that the legend reverts to multiple rows, Excel gives all legend entries the same amount of room. Note the wide spaces between the short legend entries below. I’ve used a light gray border on the legend to help illustrate this behavior.

This effect continues with more rows of legend entries.

Partial List of Series

I was working on a research project management dashboard with a client, and he was showing a stacked area chart that showed data about his various projects over time. Like many analyses of this sort, most of the total was due to a small number of items. He wanted to show a list of the top N projects in the legend. He was able to do it once, but then couldn’t remember how he did it.

In my mock-up below, I have eight “projects” stacked up, and I want to show the largest four contributors to the total. These two charts have all eight series listed in the legend.

In the next two charts, I have reduced the sizes of the legends so that half of the legend entries have disappeared. The left chart has the vertical list I want, except it shows the four smallest series. The right chart shows the four largest series, but they are in a horizontal list. But I can start with the horizontal legend to produce the vertical list I want.

Below left, I have moved the legend to the top right corner of the plot area. Below right I have made the legend taller and narrower to force Excel to list the entries in one column.

Finally a little cleanup and I have the desired list of the four largest series.

The client was thrilled when I showed how he could reproduce his top ten list of projects by starting with his legend at the bottom of the chart instead of the right side.
