Chart Data

Lambda to Unstack Data Range into Multiple Chart Series

Tuesday, February 27, 2024 by Jon Peltier
Peltier Technical Services, Inc., Copyright © 2023, All rights reserved.

Often your data is stacked, that is, multiple sets of X and Y values are stacked into two columns and identified by a label in another column (below left). When charted, the data is shown as one series (below center). In order to plot it as separate series in a chart (below right), you first need to unstack the data. In a moment I’ll get to a Lambda formula that unstacks data like this.

Stacked Data

The following data set is stacked. Column A has labels identifying which subset each data point (row) belongs in, while columns B and C contain the X and Y values for each point. Plotting the X and Y values results in a single chart series, though sometimes you can tell that the data should be subdivided.

Stacked, sorted data plotted as one series

If the data is randomly sorted (or is that an oxymoron?), you may not even be able to see that the data can be unstacked.

Stacked, unsorted data plotted as one jumbled series

Unstacking Data

This is similar to the problem described in Conditional Formatting of Excel Charts and Conditional XY Charts Without VBA. The splitting of data ranges into multiple series is covered in these articles and also in Split Data Range into Multiple Chart Series without VBA and VBA to Split Data Range into Multiple Chart Series.

For an Excel XY Scatter chart, you need one X column for all chart series and one Y column for each chart series; the X data remains stacked while only the Y data is unstacked. The simple protocol is illustrated below.

New columns (shaded below) are added for each separate series (each set of Y values). The top row (shaded green) has the unique items in the Label column. The rest of the region (shaded orange) uses a formula that puts the Y value into the column if the column header matches the value in that row of the Label column. If the labels do not match, it returns NA(), which appears as #N/A in the cell and is not plotted in the chart. The formula entered in cell D2 and filled down and across to F13 is:

=IF($A2=D$1,$C2,NA())

In modern Dynamic Array Excel, you can fill the new column headers easily with this formula:

=TRANSPOSE(UNIQUE(A2:A13))

You can then enter a single formula in cell D2 to fill the shaded region:

=IF(A2:A13=D1:F1,C2:C13,NA())
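To make the comparison in that formula concrete, here is a small Python model of what =IF(A2:A13=D1:F1,C2:C13,NA()) computes. It is not Excel code. Each label in the stacked column is compared with each header in the row, and the Y value is kept only where they match. In this sketch, None stands in for #N/A, and the sample labels and values are made up.

```python
from typing import List, Optional, Sequence

def unique_in_order(items: Sequence[str]) -> List[str]:
    """Model of UNIQUE(): distinct items in order of first appearance."""
    seen: List[str] = []
    for item in items:
        if item not in seen:
            seen.append(item)
    return seen

def unstack_y(labels: Sequence[str], ys: Sequence[float],
              headers: Sequence[str]) -> List[List[Optional[float]]]:
    """Model of =IF(labels=headers, ys, NA()).

    One output row per stacked row, one output column per header.
    None plays the role of #N/A, which the chart does not plot.
    """
    return [[y if label == header else None for header in headers]
            for label, y in zip(labels, ys)]

# Hypothetical stacked data: the Label column and the Y column.
labels = ["alpha", "alpha", "beta", "gamma", "beta"]
ys = [1.0, 2.0, 3.0, 4.0, 5.0]
headers = unique_in_order(labels)     # like TRANSPOSE(UNIQUE(A2:A13))
grid = unstack_y(labels, ys, headers)
```

Each row of grid holds exactly one real value, in the column for its label. That is why each chart series only gets the points that belong to it.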

Selecting B1:B13 and D1:F13 (use the Ctrl key to select multiple areas) and inserting an XY chart results in the following.

Unstack data into multiple series with worksheet formulas

You can take this simple Dynamic Array approach a bit further and write a single Lambda formula that will populate the entire chart source data range. This means that the chart will dynamically reflect the data range if changing the data results in more labels. The first version of the Lambda I wrote is:

=LAMBDA(input,
  LET(
    headers,TAKE(input,1),
    data,SORT(SORT(DROP(input,1),2),1),
    label,CHOOSECOLS(data,1),
    X,CHOOSECOLS(data,2),
    Y,CHOOSECOLS(data,3),
    labels,TRANSPOSE(UNIQUE(label)),
    split,IF(label=labels,Y,NA()),
    VSTACK(HSTACK(INDEX(headers,2),labels),HSTACK(X,split))
  )
)(A1:C13)

Broken down, the Lambda starts with A1:C13 as its input. The first row of this input is taken as headers. The remaining rows (the input with its first row dropped) are sorted twice, first by column 2 (X) and then by column 1 (Label), to give us the data. The first, second, and third columns of data become the variables label, X, and Y. A variable called labels is defined as the unique items in the label column, transposed into a row. A variable called split generates an array of either Y values or #N/A. Finally, the requisite pieces are stacked together and output into the worksheet. The top row holds the X header (INDEX(headers,2)) followed by the labels, and below it come the X values followed by the split Y values.
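The same steps can be modeled in Python to show how the pieces fit together. This is a sketch, not Excel: None stands in for #N/A, and the sample table is hypothetical. Python's sort is also case-sensitive, while Excel's SORT is not, so treat this as an illustration of the logic rather than an exact replica.

```python
from typing import Any, List, Optional, Sequence

def split_xy_by_label(table: Sequence[Sequence[Any]]) -> List[List[Optional[Any]]]:
    """Python model of the SplitXYbyLabel Lambda (None stands in for #N/A).

    table[0] is the header row (Label, X, Y); the remaining rows are the data.
    """
    headers = table[0]                                           # TAKE(input,1)
    data = sorted(table[1:], key=lambda row: (row[0], row[1]))   # sort by label, then X
    label = [row[0] for row in data]                             # CHOOSECOLS(data,1)
    x = [row[1] for row in data]                                 # CHOOSECOLS(data,2)
    y = [row[2] for row in data]                                 # CHOOSECOLS(data,3)
    labels: List[Any] = []                                       # UNIQUE(label), as a row
    for item in label:
        if item not in labels:
            labels.append(item)
    split = [[yv if lv == h else None for h in labels]          # IF(label=labels,Y,NA())
             for lv, yv in zip(label, y)]
    top = [headers[1]] + labels                                  # HSTACK(INDEX(headers,2),labels)
    body = [[xv] + row for xv, row in zip(x, split)]             # HSTACK(X,split)
    return [top] + body                                          # VSTACK(top, body)

# Hypothetical input block: a header row plus four unsorted data rows.
table = [["Label", "X", "Y"],
         ["b", 2, 20],
         ["a", 3, 30],
         ["a", 1, 10],
         ["b", 1, 15]]
result = split_xy_by_label(table)
```

The first row of result is the header row of the chart source range. Every later row is one X value with one populated Y column.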

Diarmuid Early, a colleague on LinkedIn, told me I could use a single SORT function with an array of sort columns, {1,2}, which leads to this streamlined Lambda:

=LAMBDA(input,
  LET(
    headers,TAKE(input,1),
    data,SORT(DROP(input,1),{1,2}),
    label,CHOOSECOLS(data,1),
    X,CHOOSECOLS(data,2),
    Y,CHOOSECOLS(data,3),
    labels,TRANSPOSE(UNIQUE(label)),
    split,IF(label=labels,Y,NA()),
    VSTACK(HSTACK(INDEX(headers,2),labels),HSTACK(X,split))
  )
)(A1:C13)
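Both versions give the same result, and the reason is worth spelling out. Sorting by X and then re-sorting by label only keeps the X order within each label if the second sort is stable, meaning it leaves rows with equal labels in their existing order. SORT(data,{1,2}) states both keys at once. The Python sketch below checks that equivalence with made-up rows. It relies on Python's sorted() being stable. That Excel's SORT behaves the same way is an inference from the first Lambda working, not something this sketch demonstrates.

```python
from operator import itemgetter

# Hypothetical (label, X, Y) rows in no particular order.
rows = [("b", 2, 20), ("a", 3, 30), ("a", 1, 10), ("b", 1, 15), ("a", 2, 25)]

# Like SORT(SORT(data,2),1): sort by X, then re-sort by label (stable).
two_pass = sorted(sorted(rows, key=itemgetter(1)), key=itemgetter(0))

# Like SORT(data,{1,2}): one sort, label as primary key and X as secondary.
one_pass = sorted(rows, key=itemgetter(0, 1))
```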

The screenshot below shows this Lambda formula in cell J1 and the chart that plots the entire spill range J1# (or J1:M13).

Unstack data into multiple series with Lambda formula

You can paste the Lambda formula (excluding the A1:C13 argument and its parentheses) into a new Name definition, shown below.

Now the formula can be called anywhere in the workbook as a regular Excel function called SplitXYbyLabel(), for example =SplitXYbyLabel(A1:C13).

Posted: Tuesday, February 27th, 2024 under LAMBDA Function.
Tags: Chart Data, Conditional Charts, LAMBDA Function.
Comments: none

Multiple Rows or Columns as Chart Series Data

Monday, January 22, 2024 by Jon Peltier

The Problem: Your Y data is in more than a single row or column.

If you try to populate a chart series with 2D data where it isn’t allowed, you’ll encounter this error:

Chart series data error: must be a single cell, row, or column.

The reference is not valid. References for titles, values, sizes, or data labels must be a single cell, row, or column.

This isn’t strictly true. Not one of the objects listed is restricted to a single cell. Any text element in a chart (chart or axis titles, data labels, or shapes) can link to multiple cells, but the linked range must be contiguous and in a single row or column; the same is true for the name of a series. The X values of a chart series can be multiple rows or columns, which produce tiered axis labels such as those shown in LAMBDA Function to Build Three-Tier Year-Quarter-Month Category Axis Labels. The Y values of a chart series must link to data in a single row or column.

The Setup: Y data is in multiple rows or columns.

Here is the problem. The data occupies more than one row (below left) or column (below right), but we want it plotted as a single series. If you select the data and insert a chart, Excel parses the data into two chart series. The series formulas are shown below the charts, with font colors matching the series colors.

Original data in multiple rows or columns produces charts with multiple series.

Let’s try to fix this. First, delete the second series of the chart.

Now try to enter the larger range into the series formula.

Just try to assign a multiple row or column range to a series, I dare you!

Excel rejects the changed formula, with the error message described earlier.

There is an exception to the single row or column rule for Y values. You can specify compound (multiple-area) ranges for Y values, as shown below for our multiple row and multiple column data ranges. The areas in a compound range don’t even need to be all rows or all columns.

Assign compound data ranges to chart series data.

This works, but I think it’s difficult to understand and maintain.

TOROW and TOCOL to the rescue!

Microsoft has released a plethora of new Dynamic Array functions. Among these are TOROW and TOCOL, which are used to arrange values in a 2D range into a new 1D range, shown below under the data ranges. TOROW and TOCOL produce ranges with the values in the same order, so whether we use one or the other is a matter of preference. There are two series formulas below each chart, showing the ranges produced by TOROW and TOCOL.

There is a problem, however. The charts don’t look the same for original data in rows vs in columns. This is because both TOROW and TOCOL take all the cells in the first row of the original data and append all cells in each successive row. This causes the data to be out of order when performing TOROW or TOCOL on columnar data. We can fix this by transposing the data first.
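The ordering problem is easy to see in a small Python model, where a nested list stands in for the worksheet range. This is not Excel. TOROW and TOCOL read row by row by default, so columnar data comes out interleaved unless it is transposed first. TOROW and TOCOL also accept an optional third argument, scan_by_column. Setting it to TRUE, as in =TOCOL(Data!$C$2:$D$5,,TRUE), reads the array column by column and should give the same order as transposing first. The sample values are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def flatten_by_row(grid: Sequence[Sequence[T]]) -> List[T]:
    """Model of TOROW/TOCOL default order: row 1 left to right, then row 2, ..."""
    return [value for row in grid for value in row]

def transpose(grid: Sequence[Sequence[T]]) -> List[List[T]]:
    """Model of TRANSPOSE()."""
    return [list(col) for col in zip(*grid)]

# The same eight Y values, laid out in two rows and in two columns.
by_rows = [[1, 2, 3, 4],
           [5, 6, 7, 8]]
by_cols = [[1, 5],
           [2, 6],
           [3, 7],
           [4, 8]]
```

Flattening by_rows gives 1 through 8 in order. Flattening by_cols directly gives 1, 5, 2, 6, and so on. Flattening transpose(by_cols) restores the intended order.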

And now all of our charts are consistent.

You could also construct more complicated formulas with other Dynamic Array functions. For example, if I wanted to turn a multiple row range into a single row, I would use:

=LAMBDA(x,
  LET(
    rx,ROWS(x),
    cx,COLUMNS(x),
    MAKEARRAY(
      1,rx*cx,
      LAMBDA(r,c,
        INDEX(x,INT((c-1)/cx)+1,MOD(c-1,cx)+1)
      )
    )
  )
)(multi-row range)

To convert a multiple-column range into a single column, I would use:

=LAMBDA(x,
  LET(
    rx,ROWS(x),
    cx,COLUMNS(x),
    MAKEARRAY(
      rx*cx,1,
      LAMBDA(r,c,
        INDEX(x,MOD(r-1,rx)+1,INT((r-1)/rx)+1)
      )
    )
  )
)(multi-column range)
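The index arithmetic inside these two MAKEARRAY formulas can be checked outside Excel. The Python sketch below reproduces each formula's INDEX row and column calculation. INDEX is 1-based, so the code subtracts 1 before indexing Python lists. The first function is a row-major read and the second is a column-major read. The sample ranges are made up.

```python
from typing import List, Sequence

def rows_to_single_row(x: Sequence[Sequence[float]]) -> List[float]:
    """Model of the first Lambda: output position c reads
    row INT((c-1)/cx)+1, column MOD(c-1,cx)+1 of x."""
    rx, cx = len(x), len(x[0])
    out = []
    for c in range(1, rx * cx + 1):
        src_row = (c - 1) // cx + 1
        src_col = (c - 1) % cx + 1
        out.append(x[src_row - 1][src_col - 1])   # convert 1-based to 0-based
    return out

def cols_to_single_col(x: Sequence[Sequence[float]]) -> List[float]:
    """Model of the second Lambda: output position r reads
    row MOD(r-1,rx)+1, column INT((r-1)/rx)+1 of x."""
    rx, cx = len(x), len(x[0])
    out = []
    for r in range(1, rx * cx + 1):
        src_row = (r - 1) % rx + 1
        src_col = (r - 1) // rx + 1
        out.append(x[src_row - 1][src_col - 1])
    return out

# Hypothetical data: a 2-row range and a 2-column range holding the same values.
multi_row = [[1, 2, 3, 4], [5, 6, 7, 8]]
multi_col = [[1, 5], [2, 6], [3, 7], [4, 8]]
```

Both functions return 1 through 8 in order. This shows that the row formula walks across each row before moving down, and the column formula walks down each column before moving right.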

I’m sure people can write more efficient formulas than this, but the TOROW and TOCOL formulas are very concise.

Use Names to keep the worksheet clean

We can implement TOROW and TOCOL in Names rather than in the worksheet, and the Names work just fine in the chart SERIES formulas. Go to Formulas > Define Name; for Name type YrowTOROW; for Scope select the current sheet (Data); and for Refers to enter =TOROW(), put the cursor between the parentheses, and select C2:F3; then press Enter.

The four relevant names are:

Name: YrowTOROW
Refers To: =TOROW(Data!$C$2:$F$3)

Name: YrowTOCOL
Refers To: =TOCOL(Data!$C$2:$F$3)

Name: YcolTOROWT
Refers To: =TOROW(TRANSPOSE(Data!$C$2:$D$5))

Name: YcolTOCOLT
Refers To: =TOCOL(TRANSPOSE(Data!$C$2:$D$5))

These all produce the same values in the same order, in either horizontal or vertical arrays, and the chart SERIES formulas don’t care which orientation they get. Notice that we applied the lesson from before, transposing the columnar data before using TOROW or TOCOL; I’ve appended a T to these Names.

Same result as before. Using Names keeps the worksheet cleaner, but I don’t mind seeing the actual data I’m plotting in my worksheet.

Neat Trick: Double Unary Minus

Last week I learned a new trick. Well, it was new to me, but apparently it has been around for a long time, predating Dynamic Arrays by decades. My colleague Roberto Mensa was showing me some of his recent charting exercises (check them out at E90E50Charts – Excel Charts Gallery) and he showed me this trick.

You can use a double unary minus, that is, two minus signs, to force an Excel chart to treat a multiple row or column range as a single array. The double minus is commonly used to convert TRUE and FALSE to 1 and 0, to convert numeric text into numbers, and in this case to convert a range into an array. When a 2D array is passed to a chart series, the chart combines all rows of the array into a 1D array.
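Here is a simplified Python model of the two effects described above. It is not Excel's implementation. Each unary minus first coerces its operand to a number, so TRUE becomes 1 and "2" becomes 2. The chart then reads the resulting 2D array row by row into one list. Non-numeric text raises ValueError here, standing in for #VALUE!. Python's float() also accepts a few spellings, such as "nan", that Excel would reject. The sample grid is hypothetical.

```python
from typing import List, Sequence, Union

Cell = Union[bool, int, float, str]

def negate(value: Cell) -> float:
    """One unary minus, modeled: coerce the operand to a number, then negate.

    float(True) == 1.0, float(False) == 0.0, float("12") == 12.0;
    float("abc") raises ValueError (standing in for #VALUE!).
    """
    return -float(value)

def double_minus_series_values(grid: Sequence[Sequence[Cell]]) -> List[float]:
    """Model of a chart reading =--range: coerce each cell twice (net sign
    unchanged), then combine the rows of the 2D result into one 1D list."""
    return [negate(negate(cell)) for row in grid for cell in row]

# Hypothetical 2-row range with mixed cell types.
values = double_minus_series_values([[1, "2", True], [4, False, 6.5]])
```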

The double minus must be used in a Name in order to work with chart data. You can define the following names for row-based or column-based data:

Name: YrowMINUS
Refers To: =--Data!$C$2:$F$3

Name: YcolMINUST
Refers To: =--TRANSPOSE(Data!$C$2:$D$5)

Define them with the scope of the Data worksheet, and as before, transpose the columnar data first. Then you can edit the SERIES formulas to use these Names.

This double minus approach doesn’t replace the Names that use TOROW or TOCOL, of course; it’s just another tool for your toolbox.

Posted: Monday, January 22nd, 2024 under Chart Data, Charting Principles, Dynamic Arrays.
Tags: Chart Data, Dynamic Arrays.
Comments: none

Unlink Chart Data

Thursday, June 23, 2022 by Jon Peltier

There are occasions when you may want to break the link between a chart and its underlying data. Maybe you copied the chart from another workbook, and you no longer have access to that workbook. Maybe you want to avoid the headaches that may arise from pasting a chart into PowerPoint or another program. Maybe you’re just tired of seeing this warning when you open the file:

Security Warning: Automatic update of links has been disabled

There are several ways to disconnect your chart from its data source.

Chart Data

First let’s review chart data. I’ve written a lot about chart data, including

Good Chart Data – The definitive description
The Excel Chart SERIES Formula – also definitive
Change Series Formula – Improved Routines
How to Edit Series Formulas
Simple VBA Code to Manipulate the SERIES Formula and Add Names to Excel Chart Series
Edit Series Formulas

Below is a simple chart. A series is selected so the SERIES formula appears in the formula bar and the ranges in the formula are highlighted in the worksheet.

A chart, its SERIES formula, and its data highlighted in the worksheet

The SERIES formula looks like this:

=SERIES(Sheet1!$C$2,Sheet1!$B$3:$B$8,Sheet1!$C$3:$C$8,1)

The arguments in the formula describe the sources of the data.

=SERIES(Series_Name,X_Values,Y_Values,Plot_Order)

Series_Name can be a link to a worksheet range, text (enclosed in double quotes), or blank.
X_Values can be a link to a worksheet range, an array enclosed in curly braces, or blank (and the chart will use 1, 2, 3, … for its X values).
Y_Values can be a link to a worksheet range, or an array enclosed in curly braces.
Plot_Order is a whole number between 1 and the number of series in the chart, signifying the order in which the series is drawn (complicated by chart type and axis group).

The cell addresses in the SERIES formula always use absolute references, such as $A$1, not relative references, like A1. But if you manually type relative references and press Enter, they will be converted to absolute references. The addresses always include the worksheet name.

If the chart links to data in another open Excel workbook, the SERIES formula includes the workbook name in square brackets before the worksheet name.

=SERIES('[Data Source.xlsm]Sheet1'!$C$2,'[Data Source.xlsm]Sheet1'!$B$3:$B$8,'[Data Source.xlsm]Sheet1'!$C$3:$C$8,1)

If the chart links to data in a closed Excel workbook, the SERIES formula includes the path, then the workbook name in square brackets, and finally the worksheet name.

=SERIES('C:\Long Path\[Data Source.xlsx]Sheet1'!$C$2,'C:\Long Path\[Data Source.xlsx]Sheet1'!$B$3:$B$8,'C:\Long Path\[Data Source.xlsx]Sheet1'!$C$3:$C$8,1)
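When working with SERIES formulas programmatically, the first step is usually splitting them into their four arguments. A plain split on commas fails, though. Commas also appear inside quoted sheet or workbook names, quoted text, {…} array constants, and parenthesized compound ranges. The Python function below is a minimal parsing sketch written for illustration; it is not an Excel or VBA API. It splits only on commas at the top level. Doubled quotes inside names (such as 'Jon''s Sheet') are handled because the quote state simply toggles twice.

```python
from typing import List

def series_args(formula: str) -> List[str]:
    """Split a SERIES formula into its top-level arguments (a parsing sketch).

    Commas separate arguments only when they are outside single quotes
    (sheet/workbook names), double quotes (literal text), curly braces
    (array constants), and parentheses (compound ranges).
    """
    text = formula.strip()
    prefix = "=SERIES("
    if not text.upper().startswith(prefix) or not text.endswith(")"):
        raise ValueError("not a SERIES formula")
    body = text[len(prefix):-1]
    args: List[str] = []
    current: List[str] = []
    depth = 0
    quote = None                      # the quote character we are inside, if any
    for ch in body:
        if quote:
            if ch == quote:
                quote = None          # closing quote (a doubled quote reopens next)
        elif ch in "'\"":
            quote = ch
        elif ch in "({":
            depth += 1
        elif ch in ")}":
            depth -= 1
        elif ch == "," and depth == 0:
            args.append("".join(current))
            current = []
            continue                  # the separator itself is not kept
        current.append(ch)
    args.append("".join(current))
    return args
```

A blank argument, such as omitted X values, comes back as an empty string, so the result always has one entry per argument.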

Copy a Picture of the Chart

One way to represent an unlinked chart is to copy a picture of the chart, then paste it where desired.

Select the chart, then on the Home tab of Excel’s ribbon, under the Copy dropdown, select Copy as Picture…

Copy Picture command in the Excel Ribbon

… then select the appropriate options (usually Bitmap instead of Picture; I haven’t been able to figure out the difference between on screen vs. as printed) …

Then go to the other application, and Paste.

The disadvantage to this technique is that the pasted picture is no longer an Excel chart. You can no longer format any of the chart elements (rescale the axes, change marker styles or colors, etc.). Therefore, this method is unsuitable for use within Excel.

Change the Cell References to Hard-Coded Values

You can unlink chart data and still retain the actual chart with its formatting capabilities by editing the SERIES formula. Recall that the series formula in our first chart above was:

=SERIES(Sheet1!$C$2,Sheet1!$B$3:$B$8,Sheet1!$C$3:$C$8,1)

where the arguments referred to various links to the series data

=SERIES(Series_Name,X_Values,Y_Values,Plot_Order)

Select the series so that the SERIES formula appears in the formula bar, click in the formula bar so that the cursor is in the formula, and press F9. This keystroke converts references in the formula to their values:

=SERIES("My Data",{"Jan","Feb","Mar","Apr","May","Jun"},{93,76,116,286,225,327},1)

Series_Name becomes "My Data", X_Values becomes {"Jan","Feb","Mar","Apr","May","Jun"}, and Y_Values becomes {93,76,116,286,225,327}. Plot_Order is unchanged, of course, because it can only be a number, never a reference.

Press Esc to revert to the formula with references, or press Enter to keep the formula with hard-coded values.

If you select just one of the references in the formula, the F9 key only converts that reference to its value. These SERIES formulas are all valid:

=SERIES("My Data",Sheet1!$B$3:$B$8,Sheet1!$C$3:$C$8,1)
=SERIES(Sheet1!$C$2,{"Jan","Feb","Mar","Apr","May","Jun"},Sheet1!$C$3:$C$8,1)
=SERIES(Sheet1!$C$2,Sheet1!$B$3:$B$8,{93,76,116,286,225,327},1)
=SERIES(Sheet1!$C$2,{"Jan","Feb","Mar","Apr","May","Jun"},{93,76,116,286,225,327},1)
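The hard-coded form that F9 produces is simple to build from a list of values, which can help when you need to generate one outside Excel. The Python sketch below is a hypothetical helper, not part of Excel. Text is wrapped in double quotes with any inner quotes doubled. Whole-number floats are written without a decimal point. A missing X list leaves X_Values blank. Unlike Excel, it does not round numbers to about 15 significant digits.

```python
from typing import Optional, Sequence, Union

Value = Union[str, int, float]

def excel_literal(value: Value) -> str:
    """Format one value the way it appears in a SERIES formula array constant."""
    if isinstance(value, str):
        return '"' + value.replace('"', '""') + '"'   # text is quoted; inner quotes doubled
    if isinstance(value, float) and value.is_integer():
        return str(int(value))                         # 2.0 is written as 2
    return str(value)

def hard_coded_series(name: str,
                      x_values: Optional[Sequence[Value]],
                      y_values: Sequence[Value],
                      plot_order: int) -> str:
    """Build a SERIES formula with literal values instead of cell references.

    x_values=None leaves X_Values blank, so the chart would use 1, 2, 3, ...
    """
    x_part = "" if x_values is None else "{" + ",".join(map(excel_literal, x_values)) + "}"
    y_part = "{" + ",".join(map(excel_literal, y_values)) + "}"
    return f"=SERIES({excel_literal(name)},{x_part},{y_part},{plot_order})"

formula = hard_coded_series("My Data",
                            ["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
                            [93, 76, 116, 286, 225, 327], 1)
```

With the article's sample data, formula matches the fully hard-coded SERIES formula shown earlier, character for character.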

Automate with VBA

Any repetitive task that you can do manually, VBA can do faster with much less tedium.

Simple VBA Algorithm

To unlink chart data from all series in the active chart, simply run this code:

Sub DelinkChartFromData0()
  If ActiveChart Is Nothing Then
    MsgBox "Select a chart and try again", vbExclamation, "No Active Chart"
  Else
    Dim srs As Series
    For Each srs In ActiveChart.SeriesCollection
      ' Convert X Values to arrays of values
      srs.XValues = srs.XValues
      ' Convert Y Values to arrays of values
      srs.Values = srs.Values
      ' Convert series name to text
      srs.Name = srs.Name
    Next srs
  End If
End Sub

More Flexible Code

That’s nice enough, but I like to adjust a sub like this by giving it an argument, so I can pass in the object I want to process from any entry point. The corresponding sub is:

Sub DelinkChartFromData1(cht As Chart)
  Dim srs As Series
  For Each srs In cht.SeriesCollection
    ' Convert X Values to arrays of values
    srs.XValues = srs.XValues
    ' Convert Y Values to arrays of values
    srs.Values = srs.Values
    ' Convert series name to text
    srs.Name = srs.Name
  Next srs
End Sub

To process the active chart, I would call it with this entry point:

Sub DelinkActiveChartFromData1()
  If Not ActiveChart Is Nothing Then
    DelinkChartFromData1 ActiveChart
  End If
End Sub

To process all charts on the active sheet, I would use this:

Sub DelinkAllChartsFromData1()
  Dim chob As ChartObject
  For Each chob In ActiveSheet.ChartObjects
    DelinkChartFromData1 chob.Chart
  Next chob
End Sub

To process one or more selected charts and ignore the rest, I use this approach:

Sub DelinkSelectedChartsFromData1()
  If Not ActiveChart Is Nothing Then
    DelinkChartFromData1 ActiveChart
  ElseIf TypeName(Selection) = "DrawingObjects" Then
    Dim shp As Shape
    For Each shp In Selection.ShapeRange
      If shp.HasChart Then
        DelinkChartFromData1 shp.Chart
      End If
    Next
  End If
End Sub

In fact, the last sub is all I need. It processes the active chart if there is one, which replaces DelinkActiveChartFromData1. Otherwise it processes any selected charts, so I could select all charts and run it to mimic DelinkAllChartsFromData1.

Fix the Date Axis

I was working on an example to show that it works for a chart with lots of data points, and I happened to use dates as my X values.

I used DelinkSelectedChartsFromData1 and got the resulting chart.

The SERIES formula is pretty long, but is nowhere near the limit (see Excel Chart Series Size Limits).

=SERIES("Value",{36526,36557,36586,36617,36647,36678,36708,36739,36770,36800,36831,36861,36892,36923,36951,36982,37012,37043,37073,37104,37135,37165,37196,37226,37257,37288,37316,37347,37377,37408,37438,37469,37500,37530,37561,37591,37622,37653,37681,37712,37742,37773,37803,37834,37865,37895,37926,37956,37987,38018,38047,38078,38108,38139,38169,38200,38231,38261,38292,38322,38353,38384,38412,38443,38473,38504,38534,38565,38596,38626,38657,38687,38718,38749,38777,38808,38838,38869,38899,38930,38961,38991,39022,39052,39083,39114,39142,39173,39203,39234,39264,39295,39326,39356,39387,39417,39448,39479,39508,39539,39569,39600,39630,39661,39692,39722,39753,39783,39814,39845,39873,39904,39934,39965,39995,40026,40057,40087,40118,40148,40179,40210,40238,40269,40299,40330,40360,40391,40422,40452,40483,40513,40544,40575,40603,40634,40664,40695,40725,40756,40787,40817,40848,40878,40909,40940,40969,41000,41030,41061,41091,41122,41153,41183,41214,41244,41275,41306,41334,41365,41395,41426,41456,41487,41518,41548,41579,41609,41640,41671,41699,41730,41760,41791,41821,41852,41883,41913,41944,41974,42005,42036,42064,42095,42125,42156,42186,42217,42248,42278,42309,42339,42370,42401,42430,42461,42491,42522,42552,42583,42614,42644,42675,42705,42736,42767,42795,42826,42856,42887,42917,42948,42979,43009,43040,43070,43101,43132,43160,43191,43221,43252,43282,43313,43344,43374,43405,43435,43466,43497,43525,43556,43586,43617,43647,43678,43709,43739,43770,43800,43831,43862,43891,43922,43952,43983,44013,44044,44075,44105,44136,44166,44197,44228,44256,44287,44317,44348,44378,44409,44440,44470,44501,44531,44562,44593,44621,44652,44682,44713},{1.59180141468432,2.98367529941487,3.53968832672474,4.166896463019,5.49680350931305,6.60295325255014,7.92084805050608,8.42969525036707,9.03114848703803,10.5276619316506,11.6955893543266,12.7967815520366,13.9389611218954,14.7527663401269,15.056329735329,16.5993323777924,17.3515548037874,18.5751579754327,19.5310520490484,20.1024924707731,21.0585061447489,22.5723596205116,23.6423776067453,24.1433918431237,25.8809999743178,26.9737430334888,27.2520490776847,28.9524027272893,29.1845029748314,30.9521946135505,31.6902106518488,32.6795741650195,33.0489731513874,34.3260591677559,35.6901951915975,36.5728985726155,37.399073960589,38.3469907111962,39.9779540070637,40.6458619797685,41.8270884435173,42.4800724464723,43.5366272985535,44.8875676117536,45.2053138791033,46.1944737321369,47.3914095233474,48.8633128550379,49.5985742034472,50.7164885501005,51.653002976414,52.173798695412,53.797369677567,54.336065160202,55.2589150631074,56.4371639546982,57.3504724537478,58.7576090168706,59.8397486773469,60.5570284284911,61.2384044080596,62.8073951248627,63.0734139584437,64.023944739976,65.4569502217557,66.441196439216,67.8040167607499,68.4732296280772,69.383544349419,70.6736709532193,71.8772550111681,72.7285185169436,73.0304159679703,74.4543215360901,75.6139703970569,76.6795157883573,77.9474917376374,78.8675574588127,79.2434751623046,80.8059856927601,81.9149434410426,82.0822571567436,83.1165049108337,84.5766331546624,85.0485197930954,86.4516756787662,87.103472072234,88.6166607931416,89.5569751902823,90.2997637601853,91.4861009853597,92.738947366043,93.1212971468925,94.0166978760269,95.1888503720746,96.8377506737342,97.3036391294494,98.126217895739,99.339467893605,100.259054804445,101.840983590986,102.741718964783,103.082626950564,104.362149881911,105.760866364373,106.396030094412,107.226463877175,108.638218224396,109.379526574316,110.109027223762,111.420326031626,112.61771263781,113.468694206656,114.619213431217,115.278679680971,116.466011457757,117.537068146506,118.753523891166,119.704656141103,120.131070166454,121.822525894342,122.977159915521,123.640235472236,124.383000359254,125.608871736392,126.188489945214,127.414144604553,128.39601244846,129.115889297518,130.08797401699,131.159077879218,132.410010962127,133.535396508137,134.718791752223,135.443733080568,136.661870239714,137.29210822438,138.55656753413,139.308552070942,140.27987742426,141.456569779806,142.774325544055,143.243355927806,144.505167482867,145.111184143168,146.307483083571,147.464847220224,148.218427246658,149.769670436945,150.776656088281,151.515371748972,152.408346863735,153.29435331721,154.857749604455,155.212400832703,156.31398296817,157.430827877581,158.068214294863,159.246613771057,160.346285238108,161.703341756362,162.939983684318,163.581827103078,164.41963298477,165.632147153497,166.697413290467,167.83618109404,168.878480595484,169.817382059171,170.125826262302,171.174106119149,172.182319412083,173.683778271622,174.350253903931,175.280205756401,176.628020353313,177.602771096537,178.468667385597,179.645508668238,180.131396317678,181.258745156011,182.238164094744,183.821618545389,184.261328363069,185.932899850116,186.187494356367,187.586391685274,188.120772297618,189.794395110081,190.509727284739,191.179753691424,192.54771311589,193.831688905835,194.638416224214,195.906316127413,196.845527552348,197.02489756276,198.382679905726,199.860935825007,200.422827058144,201.110300744418,202.209577950068,203.499633489411,204.12638600329,205.314473205528,206.095829209215,207.070675561553,208.875077186077,209.071848934263,210.37641200788,211.34927750657,212.666535668482,213.400957572161,214.438684951166,215.464275837814,216.580366174425,217.797909408301,218.167376086905,219.366623356776,220.548163602681,221.316902416248,222.044068195978,223.356501739512,224.984206101239,225.14746583972,226.275382045765,227.310864151405,228.851337545147,229.175137589426,230.520419824664,231.361806010337,232.175186942816,233.450560796329,234.269397102542,235.272652848491,236.560089595496,237.571235289495,238.499139612489,239.133135844129,240.582849816789,241.827775202926,242.833982805547,243.988020788098,244.077263106469,245.085954088788,246.671199871053,247.086129757983,248.030390822573,249.043987276631,250.071893428251,251.525524796095,252.600631102785,253.754358498755,254.665241202158,255.625959629045,256.98261131318,257.320204395814,258.444509485282,259.533974191495,260.492799645349,261.603579334917,262.194476107706,263.615554634444,264.987806666296,265.830581271325,266.39664304882,267.294282876198,268.348630086833,269.253293622165,270.082557692269},1)

Unfortunately, the X axis tick labels have lost their date formatting. In the sub, srs.XValues = srs.XValues converted the input dates into numbers, because internally Excel stores dates as serial numbers that count days from 1 January 1900 (serial number 1). It’s easy enough to apply the date format manually.
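To read those bare numbers back as dates, it helps to know how Excel's 1900 date system maps serials to calendar days. Serial 1 is 1 January 1900. Excel also counts a nonexistent 29 February 1900 as serial 60, a long-standing compatibility quirk, so serials from 61 onward are shifted by one extra day. The Python helper below is an illustration only, not an Excel function. It accounts for that quirk and shows that the first and last X values in the formula above, 36526 and 44713, are 1 January 2000 and 1 June 2022.

```python
from datetime import date, timedelta

def excel_serial_to_date(serial: int) -> date:
    """Convert an Excel 1900-system date serial number to a calendar date.

    Serial 1 is 1 January 1900. Excel treats 1900 as a leap year, so serial
    60 is the fictitious 29 February 1900 and later serials are offset by a day.
    """
    if serial < 1:
        raise ValueError("1900-system serial numbers start at 1")
    if serial == 60:
        raise ValueError("serial 60 is Excel's fictitious 29 February 1900")
    if serial < 60:
        return date(1899, 12, 31) + timedelta(days=serial)
    return date(1899, 12, 30) + timedelta(days=serial)

first_x = excel_serial_to_date(36526)
last_x = excel_serial_to_date(44713)
```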

Unlinked Chart with dates on axis as category labels

Apparently, the nice spacing that comes with an actual date axis is gone. The chart above has dates, but the axis is a simple category axis. I’ll have to adjust the procedure to prevent a date axis from being changed into a category axis. Before I convert the links to values, I will apply the number format to the axis ticks. Then if it is a date axis, I will make sure the chart treats it as one; this is complicated by the fact that Excel often applies the date category type automatically based on the data.

Sub DelinkChartFromData2(cht As Chart)
  Dim iGrp As XlAxisGroup
  For iGrp = xlPrimary To xlSecondary
    ' skip axis groups the chart doesn't have (e.g., no secondary X axis)
    If cht.HasAxis(xlCategory, iGrp) Then
      Dim ax As Axis
      Set ax = cht.Axes(xlCategory, iGrp)
      With ax
        ' apply formats
        .TickLabels.NumberFormat = .TickLabels.NumberFormat
        If IsDateAxis(ax) Then
          ' apply date type
          .CategoryType = xlTimeScale
        End If
      End With
    End If
  Next
  Dim srs As Series
  For Each srs In cht.SeriesCollection
    ' Convert X Values to arrays of values
    srs.XValues = srs.XValues
    ' Convert Y Values to arrays of values
    srs.Values = srs.Values
    ' Convert series name to text
    srs.Name = srs.Name
  Next srs
End Sub

This function tests whether an axis is a date category type. The BaseUnit property is undefined unless the axis is a date axis.

Function IsDateAxis(ax As Axis) As Boolean
  If ax.Type = xlCategory Then
    Dim vTest As Variant
    On Error Resume Next
    vTest = ax.BaseUnit
    IsDateAxis = (Err.Number = 0)
    On Error GoTo 0
  End If
End Function

The entry point that calls the above is a familiar one:

Sub DelinkSelectedChartsFromData2()
  If Not ActiveChart Is Nothing Then
    DelinkChartFromData2 ActiveChart
  ElseIf TypeName(Selection) = "DrawingObjects" Then
    Dim shp As Shape
    For Each shp In Selection.ShapeRange
      If shp.HasChart Then
        DelinkChartFromData2 shp.Chart
      End If
    Next
  End If
End Sub

The resulting chart is now indistinguishable from the original:

Unlink the Chart and Axis Titles

It’s easy to link many of a chart’s text elements to a worksheet range. Select the text element, click in the formula bar, type = and click on the cell or range containing the text you want displayed. The result is a link formula like =Sheet1!$A$1, and the text element updates dynamically to display whatever is in the reference. This works for the chart title, axis titles, data labels, and textboxes and other shapes that contain text.

If you’re delinking the chart’s data, you probably want to delink the titles in the chart as well. A simple VBA routine to do just that is shown below. For each possible title, it checks whether the formula begins with an equals sign (if not, the title just contains static text). If it does, the routine assigns the title’s text back to itself, which replaces the link with static text.

Sub UnlinkTitles()
  If Not ActiveChart Is Nothing Then
    With ActiveChart
      If .HasTitle Then
        If Left$(.ChartTitle.Formula, 1) = "=" Then
          ' convert chart title link to text
          .ChartTitle.Text = .ChartTitle.Text
        End If
      End If
      Dim iAx As XlAxisType
      For iAx = xlCategory To xlSeriesAxis
        Dim iGrp As XlAxisGroup
        For iGrp = xlPrimary To xlSecondary
          If .HasAxis(iAx, iGrp) Then
            With .Axes(iAx, iGrp)
              If .HasTitle Then
                If Left$(.AxisTitle.Formula, 1) = "=" Then
                  ' convert axis title link to text
                  .AxisTitle.Text = .AxisTitle.Text
                End If
              End If
            End With
          End If
        Next
      Next
    End With
  End If
End Sub

Let’s merge this into our last delinking routine (I’ve also inlined the test for a date axis):

Sub DelinkChartFromData3(cht As Chart)
  With cht
    If .HasTitle Then
      If Left$(.ChartTitle.Formula, 1) = "=" Then
        ' convert chart title link to text
        .ChartTitle.Text = .ChartTitle.Text
      End If
    End If
    Dim iAx As XlAxisType
    For iAx = xlCategory To xlSeriesAxis
      Dim iGrp As XlAxisGroup
      For iGrp = xlPrimary To xlSecondary
        If .HasAxis(iAx, iGrp) Then
          Dim ax As Axis
          Set ax = .Axes(iAx, iGrp)
          With ax
            If .HasTitle Then
              If Left$(.AxisTitle.Formula, 1) = "=" Then
                ' convert axis title link to text
                .AxisTitle.Text = .AxisTitle.Text
              End If
            End If
            If iAx = xlCategory Then
              ' apply formats
              .TickLabels.NumberFormat = .TickLabels.NumberFormat
              If ax.Type = xlCategory Then
                Dim vTest As Variant
                On Error Resume Next
                vTest = ax.BaseUnit
                If (Err.Number = 0) Then
                  ' apply date type
                  ax.CategoryType = xlTimeScale
                End If
                On Error GoTo 0
              End If
            End If
          End With
        End If
      Next
    Next
    Dim srs As Series
    For Each srs In .SeriesCollection
      ' Convert X Values to arrays of values
      srs.XValues = srs.XValues
      ' Convert Y Values to arrays of values
      srs.Values = srs.Values
      ' Convert series name to text
      srs.Name = srs.Name
    Next srs
  End With
End Sub

And we’ll call it using the familiar entry point:

Sub DelinkSelectedChartsFromData3()
  If Not ActiveChart Is Nothing Then
    DelinkChartFromData3 ActiveChart
  ElseIf TypeName(Selection) = "DrawingObjects" Then
    Dim shp As Shape
    For Each shp In Selection.ShapeRange
      If shp.HasChart Then
        DelinkChartFromData3 shp.Chart
      End If
    Next
  End If
End Sub

Posted: Thursday, June 23rd, 2022 under Data Techniques.
Tags: Chart Data, SERIES Formula.
Comments: 2

Excel Chart Series Size Limits

Monday, March 28, 2022 by Jon Peltier

How many points can I plot in each series of my chart? How large a VBA array can I plot in my chart? Two good questions, which I’ll investigate here.

A colleague emailed me asking about the VBA array size limit for plotting in a chart. He said he thought the limit was 32,000 points, but couldn’t find any official documentation of this, and his trials only worked for half that many points. I couldn’t find any documentation of any limit on how large a VBA array can be used to populate a chart.

tl;dr

The number of points in a chart series populated by worksheet ranges is limited by available memory, as the spec states. This limit can be greater, perhaps much greater, than the number of cells in a worksheet column.

The number of points in a chart series populated by a VBA array is limited to 32,000 if the array is a 2-dimensional vertical array. The limit drops to 16,384 if the array is a 1-dimensional (horizontal) array.

VBA Arrays as Chart Series Data

I’ll start with the VBA question.

If you generate data in VBA using arrays, you can plot this data in two ways:

Put the arrays into a worksheet, and plot the ranges that contain the data;
Put the arrays directly into the chart.

Below is a simple VBA procedure that generates small arrays for the X and Y data, creates a scatter chart with one series, and puts the arrays into the .Values and .XValues properties of the series.

Sub ChartWithVBAArrays()
  ' declare arrays
  Dim X(1 To 10) As Variant
  Dim Y(1 To 10) As Variant
  
  ' populate arrays
  Dim i As Long
  For i = 1 To 10
    X(i) = i
    Y(i) = i
  Next
  
  ' create the chart
  Dim cht As Chart
  Set cht = ActiveSheet.Shapes.AddChart2(240, xlXYScatter).Chart
  
  ' populate the chart
  With cht.SeriesCollection.NewSeries
    .Name = "VBA Arrays"
    .Values = Y
    .XValues = X
  End With
End Sub

The resulting chart (Scatter with Markers and No Lines) looks like this.

Scatter chart with ten points, based on two ten-element VBA arrays.

Here is what the SERIES formula looks like:

=SERIES("VBA Arrays",{1,2,3,4,5,6,7,8,9,10},{1,2,3,4,5,6,7,8,9,10},1)

The arrays are simply shown as comma-separated lists of values, enclosed in curly braces.

Easy enough. Now let’s see how much data we can put into those arrays.

VBA to Explore Series Length

I made two procedures: PlotManyPoints, which takes two arguments (the number of points to try to plot, and the chart to plot them in); and TEST_PlotManyPoints, which pops up an InputBox asking how many points I want to plot and then calls PlotManyPoints.

Sub TEST_PlotManyPoints()
  Dim cht As Chart
  ' in case I forget to select the chart
  If Not ActiveChart Is Nothing Then
    Set cht = ActiveChart
  Else
    Set cht = ActiveSheet.ChartObjects(1).Chart
  End If
  ' ask how many points
  Dim n As Long
  n = Application.InputBox("How many points?", , , , , , , 1)
  ' run the procedure below
  PlotManyPoints n, cht
End Sub

Sub PlotManyPoints(n As Long, cht As Chart)
  Dim x As Variant, y As Variant
  ReDim x(1 To n), y(1 To n)
  ' build arrays
  Dim i As Long
  For i = 1 To n
    x(i) = i
    y(i) = i
  Next
  ' apply arrays to chart
  With cht.SeriesCollection(1)
    .Values = y
    .XValues = x
    Dim Points As Long
    Points = .Points.Count
  End With
  ' report results
  ActiveSheet.Range("B3:B4").Value2 = WorksheetFunction.Transpose(Array(n, Points))
End Sub

The chart was already present, so the program simply replaced the chart’s existing data with the new data, and it recorded in the worksheet how many points were in the arrays and how many points were plotted in the chart.

I started small, with 10 points. Here is the output worksheet with an XY Scatter Chart with Lines and No Markers.

Output of VBA array testing for 10 points.

Then I tried 100 points, 1000 points, 10,000 points…

Output of VBA array testing for 10,000 points.

A brief aside while I obsess about the SERIES formula…

I’ve written extensively about Excel’s chart SERIES formula.

The chart SERIES formula keeps showing all of the X and Y values in their arrays for longer than I expected. Here is the SERIES formula for 10 points:

=SERIES(,{1,2,3,4,5,6,7,8,9,10},{1,2,3,4,5,6,7,8,9,10},1)

Here it is for 100 points:

=SERIES(,{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100},{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100},1)

Here it is for, well, I won’t show you 1000 points, but the series formula showed all 1000 X and Y values. In fact, the series formula was complete all the way up to 1042 points. The formula was 8221 characters long, slightly more than the stated limit of 8192 in Microsoft’s documentation. If I try to edit the formula and hit Enter, Excel warns me:

Don't try to enter a formula with more than 8192 characters, even if Excel did it first.

When I plot 1039 points, the SERIES formula has 8191 characters. I can edit this formula without changing its length, and Excel will accept the change.
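The character counts above follow from simple arithmetic. Here is a small Python sketch (Python rather than VBA, purely to check the numbers) that assumes the formula has exactly the form shown above for this test: an empty series name, X and Y arrays holding 1 through n, and a plot order of 1. The function names are my own, for illustration only.

```python
def digit_count_total(n: int) -> int:
    """Total number of digits needed to write the integers 1..n."""
    return sum(len(str(k)) for k in range(1, n + 1))


def series_formula_length(n: int) -> int:
    """Length of =SERIES(,{1,...,n},{1,...,n},1) for n points.

    Each array constant is two braces, the digits of 1..n, and n-1 commas.
    The fixed parts are '=SERIES(' (8 characters), the comma after the empty
    name (1), the comma between the two arrays (1), and ',1)' (3).
    """
    array_len = 2 + digit_count_total(n) + (n - 1)
    return 8 + 1 + array_len + 1 + array_len + 3


def build_series_formula(n: int) -> str:
    """Build the formula text itself, to cross-check the arithmetic."""
    values = ",".join(str(k) for k in range(1, n + 1))
    return "=SERIES(,{" + values + "},{" + values + "},1)"


def max_points_within(limit: int = 8192) -> int:
    """Largest n whose complete formula does not exceed the character limit."""
    n = 1
    while series_formula_length(n + 1) <= limit:
        n += 1
    return n


for n in (10, 100, 1039, 1042):
    print(n, series_formula_length(n))
print("max points within 8192 characters:", max_points_within())
```

Under these assumptions the sketch gives 57 characters for 10 points, 8191 for 1039 points, and 8221 for 1042 points, matching what Excel displayed, and 1039 is the largest point count whose full formula stays within 8192 characters.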

Beyond 1042 points, Excel starts leaving off characters in the array of Y values, then of the X values, to keep to the almost-limit of 8221. At 2000 points, the end of the 8221-character formula looks “broken”:

,1852,1853,1854,1855,1856,1857,1858,1859,1},{1,2,3,4,5,6,7,8,9,}

The X values are truncated in the middle of point 1860, the Y values after point 9, and we’ve lost the final plot order argument and closing parenthesis. Despite the lack of fidelity of the series formula display, however, Excel plods, I mean, plots along.

After reopening a workbook that has such “broken” formulas, the formulas can no longer be read. If you select the series, Excel leaves the formula bar blank, and shows this message.

You can still format the series and the rest of the chart. And you can still access all 32,000 data points in VBA using the .XValues and .Values properties of the series object.

In the old days (Excel 2003), the limit on the SERIES formula was 1024 characters, each of the four arguments was limited to one-fourth of this, and if the SERIES formula was “broken” by having too many characters, Excel would refuse to plot anything. The limit using this methodology was 81 points.

Excel 2003 had a much lower limit on VBA arrays in chart series. Also, it was way uglier.

If your data had more than one or two digits, the limit quickly became smaller. Try more than the limit and Excel gave you a nasty error.

Excel 2003 wouldn't even try to overload the SERIES formula.

Can you believe I opened up Excel 2003 to try this? I keep a virtual machine around with Excel 2003 just to try stuff like this, because I sure can’t rely on my memory. And by the way, get offa my lawn!

Back to the main topic

When I tried 100,000 points, I hit another limit: the arrays contained 100,000 points, but the chart only accepted 16,384 of them.

Output of VBA array testing for 100,000 points: only 16,384 were plotted.

So my result confirms what my colleague told me: a 1-dimensional VBA array can populate far fewer than the 32,000 points that we unofficially think is the limit.

But that number 16,384 is not a nice round number like 32,000. In fact, it looks like a power of 2. It happens to be 2^14, which is also the number of columns in an Excel worksheet.

A one-dimensional array such as I used in this code is a horizontal array. Perhaps the chart can only accept a horizontal array with as many values as there are columns.

Smarter VBA to Explore Series Length

We can make a two-dimensional array in VBA readily enough. I’ve modified my earlier routine PlotManyPoints to do just that. My arrays x and y are dimensioned with 1 to n rows and 1 to 1 columns.

Sub PlotManyPoints(n As Long, cht As Chart)
  Dim x As Variant, y As Variant
  ' 2-dimensional vertical arrays
  ReDim x(1 To n, 1 To 1), y(1 To n, 1 To 1)
  ' build arrays
  Dim i As Long
  For i = 1 To n
    x(i, 1) = i
    y(i, 1) = i
  Next
  ' apply arrays to chart
  With cht.SeriesCollection(1)
    .Values = y
    .XValues = x
    Dim Points As Long
    Points = .Points.Count
  End With
  ' report results
  ActiveSheet.Range("B3:B4").Value2 = WorksheetFunction.Transpose(Array(n, Points))
End Sub

I ran this for 10, 100, 1000, 10,000, and finally 100,000 points, and I hit the limit of 32,000 points when I tried to add 100,000.

Output of vertical VBA array testing for 100,000 points: only 32,000 were plotted.

So 32,000 points is the limit for VBA arrays as chart series source data. If you need more, you could do one of two things:

Put the longer arrays into a worksheet, and plot the ranges that contain this data;
Break the arrays into smaller 32,000-element arrays, and use these smaller arrays to populate separate series in the chart. Even with the limit of 255 series per chart (which has not changed), you’re allowed 8,160,000 points.
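The bookkeeping for the second option is simple. Here is a minimal Python sketch of it, assuming the 32,000-point and 255-series limits described above; `split_into_series` and the constant names are hypothetical, and the code only models the chunking without talking to Excel. In VBA you would assign each chunk to the .XValues and .Values of a new series, preferably as 2-dimensional vertical arrays as in the revised PlotManyPoints.

```python
EXCEL_VBA_ARRAY_LIMIT = 32_000   # per-series limit for VBA arrays observed above
MAX_SERIES_PER_CHART = 255       # series-per-chart limit cited above


def split_into_series(xs, ys, limit=EXCEL_VBA_ARRAY_LIMIT):
    """Split paired X and Y sequences into chunks of at most `limit` points.

    Each (x_chunk, y_chunk) pair would become one chart series.
    """
    if len(xs) != len(ys):
        raise ValueError("X and Y must have the same length")
    chunks = [
        (xs[start:start + limit], ys[start:start + limit])
        for start in range(0, len(xs), limit)
    ]
    if len(chunks) > MAX_SERIES_PER_CHART:
        raise ValueError("too many points for one chart at this chunk size")
    return chunks


# 100,000 points need four series: three of 32,000 and one of 4,000
xs = list(range(1, 100_001))
ys = xs[:]
series = split_into_series(xs, ys)
print([len(x) for x, _ in series])
print(MAX_SERIES_PER_CHART * EXCEL_VBA_ARRAY_LIMIT)  # 8160000 points at most
```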

Where Does 32,000 Come From?

So where does this VBA array limit of 32,000 come from? Microsoft’s own Excel specifications and limits documentation states that Excel 2007 had a limit of 32,000 points per chart series, but that the limit for Excel 2010 and later is “Limited by available memory”. I can’t find anything other than hearsay and speculation about the limit on VBA arrays.

I suspect that somewhere in Excel’s source code there is still a hard-coded limit of 32,000 points per chart series that nobody can find. This limit is ignored for worksheet data, but is still enforced, intentionally or otherwise, for VBA data.

File Sizes for 32,000 Points

One reason people cite for using VBA arrays instead of worksheet ranges is that the arrays result in smaller workbooks. I made four workbooks to investigate this. One workbook contained data for 32,000 points, but no chart. One workbook contained the data and a chart that plotted the data. One workbook contained no data but had a chart that was copied from one of the first two workbooks, so it was still linked to the workbook with the data. The last workbook had no data, only a chart populated with 32,000-row VBA arrays. The results are summarized in the table below.

The workbook containing only the data needs 413 KB of storage. Add a chart, and the workbook increases to 721 KB. Interestingly enough, if the chart is alone in the workbook and links to a different workbook, the file size is about the same as if its parent workbook contained the data, 724 KB. Most interesting, the workbook with only the VBA-populated chart, with its “broken” and hidden SERIES formula, was smallest of all, at 316 KB.

So the conventional wisdom is correct, and a VBA-populated chart requires a smaller workbook than a chart that links to worksheet data. In any case, these are not enormous files, and I feel safer with the data accessible in the worksheet rather than hidden by VBA.

Worksheet Ranges as Chart Series Data

What does “Limited by available memory” mean?

To finish my analysis, I thought it would be informative to see how many regular worksheet data points I could squeeze into a chart series.

I set up data in columns A and B of a worksheet that start with 1 in row 1 and continue down to 1,048,576 in row 1,048,576 (the last row in the worksheet). I plotted this in an XY chart (line and no markers) and got all of the points into the chart.

Scatter chart containing data for all 1,048,576 rows of the worksheet.

Note that hard-coded values work much better than formulas in these million-plus-row ranges. It takes a long time to calculate so many cells.

So the limit appears to be 1,048,576 points.

Multiple-Area Worksheet Ranges as Chart Series Data

Then I thought it might be possible to extend this. Let’s examine the following ranges and charts, and apply some SERIES formula magic.

I can plot A1:B4, shown in the first chart below.

Data and charts showing one series, two series, and one series combining both sets of data.

The SERIES formula looks like this:

=SERIES(Sheet1!$B$1,Sheet1!$A$2:$A$4,Sheet1!$B$2:$B$4,1)

I can add the data in D1:E4 as a second series (orange) in the middle chart, with the following formula:

=SERIES(Sheet1!$E$1,Sheet1!$D$2:$D$4,Sheet1!$E$2:$E$4,2)

But I can also combine the X and Y values to create a single series using both sets of data, as shown in this series formula and the third chart below:

=SERIES("A,D vs B,E",(Sheet1!$A$2:$A$4,Sheet1!$D$2:$D$4),(Sheet1!$B$2:$B$4,Sheet1!$E$2:$E$4),1)

The X values are Sheet1!$A$2:$A$4 and Sheet1!$D$2:$D$4, comma-separated and enclosed in parentheses. Likewise, the Y values are Sheet1!$B$2:$B$4 and Sheet1!$E$2:$E$4, also comma-separated and enclosed in parentheses.
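The parentheses are what let Excel tell one area from the next, since the commas inside them would otherwise look like argument separators. As an illustration of how the formula text breaks apart, here is a minimal Python sketch (my own parser, not anything Excel provides) that splits a SERIES formula at top-level commas, ignoring commas inside parentheses, array-constant braces, and quoted names, and then splits a multi-area argument into its areas. It assumes well-formed formula text.

```python
def split_top_level(text: str) -> list[str]:
    """Split text at commas that are not inside (), {} or quotes."""
    parts, current = [], []
    depth = 0      # nesting depth of ( ) and { }
    quote = None   # the quote character we are inside, if any
    for ch in text:
        if quote:
            if ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch in "({":
            depth += 1
        elif ch in ")}":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
            continue
        current.append(ch)
    parts.append("".join(current))
    return parts


def series_arguments(formula: str) -> list[str]:
    """Return the arguments of a =SERIES(...) formula as text."""
    prefix = "=SERIES("
    if not (formula.startswith(prefix) and formula.endswith(")")):
        raise ValueError("not a complete SERIES formula")
    return split_top_level(formula[len(prefix):-1])


def areas(argument: str) -> list[str]:
    """Split a multi-area argument like (A,B) into its separate areas."""
    if argument.startswith("(") and argument.endswith(")"):
        return split_top_level(argument[1:-1])
    return [argument]


f = ('=SERIES("A,D vs B,E",(Sheet1!$A$2:$A$4,Sheet1!$D$2:$D$4),'
     '(Sheet1!$B$2:$B$4,Sheet1!$E$2:$E$4),1)')
name, x_arg, y_arg, order = series_arguments(f)
print(areas(x_arg))  # ['Sheet1!$A$2:$A$4', 'Sheet1!$D$2:$D$4']
```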

Let’s add more full columns of X and Y data.

Data for scatter chart of two series, both using all 1,048,576 rows of the worksheet.

I’ve already plotted the first series above with this formula:

=SERIES("A vs B",Sheet1!$A:$A,Sheet1!$B:$B,1)

I can add another series with the data in columns D and E with this formula:

=SERIES("D vs E",Sheet1!$D:$D,Sheet1!$E:$E,2)

Which produces this chart:

Scatter chart of two series, both using all 1,048,576 rows of the worksheet.

But as I’ve shown above, I can combine the two sets of X and Y data into one series’ worth of X and Y data, using this formula:

=SERIES("A,D vs B,E",(Sheet1!$A:$A,Sheet1!$D:$D),(Sheet1!$B:$B,Sheet1!$E:$E),1)

When I first entered this SERIES formula, I got an error, and I thought that I’d found a hard limit. But I went back and fixed a typo, and it all worked. I ended up with this chart:

Scatter chart of one series, using all 1,048,576 rows of the worksheet twice.

So I don’t know what the limit is, except that it is in fact “Limited by available memory,” and this limit is more than the number of rows in a worksheet column. I leave it as an exercise for the reader to determine how many full columns they can squeeze into the SERIES formula.

When the data consists of static values in cells, it isn’t too slow to build and display these charts. If you have formulas that calculate the X and Y values, though, this calculation takes a minute or more each time, and it becomes very tedious while you wait for Excel to start responding again.

File Sizes for Whole Columns of Points

I tabulated file sizes for 32,000 points above to show that populating charts with VBA arrays can reduce file sizes. For completeness, here are the file sizes needed for charting full columns of data. These are large files. I’ll let you draw any further conclusions.

Posted: Monday, March 28th, 2022 under Data Techniques.
Tags: Chart Data, SERIES Formula, VBA.
Comments: 2

Good Chart Data

Thursday, November 18, 2021 by Jon Peltier 7 Comments

Tips and Tricks

Everybody loves their Excel tips and tricks. I know I’m famous for my extensive collection of Excel charting tricks. But the most important charting trick is Get The Data Right. The secret of successful Excel charting begins and ends with Good Chart Data. I teach it in all of my charting workshops and seminars, so I was surprised that I did not have a blog post dedicated to good chart data. But now I do.

Good Chart Data

What do I mean by “good” data? Of course, “good” has to do with the quality of the data. Where is the data from? How was the data measured or collected? Is the source reliable and trustworthy?

All of this is important, for any data you consume in Excel or in any other program. But right here, “good” data means data that can easily be rendered in a chart, without having to make excessive adjustments to the chart. Good chart data may not be good display data, but data that has been optimized for display, say in that annual shareholders’ report, is almost guaranteed to be bad for charting.

“Good” data has a layout that makes chart creation easy for you, so Excel knows how to partition the data between the important parts of the chart. X values or categories. Y values. Series names.

“Good” data also has as little formatting as possible. Enough formatting to make it readable, but not so much that it causes retina pain.

You can make good charts from bad data, but it will take longer, and you will have to muck around with the chart for a while to make it work. It’s better to spend five minutes with the data now than to spend five hours trying to clean up a chart later.

Tl;dr Good Chart Data

Good chart data has the following characteristics:

The data exists in a contiguous range: no blank rows or columns
The data is minimally and consistently formatted: easy to validate visually
The data is aligned with Y values in columns (shaded blue below)
X values are located in the leftmost column (purple)
Series names are located in the topmost row (red)
The top left cell may provide some magical behavior (gold)
The data may exist in an Excel Table

A Good Tabular Display is not Good Chart Data

What if you want to plot your data but also display it in an optimally formatted way? The good news is, you can. The bad news is, it takes more work. You should have two different data ranges: one arranged for best charting outcomes, the other formatted for visual consumption. Both of these ranges should link to the same original source, so that changes are reflected everywhere and you maintain only one version of the data.

Good Chart Data is Contiguous

A good chart data range is contiguous, that is, it consists of data in a single block of cells. There are no completely blank rows or columns separating the data into separate areas.

Why is contiguous data important? Because Excel tries to be smart when you click a button. If you select any single cell in the range shown above and click any chart button to insert a chart, Excel tries to determine the extent of your data. Excel will go from the active cell up, down, left, and right until it finds either the edges of the worksheet or a blank row or column, and it will include all of the data within this range for the chart. This is so much easier than having to select the entire data range, especially if the data extends beyond the first visible screen.

Excel does the same hunt for the current region when you convert a range to a Table, or create a Pivot Table, or do any number of other things in Excel.
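In VBA this block of cells is available as the Range.CurrentRegion property. To make the rule concrete, here is a simplified Python model of the hunt described above. It is not Excel’s implementation, and treating diagonal neighbors as connected is my assumption about how Excel behaves; the grid (with None for empty cells) and the function name are hypothetical.

```python
def current_region(grid, row, col):
    """Approximate Excel's current region around grid[row][col].

    grid is a list of equal-length rows, with None for an empty cell.
    The box keeps growing while any non-empty cell touches one of its
    edges (diagonal neighbors included), so it stops at completely blank
    rows and columns or at the edge of the sheet.
    Returns (top, left, bottom, right) as inclusive 0-based indexes.
    """
    nrows, ncols = len(grid), len(grid[0])
    top = bottom = row
    left = right = col

    def occupied(r0, r1, c0, c1):
        # True if any cell in the rectangle (clipped to the grid) is non-empty
        r0, r1 = max(r0, 0), min(r1, nrows - 1)
        c0, c1 = max(c0, 0), min(c1, ncols - 1)
        return any(grid[r][c] is not None
                   for r in range(r0, r1 + 1)
                   for c in range(c0, c1 + 1))

    grown = True
    while grown:
        grown = False
        if top > 0 and occupied(top - 1, top - 1, left - 1, right + 1):
            top, grown = top - 1, True
        if bottom < nrows - 1 and occupied(bottom + 1, bottom + 1, left - 1, right + 1):
            bottom, grown = bottom + 1, True
        if left > 0 and occupied(top - 1, bottom + 1, left - 1, left - 1):
            left, grown = left - 1, True
        if right < ncols - 1 and occupied(top - 1, bottom + 1, right + 1, right + 1):
            right, grown = right + 1, True
    return top, left, bottom, right


sheet = [
    ["TLC",   "alpha", "beta", None, "notes"],
    ["a",     1.2,     3.4,    None, None],
    ["b",     2.3,     None,   None, None],
    [None,    None,    None,   None, None],
    ["total", 3.5,     3.4,    None, None],
]
print(current_region(sheet, 1, 1))  # (0, 0, 2, 2)
```

The blank row 3 keeps the “total” row out of the region, and the blank column 3 keeps the “notes” cell out, which is exactly why a blank row or column splits your data into separate chart ranges.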

What About Blank Cells?

It’s perfectly fine to have blank cells within the data range, as long as there are no completely blank rows or columns.

Keep in mind that successful charting means that these “blank” cells should be completely empty. A cell with a formula that returns “” is not blank, because it contains a formula. If you have the formula return NA() instead of “”, the cell will not look blank; it will contain the #N/A error. This looks ugly, but a chart treats most #N/A values as blank cells, while it treats “” as text. Text might be plotted as a zero, or it might mess up an axis.

A cell that contains a space character is also not blank, because it contains that space character. Sometimes people get in the habit of typing the space instead of just skipping the cell. This kind of non-blank cell is hard to fix because both the cell and the formula bar appear blank.
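Because these cells look empty on screen, it helps to search for them rather than hunt by eye. Here is a minimal Python sketch, assuming the cell values have already been read into a list with None standing for a truly empty cell; the function name and sample values are hypothetical.

```python
def fake_blank_positions(column):
    """Return indexes of cells that look blank but are not empty.

    Assumes a truly empty cell is represented as None. A zero-length
    string (such as a formula result of "") or a string of spaces is
    text, so a chart treats it as text rather than as a gap.
    """
    return [
        i for i, value in enumerate(column)
        if isinstance(value, str) and value.strip() == ""
    ]


column = [1.5, None, "", " ", 2.0, "  "]
print(fake_blank_positions(column))  # [2, 3, 5]
```

The None at index 1 is a genuine blank; indexes 2, 3, and 5 are the empty-string result and the two space-only entries.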

Good Chart Data is Lightly Formatted

You should only use as much formatting as you need to help you quickly scan the data.

The range below is a bit over-formatted. The header row is bold and multi-colored, and the red and yellow are distracting and may eventually cause eyestrain. Bold text and light gray fills might be okay for headers, though they are unnecessary if the headers follow one or more blank rows.

The dark borders on all of the cells break up the appearance of the data range and the dark lines compete for attention with the numbers. The default light gray cell borders are sufficient. I’ve programmed some buttons in my general purpose and charting software to apply light and medium gray borders to selected ranges.

The green shading can be distracting. (See my arbitrary pattern? All cells with 3 are shaded, while b and d are shaded because they have no 3 in their rows. I don’t know why.) If you find shading helps to identify certain values, don’t manually color the cells: use Conditional Formatting instead, so the shading goes away if the condition is no longer met.

Good Chart Data is Not Centered

You should avoid centering your data. It might look “nice”, but centering hides important characteristics of the data. By default, text is left-aligned in cells and numbers are right-aligned. A common feature of “bad” data is numbers stored as text, and centering everything hides this distinction.

Here is a data range with all of its cells centered. There are some small triangular flags in some cells that indicate a possible error, but I think most of us have learned to ignore warnings like this.

When we uncenter the data, it’s easy to detect the cells with numbers stored as text, even if we’ve ignored the green flags.

We can select the flagged cells, click the little warning dropdown, and convert them to numbers.
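The same check can be done outside Excel. This Python sketch, with a hypothetical function name and sample values, flags text entries that parse as numbers, which is roughly what Excel’s green error flags mark. The parsing rule is an approximation, not Excel’s own.

```python
import math


def numbers_stored_as_text(column):
    """Return (index, number) for each text entry that parses as a number.

    Real numbers and ordinary labels are skipped. Python's float() only
    approximates Excel's rules: it rejects thousands separators and
    currency symbols, for example, and accepts some forms Excel might not.
    """
    found = []
    for i, value in enumerate(column):
        if not isinstance(value, str):
            continue  # a real number, or an empty cell (None)
        try:
            number = float(value)
        except ValueError:
            continue  # ordinary text such as a label
        if math.isfinite(number):  # skip words like "nan" and "inf"
            found.append((i, number))
    return found


beta = [3.4, "2.75", 3.1, "3", "n/a"]
print(numbers_stored_as_text(beta))  # [(1, 2.75), (3, 3.0)]
```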

Now that we’ve converted text to numbers, we still have difficulty parsing the numbers, because they have inconsistent numbers of decimal digits. Some numbers look larger than others, but they merely have a longer tail after the decimal point.

When we apply a consistent number format, we can tell at a glance which numbers are larger: they are the ones that look longer.

It’s now very easy to identify the largest numbers in each column. You might want to check that value of 345 in the “beta” column since it’s 100 times as large as the rest of the column.

Good Chart Data is in Columns

Excel charts can work with data in columns or in rows. You can use either arrangement and sometimes one just works better than the other. If a chart’s source data has more rows than columns, Excel creates the chart with series in columns. A pivot chart always plots the pivot table with series in columns.

It is probably good practice to get used to using data in columns, because of the way a database table is structured. A database has fields and records. A field is a variable or measurement, such as date, eye color, serial number. A record is a single instance of a set of values for these fields.

When printed on paper or viewed onscreen (or imported into Excel), a database table is shown as a grid of rows and columns. Each column is a field or a variable, and each row is a record. The first row of the database table is a header row that contains the field label. Usually one column of the table, often the first column, contains a unique identifier or key for that particular record.

In the table above, we have fields for TLC, alpha, beta, and gamma. We have records for a, b, c, d, e, and f.

Good Chart Data Has a Header Row

Like a database table, chart data should have a header row. When data is plotted with series in columns, this header row is used for series names, that is, the labels that appear in the legend of the chart.

Database tables have only one header row. Chart data ranges usually have only one header row as well, but you can use more, often to good effect.

Good Chart Data Has no Subtotal or Total Rows

Subtotals and totals help to understand data, but they have no place in the source data for a chart. Suppose I have monthly values I want to plot.

If I put subtotal rows into my source data, it breaks up the visual appearance, so it’s hard to scan the individual values for discrepancies.

The quarterly subtotals disrupt the flow of data in the chart, and the much larger magnitude of the subtotals shrinks the monthly values.

Chart Monthly Values and Quarterly Subtotals

If I include yearly totals, the quarterly subtotals shrink, and the monthly values are lost in the weeds.

Chart Monthly Values, Quarterly Subtotals, and Yearly Totals

Note that you can filter the quarterly and yearly categories from the chart in recent versions of Excel. Also note that a pivot chart will only show values from the pivot table and not totals and subtotals.

Good Chart Data Has a Header Column

Just as a database table has a unique key field, the data range for a chart should have a header column identifying each row. This column is generally used for labels or values which are plotted along the X-axis of a chart.

This X-axis column should be the first column, to the left of any Y-axis values. This makes charting easier because Excel looks to the left for X values. But you would be amazed at how many people have trouble plotting their data when they have placed their Y values to the left of their X values. You can always tell Excel which data comes from where, but it is a lot more work, especially if you have to do it repeatedly, again and again, ad nauseam (can you feel the tedium?).

Another trick to help Excel identify your X values is to make the column of X values different from the Y values.

First Column Different: Text

Probably the most common way to make the first column different is by filling it with text. While month names are a component of a date, a list of month names is text, as shown here.

When you select the data range (or one cell inside the data range), Excel uses the text labels in the first column as X-axis (category axis) labels, it uses the other columns as Y-axis (value axis) values, one column per series, and it uses the labels in the header row as series names. Here are line and column charts using this data (and area charts work the same way). Months from the first column are automatically placed along the category axis, headers from the first row are automatically used as legend entries (series names), and each column of Y values is plotted as a distinctly formatted series.

Line and Column Charts Made from Data with Text in First Column

It’s the same with a bar chart, except that the category and value axes are switched. That’s right, in a bar chart, the X-axis is vertical and the Y-axis is horizontal. But the origin of the axes is at the bottom left in all of these charts, so values increase from left to right (like months advance from left to right in a line or column chart). Similarly, months advance from Jan to Jun moving bottom to top in a bar chart (like values increase from bottom to top in a line or column chart).

Bar Chart Made from Data with Text in First Column

Confused? Since the months are listed from top to bottom in the worksheet, it would make sense to plot them from top to bottom in the chart. But there is logic in how Excel does it, and it is also pretty easy to fix when you know how. See Excel Plotted My Bar Chart Upside-Down for the simple technique.

XY Scatter charts are different: they have numerical axes for both X and Y (category and value) axes. When you plug in text for the X values, the chart doesn’t know what to do. Normally text is considered to have a value of zero, but for X values in an XY chart, Excel substitutes the counting numbers 1, 2, 3, up to the number of points. (Excel also does this if no X values have been specified.)

XY Scatter Chart Made from Data with Text in First Column

What if you ever have an XY chart with the numbers 1, 2, 3, etc. along the axis, and you know you selected numerical data for the X values? Check the range of X values: there is probably a number stored as text, or an actual text label, somewhere in the range.
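Here is that rule written as a small Python model, to make the failure mode concrete. It assumes, as the paragraph above suggests, that a single text entry anywhere in the X range is enough to trigger the substitution; the function name and sample years are hypothetical.

```python
def plotted_x_values(x_values):
    """Model of the X values an XY chart ends up using for one series.

    If every X value is a number, the values are used as-is. If any X value
    is text (a label, or a number stored as text), the model substitutes the
    counting numbers 1, 2, 3, ... for the whole series.
    """
    if all(isinstance(x, (int, float)) for x in x_values):
        return list(x_values)
    return list(range(1, len(x_values) + 1))


print(plotted_x_values([2015, 2016, 2017, 2018]))    # [2015, 2016, 2017, 2018]
print(plotted_x_values([2015, "2016", 2017, 2018]))  # [1, 2, 3, 4]
```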

First Column Different: Dates

Using dates in the first column is another way to make the first column different. Excel recognizes the date formatting of the cell and parses the column as X values. An added bonus is that line, column, area, and bar charts have a special date type of axis (as opposed to the text axis shown above) that provides unique formatting options. This range has dates in the first column; notice that the dates are not uniformly spaced, but are taken on the 1st and 8th of each month.

Here are line and column charts made from this data. The line chart looks great and shows some of the enhanced date axis features. The data points are not equally spaced but reflect the non-uniform spacing of the data. Also, there is a tick mark and axis label on the first of each month, despite the nonuniform month lengths.

Line and Column Charts Made from Data with Dates in First Column

The column chart is wacky though. To accommodate the nonuniform data spacing, the chart has a slot for each day, even days without data. So each column chart series has to appear within the slot for its given day, and there are lots of days in between. This column chart should really be called a toothpick chart.
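A rough count of slots shows why the columns get so thin. This Python sketch models the two axis types under stated assumptions (one slot per point on a text axis, one slot per calendar day on a date axis with a base unit of days); the dates are hypothetical readings on the 1st and 8th of each month, not the data in the chart above, and the function name is my own.

```python
from datetime import date


def category_slots(dates, axis_type):
    """Count the slots a column chart lays out along its category axis.

    Model: a text axis has one slot per data point; a date axis with a base
    unit of days has one slot per calendar day from the first date to the last.
    """
    if axis_type == "text":
        return len(dates)
    if axis_type == "date":
        return (max(dates) - min(dates)).days + 1
    raise ValueError("axis_type must be 'text' or 'date'")


# Hypothetical data: points on the 1st and 8th of each month, January to June 2021
dates = [date(2021, month, day) for month in range(1, 7) for day in (1, 8)]
for axis in ("text", "date"):
    slots = category_slots(dates, axis)
    print(f"{axis} axis: {slots} slots, each {1 / slots:.1%} of the axis length")
```

For these dates the text axis has 12 slots while the date axis has 159, so each column gets roughly a thirteenth of the room it would get on a text axis.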

You can always change the axis type to text, and the column chart will look normal. But you lose the non-uniform nature of the dates and the first-of-the-month labels.

Line and Column Charts Made from Data with Dates in First Column, but with Text Axis

An XY scatter chart treats dates for X values just like any numerical X values. You can see the non-uniform spacing of the data.

XY Scatter Chart Made from Data with Dates in First Column

But the axis is not labeled as naturally as the line chart above. Using the value axis algorithm, Excel picked an axis that begins at 44,180 and increases to 44,280 in steps of 20. (Excel stores dates as serial numbers, with 1 representing 1 January 1900, so 44,180 is 15 December 2020 and 44,280 is 25 March 2021.) You can format the XY chart’s axis to begin on the 1st of a month, and have a tick label on the 1st of the next month. But months have different lengths, so for a non-trivial number of months it’s impossible to repeat this pattern with the fixed spacing of Excel’s default axis labels.
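Converting between serial numbers and dates is easy to check in Python. This sketch assumes Excel’s default 1900 date system (not the 1904 system) and whole-number serials; the function names are my own. Counting from 30 December 1899 compensates for Excel treating 1900 as a leap year, so the shortcut is only valid from serial 61 (1 March 1900) on.

```python
from datetime import date, timedelta

# Excel's default 1900 date system treats 1900 as a leap year (it has a
# 29 February 1900), so counting from 30 December 1899 gives the right date
# for every serial number from 61 (1 March 1900) onward.
EXCEL_EPOCH = date(1899, 12, 30)


def excel_serial_to_date(serial: int) -> date:
    """Convert a whole-number Excel date serial (1900 system) to a date."""
    if serial < 61:
        raise ValueError("serials before 1 March 1900 need special handling")
    return EXCEL_EPOCH + timedelta(days=serial)


def date_to_excel_serial(d: date) -> int:
    """Inverse of excel_serial_to_date for dates from 1 March 1900 on."""
    return (d - EXCEL_EPOCH).days


# The XY chart's automatic axis: 44,180 to 44,280 in steps of 20
for serial in range(44180, 44281, 20):
    print(serial, excel_serial_to_date(serial))
```

The automatic tick labels land on 15 December, 4 January, 24 January, 13 February, 5 March, and 25 March, not on the first of each month.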

First Column Different: Numbers

If you have numerical X values in the first column, they won’t be different from the numerical Y values in the other columns. You know those values in the first column are years, and I know they are years, and the header label even says “Year”, but Excel simply recognizes them as numbers.

This doesn’t matter for an XY Scatter chart: Excel almost always treats the first column of numbers as X values. This XY chart shows years along the X-axis, as intended.

XY Scatter Chart Made from Data with Numbers in First Column

When Excel created this line chart, it saw the numbers in the first column and decided to plot them as Y values. There are two consequences of this: First, there is a series of values near 2000 floating far above the intended Y values in the chart; second, no X values were specified, so Excel simply used the counting numbers 1, 2, 3, etc.

Line Chart Made from Data with Numbers in First Column

You could avoid this by selecting just the Y values when the chart is created, and specifying the X values later. With years, you could also enter dates in the first column (1/1/2015, 1/1/2016, etc.) and format them with a custom number format of YYYY, which displays just the year numbers. Since the column is now formatted as dates, Excel will plot them the way you want.

You cannot avoid this by formatting the first column of numbers as text: Excel is too smart, and it converts the numeric text to numbers, and therefore treats them as Y values.

First Column Different: Top Left Cell

You may have noticed the label “TLC” in the header row of the first column. TLC stands for Top Left Cell, and it is a little piece of magic for Excel chart data.

One way to make the first column different from the rest is to clear the contents of the Top Left Cell, as shown below.

Data with Numbers in First Column and Top Left Cell Blank

We already know that an XY Scatter chart will plot the first column of numbers as X values with a label in the top left cell. But it also works if the top left cell is blank.

XY Scatter Chart Made Using Data with Numbers in First Column and Top Left Cell Blank

The top left cell works its magic in line (and area, column, and bar) charts. This line chart was generated automatically with the column below the blank top left cell as X values, and the columns below actual labels as Y values. Finally, a way to plot numbers as X values in a line chart without worrying about formats (text or dates).

Line Chart Made Using Data with Numbers in First Column and Top Left Cell Blank

Here is an important difference between XY scatter charts and line charts. Even though you can trick Excel into using numbers as X-axis values in a line chart, you can’t trick Excel into plotting those X values as numbers. The next section shows a few different sets of data that will help illustrate this difference.

Spacing and Order of X Values: Numbers and Dates

Evenly Spaced Numbers

Evenly spaced X values appear to be plotted similarly in XY and line charts. In the XY chart, the X-axis begins at zero and extends beyond the highest X value. In the line chart, the first X-axis label is the first X value and the last X-axis label is the last X value, without the padding found in a scatter chart’s default axis values.

Unevenly Spaced Numbers

Unevenly spaced X values are plotted differently in XY and line charts. In the XY chart, the X-axis begins at zero and extends beyond the highest X value, and data points are plotted unevenly, reflecting the unevenness in the X values. In the line chart, the first X-axis label is the first X value and the last X-axis label is the last X value, data points and X-axis labels are uniformly spaced regardless of their apparent numerical values, and there are no X-axis labels for missing X values.

It is obvious that the line chart treats the X values as non-numeric text labels, ignoring any apparent numerical values. This is because the default axis for non-date X values is a text axis (below left). We can change this to a date axis (below right), and we see the data points are now plotted according to their unevenly-spaced numerical values. The first X-axis label is still the first X value and the last X-axis label is still the last X value. (Excel treats the numbers as dates with a format of “D”, so only the day shows.)

Different Treatment of Unevenly Spaced Numbers by Text Axis and Date Axis in a Line Chart
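The difference between the two axis types comes down to where each point lands. Here is a minimal Python model of that. It assumes one slot per point on the text axis, and one axis unit per unit of X on the date axis (as with a date axis whose base unit is days). The function names are hypothetical and this is not Excel code.

```python
from typing import List

def text_axis_positions(x_values: List[float]) -> List[int]:
    """Text axis: the n-th point goes in the n-th slot, whatever its value."""
    return list(range(1, len(x_values) + 1))

def date_axis_positions(x_values: List[float]) -> List[float]:
    """Date axis: each point sits at its numeric value, measured from the minimum."""
    low = min(x_values)
    return [x - low for x in x_values]

def gaps(positions: List[float]) -> List[float]:
    """Distance between consecutive points along the axis."""
    return [b - a for a, b in zip(positions, positions[1:])]

x = [1, 2, 4, 7, 8]  # made-up, unevenly spaced X values
print(gaps(text_axis_positions(x)))  # [1, 1, 1, 1]
print(gaps(date_axis_positions(x)))  # [1, 2, 3, 1]
```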

Numbers Out of Order

When numbers are out of order, the two chart types differ even more. The XY scatter chart draws the points and connects them in worksheet order, moving left or right as the X values decrease or increase. The line chart shows the points in the order they appear in the data range: the first X-axis label is still the first X value and the last X-axis label is still the last X value, so the labels are out of order.

When we convert the text axis (below left) to a date axis (below right), we see that not only have the points been spaced according to their non-uniform values, but that the dates have been internally sorted prior to plotting. This internal sorting is a feature of charts with a date axis. The first X-axis label is the smallest X value and the last X-axis label is the largest X value, so the labels are in order and the points are plotted left to right.

Different Treatment of Numbers Out of Order by Text Axis and Date Axis in a Line Chart
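The internal sorting can be sketched the same way. This Python model assumes only what is described above: a text axis draws points in worksheet order, and a date axis sorts them by X value first. The function names and sample points are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (X value, Y value)

def text_axis_plot_order(points: List[Point]) -> List[Point]:
    """Text axis: points are drawn in worksheet order, left to right."""
    return list(points)

def date_axis_plot_order(points: List[Point]) -> List[Point]:
    """Date axis: the points are sorted by X value before they are drawn."""
    return sorted(points, key=lambda p: p[0])

pts = [(3, 30), (1, 10), (4, 40), (2, 20)]  # made-up, X values out of order
print([x for x, _ in text_axis_plot_order(pts)])  # [3, 1, 4, 2]
print([x for x, _ in date_axis_plot_order(pts)])  # [1, 2, 3, 4]
```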

Evenly Spaced Dates

Let’s look at the same charts with dates instead of regular numbers. The XY scatter chart (left) and line chart (right) plot evenly spaced dates along the X axis in a similar way. The data is evenly spaced, so the data points are plotted evenly. The XY chart extends its X-axis a little bit below to a little bit above its data range, while the line chart uses the earliest date as the first axis label and the latest date as the last axis label.

Unevenly Spaced Dates

The XY scatter and line charts also plot unevenly spaced dates similarly. The points are spaced unevenly to reflect the pattern in the data. As before, the XY chart extends its X-axis a little bit below to a little bit above its data range, while the line chart uses the earliest date as the first axis label and the latest date as the last axis label.

Dates Out of Order

Out of order dates are plotted in an XY scatter chart in the order they appear in the worksheet. The lines connecting the points start at the first point, and move left or right for earlier or later dates. In the line chart, the lines always connect from left to right, reflecting the internal sorting that takes place. As always, the XY chart extends its X-axis a little bit below to a little bit above its data range, while the line chart uses the earliest date as the first axis label and the latest date as the last axis label.

Problem with Numbers and Dates Out of Order

Perhaps it’s convenient that Excel’s line charts sort by date prior to plotting, and there are a few tricks that rely on this internal sorting. But this sorting can also lead to problems with data labels.

Here is a simple data set, with dates, sorted values, and days of the week corresponding to the dates. The chart uses dates as X values and sorted values as Y values, and the days of the week were used to label the points, using the Value From Cells option. The labels are shown in order from Monday through Friday, as expected.

Data Labels work nicely when dates are sorted in the worksheet.

Below is a data set, with the same information as above, but with the dates out of order. The chart looks the same as above, since the X and Y values are sorted by date before plotting. But look closely at the data labels. These were not sorted prior to plotting, and are no longer in order from Monday to Friday.

Data Labels don't sort when out-of-order dates are sorted in the chart.

In addition to this problem with data labels, someone using the data for other purposes may misinterpret it because they don’t notice or understand Excel’s internal sorting. For these reasons, it is a best practice to sort the worksheet data before plotting it.

Thanks to alert reader Jim Chisholm for reminding me of this problem.
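Here is a small Python model of the data label problem. It assumes, as described above, that the X and Y values are sorted by date while the label cells are applied to the plotted points in worksheet order. The dates and values are made up, and `worksheet_rows` and `labels_as_plotted` are hypothetical helpers. Sorting the worksheet rows first puts every label back on the right point.

```python
from datetime import date, timedelta

MONDAY = date(2021, 11, 15)  # any Monday works; chosen for illustration

def worksheet_rows(offsets):
    """Made-up rows of (date, value, label), the label being the day name."""
    rows = []
    for k in offsets:
        d = MONDAY + timedelta(days=k)
        rows.append((d, k + 1, d.strftime("%a")))
    return rows

def labels_as_plotted(rows):
    """Model: X/Y pairs are sorted by date, but the label cells are not.

    Returns (day of the plotted point, label drawn on it) for each point.
    """
    plotted = sorted((d, v) for d, v, _ in rows)
    labels = [label for _, _, label in rows]  # still in worksheet order
    return [(d.strftime("%a"), label) for (d, _), label in zip(plotted, labels)]

unsorted_rows = worksheet_rows([2, 0, 4, 1, 3])
print(labels_as_plotted(unsorted_rows))
# [('Mon', 'Wed'), ('Tue', 'Mon'), ('Wed', 'Fri'), ('Thu', 'Tue'), ('Fri', 'Thu')]

# Best practice: sort the worksheet rows (labels included) before plotting.
print(labels_as_plotted(sorted(unsorted_rows)))
# [('Mon', 'Mon'), ('Tue', 'Tue'), ('Wed', 'Wed'), ('Thu', 'Thu'), ('Fri', 'Fri')]
```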

Top Left Cell Plus

We’ve seen how a blank top left cell can help Excel to parse data correctly into X and Y values and series names.

A Blank Top Left Cell Helps Excel Parse the Chart Data Range

But this concept of blank cells is more magical than that. Suppose you have two columns of category labels, for years and quarters. Each of these label columns has a blank cell in its header (blank cells are shaded gold). Each year only appears once, next to the first quarter, and there are blank cells next to the other quarters.

Excel sees the two blanks in the top left, and uses both columns as category labels. There are two rows of labels, with quarter labels in the first row and years in the second. Each year label is centered under the corresponding quarter labels, and a vertical tick mark extends from the axis to help delineate the labels.

Multiple Blank Cells in the Category Labels Region of the Chart Data Range Can Produce Multiple-Tier Labels

This is a very nice effect, only available for line, column, area, and bar charts that are using a text style X-axis. It will not work for a date axis or for an XY scatter chart’s X-axis: in those cases you will get the numbers 1, 2, 3, etc.

You may have seen this kind of data arrangement in a Pivot Table, with multiple fields in the rows area, and noticed the multiple-tier category labels in the corresponding Pivot Chart. But you’re not stuck needing a Pivot Table; you can build this data layout yourself. You are not limited to two tiers of labels, either: below you can see a three-tiered axis, and I’ve seen it used for 5 or 6 tiers.

The Multiple-Tier Axis Effect is not Limited to Two Tiers
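The grouping behind these multi-tier labels can be modeled in a few lines of Python: a truly blank cell in an outer label column means “same group as the row above.” This is a sketch of the behavior described here, not Excel’s code. `outer_tier_groups` is a hypothetical name, `None` stands for a blank cell, and an empty string (like a formula returning “”) is treated as a label rather than a blank.

```python
from typing import Any, List, Optional, Tuple

def outer_tier_groups(outer: List[Optional[Any]]) -> List[Tuple[Any, int]]:
    """Group category rows under the outer-tier label above them.

    A truly blank cell (None) joins the group of the row above it.
    An empty string is NOT blank, so it starts a group of its own.
    Returns (label, number of inner categories it spans) pairs.
    """
    groups: List[Tuple[Any, int]] = []
    for cell in outer:
        if cell is None and groups:
            label, span = groups[-1]
            groups[-1] = (label, span + 1)
        else:
            groups.append((cell, 1))
    return groups

# Made-up years column: each year appears once, next to its first quarter.
years = [2020, None, None, None, 2021, None, None, None]
print(outer_tier_groups(years))  # [(2020, 4), (2021, 4)]
```

Each pair corresponds to one outer label centered under the inner labels it spans. Nesting more columns the same way gives additional tiers.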

What if you indicate to Excel that you have two rows of series names? The same effect applies. The North label is combined with the alpha and beta series names, while the South label is combined with the gamma and delta series names, to generate compound names.

Multiple Blank Cells in the Series Names Region of the Chart Data Range Can Produce Compound Series Names
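A similar Python model covers compound series names. It assumes that a blank cell (`None`) in an upper header row repeats the label to its left, and that the parts are joined with a space. `compound_series_names` is a hypothetical helper, not an Excel function.

```python
from typing import List, Optional

def compound_series_names(rows: List[List[Optional[str]]]) -> List[str]:
    """Combine stacked header rows into one name per series column.

    Model only: in every header row except the last, a blank cell (None)
    repeats the label to its left; the parts are then joined with a space.
    """
    filled = []
    for row in rows[:-1]:
        current = None
        out = []
        for cell in row:
            current = current if cell is None else cell
            out.append(current)
        filled.append(out)
    filled.append(rows[-1])
    return [" ".join(str(p) for p in parts if p is not None) for parts in zip(*filled)]

header = [["North", None, "South", None],
          ["alpha", "beta", "gamma", "delta"]]
print(compound_series_names(header))
# ['North alpha', 'North beta', 'South gamma', 'South delta']
```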

You can combine the effects of multiple-tier category axis labels and compound series names in the same chart, as shown below.

The Multiple-Tier Axis and Compound Series Names Effects Can Be Used Together

Note that these blank cells must be totally blank, and not just look blank like a formula that returns a value of “”. And while a chart can treat #N/A as a blank in a chart’s Y values, #N/A will be treated as a text label, not as a blank, when used in the series name and category labels regions of the data range.

The Select Data Source Dialog Recognizes Good Chart Data

When a chart’s data doesn’t conform to these definitions of “good” data, the Select Data Source dialog shows only a blank for the chart data range. Below the range selection box, you are told “The data range is too complex to be displayed.” This means that the data is irregular. Perhaps not all series start and end at the same row, or the series have different numbers of points. Perhaps the series names are misaligned from the Y values, or the series are plotted out of order. Anything that prevents Excel from describing the data as a nice rectangular block will do it.

Select Data Source Dialog for Bad Chart Data

When the data does conform to our “good” data definition, the Select Data Source dialog happily displays the address of the data range.

Select Data Source Dialog for Good Chart Data

In fact, the Select Data Source dialog is a bit more forgiving than my rules. If the data is separated by completely blank rows or columns, but otherwise fits within a rectangular range, the dialog shows the addresses of the various areas of the data range.

Select Data Source Dialog for Discontiguous but Otherwise Uniform Rectangular Chart Data
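The rules above can be summarized as a simple check. The Python sketch below is a simplified model, not Excel’s actual test. `SeriesRef` and `chart_data_range` are hypothetical. Each series is reduced to its name row, first and last Y rows, and column, and the sketch does not model the tolerance for completely blank rows or columns. It returns the range address when everything lines up, and `None` where the dialog would report that the range is too complex.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SeriesRef:
    """Hypothetical description of one series: name cell row and Y range."""
    name_row: int
    first_row: int
    last_row: int
    column: int  # 1 = column A

def col_letter(n: int) -> str:
    """Convert a 1-based column number to letters (1 -> A, 27 -> AA)."""
    letters = ""
    while n:
        n, r = divmod(n - 1, 26)
        letters = chr(65 + r) + letters
    return letters

def chart_data_range(x_column: int, series: List[SeriesRef]) -> Optional[str]:
    """Return the chart data range address, or None if it is 'too complex'."""
    first = series[0]
    if x_column != first.column - 1:
        return None  # X values not directly left of the first Y column
    for i, s in enumerate(series):
        if (s.first_row, s.last_row) != (first.first_row, first.last_row):
            return None  # series start or end on different rows
        if s.name_row != s.first_row - 1:
            return None  # series name misaligned with its Y values
        if s.column != first.column + i:
            return None  # gap between series, or plotted out of order
    top_left = f"{col_letter(x_column)}{first.name_row}"
    bottom_right = f"{col_letter(series[-1].column)}{first.last_row}"
    return f"{top_left}:{bottom_right}"

good = [SeriesRef(1, 2, 6, c) for c in (2, 3, 4)]
print(chart_data_range(1, good))  # A1:D6
ragged = [SeriesRef(1, 2, 6, 2), SeriesRef(1, 2, 5, 3)]
print(chart_data_range(1, ragged))  # None
```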

Good Chart Data May Be in a Table

An Excel Table is a special data structure which provides advanced data handling capabilities; I have converted my data into a Table below. The header row has buttons for filtering and sorting of the Table. You can add a total row if desired. There are numerous styles you can apply, some of them approaching hideous; the Table below shows the default style.

There are many benefits to storing your chart data in a Table. The major benefit is that when you add data to the row below or the column to the right of a Table, the Table automatically expands to include the added data. What’s more, any formulas that reference a Table column or row will update automatically if the Table expands or contracts. This includes chart SERIES formulas and the Chart Data Range formula in the Select Data Source dialog. So if your chart uses a Table like this for its source data, when you add rows of data, each series in the chart will add the corresponding points; when you add columns to the Table, your chart will add series. Dynamic charts made easy!

One drawback to using a Table for your chart’s source data is that the header row of a Table can contain no blank cells. This means that a lot of the blank-cell based data parsing, especially the top left cell magic, may not work with a Table. But if your first column(s) are not numerical, Excel will still automatically parse them into X values. And you could still select just part of the Table, create your chart, then manually specify the X values.

Posted: Thursday, November 18th, 2021 under Charting Principles.
Tags: Chart Data.