Dynamic Arrays

Multiple Rows or Columns as Chart Series Data

Monday, January 22, 2024 by Jon Peltier
Peltier Technical Services, Inc., Copyright © 2023, All rights reserved.

The Problem: Your Y data is in more than a single row or column.

If you try to populate a chart series with 2D data where it isn’t allowed, you’ll encounter this error:

Chart series data error: must be a single cell, row, or column.

The reference is not valid. References for titles, values, sizes, or data labels must be a single cell, row, or column.

This isn’t strictly true. Any text element in a chart (chart or axis titles, data labels, or shapes) must link to a single cell. The name of a series can link to multiple cells, but the linked range must be contiguous and in a single row or column. The X values of a chart series can be multiple rows or columns, which produce tiered axis labels such as those shown in LAMBDA Function to Build Three-Tier Year-Quarter-Month Category Axis Labels. The Y values of a chart series must link to data in a single row or column.

The Setup: Y data is in multiple rows or columns.

Here is the problem. The data contains more than one row (below left) or column (below right), but we want it plotted as a single series. If you select the data and insert a chart, Excel parses the data into two chart series. The series formulas are shown below the charts, with font colors matching the series colors.

Original data in multiple rows or columns produces charts with multiple series.

Let’s try to fix this. First, delete the second series of the chart.

Now try to enter the larger range into the series formula.

Just try to assign a multiple row or column range to a series, I dare you!

Excel rejects the changed formula, with the error message described earlier.

There is an exception to the single row or column rule for Y values. You can specify compound (multiple-area) ranges for Y values, as shown below for our multiple row and multiple column data ranges. The areas in a compound range don’t even need to be oriented the same way; some can be by row and others by column.

Assign compound data ranges to chart series data.

This works pretty well, but I think it’s pretty difficult to understand and maintain.

TOROW and TOCOL to the rescue!

Microsoft has released a plethora of new Dynamic Array functions. Among these are TOROW and TOCOL, which are used to arrange values in a 2D range into a new 1D range, shown below under the data ranges. TOROW and TOCOL produce ranges with the values in the same order, so whether we use one or the other is a matter of preference. There are two series formulas below each chart, showing the ranges produced by TOROW and TOCOL.

There is a problem, however. The charts don’t look the same for original data in rows vs in columns. This is because both TOROW and TOCOL take all the cells in the first row of the original data and append all cells in each successive row. This causes the data to be out of order when performing TOROW or TOCOL on columnar data. We can fix this by transposing the data first.

And now all of our charts are consistent.
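For readers who find it easier to follow in code, here is a minimal Python sketch (not an Excel formula) of why columnar data comes out in the wrong order and why transposing first fixes it. The function names and the eight sample values are made up for illustration; the only behavior assumed is the one described above, that TOROW and TOCOL read the source row by row.

```python
def flatten_by_row(grid):
    """Read a 2D list row by row, the way TOROW/TOCOL are described above."""
    return [value for row in grid for value in row]


def transpose(grid):
    """Swap rows and columns, standing in for TRANSPOSE."""
    return [list(col) for col in zip(*grid)]


# Hypothetical data: the same eight values laid out two ways.
by_rows = [[1, 2, 3, 4],
           [5, 6, 7, 8]]        # 2 rows x 4 columns
by_cols = transpose(by_rows)    # 4 rows x 2 columns

print(flatten_by_row(by_rows))             # [1, 2, 3, 4, 5, 6, 7, 8]
print(flatten_by_row(by_cols))             # [1, 5, 2, 6, 3, 7, 4, 8]
print(flatten_by_row(transpose(by_cols)))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

The middle line is the out-of-order result; the last line is the transposed version that matches the row-based layout.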

You could also construct more complicated formulas with other Dynamic Array functions. For example, if I wanted to turn a multiple row range into a single row, I would use:

=LAMBDA(x,
  LET(
    rx,ROWS(x),
    cx,COLUMNS(x),
    MAKEARRAY(
      1,rx*cx,
      LAMBDA(r,c,
        INDEX(x,INT((c-1)/cx)+1,MOD(c-1,cx)+1)
      )
    )
  )
)(multi-row range)

To convert a multiple-column range into a single column, I would use:

=LAMBDA(x,
  LET(
    rx,ROWS(x),
    cx,COLUMNS(x),
    MAKEARRAY(
      rx*cx,1,
      LAMBDA(r,c,
        INDEX(x,MOD(r-1,rx)+1,INT((r-1)/rx)+1)
      )
    )
  )
)(multi-column range)
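The INT and MOD arithmetic in these two formulas is easier to check outside Excel. The Python sketch below mirrors the same 1-based index calculations; the function names and sample data are hypothetical, and it is meant only as a way to trace the arithmetic, not as a replacement for the formulas.

```python
def to_single_row(x):
    """Mirror INDEX(x, INT((c-1)/cx)+1, MOD(c-1,cx)+1) for c = 1..rx*cx."""
    rx, cx = len(x), len(x[0])
    out = []
    for c in range(1, rx * cx + 1):      # 1-based counter, like MAKEARRAY
        row = (c - 1) // cx + 1          # INT((c-1)/cx)+1
        col = (c - 1) % cx + 1           # MOD(c-1,cx)+1
        out.append(x[row - 1][col - 1])  # Python lists are 0-based
    return out


def to_single_column(x):
    """Mirror INDEX(x, MOD(r-1,rx)+1, INT((r-1)/rx)+1) for r = 1..rx*cx."""
    rx, cx = len(x), len(x[0])
    out = []
    for r in range(1, rx * cx + 1):
        row = (r - 1) % rx + 1           # MOD(r-1,rx)+1
        col = (r - 1) // rx + 1          # INT((r-1)/rx)+1
        out.append(x[row - 1][col - 1])
    return out


multi_row = [[1, 2, 3, 4], [5, 6, 7, 8]]        # hypothetical 2 x 4 data
multi_col = [[1, 5], [2, 6], [3, 7], [4, 8]]    # same values as 4 x 2
print(to_single_row(multi_row))      # [1, 2, 3, 4, 5, 6, 7, 8]
print(to_single_column(multi_col))   # [1, 2, 3, 4, 5, 6, 7, 8]
```

The first function walks across each row before moving down; the second walks down each column before moving across, which is why it needs no transpose.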

I’m sure people can write more efficient formulas than this, but the TOROW and TOCOL formulas are very concise.

Use Names to keep the worksheet clean

We can implement TOROW and TOCOL in Names rather than in the worksheet, and the Names work just fine in the chart SERIES formulas. Go to Formulas > Define Name; for Name type YrowTOROW; for Scope select the current sheet (Data); and for Refers to enter =TOROW(), put the cursor between the parentheses, and select C2:F3; then press Enter.

The four relevant names are:

Name: YrowTOROW
Refers To: =TOROW(Data!$C$2:$F$3)

Name: YrowTOCOL
Refers To: =TOCOL(Data!$C$2:$F$3)

Name: YcolTOROWT
Refers To: =TOROW(TRANSPOSE(Data!$C$2:$D$5))

Name: YcolTOCOLT
Refers To: =TOCOL(TRANSPOSE(Data!$C$2:$D$5))

These all produce the same values in the same order in either horizontal or vertical arrays. The chart SERIES formulas do not care. Notice that we applied the lesson from before, of transposing the columnar data before using TOROW or TOCOL; I’ve appended a T on these Names.

Same result as before. Using Names keeps the worksheet cleaner, but I don’t mind seeing the actual data I’m plotting in my worksheet.

Neat Trick: Double Unary Minus

Last week I learned a new trick. Well, it was new to me, but apparently it has been around for a long time, predating Dynamic Arrays by decades. My colleague Roberto Mensa was showing me some of his recent charting exercises (check them out at E90E50Charts – Excel Charts Gallery) and he showed me this trick.

You can use a double unary minus, that is, a double minus sign, to force an Excel chart to treat a multiple row or column range as a single array. The double minus is used to convert TRUE and FALSE to 1 and 0, to convert text into numeric values, and in this case to convert a range into an array. When a 2D array is passed to a chart series, it combines all rows of the array into a 1D array.

The double minus must be used in a Name in order to work with chart data. You can define the following names for row-based or column-based data:

Name: YrowMINUS
Refers To: =--Data!$C$2:$F$3

Name: YcolMINUST
Refers To: =--TRANSPOSE(Data!$C$2:$D$5)

Define them in the scope of the worksheet Data, and as before, transpose the columnar data first. Then you can edit the SERIES formulas to use these Names.

This double minus approach doesn’t take precedence over the Names that use TOROW or TOCOL, of course; it’s just another tool for your toolbox.

Posted: Monday, January 22nd, 2024 under Chart Data, Charting Principles, Dynamic Arrays.
Tags: Chart Data, Dynamic Arrays.
Comments: none

Improved Excel Lambda Moving Average

Tuesday, February 28, 2023 by Jon Peltier

I recently described how I built an Excel Lambda Moving Average formula. I started with internet searches, which led to formulas that either didn’t work or were too complicated for me to understand. After a brief and amusing foray into ChatGPT, I built a workable formula that generated a running sum, subtracted an earlier point’s running sum from the current point’s running sum, and divided by the number of points. This worked fine for a simple N-point moving average, but it broke down for N-day moving averages where some days in the data set may be missing. And there are probably better ways to do the moving average without calculating running sums.

Jon’s Previous Moving Average Formula

I developed the following LAMBDA moving average formula, as described in my previous post:

=LAMBDA(datarange,numpoints,
  LET(
    runsum,SCAN(,datarange,LAMBDA(a,b,a+b)),
    BYROW(
      SEQUENCE(ROWS(datarange)),
      LAMBDA(x,
      IF(
        x<numpoints,
        NA(),
        (INDEX(runsum,x)-IF(x=numpoints,0,INDEX(runsum,x-numpoints)))
        /numpoints)
      )
    )
  )
)

Let me partially deconstruct the formula below, showing internal calculations leading to the moving average. The LAMBDA takes the data range and numpoints, the number of points in the moving average, as arguments. The first column contains 25 values in the data range. Column 2 is the element number x of the output array which is created using BYROW. The third column contains the running sum, created using SCAN. The fourth column is the same running sum offset by numpoints. The fifth column shows the moving average calculation, which is the running sum minus the running sum numpoints points ago divided by numpoints. Where x is less than numpoints, the result is #N/A, because there aren’t enough points to divide by numpoints. Where x equals numpoints, zero is subtracted from the running sum, since the running sum of zero points is zero.
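The same running-sum logic can be traced in a few lines of Python. This is only a sketch of the algorithm described above, with a hypothetical function name and made-up data; None stands in for #N/A.

```python
from itertools import accumulate


def moving_average_runsum(data, numpoints):
    """Moving average from running sums; None stands in for #N/A."""
    runsum = list(accumulate(data))          # like SCAN(,datarange,LAMBDA(a,b,a+b))
    out = []
    for x in range(1, len(data) + 1):        # x is the 1-based element number
        if x < numpoints:
            out.append(None)                 # not enough points yet
        else:
            # Running sum numpoints points ago; zero when x equals numpoints.
            earlier = 0 if x == numpoints else runsum[x - numpoints - 1]
            out.append((runsum[x - 1] - earlier) / numpoints)
    return out


print(moving_average_runsum([2, 4, 6, 8, 10, 12], 3))
# [None, None, 4.0, 6.0, 8.0, 10.0]
```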

Here is how it looks in action. The data range is in B5:B29, the number of points being averaged is in C2, and the moving average formula is entered into cell C5 and spills down to C29. The first several calculated values are #N/A, until we have as many points as we are averaging (the number in C2).

Improvement #1: CHOOSEROWS

I exchanged several comments on my earlier post with a smart reader named Henk-Jan van Well, who suggested several improvements to the moving average algorithm, starting with CHOOSEROWS.

CHOOSEROWS allows you to select which rows of an array to use. CHOOSEROWS(array,3) returns the 3rd row of that array. CHOOSEROWS(array,{2,4}) returns the 2nd and 4th rows. Our approach uses SEQUENCE to generate a list of row numbers to return, and we simply use AVERAGE to, well, average these rows.

=LAMBDA(datarange,numpoints,
  MAKEARRAY(
    ROWS(datarange),,
    LAMBDA(r,c,
      IF(
        r<numpoints,
        NA(),
        AVERAGE(
          CHOOSEROWS(
            datarange,
            SEQUENCE(numpoints,,1+r-numpoints)
          )
        )
      )
    )
  )
)

Here is a deconstruction of the formula. Again, the LAMBDA moving average takes the data range and numpoints, the number of points in the moving average, as arguments. The data range is in the first column, the output row number r is in the first row (I’m only showing the first 18 points, but you get the idea). For the first 4 points (less than numpoints), the function returns #N/A. After that, CHOOSEROWS selects the indicated values from the data range. These are averaged in the last row, which is returned by the function.

The calculations and chart look the same as above.

CHOOSEROWS-Based Lambda Moving Average Formula

Improvement #2: Calculate Averages for First Points

Henk-Jan also mentioned an averaging scheme used by some econometricians for points that occur before the official number of points to average. I don’t like that definition of averaging early points, but I have an alternative, which simply averages however many points there are. This actually simplifies our formula.

=LAMBDA(datarange,numpoints,
  MAKEARRAY(
    ROWS(datarange),,
    LAMBDA(
      r,c,
      AVERAGE(
        CHOOSEROWS(
          datarange,
          SEQUENCE(
            MIN(r,numpoints),,
            MAX(1,1+r-numpoints)
          )
        )
      )
    )
  )
)

Here is a deconstruction of the new formula. The LAMBDA takes the data range and numpoints as arguments. The data range is in the first column, the output row number r is in the first row (only showing the first 18 points). CHOOSEROWS selects the indicated values from the data range; note that fewer than numpoints are selected for the first few columns. The selected values are averaged in the last row, which is returned by the function.
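To see which rows the MIN and MAX arguments select, here is a small Python sketch of the same window arithmetic. The function names and sample data are hypothetical; the row numbers are 1-based to match CHOOSEROWS.

```python
def window_rows(r, numpoints):
    """Row numbers from SEQUENCE(MIN(r,numpoints),,MAX(1,1+r-numpoints))."""
    count = min(r, numpoints)
    start = max(1, 1 + r - numpoints)
    return list(range(start, start + count))


def moving_average_partial(data, numpoints):
    """Average however many points exist, up to numpoints, for each row r."""
    out = []
    for r in range(1, len(data) + 1):
        values = [data[i - 1] for i in window_rows(r, numpoints)]
        out.append(sum(values) / len(values))
    return out


print(window_rows(3, 5))    # [1, 2, 3]
print(window_rows(10, 5))   # [6, 7, 8, 9, 10]
print(moving_average_partial([2, 4, 6, 8, 10, 12], 3))
# [2.0, 3.0, 4.0, 6.0, 8.0, 10.0]
```

The first two output values average one and two points respectively, which is the behavior for early points described above.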

The results look the same as before, except that the moving average begins right at the first point.

Compute Averages for Initial Points in Series

Improvement #3: TAKE and DROP

The array functions TAKE and DROP allow you to keep or remove rows and columns from the beginning and end of an array. This is simpler than generating a list of row numbers to pass into CHOOSEROWS to define a moving array to average.

For each row r of the moving average array we create, we TAKE the first r rows, then we DROP all but numpoints from the beginning (and if r is less than numpoints, we don’t DROP any rows).

=LAMBDA(datarange,numpoints,
  MAKEARRAY(ROWS(datarange),1,
    LAMBDA(r,c,
      LET(
        movingdatarange,
        DROP(
          TAKE(datarange,r),
          MAX(0,r-numpoints)
         ),
        AVERAGE(movingdatarange)
      )
    )
  )
)

I’ve only deconstructed a few points below. The LAMBDA moving average takes the data range (which is r rows tall) and numpoints as arguments. The function uses TAKE to take the first r rows, then uses DROP to drop the first r–numpoints rows, or zero if r<numpoints. For point 3, the function takes the first 3 rows, then drops zero rows, and returns the average of these three values. For point 10, the function takes the first 10 rows, drops 5 rows (r–numpoints = 10-5), and averages these 5 (numpoints) rows. For point 23, the function takes the first 23 rows, drops 18 rows (r–numpoints = 23-5), and averages these 5 (numpoints) rows.

The result looks the same, with the moving average starting right at the beginning. The LAMBDA formula may be a bit shorter, but it also seems more reliable when the number of points may vary (as when averaging by dates below).

TAKE- and DROP-Based Lambda Moving Average Formula

An easier alternative is the following, where we TAKE the first r rows for each array element, then TAKE the last numpoints rows of this. If you try to TAKE more than the number of rows in an array, you simply get the entire array.

=LAMBDA(datarange,numpoints,
  MAKEARRAY(ROWS(datarange),1,
    LAMBDA(r,c,
      LET(
        movingdatarange,
        TAKE(
          TAKE(datarange,r),
          -numpoints
         ),
        AVERAGE(movingdatarange)
      )
    )
  )
)

Again I’ve only deconstructed a few points below. The LAMBDA takes the data range (which is r rows tall) and numpoints as arguments. The function uses TAKE to take the first r rows, then uses TAKE again to take the last numpoints rows; if r<numpoints then TAKE only takes as many rows as are available. For point 3, the function takes the first 3 rows, then takes all available rows, and returns the average of these three values. For point 10, the function takes the first 10 rows, takes the last 5 (numpoints) of these rows, and averages these rows. For point 23, the function takes the first 23 rows, then keeps the last 5 (numpoints) of these rows, and averages these rows.

Same results, but the function is a few characters shorter and may be easier to understand.
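Python list slicing happens to behave much like TAKE and DROP, so the two variants can be compared side by side. This is a sketch under that analogy only; the function names are made up, the data is simply the numbers 1 through 25, and numpoints is assumed to be at least 1.

```python
def take_drop_window(data, r, numpoints):
    """TAKE the first r rows, then DROP MAX(0, r-numpoints) from the start."""
    first_r = data[:r]                        # TAKE(datarange, r)
    return first_r[max(0, r - numpoints):]    # DROP(..., MAX(0, r-numpoints))


def take_take_window(data, r, numpoints):
    """TAKE the first r rows, then TAKE the last numpoints of those."""
    first_r = data[:r]                        # TAKE(datarange, r)
    return first_r[-numpoints:]               # TAKE(..., -numpoints); short lists come back whole


data = list(range(1, 26))                     # hypothetical 25 values
print(take_drop_window(data, 3, 5))    # [1, 2, 3]
print(take_drop_window(data, 10, 5))   # [6, 7, 8, 9, 10]
print(take_take_window(data, 23, 5))   # [19, 20, 21, 22, 23]
```

For every row the two functions return the same window, which matches the observation that both formulas give the same results.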

Major Enhancement: Moving Average by Date

If a range has values for all dates, then a moving average by date is the same as a moving average by point. A seven-day moving average is a seven-point moving average. But if some dates are missing, then a seven-point moving average will average points from more than seven days. I want to average only data which falls within a seven-day window, which means averaging fewer than seven points if some days are missing. The screenshot below illustrates the problem. There will be a missing value when a measurement is not made on a given day. Note for example, that 1/9/2023 and 1/10/2023 are missing.

Here is the LAMBDA formula, which takes the dates, the data values to be averaged, and numdays, the number of days to average, as inputs.

=LAMBDA(daterange,datarange,numdays,
  MAKEARRAY(ROWS(datarange),1,
    LAMBDA(r,c,
      LET(
        firstdate,
        XMATCH(
          INDEX(daterange,r)-numdays+1,
          daterange,
          1),
        movingdatarange,
        DROP(
          TAKE(datarange,r),
          MAX(0,firstdate-1)
        ),
        AVERAGE(movingdatarange)
      )
    )
  )
)

In the first section of the deconstruction below, I show the row number (tan-shaded); the end date for that average, which is the date for that row number; the allowed start date, which is (end date - numdays + 1); and the actual start date, which is the earliest date in the data on or after the allowed start date. The blue shaded dates show which rows have different allowed and actual start dates.

In the second section of the deconstruction, I show dates (gold-shaded) and values in the first two columns. In the first row is the row number r of the output array (tan-shaded). The remaining columns show the dates that fall between the allowed start date and end date for each row. The green-shaded columns show which columns contain fewer than numdays dates. Not very many of the rows actually average numdays values.

The LAMBDA moving average formula uses the TAKE/DROP approach from the previous section to average the appropriate values.
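Here is a Python sketch of the same date-window logic, for anyone who wants to trace it by hand. It assumes the dates are sorted in ascending order, so that "first date on or after the allowed start date" matches what XMATCH with match mode 1 returns; the function name, dates, and values are all hypothetical.

```python
from datetime import date, timedelta


def moving_average_by_date(dates, values, numdays):
    """Average the values whose dates fall in the numdays window ending at each row."""
    out = []
    for r in range(1, len(values) + 1):                  # 1-based output row
        allowed_start = dates[r - 1] - timedelta(days=numdays - 1)
        # Position of the first date on or after the allowed start date.
        firstdate = next(i for i, d in enumerate(dates, start=1) if d >= allowed_start)
        window = values[:r][max(0, firstdate - 1):]      # TAKE, then DROP
        out.append(sum(window) / len(window))
    return out


# Hypothetical data with January 4 and 5 missing.
dates = [date(2023, 1, d) for d in (1, 2, 3, 6, 7)]
values = [10, 20, 30, 40, 50]
print(moving_average_by_date(dates, values, 3))
# [10.0, 15.0, 20.0, 40.0, 45.0]
```

With a 3-day window, the fourth row averages only its own value because the two preceding days have no data.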

Below is the output of the formula.

Here is a comparison of a 7-point (orange line) and a 7-day (blue line) moving average. The blue shaded cells show where the two are not the same.

This is the case when I track my weight: while traveling or when I wake up too late or am just too busy in the morning to weigh myself, then I don’t have a row for the date I’ve missed. This formula treats such occasions nicely.

Sample Workbook

You can click on this link to download a workbook that contains these examples: Improved Excel Lambda Moving Average.xlsx.

Excel Lambda Moving Average

Wednesday, February 15, 2023 by Jon Peltier

Moving Averages

When averaging time-series data, you often want to smooth out peaks and valleys. A moving average is an easy way to smooth your data. When I track my weight, for example, I use a 7-day moving average. This smooths out peaks associated with weekends when I might go out to eat and enjoy a beer or two.

The image below shows 25 random data points and a five-point moving average. The points were generated with this Dynamic Array formula in cell B5:

=RANDARRAY(25,,0,10,TRUE)

and the moving average was calculated with this formula in cell C5, filled down to C29:

=IF(
  COUNT(OFFSET($B5,0,0,-$C$2,1))=$C$2,
  AVERAGE(OFFSET($B5,0,0,-$C$2,1)),
  NA()
)

When building large, complicated LAMBDA formulas, it has become common to enhance the readability of the formulas with line feeds (use Alt+Enter to insert a line feed in the Formula Bar) and spaces. I find it helps with older formulas as well.

25 data points and a 5-point moving average

This moving average formula needs to be placed in each row of the moving average range. If it’s in a Table, that’s no big deal, because adding more data will automatically fill the formula into added Table rows.

But the data was the result of a Dynamic Array formula in just cell B5, and the output spilled down as far as the formula required. It would be nice to build a Dynamic Array formula for moving average which is written just in cell C5 but spills down as far as the Dynamic Array it averages.

There are many formulas you can use to calculate a moving average, using variations of INDEX and OFFSET formulas. Incidentally, if you don’t need the moving average values in the worksheet, you can use an Excel chart’s trendline feature to display the moving average.

Internet Search for Dynamic Array Moving Average

I tried my hand at writing my own formula and got stuck almost immediately. I searched Bingle to see what I could find.

I found a lot of possible answers. The ones that seemed easy didn’t work. The ones that worked were very complicated, and I really didn’t understand them very well. I finally settled on one from Lambda Moving Average – calculate rolling sum in Excel (and much more). The original version of this function included a parameter that lets you choose whether to calculate a moving average or other moving statistical functions. I cleaned out all the other functions and was left with the moving average below:

=LAMBDA(x,window,
  LET(
    _x, x,
    _w, window,
    _thk, LAMBDA(x, LAMBDA(x)),
    _fn, _thk(LAMBDA(x, AVERAGE(x))),
    _i, SEQUENCE(ROWS(x)),
    _s, SCAN(
      0,
      _i,
      LAMBDA(a, b, IF(b < _w, NA(), _thk(MAKEARRAY(_w, 1, LAMBDA(r, c, INDEX(_x, b - _w + r))))))
    ),
    _out, SCAN(0, _i, LAMBDA(a, b, _fn()(INDEX(_s, b, 1)()))),
    _out
  ) 
)

As I said, it works fine, but I don’t really understand how it works. I’m reluctant to include it in a project for a client if I don’t grok it, but I’ve implemented it in some of my own workbooks. It seems to run slowly, probably because for each element of the original array, it generates a subset of that array to calculate an average. For an array with hundreds of points, that adds up to hundreds of smaller arrays.

I also went to ChatGPT to see what it could tell me. It showed me lots of code samples and several formulas that resembled Excel formulas. But many of the formulas had errors, and none of the error-free ones returned a moving average.

Running Sums?

I couldn’t wrap my tired, old brain around the algorithm above, but sometimes my brain gets bored, looks out the window, and surprises me with what it comes up with. And my approach isn’t rocket science, but it is less cumbersome than creating multitudes of arrays which all include partial duplicates of the original array’s values. I can calculate the running sum of the original array, subtract the running sum from an earlier row, and divide by the number of points, and I’ll have my moving averages.

First, I’ll show how it works. Here’s my data in the second column below. For reference I’ve inserted a sequence number in the first column. The gold-shaded range in the third column contains the running sum of the data in the second column. The fourth column contains the same shaded running sum, offset by 5 rows so I can compute my 5-point moving average. The first four cells of a 5-point moving average are not calculated; I’ve entered #N/A so they are not plotted.

Delta is the difference between the two running sums, that is, the intermediate moving sum, and Average is Delta divided by 5.

The first 5-point average is calculated for the 5th value, where the running sum is 39: 39 divided by 5 is 7.8.

The second calculation is for the 6th point, where the running sum is 46. But we only want the sum of points 2 through 6, so we subtract the running sum for point 1, which is 8. (46-8)/5 is 7.6. And so on.

Now let’s get it into a single formula.

Building the Formula

The following range illustrates the steps toward building the formula; several steps are just me learning how some new Excel functions work. Column B contains the original formula, and column C is my old-style formula-in-every-cell to calculate the moving average. I’ve hidden columns D:I.

Column J spits out the original data range. This isn’t necessary in the final formula, but I was gaining confidence with BYROW. BYROW works by defining an array, passing it row by row into a LAMBDA function, and building an array of the results of that LAMBDA for each row. The array passed in is a simple SEQUENCE, from 1 to the number of rows in the original data range. Each element of the sequence is passed as x into LAMBDA, which returns the xth element of the data range.

=LET(
  datarange,B5#,
  BYROW(
    SEQUENCE(ROWS(datarange)),
    LAMBDA(x,INDEX(datarange,x))
  )
)

The running sum in column K is easy to calculate with the new Dynamic Array helper functions. Like BYROW, SCAN passes each element of an array into a LAMBDA function, which calculates each element of the output array. Again, this isn’t strictly necessary, but I was learning about SCAN.

The first argument of SCAN is the starting value (zero since it’s missing), the second is the original data range. LAMBDA accepts the starting value a and the data value b then applies the function a+b to generate the output value. This output becomes the new starting value a, which is added to the next data value b, etc.

=LET(
  datarange,B5#,
  SCAN(,datarange,LAMBDA(a,b,a+b))
)
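If SCAN is new to you, this Python sketch shows the accumulate-and-keep-every-step behavior described above. The scan function here is a hypothetical stand-in, not Excel's implementation, and the six data values are made up so that their running sums agree with the three quoted earlier (8 at point 1, 39 at point 5, 46 at point 6).

```python
def scan(initial, array, fn):
    """Apply fn(accumulator, value) across array, keeping every intermediate result."""
    out = []
    acc = initial
    for b in array:
        acc = fn(acc, b)     # the output becomes the next starting value a
        out.append(acc)
    return out


data = [8, 9, 6, 7, 9, 7]    # hypothetical values
runsum = scan(0, data, lambda a, b: a + b)
print(runsum)                          # [8, 17, 23, 30, 39, 46]
print(runsum[4] / 5)                   # 7.8, the first 5-point average
print((runsum[5] - runsum[0]) / 5)     # 7.6, the second 5-point average
```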

The running sum in column L is a bit more convoluted, but it’s leading to my ultimate formula. SCAN is used as above to generate the running sum, but instead of spitting it out into the worksheet, it is stored in the name runsum. Then BYROW is used to return each element of runsum.

=LET(
  datarange,$B$5#,
  runsum,SCAN(,datarange,LAMBDA(a,b,a+b)),
  BYROW(
    SEQUENCE(ROWS(datarange)),
    LAMBDA(x,INDEX(runsum,x))
  )
)

If I can get the xth element of runsum, I can also get the element numpoints before that and subtract it. If x is less than numpoints, this corresponds to an early point which displays #N/A. If x is equal to numpoints, it’s the first calculated value, and there is no running sum to subtract, so I subtract zero. After subtracting to get the intermediate sum, I divide by numpoints to get the average.

=LET(
  datarange,$B$5#,
  numpoints,$C$2,
  runsum,SCAN(,datarange,LAMBDA(a,b,a+b)),
  BYROW(
    SEQUENCE(ROWS(datarange)),
    LAMBDA(x,
    IF(
      x<numpoints,
      NA(),
      (INDEX(runsum,x)-IF(x=numpoints,0,INDEX(runsum,x-numpoints)))
      /numpoints)
    )
  )
)

I can rewrite this as a LAMBDA:

=LAMBDA(datarange,numpoints,
  LET(
    runsum,SCAN(,datarange,LAMBDA(a,b,a+b)),
    BYROW(
      SEQUENCE(ROWS(datarange)),
      LAMBDA(x,
      IF(
        x<numpoints,
        NA(),
        (INDEX(runsum,x)-IF(x=numpoints,0,INDEX(runsum,x-numpoints)))
        /numpoints)
      )
    )
  )
)($B$5#,$C$2)

Note that the LAMBDA formula above ends with ($B$5#,$C$2), which is how these arguments are entered into the formula. To make this a reusable function, copy the formula without these arguments and their parentheses. Go to Formulas tab > Define Name. Enter a function name (MovingAverage in the example below) and short description, paste the formula into the Refers To box, and press Enter.

Using the Define Name method is suboptimal. The entire LAMBDA function cannot be viewed at once, and you lose the line feeds and white space. You could also use the Advanced Formula Environment, a free add-in from Microsoft.

You can now use this formula throughout the workbook using this simple syntax:

=MovingAverage(B5#,C2)

Improvements

I exchanged several comments on my earlier post with a smart reader named Henk-Jan van Well, who suggested several improvements to the moving average algorithm, starting with CHOOSEROWS, then evolving to TAKE and DROP. This led to simplified and more robust formulas, and also to a moving average by date which dealt nicely with missing dates, i.e., it averaged values within seven days rather than averaging seven consecutive values that may span more than seven days. See the improved post at Improved Excel Lambda Moving Average.


Posted: Wednesday, February 15th, 2023 under LAMBDA Function.
Tags: Dynamic Arrays, LAMBDA Function, moving average.
Comments: 10

LAMBDA Function to Build Three-Tier Year-Quarter-Month Category Axis Labels

Tuesday, November 29, 2022 by Jon Peltier

A certain data layout can produce a chart axis which divides and subdivides the categories into logical subcategories, such as years, quarters, and months in the following chart.

Excel Line Chart with Three-Tiered Year-Quarter-Month Category Axis Labels

Generally, this data layout must be produced by hand, because it relies on an arrangement of filled and blank cells to help Excel parse the data into subcategories. In this post I’ll show how a LAMBDA formula can build the range for you.

Bulgaria Excel Days 2022

Earlier this month I had the honor and pleasure of participating in Excel Days 2022, held in Sofia, Bulgaria. On one day I held my Advanced Excel Charting Masterclass for a group of about 25; on another I presented a conference session entitled “The Best Excel Charting Tips and Tricks” to a crowd of 300 or more. That’s me standing in front of the largest LED screen I’ve ever used.

Jon Peltier presenting at Excel Days 2022 in Sofia, Bulgaria.

Chart Tip/Trick: Top-Left Cell

One of the tips was that using a blank top-left cell helps Excel parse the X values, Y values, and series names (highlighted purple, blue, and red respectively in the screenshot below), especially if the X values and series names consist of numbers (years, for example). The top-left cell, or TLC, is filled with light gold.

A blank top-left cell helps Excel parse chart source data

Many people know about this TLC trick.

Chart Tip/Trick: Category Axis Grouping

What I call TLC+ extends this approach to identify multiple columns to use as X values, using blank cells to identify grouping of categories and subcategories. This example groups months into quarters. Again, blank cells are filled with light gold, and the axis categories and subcategories are highlighted with the same color.

Grouped chart axis labels created with a partially populated column

Many people don’t know about grouping axis categories in this way, although they’ve likely seen it in a pivot chart that has multiple fields in the rows area of its pivot table. So this trick was new to much of my audience.

TLC+ doesn’t stop there. The next example shows months grouped into quarters, and quarters grouped into years. I’ve applied the same highlighting as above.

Multiple chart axis grouping levels are possible

This grouping can be extended much farther; the limit is not with Excel’s technical capability but more with the legibility of the output. Once I built a chart with 9 or 10 levels of grouping in its category axis. By this point, the chart consisted of a very wide category axis and very little room for data in the plot area.

Not-Just-For-Charts Tip/Trick: Custom Number Formats for Months

Speaking of legibility, those month abbreviations are difficult to read, since they have been rotated upward. But you can use a one-letter abbreviation for each month, by entering an actual date in the cells, and applying the custom number format below. One “m” in the custom format string results in a one- or two-digit month number; two results in a two-digit month number with a leading zero if necessary. Use three m’s for the three-letter month abbreviation, and four m’s for the full month name. Finally, five m’s will give you just the first initial of each month. This may be confusing if only one or two months are abbreviated this way, since there are multiple months for J, M, and A. But when a whole year of months is abbreviated like this, people recognize the months.

This “mmmmm” trick amazed one audience member so much that she tested whether “ddddd” would produce the one-letter abbreviations of the days of the week. She reported that, sadly, it does not.

This is the same data as above, with one-letter month abbreviations that improve legibility of the axis labels.

Cleaned up month labels in grouped category axis

Most Important Chart Tip/Trick?

Those are three great tips: top-left cell, category label grouping, and the custom number format for one-letter month abbreviations. But what is the most important charting trick?

As I am energetically explaining below, the most important charting trick is to get the data right before you make your chart.

Jon Peltier explaining the need for good chart data at Excel Days 2022 in Sofia, Bulgaria

Dude, Where’s My LAMBDA?

I promised a formula to construct a grouped data range for a chart’s category axis. Here is how I built my LAMBDA.

TEXTSPLIT

My initial approach was to build a string of years, quarters, and months delimited by commas and semicolons, then use the recently introduced TEXTSPLIT function to break that into the required grid.

I typed this long string into cell B2 (note the semicolons before the first quarter and before the first month)

2021,,,,,,,,,,,,2022,,,,,,,,,,,;Q1,,,Q2,,,Q3,,,Q4,,,Q1,,,Q2,,,Q3,,,Q4,,;J,F,M,A,M,J,J,A,S,O,N,D,J,F,M,A,M,J,J,A,S,O,N,D

and this formula in cell B5 produced the grouped dates in B5:D28 (with a blue border intended to resemble the Dynamic Array formula highlighting).

=TRANSPOSE(TEXTSPLIT(B2,",",";",FALSE))

Some people like to use spaces to make formulas easier to read; Excel ignores the spaces.

=TRANSPOSE(TEXTSPLIT(B2 , "," , ";" , FALSE))

Some will even write the formula on multiple lines for clarity, with spaces to provide indentation; Excel ignores the line feeds as well as the spaces. You can press Alt+Enter to insert a line feed within a formula. I’m not sure extra spaces and line feeds are necessary for this relatively simple formula, but they will help in a little while.

=TRANSPOSE(
  TEXTSPLIT(
    B2 ,
    "," ,
    ";" ,
    FALSE
  )
)

The chart was a test that the output would work for my axis groupings.

Grouped labels can be created using TEXTSPLIT
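A shorter string makes it easier to see what TEXTSPLIT and TRANSPOSE are doing. This sketch uses a made-up three-month string typed directly into the formula instead of a cell reference.

```excel
=TRANSPOSE(
  TEXTSPLIT(
    "2021,,;Q1,,;J,F,M" ,
    "," ,
    ";" ,
    FALSE
  )
)
```

TEXTSPLIT breaks the string into three rows (at the semicolons) of three columns each (at the commas). The FALSE argument keeps the empty items between consecutive commas, so the year and quarter appear only once. TRANSPOSE then flips the grid, so the result has one row per month with year, quarter, and month in three columns: 2021, Q1, J in the first row, then two blanks followed by F, then two blanks followed by M.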

My initial thought was to input an initial year and number of years into my LAMBDA, construct the long, delimited string, then TEXTSPLIT it. But I did a little experimenting first.

Build the Pieces

I used the DATE function to produce a list of months. DATE(year,month,day) produces a numerical date for month/day/year (or day/month/year outside the US). To get sequential months, I used the SEQUENCE function in the month argument of the DATE function.

All I needed was the initial month, as shown below. Using year 2021 and months 1 through 12, I get January through December of 2021. Then month 13 spills into the next year, giving me January of 2022. This makes it easier than having to calculate when the year turned over.

The DATE(year,month,day) function for 24 months
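As a minimal sketch of the idea, this single formula spills the first day of 24 consecutive months, starting with January 2021 (the year and the count are example values):

```excel
=DATE(2021,SEQUENCE(24),1)
```

To check the rollover on its own, =TEXT(DATE(2021,13,1),"mmm yyyy") returns Jan 2022 in an English-language version of Excel.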

The screenshot below shows my approach. I listed the months in column B. In column D, I listed the year, but only if the month was January. In column E I listed the quarter, but only for the first month of each quarter. In column F I listed the first initial of each month using the “mmmmm” custom number format.

Column G holds some dummy data, so I could test out a chart, and it worked nicely.

Building the chart axis grouping formula

Thanks to Leonid who pointed out a typo in one of these formulas in the original figure.
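Since the figure is not reproduced here, the following is a sketch of row-by-row formulas that would produce the columns described above. They are my assumption of one workable layout, not necessarily the formulas in the original figure. It assumes the first date is in B2; the three formulas go in D2, E2, and F2 respectively and are filled down, and column F is given the custom number format mmmmm.

```excel
=IF(MONTH(B2)=1,YEAR(B2),"")
=IF(MOD(MONTH(B2),3)=1,"Q"&(MONTH(B2)+2)/3,"")
=B2
```

The MOD test is true only for months 1, 4, 7, and 10, which are the first months of the four quarters.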

Put the Pieces Together

I needed to condense all of this into one formula, so I started with the following LET function. I input the first year and number of years, then calculated my list of dates. Then I did a bit of arithmetic to calculate quarter number from the month: January is quarter number 1, February is 1-1/3, March is 1-2/3, April is 2, etc.

The way I used it, the CHOOSE function’s syntax looks something like this:

CHOOSE({column 1, column 2, column 3},
  formula for column 1,
  formula for column 2,
  formula for column 3)

The CHOOSE function is a great way to build up a multiple-column Dynamic Array.
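Here is a small standalone illustration of that pattern, using made-up numeric columns so the result is easy to check by eye. The first formula uses CHOOSE; the second uses HSTACK, which is available in newer builds of Excel 365 and should give the same result.

```excel
=CHOOSE({1,2,3},SEQUENCE(4),SEQUENCE(4)^2,SEQUENCE(4)^3)
=HSTACK(SEQUENCE(4),SEQUENCE(4)^2,SEQUENCE(4)^3)
```

Each formula spills into four rows and three columns: the numbers 1 through 4, their squares, and their cubes.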

=LET(
  firstyear,2021,
  numyears,2,
  dates,DATE(firstyear,SEQUENCE(12*numyears),1),
  qtr,(MONTH(dates)+2)/3,
  CHOOSE(
    {1,2,3},
    IF(MONTH(dates)=1,TEXT(dates,"yyyy"),""),
    IF(qtr=INT(qtr),"Q"&qtr,""),
    TEXT(dates,"mmmmm")
  )
)

This multi-line, indented formula is much easier to read than the no-white-space version.

This formula produces the desired output.

Output of the chart axis grouping formula
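The quarter arithmetic can be tested in isolation. This sketch replaces the dates with the month numbers 1 through 12 (the name m is mine, used only for this illustration):

```excel
=LET(
  m,SEQUENCE(12),
  qtr,(m+2)/3,
  IF(qtr=INT(qtr),"Q"&qtr,"")
)
```

Only months 1, 4, 7, and 10 give a whole number for qtr, so the formula returns Q1, Q2, Q3, and Q4 in those rows and empty strings in the rest.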

There are a few caveats for my non-US readers. First, the format strings for year (“Y”) and month (“M”) may be different in your regional version of Excel. Second, the argument separator in Excel formulas is a comma (,) for the US, but a semicolon (;) in many non-US editions, and I’ve heard of other characters being used. Third, array structures like {1,2,3} use a comma (,) and semicolon (;) respectively as column and row separators in the US, but in other regions, these may be reversed or even replaced by other characters.

Finally the LAMBDA

I rearranged my LET to produce the following LAMBDA function. When used in a cell like this, the inputs to the LAMBDA are enclosed in parentheses after the LAMBDA’s closing parenthesis. The LAMBDA produces output identical to the LET output above.

=LAMBDA(
  firstyear,numyears,
  LET(
    dates,DATE(firstyear,SEQUENCE(12*numyears),1),
    qtr,(MONTH(dates)+2)/3,
    CHOOSE(
      {1,2,3},
      IF(MONTH(dates)=1,TEXT(dates,"yyyy"),""),
      IF(qtr=INT(qtr),"Q"&qtr,""),
      TEXT(dates,"mmmmm")
    )
  )
)
(2021,2)
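The trailing-parentheses calling convention is easier to see with a trivial LAMBDA. In this sketch, x and y are arbitrary parameter names:

```excel
=LAMBDA(x,y,x+y)(2,3)
```

This returns 5. Entering a LAMBDA in a cell without the trailing inputs results in a #CALC! error, because the function has been defined but not called.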

Here is how to convert the LAMBDA into a reusable user-defined function. Click Define Name on the Formulas tab to open the New Name dialog. Give the function a descriptive name, document it with a comment, then enter the LAMBDA formula (without the trailing inputs) into the Refers To box, and click OK.

To simplify creating, editing, and deploying LAMBDA functions, Microsoft has introduced the Advanced Formula Environment. You can get it from Insert > Get Add-Ins. I have not used it here.

When you start typing the LAMBDA name into a cell, Excel’s IntelliSense suggests the full name of the function, and shows the comment, just like any built-in function.

Press the Tab key and Excel fills in the function name with the opening parenthesis, and shows what arguments are expected.

I’ve used my LAMBDA function in cell B3 below, and the output works just fine in the chart. I’ve hard-coded 2020 and 3 as the starting year and number of years, but I could have linked to cells with these values.
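For illustration, suppose the LAMBDA was saved under the hypothetical name YQMLabels (the actual name appears only in the screenshots), and that cells B1 and C1 hold the starting year and number of years. The hard-coded and cell-linked calls would then look like this:

```excel
=YQMLabels(2020,3)
=YQMLabels(B1,C1)
```

With 2020 and 3 as inputs, the result spills into 36 rows and 3 columns, one row per month.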

TLC Reprise

Here I am showing how a chart looks when the top-left cell is not blank.

Jon Peltier at Excel Days 2022 in Sofia, Bulgaria

Power Query Alternative

My colleague Mark Proctor likes this axis technique, and built a query in Power Query that will take monthly dates and generate the Year-Quarter-Month data structure to produce the tiered axis in his charts. Read about it in his Excel Off The Grid blog in How to create chart data with Power Query.

More LAMBDA Functions

Dynamic Charts Using Dynamic Arrays

Thursday, March 25, 2021 by Jon Peltier

I recently wrote Dynamic Array Histogram, a tutorial showing how to build a histogram with a normal curve overlay. It worked great, except that the chart was hard-coded to the worksheet ranges with the data. When I changed the Dynamic Array formulas, the spill ranges changed size, and I had to manually adjust the chart data ranges. Not too big a deal for just a few series, but still an inconvenience.

Dynamic Array with One Column

Here is what the data looks like. Cell D7 contains the following formula, which spills the numbers from 161 to 175 into the highlighted range D7:D21.

=SEQUENCE(H4+1-G4,1,G4,1)

Histogram data from several Dynamic Arrays

I’ve written many tutorials about Dynamic Charts (see the list at the end of this article). Ordinarily I would generate some Names (aka dynamic range names) using a formula like this

=OFFSET(Sheet1!$D$6,1,0,COUNT(Sheet1!$D$6:$D$100),1)

then add the Names to the chart.

But we don’t need to use an OFFSET or other function to determine the size of the range of data. It’s a Dynamic Array, which we can reference using a hash sign (#): The entire spill range of the Dynamic Array formula in D7 is referenced by D7#.
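The spill reference works in ordinary worksheet formulas too, which is a quick way to confirm what it covers. A minimal sketch, using the spill range in D7 described above:

```excel
=ROWS(D7#)
=MAX(D7#)
```

With the numbers 161 through 175 in the spill range, these return 15 and 175. Both results update automatically when the Dynamic Array changes size.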

We still need to use Names in the dynamic chart, so click Define Name on Excel’s Formulas tab, give our Name a good name (xWeights), for Scope select Sheet1, and use =Sheet1!$D$7# as the Refers To formula. If you click in the Name textbox and back in the Refers To box, the range defined by the formula will be highlighted as shown.

Defining a Name based on a Dynamic Array

Similarly we had Dynamic Array formulas in cells E7 and F7, which spilled into E7# and F7#. We define Names yCount and yCurve using =Sheet1!$E$7# and =Sheet1!$F$7#. These are the Names we will use in our dynamic chart.

Press Ctrl+F3 to open the Name Manager and see all of the Names. As above, clicking in the Refers To box will highlight the range defined by the formula.

A note about naming Names. I use a prefix of x or y for chart data which will be used as X or Y values. I used to use a prefix of cht, but a fluky behavior in Excel 2007 and more recent versions changed that. If the name of a Name begins with R or C (think Row or Column), the name cannot easily be used in the series formula. If your language of Excel is not English, you can’t use the letters that the words for Row and Column begin with in your language.

Start by selecting the range of data (D6:F21), including the header row. Note that the top-left cell D6 is blank, which helps Excel determine that the first column contains X values. Insert a column chart, then right-click on the Curve series, choose Change Series Chart Type, and select a line style.

When the chart is selected, the data range is highlighted.

Select the Count series (the blue columns). The data is highlighted, and this SERIES formula appears in the formula bar.

=SERIES(Sheet1!$E$6,Sheet1!$D$7:$D$21,Sheet1!$E$7:$E$21,1)

This means the series name is in Sheet1!$E$6, the X values are in Sheet1!$D$7:$D$21, the Y values are in Sheet1!$E$7:$E$21, and it is the first series in the chart.

Static chart using Histogram data addresses
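Schematically, the SERIES formula follows the pattern below. The argument names are descriptive placeholders of mine, not Excel syntax, so this line is a reading aid and cannot be entered as written.

```excel
=SERIES(series_name, x_values, y_values, plot_order)
```

Each of the first three arguments can be a fully qualified range address such as Sheet1!$E$7:$E$21 or a Name qualified by its sheet or workbook, which is what makes the substitution in the next step possible.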

Right in the formula bar, change Sheet1!$D$7:$D$21 to Sheet1!xWeights and Sheet1!$E$7:$E$21 to Sheet1!yCount. Press Enter, so the SERIES formula becomes

=SERIES(Sheet1!$E$6,Sheet1!xWeights,Sheet1!yCount,1)

The worksheet data is still highlighted.

Dynamic chart using Histogram data Names

Repeat the procedure with the Curve series, changing the formula from

=SERIES(Sheet1!$F$6,Sheet1!$D$7:$D$21,Sheet1!$F$7:$F$21,2)

to

=SERIES(Sheet1!$F$6,Sheet1!xWeights,Sheet1!yCurve,2)

Dynamic Array with Multiple Columns

In the Dynamic Array Histogram tutorial, I showed how to make a multiple-column Dynamic Array from a single formula. This makes defining our Names more complicated, but only slightly so.

The Dynamic Array formula in cell D4 spills into multiple rows and columns.

The formula =D4# (entered into cell I4) spills into the same size range.

But I can use INDEX to return a portion of the Dynamic Array in D4#. For example, INDEX(D4#,1,1) returns the cell in the first row and first column of D4#.

Referencing one cell of the Dynamic Array

To get the first column, I specify the row as zero, in INDEX(D4#,0,1). I could leave out the zero altogether as long as I have the right number of commas: INDEX(D4#,,1).

Referencing one column of the Dynamic Array

If I want just the first row, I use zero (or a blank) for the column index: INDEX(D4#,1,0) or INDEX(D4#,1,).

Referencing one row of the Dynamic Array

As it turns out, I could use INDEX(D4#,0,0) or INDEX(D4#,,) to reference the entire array.
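To confirm that INDEX returns the slice you expect, you can wrap it in functions that report its size or contents. A minimal sketch, assuming the second column of D4# holds numeric values:

```excel
=ROWS(INDEX(D4#,0,1))
=COLUMNS(INDEX(D4#,1,0))
=SUM(INDEX(D4#,0,2))
```

The first formula returns the number of rows in the spill range, the second returns its number of columns, and the third adds up the second column.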

I define my Names as above, name: xWeights, scope: Sheet2, refers to: =INDEX(Sheet2!$D$4#,0,1).

Defining a Name based on part of a Dynamic Array

Define yCount and yCurve as =INDEX(Sheet2!$D$4#,0,2) and =INDEX(Sheet2!$D$4#,0,3), create and format the chart, and plug in the Names for the hard-coded addresses.
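After the substitution, the two SERIES formulas would look something like this. The series names are typed in as literal text here, which is my assumption for the sketch; in your workbook they may instead be links to header cells.

```excel
=SERIES("Count",Sheet2!xWeights,Sheet2!yCount,1)
=SERIES("Curve",Sheet2!xWeights,Sheet2!yCurve,2)
```

Both series share xWeights for their X values, just as in the single-column example on Sheet1.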

Update: Dynamic Charts Without Names

A year or so after I posted this article, Microsoft released an enhancement to Excel that made Dynamic-Array-driven charts themselves dynamic. If all of the data in the chart comes from a single Dynamic Array formula, the chart’s source data will change size to match the Dynamic Array’s spill range. This means we can select the original Dynamic Array and insert our chart, ignore the need to create Names for the X and Y values, and the chart will dynamically change its source data range as the Dynamic Array changes.

Dynamic Charts

Here are more articles on the Peltier Tech blog that talk about dynamic charts.

More About Dynamic Arrays, LET, and LAMBDA

Posted: Thursday, March 25th, 2021 under Dynamic Arrays, Dynamic Charts.
Tags: Dynamic Arrays, Dynamic Charts.
Comments: 4
