Combination Charts

Pareto Charts in Excel

Thursday, January 25, 2024 by Jon Peltier
Peltier Technical Services, Inc., Copyright © 2023, All rights reserved.

A Pareto Chart is a horizontal or vertical bar chart with its data sorted in descending order. The largest items (categories) in the chart are listed first for emphasis. A line is usually overlaid on the bar chart, showing cumulative sums or percentages of the total. The intent of the chart is to show which category contains the most items, for example, which part of a machine incurs the most failures, or which point in the supply chain suffers the most delays.

The Pareto Principle, named for Italian economist Vilfredo Pareto, is based on the observation that most of the effects of an action come from a small number of its causes. This is often called the 80-20 rule, implying that 80% of the failures come from 20% of the defects, or that 80% of one’s sales come from 20% of one’s customers, or pretty much any 80-20 metaphor you can come up with.

I think, though, that people are too obsessed with the magical 80:20 ratio. A deeper understanding is that “the relationship between inputs and outputs is not balanced” [Investopedia]. The Pareto diagram is a convenient, easy tool to show which inputs of a process have a greater effect on its outputs.

I suspect that in most processes the top 20% of the inputs affect maybe 40% of the outputs, and 80% of the results are influenced by 60% of the causes.

But the exact proportions aren’t critical. If the first input has the greatest impact on the output, it’s clear: if it’s good, get more of it; if it’s bad, get rid of it.

Pareto Data

The data for a Pareto Chart is fairly straightforward, consisting of a list of items and a measure of the importance, frequency, or impact of each item. This measure could be a number of occurrences, total costs or revenues, etc. In the example below, it is simply a count. Generally, the data is sorted from highest frequency to lowest.

Pareto Chart Data - Sorted Counts by Category

Sometimes the data is not compiled, but exists in a list, perhaps by date or lot number, as shown in columns B and C below. You could get a list of the items in column C using

=UNIQUE(C3:C141)

but that list will be in order of each item’s first appearance, and it won’t include any counts. Instead, you can construct a more complicated formula in cell E3 to get what you want:

=LET(input,C3:C141,
     list,UNIQUE(input),
     counts,COUNTIF(input,list),
     table,HSTACK(list,counts),
     output,SORT(table,2,-1),
     output)
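If you prepare data outside Excel, the same unique-count-sort steps can be sketched in Python. This is an illustrative equivalent, not part of the workbook; the function name `pareto_counts` and the sample list are hypothetical.

```python
from collections import Counter


def pareto_counts(items):
    """Return (item, count) pairs sorted from highest count to lowest.

    Counter keeps keys in order of first appearance (like UNIQUE), and
    sorted() is stable, so tied counts stay in first-appearance order.
    """
    counts = Counter(items)
    return sorted(counts.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical uncompiled list, one entry per occurrence
log = ["Beta", "Alpha", "Gamma", "Alpha", "Beta", "Alpha"]
print(pareto_counts(log))  # [('Alpha', 3), ('Beta', 2), ('Gamma', 1)]
```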

An alternative is to build a Pivot Table from the uncompiled list. Put the List field into the Rows area and Values area (as Count of List) of the pivot table, then sort List in descending order by Count of List.

Depending on your data and your requirements, you could build any number of different formulas or use Pivot Tables or Power Query to generate Pareto-ready data.

Native Excel Pareto Chart

Using the compiled data shown above, with counts by category (unsorted data works fine), you can insert a native Pareto Chart. Go to the Insert tab and click the Pareto Chart icon shown below.

The result is the following chart.

Native Excel Pareto Chart: Almost Useful

It’s pretty quick, you don’t need to sort the data before you make the chart, and you don’t need to calculate your cumulative percentages.

This chart is almost useful, but like the other chart types added to Excel since 2016, it has some limitations. Most annoying is that, while you can add data labels to the bars (counts), you can’t add data labels to the percentage line. You also can’t add markers to the percentage line, and I generally use markers so I know precisely where the data is. You can’t change the tick label spacing on either vertical axis. You can’t link the chart title to text in a cell. And there are probably other things you can do with a regular chart that you can’t do with the native Pareto.

If you want these features in your Pareto chart, follow these instructions to build your own.

A Simple “Pareto Chart” with Counts Only

The simplest homemade Pareto Chart is a plain column or bar chart, with the data sorted from highest to lowest. The data is the same as shown above: a list of categories and values, sorted with the highest values first. Select the data and insert a column chart.

A column chart serves as a simple Pareto Chart.

We can tell just from this chart that Alpha is the highest category and Beta the second highest. For a simple analysis, that might be sufficient. If you need percentages, you can add formulas to compute them in column D. The formula in cell D3, filled down to D8, is

=C3/SUM($C$3:$C$8)
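To check the label values outside Excel, here is a minimal Python sketch of the same share-of-total calculation. The counts are made-up sample values, not the data in the chart, and `shares` is a hypothetical name.

```python
def shares(counts):
    """Each count as a fraction of the total, like =C3/SUM($C$3:$C$8) filled down."""
    total = sum(counts)  # the absolute reference $C$3:$C$8 never moves
    return [count / total for count in counts]


# Hypothetical sorted counts, not the data in the chart above
print([f"{s:.0%}" for s in shares([30, 20, 10])])  # ['50%', '33%', '17%']
```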

Select the series of columns, click the plus icon beside the chart, click the arrow to the right of Data Labels, and choose More Options.

Select Outside End for Label Position. Select Value From Cells for Label Contains, and in the small dialog that pops up, select D3:D8, which contains the percentages. Then uncheck the Value box.

Data Labels - Values from Cells - Select Range

I’ve also applied some formatting. I set the column chart Gap Width to 0, so the columns are touching. I like to use a transparent fill color for these columns, which allows the gridlines to show through the columns. I kept the default blue color but applied 40% transparency to the fill, and I used the same blue for the border. I also used the dark red standard color for the data labels.

The simplest Pareto Chart: a column chart with percentage labels

I can see that Alpha accounts for 29% of the total. But the labels are not the cumulative percentages, so I must dust off my tired old brain and determine that Alpha plus Beta account for 62%.

Of course, that’s not what most people would consider an “official” Pareto Chart. If you’re one of those people, read on.

DIY Pareto Chart: Counts and Cumulative Percentages

It’s easy to build your own “official” Pareto Chart. It’s really a simple combination chart, with columns on the primary axis and cumulative percentages on the secondary axis. But you need to sort your data and calculate your percentages. In the data range below, the formula in cell D3, which is filled down to D8, is

=SUM(C$3:C3)/SUM(C$3:C$8)

Pareto Chart Data - Sorted Counts and Cumulative Percentages by Category
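The cumulative formula works because the mixed reference C$3:C3 grows by one row each time it is filled down. A minimal Python sketch of the same running calculation follows; it assumes the counts are already sorted, and the sample values and the name `cumulative_shares` are hypothetical.

```python
from itertools import accumulate


def cumulative_shares(counts):
    """Running total over grand total, like =SUM(C$3:C3)/SUM(C$3:C$8) filled down.

    accumulate() yields the growing sums that C$3:C3 produces row by row.
    """
    total = sum(counts)
    return [running / total for running in accumulate(counts)]


print([f"{s:.0%}" for s in cumulative_shares([30, 20, 10])])  # ['50%', '83%', '100%']
```

The last value is always 100%, which is why the secondary axis maximum can be fixed at 100%.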

Select this range (or just one cell within this range) and insert a clustered column chart (below left).

Right click on either series, and choose Change Series Chart Type from the pop-up menu. In the Change Chart Type dialog (shown below these charts), select Line With Markers for the cumulative percentage series, and check the corresponding Secondary Axis box.

Building a Pareto Chart - Column Chart to Combination Chart

Change Series Chart Type to Make a Combination Column-Line Chart

Format the series as in the previous example. Set gap width for the columns to 0, so the bars are touching. I used the default blue fill with 40% transparency, so the gridlines show through, and I used the same blue without transparency for the borders. For the percentages, I used the dark red standard color and a 2-pixel (1.5 point) line thickness for the line and the marker border, and a white marker fill (below left).

Format the axes. I set the maximum on the secondary (percentage) axis to 100%. I also set the maximum of the primary axis to 50, so the gridlines line up with the tick labels on both sides of the chart (I’m not always so lucky).

Building a Pareto Chart - Combination Chart to Pareto Chart

This is probably the Pareto chart style that most people expect. Read on to see how to implement other styles.

You can scale the primary vertical axis (left side) so the minimum is zero and the maximum is the total counts (139 in our chart). Since the gridlines won’t line up, I’ve removed them, added visible vertical axis lines with tickmarks, and used a light gray for the plot area border. I like to see the bars and markers vertically aligned like this. In this case, it shows that none of the columns are very tall compared to the others.

A Pareto Chart with synchronized vertical axes.

Pareto Chart Option: Counts and Cumulative Counts

A somewhat simpler Pareto chart shows counts as columns and cumulative counts as a line with markers, both on the primary axis. The data is simple:

Pareto Chart Data - Sorted Counts and Cumulative Counts by Category
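The cumulative counts are a running total of the sorted counts. The source doesn’t show the worksheet formula; assuming the counts are in C3:C8 as in the earlier examples, something like =SUM(C$3:C3) filled down would do it. Here is a minimal Python sketch with hypothetical values:

```python
from itertools import accumulate


def cumulative_counts(counts):
    """Running total of the sorted counts (worksheet equivalent assumed: =SUM(C$3:C3))."""
    return list(accumulate(counts))


print(cumulative_counts([30, 20, 10]))  # [30, 50, 60]
```

The final cumulative count equals the total, so the line ends at the grand total.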

Select B2:D8 and insert a column chart, then change the chart type of the cumulative count series to a line with markers.

Apply the desired formatting, and you have a simple Pareto chart with counts and cumulative counts.

Pareto Chart Option: Percentages and Cumulative Percentages

Another simple Pareto chart option plots percentages as columns and cumulative percentages as a line with markers. The data is shown below.

Pareto Chart Data - Sorted Percentages and Cumulative Percentages by Category

Select B2:B8 and hold Ctrl while selecting D2:E8 so a multiple area range is selected (ignoring the counts), and insert a column chart, then change the chart type of the cumulative percent series to a line with markers.

Format the series, and you have a simple Pareto chart with percentages and cumulative percentages.

Pareto Chart Option: Stepped or Waterfall

Another Pareto chart option plots the data as a staircase or waterfall of counts. You need the sorted counts as before, plus a column of values for a blank series: the cumulative total of all rows above the current row.

Pareto Chart Data - Sorted Counts for a Stepped Chart by Category
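The blank series is the running total minus the current row’s count, so each visible column floats on top of everything before it. A minimal Python sketch of the two stacked series, with hypothetical counts and a hypothetical function name:

```python
from itertools import accumulate


def step_series(counts):
    """Return (blank, visible) values for a stepped, waterfall-style Pareto chart.

    blank[i] is the sum of all counts above row i, so each visible column
    sits on top of the total of the columns before it.
    """
    running = list(accumulate(counts))
    blank = [total - count for total, count in zip(running, counts)]
    return blank, list(counts)


blank, visible = step_series([30, 20, 10])
print(blank)    # [0, 30, 50]
print(visible)  # [30, 20, 10]
```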

Select B2:D8 (ignoring cumulative percentages) and insert a stacked column chart. Format the (blue) blank series with no border and no fill, and format the (orange) count series like the count columns in previous charts.

Building a Pareto Chart - Stacked Column Chart to Stepped/Waterfall Chart

Pareto Chart Option: Diagonal Lines

Some purists feel that a true Pareto chart should show its cumulative lines with points between the count columns instead of centered on the columns. For this we need two blocks of data, since the line chart will have one more point than the column chart. The line chart data’s extra point is the initial zero value; the letters a, b, c are merely placeholders for this illustration.

Pareto Chart Data - Sorted Counts and Cumulative Percentages for a chart with Diagonal Lines by Category
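The line data has one more point than the column data because its points sit on the boundaries between columns: the leading zero is the left edge of the first column, and each later point is the cumulative share at the right edge of a column. A minimal Python sketch, with hypothetical counts and a hypothetical function name:

```python
from itertools import accumulate


def between_column_points(counts):
    """Cumulative shares with one more point than there are columns.

    The leading 0 marks the left edge of the first column; each later
    point is the cumulative share at the right edge of a column.
    """
    total = sum(counts)
    return [0.0] + [running / total for running in accumulate(counts)]


points = between_column_points([30, 20, 10])
print(len(points))                   # 4
print([f"{p:.0%}" for p in points])  # ['0%', '50%', '83%', '100%']
```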

Select B3:C9 and insert a column chart. Copy E2:F9, select the chart, and use Paste Special to add the data as a new series in columns, with series names in the first row and categories in the first column. It is added as a second set of columns.

Paste Special dialog for adding data to a chart

Right click on either series and select Change Series Chart Type from the pop-up menu. Change the cumulative percent series to a line with markers and check its Secondary Axis box. Click the plus sign floating beside the chart, select Axes, and check the secondary horizontal axis box.

Building a Pareto Chart - Formatting the Combination Chart

Format the secondary horizontal axis (top of the chart), change Axis Position to On Tick Marks (below left), then change Label Position to None and change the line color to No Line (below right).

Finally, format the columns as before. The secondary horizontal axis settings above and the column gap width of zero align the markers and columns horizontally.

You can adjust the scale of the primary value axis (left side) so the minimum is zero and the maximum is the total counts, which will align the markers and columns vertically.

Stepped Pareto Chart with Vertical Lines and Markers Between Columns

If you don’t want a diagonal line to run through the first column (some purists don’t), simply clear the cell with 0% (or if it is generated formulaically, insert NA() instead of 0).

Pareto Chart with Vertical Lines and Markers Between Columns

Here is a variation with diagonal lines and between-column markers overlaid on the Pareto step chart, showing that the diagonal lines correspond to each category’s counts.

Pareto Charts in Peltier Tech Charts for Excel

This tutorial shows how to create Pareto Charts, including the specialized data layout needed, and the detailed combination of chart series and chart types required. This manual process takes time, is prone to error, and becomes tedious.

I built Peltier Tech Charts for Excel to create Pareto Charts (and many other custom charts) automatically from raw data. This utility, a standard Excel add-in, arranges the data in the required layout, then constructs a chart with the right combination of chart types.

Pareto charts can be created in vertical or horizontal orientation.

Peltier Tech Charts for Excel - Pareto Chart Options

Values can be plotted (above), or percentages (below left), or bars as values and the line as cumulative percentages (below right).

An “Other” category with different shading can be plotted at the end of the data (below left). The data can also be plotted as a floating cumulative bar chart, like a waterfall (below right).

All of these options are available in vertically or horizontally oriented charts.

This commercial product has been tested on thousands of machines in a wide variety of Windows and Mac configurations, and it saves time and aggravation.

Please visit the Peltier Tech Charts for Excel page for more information.

Combination Chart to Show Monthly Climate Averages

Friday, June 16, 2023 by Jon Peltier

About the Combination Chart

A reader sent me a combination chart showing monthly high and low average temperatures and precipitation in his home state of Bihar, India, and asked how to create this chart in Excel. I believe the source is Wikipedia.

Combo chart showing monthly high and low temperatures and precipitation in Bihar, India

It’s a nice chart, easy to understand, and not too hard to do in Excel. Here is the data I manually extracted from the chart. Precipitation is in millimeters and temperatures are in °Celsius.

Monthly high and low temperature and precipitation data for Bihar, India

I can show the temperature range with floating bars by plotting the high and low temperatures in a line chart (see Floating Bars in Excel Charts and Floating Bars).

I first made both lines red, and put data labels above the High series and below the Low series.

Line chart showing high and low temperatures

I added Up-Down Bars to the chart, which connect the data points of the first and last series in each category (month). Since Low is the first series and High is the second, the bars go from low to high, so they are Up bars. Sometimes data is mixed, like stock market data, which can go up or down, and you will get a mixture of Up and Down bars.

Line chart showing high and low temperatures connected by Up bars

These Up bars need to be formatted. I gave them a red fill color and a border of no color, and I used no line for the high- and low-temperature series.

Line chart with floating bars showing high and low temperatures

The precipitation data is plotted as a simple column chart.

We just need to put these pieces together. I developed the following protocol that will simplify creating the combination chart.

Creating the Combination Chart

Starting with the data shown earlier, insert a line chart.

Right click on any series in the chart, and select Change Series Chart Type from the pop-up menu. Change Precip to a Clustered Column type in the dropdown and check the Secondary Axis box.

Change Precip series to Clustered Column type on Secondary Axis in Change Chart Type Dialog

Now it’s a combo chart, but it’s not yet our combo chart. We see low- and high-temperature lines plotted against the primary (left) axis, and precipitation columns on the secondary (right) axis.

Chart 2 - Combination chart showing temperatures on the primary axis and precipitation on the secondary axis

The next step is to stretch the chart to its final (or near final) size.

Chart 3 - Combo chart stretched to a better size

Adjust both Y-axis scales to partition the chart into an upper section for temperature data and a lower section for precipitation data.

Chart 4 - Adjust Y axis scales to partition chart into upper and lower sections

Setting the scales is an iterative (trial-and-error) process that depends on the range and units of your data and must leave room for enhancements such as labels. You can readjust the scales at the end to make everything fit nicely. In fact, I built my chart, fine-tuned the axes, and applied those scales to the above chart.
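As a starting point for that trial and error, the limits can also be estimated. The Python sketch below is one possible approach, an assumption rather than the method used for the chart above: pick the fraction of the plot height (from the bottom) to reserve for precipitation, then solve for axis limits that place each data set in its own band. The function name and sample numbers are hypothetical, and no headroom for labels is included, so round the results outward to tidy values.

```python
def partition_scales(temp_low, temp_high, precip_max, split=0.5):
    """Suggest axis limits so temperatures sit above `split` and precipitation below it.

    split is the fraction of plot height reserved for precipitation columns.
    Returns ((primary_min, primary_max), (secondary_min, secondary_max)).
    """
    if not 0 < split < 1:
        raise ValueError("split must be between 0 and 1")
    # Primary (temperature) axis: the data range fills the top (1 - split) of the height.
    span = (temp_high - temp_low) / (1 - split)
    primary = (temp_high - span, temp_high)
    # Secondary (precipitation) axis: 0 to precip_max fills the bottom `split`.
    secondary = (0, precip_max / split)
    return primary, secondary


primary, secondary = partition_scales(10, 40, 300, split=0.5)
print(primary)    # (-20.0, 40)
print(secondary)  # (0, 600.0)
```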

Let’s clean up the chart. Delete the gridlines and legend. For both Y axes, set the label position to None. Set the X-axis label position to Low. Note that the X-axis line remains in the middle of the chart. We could set it to No Line, but I think keeping it helps partition the chart.

Remove excess labels and lines from the combo chart.

Format the precipitation columns. I’ve set the gap width to 100, added data labels in the Outside End position, and used a shade of blue I prefer for the columns and label text.

Format the temperatures. Start by adding data labels above the High and below the Low temperatures, and adding Up-Down Bars to the chart.

Add data labels and up down bars to the temperature lines

Finish formatting the temperatures. Now that there are Up-Down Bars, you can set the gap width to 100. But you have to select and format one of the lines first, because that’s where you’ll find the gap width setting for the bars. Format the temperatures to use no line. I’ve used a preferred shade of red for the bar fill color and for the labels.

Format temperatures: hide the lines and color the bars and labels

Finish the chart. Format the chart title, then add and format two textboxes to the chart as subtitles. Note: to be sure the textboxes become part of the chart, select the chart before you insert the textbox (Insert > Shapes > Basic Shapes > Textbox).

Chart Data From a Different Region

I went online and found similar data for Worcester, MA, a city close to my home. I made a copy of my worksheet, and typed this data in place of the data for Bihar.

Monthly high and low temperature and precipitation data for Worcester MA

The chart for Worcester was totally messed up. But that’s because Bihar’s data is in °Celsius and millimeters, while the data for Worcester is in °Fahrenheit and inches, so the ranges of values don’t fit the axis scales set up for Bihar.

Initial chart of monthly average climate data for Worcester, using Bihar axis scales and units

It only took a minute to tweak the axis scales and fix the units in the textboxes. Here is my monthly climate summary for Worcester.

Fixed chart of monthly average climate data for Worcester, using appropriate axis scales and relevant units

Posted: Friday, June 16th, 2023 under Combination Charts.
Tags: Climate, Column Charts, Combination Charts, Line Charts, Up-Down Bars.

Combination Chart for Multi-Factor Test Results

Monday, January 31, 2022 by Jon Peltier

I’m not sure what to call this chart, other than it’s a combination chart (stacked column and XY scatter) and it requires some additional data manipulation. The chart shows test results from a test program that includes different factor levels: Groups (Alpha, Beta, Gamma), Classes (High and Low), and Treatments (A, B, C), with four replications per set of factors. The intent is to show each replication, while also showing how the groups, classes, and treatments compare against each other.

Test results from multi-factor program, plotted by treatment, class, and group.

About the Test Program

A few years ago I wrote Make Technical Dot Plots in Excel, which showed each test result from a relatively simple test program. A typical chart from that article is shown here, displaying every result for each of the three test conditions.

The above chart shows three treatments, but what if we have other factors in the test program? We have two classes and three groups, which might refer to any other way to categorize testing conditions: age groups of test subjects, species of an infectious agent being targeted, season of the year, etc.

A comment to the article cited above posted a data set, part of which I’ve shown below (after anonymizing it). You can download a CSV file with the data set here: test_results.csv. For each combination of Group, Class, and Treatment, there are four replications. We want to plot each of these replications in the final chart.

Here is how the first several rows of data look when opened in Excel and converted into a Table.

Exploring the Data

Simple Charts

When I first get my hands on a new data set, I like to make a chart or three to see if there are any obvious insights. My first chart was a simple line chart of the Score column. The X-axis is simply the point number, from 1 to 72 (the number of results in the program).

Line chart of test results by record number

Nothing is immediately obvious, other than the scores being skewed toward the upper end of the range (more 2s and 3s on a scale of 0 to 3).

I changed this chart to use the Treatment column for category labels. No insights come to light.

I further adjusted the chart to include the Class column in the category labels. Nothing but clutter along the axis.

Line chart of test results by treatment and class

When I included the Group column in the category axis labels, the clutter increased.

Line chart of test results by treatment, class, and group

With the appropriate data layout, multiple columns of category labels can bring order to the chart. This is obviously not the appropriate data layout, but we’ll get to it shortly.

Pivot Table and Charts

Sometimes a Pivot Table helps with displaying data. The Pivot Table below has Group, Class, and Treatment in the Rows area, Rep in the Columns area, and the Scores in the Values area. Each test result appears in the cross-tabbed Pivot Table, with one set of conditions per row and the four associated test results in that row. Easier to see all at once, but not (yet) easier to find any meaning.

Here is a clustered column pivot chart. It shows all results, but the bars are so pinched together it’s hard to make sense of it. That horizontal axis looks better, though, since it is laid out in a much nicer way than the cluttered one above.

A line chart of the same pivot data looks a bit less cluttered, but it’s hard to see how many points might be occupying the same space. It would be better if we could move some left or right by a small amount (the way the columns are spread out in the previous chart), but line charts do not allow that.

How do we get that nice layout? I’ve described it in Chart with a Dual Category Axis and numerous other articles, but I’ll describe it again. Below is the Pivot Table, where I’ve highlighted the chart source data. The blue highlights indicate the Y Values (the test results in the Pivot Table’s Values area), the red highlights indicate the Series Names (in the Columns area), and the purple highlights indicate the X Values or Category Labels (in the Rows area).

There are three columns of category labels, just like in the original Table of data. What makes the axis work nicely in the Pivot Chart is that repeated labels are replaced by blanks. These blanks tell Excel how to construct the labels: place A-B-C along the axis, center Low and High under successive blocks of A-B-C, and center Alpha, Beta, and Gamma under blocks of Low and High.

What we can do is use these three columns of the Pivot Table for our chart’s category axis, and then combine this with XY Scatter series where we calculate X values that include the lateral offset to reduce overlapping of the markers. (We could also construct a range like this by hand, but the Pivot Table is here, so let’s use it.)

Calculating X Values

My previous article, Clustered Column and Line Combination Chart, shows how to calculate X values to position XY Scatter markers precisely over columns in a chart. We’ll use a similar approach here. I’ve added columns to the Table to accommodate these calculations.

Each combination of factors in the Table results in one category in the Pivot Chart and in our ultimate chart. So we need to calculate Category Number (“CatNum” in the Table). The formula in cell F3 (which Excel fills into the whole column of the Table) is

=INT((ROW()-ROW($F$2)+3)/MAX([Rep]))

Each replicated test takes up a fraction of the width of each category (“Frac” in the table), with the first replication at zero and the last replication at 1. The formula in G3 is

=([@Rep]-1)/(MAX([Rep])-1)

I want to leave some space on either side of the reps within each category, so there is a gap between adjacent categories. This is like the gap width in a column chart. I’ve placed a gap width of 0.5 in cell $N$3 and entered this formula in cell H3 (under Decimal)

=$N$3/2+[@Frac]*(1-$N$3)

Finally, I add this Decimal amount to CatNum minus one in I3 (under X) using the formula

=[@CatNum]-1+[@Decimal]

I certainly could have combined all of these calculations into a single “X” column, but for ease of explanation, I used several columns for intermediate calculations. Many Excel users love their big ugly formulas that take up fewer columns, but our column limit has grown from 256 to 16,384 columns, while our cognitive limit seems to have shrunk. Helper columns are your friends.

Table of test results and calculated X values
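The four helper columns can be sketched in Python to show how they combine. This assumes the rows are sorted so each category’s reps are consecutive, every category has the same number of reps, and there are at least two reps; `x_positions` is a hypothetical name. Note that the +3 in the CatNum formula equals MAX([Rep])-1 for four reps, so integer division by the rep count gives the same category numbers.

```python
def x_positions(reps, gap=0.5):
    """Scatter X values from the Rep column, mirroring CatNum, Frac, Decimal, and X."""
    n_reps = max(reps)
    xs = []
    for row_index, rep in enumerate(reps):     # row_index 0 is the first data row
        cat_num = row_index // n_reps + 1      # CatNum: 1 for the first n_reps rows, etc.
        frac = (rep - 1) / (n_reps - 1)        # Frac: 0 for the first rep, 1 for the last
        decimal = gap / 2 + frac * (1 - gap)   # Decimal: position inside the category
        xs.append(cat_num - 1 + decimal)       # X: shift into the category's slot
    return xs


print([round(x, 3) for x in x_positions([1, 2, 3, 4, 1, 2, 3, 4])])
# [0.25, 0.417, 0.583, 0.75, 1.25, 1.417, 1.583, 1.75]
```

With a gap of 0.5, each category’s reps span from 0.25 to 0.75 of its slot, leaving half a unit of space between categories.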

Let’s just see how it looks. First I’ll plot the Score column in an XY Scatter chart, using lines and markers. Without specifying X values, Excel will just use the counting numbers 1, 2, 3, up to the number of points.

XY chart (markers and lines) of test results by record number

This is similar to our first line chart at the beginning of this article. Those connecting lines need to go, and let’s format the markers to have no fill, which lets us see more easily when points overlap.

XY chart (markers only) of test results by record number

Now let’s use the calculated X values for the data. There are 18 categories, so let’s set the axis maximum to 18. The data is also grouped into threes, that is, A, B, and C, so let’s set the axis major unit to 3 and minor unit to 1. We have four replications within each vertical slice of the chart.

XY chart of test results by record number with axis divided into treatment, class, and group

We can make one further improvement. The tests within each category would stand out from other categories more visibly if we plot each category in a different color. To do this, I’ve added three more columns to the Table, labeled A, B, and C for the three treatments. The formula in J3, filled into the three added columns, is

=IF($C3=J$2,$E3,NA())

Table of test results and calculated X values with results split by treatment
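Splitting one score column into one column per treatment is a simple filter, with NA() filling the rows that belong to other treatments so Excel doesn’t plot them. A minimal Python sketch, using None as a stand-in for NA(); the function name and sample values are hypothetical.

```python
def split_by_treatment(treatments, scores, labels=("A", "B", "C")):
    """One list per treatment, with None (standing in for NA()) for other treatments' rows."""
    return {
        label: [score if t == label else None for t, score in zip(treatments, scores)]
        for label in labels
    }


print(split_by_treatment(["A", "B", "C", "A"], [3, 2, 1, 0]))
# {'A': [3, None, None, 0], 'B': [None, 2, None, None], 'C': [None, None, 1, None]}
```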

Here I’ve plotted the three different treatment columns in place of the one Score column, and colored each series distinctly.

XY chart of test results by record number with axis divided into treatment, class, and group and with markers color-coded by treatment

Building the Chart

We’re going to make a combination chart. We’ll start with a column chart using the Rows area of the Pivot Table as category labels and averages for each category of tests as the column Y values. To this, we’ll add XY Scatter series using our calculations in the data Table for X values and the test results as Y values.

We can’t use a Pivot Chart, since a Pivot Chart is constrained to plot all data in the Pivot Table and no data from outside the Pivot Table. But that’s no impediment, as I described in Making Regular Charts from Pivot Tables.

I’ve added three columns, A, B, and C, to the right of the Pivot Table for the average test result values (much as I did for the test results in the Table). These will serve as a background to the individual test results. The formula in X4 (filled into the range X4:Z21) is:

=IF($R4=X$3,AVERAGE($S4:$V4),0)

Pivot Table with Averages in Nearby Range
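Here each Pivot Table row’s average goes under its own treatment column and 0 goes under the others; a 0 adds no height to a stacked column. A minimal Python sketch of that layout, with made-up scores and a hypothetical function name:

```python
from statistics import mean


def average_columns(treatments, rep_rows, labels=("A", "B", "C")):
    """Average each pivot row's reps under its own treatment and 0 elsewhere.

    Mirrors =IF($R4=X$3,AVERAGE($S4:$V4),0) filled across and down.
    """
    return {
        label: [mean(reps) if t == label else 0 for t, reps in zip(treatments, rep_rows)]
        for label in labels
    }


# Hypothetical pivot rows: the treatment and its four rep scores
print(average_columns(["A", "B"], [[3, 2, 3, 2], [1, 1, 2, 2]]))
# {'A': [2.5, 0], 'B': [0, 1.5], 'C': [0, 0]}
```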

One thing to be careful of is the sorting in the original data and in the Pivot Table. The original data lists Class in the order Low-High. Default sorting in the Pivot Table lists Class in the opposite order, High-Low. You can change the order by clicking on a pivot item label and dragging it to where you want it.

Create the Chart

Select the columns of average data; include the header row and Excel will use these cells as series names. Then insert a column chart. I’ve made a stacked column chart, so the overlap is 100% and all columns are centered within their categories. The chart’s source data is highlighted in the worksheet.

Column chart of averages by record number

Now add the category labels from the Pivot Table. I’ll add them bit by bit so you can see how Excel builds up the multiple tiers of labels.

You could use the Select Data dialog to add the labels, but I find that tedious. It’s much easier to edit the SERIES formula. Select the blue columns and this will appear in the Formula Bar (the precise address will depend on the location of the Pivot Table in the sheet):

=SERIES(Sheet1!$X$3,,Sheet1!$X$4:$X$21,1)

The syntax of the SERIES function is

=SERIES([series name],[X values],[Y values],[series number])

Since there are no X values specified, Excel just uses 1, 2, 3, etc. in the chart. To assign X values, put your cursor between the two commas where the X values should go, select the range in the sheet, then press Enter. The formula should now look like this:

=SERIES(Sheet1!$X$3,Sheet1!$R$4:$R$21,Sheet1!$X$4:$X$21,1)

You only need to add this to one series, since all series (except for XY Scatter series) share X values/category labels.

The worksheet now shows these highlights, and the column of Treatments is used as category labels in the chart.

Let’s include the column of Class labels in column Q. Simply edit the series formula to include the two-column range:

=SERIES(Sheet1!$X$3,Sheet1!$Q$4:$R$21,Sheet1!$X$4:$X$21,1)

The highlighted regions and the chart show the adjustment.

Column chart of averages by treatment and class

One more time: edit the SERIES formula to include Group in column P, so the formula looks like this. Of course, I could have just selected all three columns of the Rows area two steps ago, but I wanted to show how the category label structure develops.

=SERIES(Sheet1!$X$3,Sheet1!$P$4:$R$21,Sheet1!$X$4:$X$21,1)

The worksheet highlights and chart axis show the change.

Column chart of averages by treatment, class, and group

We have the nice category axis we wanted; now let’s do a little cleanup. Set the scale of the Y-axis to minimum=0, maximum=3, major unit=1. Select one of the column series and set gap width to zero so the columns fill the width of each category. Lighten up the colors: I set the transparency to 75% or so, which makes the colors light and lets the gridlines show through.

Add the Test Results

Now it’s time to add the actual test results. Select and copy the last four columns of the Table (X, A, B, and C) including the column headers. Select the chart, click Paste Special from the Paste dropdown on the Home tab of the ribbon, and make the appropriate selections in the dialog: Add as New Series, Values in Columns, Series Names in First Row, Categories in First Column.

Paste Special - Add as New Series, Values in Columns, Series Names in First Row, Categories in First Column

You can also Paste Special with Alt+E+S, which is a legacy of the old menu structure of Excel 97-2003, and which is permanently ingrained in my muscle memory. Another more recent shortcut that I can never remember is Ctrl+Alt+V.

Excel adds the new series as more sets of stacked columns.

Chart with test results added (as columns, temporarily)

That’s easy to fix. Right-click on any series in the chart, and choose Change Series Chart Type. This rich dialog pops up with a preview of the chart and a list of all series in the chart. Note that all series are Stacked Column.

Change Chart Type dialog before changing chart types.

Change each of the newly added series to Scatter using the Chart Type dropdowns. Excel will check the Secondary Axis box for each, which in this case is what we want.

Change Chart Type dialog after changing chart types.

The resulting chart is starting to look good.

Test results converted to XY scatter series on secondary axis

Format the scale on the secondary (upper) horizontal axis: minimum=0, maximum=18, major unit=3, minor unit=1.

Secondary horizontal axis formatted to nicely line up with primary horizontal axis

Add major and minor secondary vertical gridlines.

Format the secondary (upper) horizontal axis to show no labels. Delete the secondary (right-hand-side) vertical axis.

Finally, format the XY Scatter series so the markers have no fill. This makes overlapping points easier to see.

Posted: Monday, January 31st, 2022 under Combination Charts.
Tags: Column Charts, Combination Charts, Data Techniques, XY Charts.
Comments: none

Clustered Column and Line Combination Chart

Monday, January 24, 2022 by Jon Peltier

Clustered Column and Line Combination Chart, with markers aligned over bars.

Combination charts in Excel are pretty easy, once you figure them out. But sometimes they present challenges. If you make a combination clustered column chart and line chart, it takes special treatment to align the markers over the columns.

I will warn you that adding several series of lines to a chart with several series of columns might make your chart cluttered and difficult to read, especially if the lines and columns need different axes to show different types of data (for example, dollar sales vs. percentages of target). If that’s the case, you should break down the data into more easily digestible pieces, and use multiple charts for improved clarity.

I often use this technique when I need some kind of special highlighting or labeling, and my added data is partly or totally hidden (by formatting with no markers and no lines).

Setup and First Attempt

We’ll start with data for three categories (Alpha, Beta, and Gamma) and three series (Red, Green, and Blue). These might correspond to three companies over three years, or any other grouping you have in mind.

In addition, we want to display columns and lines for each series, showing perhaps target sales values as columns and actual sales as markers on the lines. Here is our data and the separate column and line charts.

Data for the Clustered Column and Line Combination Chart, and separate clustered column chart and line chart.

Making a combination chart is pretty easy. Start by plotting all of the data using one of the chart types in the finished chart. I started with a column chart, but it would work the same if you start with a line chart.

Right-click on any series in the chart, and choose Change Series Chart Type from the pop-up menu to open the Change Chart Type dialog.

If you right-click elsewhere on the chart, the menu option is Change Chart Type. When you select Combo, Excel applies its own favorite combination, which you will then have to override. In this case it would have been okay, but generally Excel chooses something other than what you want.

Change Chart Type dialog for six clustered column chart series

Combo is selected in the list along the left of the dialog. The dialog shows a preview of the chart and a list of all series in the chart, each with a dropdown to select its chart type and a checkbox to assign it to the secondary axis.

Change the chart type of each of the last three series to Line with Markers, and leave the Secondary Axis checkboxes unchecked.

Change Chart Type dialog for three clustered column chart and three line chart series

The resulting combination chart looks just like the preview (well, perhaps I’ve formatted it a bit).

Clustered Column and Line Chart - First Attempt

That was pretty easy, and it looks pretty good, except for one thing. The markers are all centered in each category and are not aligned over their respective columns. The red, green, and blue markers are all centered on the green column.

This is a consequence of how Excel draws line charts: each data point is centered in its category. Clustered column charts are different; their points (the columns) are distributed within each category according to the Gap Width and Overlap properties.

So we can’t use line chart series to align markers over columns, but all is not lost. We can instead use XY scatter series, because Excel will plot them along the category axis as if the category axis has a numerical scale.

Equivalent Category Axis Scaling

Let’s examine our simple column chart, with three text labels (categories) along its category axis.

Category Axis with Three Text Categories

Numerical Category Axis Scaling?

If we plot XY scatter data on the chart, Excel treats the categories as if the first category is at X=1, the second at X=2, and so on.

Category Axis with Three Numerical Categories

For the XY scatter data, we can consider the axis as a continuous numerical scale starting at the first category number minus 0.5 and ending at the last category number plus 0.5, or in our example, from 0.5 to 3.5. Each category takes up the space from the category number minus 0.5 to plus 0.5.

Category Axis with a Continuous Numerical Scale
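For readers who like to check the arithmetic, here is a minimal Python sketch of this equivalent numeric scale. It assumes the category axis sits between tick marks (the column chart default) and numbers categories from 1; the function names are hypothetical, not anything Excel exposes.

```python
def axis_limits(n_cat: int) -> tuple[float, float]:
    """Numeric extent of a category axis with n_cat categories."""
    # Category 1 is centered at X=1 and category n_cat at X=n_cat;
    # each category extends half a unit on either side of its center.
    return (0.5, n_cat + 0.5)


def category_span(i_cat: int) -> tuple[float, float]:
    """Left and right edges of category number i_cat (1-based)."""
    return (i_cat - 0.5, i_cat + 0.5)


print(axis_limits(3))    # (0.5, 3.5)
print(category_span(2))  # (1.5, 2.5)
```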

We can calculate X values for our XY data to position the markers wherever we want them. If we had used X values of 1, 2, and 3, our XY series would line up on the green columns just like our line chart series. But we can see that our Red X values need to be a little less and the Blue X values a little more than the category number.

How do we know the precise X values? We can guess and adjust them by trial and error, but we can also calculate them easily enough (you’re not afraid of a few formulas, are you??).

Gap Width and Overlap

Excel’s column and bar charts use two parameters, Gap Width and Overlap, to control how columns and bars are distributed within their categories.

Gap Width is the space between bars in adjacent categories, given as a percentage of the width of a column in the chart. The default is 219%, which means the gap is 2.19 times the width of a column.

Overlap is the amount that columns in one series overlap columns in the next series within a category. The default is -27% (below left), meaning that there is a space 0.27 times the width of a column between adjacent series. A positive overlap of 27% (below right) means that the columns actually overlap by 0.27 times the width of a column.

I don’t know where Microsoft came up with these default values, but they are easy to adjust. Select any series and press Ctrl+1 (the universal shortcut to open the formatting user interface for the selected object in Excel). In the task pane, move the sliders or type in the desired values. I usually choose a gap width of 50% to 150% (most frequently 75% to 100%) and an overlap of zero (so adjacent series of columns are touching).

Calculating X Values

So let’s set up the calculations. Here are some variables:

BW: width of a bar (or column) = 100
GW: gap width, as a percentage of BW
OL: overlap, as a percentage of BW
iSrs: the number of the series
nSrs: the number of series
iCat: the number of the category

The X value of a point is the X value of the left edge of its category plus the point's distance from that edge divided by the width of the category.

The left edge of a category is the category number minus 0.5:

Category Edge = iCat - 0.5

Calculating X from Gap Width and Overlap

The red marker is half a gap width plus half a bar width from the edge of the category.
The green marker is half a gap width plus one and a half bar widths minus one overlap from the edge of the category.
The blue marker is half a gap width plus two and a half bar widths minus two overlaps from the edge of the category.

The general expression is:

Marker Distance = 0.5 * GW + (iSrs - 0.5) * BW - (iSrs - 1) * OL
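The same expression as a small Python function, checked against the three marker descriptions above using Excel's default gap width (219%) and overlap (-27%). This is a sketch for verifying the algebra; the name `marker_distance` is hypothetical.

```python
BW = 100.0  # bar width; gap width and overlap are percentages of this


def marker_distance(i_srs: int, gw: float, ol: float, bw: float = BW) -> float:
    """Distance from a category's left edge to the center of series i_srs."""
    # Half a gap, then (i_srs - 0.5) bar widths, minus one overlap for each
    # preceding series (so a negative overlap pushes the markers apart).
    return 0.5 * gw + (i_srs - 0.5) * bw - (i_srs - 1) * ol


for i_srs, color in enumerate(["Red", "Green", "Blue"], start=1):
    print(color, marker_distance(i_srs, gw=219, ol=-27))
# Red 159.5
# Green 286.5
# Blue 413.5
```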

The width of a category is the gap width (half on each side) plus the bar width times the number of series, minus the overlap times one less than the number of series. It is minus the overlap because a negative overlap adds to the total width:

Category Width = GW + nSrs * BW - (nSrs - 1) * OL

The complete expression for the X value of a marker is:

X = iCat - 0.5 + [0.5 * GW + (iSrs - 0.5) * BW - (iSrs - 1) * OL] / [GW + nSrs * BW - (nSrs - 1) * OL]
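Putting the pieces together, here is a minimal Python sketch of the complete expression (the function name is hypothetical). With three series at the default 219% gap width and -27% overlap, each category is 573 percentage units wide, and the middle series lands exactly on the category number, as it should.

```python
def x_value(i_cat: int, i_srs: int, n_srs: int,
            gw: float, ol: float, bw: float = 100.0) -> float:
    """X value that centers an XY marker over column i_srs in category i_cat."""
    edge = i_cat - 0.5                                          # left edge of the category
    distance = 0.5 * gw + (i_srs - 0.5) * bw - (i_srs - 1) * ol  # marker distance from edge
    width = gw + n_srs * bw - (n_srs - 1) * ol                  # full category width
    return edge + distance / width


for i_srs in (1, 2, 3):
    print(round(x_value(2, i_srs, 3, gw=219, ol=-27), 4))
# 1.7784
# 2.0
# 2.2216
```

Because the red and blue markers sit symmetrically about the category center, their X values in any category always add up to twice the category number.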

We can simplify this calculation, since all red markers are offset within a category by the same amount, as are all green markers and all blue markers. We can then calculate each offset only once and use a lookup formula for each point.

X = iCat + Xoffset
Xoffset = -0.5 + [0.5 * GW + (iSrs - 0.5) * BW - (iSrs - 1) * OL] / [GW + nSrs * BW - (nSrs - 1) * OL]

Here is the data range I have set up:

Range of calculations for X values of XY series in combination chart

B2:H6 contains my original column and line chart data. B15:D18 is where I calculate the Xoffset values based on values of Gap and Overlap entered in the appropriate cells. The formula in D16 (filled down to D18) is:

=($B$16/2+($C16-0.5)*100-($C16-1)*$B$18)/($B$16+MAX($C$16:$C$18)*100-(MAX($C$16:$C$18)-1)*$B$18)-0.5
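Here is the same worksheet formula transcribed into Python, as a sketch for checking the numbers outside Excel. Following the cell references in the formula, `gap` stands for B16, `overlap` for B18, and `series_numbers` for C16:C18; the function name is hypothetical.

```python
def x_offsets(gap: float, overlap: float, series_numbers: list[int]) -> list[float]:
    """Python version of the D16:D18 formula: one X offset per series."""
    n_srs = max(series_numbers)  # MAX($C$16:$C$18)
    width = gap + n_srs * 100 - (n_srs - 1) * overlap
    return [
        (gap / 2 + (s - 0.5) * 100 - (s - 1) * overlap) / width - 0.5
        for s in series_numbers
    ]


print([round(x, 4) for x in x_offsets(219, -27, [1, 2, 3])])  # [-0.2216, 0.0, 0.2216]
print([round(x, 4) for x in x_offsets(150, 0, [1, 2, 3])])    # [-0.2222, 0.0, 0.2222]
```

The offsets are symmetric, and with an odd number of series the middle offset is zero, which is why the middle (green) markers line up with a plain category number.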

B8:H13 contains the helper column of category numbers in column B and the helper row of series numbers in row 13. There are X and Y values for the three XY series; the Y values come from F4:H6 in the table above. The X value in cell C10 (copied into C10:C12, E10:E12, and G10:G12) is calculated by this formula:

=$B10+XLOOKUP(C$13,$C$16:$C$18,$D$16:$D$18)
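In Python terms, the XLOOKUP is just a keyed lookup of the per-series offset, added to the category number. A minimal sketch, using the default-setting offsets rounded to four decimals as stand-ins for the values in D16:D18; the names are hypothetical.

```python
# Offsets per series number, as calculated in D16:D18 (rounded here).
offsets = {1: -0.2216, 2: 0.0, 3: 0.2216}

category_numbers = [1, 2, 3]  # helper column B10:B12
series_numbers = [1, 2, 3]    # helper row 13


def x_grid(cats, srs, offsets_by_series):
    """One row per category, one X value per series: iCat + Xoffset(iSrs)."""
    # A dict lookup raises KeyError for a missing series number,
    # where XLOOKUP would return #N/A.
    return [[cat + offsets_by_series[s] for s in srs] for cat in cats]


for row in x_grid(category_numbers, series_numbers, offsets):
    print([round(x, 4) for x in row])
# [0.7784, 1.0, 1.2216]
# [1.7784, 2.0, 2.2216]
# [2.7784, 3.0, 3.2216]
```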

Clustered Column and XY Scatter Combination Chart

There are several ways to generate the combination chart. You can start with a column chart with three series then add the XY data. To add the Red XY data, copy the range C9:D12, select the chart, and choose Paste Special from the Paste dropdown on the Home tab of the ribbon. Select the appropriate options in the dialog: New Series, Values in Columns, Series Names in First Row, Categories in First Column, and DO NOT CHECK Replace Existing Categories.

The data is added as another column series; we’ll fix that shortly. Repeat with the Green data in E9:F12 and the Blue data in G9:H12. Here is the column chart:

Convert to a combination chart as we did above for the column-line chart. Right-click on any series, and select Change Series Chart Type from the pop-up menu. Change the chart type of the last three series to Scatter with Straight Lines and Markers, and UNCHECK the Secondary Axis checkbox for all XY series.

Change Chart Type dialog for three clustered column chart and three XY scatter chart series

Another way to generate the combination chart is to build a column chart using the block of data in C3:H6, changing the last three series to XY Scatter, then changing the data for these three series. Change the chart type before changing the data.

Changing the data is easy. You could right-click on the chart and choose Select Data from the pop-up menu and wrangle with that uncomfortable dialog, but my favorite way is to edit the SERIES formula directly. For example, the formula for the red scatter series starts out as:

=SERIES(Sheet1!$G$3,Sheet1!$C$4:$C$6,Sheet1!$G$4:$G$6,4)

but I edit it to:

=SERIES(Sheet1!$D$9,Sheet1!$C$10:$C$12,Sheet1!$D$10:$D$12,4)

You can type the new row and column addresses, or you can select a reference in the formula (select both the sheet name and the cell address), then drag to select the new data range in the worksheet with the mouse.
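Because a SERIES formula is just text with four arguments (series name, X or category values, Y values, plot order), composing the edited version is mechanical. The Python sketch below is a hypothetical helper for building that text, not an Excel API. It assumes sheet names without spaces or special characters (Excel wraps such names in single quotes), and it assumes the Green and Blue scatter series are plot orders 5 and 6, following Red at 4.

```python
def series_formula(sheet: str, name: str, x_values: str, y_values: str, order: int) -> str:
    """Compose =SERIES(name, X values, Y values, plot order) on one sheet."""
    def ref(addr: str) -> str:
        return f"{sheet}!{addr}"
    return f"=SERIES({ref(name)},{ref(x_values)},{ref(y_values)},{order})"


# (name cell, X range, Y range, plot order) for the three XY series
layout = [
    ("$D$9", "$C$10:$C$12", "$D$10:$D$12", 4),  # Red
    ("$F$9", "$E$10:$E$12", "$F$10:$F$12", 5),  # Green (plot order assumed)
    ("$H$9", "$G$10:$G$12", "$H$10:$H$12", 6),  # Blue (plot order assumed)
]
for args in layout:
    print(series_formula("Sheet1", *args))
# first line: =SERIES(Sheet1!$D$9,Sheet1!$C$10:$C$12,Sheet1!$D$10:$D$12,4)
```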

The Finished Clustered Column and XY Scatter Combination Chart

Here is the complete data range plus the chart.

Calculations and chart for 219% gap width and -27% overlap

Looks good. Let’s check with the same gap width but a positive overlap.

Also looks good. How about a smaller gap and zero overlap?

Calculations and chart for 150% gap width and zero overlap

Still looks good. What if we add a series and a category?

Calculations and chart for 100% gap width and zero overlap, with an added category and series

Looks good, except that all those lines make the chart a bit cluttered. What if we plot markers without lines?

Clean up the chart with markers and no lines

The calculations work and the cleaned-up chart looks pretty good.

Precision Positioning of XY Data Points

Wednesday, October 14, 2020 by Jon Peltier

Often when I plot data in a column, line, or area chart, I want to plot additional points on that chart. For this I use an XY Scatter type series for the extra data points. These added points may be used for additional labels or other purposes.

There are a few tricks for positioning XY data points precisely. I use these tricks in many of my tutorials, and I describe them in each tutorial's protocol. But it’s probably good to have a single page, like this one, dedicated to precision positioning of these extra points.

Long ago, when this blog was very young and I was not so old, I wrote Stacked vs. Clustered, which compared stacked and clustered column charts and described what each is well suited for. I included the following throwaway image; well, I considered it a throwaway, until a reader asked how I added the lines and markers.

Thanks for the nudge, dear reader. What follows is the protocol for adding those markers so precisely to the chart.

Clustered Column Chart

Here is my data and my starting column chart.

Column chart of Quarterly Sales by Region

What I could do, of course, is add an XY Scatter series on the secondary axis, then adjust my X and Y values and the secondary axis limits until the points are positioned appropriately. But that is tedious to do in the first place, and if the original data changes, the tweaked X and Y values would probably need readjusting.

But Excel is nothing if not extremely flexible. I can plot my XY data points on the same primary axis as the column data, with X values calculated from the column chart configuration. This means that all XY points will stay in the same position relative to the columns, with a minimum of adjustment.

Column Chart Axis Measurements

The two charts below illustrate how scatter chart X values can be calculated based on the column chart’s configuration.

First of all, each category of the column chart has a number, from 1 to the total number of categories. The chart below has categories numbered from 1 to 3. If I use these numbers for my scatter chart X values, the points would be centered on the categories (between the orange and blue columns).

I merely have to calculate how far to the left and right of each category I need my XY data points to be. I need to know my Gap Width and I need to know how many series there are.

The Gap Width is a number, stated as a percentage of the width of a column, that tells me how wide the gap is between the clusters of columns. A Gap Width of 100 means that the white space between clusters is 100% as wide as a column. In the schematic, I have indicated that the columns are 100 percent wide, so the distance between cluster centers is 100 times the number of series, plus the Gap Width. In this case, the Gap Width is 200%, so my categories are 600% from center to center.

Each category in a column chart has an X value of 1 to the number of categories. Columns are 100% wide, and the Gap Width is stated in percentage of a column width.

The X axis of a line chart or an area chart works in exactly the same way. A bar chart is a different story; the approach is similar, but it is more complicated, and you do need to use secondary axes.

The Algebra

The amount my data points need to be offset is easy to calculate. For example, to center a point on the tallest bar in the chart, I start with the category number, then add half a bar width divided by the distance between category centers:

X = 1 + 50 / (4 * 100 + Gap Width)

which, with a Gap Width of 200%, turns out to be 1.08333333
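A minimal Python sketch of this arithmetic, assuming (as the 4 * 100 in the formula implies) four series with 100%-wide columns and zero overlap. Column i of n sits (i - 0.5 - n/2) column widths from the category center, so the +50 above corresponds to the third of four columns; the helper names are hypothetical.

```python
def center_offset_pct(i_srs: int, n_srs: int) -> float:
    """Offset of column i_srs from its category center, in % of a column width."""
    return (i_srs - 0.5 - n_srs / 2) * 100


def x_for(i_cat: int, offset_pct: float, n_srs: int, gap_width: float) -> float:
    """Category number plus offset divided by the center-to-center category spacing."""
    spacing = n_srs * 100 + gap_width  # 4 * 100 + 200 = 600 in this example
    return i_cat + offset_pct / spacing


print([center_offset_pct(i, 4) for i in (1, 2, 3, 4)])      # [-150.0, -50.0, 50.0, 150.0]
print(round(x_for(1, center_offset_pct(3, 4), 4, 200), 8))  # 1.08333333
```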

Since I want my points at the top of the columns, I use the same Y values for the XY points.

Below is my data for the XY points.

I put the name of the category in the first column, and the offset (in column width percentages) from the center of the category in the second. The formula in cell C9 is

=MATCH(A9,$B$1:$D$1,0)+B9/(4*100+$F$9)

The MATCH function looks up the category name and returns the number of the category (1 for North, etc.). Cell F9 contains the Gap Width.
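MATCH with a match type of 0 is an exact lookup that returns a 1-based position. Here is a Python sketch of the formula in C9, for illustration only: "North" comes from the text, but the other two region names are placeholders, the Gap Width of 200 matches the example above, and the helper name is hypothetical. Note that Python's `list.index` is case-sensitive, while MATCH is not.

```python
regions = ["North", "Region2", "Region3"]  # stand-ins for B1:D1 (only "North" is from the text)
gap_width = 200                            # stand-in for cell F9


def x_from_match(region: str, offset_pct: float) -> float:
    """Equivalent of =MATCH(A9,$B$1:$D$1,0)+B9/(4*100+$F$9)."""
    # list.index raises ValueError for an unknown name, where MATCH returns #N/A.
    category_number = regions.index(region) + 1  # MATCH(..., 0) is 1-based
    return category_number + offset_pct / (4 * 100 + gap_width)


print(round(x_from_match("North", 50), 4))     # 1.0833
print(round(x_from_match("Region2", -150), 4))  # 1.75
```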

I’ve inserted a blank row between categories, so there is a gap in the line between categories.

Calculated X values for XY scatter points
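The blank rows work because Excel, by default, shows empty cells in an XY series as gaps, so the connecting line breaks wherever a blank row sits. The Python sketch below mimics that behavior by splitting a list of points at each blank (None); the points are illustrative values, not the article's data, and the function name is hypothetical.

```python
from typing import Optional

Point = tuple[float, float]


def line_segments(points: list[Optional[Point]]) -> list[list[Point]]:
    """Split a series at blank rows (None), like 'show empty cells as gaps'."""
    segments: list[list[Point]] = []
    current: list[Point] = []
    for p in points:
        if p is None:  # blank row: end the current line segment
            if current:
                segments.append(current)
                current = []
        else:
            current.append(p)
    if current:
        segments.append(current)
    return segments


# Illustrative points: two categories separated by a blank row.
pts = [(0.75, 4.0), (1.25, 6.0), None, (1.75, 5.0), (2.25, 7.0)]
print(len(line_segments(pts)))  # 2
```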

Adding the XY Series

Copy the XY data in C9:D22. Select the chart, and choose Paste Special from the Paste dropdown on the Home tab of the ribbon. Select New Series, Values in Columns, Series Names in First Row, and Categories in First Column.

Because it’s a column chart, Excel adds the data as another set of columns.

Right-click on any series in the chart, and choose Change Series Chart Type. Scroll to the bottom of the list of series and select the added series (“Connect”). Use the Chart Type dropdown to select the Scatter with Straight Lines and Markers option (NOT a Line Chart option), and uncheck the Secondary Axis checkbox.

This results in the XY data points we want, except the lines and markers are in an ugly shade of green (well, for this particular color scheme).

The XY data points are there, but sure are ugly

But we know how to fix that.

Adjusting Gap Width

If I change the Gap Width in my chart from 200% (as above) to 75%, the XY data points are no longer lined up.

Data points misaligned by changing Gap Width

But all that is needed is to change the Gap Width in cell F9 of my calculation range; the X values recalculate and the chart looks great again.
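To see why the points drift, compare the X values for one category at the two gap widths. This Python sketch assumes four series with zero overlap, as in the formula above, and the function name is hypothetical. Narrowing the gap shrinks the category spacing from 600 to 475, so each point moves farther from the category center.

```python
def x_values(i_cat: int, n_srs: int, gap_width: float) -> list[float]:
    """X values centering a point over each column in category i_cat (zero overlap)."""
    spacing = n_srs * 100 + gap_width
    return [i_cat + (i - 0.5 - n_srs / 2) * 100 / spacing for i in range(1, n_srs + 1)]


for gw in (200, 75):
    print(gw, [round(x, 4) for x in x_values(1, 4, gw)])
# 200 [0.75, 0.9167, 1.0833, 1.25]
# 75 [0.6842, 0.8947, 1.1053, 1.3158]
```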
