This Surplus-Deficit chart shows the history of surpluses and deficits (positive and negative values) over the past few years. The two colors help show when the deviation was positive and when it was negative. It takes a little bit of Excel magic to get that effect to work properly in an Excel area chart.
I showed how to make this kind of chart in an old tutorial, Area Chart – Invert if Negative. That approach required repeating the data in a new range, inserting a row after every row of data, and doing some interpolation to get the chart data right.
But I was looking over a “visual vocabulary” by my colleague Jon Schwabish of Policy Viz, a rough draft summary of many chart and visualization types and guidelines for building them in Excel. When it’s ready for public consumption, I’ll be sure to post a link.
Jon showed a surplus-deficit area chart, and on inspection, it wasn’t exactly right. He went to my earlier post and fixed it, but in the meantime, I thought of a way to do the calculations in a more compact way, supplementing rather than supplanting the original data.
Surplus-Deficit Column Charts
For this tutorial I will use a simple data set, with monthly surplus-deficit data. I’ve shaded this original data blue, so you can follow it in the long protocol. Note that the data is in an Excel Table, which has lots of magic of its own.
You can make a surplus-deficit chart with any number of chart types. Below is a simple column chart of this data showing surpluses and deficits in a simple data set.
Easy to make, easy to read: surpluses grow upwards from the horizontal axis, deficits downwards. Excel makes it easy to show positive and negative values in different colors in bar and column charts. Format the bars with the main (positive) color, then in the Format Fill part of the Format Series task pane, check the Invert If Negative box, and select the color for the negative values.
Surplus-Deficit Area Charts
The Two-Color Area Chart
In general, I prefer columns for this kind of data rather than areas: it always seems to me that an area chart minimizes the points on either side of an axis crossing, and overemphasizes points in the middle of a positive or negative run.
But area charts have their place, so let’s make an area chart of the data:
Let’s format the colors like we did for the column chart. Main color, blue, that’s fine. But there’s no Invert If Negative setting. Let’s stop and scratch out chins for a while. What if we have separate Pos and Neg series in the chart? We can always format them with the desired different colors.
Here’s the data with Pos and Neg data columns added. I’ve used these formulas for Pos and Neg respectively:
Pos: =MAX(0,B2)
Neg: =MIN(0,B2)
Here’s the (unstacked) area chart showing Pos and Neg:
It’s not exactly right. The sloped edges of the different colored should meet on the horizontal axis, along the lines shown in the following chart:
Our data isn’t complete enough. We only have points at the vertices of the surplus-deficit data, indicated by the circular markers in this line chart:
That’s fine for a one-series area chart. But in our Pos and Neg data, here are the points we have:
Adjusting the Data
For the areas to line up correctly, we would need data points at the axis crossings between vertices, like this for the one-series line chart:
…or like this for the Pos Neg chart:
This is where we need to interpolate our area chart data. In my old tutorial, I inserted rows to make room for interpolation, like this. The blue is our original data, and the gold shaded rows are the interpolated data. The green shaded row at the top is a placeholder to make sure I have as many interpolated rows as original rows, for when I adjust the data in a minute.
In the Value column, the formula in the gold cell B4
compares the values in the blue cells above and below it (B3
and B5
). If their product (B3*B5
) is positive, it means that both have the same sign, so there is no axis crossing, and the cell will show #N/A
, to suppress plotting of that point. If their product is negative, it means the blue cells have different signs; the line must cross in between, so the cell contains zero for the Y value at the crossing. Don’t worry about the formulas yet, they need to be redone when the table is adjusted.
In the Date column, if there is no axis crossing (product of blue Value cells B3*B5
is positive or gold Value cell contains #N/A
), then the gold Date cell contains #N/A
, to suppress plotting. If there is a crossing (product of blue Value cells is negative or gold Value cell contains zero), the gold Date cell A4
contains an interpolated value between the two blue Dates A3
and A5
.
Anyway, I realized that I can add columns to the data table for these calculations. To illustrate, I’ll adjust my data in two steps. First I’ll move the gold and green cells to the right by four columns:
Now I’ll remove the blank cells between rows.
And we no longer need the gold and green to distinguish interpolated rows.
The table contains the original data in the blue columns, and these formulas in the other columns
Pos: =MAX(0,B2)
Neg: =MIN(0,B2)
Date 2: =IF(ROW()=ROW(INDEX([Date],1)),NA(),IF(B1*B2>0,NA(),
B1/(B1-B2)*(A2-A1)+A1))
Value 2: =IF(ROW()=ROW(INDEX([Date],1)),NA(),IF(B1*B2>0,NA(),0))
Pos 2: =IF(ROW()=ROW(INDEX([Date],1)),NA(),IF(B1*B2>0,NA(),0))
Neg 2: =IF(ROW()=ROW(INDEX([Date],1)),NA(),IF(B1*B2>0,NA(),0))
There are a few tricks. The first IF
looks at the condition
ROW()=ROW(INDEX([Date],1))
which is true if the row is the first data row of the table. And if it’s true, the formula uses NA()
to put #N/A
into the cell.
The second IF
looks at B1*B2
. If values in both B1
and B2
are positive or both are negative, the result of multiplication is positive, so the formula returns #N/A
. If they are mixed, we get a negative result, so the formula puts zero into the cell.
We’re never actually multiplying B1
and B2
, which would produce a #VALUE!
error, because cell B1
contains a text header label. In that row of the table, the first IF
takes precedence, because it already has determined that the cell is in the first data row and the value should be #N/A
.
Adjusting the Chart
Now we just need to get this additional data into the chart. The original Pos-Neg area chart looks like this.
The two series formulas look like this:
Pos: =SERIES(Sheet1!$C$1,Sheet1!$A$2:$A$13,Sheet1!$C$2:$C$13,1)
Neg: =SERIES(Sheet1!$D$1,Sheet1!$A$2:$A$13,Sheet1!$D$2:$D$13,2)
We need to change the X values fromSheet1!$A$2:$A$13
to(Sheet1!$A$2:$A$13,Sheet1!$E$2:$E$13)
And the Y values fromSheet1!$C$2:$C$13
to(Sheet1!$C$2:$C$13,Sheet1!$G$2:$G$13)
andSheet1!$D$2:$D$13
to(Sheet1!$D$2:$D$13,Sheet1!$H$2:$H$13)
You can make the changes by typing in the series formula directly, or by using the Select Data dialog. After changing the series formulas, the areas in the resulting chart align across the axis as desired.
It’s a nice little trick, and it works for area charts and other charts that have category axes that can display a Date scale. These charts internally sort by date before plotting the data. So we can put the data anywhere in the list, even in a separate column that makes up a different part of the list.