Fill Below Standard Normal Distribution
A common request I get is to show someone how to apply shading under part of a plotted curve. This can be done easily using a combination chart. Combination charts may sound difficult, but they are actually pretty easy. We could use a line and area chart combination, but that’s not smart enough, so we’ll use an XY and area chart combination.
Plotting the normal curve, in fact, plotting any function, is an easy task. Put the X values for the function into one column of a worksheet, then calculate the Y values in the next column, and plot this data. The data for the standard normal distribution is shown here (it extends down to row 65).
The X values in column A start at -3 in cell A5 and increment by 0.1 down to X=+3 in cell A65. This formula in cell B5 to calculate the standard normal distribution is copied into the range B5:B65.
The data for the area chart (at least for the first cut) is shown below:
Cells C1 and C2 contain the minimum and maximum X values to fill below.
This formula in cell C5 is copied into C5:C65.
First Cut at Filling Below a Plotted Line
This first exercise will show that, for this particular example, a line and area chart combination isn’t sufficient. For other uses, such a combination may be perfect.
This is the data in A4:B65 plotted in an Excel line chart. Note that the labels along the X axis do not denote numerical values. A line chart treats anything in the X values range as nonnumeric labels and plots data at equally spaced intervals along X.
This is the data in A4:C65 plotted in an Excel line chart.
This is how it looks when the “partial” line chart series is converted to area chart type.
This doesn’t work quite right. The edges of the shaded region are not vertical. Since the default text-style axis of a line chart (also used by area charts and column charts) plots each point above a category label, the edge of the shaded area slopes from the value at one category to the value at the next.
Filling Below a Plotted Line, Done Right
We’ll need a more clever way to get our chart, so we’ll use an XY and area chart combination.
We can take advantage of the behavior of the date-style axis of a line chart. This treats numerical X values as dates starting at 1 for January 1, 1900, and incrementing up one unit for each day. In fact, it starts at zero for January 0, 1900. Values are plotted horizontally according to date, so they no longer need to be equally spaced, if the days between points are not equal. You can even plot multiple values for the same date, and the points will be plotted along a vertical line. This is what we want for our shaded region.
Since the date axis only recognizes whole days as whole numbers, and ignores any fractions, we can’t use the same scale as the -3 to +3 for the normal curve, That is also no-go because the date axis only recognizes non-negative dates. But we can normalize our X values so they run from 0 to 10,000 (which is the same as 5/18/1927). The data table is shown below.
A few constants are included in the first two rows of the worksheet. E1, E2 contain min and max X values for the X axis of the normal curve. G1, G2 contain min and max X for highlighting under curve; change these numbers to change which region is shaded. F1 is the max for the area chart’s date axis (the minimum is zero).
The formula to calculate the standard normal curve is the same as in the previous example with the line chart. A5:A65 contains values of -3 to +3, incremented by 0.1, while cell B5 contains this formula, which is copied into B5:B65.
The first two and last two rows of the area chart data range contain dummy values for the unshaded ends of the chart. The formula in cell E5 is:
The formula in cells E6, E7 is:
The formula way down in E57, E58 is:
The formula in E59 is:
Cells G5, G6, G58, G59 contain the value zero.
The range D7:D57 have values from zero to one, incremented by 0.02. I judged that this spacing has more than sufficient resolution for a chart 400 or so pixels wide.
The formula in E8:E56 computes the X values used to compute the normal curve for the area chart.
The formula in F5:F58 converts these X values into “dates” for the area chart’s X axis.
Finally, the formula in G7:G57 computes the standard normal values plotted by the area chart.
Let’s start by selecting F4:G59 and inserting an area chart. Doesn’t matter if you use stacked area or unstacked, since there’s only one area series in the final chart, but I used unstacked.
Convert the default text-style axis to a date-style axis. Make sure it is scaled from zero to 10,000.
Copy the XY data in A4:B65, select the chart, and use Paste Special to add the data as a new series, with categories in first column and series names in first row. This data is added as another area series.
Convert the new series to an XY type. If it doesn’t automatically go onto the secondary axes (and you don’t see axes on all four sides of the plot area), format the new series, and choose Secondary Axis.
Align the curve and the area by setting the secondary X axis (top of chart) so it ranges from -3 to +3.
We want to use the top horizontal axis along the bottom of the chart, so we need to switch the horizontal axes. You have to format the corresponding vertical axis to set this. Format the secondary vertical axis (right of chart) so that the horizontal axis crosses at the automatic position, which means at zero.
Now format the primary vertical axis (left of chart) so the horizontal axis crosses at the maximum axis value.
Oh noes! the axis is where we want it, but now we’ve shaded ABOVE the plotted curve. This is because area charts and column charts start at the horizontal axis and rise upward or hang downward to reach the plotted value. Unless the corresponding vertical axis is missing, so select the primary vertical axis (left of chart) and click Delete. All series in the chart use the secondary Y axis if it’s the only one available.
Now delete the primay X axis (top), or hide it by formatting it so it has no line, no tick marks, and no tick labels.
Finally, hide the secondary Y axis (right of chart) by formatting it so it has no line, no tick marks, and no tick labels.
Remove the legend and write a descriptive chart title, and the chart is done.
As promised, changing the values in G1:G2 change where the curve is shaded. Instead of -1.75 to 0.75, let’s change the values to ± 1.5.
Or let’s highlight the curve from 1 to 2.
See how easy it is to use a combination chart to highlight some or all of the area under a curve.