LOESS (LOWESS) Regression
Described by William Cleveland in 1979, LOESS is a technique for smoothing data characterized by a lot of scatter. Essentially, as the data is smoothed, a moving subset of the data, weighted by distance from the center of the moving range, is fitted to a linear or quadratic regression, and a smoothed value is computed. This regression is reweighted and recomputed for a subset of the data for each output point that is calculated.
I described how this can work in Excel VBA in LOESS Smoothing in Excel in the Peltier Tech Blog. Later I developed a LOESS Utility for Excel, then improved (LOESS Utility – Awesome Update) and documented (LOESS Utility – What the Buttons Mean) this utility. Finally I incorporated it into Peltier Tech Charts for Excel, and I have made improvements to it several times since then.
Peltier Tech Charts for Excel offers LOESS regression in a LOESS Calculator which output static smoothed values to a worksheet. It is also available as a dynamic worksheet function which can be used in worksheet formulas.
LOESS Regression in the Peltier Tech Ribbon
The LOESS Calculator is launched by clicking the LOESS button in the Data group, near the far right end of the Peltier Tech ribbon.
The LOESS Calculator is available in Standard and Advanced Editions of Peltier Tech Charts for Excel; the Quadratic Smoothing option is only available in the Advanced Edition.
Peltier Tech LOESS Calculator Dialog
The Peltier Tech LOESS Calculator dialog contains several options. Many of these options are saved for the next time you open the dialog.
LOESS Data
Data inputs for the LOESS Calculator must be aligned in columns.
When selecting data, you can select the entire amount of data, or you can select just the top row and let the calculator figure out the height of the data range. Don’t select headers, just numerical data.
The data need not be sorted; the calculator internally sorts the data prior to computing any regressions. The calculator ignores blank cells and non-numeric values.
Nominally four columns are used in a smoothing operation.
- X input and Y input – raw data used to compute the smoothed curve
- X output and Y output – the program calculates a Y output value for each X output value
- X input and X output can be shared, that is, Y output values can be calculated using each X input value as an X output value.
Note: the LOESS Calculator will overwrite any values in the Y output range with new calculations. These calculations are done in VBA, and the cells are populated with the static values that result.
LOESS Data Arrangements
Numerous data arrangements are accommodated, corresponding to the buttons in the dialog. These make data selection easier when the data is appropriately arranged in your workbook. All parameters are stored to repopulate the dialog when the analysis is repeated.
One single range. Shared X input and X output values in first column, Y input in second column, Y output will go into third column. See example 1.
Multiple three-column blocks can be specified. See example 2.
Two separate ranges. Shared X input and X output values in first column, Y input in second column of first range. Y output will go into second range. More than one column of Y input and Y output values can be specified.
Three separate ranges. X input in first range. Y input in second range. X output in first column and Y output will go into second column of third range. More than one column of Y input and Y output values can be specified.
Two separate ranges. X input and Y input values in first and second columns of first range. X output in first column and Y output will go into second column of second range. More than one column of Y input and Y output values can be specified. See example 3.
Three separate ranges. X input and Y input values in first and second columns of first range. X output in second range. Y output will go into third range. More than one column of Y input and Y output values can be specified.
Three separate ranges. X input in first range. Y input in second range. X output in first column and Y output will go into second column of third range. More than one column of Y input and Y output values can be specified.
Four separate ranges. X input in first range, Y input in second range. X output in third range. Y output will go into fourth column. More than one column of Y input and Y output values can be specified. See example 4.
Other Settings
Settings are stored in each worksheet and reused when the LOESS Calculator is called again.
Extrapolate Above/Below Input X Range
If you use distinct X input and X output ranges, some X output values may fall outside the range of X input values. It is possible to calculate smoothed values beyond the input range, but such extrapolation can be dangerous. By default the calculator ignores X output values outside the X input range, but you can override this by checking the boxes.
Smoothing Parameter, α
Alpha (α) is typically between 0.25 and 0.5, and 0.33 is a commonly used setting. The LOESS Calculator sets an initial value of 0.33, and computes the corresponding number of points in the moving regression.
Points in Moving Regression, Npts
This is the number of points used in the moving regression. It is a fraction α of the total number of input points. The program saves Npts, not α, for subsequent analyses.
Quadratic Moving Regression
LOESS is commonly performed using a linear, first-order regression over the moving range. This can lead to some unnaturally straight sections of smoothed output data near the beginning and end of the data range.
The LOESS Calculator offers a moving quadratic regression, which may avoid these straight sections near the ends of the smoothed curve. This requires a heavier processor load, but it is not as much slower as expected.
The Quadratic Moving Regression option is only available in the Advanced Edition of Peltier Tech Charts for Excel.
LOESS Calculator Examples
Click this link to download a simple worksheet with data from the NIST Engineering Statistics Handbook: NIST-Example-LOESS-Data.xlsx.
Example 1 – Simple One-Area Data Range
This example uses data from the Example LOESS Computation. It uses the first data arrangement, with shared X input and X output data in the first column, Y input data in the second column, and a blank column for Y output (smoothed) values.
When the dialog comes up, click on the first data arrangement button (top left). The three unneeded data entry boxes are hidden.
Click the dash button at the right of the data entry box. A small input box appears. Select the top of the data range (A2:C2) with the mouse.
Click OK and the dialog is populated with the data range. The default value of 0.33 is entered for alpha, and the calculated value of 7 is entered for the number of moving range points. You can adjust either of these values.
None of the checkboxes for extrapolation or quadratic analysis are checked. Click OK, and the smoothed values are inserted into the worksheet.
Example 2 – Multiple Data Ranges
example 2
When the input box appears for you to select the data range, select the first range, then hold the Ctrl key while selecting subsequent ranges. There is no built-in limit to the number of ranges that can be entered. Addresses of all of the selected ranges appear in the input box, separated by commas.
The dialog reflects the multiple selected ranges.
Click OK to perform all of the smoothing operations at once.
Example 3 – Distinct X Input and X Output
You may want to calculate smoothed Y outputs for different X values than are in your data set. In our sample, the input data is not uniformly distributed across the X range, but we’d like the outputs to be distributed.
We’ll use the same input data as in Example 1, but with a different set of X outputs. Note that the output range may have a different number of rows (data points) than the input range.
Here is the dialog, with the appropriate data button clicked and the two data entry boxes filled.
Click OK to calculate the smoothed outputs. Note that the first and last Y output cells are blank. The corresponding X output values lie outside the X input range, and the Extrapolate checkboxes were unchecked in the dialog.
Example 4 – Multiple Y Inputs and Y Outputs
Most of the data arrangements allow multiple Y inputs and outputs. Often your data consists of multiple Y variables collected over time, and you’d like to analyze them together. In our data below, we have one set of X values (call them time points), with two Y values at each X value.
The data looks like this.
Since the X and Y values are not connected, we need to use the last data arrangement, which uses four data entry boxes, one each for X input, Y inputs, X output, and Y outputs. There are as many Y output columns as Y input columns. Here is the dialog, populated and ready to go.
Click OK to populate the Y output value range.
Here is our chart with input data points and smoothed curves.
LOESS Worksheet Function
In addition to a LOESS Calculator that produces static values in a worksheet, Peltier Tech Charts for Excel offers a worksheet function that calculates dynamic values live.
The function is PT_Loess, and is available when the workbook is open on a computer that has Peltier Tech Charts for Excel installed. If the workbook is opened on a computer that does not have the add-in installed, the cells will display an error.
The function syntax is:
PT_Loess(InputX, InputY, OutputX, NumberOfPoints,
[ExtrapolateAbove], [ExtrapolateBelow], [QuadraticRegression])
The names of the function arguments should be obvious from the corresponding LOESS Calculator dialog options.
Here is our sample data set. Note that cell C2 is the active cell, and we can see the LOESS formula in the Formula Bar. The formula is
=PT_Loess($A$2:$A$22,$B$2:$B$22,A2,7,FALSE,FALSE,FALSE)
The worksheet function uses the same calculation code as the LOESS Calculator, so the results are identical.