Peltier Tech Blog

Excel Chart Add-Ins | Training | Custom Solutions | Charts and Tutorials | PTS Blog

 

Main menu:

 
Peltier Tech
Chart Add-Ins

Peltier Tech Waterfall Chart Utility
Peltier Tech Cluster-Stack Chart Utility
Peltier Tech Box and Whisker Chart Utility
Peltier Tech Marimekko Chart Utility
Peltier Tech Dot Plot Utility
Peltier Tech Cascade Chart Utility

 
Excel Dashboards

Subscribe

Site search

Subscribe

Site search


Recent Posts

Recently Commented

Popular Posts

Archive


 

Categories


 

Privacy Policy

Creative Commons License
Licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

LOESS Smoothing in Excel

by Jon Peltier
Monday, March 9th, 2009
Peltier Technical Services, Inc., Copyright © 2010.
Licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

In 1979 William Cleveland published the LOESS (or LOWESS) technique for smoothing data, and in 1988 he and Susan J. Devlin published a refined version of the technique (references are given at the end of this article). For each X value where a Y value is to be calculated, the LOESS technique performs a regression on points in a moving range around the X value, where the values in the moving range are weighted according to their distance from this X value.

The NIST Engineering Statistics Handbook has a good description of the LOESS technique, including a worked example. A commenter named Nick used the NIST chapter as a starting point for his implementation of a LOESS function for Excel, and he posted in in a comment on JunkCharts. Nick’s approach was to create a UDF in VBA. The UDF accepts as inputs the X and Y data ranges, the number of points to use in the moving regression, and the X value for which to calculate Y. Nick’s UDF used Dectionary objects to hold intermediate values, and it outputs the Y value for the input X value.

I’ve expanded on Nick’s starting point, and produced the function presented later in this article. I’ve discarded the Dictionary objects in favor of VB arrays. It accepts the input X and Y data and the output X values either as ranges or as vertical arrays, and it outputs the calculated LOESS Y values as a vertical array. This means it can be called from within other procedures using arrays, or as a UDF from a worksheet, as an array formula. The original data must be sorted by X in either ascending or descending order (Nick’s Dictionaries do not require sorted input data, and I have an idea to remove the requirement from my function).

To use the function as a UDF, select the multicell output Y range, and enter this formula:

=loess(C2:C22,D2:D22,F2:F21,7)
 

where C2:C22 and D2:D22 are the input X and Y ranges, F2:F21 is the output X range, and 7 is the number of points in the moving regression (see screenshot below).

LOESS in Excel Worksheet 1

Enter this as an array formula by holding Ctrl and Shift while pressing Enter, and the selection fills with the calculated Y values. Note the curly braces around the formula in the formula bar, which indicates the formula is an array formula.

LOESS in Excel Worksheet 2

This chart shows the original NIST data points and the smoothed LOESS curve.

LOESS in Excel - NIST Data and Smoothed Curve

In Local regression, Wikipedia has a decent description of LOESS, with some pros and cons of this approach compared to other smoothing methods.

Example Uses of LOESS

This chart compares LOESS smoothing of website statistics with a simple 7-day moving average. The LOESS captures the major trends in the data, but is less severely affected by week to week fluctuations such as those occuring around Thanksgiving and over the year-end and New Year holidays.

LOESS vs Moving Average

Using LOESS to analyze the body mass indexes (BMI) of Playboy playmates gives more insights than linear regression over the whole data set or over portions of the data. See the discussion in Wired Relates Playboy Playmate BMI and Average BMI, 1954-2008 on the FlowingData blog.

Playmate BMI over time

The LOESS Function

Public Function LOESS(X As Variant, Y As Variant, xDomain As Variant, nPts As Long) As Double()
  Dim i As Long
  Dim iMin As Long
  Dim iMax As Long
  Dim iPoint As Long
  Dim iMx As Long
  Dim mx As Variant
  Dim maxDist As Double
  Dim SumWts As Double, SumWtX As Double, SumWtX2 As Double, SumWtY As Double, SumWtXY As Double
  Dim Denom As Double, WLRSlope As Double, WLRIntercept As Double
  Dim xNow As Double
  Dim distance() As Double
  Dim weight() As Double
  Dim yLoess() As Double

  If TypeName(X) = "Range" Then
    X = X.Value
  End If

  If TypeName(Y) = "Range" Then
    Y = Y.Value
  End If

  If TypeName(xDomain) = "Range" Then
    xDomain = xDomain.Value
  End If

  ReDim yLoess(LBound(xDomain, 1) To UBound(xDomain, 1), 1 To 1)

  For iPoint = LBound(xDomain, 1) To UBound(xDomain, 1)

    iMin = LBound(X, 1)
    iMax = UBound(X, 1)

    xNow = xDomain(iPoint, 1)

    ReDim distance(iMin To iMax)
    ReDim weight(iMin To iMax)

    For i = iMin To iMax
      ' populate x, y, distance
      distance(i) = Abs(X(i, 1) - xNow)
    Next

    Do
      ' find the nPts points closest to xNow
      If iMax + 1 - iMin <= nPts Then Exit Do
      If distance(iMin) > distance(iMax) Then
        ' remove first point
        iMin = iMin + 1
      ElseIf distance(iMin) < distance(iMax) Then
        ' remove last point
        iMax = iMax - 1
      Else
        ' remove both points?
        iMin = iMin + 1
        iMax = iMax - 1
      End If
    Loop

    ' Find max distance
    maxDist = -1
    For i = iMin To iMax
      If distance(i) > maxDist Then maxDist = distance(i)
    Next

    ' calculate weights using scaled distances
    For i = iMin To iMax
      weight(i) = (1 - (distance(i) / maxDist) ^ 3) ^ 3
    Next

    ' do the sums of squares
    SumWts = 0
    SumWtX = 0
    SumWtX2 = 0
    SumWtY = 0
    SumWtXY = 0
    For i = iMin To iMax
      SumWts = SumWts + weight(i)
      SumWtX = SumWtX + X(i, 1) * weight(i)
      SumWtX2 = SumWtX2 + (X(i, 1) ^ 2) * weight(i)
      SumWtY = SumWtY + Y(i, 1) * weight(i)
      SumWtXY = SumWtXY + X(i, 1) * Y(i, 1) * weight(i)
    Next
    Denom = SumWts * SumWtX2 - SumWtX ^ 2

    ' calculate the regression coefficients, and finally the loess value
    WLRSlope = (SumWts * SumWtXY - SumWtX * SumWtY) / Denom
    WLRIntercept = (SumWtX2 * SumWtY - SumWtX * SumWtXY) / Denom
    yLoess(iPoint, 1) = WLRSlope * xNow + WLRIntercept

  Next

  LOESS = yLoess

End Function
 

The LOESS Utility

The flexibility of this LOESS function will make it easy to encapsulate into an add-in that uses a dialog to facilitate user selection of data and parameters. A working version uses the following dialog:

Dialog for LOESS Utility

Update 24 June 2009

A LOESS utility for Excel has finally been made ready for public consumption. It is described in LOESS Utility for Excel, where there is a link to download the utility. It’s still in preliminary form, but runs pretty much trouble free. Users are encouraged to comment on it to drive further development.

Update 8 October 2009

The LOESS utility for Excel has been updated, and the interface made more flexible. It is described in LOESS Utility – Awesome Update, where there is a link to download the new utility.

References

Cleveland, W.S. (1979), “Robust Locally Weighted Regression and Smoothing Scatterplots,” Journal of the American Statistical Association, Vol. 74, pp. 829-836.

Cleveland, W.S. and Devlin, S.J. (1988), “Locally Weighted Regression: An Approach to Regression Analysis by Local Fitting,” Journal of the American Statistical Association, Vol. 83, pp. 596-610.

NIST Engineering Statistics Handbook, 4.1.4.4. LOESS (aka LOWESS)

NIST Engineering Statistics Handbook, Example of LOESS Computations

Wikipedia, Local regression

Related Posts:

Bookmark and share this entry:
  • Twitter
  • Digg
  • Facebook
  • LinkedIn
  • del.icio.us
  • Technorati
  • StumbleUpon
  • Google Bookmarks
  • Reddit
  • MySpace
  • Slashdot
  • Yahoo! Buzz
  • Tumblr

Learn how to create Excel dashboards.

Comments


Comment from Colin Banfield
Time: Monday, March 9, 2009, 1:02 pm

Jon, nice stuff. I see that you’re added a link to this page on Wikipedia :D I haven’t done a lot of digging on the subject, but I was wondering how one would determine the optimum values for alpha and N. In moving average forecasing, for example, one would calculate the interval (simple moving average) or the damping factor (exponential smoothing) based on minimizing the mean absolute percentage error (which Excel’s ATP assumes you’re already calculated in your head!).


Comment from Jon Peltier
Time: Monday, March 9, 2009, 1:20 pm

Colin -

I don’t know how you’d optimize your alpha. A couple sources I looked at said “around 0.33″ or “below 0.5″, but I think you’re left to decide based on how it looks. If N gets too large, the curve doesn’t even follow most of the points, but if it isn’t large enough, the curve is too wiggly.

I used N=7 for the NIST data, because that’s what the example used, though I looked at a lot of other values. I used probably alpha=0.33 for the playmate data, and I looked at a few other values, but it didn’t make much difference. The web stats showed some difference with changing alpha, but mostly in the width and depth of the year-end dip.

I think the idea is to get a good overview which isn’t too badly affected by an outlying point.

If anyone has any guidelines other than “it looks nice”, please share with us.


Pingback from LOESS smoothing « Learning R
Time: Monday, March 9, 2009, 3:15 pm

[...] Posted by learnr under R | Tags: ggplot2, R |   Jon Peltier writes about the LOESS smoothing in Excel, and presents a utility to facilitate adding smoothers to the data. He goes on to show how to use [...]


Comment from TV
Time: Tuesday, March 10, 2009, 7:32 am

Is changing this to LOESS function to a 2nd order difficult? It seems like a first order fit leads to discontinuities in the derivative…


Comment from Jon Peltier
Time: Tuesday, March 10, 2009, 11:05 am

TV -

You’d have to overhaul everything after this comment:

    ' do the sums of squares

 
I did a quick Google search, and found one useful bit of information in a pile of results pages:

http://www.eng-tips.com/viewthread.cfm?qid=55381&page=1

For y=ax^2+bx+c

P=Sum(x)
Q=Sum(x^2)
R=Sum(x^3)
S=Sum(x^4)
T=Sum(y)
U=Sum(xy)
V=Sum(x^2y)
W=N*Q*S+2*P*Q*R-Q^3-P^2*S-N*R^2

Then

a=(N*Q*V+P*R*T+P*Q*U-Q^2*T-P^2*V-N*R*U)/W
b=(N*S*U+P*Q*V+Q*R*T-Q^2*U-P*S*T-N*R*V)/W
c=(Q*S*T+Q*R*U+P*R*V-Q^2*V-P*S*U-R^2*T)/W

You’ll have to figure out what to do with the weights.


Comment from Will Dwinnell
Time: Saturday, March 14, 2009, 9:56 am

I like your article.

I came to the same conclusion you did about the Playboy data from Wired, and got results similar to yours, fitting a kernel regression in MATLAB.


Comment from Jon Peltier
Time: Saturday, March 14, 2009, 12:30 pm

Hi Will -

Thanks for the validation of my conclusions, especially since they involved a different tool. Two smart guys can’t be wrong, eh?


Pingback from dekay.org » links for 2009-03-15
Time: Sunday, March 15, 2009, 6:02 am

[...] LOESS Smoothing in Excel | PTS Blog (tags: loess data smoothing curve fit via:mento.info) [...]


Comment from Nick
Time: Wednesday, April 1, 2009, 4:27 pm

I stumbled upon that old comment of mine at JunkCharts, and saw your comment as well which led me here. First let me say: Wow, you’ve obviously got more VBA skills than I do, and although I haven’t tested it it looks great.

As for the question about using 2nd degree polynomials or higher instead of linear. You could replace the stuff after ‘do sum of squares’ with some trickery involving the LINEST function. (See Clyde38’s post at http://www.eng-tips.com/viewthread.cfm?qid=184726) but you’d lose the weighting.


Comment from Jon Peltier
Time: Wednesday, April 1, 2009, 4:30 pm

Nick -

I have to say that I wouldn’t have gotten too far without your coding of the original function.

As for the higher order fit, it seems most people that I noticed just use first order, and second, I would want somehow to keep the weighting. I thought maybe a little dimensional analysis might help, but I need at least an hour to get my brain around stuff like that.


Comment from Bang Seung Beom
Time: Monday, May 18, 2009, 9:04 pm

Hi.

Thanks for your code.

But I cannot find smoothing parameter (alpha) in your function?

Which variable is it?


Comment from Jon Peltier
Time: Monday, May 18, 2009, 9:08 pm

Alpha isn’t used directly in the calculations. The code uses nPts, the number of points to use for the moving regression. Alpha is the fraction of the whole data set used in the moving regression, or nPts/(number of values in X or Y). In the dialog-based version of this code, you can set either parameter, and the other is computed.


Comment from Andrew Bond
Time: Saturday, May 23, 2009, 3:02 pm

Jon

Thanks for a very interesting blog on Loess. I am in the process of writing a C++ implementation of Loess for use in processing information from ground investigations (I am a civil engineer by profession).

Having failed to port the available C implementations that I have found on the web, I have used your code as inspiration for mine.

I am now at the testing stage and wanted to compare results from my program with those posted in this blog. However, I can’t find the original (full) set of data. Can you provide a link to it here, please?

I would like to have the web statistics and Playmate data – the NIST data I already have. (I have found a subset of the Playmate data, containing just Playmates of the Year – not all Playmates).

Thanks in anticipation.

Andrew


Comment from Jon Peltier
Time: Saturday, May 23, 2009, 5:49 pm

Andrew -

Infoporn: Today’s Playmates Are More Like Anime Figures Than Real Humans in Wired Magazine has links to a PDF visualization and an Excel workbook of BMI data.


Comment from Andrew Bond
Time: Sunday, May 24, 2009, 5:51 am

Jon

Many thanks for the link – I missed it when I first visited the Wired website. What smoothing coefficient did you use for your loess fit? And is the website data as readily available as the Playmate data?

Again, thanks in anticipation.

Andrew


Comment from Jon Peltier
Time: Sunday, May 24, 2009, 6:53 am

Andrew -

I used 150 for the number of points in the analysis. Here is a comparison of three curves, for 100, 150, and 200 points in the moving regression:

Playmate BMI over time

The website data is from my own site’s history several months ago. I’m not sure I still have that particular data set.


Comment from Brian
Time: Friday, June 5, 2009, 4:06 pm

This is something I’ve been looking all over for. Thanks. I added a weighting capability, so that each case can be weighted with a third input range. I know next to nothing about VBA, but I think I did it right.

Public Function LOESSW(X As Variant, Y As Variant, PtWt As Variant, xDomain As Variant, nPts As Long) As Double()
  Dim i As Long
  Dim iMin As Long
  Dim iMax As Long
  Dim iPoint As Long
  Dim iMx As Long
  Dim mx As Variant
  Dim maxDist As Double
  Dim SumWts As Double, SumWtX As Double, SumWtX2 As Double, SumWtY As Double, SumWtXY As Double
  Dim Denom As Double, WLRSlope As Double, WLRIntercept As Double
  Dim xNow As Double
  Dim distance() As Double
  Dim Weight() As Double
  Dim yLoess() As Double
  Dim totalWeight As Double

  If TypeName(X) = "Range" Then
    X = X.Value
  End If

  If TypeName(Y) = "Range" Then
    Y = Y.Value
  End If

  If TypeName(PtWt) = "Range" Then
    PtWt = PtWt.Value
  End If

  If TypeName(xDomain) = "Range" Then
    xDomain = xDomain.Value
  End If

  ReDim yLoess(LBound(xDomain, 1) To UBound(xDomain, 1), 1 To 1)

  For iPoint = LBound(xDomain, 1) To UBound(xDomain, 1)

    iMin = LBound(X, 1)
    iMax = UBound(X, 1)

    xNow = xDomain(iPoint, 1)

    ReDim distance(iMin To iMax)
    ReDim Weight(iMin To iMax)

    For i = iMin To iMax
      ' populate x, y, distance
      distance(i) = Abs(X(i, 1) - xNow)
    Next

    Do
      ' find the nPts points closest to xNow
      If iMax + 1 - iMin  distance(iMax) Then
        ' remove first point
        iMin = iMin + 1
      ElseIf distance(iMin)  maxDist Then maxDist = distance(i)
      totalWeight = totalWeight + PtWt(i, 1)
    Next

    ' calculate weights using scaled distances
    For i = iMin To iMax
      Weight(i) = (PtWt(i, 1) / totalWeight) * (1 - (distance(i) / maxDist) ^ 3) ^ 3
    Next

    ' do the sums of squares
    SumWts = 0
    SumWtX = 0
    SumWtX2 = 0
    SumWtY = 0
    SumWtXY = 0
    For i = iMin To iMax
      SumWts = SumWts + Weight(i)
      SumWtX = SumWtX + X(i, 1) * Weight(i)
      SumWtX2 = SumWtX2 + (X(i, 1) ^ 2) * Weight(i)
      SumWtY = SumWtY + Y(i, 1) * Weight(i)
      SumWtXY = SumWtXY + X(i, 1) * Y(i, 1) * Weight(i)
    Next
    Denom = SumWts * SumWtX2 - SumWtX ^ 2

    ' calculate the regression coefficients, and finally the loess value
    WLRSlope = (SumWts * SumWtXY - SumWtX * SumWtY) / Denom
    WLRIntercept = (SumWtX2 * SumWtY - SumWtX * SumWtXY) / Denom
    yLoess(iPoint, 1) = WLRSlope * xNow + WLRIntercept

  Next

  LOESSW = yLoess

End Function
 


Comment from Jon Peltier
Time: Friday, June 5, 2009, 5:52 pm

Thanks Brian. Maybe one of the smart people who read this blog will check it out for us.


Comment from Tian
Time: Thursday, September 24, 2009, 12:38 pm

Hi Jon,

Thank you for the nice code. It helped me easily added a function to a C++ App. There is a small bug you might not notice. The assignment for xNow should be:

xNow = X(xDomain(iPoint, 1) + 1, 1)

instead of:

xNow = xDomain(iPoint, 1)

Thank you,


Comment from Jon Peltier
Time: Thursday, September 24, 2009, 12:58 pm

Tian -

The line of code works fine in VBA, which assigns the values in a range to a 1-based array. You have to adjust the syntax to use the code in C++.


Comment from Tian
Time: Thursday, September 24, 2009, 1:17 pm

Jon,

Please see the difference of the output:

X input Y input X output original output new output
0.55782 18.63654 0 -13.07212037 20.59304051
2.021727 103.4965 1 47.06914445 107.1603064
2.577325 150.3539 2 105.8974689 139.7673806
3.414029 190.5103 3 159.6924955 174.2630716
4.301408 208.7012 4 196.8626739 207.2333938
4.744839 213.7114 5 219.0115635 216.6616039
5.107378 228.4935 6 227.6417493 220.5444981
6.541166 233.5539 7 230.2534309 229.8606994
6.721618 234.5505 8 227.0376643 229.8347242
7.260058 223.8923 9 221.246659 229.4301269
8.133587 227.6834 10 202.9876659 226.6044626
9.122438 223.9198 11 187.8518139 220.3904231
11.92967 168.02 12 170.6711911 172.3479193
12.37977 164.9575 13 162.210816 163.8416617
13.27286 152.6111 14 160.7756941 161.8489846
14.27675 160.7874 15 159.7155843 160.3350921
15.3731 168.5557 16 161.0838629 160.1920102
15.64766 152.4266 17 194.5740586 161.0555463
18.56054 221.707 18 215.4190384 227.3399984
18.58664 222.6904 19 236.7904956 227.8985782
18.75728 243.1883


Comment from Jon Peltier
Time: Thursday, September 24, 2009, 6:23 pm

Your original output is the result of LOESS calculation using a 7 point moving regression, using X and Y input as inputs and X output as the X output.

Your new output is the result of LOESS calculation using a 7 point moving regression, using X and Y input as inputs and X input as the X output.


Comment from Tian
Time: Friday, September 25, 2009, 9:40 am

Now I understand what you mean. Thank you for clarifying.


Comment from Don
Time: Tuesday, October 6, 2009, 7:01 pm

In MathCAD 14, you create a 2-column matrix of the data and set the first column = X and the second column = Y. I named my matrix NIST. Then enter the following:

NIST(x):=interp(loess(X,Y,.5),X,Y,x)

where x was set as a range variable x:=0,0.1,..20

This creates a function NIST(x) that you can plot versus the range variable x

Here is the MathCAD explanation for the two built-in functions ‘interp’ & ‘loess’:

loess(vx, vy, span) Returns a vector which interp uses to find a set of second-order polynomials that best fit the neighborhood of x and y data values in vx and vy in the least-squares sense. The size of the neighborhood is controlled by span.

interp(vs, vx, vy, x) Returns the interpolated y-value corresponding to x using the output vector vs from loess.


Comment from Tom
Time: Friday, January 29, 2010, 6:16 pm

I guess your page explains LOESS more practical than all the rocket science pages I tried before I came here.

You have a big glitch in your moving average example. MA always get a lag of n/2 points. The reason is that the MA is the 50% trend value of one linear regression of n x,y points. Both smoothing filters are comparable in the X axis if the MA is calculated from the n/2 days in the future. You would use the MA with OFFSET and AVG functions then and not take the diagram’s moving average.
However, some example charts indicate lags in the LOESS .. am I wrong?

By the way, I always was and am a big fan of your Excel experiences.
Big timesavers for me.


Comment from Jon Peltier
Time: Saturday, January 30, 2010, 10:06 am

Tom -

A comparison of moving average to loess would make more sense if the calculated moving average was plotted in the middle of the data for the moving average, but conventionally this is not done. Loess never used this convention.

Where I have noted an apparent lag in loess is in cases where the variation in the raw data is not symmetric, for example, when an increase is followed by a decrease which is much more or less steep. Changing the number of points in the moving regression may reduce the apparent lag. The apparent lag may be pronounced near the ends of the data set, where the regression consists of substantially more points on one side of the X value for the loess calculation.


Comment from tom
Time: Saturday, January 30, 2010, 2:57 pm

>The apparent lag may be pronounced near the ends of the data set
That’s what I know from moving average fitting.
It was designed to fit a time series where mostly the newest fit is the interesting one.

From the VBA code I would say that LOESS fitting is really a choice for sparse data series. Time series make only sense if there are gaps in between. I used a similar technique already on time series in Excel like in your example.

If the distances are always the same between two X points, then the algorithm is just too general and consuming. My experience is that you always find a moving average that is as almost as good fitting as the moving regression window.

By the way: Why did you use Redim in a for-loop? There is another redim going over
all available data. One second Redim before the for would take more memory and save
time.


Comment from Jon Peltier
Time: Saturday, January 30, 2010, 4:37 pm

“I would say that LOESS fitting is really a choice for sparse data series.”
The literature indicates that loess is better for heavily populated data sets.

Do your moving averages weight the data prior to averaging?

The ReDim statements and the iMin and iMax statements can all be taken out of the loop. Perhaps the original programmer initially had them inside the loop before I saw the code, and then I never put them in front of the loop.


Comment from Neha
Time: Thursday, July 29, 2010, 9:46 am

Hi,

I have count data (around 20,000 cases) for different days of the weeks. There are outliers in my series. Since my data is sparse, I cannot remove these outliers completely. I am planning to assign the weights to these outliers in order to smooth the data series.

For this I thought of using either Loess or rloess. Please suggest is it correct to do so?


Comment from Jon Peltier
Time: Thursday, July 29, 2010, 1:38 pm

Neha -

All I can say is “Try it”.

LOESS assigns its own weights to data for each of its regression calculations, I don’t know what kind of weights you plan to assign.

Write a comment

I welcome comments from my readers. If you have an opinion on this post, if you have a question or if there is anything to add, I want to hear from you. Whether you agree or disagree, please join the discussion.

If you want to include an image in your comment, post it on your own site or on one of the many free image sharing sites, and include a link in your comment. I'll download your image and insert the necessary html to display the image inline.

Read the PTS Blog Comment Policy.





Subscribe without commenting

Peltier Tech Waterfall Chart Utility Peltier Tech Cluster-Stack Chart Utility Peltier Tech Box and Whisker Chart Utility Peltier Tech Marimekko Chart Utility Peltier Tech Dot Plot Utility Peltier Tech Cascade Chart Utility

Create Excel dashboards quickly with Plug-N-Play reports.