In 1979 William Cleveland published the LOESS (or LOWESS) technique for smoothing data, and in 1988 he and Susan J. Devlin published a refined version of the technique (references are given at the end of this article). For each X value where a Y value is to be calculated, the LOESS technique performs a regression on points in a moving range around the X value, where the values in the moving range are weighted according to their distance from this X value.
The NIST Engineering Statistics Handbook has a good description of the LOESS technique, including a worked example. A commenter named Nick used the NIST chapter as a starting point for his implementation of a LOESS function for Excel, and he posted in in a comment on JunkCharts. Nick’s approach was to create a UDF in VBA. The UDF accepts as inputs the X and Y data ranges, the number of points to use in the moving regression, and the X value for which to calculate Y. Nick’s UDF used Dictionary objects to hold intermediate values, and it outputs the Y value for the input X value.
I’ve expanded on Nick’s starting point, and produced the function presented later in this article. I’ve discarded the Dictionary objects in favor of VB arrays. It accepts the input X and Y data and the output X values either as ranges or as vertical arrays, and it outputs the calculated LOESS Y values as a vertical array. This means it can be called from within other procedures using arrays, or as a UDF from a worksheet, as an array formula. The original data must be sorted by X in either ascending or descending order (Nick’s Dictionaries do not require sorted input data, and I have an idea to remove the requirement from my function).
My algorithm, Nick’s original algorithm I based mine on, the NIST algorithm Nick based his upon, and all others I checked in the wild use a linear regression of the weighted values, and the weighting factor for input data point i is a sigmoidal curve based on
W(i) = (1 - X(i)^3)^3
where X(i) is the normalized distance (along the X axis) between input data point i and the output X value at which the LOESS smoothed value is being computed. The normalization X is the distance/(maximum distance among points in the moving regression).
To use the function as a UDF, select the multicell output Y range, and enter this formula:
=loess(C2:C22,D2:D22,F2:F21,7)
where C2:C22 and D2:D22 are the input X and Y ranges, F2:F21 is the output X range, and 7 is the number of points in the moving regression (see screenshot below).
Enter this as an array formula by holding Ctrl and Shift while pressing Enter, and the selection fills with the calculated Y values. Note the curly braces around the formula in the formula bar, which indicates the formula is an array formula.
This chart shows the original NIST data points and the smoothed LOESS curve.
In Local regression, Wikipedia has a decent description of LOESS, with some pros and cons of this approach compared to other smoothing methods.
Example Uses of LOESS
This chart compares LOESS smoothing of website statistics with a simple 7-day moving average. The LOESS captures the major trends in the data, but is less severely affected by week to week fluctuations such as those occuring around Thanksgiving and over the year-end and New Year holidays.
Using LOESS to analyze the body mass indexes (BMI) of Playboy playmates gives more insights than linear regression over the whole data set or over portions of the data. See the discussion in Wired Relates Playboy Playmate BMI and Average BMI, 1954-2008 on the FlowingData blog.
The LOESS Function
Public Function LOESS(X As Variant, Y As Variant, xDomain As Variant, nPts As Long) As Double()
Dim i As Long
Dim iMin As Long
Dim iMax As Long
Dim iPoint As Long
Dim iMx As Long
Dim mx As Variant
Dim maxDist As Double
Dim SumWts As Double, SumWtX As Double, SumWtX2 As Double, SumWtY As Double, SumWtXY As Double
Dim Denom As Double, WLRSlope As Double, WLRIntercept As Double
Dim xNow As Double
Dim distance() As Double
Dim weight() As Double
Dim yLoess() As Double
If TypeName(X) = "Range" Then
X = X.Value
End If
If TypeName(Y) = "Range" Then
Y = Y.Value
End If
If TypeName(xDomain) = "Range" Then
xDomain = xDomain.Value
End If
ReDim yLoess(LBound(xDomain, 1) To UBound(xDomain, 1), 1 To 1)
For iPoint = LBound(xDomain, 1) To UBound(xDomain, 1)
iMin = LBound(X, 1)
iMax = UBound(X, 1)
xNow = xDomain(iPoint, 1)
ReDim distance(iMin To iMax)
ReDim weight(iMin To iMax)
For i = iMin To iMax
' populate x, y, distance
distance(i) = Abs(X(i, 1) - xNow)
Next
Do
' find the nPts points closest to xNow
If iMax + 1 - iMin <= nPts Then Exit Do
If distance(iMin) > distance(iMax) Then
' remove first point
iMin = iMin + 1
ElseIf distance(iMin) < distance(iMax) Then
' remove last point
iMax = iMax - 1
Else
' remove both points?
iMin = iMin + 1
iMax = iMax - 1
End If
Loop
' Find max distance
maxDist = -1
For i = iMin To iMax
If distance(i) > maxDist Then maxDist = distance(i)
Next
' calculate weights using scaled distances
For i = iMin To iMax
weight(i) = (1 - (distance(i) / maxDist) ^ 3) ^ 3
Next
' do the sums of squares
SumWts = 0
SumWtX = 0
SumWtX2 = 0
SumWtY = 0
SumWtXY = 0
For i = iMin To iMax
SumWts = SumWts + weight(i)
SumWtX = SumWtX + X(i, 1) * weight(i)
SumWtX2 = SumWtX2 + (X(i, 1) ^ 2) * weight(i)
SumWtY = SumWtY + Y(i, 1) * weight(i)
SumWtXY = SumWtXY + X(i, 1) * Y(i, 1) * weight(i)
Next
Denom = SumWts * SumWtX2 - SumWtX ^ 2
' calculate the regression coefficients, and finally the loess value
WLRSlope = (SumWts * SumWtXY - SumWtX * SumWtY) / Denom
WLRIntercept = (SumWtX2 * SumWtY - SumWtX * SumWtXY) / Denom
yLoess(iPoint, 1) = WLRSlope * xNow + WLRIntercept
Next
LOESS = yLoess
End Function
The LOESS Utility
The flexibility of this LOESS function will make it easy to encapsulate into an add-in that uses a dialog to facilitate user selection of data and parameters. A working version uses the following dialog:
Update 24 June 2009
A LOESS utility for Excel has finally been developed. It is described in LOESS Utility for Excel.
Update 8 October 2009
The LOESS utility for Excel has been updated, and the interface made more flexible. It is described in LOESS Utility – Awesome Update.
LOESS Smoothing in Peltier Tech Charts for Excel
I liked this utility and used it so much, that I have included it in my commercial Excel Charting software, Peltier Tech Charts for Excel.
I’ve made numerous improvements to this utility in the seven years since I released the free demo in this article. The ones that come to mind off the top of my head:
- Data Improvement:
- Data is internally sorted by X prior to running the analysis.
- Missing data (X or Y) is ignored during the analysis.
- Algorithms are less sensitive to factors that previously caused errors (too much input at repeated X values, too few points in the analysis).
- If X output prediction range extends beyond X input data range, you can decide not to extrapolate below minimum or above maximum X values.
- Algorithm Improvement:
- You can use a moving quadratic (2nd order) regression instead of the normal 1st order regression. This is slower than the first order computation, but not as much slower as I feared.
Articles About Regression and Trendlines on this Blog
- Trendlines and Chart Types in Excel
- Add One Trendline for Multiple Series
- Trendline Calculator for Multiple Series
- Trendline Fitting Errors
- Choosing a Trendline Type
- Plot Two Time Series and Trendlines with Different Dates
- Polynomial Fit vs. Statistical Process Control
- Moving Averages
- Stacked Column Chart with Stacked Trendlines
- Deming Regression
- Deming Regression Utility
- LOESS Utility for Excel
References
Cleveland, W.S. (1979), “Robust Locally Weighted Regression and Smoothing Scatterplots,” Journal of the American Statistical Association, Vol. 74, pp. 829-836.
Cleveland, W.S. and Devlin, S.J. (1988), “Locally Weighted Regression: An Approach to Regression Analysis by Local Fitting,” Journal of the American Statistical Association, Vol. 83, pp. 596-610.
NIST Engineering Statistics Handbook, 4.1.4.4. LOESS (aka LOWESS)
NIST Engineering Statistics Handbook, Example of LOESS Computations
Wikipedia, Local regression
Colin Banfield says
Jon, nice stuff. I see that you’re added a link to this page on Wikipedia :D I haven’t done a lot of digging on the subject, but I was wondering how one would determine the optimum values for alpha and N. In moving average forecasing, for example, one would calculate the interval (simple moving average) or the damping factor (exponential smoothing) based on minimizing the mean absolute percentage error (which Excel’s ATP assumes you’re already calculated in your head!).
Jon Peltier says
Colin –
I don’t know how you’d optimize your alpha. A couple sources I looked at said “around 0.33” or “below 0.5”, but I think you’re left to decide based on how it looks. If N gets too large, the curve doesn’t even follow most of the points, but if it isn’t large enough, the curve is too wiggly.
I used N=7 for the NIST data, because that’s what the example used, though I looked at a lot of other values. I used probably alpha=0.33 for the playmate data, and I looked at a few other values, but it didn’t make much difference. The web stats showed some difference with changing alpha, but mostly in the width and depth of the year-end dip.
I think the idea is to get a good overview which isn’t too badly affected by an outlying point.
If anyone has any guidelines other than “it looks nice”, please share with us.
TV says
Is changing this to LOESS function to a 2nd order difficult? It seems like a first order fit leads to discontinuities in the derivative…
Jon Peltier says
TV –
You’d have to overhaul everything after this comment:
I did a quick Google search, and found one useful bit of information in a pile of results pages:
http://www.eng-tips.com/viewthread.cfm?qid=55381
For y=ax^2+bx+c
P=Sum(x)
Q=Sum(x^2)
R=Sum(x^3)
S=Sum(x^4)
T=Sum(y)
U=Sum(xy)
V=Sum(x^2y)
W=N*Q*S+2*P*Q*R-Q^3-P^2*S-N*R^2
Then
a=(N*Q*V+P*R*T+P*Q*U-Q^2*T-P^2*V-N*R*U)/W
b=(N*S*U+P*Q*V+Q*R*T-Q^2*U-P*S*T-N*R*V)/W
c=(Q*S*T+Q*R*U+P*R*V-Q^2*V-P*S*U-R^2*T)/W
You’ll have to figure out what to do with the weights.
Will Dwinnell says
I like your article.
I came to the same conclusion you did about the Playboy data from Wired, and got results similar to yours, fitting a kernel regression in MATLAB.
Jon Peltier says
Hi Will –
Thanks for the validation of my conclusions, especially since they involved a different tool. Two smart guys can’t be wrong, eh?
Nick says
I stumbled upon that old comment of mine at JunkCharts, and saw your comment as well which led me here. First let me say: Wow, you’ve obviously got more VBA skills than I do, and although I haven’t tested it it looks great.
As for the question about using 2nd degree polynomials or higher instead of linear. You could replace the stuff after ‘do sum of squares’ with some trickery involving the LINEST function. (See Clyde38’s post at http://www.eng-tips.com/viewthread.cfm?qid=184726 ) but you’d lose the weighting.
Jon Peltier says
Nick –
I have to say that I wouldn’t have gotten too far without your coding of the original function.
As for the higher order fit, it seems most people that I noticed just use first order, and second, I would want somehow to keep the weighting. I thought maybe a little dimensional analysis might help, but I need at least an hour to get my brain around stuff like that.
Bang Seung Beom says
Hi.
Thanks for your code.
But I cannot find smoothing parameter (alpha) in your function?
Which variable is it?
Jon Peltier says
Alpha isn’t used directly in the calculations. The code uses nPts, the number of points to use for the moving regression. Alpha is the fraction of the whole data set used in the moving regression, or nPts/(number of values in X or Y). In the dialog-based version of this code, you can set either parameter, and the other is computed.
Andrew Bond says
Jon
Thanks for a very interesting blog on Loess. I am in the process of writing a C++ implementation of Loess for use in processing information from ground investigations (I am a civil engineer by profession).
Having failed to port the available C implementations that I have found on the web, I have used your code as inspiration for mine.
I am now at the testing stage and wanted to compare results from my program with those posted in this blog. However, I can’t find the original (full) set of data. Can you provide a link to it here, please?
I would like to have the web statistics and Playmate data – the NIST data I already have. (I have found a subset of the Playmate data, containing just Playmates of the Year – not all Playmates).
Thanks in anticipation.
Andrew
Jon Peltier says
Andrew –
Infoporn: Today’s Playmates Are More Like Anime Figures Than Real Humans in Wired Magazine has links to a PDF visualization and an Excel workbook of BMI data.
Andrew Bond says
Jon
Many thanks for the link – I missed it when I first visited the Wired website. What smoothing coefficient did you use for your loess fit? And is the website data as readily available as the Playmate data?
Again, thanks in anticipation.
Andrew
Jon Peltier says
Andrew –
I used 150 for the number of points in the analysis. Here is a comparison of three curves, for 100, 150, and 200 points in the moving regression:
The website data is from my own site’s history several months ago. I’m not sure I still have that particular data set.
Brian says
This is something I’ve been looking all over for. Thanks. I added a weighting capability, so that each case can be weighted with a third input range. I know next to nothing about VBA, but I think I did it right.
Jon Peltier says
Thanks Brian. Maybe one of the smart people who read this blog will check it out for us.
Tian says
Hi Jon,
Thank you for the nice code. It helped me easily added a function to a C++ App. There is a small bug you might not notice. The assignment for xNow should be:
xNow = X(xDomain(iPoint, 1) + 1, 1)
instead of:
xNow = xDomain(iPoint, 1)
Thank you,
Jon Peltier says
Tian –
The line of code works fine in VBA, which assigns the values in a range to a 1-based array. You have to adjust the syntax to use the code in C++.
Tian says
Jon,
Please see the difference of the output:
X input Y input X output original output new output
0.55782 18.63654 0 -13.07212037 20.59304051
2.021727 103.4965 1 47.06914445 107.1603064
2.577325 150.3539 2 105.8974689 139.7673806
3.414029 190.5103 3 159.6924955 174.2630716
4.301408 208.7012 4 196.8626739 207.2333938
4.744839 213.7114 5 219.0115635 216.6616039
5.107378 228.4935 6 227.6417493 220.5444981
6.541166 233.5539 7 230.2534309 229.8606994
6.721618 234.5505 8 227.0376643 229.8347242
7.260058 223.8923 9 221.246659 229.4301269
8.133587 227.6834 10 202.9876659 226.6044626
9.122438 223.9198 11 187.8518139 220.3904231
11.92967 168.02 12 170.6711911 172.3479193
12.37977 164.9575 13 162.210816 163.8416617
13.27286 152.6111 14 160.7756941 161.8489846
14.27675 160.7874 15 159.7155843 160.3350921
15.3731 168.5557 16 161.0838629 160.1920102
15.64766 152.4266 17 194.5740586 161.0555463
18.56054 221.707 18 215.4190384 227.3399984
18.58664 222.6904 19 236.7904956 227.8985782
18.75728 243.1883
Jon Peltier says
Your original output is the result of LOESS calculation using a 7 point moving regression, using X and Y input as inputs and X output as the X output.
Your new output is the result of LOESS calculation using a 7 point moving regression, using X and Y input as inputs and X input as the X output.
Tian says
Now I understand what you mean. Thank you for clarifying.
Don says
In MathCAD 14, you create a 2-column matrix of the data and set the first column = X and the second column = Y. I named my matrix NIST. Then enter the following:
NIST(x):=interp(loess(X,Y,.5),X,Y,x)
where x was set as a range variable x:=0,0.1,..20
This creates a function NIST(x) that you can plot versus the range variable x
Here is the MathCAD explanation for the two built-in functions ‘interp’ & ‘loess’:
loess(vx, vy, span) Returns a vector which interp uses to find a set of second-order polynomials that best fit the neighborhood of x and y data values in vx and vy in the least-squares sense. The size of the neighborhood is controlled by span.
interp(vs, vx, vy, x) Returns the interpolated y-value corresponding to x using the output vector vs from loess.
Tom says
I guess your page explains LOESS more practical than all the rocket science pages I tried before I came here.
You have a big glitch in your moving average example. MA always get a lag of n/2 points. The reason is that the MA is the 50% trend value of one linear regression of n x,y points. Both smoothing filters are comparable in the X axis if the MA is calculated from the n/2 days in the future. You would use the MA with OFFSET and AVG functions then and not take the diagram’s moving average.
However, some example charts indicate lags in the LOESS .. am I wrong?
By the way, I always was and am a big fan of your Excel experiences.
Big timesavers for me.
Jon Peltier says
Tom –
A comparison of moving average to loess would make more sense if the calculated moving average was plotted in the middle of the data for the moving average, but conventionally this is not done. Loess never used this convention.
Where I have noted an apparent lag in loess is in cases where the variation in the raw data is not symmetric, for example, when an increase is followed by a decrease which is much more or less steep. Changing the number of points in the moving regression may reduce the apparent lag. The apparent lag may be pronounced near the ends of the data set, where the regression consists of substantially more points on one side of the X value for the loess calculation.
tom says
>The apparent lag may be pronounced near the ends of the data set
That’s what I know from moving average fitting.
It was designed to fit a time series where mostly the newest fit is the interesting one.
From the VBA code I would say that LOESS fitting is really a choice for sparse data series. Time series make only sense if there are gaps in between. I used a similar technique already on time series in Excel like in your example.
If the distances are always the same between two X points, then the algorithm is just too general and consuming. My experience is that you always find a moving average that is as almost as good fitting as the moving regression window.
By the way: Why did you use Redim in a for-loop? There is another redim going over
all available data. One second Redim before the for would take more memory and save
time.
Jon Peltier says
“I would say that LOESS fitting is really a choice for sparse data series.”
The literature indicates that loess is better for heavily populated data sets.
Do your moving averages weight the data prior to averaging?
The ReDim statements and the iMin and iMax statements can all be taken out of the loop. Perhaps the original programmer initially had them inside the loop before I saw the code, and then I never put them in front of the loop.
Neha says
Hi,
I have count data (around 20,000 cases) for different days of the weeks. There are outliers in my series. Since my data is sparse, I cannot remove these outliers completely. I am planning to assign the weights to these outliers in order to smooth the data series.
For this I thought of using either Loess or rloess. Please suggest is it correct to do so?
Jon Peltier says
Neha –
All I can say is “Try it”.
LOESS assigns its own weights to data for each of its regression calculations, I don’t know what kind of weights you plan to assign.
Scott says
Does anyone have the C++ implementation of this? I have the Add-In working great after searching a while. Not having success with the C++ I have found on the net and I know someone out there has ported it over. Many thanks in advance!
Bancam says
Hi Jon,
I like this vanilla implementation – as a simple UDF. Could make both nPts and nAlpha optional with code to decide on what to use.
I am curious: what did mx and iMx represent in the original code?
Only thing I see is, it needs the Xvalues to be in ascending order. Some form of sorting would be handy when x,y data is not ordered.
Jon Peltier says
iMx and mx are so secret, that I don’t even know what they do (or did).
I have some enhancements to the code in the works. Sorting wasn’t one of them, but it should be, so I’ll add it to the list. I’m also working on the way the utility handles N and alpha (which my OCD doesn’t like) and also making it accessible to other VBA procedures.
No schedule for the updates, you’ll just have to wait until I announce it in the blog.
Bancam says
Jon,
I did a couple of test where the x and y values were random numbers, and found that if there were large gaps in the x values, the function (with a sort) filled this OK.
I was concerned that sometimes, the ‘nPts’ closest data points to the xDomain value may be a long way off and to one side.. It would be better to determine the points surrounding the iPoint of interest within a specified distance along the x axis, not in terms of rank.. then having to either leave a gap if there were insufficient or expand the range to reach a minimum number of points.
I also discovered that often there are clusters of points. I know from experience that real data does this dues to vagaries of process and sampling times: in such clusters, the slope can be totally unrepresentative of the wider range of x values and when the iPoint is outside the cluster, you get extreme extrapolations.
I also read that in a particular commercial application, they determine the loess for each x value in the data and spline the curve together. That would avoid unrealistic interpolations.
Cheers
Iain says
Hi Jon,
I’m yet another person who is indebted to you for your tireless work with Excel. Thank you!
I am interested in the weighted version of LOESS that Brian posted in the comments, as it would be very useful to assign my own weights to each data point. If you could incorporate that into your major overhaul that would be brilliant =)
One other thing I would be interested in is being able to look at the confidence limits for the fit. In the R implementation, you can plot any confidence limit you choose (e.g. 5-95%), as the LOESS algorithm calculates and the standard errors. See about 2/3 of the way down this post for example: http://tinyurl.com/3b6a3lo
That would be another feature that would be very useful to have in your Excel version.
(sorry for being full of suggestions!)
Mauricio Prado says
Hi Jon
This was a great find !
I need help though !
I implemented your code and I am trying to use if for a certain problem I have. I have a set of data Y(i), X(i) and I am trying to find a smothed trend of Y as a funciton of X. I played a little bit with your routine and I am getting a few problems. Basically what is happening is that for some values of nPts or sometimes for some points i, the Denominator (Denom) variable is becoming ZERO ! Do you happen to have any “easy” references on the math behind LOESS and how the Intercep and Slope are calculated. I would like to see what cases can yield a Denom of Zero and what can be done to go around this problem. If you want to take a look at the example I can send you the spreasheet.
thanks
mauricio
harry says
if I have two predictors (x1, x2) and one output y, how can I use the LOESS?
This what I run into:
I ran regular multi regression, got a fitting function
y_hat = a0 + a1*x1 + a2*x2
But the R square is only 67%, can I use LOESS to improve the fit of the model?
Jon Peltier says
Harry –
I have not attempted anything more intricate than a first order, one factor weighted regression. You could probably Google for something; someone several months ago found a weighted second order algorithm.
I don’t think the point is really to reduce your R², since there’s already so much variability in the data. The point is to find a smoothed curve that passed reasonably well through the data.
Bill McNair says
i can’t believe i missed this blog article until now. well done. thank you.
Chad S. says
Does a version of the above LOESS function (VBA code) exist for HORIZONTAL array inputs?
I have a model with X, Y, and xDomain in rows rather than columns, and use intermediary TRANSPOSE functions as a workaround. However, it would be nice to have both a Horizontal and Vertical version of the LOESS function (similar to how VLOOKUP has the HLOOKUP counterpart). I tried modifying the code to work for Horizontal inputs, but apparently I don’t understand the code well enough to make it work.
Any advice?
Thanks, by the way, for this excellent info.
Jon Peltier says
Well, it’s generally best practice to have data series in columns, since this is in accord with a database output’s layout of fields in columns and records in rows. So my advice is to transpose your whole workbook. Of course, I don’t know what you’re doing and why it’s arranged that way, I’m just generalizing.
You might adjust the input ranges like this:
I have no plans to rewrite my add-in’s code to handle horizontal ranges.
Chad S. says
Jon –
Many thanks for your quick reply!
By the way, it’s a legacy financial model, which uses a SAS-summarized dataset of aggregated monthly transaction records for a segment of accounts. Rows are metrics like deposits, withdrawals, account attrition, etc. ‘Month’ is in columns. So the dataset, model, and graphs show how the metrics have moved/will move over time.
Jon Peltier says
Oh, “legacy”… That can get tough.
I’m about to drop support for Microsoft’s legacy product, Excel 2003, even though it’s my favorite. It’s just too much effort to support two of everything, and users of 2007/2010 now far outnumber users of 2003.
Charles Vanya says
How can download Loess smoothing tool for excel. I have been trying to download it but it is just displaying encrypted information. How can I get it?
Regards,
Jon Peltier says
As the 8 Oct 2009 update to the article indicates, you should browse to
https://peltiertech.com/loess-utility-awesome-update/
then download the add-in file at the link near the end of the article. Save the file, don’t try to run it from the link. Then use the instructions at the link near the add-in link to install the add-in in Excel.
Pierre-Jean says
Hi
I am not very familiar with this technique so I don’t really know if my question even make sense.
But anyway, is there any way to implement this code to obtain the error on the Y output?
Thanks
Mark S. Blumberg MD says
I have in hand “Graphical Methods for Data Analysis” by Chambers, Cleveland, Kleiner, and Tukey, 1983. It describes Lowess and references several articles using the same acronym. There is no such thing as loess (graphs or software).
I recall the first article that I saw that used the term loess. I believe the author was a prominent Stanford statistician. I was surprised and sought to find out how the error occurred. I concluded that Spell check was the culprit. It did not recognize the acronym Lowess and substituted the real word loess. It is a geology term for loose earth derived from the Swiss German language.
I once edited the Wikipedia article but the author rejected all my efforts.
Jon Peltier says
Mark –
That’s an old book. Cleveland’s first publication about the technique was published in 1979; he used the term LOWESS (for LOcally Weighted regrESSion) in a 1981 publication. Your book is dated 1983. Cleveland and Devlin published another paper in 1988 describing the same process, and they used neither LOWESS nor LOESS in the paper. LOESS seems to have evolved to mean LOcal regrESSion, a broader term that still describes the same approach.
If you Google for “loess regression” or “loess statistics”, you’ll find sufficient search results to assure yourself that in the almost 30 years since your book was published, the term LOESS has become very common for describing this technique for local regression.
ryan says
I am having trouble implementing the VBA script when the input X is in integer.
I have input X and input Y data like this
X Y
1 0.1
2 0.5
3 0.64
4 1.3
I want to output a third, smoothed, column but cannot get the VBA code to work unless i change the X data so something like 1.1, 1.2, 1.3 ext….
Any suggestions would be greatly appreciated. I am just learning VBA now in order to use this function :)
Thanks
Ryan
Jon Peltier says
Ryan –
I get results that look correct using whole numbers as X values.
What is the “trouble”? You don’t mention VBA error messages or errors in the cells.
Are you using the LOESS function as a UDF in the cells, using ranges as parameters? Or are you passing arrays in and out directly in VBA?
How many points are in your input data?
Ryan says
Wow thanks for the quick response. Let me do my best to answer your concerns. I am an extreme neophyte with VBA and with integrating VBA scripts in Excel.
I inserted the VBA script from your page into a “module” the VB editor in EXCEL. Google taught me how to do this earlier today :)
I then called it from excel as “LOESS(A1:A10,B1:B10,C1:C10,3)”… A1-A10 are just 1-10.. and B1-B10 are randomly generated.. C1-C10 are just the same as A1-A10… I think this means I am using it as a UDF right?
The error i get from excel is #VALUE and if I hover over the cell it says that one of the input parameters is of an unexpected type.
But.. if i replace A1-A10 with 1.1-10.1 everything works fine.
Erwin Walter says
Hi Jon
Your post is incredibly helpful and works perfectly for me with your ‘Awesome update’.
But I do have a problem using your posted vba code directly, a similar if not the same problem as a previous reply.
I receive a ‘#Value’ error in all of the Y output cells. The vba error is ‘A value used in the formula is of the wrong data type’.
Using the helper ‘Show calculation steps’, the dialog opens showing the function with the message ‘The next evaluation will result in an error’.
I am using excel 2010, I copied your vba code and pasted it into a module, selected the correct cells and ctrl-shift-return using:
‘=LOESS(C2:C22,D2:D22,F2:F21,7)’
I even copied your data from the 22 point example to evaluate the result.
The function is called with the correct data, the calculations are correct having stepped through the code to view the results, these are collected in the yloess variable array and presumably returned by the function, the #value error is posted in every cell. I have tried to make changes to the function call and cell formatting and and and, but not really being a vba user I am not having much luck. I also tried changing the x Input method from ‘C2:C22’ to data values as your other reply suggested but no such luck
I have the feeling this is down to some setting in excel, although this is excel directly out of the box.
Have you any idea that might solve this problem?
All the best
erwin
Jon Peltier says
Erwin –
There is no setting in Excel for “Correctly Calculate Array Formulas” nor for “Correctly Display Results of User-Defined Functions”. I doubt it’s even an Excel setting.
You say the yloess array is properly populated with the correct calculations? I can’t imagine why Excel would give you that error if the data to display is correct.
I just tested with the NIST data from the example (http://www.itl.nist.gov/div898/handbook/pmd/section1/dep/dep144.htm), and I obtained the appropriate results.
Erwin Walter says
Hi Jon,
Thanks for your prompt reply.
Don’t know how to explain this one! I copied and pasted the data and module code into a new excel workbook, even copied and pasted the formula entry format and it worked without a problem. I would be interested in your evaluation of the problem.
The reason being I wanted to create a single data entry excel workbook for you to view. If you are interested in looking at both workbooks they are available at:
https://www.dropbox.com/s/vopoz7k558syf5d/LOESS.zip
https://www.dropbox.com/s/9bc3os4cu11nxiu/working%20version.zip
Contains many data sheets some using your ‘Awesome update’ and some the basic code, much of the data is stuff I am interested in, although your original data (NIST data) is also available:
It appears from my perspective that the reason for failure and #value cells was because the original failed workbook and code was saved as xlsm only and the working workbook was save as both xlsm for the code and xlsx for the data.
For the working example I have taken the liberty of using xDomain values as the real X values in your example data.
Would be grateful for your comments.
All the best
erwin
Erwin Walter says
Hi Jon,
Me again, just for an explanation of my reasons for following this over the holiday period – problem to keep me occupied over the holiday.
I think I understand your interpretation and implementation of the loess function and I now need to create a version in C# for testing and C for implementation on an embedded platform – not sure yet if the computational overhead is feasible but will try it out. Was initially confused by the 0- n based x Output (xDomain) values in your sample data- but sorted.
So have now have completed the c# ( yet to be tested) testing function and will move onto the C implementation for embedded.
Thanks for your help and so wish you a (not sure if your Happy Holiday or Merry Christmas) but me I’m, Merry Christmas, UK 18:31December 27th – just cracked the Tequila! And for your explanation and for my understanding the loess function, Merry Christmas and a Happy New Year.
erwin
Jon Peltier says
Do yourself a favor. In the VB Editor, go to Tools menu > Options. Check the box for Require Variable Declaration.
This inserts “Option Explicit” at the top of every new module. At the top of any modules that were created before you changed this setting, insert the line “Option Explicit”. This means VB will expect each variable you use to be explicitly declared (otherwise any unrecognized variable is initialized as a variant).
You’ve renamed the function ALOESS, but at the end you’re assigning a value to LOESS, the old name of the function. Without Option Explicit, VB assigns the array values to a new variable named LOESS. It hasn’t assigned any value to ALOESS that it can return to the worksheet function, so you see the #VALUE! error.
When you try to run your code under Option Explicit, The LOESS = yLoess will be highlighted as a compile error. VB doesn’t know what LOESS is supposed to be, but refuses to guess. This error should remind you to fix the statement at the end.
Change this statement to ALOESS = yLoess, and it will work fine.
Erwin Walter says
Hi Jon,
Thanks for that information, I had forgotten about the option explicit rule – this was my school boy error, it’s been a while since I have used vb in any form.
All the best
erwin
Naomi says
Hi!
Thanks so much for this, it was a great help!
I have a question regarding what happens at the end of a data set (or X var. within an data set) when it is smoothed. If the alpha size doesn’t fully cover the last chunk of data (or, in other words, if the last window only actually includes a small amount of data), is this corrected for?
I have a couple of Loess curves in which the end of the curve shoots up dramatically, which looks pretty suspicious. I’m assuming that this is because the last window only represents a few high values in the data, and therefore, results in a unrepresentative curve. Is this right? If so, is there any way of correcting for this other than getting rid of some data at the ends of my variables?
I hope my question was clear…
Thank you!
Jon Peltier says
Naomi –
Actually, the number of points in each moving analysis is the same. Near the ends of the data set, the calculations are skewed, because there are more data points toward the center and fewer toward the ends of the range. This means that at the ends, the fit becomes more and more like a linear fit. Some people use not a weighted linear regression in their moving analysis, but a weighted quadratic regression, in order to mimic the curvature that might have been calculated if the data were extended.
Saswata Kumar Sahoo says
Hi Jon,
First I must thank you for the excellent add-in that you coded for Excel and that you have made it public. I have been struggling for over a year on smoothing my experimental data. I came across Loess technique a few weeks ago and the add-in that you provided
I have tallied the answers that it gives for my time series data with the ones given by Sigma-Plot 10 . The error^2 {= (Sigma-Plot – Peltier Tech)^2} is in the order of 10^-10.
With my almost no knowledge in VBA, I have modified the code of the 1st degree polynomial Loess with 2nd degree polynomial Loess and found the avg. Error^2 with respect to Sigma-Plot in the range of 0.001 to 0.2 with the raw data around 350 units and above. Below is the modified code for 2nd degree polynomial Loess or Quadratic-Loess (QLOESS):
——————————————————————————————————————————————————
———————————————————————————————————————————————————
Regards
Roman says
Hi Jon,
I’ve been fumbling with this code for an hour and it threw the #VALUE! error all the time freaking me out completely. Just by hit and miss I found that the problem was that in the first line of the code my VBA didn’t like the ‘As Double()’ part at the end.
“Public Function LOESS(X As Variant, Y As Variant, xDomain As Variant, nPts As Long) As Double()
Dim i As Long
Dim iMin As Long
Dim iMax As Long….”
I felt that the the problem must be with the type mismatch or something like that. I deleted it the ‘As Double()’ part. Now it works. Thanks to everybody! Btw, I’m using Excel for Mac 2011. It’s a bit different than the straight Excel for Windows in some small but nasty ways.
Colin Banfield says
Hi Jon,
I know that this is an old thread, but recently I’ve had the occasion to revisit the loess regression in some detail.
Question: Why does your Y outputs not match the outputs in the NIST example, even though your’re using the same data?
Jon Peltier says
Colin –
In my example, I carried out LOESS calculations every 1.0 along the X axis, so it doesn’t match the NIST example precisely.
If I perform the calculations at the X values of the input data, my outputs (shown above) exactly match the NIST outputs tabulated here:
http://www.itl.nist.gov/div898/handbook/pmd/section1/dep/dep144.htm
If I calculate LOESS values every 0.1 along the X axis, my curve very closely matches the NIST chart in the same link.
Jack Lewis says
I am unable to compile the QLOESS code posted by Saswata Kumar Sahoo. It looks like some critical characters have been dropped. Is there a way to have this emailed to me? I have found that in R the quadratic loess is better at fitting relative minima and maxima and areas of high curvature.
Jon Peltier says
Jack –
I actually built a QLOESS function a year ago, planning to implement it as part of my commercial add-in. However, I found the quadratic algorithm overcorrected the smoothing, especially at the ends of the fit. Because of this, I decided not to incorporate the quadratic function into my software.
However, comparing my QLOESS function with that in the comment by Saswata Kumar Sahoo, I noticed that a crucial do loop was incorrectly coded. I’ve corrected the routine in that comment and highlighted the changed section. Note that I have not tested whether this updated code will compile, I merely corrected the relevant syntax and took steps to prevent WordPress from removing greater than and less than signs.
[Note: I have more recently revisited the quadratic regression. It turns out I had switched a couple signs in my version of the code, which led to calculation errors. I have since corrected these errors, liked the result, and added quadratic smoothing to my commercial product, Peltier Tech Charts for Excel 3.0.]
Himanshu T says
Your article now has a reference in “Drilling Productivity Report” by EIA and how this technique is used to calculate numbers of wells and their productivity.
Jon Peltier says
Himanshu –
Interesting. Do you have a link?
HT says
Hello Jon,
Here you go for the EIA link aforementioned:
http://www.eia.gov/petroleum/drilling/pdf/dpr_methodology.pdf
Nalini says
Hi! I just installed the loess utility function in Excel 2007 and have a very basic doubt. The input XY columns to be selected are clear; however for the X output, is that something that is also generated by the formula or do we have to give something? If I leave that blank, I get an error on the Y output after the first entry. In the example above, the X output column is given as 0,1,2,3,4,5…
I have been using LOESS on R and aware of the span value, but I am not familiar with the Excel tool. It seems quite user friendly, still stuck at the first step! Thanks a lot!
Jon Peltier says
Nalini –
You have to enter the X output values, so the program knows where to calculate Y output values.
Nalini says
Thanks Jon. So can I put arbitrary values for X column as 0,1,2,3,4,5..etc..or is there any significance behind that?
Jon Peltier says
Nalini –
The X values you use are “arbitrary” in the sense that they don’t need to be the same X values used in the inputs, but they are not “arbitrary” in the sense that you enter X values at which you want to calculate Y values.
Matthew says
Still helpful in 2016. Thanks for the code!
John Baker says
Hi,
I can’t see how to extrapolate from the data. I am using your function call, =loess(C2:C22,D2:D22,F2:F21,7), and tried extending the X input/output and the Y output but it doesn’t seem to extrapolate. And I don’t see the Dialog that you show in https://peltiertech.com/loess-utility-awesome-update/ but it did show up momentarily. Now I cannot find it and I think it showed a way to do extrapolation. Help please!
Thanks, John
John Baker says
Jon, I forgot to say I’m using your latest 3.0 version on Excel 2007 on a Windoz
7 PC.
John
Jon Peltier says
John –
The formula must be array-entered: hold down Ctrl+Shift while you press Enter.
The dialog should pop up when you click the Loess button on the Peltier Tech ribbon tab.
I’ve just tested in 2007. If you’re having issues, email me directly and I’ll try to see what’s wrong.
Paul W. says
Jon,
If you smooth a set of data on a worksheet and later change the name of the worksheet the Loess add-in hence after does not accept changes to most of the input fields. It seems to get stuck in a loop with references to the old worksheet ranges and crashes when you attempt a change. Is there a way to clear the old references. (excel 2007)
Thanks,
Paul
Jon Peltier says
Paul –
The LOESS utility available from these web pages is a very simple tool, without much defensive programming, and I am not actively supporting it any longer. It doesn’t handle unsorted data or nonnumeric/blank cells well, and certain data characteristics cause crazy results.
I have continued developing the LOESS function, and have built a bulletproof version into the commercial Peltier Tech Charts for Excel add-in.
The information from the dialog is saved as hidden range names in the worksheet. You can’t see these hidden names in the Excel UI, but you can if you use the Name Manager, a very helpful and free add-in made by my colleagues Jan Karel Pieterse and Charles Williams (http://www.jkp-ads.com/officemarketplacenm-en.asp). Delete any names that begin with “ptsLOESS” or similar phrase.
Dan says
Can the loess utility automatically generate the optimal parameters and smoothing constants rather than requiring us to manually input them.
Jon Peltier says
Dan –
I don’t know if there is really an optimum forthe smoothing parameters; the literature usually just says use alpha around 1/3 or between 1/4 and 1/2. I usually select something relevant to my data, using N of 30 or 60 if I have daily data, which usually is much less than alpha = 1/4. It’s a tool to help you find trends or patterns, and shouldn’t be used for prediction.
Jeff says
Hello: I see that someone asked back in 2012 if the LOESS function has been adapted to allow for multiple predictors (i.e., X1 and X2). At that time, the response seemed uncertain; is anyone aware of some adaptations for this now?
Jon Peltier says
Jeff –
I have not looked into LOESS for multiple independent variables. I just Googled, and the only promising search result was to a paper at Michigan State, and the link was a 404; I think the professor left the university. However, it was archived on the Wayback Machine. Turns out it was a PDF of a PowerPoint presentation of some lecture notes, filled with mathematical hieroglyphics that I can’t make sense of.
But you’re welcome to try:
https://web.archive.org/web/20160803225010/http://polisci.msu.edu/jacoby/icpsr/regress3/lectures/week4/16.MultiLoess.pdf
Rob de Vos says
Dear mr. Peltier, I am a frequent user of your LOESS smoothing utility (a few years old, the free version). Now I have a dispute with a climatologist who uses the R statistical program. The results of the Peltier LOESS smoothing do not seem completely identical to those of R. Is that possible? The climatologist is convinced that he uses the right calculation. Since I am not skilled in programming, I can’t talk back to him about the matter. Can you inform me? Thanks!
Jon Peltier says
The algorithm I have used in my LOESS calculations follows the NIST Engineering Statistics Handbook.
The simplest reason that your LOESS output differs from someone else’s is that you are using different smoothing parameters. Conventionally a value of 0.33 is used; higher values smooth the output more, while lower values result in bumpier output.
There was less code in my older program to take into account missing and unsorted values, which might lead to some calculation differences.
Without seeing the data and the LOESS calculations, I can’t really comment further. And I have no knowledge of how R works.