In LOESS Smoothing in Excel I described a technique for smoothing data, which essentially runs a moving weighted regression on the data set. The amount of smoothing that can be achieved without washing out the data is remarkable. In that post I showed a screen shot of a dialog of a working LOESS utility.
The LOESS utility for Excel has been updated, and the interface made more flexible. The new version is described in LOESS Utility – Awesome Update.
I’ve used this utility in-house for a while, adding little enhancements here and there. I’ve come up with three main ways to use it, illustrated in the dialog screen shots below. Based on the option selected in the top of the dialog, the mode of operation is changed.
Here the data is arranged simply, with the input X and Y in two adjacent columns, and the output Y, calculated using the input X values, placed into the third adjacent column.
Here the input X and Y are in two columns, and the output Y is calculated using the input X values, but is located in another column. This is handy if you are calculating smoothed values for different values of alpha or N and placing the calculations in different columns.
Finally, the input X and Y values are in two adjacent columns, and the output X and Y are in two other adjacent columns, not adjacent to the input columns. Here the output X values need not be the same as the input X values.
When installed, the utility places a control on the PTS Charts menu. If you haven’t installed other PTS Chart Utilities, this menu will be created first.
Compare my weight over the past three years, smoothed using a seven-day moving average…
… and using the LOESS utility. The main trends are plainly visible, while the short term fluctuations have been removed.
Update 8 October 2009
The LOESS utility for Excel has been updated, and the interface made more flexible. It is described in LOESS Utility – Awesome Update.
LOESS Smoothing in Peltier Tech Charts for Excel
I liked this utility and used it so much, that I have continued to enhance it, and I’ve included it in my commercial Excel Charting software, Peltier Tech Charts for Excel.
I’ve made numerous improvements to this utility in the seven years since I released the free demo in this article. The ones that come to mind off the top of my head:
- Data Improvement:
- Data is internally sorted by X prior to running the analysis.
- Missing data (X or Y) is ignored during the analysis.
- Algorithms are less sensitive to factors that previously caused errors (too much input at repeated X values, too few points in the analysis).
- If X output prediction range extends beyond X input data range, you can decide not to extrapolate below minimum or above maximum X values.
- Algorithm Improvement:
- You can use a moving quadratic (2nd order) regression instead of the normal 1st order regression. This is slower than the first order computation, but not as much slower as I feared.
Please visit the Peltier Tech Charts for Excel page for more information.
Articles About Regression and Trendlines on this Blog
- Trendlines and Chart Types in Excel
- Add One Trendline for Multiple Series
- Trendline Calculator for Multiple Series
- Trendline Fitting Errors
- Choosing a Trendline Type
- Plot Two Time Series and Trendlines with Different Dates
- Polynomial Fit vs. Statistical Process Control
- Moving Averages
- Stacked Column Chart with Stacked Trendlines
- Deming Regression
- Deming Regression Utility
- LOESS Smoothing in Excel
References
Cleveland, W.S. (1979), “Robust Locally Weighted Regression and Smoothing Scatterplots,” Journal of the American Statistical Association, Vol. 74, pp. 829-836.
Cleveland, W.S. and Devlin, S.J. (1988), “Locally Weighted Regression: An Approach to Regression Analysis by Local Fitting,” Journal of the American Statistical Association, Vol. 83, pp. 596-610.
NIST Engineering Statistics Handbook, 4.1.4.4. LOESS (aka LOWESS)
NIST Engineering Statistics Handbook, Example of LOESS Computations
Wikipedia, Local regression
derek says
Jon, it looks very nice, but I have a couple of problems installing it. First it objects it “can’t find project or library”, then demands a password. Fortunately cancelling the password prompt results in an installation.
Then I had a problem understanding the instructions, and seeing where the LOESS series was going to be inserted. I think I got it right when I selected the third radio button, Separate XY input and output. Finally, I seem to have lost the ability to Cancel, OK. or Close the application; it responds to mouse clicks by depressing the buttons, but not by closing anything. So I can’t tell what happens in the final stage.
Okay, I killed Excel and restarted. Same password problem on install, but this time the radio button responded with a new range dialog box for the output range. That makes more sense, because I wasn’t sure what was going to happen when I filled in the range input box.
But when I filled the input box and went to fill the output box, something went wrong and I’m back to a stuck Excel. (the output range dialog box has disappeared, which probably happened the first time, but I didn’t notice it)
I’m using Excel 2000 SP3 in XP 5.1 SP3.
Jon Peltier says
Sounds like a mishmash of this post and the previous one. I should have followed my own advice and saved the utility in Excel 2000 before posting it.
When I first opened the xla in Excel 2000, I experienced the same issues you’ve described. I unchecked the reference to the RefEdit control and saved the add-in. When I reopened it in 2000, it worked fine. When I reopened it in 2003 in turn, again all was fine.
Try the download again.
David says
Jon,
That’s a nice little add-in. I installed it and had it working in Excel 2007 in less than 5 minutes. I tried a basic curve using your default settings and it looks great. I tried tweaking the smoothing parameter up and down, but settled on the default values as the cleanest representation.
Thanks for the tool and the great tips in the blog.
David
Brigt E says
Thanks for this one. Used this function today for a wage analysis at my company. By the way I’ve been rather fan of this blog, so keep up the good work. Hopefully my skills will evolve so that I one day can contribute to this community.
Best regards from Norway
Richard Berthelsdorf says
LOESS is a very neat and useful utility!
I discovered that it refuses to work if the spreadsheet is set to use the R1C1 reference style (I’m using Excel 2003).
Jon Peltier says
Richard –
I rarely use R1C1 notation, so I never noticed the error. Thanks for pointing it out. I’ll have to check into it.
Jon Peltier says
I think I’ve fixed the R1C1 problem. Try it again: LOESS Utility.
jeff weir says
Hi Jon. I’ve been playing with this utility today, and found that while it does a great job of smoothing fairly small datasets, it doesn’t do so well with the large datasets with fairly regular cyclical swings that I’ve been feeding it today.
I’m guessing this is an artifact of who the LOESS technique works, rather than due to your fine utility.
In your original LOESS post, you say “For each X value where a Y value is to be calculated, the LOESS technique performs a regression on points in a moving range around the X value, where the values in the moving range are weighted according to their distance from this X value.” I’m guessing that the larger your dataset, the larger the moving range around the X value.
I was using it on some half-hourly electricity metering data. When I used the LOESS utility to smooth a week’s worth of data (336 points), I get a great looking curve compared to the rather ‘jagged’ appearance of the raw data. But when I use it to smooth the whole year (17520 points), I get a flat line. I was expecting to get a similar effect for the whole year’s worth of data as I did with a week’s worth…that is I thought I’d still see some swings etc.
Is this how you’d expect LOESS to work with such a large dataset?
Regards
Jeff
Jon Peltier says
Jeff –
Since alpha is a fraction, the number of points in a LOESS analysis of larger data sets is larger than that for a smaller data set, given a constant alpha. If you get a reasonable smoothing for a week of data for a given alpha, note the number of points used for smoothing. Then carry out LOESS smoothing on the larger data set, using the same number of points (and a correspondingly smaller alpha).
Jeff Weir says
Ahhh…I didn’t realise you could overwrite the number of points. This utility rocks. I like how it retains the last range you used it on…handy if you want to rerun the function on the same range but just change a setting.
Some thoughts on improvement:
1. One thing that would be better is if when selecting a range you could use Shift + Cntl + Down to select really long ranges rather than having to drag down with the mouse. Is this possible? I see that Name Manager doesn’t let you do this either – so maybe it’s an artifact of constructing such addins?
2. In excel 2007, the PTS menu disappears from the addins ‘menu commands’ if you select the chart. For a moment, I thought it was suddenly uninstalled from my machine.
3. The function can also give a misleading match at each end of the series, because it has less data points with which to perform a regression on points in a moving range around the X value than it does everywhere else (and it might be substituting zero values for any missing data here, but I’m not sure). For an example of how a sine wave is affected, see my uploaded speadsheet at http://cid-f380a394764ef31f.skydrive.live.com/self.aspx/.Public/LOESS%20on%20SineWave.xls
To counter this, you could amend the function so that either:
– if it goes to perform a regression with say a moving range of 6 points then it doesn’t return data for the first 3 or the last 3 values; or
– in the above case, it substitutes any missing data from data it does have. For instance, in the above case, for the first point it would use the first 3 numbers twice instead of a moving range that has missing data points.
Alternately, the user could simply amend their range so that they use a subset of the LOESS function output that doesn’t include end values. But they might not be aware of this.
Regardless, this is staying on my ribbon! THanks Jon
Jon Peltier says
Jeff –
1. The utility is constrained by how the RefEdit control works. You can select one cell, then use the mouse and scroll bars to find the opposite cell, nhold Shift, and click on the last cell.
1A. The latest version improves on this. If you only select the top cell of a column, the program uses the whole column, down to the first blank cell.
2. I didn’t add the command to the Chart menu bar, because I figured it was really a data kind of utility. You’ve convinced me otherwise. This will go into the newest edition, and I’ll post it sometime soon.
3. The utility uses the same number of points, but it is a one-sided fit, with points further away given a lower weighting. It’s not perfect, but it beats simple moving averages. Statisticians use this methodology, with the knowledge that there may be end effects.
DaleW says
Nice utility, Jon. Am I right that it needs at least 15 input points to avoid generating an overflow error, or am I doing something it wasn’t expecting?
jeff weir says
I got an overflow error one time when I didn’t select one of the output ranges properly. Maybe this is what’s going on for you, Dale?
DaleW says
Well, Jeff, I got the overflow error roughly 20 for 20 times when I selected 14 pairs or less, and 0 for 8 times when I selected 15 pairs or more. Looks systematic to me. I even duplicated it from scratch with a new workbook.
Each time I was using the contiguous range option. It can accept fewer input points for separate I/O ranges without the overflow error, but with small N the output still isn’t right.
Since you aren’t having this problem, I’ll try again tomorrow with Excel 2003 on a different PC.
Jeff Weir says
Interesting…if you do a pass with say 12 points using the default N (i.e. Number of points for the moving regression) of 5, then it bombs out. If you try again with N of 6, then it works. And then if you try again with N of 5, it works.
Jon Peltier says
Dale & Jeff –
I’ve found the LOESS algorithm to be pretty robust. In the back of my head, the number of 4 points for the moving regression sounds familiar as a limit I’ve encountered, though most of my data sets are at least dozens of points, and I don’t think I’ve routinely gone below a double-digit N.
Stay tuned. I’ve made some enhancements to the utility that make range selection much easier. I’ll be posting the new version in a few days.
Peter says
I want to run the PTS_LOESS from an excel macro. Is that possible?
Thanks
Jon Peltier says
Peter –
That’s not included in the functionality of the LOESS utility, but it’s a good idea. I’ll forward your suggestion to the programming staff. Oh wait, that’s me.
Gary says
Hi Jon,
Great utility! I am a novice using SAS, so having this utility at my fingertips is quite helpful. Has anyone published work using this utility? If so, how have they referenced it?
Best,
Gary
Stephane says
Hi Jon,
Very nice, easy to use, program. I was looking for such a utiliy since ages.
What should I do if I want to calculate a confidence interval (basically how to know the standars deviation for each point) for the Lowess curve ? Is it possible to include that in the utility?
Regards
Stephane
Norman says
Hello Jon,
first excuse my linguistical mistakes if there are some. I am from Germany and trying to use your LOESS Calculator. Unfortunately I have a problem with it. There are two columns representing the X- and Y-values (year and Bray-Curtis-Index). I have managed to input all the necessary data needed in this dialogue (smoothing parameter, number of points).
Then I press the OK-button, but there is not any chart displayed. The program only computes the Y-output data. Can you give me an advice, what could have gone wrong. Besides I use Excel 2007.
P.S. Thank you for this great website and your hard work on it.
my data:
year Bray-Curtis-Index
1995 0,059
1996 0,305
1997 0,376
1998 0,382
1999 0,343
2000 0,440
2001 0,325
2002 0,703
2005 0,633
2006 0,576
2007 0,552
2008 0,700
2009 0,747
2010 0,745
Jon Peltier says
Norman –
The utility only calculates values. You need to insert your own chart.
Norman says
Hello Jon,
thank you for your quick reply. Now I understand this tool.
You need to insert the Loess-values and the original data in the same chart
(secondary axis for Loess-values). But what if I have for the same year different Bray-
Curtis-values. Then the chart looks odd. Thanks in advance.
extract from my table:
value B-C-Index loess
1995 0,059 0,5381
1995 0,182 0,5381
1995 0,447 0,5381
1995 0,396 0,5381
1995 0,514 0,5381
1995 0,674 0,5381
1995 0,623 0,5381
1995 0,809 0,5381
1995 0,781 0,5381
1995 0,489 0,5381
1995 0,503 0,5381
1995 0,525 0,5381
1996 0,305 0,5710
1996 0,561 0,5710
1996 0,349 0,5710
1996 0,533 0,5710
1996 0,393 0,5710
1996 0,693 0,5710
1996 0,834 0,5710
1996 0,531 0,5710
1996 0,782 0,5710
1996 0,698 0,5710
1996 0,636 0,5710
1996 0,699 0,5710
1997 0,376 0,6024
1997 0,493 0,6024
1997 0,349 0,6024
1997 0,421 0,6024
1997 0,537 0,6024
1997 0,553 0,6024
1997 0,421 0,6024
1997 0,425 0,6024
1997 0,941 0,6024
1997 0,591 0,6024
1997 0,850 0,6024
1997 0,584 0,6024
1998 0,382 0,6391
1998 0,506 0,6391
1998 0,441 0,6391
1998 0,442 0,6391
1998 0,402 0,6391
1998 0,538 0,6391
1998 0,495 0,6391
1998 0,395 0,6391
1998 0,925 0,6391
1998 0,810 0,6391
1998 0,958 0,6391
1998 0,709 0,6391
1999 0,343 0,6672
1999 0,592 0,6672
1999 0,729 0,6672
1999 0,467 0,6672
1999 0,738 0,6672
1999 0,842 0,6672
1999 0,694 0,6672
1999 0,852 0,6672
1999 0,990 0,6672
1999 0,754 0,6672
1999 0,975 0,6672
1999 0,935 0,6672
2000 0,440 0,6921
2000 0,626 0,6921
2000 0,789 0,6921
2000 0,536 0,6921
2000 0,778 0,6921
2000 0,873 0,6921
2000 0,904 0,6921
2000 0,920 0,6921
2000 0,818 0,6921
2000 0,910 0,6921
2000 0,992 0,6921
2000 0,950 0,6921
2001 0,325 0,7176
2001 0,533 0,7176
2001 0,539 0,7176
2001 0,359 0,7176
2001 0,637 0,7176
2001 0,788 0,7176
2001 0,936 0,7176
2001 0,925 0,7176
2001 0,826 0,7176
2001 0,924 0,7176
2001 0,969 0,7176
2001 0,952 0,7176
Jon Peltier says
Norman –
Do you have full dates for the multiple values in each year? What the utility does is compute a Y value for each input X value. Since each X value is repeated, the calculated Y value is also repeated. The many duplicates also caused problems with the regression algorithm within the LOESS program.
If I assume the 12 values for each year correspond in order to values on the first of each month (1/1/95, 2/1/95, etc.) I get a result which is much more typical of what I would have expected.
Norman says
Hello Jon,
thank you very much. This is a very good solution for this problem.
Now it looks much better than before. Thanks again.
Norman says
Hello Jon,
I have another question about the utility. According to my last two comments which
contain the necessary data I would like to know if a prediction is possible for the next two years. I found a similar comment from K. A. (Saturday, October 8, 2011, 3:39 am).
Unfortunately I did not understand your solution quite well. Maybe you could explain it to me once more (with the final table). Thank you in advance.
Jon Peltier says
Norm –
Sure, you can extrapolate, subject to concerns raised by any extrapolation.
In the other comment I showed how to get the extrapolated values. Using the first button in the second row, you pick two ranges, each two columns wide. The first has the input data, that is, the actual X and Y values to be used in the extrapolation. The second has the output X, that is, the X values at which you want to extrapolate Y, and blank cells for the output Y.
The result:
DaleW says
Jon,
LOESS extrapolation seems pretty dicey to me. Results can vary widely depending on personal preference for the smoothing parameter.
Extrapolating a linear or exponential or power law or similar fit at least puts the weight of the entire dataset behind the extrapolation. One might even think of a theoretical reason to believe the extrapolation as one is finishing the graph. LOESS just puts the edge of the dataset behind the extrapolation, after effectively admitting one didn’t have a clue in advance as to the real shape of the interpolative fit one is resorting to LOESS to do.
Subject to even more concerns than most extrapolations, I’d say.
Norman says
Thanks Jon, now it’s working.
Shirley says
Hello,
I’m having a trouble using the link to your most current download. When I do click, a new tab opens and the browser is full of symbols (gibberish). Will you mind sending me another link to use? By the way, should this add-in also work in Excel 2010?
Thank you so much for your work on this!
Shirley
Jon Peltier says
Shirley –
I assume you’re talking about the download from LOESS Utility – Awesome Update, since the download link on this page has been disabled.
If you’re getting gibberish, it means your browser is trying to render a page using the Excel add-in. This isn’t going to work. Right click on the link and use Save As to save the add-in to a convenient location. Then install per instructions.
Suza says
Hello John,
I love the loess calculator as it makes statistics so easy…
But there is one thing I always fail to handle: Many of my data contain some blank cells, “” or #NV. And that gives a runtime error 17.
And with only few data I can fill in some substitute numbers that might be in the range. But for a huge amount of data this doesnt work.
Is there any possibility to overcome this difficulty?
Thank you for your calculator and for support
Susanne
Jon Peltier says
Suza –
In a development version of the LOESS calculator, I have built in a presorting function that internally sorts the data by X value and removes blanks (and perhaps non-numeric values of all types). I would suggest sorting your data, first by Y to remove any non-numeric data (blanks and errors), then by X to make sure the existing calculator can correctly process the data.
Karen Runge says
Does this work for Excel 2010? I downloaded the utility, but am having difficulty locating the calculator in the menu.
Jon Peltier says
Karen –
It works in Excel 2000 through 2016. You may have to unblock the file (see https://peltiertech.com/Utility30/Documentation30/RibbonDisappears.html), then the LOESS button appears on the Add-Ins atab of the ribbon.
Derek says
Hello Jon,
I see that a previous user asked about the ability to plot confidence intervals around the LOESS curve (basically how to know the standard deviations for each point) but didn’t see a response. Is this something that can be done in the current utility?
Thank you,
Derek
Jon Peltier says
Derek –
I presume that this is possible, since the Y value at each point is based on a regression calculation, and such a calculation can produce a confidence interval. But the extended confidence interval calculation is too complicated for me to have undertaken. I sprained my brain on the moving regression as it was.