In LOESS Smoothing in Excel I described a technique for smoothing data, which essentially runs a moving weighted regression on the data set. The amount of smoothing that can be achieved without washing out the data is remarkable. In that post I showed a screen shot of a dialog of a working LOESS utility.

*The LOESS utility for Excel has been updated, and the interface made more flexible. The new version is described in LOESS Utility – Awesome Update, where there is a link to download the new utility.*

I’ve used this utility in-house for a while, adding little enhancements here and there. I’ve come up with three main ways to use it, illustrated in the dialog screen shots below. Based on the option selected in the top of the dialog, the mode of operation is changed.

Here the data is arranged simply, with the input X and Y in two adjacent columns, and the output Y, calculated using the input X values, placed into the third adjacent column.

Here the input X and Y are in two columns, and the output Y is calculated using the input X values, but is located in another column. This is handy if you are calculating smoothed values for different values of alpha or N and placing the calculations in different columns.

Finally, the input X and Y values are in two adjacent columns, and the output X and Y are in two other adjacent columns, not adjacent to the input columns. Here the output X values need not be the same as the input X values.

When installed, the utility places a control on the PTS Charts menu. If you haven’t installed other PTS Chart Utilities, this menu will be created first.

Compare my weight over the past three years, smoothed using a seven-day moving average…

… and using the LOESS utility. The main trends are plainly visible, while the short term fluctuations have been removed.

The utility is a regular old Excel add-in, which can be downloaded in the zip file LOESS.zip *(see the update below for a new version of the utility)*. Install this add-in using the protocol in Installing an Excel Add-In or Installing an Add-In in Excel 2007.

Try it out, and tell me what you hate about it.

## Update 8 October 2009

*The LOESS utility for Excel has been updated, and the interface made more flexible. It is described in LOESS Utility – Awesome Update, where there is a link to download the new utility.*

## LOESS Smoothing in Peltier Tech Charts for Excel 3.0

I liked this utility and used it so much, that I have continued to enhance it, and I’ve included it in my commercial Excel Charting software, Peltier Tech Charts for Excel 3.0.

I’ve made numerous improvements to this utility in the seven years since I released the free demo in this article. The ones that come to mind off the top of my head:

- Data Improvement:
- Data is internally sorted by X prior to running the analysis.
- Missing data (X or Y) is ignored during the analysis.
- Algorithms are less sensitive to factors that previously caused errors (too much input at repeated X values, too few points in the analysis).
- If X output prediction range extends beyond X input data range, you can decide not to extrapolate below minimum or above maximum X values.

- Algorithm Improvement:
- You can use a moving quadratic (2nd order) regression instead of the normal 1st order regression. This is slower than the first order computation, but not as much slower as I feared.

Please visit the Peltier Tech Charts for Excel 3.0 page for more information.

## References

Cleveland, W.S. (1979), “Robust Locally Weighted Regression and Smoothing Scatterplots,” *Journal of the American Statistical Association*, Vol. 74, pp. 829-836.

Cleveland, W.S. and Devlin, S.J. (1988), “Locally Weighted Regression: An Approach to Regression Analysis by Local Fitting,” Journal of the American Statistical Association, Vol. 83, pp. 596-610.

NIST Engineering Statistics Handbook, 4.1.4.4. LOESS (aka LOWESS)

NIST Engineering Statistics Handbook, Example of LOESS Computations

Wikipedia, Local regression

Jon, it looks very nice, but I have a couple of problems installing it. First it objects it “can’t find project or library”, then demands a password. Fortunately cancelling the password prompt results in an installation.

Then I had a problem understanding the instructions, and seeing where the LOESS series was going to be inserted. I think I got it right when I selected the third radio button, Separate XY input and output. Finally, I seem to have lost the ability to Cancel, OK. or Close the application; it responds to mouse clicks by depressing the buttons, but not by closing anything. So I can’t tell what happens in the final stage.

Okay, I killed Excel and restarted. Same password problem on install, but this time the radio button responded with a new range dialog box for the output range. That makes more sense, because I wasn’t sure what was going to happen when I filled in the range input box.

But when I filled the input box and went to fill the output box, something went wrong and I’m back to a stuck Excel. (the output range dialog box has disappeared, which probably happened the first time, but I didn’t notice it)

I’m using Excel 2000 SP3 in XP 5.1 SP3.

Sounds like a mishmash of this post and the previous one. I should have followed my own advice and saved the utility in Excel 2000 before posting it.

When I first opened the xla in Excel 2000, I experienced the same issues you’ve described. I unchecked the reference to the RefEdit control and saved the add-in. When I reopened it in 2000, it worked fine. When I reopened it in 2003 in turn, again all was fine.

Try the download again.

Jon,

That’s a nice little add-in. I installed it and had it working in Excel 2007 in less than 5 minutes. I tried a basic curve using your default settings and it looks great. I tried tweaking the smoothing parameter up and down, but settled on the default values as the cleanest representation.

Thanks for the tool and the great tips in the blog.

David

Thanks for this one. Used this function today for a wage analysis at my company. By the way I’ve been rather fan of this blog, so keep up the good work. Hopefully my skills will evolve so that I one day can contribute to this community.

Best regards from Norway

LOESS is a very neat and useful utility!

I discovered that it refuses to work if the spreadsheet is set to use the R1C1 reference style (I’m using Excel 2003).

Richard –

I rarely use R1C1 notation, so I never noticed the error. Thanks for pointing it out. I’ll have to check into it.

I think I’ve fixed the R1C1 problem. Try it again: LOESS Utility.

Hi Jon. I’ve been playing with this utility today, and found that while it does a great job of smoothing fairly small datasets, it doesn’t do so well with the large datasets with fairly regular cyclical swings that I’ve been feeding it today.

I’m guessing this is an artifact of who the LOESS technique works, rather than due to your fine utility.

In your original LOESS post, you say “For each X value where a Y value is to be calculated, the LOESS technique performs a regression on points in a moving range around the X value, where the values in the moving range are weighted according to their distance from this X value.” I’m guessing that the larger your dataset, the larger the moving range around the X value.

I was using it on some half-hourly electricity metering data. When I used the LOESS utility to smooth a week’s worth of data (336 points), I get a great looking curve compared to the rather ‘jagged’ appearance of the raw data. But when I use it to smooth the whole year (17520 points), I get a flat line. I was expecting to get a similar effect for the whole year’s worth of data as I did with a week’s worth…that is I thought I’d still see some swings etc.

Is this how you’d expect LOESS to work with such a large dataset?

Regards

Jeff

Jeff –

Since alpha is a fraction, the number of points in a LOESS analysis of larger data sets is larger than that for a smaller data set, given a constant alpha. If you get a reasonable smoothing for a week of data for a given alpha, note the number of points used for smoothing. Then carry out LOESS smoothing on the larger data set, using the same number of points (and a correspondingly smaller alpha).

Ahhh…I didn’t realise you could overwrite the number of points. This utility rocks. I like how it retains the last range you used it on…handy if you want to rerun the function on the same range but just change a setting.

Some thoughts on improvement:

1. One thing that would be better is if when selecting a range you could use Shift + Cntl + Down to select really long ranges rather than having to drag down with the mouse. Is this possible? I see that Name Manager doesn’t let you do this either – so maybe it’s an artifact of constructing such addins?

2. In excel 2007, the PTS menu disappears from the addins ‘menu commands’ if you select the chart. For a moment, I thought it was suddenly uninstalled from my machine.

3. The function can also give a misleading match at each end of the series, because it has less data points with which to perform a regression on points in a moving range around the X value than it does everywhere else (and it might be substituting zero values for any missing data here, but I’m not sure). For an example of how a sine wave is affected, see my uploaded speadsheet at http://cid-f380a394764ef31f.skydrive.live.com/self.aspx/.Public/LOESS%20on%20SineWave.xls

To counter this, you could amend the function so that either:

– if it goes to perform a regression with say a moving range of 6 points then it doesn’t return data for the first 3 or the last 3 values; or

– in the above case, it substitutes any missing data from data it does have. For instance, in the above case, for the first point it would use the first 3 numbers twice instead of a moving range that has missing data points.

Alternately, the user could simply amend their range so that they use a subset of the LOESS function output that doesn’t include end values. But they might not be aware of this.

Regardless, this is staying on my ribbon! THanks Jon

Jeff –

1. The utility is constrained by how the RefEdit control works. You can select one cell, then use the mouse and scroll bars to find the opposite cell, nhold Shift, and click on the last cell.

1A. The latest version improves on this. If you only select the top cell of a column, the program uses the whole column, down to the first blank cell.

2. I didn’t add the command to the Chart menu bar, because I figured it was really a data kind of utility. You’ve convinced me otherwise. This will go into the newest edition, and I’ll post it sometime soon.

3. The utility uses the same number of points, but it is a one-sided fit, with points further away given a lower weighting. It’s not perfect, but it beats simple moving averages. Statisticians use this methodology, with the knowledge that there may be end effects.

Nice utility, Jon. Am I right that it needs at least 15 input points to avoid generating an overflow error, or am I doing something it wasn’t expecting?

I got an overflow error one time when I didn’t select one of the output ranges properly. Maybe this is what’s going on for you, Dale?

Well, Jeff, I got the overflow error roughly 20 for 20 times when I selected 14 pairs or less, and 0 for 8 times when I selected 15 pairs or more. Looks systematic to me. I even duplicated it from scratch with a new workbook.

Each time I was using the contiguous range option. It can accept fewer input points for separate I/O ranges without the overflow error, but with small N the output still isn’t right.

Since you aren’t having this problem, I’ll try again tomorrow with Excel 2003 on a different PC.

Interesting…if you do a pass with say 12 points using the default N (i.e. Number of points for the moving regression) of 5, then it bombs out. If you try again with N of 6, then it works. And then if you try again with N of 5, it works.

Dale & Jeff –

I’ve found the LOESS algorithm to be pretty robust. In the back of my head, the number of 4 points for the moving regression sounds familiar as a limit I’ve encountered, though most of my data sets are at least dozens of points, and I don’t think I’ve routinely gone below a double-digit N.

Stay tuned. I’ve made some enhancements to the utility that make range selection much easier. I’ll be posting the new version in a few days.

I want to run the PTS_LOESS from an excel macro. Is that possible?

Thanks

Peter –

That’s not included in the functionality of the LOESS utility, but it’s a good idea. I’ll forward your suggestion to the programming staff. Oh wait, that’s me.

Hi Jon,

Great utility! I am a novice using SAS, so having this utility at my fingertips is quite helpful. Has anyone published work using this utility? If so, how have they referenced it?

Best,

Gary

Hi Jon,

Very nice, easy to use, program. I was looking for such a utiliy since ages.

What should I do if I want to calculate a confidence interval (basically how to know the standars deviation for each point) for the Lowess curve ? Is it possible to include that in the utility?

Regards

Stephane

Hello Jon,

first excuse my linguistical mistakes if there are some. I am from Germany and trying to use your LOESS Calculator. Unfortunately I have a problem with it. There are two columns representing the X- and Y-values (year and Bray-Curtis-Index). I have managed to input all the necessary data needed in this dialogue (smoothing parameter, number of points).

Then I press the OK-button, but there is not any chart displayed. The program only computes the Y-output data. Can you give me an advice, what could have gone wrong. Besides I use Excel 2007.

P.S. Thank you for this great website and your hard work on it.

my data:

year Bray-Curtis-Index

1995 0,059

1996 0,305

1997 0,376

1998 0,382

1999 0,343

2000 0,440

2001 0,325

2002 0,703

2005 0,633

2006 0,576

2007 0,552

2008 0,700

2009 0,747

2010 0,745

Norman –

The utility only calculates values. You need to insert your own chart.

Hello Jon,

thank you for your quick reply. Now I understand this tool.

You need to insert the Loess-values and the original data in the same chart

(secondary axis for Loess-values). But what if I have for the same year different Bray-

Curtis-values. Then the chart looks odd. Thanks in advance.

extract from my table:

value B-C-Index loess

1995 0,059 0,5381

1995 0,182 0,5381

1995 0,447 0,5381

1995 0,396 0,5381

1995 0,514 0,5381

1995 0,674 0,5381

1995 0,623 0,5381

1995 0,809 0,5381

1995 0,781 0,5381

1995 0,489 0,5381

1995 0,503 0,5381

1995 0,525 0,5381

1996 0,305 0,5710

1996 0,561 0,5710

1996 0,349 0,5710

1996 0,533 0,5710

1996 0,393 0,5710

1996 0,693 0,5710

1996 0,834 0,5710

1996 0,531 0,5710

1996 0,782 0,5710

1996 0,698 0,5710

1996 0,636 0,5710

1996 0,699 0,5710

1997 0,376 0,6024

1997 0,493 0,6024

1997 0,349 0,6024

1997 0,421 0,6024

1997 0,537 0,6024

1997 0,553 0,6024

1997 0,421 0,6024

1997 0,425 0,6024

1997 0,941 0,6024

1997 0,591 0,6024

1997 0,850 0,6024

1997 0,584 0,6024

1998 0,382 0,6391

1998 0,506 0,6391

1998 0,441 0,6391

1998 0,442 0,6391

1998 0,402 0,6391

1998 0,538 0,6391

1998 0,495 0,6391

1998 0,395 0,6391

1998 0,925 0,6391

1998 0,810 0,6391

1998 0,958 0,6391

1998 0,709 0,6391

1999 0,343 0,6672

1999 0,592 0,6672

1999 0,729 0,6672

1999 0,467 0,6672

1999 0,738 0,6672

1999 0,842 0,6672

1999 0,694 0,6672

1999 0,852 0,6672

1999 0,990 0,6672

1999 0,754 0,6672

1999 0,975 0,6672

1999 0,935 0,6672

2000 0,440 0,6921

2000 0,626 0,6921

2000 0,789 0,6921

2000 0,536 0,6921

2000 0,778 0,6921

2000 0,873 0,6921

2000 0,904 0,6921

2000 0,920 0,6921

2000 0,818 0,6921

2000 0,910 0,6921

2000 0,992 0,6921

2000 0,950 0,6921

2001 0,325 0,7176

2001 0,533 0,7176

2001 0,539 0,7176

2001 0,359 0,7176

2001 0,637 0,7176

2001 0,788 0,7176

2001 0,936 0,7176

2001 0,925 0,7176

2001 0,826 0,7176

2001 0,924 0,7176

2001 0,969 0,7176

2001 0,952 0,7176

Norman –

Do you have full dates for the multiple values in each year? What the utility does is compute a Y value for each input X value. Since each X value is repeated, the calculated Y value is also repeated. The many duplicates also caused problems with the regression algorithm within the LOESS program.

If I assume the 12 values for each year correspond in order to values on the first of each month (1/1/95, 2/1/95, etc.) I get a result which is much more typical of what I would have expected.

Hello Jon,

thank you very much. This is a very good solution for this problem.

Now it looks much better than before. Thanks again.

Hello Jon,

I have another question about the utility. According to my last two comments which

contain the necessary data I would like to know if a prediction is possible for the next two years. I found a similar comment from K. A. (Saturday, October 8, 2011, 3:39 am).

Unfortunately I did not understand your solution quite well. Maybe you could explain it to me once more (with the final table). Thank you in advance.

Norm –

Sure, you can extrapolate, subject to concerns raised by any extrapolation.

In the other comment I showed how to get the extrapolated values. Using the first button in the second row, you pick two ranges, each two columns wide. The first has the input data, that is, the actual X and Y values to be used in the extrapolation. The second has the output X, that is, the X values at which you want to extrapolate Y, and blank cells for the output Y.

The result:

Jon,

LOESS extrapolation seems pretty dicey to me. Results can vary widely depending on personal preference for the smoothing parameter.

Extrapolating a linear or exponential or power law or similar fit at least puts the weight of the entire dataset behind the extrapolation. One might even think of a theoretical reason to believe the extrapolation as one is finishing the graph. LOESS just puts the edge of the dataset behind the extrapolation, after effectively admitting one didn’t have a clue in advance as to the real shape of the interpolative fit one is resorting to LOESS to do.

Subject to even more concerns than most extrapolations, I’d say.

Thanks Jon, now it’s working.

Hello,

I’m having a trouble using the link to your most current download. When I do click, a new tab opens and the browser is full of symbols (gibberish). Will you mind sending me another link to use? By the way, should this add-in also work in Excel 2010?

Thank you so much for your work on this!

Shirley

Shirley –

I assume you’re talking about the download from LOESS Utility – Awesome Update, since the download link on this page has been disabled.

If you’re getting gibberish, it means your browser is trying to render a page using the Excel add-in. This isn’t going to work. Right click on the link and use Save As to save the add-in to a convenient location. Then install per instructions.

Hello John,

I love the loess calculator as it makes statistics so easy…

But there is one thing I always fail to handle: Many of my data contain some blank cells, “” or #NV. And that gives a runtime error 17.

And with only few data I can fill in some substitute numbers that might be in the range. But for a huge amount of data this doesnt work.

Is there any possibility to overcome this difficulty?

Thank you for your calculator and for support

Susanne

Suza –

In a development version of the LOESS calculator, I have built in a presorting function that internally sorts the data by X value and removes blanks (and perhaps non-numeric values of all types). I would suggest sorting your data, first by Y to remove any non-numeric data (blanks and errors), then by X to make sure the existing calculator can correctly process the data.

Does this work for Excel 2010? I downloaded the utility, but am having difficulty locating the calculator in the menu.

Karen –

It works in Excel 2000 through 2016. You may have to unblock the file (see http://peltiertech.com/Utility30/Documentation30/RibbonDisappears.html), then the LOESS button appears on the Add-Ins atab of the ribbon.