Last fall in LOESS Utility – Awesome Update I introduced the latest version of my LOESS Utility for Excel. Compared to the earlier version, this one provides the user with much more flexibility in data range selection. As with all of my utilities, I made these adjustments after using the utility for my own real analysis. I don’t know of too many companies where the programmers are also users of the same software.
Lately I’ve received a few comments and emails asking how to use the utility’s dialog. I thought it was self-explanatory, but then, as the designer I already knew how it worked. This article is intended to answer these questions.
The LOESS Utility Dialog
The dialog has seven icon-buttons, and depending on which of these has been selected, one to four range selection boxes appear in the large empty space to the right of the buttons. The icons represent the data arrangement: Blue = X Input, Red = Y Input, Green = X Output, Orange = Y Output. The labels above the icons and the labels adjacent to the range selection boxes are intended to guide the user.
The X Input and Y Input ranges must have the same number of rows, and the X Output and Y Output ranges also must have the same number of rows, but not necessarily the same number as the input ranges.Mismatched numbers of rows will result in an error, as will blank cells and non-numeric cells within the selected ranges (except that the contents of the Y Output range are ignored and overwritten).
If any two of the ranges selected in the range boxes are not contiguous, they need not begin on the same row.
One handy feature I built into this version of the utility is autoselection of the number of rows in a range. If you select the first data cell of a column, the utility determines how many rows are filled with data, and uses that range in its calculation. If you select more than one row, the program does no autoselection, but instead assumes you selected the rows you needed, and it calculates based on your selection. The autoselection means the user doesn’t have to count the number of rows of the ranges.
LOESS Utility Button 1
The first button needs only one range selection box, because the data all fits into a contiguous range. The first column of the range contains the X values (used for both X Input and X Output), the second column contains the Y Input data, and the Y Output data will be placed into the third column, overwriting any existing data in the column.
LOESS Utility Button 2
The second button needs two range selection boxes. The first column of the first range contains the X values (used for both X Input and X Output), the second column contains the Y Input data. The Y Output data will be placed into the only column of the second range, overwriting any existing data in the column. The Y Output range does not have to start on the same row as the first range. All ranges must have the same number of rows.
LOESS Utility Button 3
The third button needs three range selection boxes, each indicating a single column. The first range contains the X values (used for both X Input and X Output), the second rangecontains the Y Input data, and the Y Output data will be placed into the third range, overwriting any existing data in the column. The three ranges do not have to start on the same row. All ranges must have the same number of rows.
LOESS Utility Button 4
The fourth button needs two range selection boxes. The first range contains the X Input and Y Input data in the first and second columns of the range. The X output data is in the first column of the second range, and the Y Output data will be placed into the second column, overwriting any existing data in the column. The two ranges do not need to start on the same row or contain the same number of rows.
LOESS Utility Button 5
The fifth button needs three range selection boxes. The first range contains the X Input and Y Input data in the first and second columns of the range. The X output data is in the second range, and the Y Output data will be placed into the third range, overwriting any existing data in the column. The three ranges do not need to start on the same row. The second and third ranges must have the same number of rows, but not necessarily the same number of rows as the first range.
LOESS Utility Button 6
The sixth button needs three range selection boxes. The first range contains the X values, the second range contains the Y Input data, the first column of the third range contains the X Output values, and the Y Output data will be placed into the second column of the third range, overwriting any existing data in the column. The three ranges do not have to start on the same row. The first and second ranges must have the same number of rows, but not necessarily the same number of rows as the third range.
LOESS Utility Button 7
The seventh button needs all four range selection boxes. The first range contains the X Input values, the second range contains the Y Input data, the third range contains the X output values, and the Y Output data will be placed into the fourth range, overwriting any existing data in the column. None of the ranges have to start on the same row. The first and second ranges must have the same number of rows, and the third and fourth ranges must have the same number of rows, but not necessarily the same number of rows as the first and second ranges.
Smoothing Parameter and Number of Points
The user can adjust the smoothness of the calculated Y Output by changing either the smoothing parameter alpha (the fraction of points used in the moving regression) or the number of points used in the moving regression. When one of these is changed, the other is updated, based on the number of points (rows) in the X Input range.
I think that values of 0.25 to 0.50 are typical for alpha, but it seems to me that if you have some logical number of points built into the data you should use that number. For example, if you have a few years of daily data, you might choose 90 points (3 months or a calendar quarter) for your analysis.
Rerunning an Analysis
When you perform a LOESS calculation with this utility, the values in the dialog are saved, and re-entered into the dialog the next time it is called on the same worksheet. Saved settings from one worksheet do not affect settings from another worksheet. If you use the output in a monthly report, this month’s settings are saved. Next month when you run the utility on the updated worksheet, your previous settings will be used as defaults, saving you a lot of time.
If you select only the first data cell of a range, the autoselection feature will select the right number of rows this month, and again next month even if data has been added or removed.
LOESS Smoothing in Peltier Tech Charts for Excel
I liked this utility and used it so much, that I’ve included a greatly enhanced version in my commercial Excel Charting software, Peltier Tech Charts for Excel.
I’ve made numerous improvements to this utility in the seven years since I released the free demo in this article. The ones that come to mind off the top of my head:
- Data Improvement:
- Data is internally sorted by X prior to running the analysis.
- Missing input data (X or Y) is ignored during the analysis.
- Algorithms are less sensitive to factors that previously caused errors (too much input at repeated X values, too few points in the analysis).
- If X output prediction range extends beyond X input data range, you can decide not to extrapolate below minimum or above maximum X values.
- Algorithm Improvement:
- You can use a moving quadratic (2nd order) regression instead of the normal 1st order regression. This is slower than the first order computation, but not as much slower as I feared.
Please visit the Peltier Tech Charts for Excel page for more information.
Cris says
PTS,
I have a large array with 1000’s of points. It seems I cannot get LOESS to work with more than 500 points. Is that the limit or am I doing something wrong?
Thank You
Jon Peltier says
Cris –
There is no inherent limitation to the number of points that the Loess utility could handle. I just ran a series of tests with up to 5000 points using alpha=0.25. The 5000 point model took a while (10s of seconds) to compute. Certainly computation time would be an issue.
Did you receive any message from the program, or did it just not seem to work?
Jon Peltier says
Cris reports that he had some non-numeric values (#DIV/0! errors) in his data. When he removed these, the program ran, albeit slowly.
irene says
this truly is awesome. Thank you so much!!!
Jerry says
I’m curious about this, but the different threads I’ve chased have all ended up with a .xla even though I’m sure I saw a text reference to .xlam. Can you guide me to that?
Jon Peltier says
Jerry –
I guess I’m not sure what you mean. The LOESS Utility is an .xla file.
Jerry says
Thank you – I don’t know where I got that xlam idea. I’ll load the .xla into Excel 2007 then. Thanks.
Joemon says
Thanks for this excellent utility. I tried the first 3 options, its works fine and got the Y value. However I would like to get the X value, so tried the 4th option, but it didn’t work, not getting the X values.
Jon Peltier says
You have to enter the X values, and the corresponding Y values are calculated.
Joemon says
Thanks for the reply. I have tried exactly as shown in LOESS Utility Button 4. Given Xinput and Y input as show in picture. But no values getting in Xoutput column, 7y
Jon Peltier says
The program cannot know what X values to use.
Enter the set of X values for which you want to output Y values. This option is for cases where you want to use different X values than what went into the analysis. For example, if there are very many input points, such as Xinput={0,0.1,0.2,0.3,etc.}, you may want to calculate only enough values to plot the data, using Xoutput={0,1,2,..,9,10}.
This means you select Xinput and Yinput used to compute the moving weighted regression, then you select a different set of X values, Xoutput, for which the program will calculate Y values, Youtput.
Cameron says
Very interested to use this tool in Excel 2016 v15.34, running on a MacBook Pro Retina, and under OSX 10.11.6.
I have spent about an hour. Perhaps I have not read the instructions fully enough.
I was able to download, and install, and activate the Macro. I did get the image above with colored icons.
• I found I could not get the Macro to be remembered, in that I could not add it as a permanent Macro to a specific spreadsheet, or spreadsheets in general. The current Excel preference setting is Disable All Macros with Notification, so approve on a case by case basis. That seems about right.
• The dialogue box and text is quite small on the retina Mac and a little hard to read.
• The first run it hung Excel. But later attempts it did work.
• I selected contiguous data on the subsequent runs.
• I entered manually the beginning and end, consistent range, for X in put and y input. There was no autofill on clicking the top cell, though that cell reference was entered enabling me to complete the range.
• I remain confused as to the significance of the different options, other than these options seem to be different to how you have your data columns situated within a spreadsheet. In my case, the new output would be only Y data to my understanding, with X data remaining the same. I have regular interval X data with “noisy” Y values, and seek a smoother curve for the Y values. Perhaps my understanding is not complete with the process.
• Several attempts generated output X or output X&Y values, but all the values were the same constant.
• I tried different vales for alpha. I note that changes to N or alpha lead to a recalculation of the other. My data set has length around N=1000. If I think about 30 data points is sufficient for a calculation, that seems to produce a very low alpha. I tried running the macro with just alpha 0.33 set so it could calculate the N number, and at other times changing N from 30 to 300. Same single value outcomes.
Puzzled as to what I am doing wrong?
Is it possible that with updates to Excel something has changed within Excel that prevents this macro add in from working? Or does it rely on the other standard add ins from being installed perhaps, eg Analysis Tool pack, first? (I do not have them installed, active, currently.)
Jon Peltier says
Cameron –
This utility is a free sample which was written long ago in Excel for Windows. All of the descriptions in this article are based on Windows Excel.
There are limitations in Excel 2011 for Mac and even more limitations in Excel 2016 for Mac which interfere with the utility’s functionality on the Mac. Many of your issues relate to these limitations, and all limitations have been addressed in the commercial product mentioned at the end of the article.
Mac issues:
– Sometime Mac Excel 2016 does not remember which add-ins are installed.
– Macs use pixels rather than points to define screen sizes, so the dialog is very small on the Mac. In addition, Mac Excel only allows 24-bit bitmaps as images in VBA dialogs, and I had used a different image format in this sample utility.
The add-in does not interfere with other add-ins, and updates to Excel should not affect how it works.
The commercial version has improvements that deal with unsorted X values, missing or invalid data entries, extremely low alpha settings, etc.
You need to specify the top value of the input X, input Y, and output X values; do not select the text label. The program will find the last value in the column. The program does not calculate output X values: how does it know where you want to calculate Y values? he program should not let you proceed without entering all three imputs, although some options allow using the same X values for input and output.
Cameron says
Apologies for my misunderstanding taking up your time.
“ I liked this utility and used it so much, that I have continued to enhance it, and I’ve included it in my commercial Excel Charting software, Peltier Tech Charts for Excel 3.0.”
Should it help, I read this to mean that the blog discussing the improvements and referring to a down load was the new improved algorithm, to which you have also added it to your commercial graph package. I went and looked at your graph package, and at that time decided that the kind of additional graphing types were not of direct relevance to me. Hence my misunderstanding.
For my particular situation, a 5 year project with daily data collection, my data collection remains in Excel, enabling monitoring of progress, and some initial analysis. The data is then exported to other packages for more detailed analysis as a whole project. I am thinking for identification of max and min points in the data, applying Loess smoothing, would be of value pre export for analysis.
I do appreciate your work here with Loess algorithms, and including the Mac versions. There are some Loess and other curve smoothing functions in Python packages and R, however, I would very much value being able to apply in Excel for better monitoring and before export, so I may well be a customer shortly.
Jon Peltier says
Cameron –
Yes, I realize that the wording was misleading, and I’ve adjusted it. I apologize for any confusion.