I’ve written a bit about regression and curve fitting; see Regression Approach to a Simple Physics Problem, Choosing a Trendline Type, and Trendline Fitting Errors. A blog reader asked for help with some sample data that he couldn’t fit. Here is the data.
I plotted the data and gave it the hairy eyeball. Not a linear trend, maybe something quadratic.
The blog reader had fitted a 6th order polynomial trendline, and was having trouble using it to predict values. My fit is shown below, and I had no such problems with predictions matching the trendline. I suspect the user had insufficient precision in his coefficients, which is covered in Trendline Fitting Errors.
The 6th order fit isn’t really all that great. I decided it really isn’t much better than the quadratic fit I had initially suspected.
Then I thought the data almost fit two line segments over different ranges of data. I’ve plotted these below.
I replied to the user with this suggestion, and he said that wouldn’t work, because the data would have to be fitted with many line segments, because the data he gave me was only part of a much larger sequence of values.
I thought a moment and realized that with many weeks of repeated data, what the user needed was an approach based on Statistical Process Control. I wrote about Control charts in Introducing Control Charts (Run Charts). This is an opportunity to illustrate another set of run charts. In this example, I relied on techniques from a small, 136-page book called Understanding Variation.
Donald J. Wheeler
I added a column to my table to calculate the Moving Range, which is simply the absolute value of the difference between the current value and the previous value. This is an easier measure of variation to compute than the standard deviation, though with modern computer hardware and software that’s not an important consideration.
In any case, I plotted the weekly values data and the moving range data.
I computed the averages of the values data and of the moving ranges. I added horizontal lines to indicate the averages (see Run Chart with Mean and Standard Deviation Lines for detailed instructions).
Then I used simple factors to determine upper and lower control limits for these quantities, and I added the limits to the charts. For the values, the control limits are given by:
Limit = Average Value ± 2.66 * Average Moving Range
For the moving range, the lower control limit is zero and the upper control limit is given by:
Limit = 3.27 * Average Moving Range
What this tells me is that the values and the moving ranges fall within limits, so the variability is given not by anything we can fit a curve to, but simply by normal variation within the process. Closer examination of some of the data would probably point to an out-of-control process (for example, the last five values show continuing decline). Let’s just worry about violations of the control limits.
I calculated 70 more values with the same mean and standard deviation as the original 10 values, to simulate an ongoing process (because the blog reader did not provide more data). I plotted these values on the same chart with the original ten values, using the limits calculated based on the original ten values.
The values look pretty good, all within the limit except for a single point, which should be examined for any special causes of variation. All of the moving range points fall within the upper control limit. I recalculated the averages and limits using the entire data set and replotted the data.
There was little difference; the limits were slightly more generous. The value that exceeded the control limit in the first chart of all the data still is out of control, and still deserves a closer look.
One final note: The polynomial regression breaks down completely in a process like this which is successfully modeled using SPC. A linear fit may be useful to detect a possible trend of the average over time.
Further Reading about Statistical Process Control