PTS Blog

Polynomial Fit vs. Statistical Process Control

by Jon Peltier
Peltier Technical Services, Inc., Copyright © 2008. All rights reserved.

I’ve written a bit about regression and curve fitting; see Regression Approach to a Simple Physics Problem, Choosing a Trendline Type, and Trendline Fitting Errors. A blog reader asked for help with some sample data that he couldn’t fit. Here is the data.

Table

I plotted the data and gave it the hairy eyeball. Not a linear trend, maybe something quadratic.

Data

Attempted Regression

The blog reader had fitted a 6th order polynomial trendline, and was having trouble using it to predict values. My fit is shown below, and I had no such problems with predictions matching the trendline. I suspect the user had insufficient precision in his coefficients, which is covered in Trendline Fitting Errors.
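To get a feel for why coefficient precision matters so much with a high-order fit, here is a minimal Python sketch. The coefficients are made up for illustration (the reader's actual trendline coefficients are not reproduced in this post), and `eval_poly` and `round_sig` are hypothetical helper names. It rounds each coefficient to three significant digits and compares the predictions at x = 10.

```python
def eval_poly(coeffs, x):
    """Evaluate a polynomial by Horner's rule; coeffs run from highest power to constant."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


def round_sig(value, digits):
    """Round value to the given number of significant digits."""
    return float(f"{value:.{digits - 1}e}")


# Hypothetical 6th-order coefficients (x^6 first), NOT taken from the reader's data.
full = [0.0123456, -0.412345, 5.34567, -33.4567, 101.234, -120.345, 350.123]
rounded = [round_sig(c, 3) for c in full]

x = 10
exact = eval_poly(full, x)
approx = eval_poly(rounded, x)
error = approx - exact

print(f"full precision: {exact:.3f}")
print(f"3 sig. digits:  {approx:.3f}")
print(f"difference:     {error:.3f}")
```

At x = 10 the x^6 term multiplies any rounding error in its coefficient by a million, so a few dropped digits can shift the prediction well away from the plotted trendline.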

Poly Fit

The 6th order fit isn’t really all that great. I decided it really isn’t much better than the quadratic fit I had initially suspected.

Poly Fit

Then I thought the data almost fit two line segments over different ranges of data. I’ve plotted these below.

Bi-Linear Fit

I replied to the user with this suggestion, and he said that it wouldn’t work: the data he gave me was only part of a much larger sequence of values, so the full data set would have to be fitted with many line segments.

Run Charts

I thought a moment and realized that with many weeks of repeated data, what the user needed was an approach based on Statistical Process Control. I wrote about control charts in Introducing Control Charts (Run Charts). This is an opportunity to illustrate another set of run charts. In this example, I relied on techniques from a small, 136-page book called Understanding Variation.

Understanding Variation: The Key to Managing Chaos
Donald J. Wheeler

I added a column to my table to calculate the Moving Range, which is simply the absolute value of the difference between the current value and the previous value. This is an easier measure of variation to compute than the standard deviation, though with modern computer hardware and software that’s not an important consideration.
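The moving range calculation can be sketched in a few lines of Python. The weekly values below are placeholders for illustration, not the reader's data, and `moving_ranges` is a hypothetical helper name.

```python
def moving_ranges(values):
    """Return |current - previous| for each value after the first."""
    return [abs(curr - prev) for prev, curr in zip(values, values[1:])]


# Placeholder weekly values for illustration only, not the reader's data.
weekly = [310, 355, 380, 400, 410, 395, 385, 360, 340, 315]
ranges = moving_ranges(weekly)
print(ranges)
```

The first value has no predecessor, so the list of moving ranges is one item shorter than the list of values.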

Table

In any case, I plotted the weekly values data and the moving range data.

Values
Moving Range

I computed the averages of the values data and of the moving ranges. I added horizontal lines to indicate the averages (see Run Chart with Mean and Standard Deviation Lines for detailed instructions).

Values with Average
Moving Range with Average

Then I used simple factors to determine upper and lower control limits for these quantities, and I added the limits to the charts. For the values, the control limits are given by:

Limit = Average Value ± 2.66 * Average Moving Range
 

For the moving range, the lower control limit is zero and the upper control limit is given by:

Limit = 3.27 * Average Moving Range
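
As a rough sketch, the two formulas above translate into Python as follows. The function name `control_limits`, the dictionary keys, and the sample values are all assumptions made for illustration; only the 2.66 and 3.27 factors come from the text.

```python
def control_limits(values):
    """Compute control limits for the values and their moving ranges
    using the 2.66 and 3.27 factors quoted in the text."""
    ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    avg_value = sum(values) / len(values)
    avg_range = sum(ranges) / len(ranges)
    return {
        "average": avg_value,
        "average_mr": avg_range,
        "value_ucl": avg_value + 2.66 * avg_range,
        "value_lcl": avg_value - 2.66 * avg_range,
        "mr_ucl": 3.27 * avg_range,
        "mr_lcl": 0.0,
    }


# Placeholder values for illustration only, not the reader's data.
limits = control_limits([10, 14, 10, 14, 12])
for name, value in limits.items():
    print(f"{name}: {value:.3f}")
```

Note that both sets of limits are driven by the average moving range, so no standard deviation is needed.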
 

Values with Limits
Moving Range with Limits

What this tells me is that the values and the moving ranges fall within limits, so the variability comes not from anything we can fit a curve to, but simply from normal variation within the process. Closer examination of some of the data would probably point to an out-of-control process (for example, the last five values show a continuing decline), but for now let’s just worry about violations of the control limits.

I calculated 70 more values with the same mean and standard deviation as the original ten values, to simulate an ongoing process (because the blog reader did not provide more data). I plotted these values on the same chart with the original ten values, using the limits calculated based on the original ten values.
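
One way to sketch this kind of simulation in Python is shown below. It assumes the extra values are drawn from a normal distribution, which is an assumption of this sketch and not necessarily how the values in the charts were generated. The original values are placeholders, and `simulate_more` and `out_of_limits` are hypothetical helper names.

```python
import random
import statistics


def simulate_more(original, count, seed=0):
    """Draw `count` values from a normal distribution matching the
    sample mean and standard deviation of `original`."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    mu = statistics.mean(original)
    sigma = statistics.stdev(original)
    return [rng.gauss(mu, sigma) for _ in range(count)]


def out_of_limits(values, lower, upper):
    """Return (index, value) pairs that fall outside the control limits."""
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


# Placeholder values for illustration only, not the reader's data.
original = [310, 355, 380, 400, 410, 395, 385, 360, 340, 315]
extended = original + simulate_more(original, 70)

# Limits come from the original ten values only, as in the text.
ranges = [abs(b - a) for a, b in zip(original, original[1:])]
avg = statistics.mean(original)
avg_mr = statistics.mean(ranges)
lower, upper = avg - 2.66 * avg_mr, avg + 2.66 * avg_mr

print(f"limits: {lower:.1f} to {upper:.1f}")
print("points outside limits:", out_of_limits(extended, lower, upper))
```

Random draws only match the target mean and standard deviation approximately, so a different seed will give a somewhat different set of flagged points.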

Extended Values
Extended Moving Range

The values look pretty good, all within the limits except for a single point, which should be examined for any special causes of variation. All of the moving range points fall within the upper control limit. I recalculated the averages and limits using the entire data set and replotted the data.

All Values
All Moving Range

There was little difference; the limits were slightly more generous. The value that exceeded the control limit in the first chart of all the data still is out of control, and still deserves a closer look.

One final note: The polynomial regression breaks down completely in a process like this, which is successfully modeled using SPC. A linear fit may still be useful to detect a possible trend of the average over time.

SPC vs Trendline

SPC vs Trendline
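
For the linear trend check mentioned above, a minimal least-squares sketch in Python might look like this. The values are placeholders rather than the reader's data, `linear_fit` is a hypothetical helper name, and the x values are assumed to be consecutive week numbers starting at zero.

```python
def linear_fit(ys):
    """Least-squares slope and intercept of ys against x = 0, 1, 2, ..."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Placeholder values for illustration only, not the reader's data.
values = [310, 355, 380, 400, 410, 395, 385, 360, 340, 315]
slope, intercept = linear_fit(values)
print(f"slope per week: {slope:.2f}, intercept: {intercept:.2f}")
```

A slope that is small compared with the week-to-week scatter suggests the average is not drifting; a persistent nonzero slope over many weeks would be worth a closer look.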

Further Reading about Statistical Process Control

ISO 9001 - Introduction to SPC

Control Charts on Wikipedia

Interpreting Control Charts

Selecting the Right Control Chart

Comments


Comment from Mike Woodhouse
Time: Thursday, October 9, 2008, 5:51 am

I ran the data through CurveExpert, which found the best fit to be a Gaussian model:

y = a*exp(-(x-b)^2/(2*c^2)) (I think)

where

a = 407.68954
b = 4.8840398
c = 3.2081591

It fits slightly better than the quadratic (r = 0.9286) but I wouldn’t give either much credit, especially when we subsequently read that we don’t have the whole data set.

And a sixth-order polynomial is over-fitting to an appallingly dangerous degree unless you know exactly what you’re doing (in which case the question wouldn’t have been asked in the first place!)


Comment from Jon Peltier
Time: Thursday, October 9, 2008, 7:22 am

Mike - Thanks for that. The data does have a shape that would be somewhat Gaussian. But as you say (and I said above), fitting only a small section of a larger data set is generally not a valid approach.

I’ve also said that a 6th order poly fit is overkill. You gain in the third or fourth digit of R², but that’s fooling yourself. In the physical world there are few phenomena that follow a quadratic relationship, never mind four orders higher.


Comment from Rob
Time: Monday, October 13, 2008, 8:52 am

A good example of the 6th order overkill is Runge’s phenomenon:

http://www.google.co.uk/search?q=runge’s+phenomenon
