<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Polynomial Fit vs. Statistical Process Control</title>
	<atom:link href="http://peltiertech.com/WordPress/polynomial-fit-vs-statistical-process-control/feed/" rel="self" type="application/rss+xml" />
	<link>http://peltiertech.com/WordPress/polynomial-fit-vs-statistical-process-control/</link>
	<description>Peltier Tech Excel Charts and Programming Blog</description>
	<lastBuildDate>Fri, 10 Feb 2012 23:37:49 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.2</generator>
<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
	<item>
		<title>By: Bryan</title>
		<link>http://peltiertech.com/WordPress/polynomial-fit-vs-statistical-process-control/comment-page-1/#comment-159137</link>
		<dc:creator>Bryan</dc:creator>
		<pubDate>Tue, 29 Nov 2011 18:31:04 +0000</pubDate>
		<guid isPermaLink="false">http://peltiertech.com/WordPress/?p=574#comment-159137</guid>
		<description>Maybe it&#039;s because I have always worked in empirical science and not purely speculative fields, but I have always been instructed and found it to be valid to distrust each additional order of a polynomial past linear, in increasing level of distrust as the exponent increases. Years ago, there was a post-doc in our lab who insisted on using high-order polynomials to curve fit her Bradford assay standards. She got very tight fits. They were also meaningless, since there was no way the kinetics of the Bradford could really be explained by a polynomial. The usefulness of the polynomial fit is that it can account for k variables that affect the level of y. The polynomial order should never be larger than k. If you really have no clue what k might be, then do not ever use a polynomial model, no matter how tightly it might fit. Otherwise, you&#039;re just making up effects to suit your desired outcome.</description>
		<content:encoded><![CDATA[<p>Maybe it&#8217;s because I have always worked in empirical science and not purely speculative fields, but I have always been instructed and found it to be valid to distrust each additional order of a polynomial past linear, in increasing level of distrust as the exponent increases. Years ago, there was a post-doc in our lab who insisted on using high-order polynomials to curve fit her Bradford assay standards. She got very tight fits. They were also meaningless, since there was no way the kinetics of the Bradford could really be explained by a polynomial. The usefulness of the polynomial fit is that it can account for k variables that affect the level of y. The polynomial order should never be larger than k. If you really have no clue what k might be, then do not ever use a polynomial model, no matter how tightly it might fit. Otherwise, you&#8217;re just making up effects to suit your desired outcome.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: DaleW</title>
		<link>http://peltiertech.com/WordPress/polynomial-fit-vs-statistical-process-control/comment-page-1/#comment-20885</link>
		<dc:creator>DaleW</dc:creator>
		<pubDate>Fri, 23 Oct 2009 00:35:00 +0000</pubDate>
		<guid isPermaLink="false">http://peltiertech.com/WordPress/?p=574#comment-20885</guid>
		<description>Jon,

I&#039;d hate to see you fall into absolute skepticism whenever there are only 10 data points.

Based solely on these 10 points, a rather compelling case can be made that we do NOT here have a random sample from a process that is in a state of statistical control.  That conclusion seems meaningful to me, and suggests SPC charting of a fuller dataset would very likely find it to be out of control.  (Or perhaps  that you were presented with a worst case example from a much larger dataset.)

As our friendly default hypothesis, let&#039;s assume we have your hypothesized stable process where each measurement has only independent random error; this is the hypothesis against which an SPC chart is intended to detect exceptions.

The alternate hypothesis on the table is a particular nonrandom pattern in our data, some type of peak which spans more than a single data point (not just an isolated outlier). Let’s adopt this alternative only if the default hypothesis fails to explain our data, say at p&lt;0.01 (p&lt;0.05 is typical, but let&#039;s be more conservative here since there are several different patterns that would each cause us to reject the default hypothesis).

What test do we use to detect peak-ness?  We may need to improvise one.  The t-test is a good starting point, since our default hypothesis assumes we have a constant but unknown population variance.

Any distribution with unique values has a largest value.  Under our default hypothesis, the points closest to it in time (&quot;Near&quot;) should not have any higher value than the points (&quot;Far&quot;) further from it in time.  We might define our Near category as the half of the points closer in time to our maximum, and our Far category as the half of points further in time from it, excluding the maximum point from either set to keep the test fair, with an expected difference of zero.  Then we can use a one-sided pooled t-test to determine if Near &gt; Far by a significant amount. =TTEST({400,325,360,305},{190,250,255,191,123},1,2) evaluates to 0.001526 for a p-value, or 1 chance in 655 for a metric that extreme for a process that is in a state of statistical control. Unlikely!

Alternatively, we might define &quot;near&quot; as absolute difference and try to fit an inverse v-peak to our data.  Then a linear fit of Y v. ABS(X-Xmax) for our remaining nine points not including Xmax has a t-score of 6.07 for the slope, which has a two-tailed probability of 0.0005 by regression using the Data Analysis add-in.  That&#039;s one chance in 1984 for our default hypothesis -- really half that likely, since only one tail supports a peak claim.  When only a 1 in 4000 tail of our default model fits the data, it&#039;s probably time to look at a different model, wouldn&#039;t you say?

This isn&#039;t at all likely to be random data -- not in the sense that we define random for an SPC chart.</description>
		<content:encoded><![CDATA[<p>Jon,</p>
<p>I&#8217;d hate to see you fall into absolute skepticism whenever there are only 10 data points.</p>
<p>Based solely on these 10 points, a rather compelling case can be made that we do NOT here have a random sample from a process that is in a state of statistical control.  That conclusion seems meaningful to me, and suggests SPC charting of a fuller dataset would very likely find it to be out of control.  (Or perhaps  that you were presented with a worst case example from a much larger dataset.)</p>
<p>As our friendly default hypothesis, let&#8217;s assume we have your hypothesized stable process where each measurement has only independent random error; this is the hypothesis against which an SPC chart is intended to detect exceptions.</p>
<p>The alternate hypothesis on the table is a particular nonrandom pattern in our data, some type of peak which spans more than a single data point (not just an isolated outlier). Let’s adopt this alternative only if the default hypothesis fails to explain our data, say at p&lt;0.01 (p&lt;0.05 is typical, but let's be more conservative here since there are several different patterns that would each cause us to reject the default hypothesis).</p>
<p>What test do we use to detect peak-ness?  We may need to improvise one.  The t-test is a good starting point, since our default hypothesis assumes we have a constant but unknown population variance.</p>
<p>Any distribution with unique values has a largest value.  Under our default hypothesis, the points closest to it in time ("Near") should not have any higher value than the points ("Far") further from it in time.  We might define our Near category as the half of the points closer in time to our maximum, and our Far category as the half of points further in time from it, excluding the maximum point from either set to keep the test fair, with an expected difference of zero.  Then we can use a one-sided pooled t-test to determine if Near > Far by a significant amount. =TTEST({400,325,360,305},{190,250,255,191,123},1,2) evaluates to 0.001526 for a p-value, or 1 chance in 655 for a metric that extreme for a process that is in a state of statistical control. Unlikely!</p>
<p>Alternatively, we might define &#8220;near&#8221; as absolute difference and try to fit an inverse v-peak to our data.  Then a linear fit of Y v. ABS(X-Xmax) for our remaining nine points not including Xmax has a t-score of 6.07 for the slope, which has a two-tailed probability of 0.0005 by regression using the Data Analysis add-in.  That&#8217;s one chance in 1984 for our default hypothesis &#8212; really half that likely, since only one tail supports a peak claim.  When only a 1 in 4000 tail of our default model fits the data, it&#8217;s probably time to look at a different model, wouldn&#8217;t you say?</p>
<p>This isn&#8217;t at all likely to be random data &#8212; not in the sense that we define random for an SPC chart.</p>
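<p>For readers without Excel, the pooled one-sided t-test above can be sketched in plain Python. This is only an illustration, not the exact routine Excel uses: the helper names (pooled_t, t_pdf, upper_tail) are made up for this sketch, and the tail probability is approximated by Simpson's rule on the Student t density rather than by a library call.</p>

```python
import math
from statistics import mean

def pooled_t(a, b):
    """Pooled (equal-variance) two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    # Sum of squared deviations within each group, pooled over both groups.
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    df = na + nb - 2
    se = math.sqrt(ss / df * (1 / na + 1 / nb))
    return (ma - mb) / se, df

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=2000):
    """One-sided p-value for t > 0: 0.5 minus the Simpson integral of the density on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

near = [400, 325, 360, 305]       # the Near values from the TTEST formula
far = [190, 250, 255, 191, 123]   # the Far values from the TTEST formula

t_stat, df = pooled_t(near, far)
p_one_sided = upper_tail(t_stat, df)
print(t_stat, df, p_one_sided)
```

<p>With four Near and five Far points there are 7 degrees of freedom, and the one-sided p-value should land close to the TTEST result quoted above.</p>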
]]></content:encoded>
	</item>
	<item>
		<title>By: Jon Peltier</title>
		<link>http://peltiertech.com/WordPress/polynomial-fit-vs-statistical-process-control/comment-page-1/#comment-20859</link>
		<dc:creator>Jon Peltier</dc:creator>
		<pubDate>Thu, 22 Oct 2009 14:50:07 +0000</pubDate>
		<guid isPermaLink="false">http://peltiertech.com/WordPress/?p=574#comment-20859</guid>
		<description>Dale -

From these ten points alone, we can say nothing meaningful. I only suggested applying SPC to 80 points (the original 10 and 70 more derived from a population with the same mean and standard deviation as the first 10). Without 70 more valid sample data points, we can speculate on many varied scenarios: is it random, is it a section of a sinusoid relationship, are there intermittent peaks?</description>
		<content:encoded><![CDATA[<p>Dale -</p>
<p>From these ten points alone, we can say nothing meaningful. I only suggested applying SPC to 80 points (the original 10 and 70 more derived from a population with the same mean and standard deviation as the first 10). Without 70 more valid sample data points, we can speculate on many varied scenarios: is it random, is it a section of a sinusoid relationship, are there intermittent peaks?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: DaleW</title>
		<link>http://peltiertech.com/WordPress/polynomial-fit-vs-statistical-process-control/comment-page-1/#comment-20856</link>
		<dc:creator>DaleW</dc:creator>
		<pubDate>Thu, 22 Oct 2009 14:16:52 +0000</pubDate>
		<guid isPermaLink="false">http://peltiertech.com/WordPress/?p=574#comment-20856</guid>
		<description>TableCurve finds that a Lorentzian peak is a slightly better fit than a Gaussian peak, but I agree with Mike that your starting data, at first glance and with no context, looks more like a peak than a process that is in a state of statistical control.

SPC is not very powerful with only 10 points, and we might be better off using direct hypothesis testing if we can&#039;t see the larger data set.  At the end of your ten points, you&#039;ve got 5 points in a row steadily decreasing.  Assuming no ties, the odds that six random points in a row are sequentially ordered from any given point are two in (1*2*3*4*5*6) or 1 in 360.  One more point would typically be considered an SPC out of control rule violation, and your data set is so small that standard SPC rules, which are certainly a great tool for larger datasets, are too forgiving here.

A look at the larger data set might show just randomness as you suggested.  Or it might show intermittent peaks, or other out of control drift of the mean.  From these ten points alone, not knowing if they were cherry-picked or typical of the larger distribution, shouldn&#039;t we tentatively affirm that this limited evidence favors the existence of a local peak?</description>
		<content:encoded><![CDATA[<p>TableCurve finds that a Lorentzian peak is a slightly better fit than a Gaussian peak, but I agree with Mike that your starting data, at first glance and with no context, looks more like a peak than a process that is in a state of statistical control.</p>
<p>SPC is not very powerful with only 10 points, and we might be better off using direct hypothesis testing if we can&#8217;t see the larger data set.  At the end of your ten points, you&#8217;ve got 5 points in a row steadily decreasing.  Assuming no ties, the odds that six random points in a row are sequentially ordered from any given point are two in (1*2*3*4*5*6) or 1 in 360.  One more point would typically be considered an SPC out of control rule violation, and your data set is so small that standard SPC rules, which are certainly a great tool for larger datasets, are too forgiving here.</p>
<p>A look at the larger data set might show just randomness as you suggested.  Or it might show intermittent peaks, or other out of control drift of the mean.  From these ten points alone, not knowing if they were cherry-picked or typical of the larger distribution, shouldn&#8217;t we tentatively affirm that this limited evidence favors the existence of a local peak?</p>
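<p>The "two in 6!" figure above can be checked by brute force. The sketch below (the function name monotone_fraction is just for illustration) enumerates every ordering of six distinct values and counts those that are strictly increasing or strictly decreasing, assuming each ordering is equally likely under the random, no-ties hypothesis.</p>

```python
from fractions import Fraction
from itertools import permutations

def monotone_fraction(n):
    """Fraction of orderings of n distinct values that run strictly up or strictly down."""
    total = 0
    monotone = 0
    for perm in permutations(range(n)):
        total += 1
        # Differences between consecutive values in this ordering.
        steps = [b - a for a, b in zip(perm, perm[1:])]
        if all(s > 0 for s in steps) or all(s < 0 for s in steps):
            monotone += 1
    return Fraction(monotone, total)

odds = monotone_fraction(6)
print(odds)  # 1/360
```

<p>Only 2 of the 720 orderings are monotone, which gives the 1 in 360 figure.</p>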
]]></content:encoded>
	</item>
	<item>
		<title>By: Jon Peltier</title>
		<link>http://peltiertech.com/WordPress/polynomial-fit-vs-statistical-process-control/comment-page-1/#comment-11723</link>
		<dc:creator>Jon Peltier</dc:creator>
		<pubDate>Mon, 09 Mar 2009 10:19:18 +0000</pubDate>
		<guid isPermaLink="false">http://peltiertech.com/WordPress/?p=574#comment-11723</guid>
		<description>Yes, a spline approach could be used to fit the larger data set more closely. This is not really suited to the type of data in the plot, and will lead to overfitting. I generated a string of random numbers from a normal distribution based on the distribution of the original points. Using anything other than a control chart with horizontal mean and control lines (or trending lines) is inappropriate.

It was my assumption to simulate the reader&#039;s many more weeks of data with a random process. From my experience, splines are good for fitting data with much less randomness and a more systematic and meaningful behavior in its variation.</description>
		<content:encoded><![CDATA[<p>Yes, a spline approach could be used to fit the larger data set more closely. This is not really suited to the type of data in the plot, and will lead to overfitting. I generated a string of random numbers from a normal distribution based on the distribution of the original points. Using anything other than a control chart with horizontal mean and control lines (or trending lines) is inappropriate.</p>
<p>It was my assumption to simulate the reader&#8217;s many more weeks of data with a random process. From my experience, splines are good for fitting data with much less randomness and a more systematic and meaningful behavior in its variation.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Will Dwinnell</title>
		<link>http://peltiertech.com/WordPress/polynomial-fit-vs-statistical-process-control/comment-page-1/#comment-11706</link>
		<dc:creator>Will Dwinnell</dc:creator>
		<pubDate>Mon, 09 Mar 2009 01:43:42 +0000</pubDate>
		<guid isPermaLink="false">http://peltiertech.com/WordPress/?p=574#comment-11706</guid>
		<description>It is possible to fit splines (sets of connected simple curves, typically collections of cubic polynomials) to data.  Your pair of linear fits would be a linear spline.  Almost always, spline fitting is performed as going exactly through the given data points, but this is not necessary.  A trend for your 80-point series could probably be reasonably fit using 6 or 7 spline &quot;knots&quot; (places where the simple curves are connected).</description>
		<content:encoded><![CDATA[<p>It is possible to fit splines (sets of connected simple curves, typically collections of cubic polynomials) to data.  Your pair of linear fits would be a linear spline.  Almost always, spline fitting is performed as going exactly through the given data points, but this is not necessary.  A trend for your 80-point series could probably be reasonably fit using 6 or 7 spline &#8220;knots&#8221; (places where the simple curves are connected).</p>
]]></content:encoded>
	</item>
</channel>
</rss>

