## Linear Regression

People are familiar with a standard linear regression (LR) method that makes assumptions about the X and Y variables in the analysis. Excel’s built-in regression methods make these same assumptions.

The X variable, also called the independent variable, is assumed to be known precisely, and all error in the regression is assumed to be in the measurement of the Y variable, the dependent variable.

This assumption is reasonable enough. But there are cases where errors in the values of X cannot be discounted. For example, X and Y may be comparisons of the output of two different machines for given inputs. When the outputs are compared, both machines are subject to errors in measurement, so the usual assumptions are invalid.

## Deming Regression Example

Suppose Machines A and B had the following output. At each nominal output value, the results of two runs of each machine are tabulated. There is some variability in measurements of each machine’s output. The averages of each machine at each nominal output level are shown next to the individual measurements.

We may be tempted to plot each machine’s outputs as Y values against the nominal X values.

We may even compute the lines of best fit for the two machines. We can see that they are close, but not exactly the same. The problem with this analysis is that we are not directly regressing one machine against the other.

In the Deming Regression, outputs of the two machines are plotted against each other. Pairs of actual outputs are measured for each nominal output, and the differences between the two measurements from each machine will be used to estimate the variation of each machine’s outputs. The averages of each pair form the basis of the regression, corrected for the machine variation.

The table below shows three sets of calculations. The first row shows a standard linear regression of the measured data. The second row shows a standard linear regression performed on the averages of each pair of values. The third shows the Deming regression analysis. Details of the Deming regression calculations are not given here, but a motivated reader can check the links in the references at the end of this article.

The data are not too different, and in fact the correlation in the LR of the average values and of the Deming regression are identical. In the plot below, lines have been added to show the three sets of results in the table above, as well as the 45 degree line, where Machine B = Machine A. The lines are practically on top of each other.

In the example above, there is little difference in the output of the various LR models. However, if Machines A and B exhibit greater variation in their output, there can be noticeable differences.

## Deming Regression Utility

I’ve created a small utility that facilitates Deming Regression analysis in Excel. Read about it and follow the download link in Deming Regression Utility.

## Regression and Trendline Articles in this Blog

- Trendlines and Chart Types in Excel
- Add One Trendline for Multiple Series
- Trendline Calculator for Multiple Series
- Trendline Fitting Errors
- Choosing a Trendline Type
- Plot Two Time Series and Trendlines with Different Dates
- Polynomial Fit vs. Statistical Process Control
- Moving Averages
- Stacked Column Chart with Stacked Trendlines
- Deming Regression Utility
- LOESS Smoothing in Excel
- LOESS Utility for Excel

## References

Chapter 17: Creating Custom Functions, Excel for Chemists: A Comprehensive Guide, 2nd Edition. by E. Joseph Billo, Copyright 2001 by John Wiley & Sons, Inc.

Incorrect Least-Squares Regression Coefficients in Method-Comparison Analysis by P. Joanne Cornbleet and Nathan Gochman, Clin. Chem. 25/3, 432-438 (1979).