In Calculation Bug Fixed, an old but ongoing thread on the Daily Dose of Excel blog, I mentioned a problem with Excel 2007’s trendline regression formulas. Apparently, small regression coefficients were treated like the small errors that occur in the insignificant digits when converting from binary to decimal. I documented the error in May of 2007, and while following up on Daily Dose, I tested in Excel 2007 SP1 and found that the error has been corrected. I’m not here to raise unnecessary alarms among users who have already updated to SP1, but to report that the problem has been corrected and to describe the errors for those who have not yet updated.
The problem shows up in datasets with small Y values. Here is a table in which every series uses the same X values, and the Y values differ only by a power of ten:
X | Y1 | Y2 | Y3 | Y4 | Y5 | Y6 | Y7 | Y8 |
0.1111 | 2.872E-19 | 2.872E-18 | 2.872E-17 | 2.872E-16 | 2.872E-15 | 2.872E-14 | 2.872E-13 | 2.872E-12 |
0.0625 | 4.043E-19 | 4.043E-18 | 4.043E-17 | 4.043E-16 | 4.043E-15 | 4.043E-14 | 4.043E-13 | 4.043E-12 |
0.0400 | 4.576E-19 | 4.576E-18 | 4.576E-17 | 4.576E-16 | 4.576E-15 | 4.576E-14 | 4.576E-13 | 4.576E-12 |
0.0277 | 4.814E-19 | 4.814E-18 | 4.814E-17 | 4.814E-16 | 4.814E-15 | 4.814E-14 | 4.814E-13 | 4.814E-12 |
The data is charted below on a semilog plot, which puts all of the curves on the same basis, with the trendline formula next to each curve.
The regression coefficients are reproduced below.
Magnitude | Intercept | Slope | Correlation (R²) |
E-12 | 5.49429E-12 | -2.35066E-11 | 0.999213 |
E-13 | 5.49429E-13 | -2.35066E-12 | 0.999213 |
E-14 | 5.49429E-14 | -2.35066E-13 | 0.999213 |
E-15 | (missing) | -2.35066E-14 | 0.999213 |
E-16 | 5.49429E-16 | (missing) | 0.999213 |
E-17 | 5.49429E-17 | (missing) | 0.999213 |
E-18 | 5.49429E-18 | (missing) | 0.999213 |
E-19 | 5.49429E-19 | (missing) | 0.999213 |
All of the R² values are the same, and the slope and intercept values differ only by their power of ten, wherever they appear in the trendline formula. For the E-15 data the intercept is dropped from the formula, and for the E-16 and smaller data the slope is dropped. Apparently the algorithm that removes binary-to-decimal conversion errors overzealously removes some of the regression coefficients in datasets with small values.
The coefficients calculated using SLOPE(), INTERCEPT(), CORREL(), and LINEST() are not affected by this overcorrection.
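As an independent check outside Excel, here is a minimal Python sketch that fits an ordinary least-squares line to the data in the table above at each power of ten. It is not Excel's internal algorithm; the function names `linear_fit` and `fit_at_magnitude` are my own, chosen for illustration, and the fit uses the textbook formulas for slope, intercept, and R².

```python
# X values and Y mantissas copied from the data table above.
X = [0.1111, 0.0625, 0.0400, 0.0277]
Y_MANTISSA = [2.872, 4.043, 4.576, 4.814]


def linear_fit(xs, ys):
    """Return (slope, intercept, r_squared) for a least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross products about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def fit_at_magnitude(exponent):
    """Scale the Y mantissas by 10**exponent and fit a line (hypothetical helper)."""
    ys = [m * 10.0 ** exponent for m in Y_MANTISSA]
    return linear_fit(X, ys)


if __name__ == "__main__":
    # Walk from E-12 down to E-19, as in the coefficient table.
    for exponent in range(-12, -20, -1):
        slope, intercept, r2 = fit_at_magnitude(exponent)
        print(f"E{exponent}: intercept={intercept:.5E} "
              f"slope={slope:.5E} R2={r2:.6f}")
```

Run as a script, this should print a slope and an intercept for every magnitude, each scaling by a factor of ten from one row to the next, with the same R² throughout. That is the pattern the trendline labels ought to show, and it is what the worksheet functions return.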
Patrick says
I’m very glad you posted this…I was beginning to think I was the only person with this problem. I triple-checked my numbers, and they just weren’t making sense. Well-documented. Let’s see if SP1 does it.