In Graphical Approach to a Simple Physics Problem I discussed my daughter’s physics homework. The problem was to determine the relationships between how long it takes water to drain from a hole in the bottom of a bucket, the diameter of this hole, and the initial height of water in the bucket. Given this relationship, the time to drain an initial height of 20 through a hole of diameter 4 was to be computed. The data is shown in this table.
The first step is to plot the data using XY charts. The charts below show time to drain vs height for various hole sizes and time to drain vs hole size for various diameters.
The regression approach can be followed in Excel using the LINEST worksheet function. The help topic for LINEST is pretty good, so I would direct you there first if you have questions.
LINEST Fit 1
I set up the following table based on the initial data. Knowing that we have height0.5 and diameter-2 dependencies on time to drain the bucket, I calculated the transformations in columns C and D. The LINEST regression output is in C22:E26.
To compute the regression output, select a range 5 rows high and N+1 columns wide, where N is the number of X variables used. We have two X variables, 1/dia² and sqrt(ht), so we need a range three columns wide. With the range selected, enter the LINEST formula, and use CTRL+SHIFT+ENTER to accept the formula, not just ENTER, because the output is an array. The specific formula used here is:
=LINEST(E2:E17,C2:D17,,TRUE)
where E2:E17 contains the Y values, C2:D17 contains two columns of X values, the blank third argument tells Excel to calculate the constant normally, and TRUE tells Excel to provide the extra rows of statistical parameters, even though I will pretty much ignore them.
Note that when the LINEST formula contains no syntax error and is properly entered using CTRL+SHIFT+ENTER, Excel wraps it in curly braces to indicate that it is an array formula. If you try typing these curly braces yourself, you will only cause an error.
{=LINEST(E2:E17,C2:D17,,TRUE)}
E22 contains the constant, D22 contains the coefficient for 1/dia², and C22 contains the coefficient for sqrt(ht). The predicted value in F2 uses this formula:
=E$22+D$22*C2+C$22*D2
This formula is filled down to F17. Row 19 shows the prediction from the regression formula for intermediate values of diameter and height. The time of 15.4 is much different than the graphical interpolation of 8.5 found in Graphical Approach to a Simple Physics Problem.
The predicted values don’t match the actual values too closely, but let’s plot them. The following charts use markers and dotted lines for predictions and solid lines for actual values.
Not very good fits at all, and there are some negative values (are we filling the bucket up in these cases?).
The log-log plots below don’t even have the appropriate shapes.
These plots illustrate out problem. The regression calculated a single coefficient (slope) for each of the X variables, and the terms are added together. The consequence of the single slope and added terms is that all curves have the same slope, and the predicted lines pass through the averages of the actual lines. This is the wrong constraint for this problem.
LINEST Fit 2
After the previous fiasco I brushed off myself and my bruised ego, and started again. This time I ignored the powers I’d imposed on diameter and height, and simply took logs of all variables. The LINEST formula is:
=LINEST(F2:F17,D2:E17,,TRUE)
I could actually have skipped the logarithmic transformations in columns D:F and used this LINEST formula:
=LINEST(LOG(C2:C17),LOG(A2:B17),,TRUE)
Remember to select the formula’s output range, type the formula, and array-enter it by holding CTRL+SHIFT while pressing ENTER. Excel puts curly braces around the formula if it’s syntactically correct.
The predictions in column G use this formula:
=F$22+E$22*D2+D$22*E2
These predictions are the logarithms of the times to drain, so they must be converted to their antilogs using:
=10^G2
These predictions are pretty close to the actual values. The prediction for the arbitrary starting values in row 19, 8.5, agrees with the graphical interpolation from Graphical Approach to a Simple Physics Problem. Let’s go to the charts, which use markers and dotted lines for predictions and solid lines for actual values.
The fitting coefficients for log(height) and log(diameter) are constant, so we get parallel lines in the log-log charts.
The corresponding straight lines for time vs. swrt(ht) and 1/diam² are no longer parallel, but instead they converge on the origin.
When the regression is done properly, its results agree closely with those from a simple graphical interpolation. In the next post on this topic, we will look at optimization approaches to determining these relationships.
V. Murugaiah says
Dear Sir,
How do I make a calibration for concentraion of analyte vs absorbance respone with a weighting of 1/concentration. Thank you