In Getting value on Y axis by putting X axis value on the Mr Excel forum, someone wanted to know how to find in-between values of a function, given some known data points. The approach, of course, is to interpolate values given the known points on either side of the value you need.
Interpolation requires some simple algebra. The diagram below shows two points (blue diamonds connected by a blue line) with coordinates (X1, Y1) and (X2, Y2). We need to find the value of Y corresponding to a given X, represented by the red square at (X, Y).
The smaller triangle with hypotenuse (X1, Y1)-(X, Y) is “similar” to the larger triangle with hypotenuse (X1, Y1)-(X2, Y2), so the sides of the triangles are proportionally sized, leading to the first equation below the sketch. We rearrange this to solve for Y, in the second equation.
We’ll set up our interpolation in the example below. Our data is in A5:B18, and the known values are plotted as blue diamonds connected by blue lines in the chart.
The analysis has two parts: first we need to determine which pair of points to interpolate between, second we need to do the interpolation. We will judge the validity of our interpolation by plotting the calculated point on the same chart.
Solving for Y
I’ve put the calculations above the data table. The yellow shaded cell, A2, holds the known X value, and a formula in cell B2 holds the calculated Y value. Cell A3 indicates which pair of points to interpolate between. The formulas are:
A3: =MATCH(A2,A6:A18)
B2: =INDEX(B6:B18,A3) + (A2-INDEX(A6:A18,A3)) * (INDEX(B6:B18,A3+1)-INDEX(B6:B18,A3)) / (INDEX(A6:A18,A3+1)-INDEX(A6:A18,A3))
We want the Gauge value (Y) when the Flow value (X) equals 3, so this is entered into the yellow shaded cell A2. The formula in A3 tells us that our computed point is between the 7th and 8th data point, and the formula in B2 calculate Y=0.444, and the calculated point (A2, B2) is the red square that lies along the plotted data points. Looks good.
Te determine the Gauge value for a Flow value of 12, we enter this into A2. The red square moves along the blue line past the 11th point, where Gauge=0.548.
Solving for X
With a minor rearrangement, we can instead solve for Flow, given a value for Gauge. The known Gauge value is entered into B2, shaded yellow. B3 indicates the pair of points to interpolate between, and A2 provides the value for Flow. The formulas are:
B3: =MATCH(B2,B6:B18)
A2: =INDEX(A6:A18,B3) + (B2-INDEX(B6:B18,B3)) * (INDEX(A6:A18,B3+1)-INDEX(A6:A18,B3)) / (INDEX(B6:B18,B3+1)-INDEX(B6:B18,B3))
For a Gauge (Y) of 0.53, we compute a Flow (X) of 9.790. The red square shows where the calculate point lies along our plotted data.
Changing the Gauge value to 0.35 moves the red square way to the left, to a Flow value of 0.400.
Calvin Graham says
As they say on quiz shows, these things are only easy if you know the answer.
All that’s missing now is an extra series plotting the horizontal and vertical lines to the axis’s. If you then add the label to the y-intercept points and fix the formatting, you can have a nice box where the user enters a number and the graph shows/tells the y-value
Jon Peltier says
Calvin –
All you need to do is put error bars, with percentage of 100 in negative X and Y directions, to reach the axes. Then add a data label.
Dick Kusleika says
What’s the difference between your interpolation formula and FORECAST. Using FORECAST, I get slightly different numbers, but I’m not sure why.
=FORECAST(A2,OFFSET(INDEX(B6:B18,MATCH(A2,A6:A18)),0,0,2,1),OFFSET(INDEX(A6:A18,MATCH(A2,A6:A18)),0,0,2,1))
Jon Peltier says
Dick –
I copied your formula exactly and pasted it into my sheet, then tried a bunch of X values. In all cases, the Y values were identical, whether I used your formula or mine. Of course, this is in Excel 2003, back when Excel was taught math.
Eric says
Is it possible to pull the equation from Trendlines out and use it to extrapolate? So if the best r squared is a polynomial function we could use the trendline equation to extrapolate instead of doing the linear assumption between two points?
Just wondering…
Dick Kusleika says
Oh, that makes me feel better. I hand typed your values and I’ll be there was some precision that was hidden.
Jon Peltier says
Dick – Guess you’re not a very accurate typist.
Jon Peltier says
Eric –
You can use LINEST to compute the coefficients in the worksheet. See Bernard Liengme’s tutorial: Polynomial regression: How can I fit my X, Y data to a polynomial using LINEST?.
That said, you shouldn’t just use R² blindly to decide which fit is best. Beyond order 2, a polynomial fit may have a great R², but it also might deviate wildly from the smooth curve you might draw by hand. I’ve written about these issues in Choosing a Trendline Type, Trendline Fitting Errors, and Polynomial Fit vs. Statistical Process Control.
Eric says
Well, yes, of course. I guess my point was badly made and my real question badly asked.
If the data turns out to be just badly sampled, say it was some high frequency sine wave and you happen to sample at the right points, your data could look like a flat line would be the best fit line. So perhaps a linear estimation of the point you want to estimate, given the poorly sampled data, would not be the best way of going at it. 99 times out of a 100, though, it probably will.
The nice thing about trendlines is you can see how good your fit is to the data fairly quickly. We could use all sorts of ways to more accurately figure out the function that the data actually seems to conform to, but Trendlines seems to do this quickly if not precisely.
Is there some Trendline object that we can get at to get that nice little formula that you can have put onto your chart when you use Trendlines? I since have played with it and found that there is… ActiveChart.SeriesCollection(1).Trendlines(1).DataLabel.Text is what I was looking for. As it is text, you’d have to parse it and so it might be easier to do the other methods you mentioned, but it might be doable.
Anyways, that’s what I was badly referring to.
ikkeman says
why not use trend?
I use it excessively. use index(match()):index(match()+1) for the X and Y range and you’re done.
the error bars are a new one – I usually set up a start and end point, but this can be a bother when your axis aren’t at zero.
ikkeman says
after posting the trend I wondered:
I’ve spend many hours trying to find a nice, concise way to do quadratic or cubic interpolation.
Of course it’s possible to use linest() to find the formula variables… but is there another way?
btw, another nice feature of the trend function is that is allowas you to do multiple variable linear interpolation.
When you have a graph with multiple lines for different conditions (say your flow/guage ratio is dependent on temperature. You can find the four points bracketing your search condition using the match() function, and then input a 2×2 matrix as you input X, and a 1×2 as your target X and watch the magic happen!
Calvin Graham says
Using minus error bars at 100% to show the intercept with the axis? That’s awesome, never thought of that!
lhem says
Ikkeman – Below are some Excel formulas for various types of fit (see Wikipedia interpolation article for more details). These return Y for a given X value, to find X given Y we can just swap the A’s and B’s in the formulas.
As Jon has said before, despite their popularity polynomial trends are a poor choice for fitting many types of data. The other methods are quite close for the given data as the direction of the curve is not changing much around the estimated points.
Interp Trend Linear Cubic Spline
9.789877 0.53174 0.53000 0.53074 0.53025
0.400368 0.34602 0.35000 0.35336 0.35096
Trend – Cubic polynomial (equivalent to polynomial trend line order 3):
=TREND(B6:B18,A6:A18^{1,2,3},A2^{1,2,3})
Linear – For increasing data (or for decreasing put 1- after first comma):
=PERCENTILE(B6:B18,PERCENTRANK(A6:A18,A2,20))
Legendre – A cubic fit based on the surrounding 4 points:*
=TREND(OFFSET(B6,MATCH(A2,A6:A18)-2,,4),OFFSET(A6,MATCH(A2,A6:A18)-2,,4)^{1,2,3},A2^{1,2,3})
Spline – Catmull-Rom (equivalent to “smooth line” option):*
=SUM((1+1/IRR(MMULT({0,0,2,0;0,1,0,-1;-1,4,-5,2;1,-3,3,-1},OFFSET(A6,MATCH(A2,A6:A18)-2,,4)-A2
)))^-{0;1;2;3}*MMULT({0,2,0,0;-1,0,1,0;2,-5,4,-1;-1,3,-3,1},OFFSET(B6,MATCH(A2,A6:A18)-2,,4)))/2
* for estimation in end intervals you can use extrapolated points at each end:
A5:B5 = {-0.10022,0.29} A19:B19 = {21.82408,0.616791}
DaleW says
Ihem – thanks for sharing that compact matrix formula for one type of cubic spline. That may be the most impressive non-array formula that I’ve ever seen in Excel. I’d never thought of using the Internal Rate of Return function for fitting non-financial data.
If anyone ever wants a smoother interpolation than piecewise linear, cubic splines are usually the way to go. Fortunately there are good free add-ins that make cubic splines easy to calculate in Excel.
lhem says
DaleW – Happy to share, took a while to figure out! It’s a little known fact that you can convert nearly any array formula to a non-array formula with MMULT, eg
Array =SUM(IF(A1:A5>2,B1:B5)) Non-Array =SUM(IF(MMULT(N(A1:A5>2),1),B1:B5))
The straight/smooth chart line options have the property that they maintain the same shape if the configuration of points is rotated. This can be seen in the formulas since X on Y followed by Y on X returns the original result with the linear/spline options, but not for trend or Lagrange polynomials (not Legendre, always get those guys mixed up!) VBA code for chart data is here
Splines have the intuitive appeal that they model the shape taken up by a smooth thin flexible strip passing through the set of points. In the linear case, the strip is pulled taught whereas in the cubic case there is no tension at the endpoints.
DaleW says
Ihem – Interesting, I’d never seen MMULT() used to replace an array formula. In real life, we’d both use =SUMIF() for your example to have error checking, as otherwise a string entry in column A gives what we’d typically consider a bug. MMULT does duplicate even such … features.
Natural cubic splines or Catmull-Rom cubic splines for exactly 4 points would follow your zero tension at the endpoints analogy. Other interpolating cubic splines have tension. Cardinal splines use piecewise cubic polynomials with a tension parameter ranging from 0 (for Catmull-Rom) to 1 (where they collapse to piecewise linear interpolation). Thus Cardinal splines can replicate straight/smooth lines, or anywhere in between.
lhem says
DaleW – You make excellent points. For the MMULT method, a better example would be to apply to: =MIN(IF(A1:A5=”a”,B1:B5)). I don’t know of another way to make this type of formula non-array entry, even using INDEX(A1:A5,0) fails.
Calvin Graham says
I find array formulae should be avoided ilke the plague if there’s even a remote possibility anyone else will be using the spreadsheet because you’ll end up tech-supporting it when they hit enter on a cell. Either redesign with extra columns if possible and if not then write a user defined function in one of the modules.
luu says
what about if you have columns and rows. for example 6,8,10( in the column) and 8,10,12 (row). and under those is an array of set of values. like if i want to canculate
8 and 10.4
Jon Peltier says
Luu –
If the two values are between known values (say, 8.4 and 10.4), you have to interpolate first to get values at 8.4 for 10 and for 12 (two interpolations), then interpolate again between these results to get a value at 10.4.
Daniel says
Hi,
I’m no mathematician but i have tried my best to follow your logic………..
I was wondering if there was a way to Interpolate a point on a line between 2 known points. Not knowing the X or Y value? Example of data:
X Y
1 6.3 2.12
2 7.6 2.19
3 8.6 2.19
4 9.6 2.16
On a Scatter with Smooth Lines & Markers graph the peak of the line will be between point 2 & 3. Is there a way to Interpolate that point? (manually graphing it would give me an interpolation point of X=8.0 & Y=2.20) (X rounded to 0.5)
I appreciate your assistance in this matter!
Daniel
Jon Peltier says
Daniel –
You want to find where the smoothed drawn line has its maximum? I usually advise people not to use smoothed lines. They give a false indication of where data may go. They are not based on any physical model of the data, but only on artistically connecting all points in a pretty way, using curve-drawing parameters that “looked nice” to the same kind of designer that gave us Excel’s early color palette. Smoothed lines ignore any measurement error in the data which is actually present, they ignore all points in the data set other than the two being connected and the point on either side, and if markers are not displayed the viewer has no idea where the actual data lies.
That said, for symmetric points, the usual approach is to fit a quadratic curve (2d order polynomial) to the points, and calculate a maximum using the fitted coefficients. A quadratic fit is probably reasonable, since physical processes often can be modeled by linear and quadratic relationships.
Your points are not symmetric, so the fitted parabola does not pass through all the points; instead it lies in such a way as to minimize the total deviation of the parabola from the data points. You could fit a third order polynomial to your data and calculate your maximum, but this starts to get into the realm of overfitting.
DaleW says
Daniel,
Realistically, close to X=8 is your best guess for the maximum.
If you had really accurate measurements, an approach known as cubic splines would be great for fitting a smooth curve exactly to your data. In fact, the smooth lines that Excel charts draw are one type of cubic spline.
With four points, Excel can fit a cubic (3rd order) polynomial smoothly and exactly through the points. If you play with that equation, you’ll see it predicts the maximum at just over 8.08, and some people would consider that the best answer. (It’s really close to the natural cubic spline solution, also.)
Jon’s right that anything past a linear fit without a physical model or lots of supporting data is risky or wishful thinking. But if we are convinced that the data is highly accurate and don’t have a model, cubic splines are commonly used to create the smooth interpolating curve. In Excel, if we have two nearest neighbor points on each side of the region of interest, a cubic polynomial tends to be a very close approximation to the cubic spline fit there.
Tshepo Phutha says
Hi Jon
I have you example above to solve for X using data below and it does not work. I followed your example and the subtituted with my data and it gives me an error message on A2. Any idea why?
X Y
0.12 3
0.15 2.67
0.18 2.45
0.22 2.24
0.28 2.02
0.3 1.94
0.35 1.82
0.38 1.74
0.46 1.6
0.54 1.48
0.6 1.42
0.68 1.34
0.78 1.26
0.9 1.19
1.05 1.12
1.2 1.06
1.4 1
Jon Peltier says
Tshepo –
Did you change the formulas to include your whole data range (mine stopped at row 18, yours continues to row 22)?
Cell A2 is where you enter an X value for which you want to calculate a Y value. It should not contain an error. What X value were you entering?
Was the error in cell A3? Was it because the requested X value was outside of the range of your X data?
A more effective approach for your data might be a power law fit: such a fit with your data has an R² value of 0.9991.
Tshepo Phutha says
Hi Jon
I did expand data to row 22 and I am trying to solve X not Y. I get an error on B3 and A2 but when I use data from your example it work fine.
Jon Peltier says
Tshepo –
So:
B2 has the known Y
B3 =MATCH(B2,B6:B22)
A2 =INDEX(A6:A22,B3) + (B2-INDEX(B6:B22,B3)) * (INDEX(A6:A22,B3+1)-INDEX(A6:A22,B3)) / (INDEX(B6:B22,B3+1)-INDEX(B6:B22,B3))
But my formula assumed the range being matched was monotonically increasing, which I never stated. If I include the default value of the optional third parameter, the formula is
=MATCH(B2,B6:B22,1)
which tells Excel the data is increasing and to return the index the closest value greater than the sought value. Your formula needs to be
=MATCH(B2,B6:B22,-1)
which tells Excel the data is decreasing and to return the index the closest value less than the sought value.
I still suggest using a power law fit.
Tshepo Phutha says
Jon
Thank you very much for help. I am working on learning on how power fit works.
Thank you very much
CharlieL says
Hi Jon,
I’m also finding a very strange error using the spline interpolation formula. I am interpolating the following set of data:
x y
1.798179868 165.3955251
1.804183302 165.0206621
2.230427067 139.8702098
2.428540367 113.9704961
2.5966365 65.08009679
2.752725766 21.02938029
2.842777266 25.76285129
2.932828766 20.8522448
3.022880265 47.45262432
3.148952365 30.61012109
3.275024465 26.35386096
3.443120598 15.5813369
3.869364363 -31.93964513
4.031457063 -44.2596589
4.133515429 -59.06686391
4.235573796 -77.81866744
4.313618429 -109.5812402
4.391663062 -124.4431029
4.469707695 -185.7945248
4.547752328 -209.4166568
4.601783228 -224.5765144
4.667820995 -267.6385983
4.733858761 -341.6070167
4.787889661 -402.0504547
4.841920561 -424.5273038
4.895951461 -484.5497172
4.949982361 -513.475427
5.376226126 -645.3444851
5.394236426 -648.4536159
5.40023986 -650.1201883
When I interpolate, I find that the following x values generate a #NUM! error:
4.901954894
4.901954894
4.901954894
All the other values I am trying (less than and/or greater than these, and one in between the first two) seem to interpolate fine.
The x values at which I am trying to interpolate are part of series of data. I am doing a calculation for some y values, and then interpolating the remaining x values. When I make small changes to the x value range, or use a different number of calculated points, there is often no problem at all. Like the example above, the y values follow a generally decreasing trend, but this is not always the case and y may go up and down. The x values are always monotonically increasing.
Let me know if you want any additional info. Thanks for any help you can provide in diagnosing this. It is a very vexing problem!
-Charlie
CharlieL says
I forgot to mention above that, when I switched to the “Legendre” interpolation formula, THE EXACT SAME POINTS produced the same #NUM! error.
Maybe this is a hint about the source of the problem…
-Charlie
CharlieL says
Ooops, a correction… the “Legendre” formula seems to work just fine. I discovered an error that I had made when entering it. The Spline still does not work, however.
I have created a spreadsheet with the data mentioned above, and a correction to the “problem” data points to help diagnose the problem. It contains the SPLINE and LEGENDRE interpolation formulas, and shows that the SPLINE fails when one of the problem x values is entered.
You can download it at:
Thanks for any help!!!!
-Charlie
DaleW says
Hi CharlieL,
Alas, this particular formula for this particular cubic spline indeed fails for part of your dataset.
That segment of your Catmull-Rom spline happens not to be monotonic for predicting Y from X. This causes the IRR() formula to fail rather interestingly in at least 5 subintervals of that segment’s range in X, for X values that are monovalent instead of trivalent!?
While I was able to calculate and chart the real Catmull-Rom spline directly from its defnition as a t-function-weighted sum of the two nearest points in each direction in X, you possibly don’t want a spline that behaves this way to fit your dataset. Perhaps a natural cubic spline or a monotonically constrained cubic hermite spline would be a better choice? The Newton Excel Bach blog has done a wonderful job setting up VBA code to let Excel users explore a variety of different cubic splines.
DaleW says
PS — Far worse than not being monotonic, it’s not monovalent. This type of spline can reverse direction in a way that gives multiple Y values for the same X value, and the IRR formula doesn’t work in some subsections of segments with that behavior. Wrong spline if you need a unique Y=f(X) fit.
CharlieL says
Hi DaleW,
Thanks for posting an explanation of the problem. I do need a worksheet-formula-only way to do the interpolation (e.g. no VBA or macros), and I would like a smooth curve a la spline or similar. For now I have switched to the Legendre formula, and that seems to work OK for the type of data I am generating. I am still learning about how to implement matrix multiplication via the MMULT function, and I am not up to coming up with something new myself at this point, although I would like to learn how to do it. Also, it’s not clear to me whether the problem that I described in my post was due to a limitation of the Catmull-Rom spline itself, or the implementation via the formula. Can you please elaborate on this?
I need the interpolated line to go through the points in the data set. You mentioned that this is the “wrong” spline – how about a B-Spline? For instance, the Uniform Cubic B-Spline described here:
http://en.wikipedia.org/wiki/B-spline#Uniform_cubic_B-splines
has an explicit matrix form that seems to lend itself well to this approach (from what I can tell). Can you (or someone else reading this post) help me implement it using IIR and MMULT?
Thanks!
-Charlie
DaleW says
Charlie,
B-Splines don’t go through their control points, which is inconvenient when you want to interpolate actual data.
Catmull-Rom splines do interpolate their control points, but they aren’t constrained by the concept that there should be one and only one Y value for each X value. I believe some IRR() problems have more than one rate of return that solves the constraints, but a “well-defined” mathematical function can only return one value for a given value of X. Thus IRR() fails to capture parts of some Catmull-Rom spline segments, which a weighted sum of the control points in terms of a t parameter going from 0 to 1 in each segment does reveal.
Assuming you’d like a “well-defined” and smooth continuous interpolating function for Y on X, I would recommend the classic solution of natural cubic splines. Usually this is implemented in programming languages, and I’ve got more than one VBA solution squirreled away. I don’t imagine that solving the sparse symmetric matrix equations for under 50 points would be that hard to program using the cells of a spreadsheet, if one were so inclined.
The Legendre formula is the cubic polynomial solution, or the simplest natural cubic spline case. The Legendre formula will not give you continuous derivatives if you use it for more than 4 total points. In one mathematical sense, natural cubic splines offer the smoothest possible interpolation. Natural cubic splines aren’t likely to be used for graphics because it isn’t a local solution like Legendre or Catmull-Rom, and every point instead of just 4 local points can have some impact every segment of the spline, making it more computationally intensive.
CharlieL says
Hi DaleW,
Now that I checked into it further, you are correct about B-splines not passing though each data point. A cubic Spline would be the best but my data set can be up to 600 points, and doing the large matrix for that would be too much overhead. The Lagrange interpolation seems to be “good enough” for me, despite the occasional cusp in the interpolation where there is a derivative discontinuity.
-Charlie
DaleW says
Charlie,
Good enough!
Incidentally, your dataset can be used to show that Excel must modify the Catmull-Rom spline algorithm to draw its smoothed lines, because Excel smoothed lines usually overlap that spline yet avoid the pathology that you found. As Answers.Microsoft notes, “An adjustment to the basic method is applied if consecutive points are a factor of three or more apart so as to reduce the amount by which the curve overshoots points.” I interpret this as a sort of false position method, where any outer control point that further that 3x the distance between the inner control points is pulled closer. Thus on your troublesome (X,Y) interval between (4.895951461, -484.5497172) and (4.949982361, -513.475427), the next point is treated as if it were closer at (5.2305,-600.2523) for the purposes of the Catmull-Rom algorithm on that segment only, which removes the strange behavior and restores single-valued functionality and lets the IRR() formula work, as well as duplicates the Excel smoothed line exactly on my plot. This might be more trouble than it is worth for you to duplicate Excel’s native interpolating spline — but it would be a lot easier than solving the tridiagonal matrix equation for natural cubic splines. You’d just need to create a table of the 4 XY control points for each of N-3 inner segments where you say your full table is N=600, and the Catmull-Rom OFFSET logic would have to point to that table. This would provide C1 continuity (of 1st derivatives), but not the C2 continuity (of 2nd derivatives also) that is provided by natural cubic splines. Your Legendre solution only has C0 continuity (of the function itself) at the data points (but C2 continuity elsewhere like all cubic splines within each segment, not necessarily between segments).
BTW, are you sure you want spline interpolation instead of LOESS smoothing?
Jon Peltier says
Dale –
Brian Murphy of UT Austin worked out bezier algorithms that seem to have been used for smoothing series lines in Excel charts. I think the details can be found at xlrotor.com.
DaleW says
Jon – yes, I saw that. I believe that the Bezier algorithms may be very similar, but MS Excel actually uses modified Catmull-Rom splines, per the July 4, 2011 post by Answers.Microsoft in response to Tushar Mehta. The VBA code is there, but I think that I did duplicate quickly the Excel charting smooth curve for this troublesome case by using a false position for a far outer control point as I tried to describe using both Catmull-Rom formulas (the quick IRR() one discussed on your blog, and the formal definition in terms of weighted sums of the 4 basis points using t-parameter polynomials).
DaleW says
Charlie (and Jon),
Those Bezier algorithms match Excel’s smoothing to within a few pixels for the troublesome segment. They are in VBA and parametric, so it might take quite a bit of work to implement them to evaluate a particular X value without VBA.
Without using VBA, I picked up on your idea and implemented Ihem’s IRR() and MMULT() formula for Catmull Rom with more or less the modification suggested by Lori Miller for highly unevenly spaced data, and it looks good to within a pixel or two compared to Excel’s native curve fitting, even for your segment where simple Catmull Rom misbehaved. The key formula to find the adjusted spline Y for any X had to be switched to an array formula to work for me:
{=SUM((1+1/IRR(MMULT({0,0,2,0;0,1,0,-1;-1,4,-5,2;1,-3,3,-1},(TRANSPOSE(OFFSET(xQuad,$Q5-1,0,1,4))-$O5))))^-{0;1;2;3}*MMULT({0,2,0,0;-1,0,1,0;2,-5,4,-1;-1,3,-3,1},TRANSPOSE(OFFSET(yQuad,$Q5-1,0,1,4))))/2}
The transpose was necessary with matrix logic to accommodate a table (starting at xQuad and then yQuad) of possibly linear-interpolation adjusted outer influence points created by IF logic, with 4 XY points defined in 8 columns shortly after each inner interval as defined by the original XY data columns in A & B. None of this extra effort is needed if the X data is roughly evenly spaced, but your data wasn’t. The Q column holds the Index from Match(), again with slight modification for the endpoints. The O column held the arbitrary X points to evaluate.
Lori Miller says
CharleyL – Interesting example and glad the suggested formulas have been useful to you. Strangely, in my Excel 2010 version the spline formula does actually return values for the cases highlighted but further investigation reveals this is due to improvements in the IRR implementation in this version. I find the formula matches the curve for the most part but there are a few spurious values where it is picking up another root. An alternative is to use the more natural version of the formula with 1/IRR(MMULT({0,0,2,0;0,1,0,-1;-1,4,-5,2;1,-3,3,-1} replaced by IRR(MMULT({0,2,0,0;-1,0,1,0;2,-5,4,-1;-1,3,-3,1} which does return the correct values for your example. I seem to remember the point of using 1/IRR of the reversed matrix was related to avoiding 0^0 errors when the formula was evaluated exactly at a data point but these can be easily dealt with by adding a negligible constant like 1E-99. Note also the formula is for a 2D spline but could also be modified to a 1D version which would ensure single-valued results. In many cases where data is fairly evenly spaced there won’t be a large difference between these two methods.
DaleW – Nice analysis. The modification that the charting algorithm makes can be seen in various ways. Suppose there are four consecutive points A,B,C,D and you want to construct the curve segment between B and C. In terms of Hermite splines the tension parameter implemented by the chart algorithm is chosen to be t=max(3,|AC|/|BC| ,|BD|/|BC|)/6 (based on this definition of tension and where |.| denotes distance between points as measured on the chart ). In terms of Bezier curves this condition enforces that the control points are never more than a distance |BC|/2 from B or C so do not cross each other. For most normal cases where points are fairly evenly spaced this reduces to the Catmull-Rom case (t=1/2) . Doug Jenkins has many more useful posts and VBA functions on these topics which are well worth looking at as you say – there are a few formulas and comments related to the connection with natural splines are here.
Jon – The xlrotor code is normally fairly close but can deviate quite substantially in some cases. For example entering these (x,y) values: (9,8) (10,12) (11,12) (14,6) in the Bezier example file link gives clearly incorrect values for 10 < x < 11. The vba code posted at answers.microsoft may be the first accurate algorithm available publicly.
Rajib says
Hi Jon Peltier et al.-
I’m not a math expert. But after trying to fit your equations for the last 2 hrs, finally I gave up! And really don’t understand why the interpolation is not working for my database (excel file link below)
http://www.mediafire.com/view/?hdpdpu641qdpxak
I would be very glad if I find any solution. Thanks- Rajib
Jon Peltier says
There was no data in the posted worksheet, so I couldn’t interpolate values either.
Rajib says
Sorry Jon Peltier, There was something wrong. Please find the data in excel as well as in a tab delimited .txt file below:
http://www.mediafire.com/view/?1bo9he3c3jz05u7
http://www.mediafire.com/view/?tmyuta23ajjj27i
Best/
Rajib
Margaret says
The simple linear interpolation methodology worked for me. I used the equations to linearly interpolate y values for a column of x values.
Many Thanks
Hello Jon says
I’ve been trying to get this to work with a very small set of yearly data so:
X = year in the format 2001
Y = a price
I have this data for 12 years
i want to estimate a monthly value in any of those twelve years.
However, i’m struggling to do this with your spreadsheet.
For example i tried 2001.083 which could predicate a month
however i get a ref error in A2 and i dont think this is the best way of doing it.
Ultimately i want to try and estimate each monthly figure based on a set of yearly final figure. So the value of months 1 through 12 should equal the current yearly figure.
Hope this makes sens.e
Jon Peltier says
Cell A2 should have no error, because it’s the cell you enter the X value for which you want to determine a Y value. So with 2001.083 in A2 you should get a value that is 1/12 of the way from the 2001 value to the 2002 value in cell B2.
Also, this sentence doesn’t make sense if you’re trying to estimate monthly values:
“So the value of months 1 through 12 should equal the current yearly figure.”
Ravi Verma says
Pl indicate Excel program (MAC version) to use the Piece-wise Hermite Interpolation.
Jon Peltier says
Ravi –
Try Google.
HJ says
Clear, concise, and exactly what I needed. Thanks a lot – HJ
satria says
hi
is this formula in excel uses for quadratic spline or another interpolation?
Jon Peltier says
Satria –
The interpolation formulas I introduced in the post are simple linear interpolations.
In the discussion, several commenters cited other kinds of fit, including splines.
Anonymous says
Please any one can do it for me or tell me how to do this
regards
maak_88@yahoo.com
y x X Values to be interpolated
1 99.72183534 99.56709957
2 99.4428926 99.13793103
3 99.1631652 98.71244635
4 98.88264649 98.29059829
5 98.60132972 97.87234043
6 98.31920803 97.45762712
7 98.03627447 97.0464135
8 97.75252199 96.63865546
9 97.46794345 96.23430962
10 97.18253158 95.83333333
11 96.89627902 95.43568465
12 96.60917831 95.04132231
13 96.32122185 94.65020576
14 96.03240194 94.26229508
15 95.74271078 93.87755102
16 95.45214042 93.49593496
17 95.16068282 93.11740891
18 94.86832981 92.74193548
19 94.57507306 92.36947791
20 94.28090416 92
21 93.98581453 91.63346614
22 93.68979548 91.26984127
23 93.39283817 90.90909091
24 93.09493363 90.5511811
25 92.79607271 90.19607843
26 92.49624617 89.84375
27 92.19544457 89.49416342
28 91.89365835 89.14728682
29 91.59087776 88.8030888
30 91.28709292 88.46153846
31 90.98229376 88.12260536
32 90.67647006 87.78625954
33 90.36961141 87.45247148
34 90.06170724 87.12121212
35 89.75274679 86.79245283
36 89.4427191 86.46616541
37 89.13161305 86.1423221
38 88.8194173 85.82089552
39 88.50612032 85.50185874
40 88.19171037 85.18518519
41 87.87617551 84.87084871
42 87.55950358 84.55882353
43 87.24168219 84.24908425
44 86.92269874 83.94160584
45 86.60254038 83.63636364
46 86.28119404 83.33333333
47 85.95864639 83.03249097
48 85.63488386 82.73381295
49 85.30989261 82.43727599
50 84.98365856 82.14285714
51 84.65616733 81.85053381
52 84.32740427 81.56028369
53 83.99735446 81.27208481
54 83.66600265 80.98591549
55 83.33333333 80.70175439
56 82.99933065 80.41958042
57 82.66397845 80.13937282
58 82.32726023 79.86111111
59 81.98915917 79.58477509
60 81.64965809 79.31034483
61 81.30873945 79.03780069
62 80.96638534 78.76712329
63 80.62257748 78.49829352
64 80.27729719 78.23129252
65 79.93052539 77.96610169
66 79.58224258 77.7027027
67 79.23242883 77.44107744
68 78.88106377 77.18120805
69 78.5281266 76.92307692
70 78.173596 76.66666667
71 77.8174502 76.41196013
72 77.45966692 76.1589404
73 77.10022337 75.90759076
74 76.73909622 75.65789474
75 76.37626158 75.40983607
76 76.01169501 75.16339869
77 75.64537145 74.91856678
78 75.27726527 74.67532468
79 74.90735018 74.43365696
80 74.53559925 74.19354839
81 74.16198487 73.95498392
82 73.78647874 73.71794872
83 73.40905182 73.48242812
84 73.02967433 73.24840764
85 72.64831573 73.01587302
86 72.26494463 72.78481013
87 71.87952884 72.55520505
88 71.4920353 72.32704403
89 71.10243003 72.10031348
90 70.71067812 71.875
91 70.3167437 71.65109034
92 69.92058988 71.42857143
93 69.52217872 71.20743034
94 69.12147118 70.98765432
95 68.71842709 70.76923077
96 68.31300511 70.55214724
97 67.90516262 70.33639144
98 67.49485577 70.12195122
99 67.08203932 69.90881459
100 66.66666667 69.6969697
101 66.24868971 69.48640483
102 65.82805886 69.27710843
103 65.4047229 69.06906907
104 64.97862897 68.86227545
105 64.54972244 68.65671642
106 64.11794687 68.45238095
107 63.68324392 68.24925816
108 63.2455532 68.04733728
109 62.80481227 67.84660767
110 62.36095645 67.64705882
111 61.91391874 67.44868035
112 61.46362972 67.25146199
113 61.01001739 67.05539359
114 60.55300708 66.86046512
115 60.09252126 66.66666667
116 59.6284794 66.47398844
117 59.16079783 66.28242075
118 58.68938954 66.09195402
119 58.21416399 65.9025788
120 57.73502692 65.71428571
121 57.25188012 65.52706553
122 56.76462122 65.34090909
123 56.27314339 65.15580737
124 55.7773351 64.97175141
125 55.27707984 64.78873239
126 54.77225575 64.60674157
127 54.26273532 64.42577031
128 53.74838499 64.24581006
129 53.22906474 64.06685237
130 52.70462767 63.88888889
131 52.17491947 63.71191136
132 51.63977795 63.5359116
133 51.09903239 63.36088154
134 50.55250296 63.18681319
135 50 63.01369863
136 49.44132325 62.84153005
137 48.876261 62.67029973
138 48.30458915 62.5
139 47.72607021 62.33062331
140 47.14045208 62.16216216
141 46.54746681 61.99460916
142 45.94682917 61.82795699
143 45.33823503 61.66219839
144 44.72135955 61.4973262
145 44.09585518 61.33333333
146 43.46134937 61.17021277
147 42.81744193 61.00795756
148 42.16370214 60.84656085
149 41.49966533 60.68601583
150 40.82482905 60.52631579
151 40.1386486 60.36745407
152 39.44053189 60.20942408
153 38.72983346 60.05221932
154 38.0058475 59.89583333
155 37.26779962 59.74025974
156 36.51483717 59.58549223
157 35.74601765 59.43152455
158 34.96029494 59.27835052
159 34.15650255 59.12596401
160 33.33333333 58.97435897
161 32.48931448 58.82352941
162 31.6227766 58.67346939
163 30.73181486 58.52417303
164 29.8142397 58.37563452
165 28.86751346 58.2278481
166 27.88866755 58.08080808
167 26.87419249 57.93450882
168 25.81988897 57.78894472
169 24.72066162 57.64411028
170 23.57022604 57.5
171 22.36067977 57.35660848
172 21.08185107 57.21393035
173 19.72026594 57.0719603
174 18.25741858 56.93069307
175 16.66666667 56.79012346
176 14.90711985 56.65024631
177 12.90994449 56.51105651
178 10.54092553 56.37254902
179 7.453559925 56.23471883
180 0 56.09756098
55.96107056
55.82524272
55.69007264
55.55555556
55.42168675
55.28846154
55.1558753
55.02392344
54.89260143
54.76190476
54.63182898
54.50236967
54.37352246
54.24528302
54.11764706
53.99061033
53.86416862
53.73831776
53.61305361
53.48837209
53.36426914
53.24074074
53.11778291
52.99539171
52.87356322
52.75229358
52.63157895
52.51141553
52.39179954
52.27272727
52.15419501
52.0361991
51.91873589
51.8018018
51.68539326
51.56950673
51.4541387
51.33928571
51.22494432
51.11111111
50.99778271
50.88495575
50.77262693
50.66079295
50.54945055
50.43859649
50.32822757
50.21834061
50.10893246
50
49.89154013
49.78354978
49.67602592
49.56896552
49.46236559
49.35622318
49.25053533
49.14529915
49.04051173
48.93617021
48.83227176
48.72881356
48.62579281
48.52320675
48.42105263
48.31932773
48.21802935
48.11715481
48.01670146
47.91666667
47.81704782
47.71784232
47.61904762
47.52066116
47.42268041
47.32510288
47.22792608
47.13114754
47.03476483
46.93877551
46.84317719
46.74796748
46.65314402
46.55870445
46.46464646
46.37096774
46.277666
46.18473896
46.09218437
46
Jon Peltier says
Maak –
I presume you want a new Y value for each X value in your second list.
I put the pairs of X and Y into X-Y order, not Y-X order (columns A:B, see screenshot below).
I put the new X values into another column E.
I used column D for an index, which points to the value in the column of input X values closest to but greater than the new X value. The formula in D2 is as follows, and copied down as far as there are values in column E:
=MATCH(E2,$A$2:$A$181,-1)
The third argument, -1, is needed because the input X values are in decreasing order.
I used column F for the interpolated Y values. The formula in F2 is as follows, copied down as far as needed:
=(E2-INDEX($A$2:$A$181,D2))/(INDEX($A$2:$A$181,D2+1)-INDEX($A$2:$A$181,D2))*(INDEX($B$2:$B$181,D2+1)-INDEX($B$2:$B$181,D2))+INDEX($B$2:$B$181,D2)
It’s basically the same formula as in cell B2 of the example.
wga says
Hi,
I see that the graph never fall back (it ascends). But what if when the curve falls back (quadratic – so that each y value will have two x values). How can we interpolate for x for a given y value? Given: y=0.61357555
The correct answer is x=867.5618709 with this formula: =A20+((L3-B20)/(B21-B20))*(A21-A20). But I don’t need to manually search for the lower and upper bounds of the given value in y and corresponding x. I need the first (smaller x value).
Thanks,
Here is the data.
y x
37.18656
0.255633 103.88304
0.250945 188.28446
0.250528 262.19693
0.259958 375.86513
0.327046 492.14553
0.429768 630.57233
0.584134 782.98033
0.645869 960.33653
0.640969 1137.91573
0.610922 1296.88933
0.588545 1448.95363
0.598201 1595.24113
0.586454 1767.37863
0.579821 1920.10453
0.513784 2063.63463
0.509153 2206.45073
0.515956 2366.94803
0.470963 2507.86673
0.440226 2668.59793
0.415335 2784.40273
0.357285 2873.33485
0.338649 2966.22146
0.331126 3032.07235
0.331747 3082.98812
0.299145 3146.48242
0.291289 3222.21814
0.287335 3268.67082
wga says
37.18656 is under X while the corresponding Y value is missing.
Claudio says
Hi can you help me in this problem, I need to determine the values between minimun x =1 and the maximum x = -1,1 y = 46,99 and maximum y =139…my result it is a linear chart and should be a curve…I tried to determine the values like a progression but, it is clear it ´snt the way…
x_Coord1 = 1 and x_Coord2 = -1,1
y_Coord1 = 46,99 and y_Coord2= 139
x_Coord1= 2 and x_Coord2= 9,8
y_Coord1 = 44,23 and y_Coord2= 218
x_Coord1 3 and x_Coord2 10,4
x_Coord1 4 and x_Coord2 9,2
x_Coord1 5 and x_Coord2 7,5
x_Coord1 6 and x_Coord2 8,8
x_Coord1 7 and x_Coord2 8
y_Coord1 33,96 and y_Coord2 174
y_Coord1 30,36 and y_Coord2 192
y_Coord1 27,69 and y_Coord2 184
y_Coord1 23,84 and y_Coord2 177
y_Coord1 18,86 and y_Coord2 165
Claudio says
I found a solution in the comments, thanks
AK says
I understand the given problem. I’m trying to run this operation in an x-y grid and cannot seem to figure it out… Any tips?
AK says
Elaboration…
This is a stream function on a grid showing a numerical value at each (x,y) coordinate. The blue line represents a streamline that starts at 0.1 and I’m tracing it above the numbers in that row as the value goes down 0.1’s relationship is higher. Now, this is just me eyeballing it. I want to interpolate at 0.1 to find the exact values of this line…
Jon Peltier says
I’m looking for some kind of contours of equal value, which is what interpolation could help with, but that’s not apparently what you’re trying to draw.
AK says
Actually, the contour of equal value would be the same thing as the streamline.
Jon Peltier says
But that’s not a line of equal value, at least of the values in the grid. In the first column, the line is at a value between 0.02 and 0.04. In the second it’s between 0.01 and 0.02. In the third it’s between -0.01 and -0.01. But there can be no value between these three pairs of limits, since they do not overlap. This is why I don’t understand what you are trying to plot.
Anant Murmu says
Hi Jon
Does this formula work, if one the axis in logarithemnic scale to base 10? I have Particle size on X axis with logaritamic scale and %Fine on Y axis. I want to interpolate value on x-axis for a know point on Y axis (say 60).
Particle Size %Finer
4.75 65.52
2 50.98
1.18 36.2
0.6 30.34
0.425 19.78
0.3 9.3
0.15 4.02
0.075 0.2
Would be greatful, if you could help.
Than you
Jon Peltier says
Anant –
Good point. Interpolation gives us the coordinates of a point on a straight line between two known points. But we get a different straight line depending on our coordinate system. The straight line we get using linear X and Y (blue in the charts below) is not the same as the straight line we get when our X axis is logarithmic (orange). These are the last two points of your input data, sorted in increasing order or particle size.
We need to use a modification to our formula if we want to capture the logarithmic straight line. Below are three sets of data: Your input data sorted by increasing particle size (top), three particle sizes and their % finer values calculated using linear formulas (blue), and the same particle sizes and their % finer values calculated using logarithmic formulas (orange).
Formulas in the blue (linear) table are as follows, filled down to compute all three values:
B14:
=MATCH(C14,$C$3:$C$10)
D14:
=INDEX(D$3:D$10,$B14)+(C14-INDEX($C$3:$C$10,B14))*
(INDEX(D$3:D$10,B14+1)-INDEX(D$3:D$10,B14))/
(INDEX(C$3:C$10,B14+1)-INDEX(C$3:C$10,B14))
Formulas in the orange (logarithmic) table are:
B20:
=MATCH(C20,$C$3:$C$10)
D20:
=INDEX(D$3:D$10,$B20)+(LOG($C20)-LOG(INDEX($C$3:$C$10,$B20)))*
(INDEX(D$3:D$10,$B20+1)-INDEX(D$3:D$10,$B20))/
(LOG(INDEX($C$3:$C$10,$B20+1))-LOG(INDEX($C$3:$C$10,$B20)))
All of the particle size measurements are converted to their logarithms in the interpolation formula.
The charts below show how the blue (linear) points fall on the straight lines of the linear chart (left) while the orange (log) points fall on the straight lines of the logarithmic plot (right).
CG says
Has anyone gotten this formula to work in the latest version of excel? I have copied and pasted the exact formula and the exact data into the same cells and I keep getting an error message that long story short, tells me that the formula is messed up in some way.
Jon Peltier says
CG –
There’s no reason any valid formula in one version of Excel will not work in a later version.
Which formula is it? How is it entered? What error message are you getting? In which earlier version of Excel was it working.
RAVI PATEL says
THANK YOU
GC says
Thanks to everyone who has contributed to this thread. The information here has been extremely valuable in understanding how Excel presents the smoothed line chart.
My understanding is that the tension on the Catmull-Rom spline is a function of the chart’s features: width, height and scale. The tension may alter due to a simple resize of the chart. Is this correct? If so, it seems highly undesirable.
bakhtiar says
How can i interpolate value (a,b,c,d,e) of quartic equation in excel?
Jon Peltier says
Bakhtiar –
If you have the coefficients (a, b, c, d, e), you don’t need to interpolate, you can simply plug in the X value and solve for Y.
If you need to determine the coefficients, then you need to use the LINEST worksheet function. See Microsoft’s documentation for LINEST.
Kara Combs says
Is there a way to do this in 3D for more than 2 points?
Jon Peltier says
Kara –
I haven’t done this, but it is certainly possible. It requires a particular data layout and more complicated formulas. A good resource seems to be Interpolation methods. If you need further assistance, you’ll have to try your own Google search (where I got the link I’ve cited).
Gabriel says
Hi all,
My name is Gabriel, I am doing a phd in biomechanic applied to tennis. I have created an excel spreedsheet to analyze tennis strokes in 3d (using an algorithm called direct linear transformation). Now I hace the x, y, z coordenates of one relevant point of the tennis racquet. Knowing the time between frames I want to compute the speed. I want to compute the speed by differentiation of the position. The easiest way is to use the position of two adjacent point but I want to do something more advanced and conpute speed differentiating cubic splines. In biomechanic quintil splines are usually used but I have never seen they implemented in excel. I think cubic splines are also a good solution. I want to do this by using not array formulas but I do not know if it is possible. I think this forum is perfect for asking it as I have seen (but not understand) fantastic formulaes using not array functions. If you think there is an easier way to do this (althought is not the perfect solution) please tell me. One colleague also told me that when the time between frames was not exact (in my case the camera records at 120 hz so the time has periodic decimals) all this proccesed were more complicated (but I think he refered to the application of the IRR filters). As you can deduce by my message I need a dummy solution.
Thanks a lot,
Gabriel
Jon Peltier says
Gabriel –
Your comment sounds like a (draft) statement of work in search of a proposal. If this is your intention, you can submit a request for proposal, with a more detailed statement of work. Otherwise, it is too large as an extension of the techniques in this article.
Gabriel says
Hi Jonh,
Thanks for your reply. I should have been more direct I think. My question is this:
Is this possible to perform quintic splines in excel? I have never seen. I always see cubic splines.
Best regards,
Gabriel
Jon Peltier says
I don’t know the difference between cubic and quartic splines, though I guess it’s a matter of the order of the formula. If that’s all the difference is, then I don’t see why you can’t use quartic splines in the same way that cubic splines are used.
Gabriel says
That is fantastic to here that… Then I could use this formula?:
=TREND(OFFSET($DH$4;MATCH(DP6;$A$4:$A$53)-2;;4);OFFSET($A$4;MATCH(DP6;$A$4:$A$53)-2;;4)^{1,2,3,4,5};DP6^{1,2,3,4,5})
This is Lagrange interpolation… Lagrange interpolation is a cubic interpolation?
Best regards,
Gabriel
Gabriel says
That is fantastic to hear that… Then I could use this formula?:
=TREND(OFFSET($DH$4;MATCH(DP6;$A$4:$A$53)-2;;4);OFFSET($A$4;MATCH(DP6;$A$4:$A$53)-2;;4)^{1,2,3,4,5};DP6^{1,2,3,4,5})
This is Lagrange interpolation… Lagrange interpolation is a spline interpolation?
Best regards,
Gabriel
Jon Peltier says
You’re asking the wrong person. Sorry.
Gabriel says
Oks. Thanks!
Anna says
Hi. I have data that is logarithmic on the x axis. I would like to use Catmull-Rom spline to interpolate points between known points. I would like to be able to input y and output the corresponding x value.
Can you please explain how the Catmull-Rom spline formula can be adapted to work with semilogx data, inputting y, outputting x. The current formula I am using is
=SUM((1+IRR(MMULT({0,2,0,0;-1,0,1,0;2,-5,4,-1;-1,3,-3,1},OFFSET(B6,MATCH(B2,B6:B18)-2,,4)-B2)))^-{0;1;2;3}*MMULT({0,2,0,0;-1,0,1,0;2,-5,4,-1;-1,3,-3,1},OFFSET(A6,MATCH(B2,B6:B18)-2,,4)))/2
Thanks so much.