After almost two years with Washington State’s Spills Prevention, Preparedness, and Response program, I’ve transitioned back to water resources engineering and started a new job with the USACE Portland District. Surprisingly, I found myself working with Excel for one of my first tasks; I’m on a large team with very diverse specialties and skillsets, and it turns out Excel is our common ground for passing data around. The task involved developing a spreadsheet for data interpolation and conversion for about 20 projects in a large river system.
Linear interpolation is a breeze in most programming languages; for
example, R provides
approx()
and Python’s numpy
module includes
numpy.interp()
.
With Excel it’s a little
more complicated; you can use the
FORECAST.LINEAR()
function to do linear interpolation, but you have to limit the known_x's
and known_y's
arguments to the data points bounding the interpolation
point x
. The function is technically doing linear regression, but the
result is equivalent if you limit the input arrays to the bounding
values; you could also just enter the linear interpolation formula
directly, but you’d still need to identify the bounding x and y values.
You can pull the bounding values from the arrays in a few ways, but the
most flexible approach is to use the
INDEX()
/MATCH()
combo.
For this example, assume you have your xs
and ys
in the range A1:B6
and the x
value you want to interpolate to find y
is in cell D1
.
A | B | C | D | |
---|---|---|---|---|
1 | 0 | 0 | 5 | |
2 | 2 | 10 | ||
3 | 4 | 20 | ||
4 | 6 | 30 | ||
5 | 8 | 40 | ||
6 | 10 | 50 |
' search for nearest x value
=INDEX(A1:A6, MATCH(D1, A1:A6, 1), 0) ' returns 4
' search for nearest y value
=INDEX(B1:B6, MATCH(D1, A1:A6, 1), 0) ' returns 20
Well, not quite. There is a major issue with this strategy: searching
for a nearest match (as opposed to an exact match) only works for
sorted data. Even worse, the parameterization above (match_type = 1
)
assumes your values are sorted in ascending order; if you have
descending values, you need to change the match_type
argument to
-1
.
This was a dealbreaker for me because I was building a dynamic Excel sheet where the user would select different datasets to interpolate from—some of which were sorted in ascending order, and others in descending order. Besides, I take issue with relying on a sort order on principle; it’s a somewhat hidden requirement that could easily be broken by users or future developers. I wanted to find more reliable way of performing interpolation that didn’t require any value ordering at all.
The trick is to use the MINIFS()
and MAXIFS()
functions, which
return the minimum (or maximum) value in a range that matches a
specified criteria. In this case, we use MAXIFS()
to find the
maximum value that is less than or equal to our interpolation point
(the lower bound) and MINIFS()
to find the minimum value that is
greater than or equal to our interpolation point (the upper bound).
We can then use these values to search for an exact MATCH()
(match_type = 0
) which does not require any value ordering to work
properly.
' get the largest x value less than or equal to
' the interpolation value
=MAXIFS(A1:A6, A1:A6,"<="&D1) ' returns 4
' get the row number of the largest x value less than or equal to
' the interpolation value
=MATCH(MAXIFS(A1:A6, A1:A6,"<="&D1), A1:A6, 0) ' returns 3
' get the y value corresponding to the largest x value less than or
' equal to the interpolation value
=INDEX(B1:B6, MATCH(MAXIFS(A1:A6, A1:A6,"<="&D1), A1:A6, 0)) ' returns 20
And that’s it! Now you can do nearest value searches on unordered data.
Comments
Want to leave a comment? Visit this post's issue page on GitHub (you'll need a GitHub account).