After almost two years with Washington State’s Spills Prevention, Preparedness, and Response program, I’ve transitioned back to water resources engineering and started a new job with the USACE Portland District. Surprisingly, I found myself working with Excel for one of my first tasks; I’m on a large team with very diverse specialties and skillsets, and it turns out Excel is our common ground for passing data around. The task involved developing a spreadsheet for data interpolation and conversion for about 20 projects in a large river system.
Linear interpolation is a breeze in most programming languages; for
example, R provides
numpy module includes
With Excel it’s a little
more complicated; you can use the
function to do linear interpolation, but you have to limit the
known_y's arguments to the data points bounding the interpolation
x. The function is technically doing linear regression, but the
result is equivalent if you limit the input arrays to the bounding
values; you could also just enter the linear interpolation formula
directly, but you’d still need to identify the bounding x and y values.
You can pull the bounding values from the arrays in a few ways, but the
most flexible approach is to use the
For this example, assume you have your
ys in the range
x value you want to interpolate to find
y is in cell
' search for nearest x value =INDEX(A1:A6, MATCH(D1, A1:A6, 1), 0) ' returns 4 ' search for nearest y value =INDEX(B1:B6, MATCH(D1, A1:A6, 1), 0) ' returns 20
Well, not quite. There is a major issue with this strategy: searching
for a nearest match (as opposed to an exact match) only works for
sorted data. Even worse, the parameterization above (
match_type = 1)
assumes your values are sorted in ascending order; if you have
descending values, you need to change the
match_type argument to
This was a dealbreaker for me because I was building a dynamic Excel sheet where the user would select different datasets to interpolate from—some of which were sorted in ascending order, and others in descending order. Besides, I take issue with relying on a sort order on principle; it’s a somewhat hidden requirement that could easily be broken by users or future developers. I wanted to find more reliable way of performing interpolation that didn’t require any value ordering at all.
The trick is to use the
MAXIFS() functions, which
return the minimum (or maximum) value in a range that matches a
specified criteria. In this case, we use
MAXIFS() to find the
maximum value that is less than or equal to our interpolation point
(the lower bound) and
MINIFS() to find the minimum value that is
greater than or equal to our interpolation point (the upper bound).
We can then use these values to search for an exact
match_type = 0) which does not require any value ordering to work
' get the largest x value less than or equal to ' the interpolation value =MAXIFS(A1:A6, A1:A6,"<="&D1) ' returns 4 ' get the row number of the largest x value less than or equal to ' the interpolation value =MATCH(MAXIFS(A1:A6, A1:A6,"<="&D1), A1:A6, 0) ' returns 3 ' get the y value corresponding to the largest x value less than or ' equal to the interpolation value =INDEX(B1:B6, MATCH(MAXIFS(A1:A6, A1:A6,"<="&D1), A1:A6, 0)) ' returns 20
And that’s it! Now you can do nearest value searches on unordered data.