Using Excel's INDEX/MATCH approach on unsorted columns

After almost two years with Washington State’s Spills Prevention, Preparedness, and Response program, I’ve transitioned back to water resources engineering and started a new job with the USACE Portland District. Surprisingly, I found myself working with Excel for one of my first tasks; I’m on a large team with very diverse specialties and skillsets, and it turns out Excel is our common ground for passing data around. The task involved developing a spreadsheet for data interpolation and conversion for about 20 projects in a large river system.

Linear interpolation is a breeze in most programming languages; for example, R provides approx() and Python’s numpy module includes numpy.interp(). With Excel it’s a little more complicated; you can use the FORECAST.LINEAR() function to do linear interpolation, but you have to limit the known_x's and known_y's arguments to the data points bounding the interpolation point x. The function is technically doing linear regression, but the result is equivalent if you limit the input arrays to the bounding values; you could also just enter the linear interpolation formula directly, but you’d still need to identify the bounding x and y values. You can pull the bounding values from the arrays in a few ways, but the most flexible approach is to use the INDEX()/MATCH() combo.

For this example, assume you have your xs and ys in the range A1:B6 and the x value you want to interpolate to find y is in cell D1.

	A	B	D
1	0	0	5
2	2	10
3	4	20
4	6	30
5	8	40
6	10	50

' search for nearest x value
=INDEX(A1:A6, MATCH(D1, A1:A6, 1), 0) ' returns 4 
' search for nearest y value
=INDEX(B1:B6, MATCH(D1, A1:A6, 1), 0) ' returns 20

Well, not quite. There is a major issue with this strategy: searching for a nearest match (as opposed to an exact match) only works for sorted data. Even worse, the parameterization above (match_type = 1) assumes your values are sorted in ascending order; if you have descending values, you need to change the match_type argument to -1.

This was a dealbreaker for me because I was building a dynamic Excel sheet where the user would select different datasets to interpolate from—some of which were sorted in ascending order, and others in descending order. Besides, I take issue with relying on a sort order on principle; it’s a somewhat hidden requirement that could easily be broken by users or future developers. I wanted to find more reliable way of performing interpolation that didn’t require any value ordering at all.

The trick is to use the MINIFS() and MAXIFS() functions, which return the minimum (or maximum) value in a range that matches a specified criteria. In this case, we use MAXIFS() to find the maximum value that is less than or equal to our interpolation point (the lower bound) and MINIFS() to find the minimum value that is greater than or equal to our interpolation point (the upper bound). We can then use these values to search for an exact MATCH() (match_type = 0) which does not require any value ordering to work properly.

' get the largest x value less than or equal to
' the interpolation value
=MAXIFS(A1:A6, A1:A6,"<="&D1) ' returns 4
' get the row number of the largest x value less than or equal to
' the interpolation value
=MATCH(MAXIFS(A1:A6, A1:A6,"<="&D1), A1:A6, 0) ' returns 3
' get the y value corresponding to the largest x value less than or
' equal to the interpolation value
=INDEX(B1:B6, MATCH(MAXIFS(A1:A6, A1:A6,"<="&D1), A1:A6, 0)) ' returns 20

And that’s it! Now you can do nearest value searches on unordered data.

Using Excel's INDEX/MATCH approach on unsorted columns

Comments