Skip to content

Conversation

TomNicholas
Copy link
Member

@TomNicholas TomNicholas commented Sep 6, 2025

I feel like I must have missed some forbidden case here - it seemed too simple to effectively replace the use of the pandas.Index.slice_indexer method...

@TomNicholas TomNicholas changed the title Support method sel float coords Support .sel with method kwarg for slices Sep 6, 2025
@max-sixty
Copy link
Collaborator

this looks pretty good!!

do we need to be concerned about method as a dim we're selecting over?

@TomNicholas
Copy link
Member Author

do we need to be concerned about method as a dim we're selecting over?

method is already in the API, I didn't touch that part, so I assume (hope) that is already handled at a higher-level somewhere.

@keewis
Copy link
Collaborator

keewis commented Sep 7, 2025

either way, selecting from coordinates named method can still be done by passing a dict

@max-sixty
Copy link
Collaborator

method is already in the API, I didn't touch that part, so I assume (hope) that is already handled at a higher-level somewhere.

sorry, you're right!

Copy link
Member

@shoyer shoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This omission was definitely back in the day!

I no longer recall exactly why, but I think the main concern was about potentially ambiguous behavior. In particular, default slicing works similar to method='backfill' for the left bound and method='pad' for the right bound, which isn't possible to express with a single method argument.

Thinking about this now, I think this is probably safe and would be a welcome new feature, though it may be worth considering supporting the "default" method=None with an explicit tolerance.

Comment on lines 569 to 571
slice_index_bounds = index.get_indexer(
[slice_label_start, slice_label_stop], method=method, tolerance=tolerance
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to handle the "no match" case, for which get_indexer() returns -1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I handled this in d05ab1a, with the implementation now returning an empty slice if either endpoint is not found.

f"{coord_name!r} with a slice over integer positions; the index is "
"unsorted or non-unique"

# +1 needed to emulate behaviour of xarray sel with slice without method kwarg, which is inclusive of point at stop label
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the index is non-unique?

Copy link
Member Author

@TomNicholas TomNicholas Sep 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The behaviour of the existing implementation is confusing. This passes:

data_non_unique = xr.Dataset( coords={"lat": ("lat", [20.1, 21.1, 21.1, 22.1, 22.1, 23.1])} ) expected = xr.Dataset(coords={"lat": ("lat", [21.1, 21.1, 22.1, 22.1])}) actual = data_non_unique.sel(lat=slice(21.1, 22.1)) assert_identical(expected, actual)

but it relies upon calling pandas.Index.slice_indexer on a non-unique index, despite the docstring of that method saying "Index needs to be ordered and unique."!

Also, triggering my implementation (using actual = data_non_unique.sel(lat=slice(21.0, 22.2), method="nearest")) currently fails at .get_indexer with pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects. This restriction seems reasonable, but is not documented in the docstring of .get_indexer.

So to get the current (intuitive) behaviour we are relying on undefined behaviour in pandas, and we can't support method/tolerance for slices on non-unique indexers using .get_indexer because of undocumented restrictions in pandas 🙃

Copy link
Member Author

@TomNicholas TomNicholas Sep 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least I can clearly catch this case, which I've done in 199765e. That means this PR now reduces the NotImplementedError surface from "any Index passed a slice with method" to "any non-unique Index passed a slice with method".

@TomNicholas
Copy link
Member Author

it may be worth considering supporting the "default" method=None with an explicit tolerance.

I added support for this in 8cce331, and a test that confirms that with a big enough value of tolerance this does reduce to the "default" behaviour, comparing against when tolerance is not supplied.

@TomNicholas TomNicholas requested a review from shoyer September 8, 2025 21:34
)

if method is None and tolerance is not None:
# copies default behaviour of slicing with no tolerance, which is to be exclusive at both ends
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the default is inclusive?

Suggested change
# copies default behaviour of slicing with no tolerance, which is to be exclusive at both ends
# copies default behaviour of slicing with no tolerance, which is to be inclusive at both ends
expected = xr.Dataset(coords={"lat": ("lat", [21.1, 22.1])})
actual = data_float_coords.sel(
lat=slice(21.0, 22.2),
tolerance=1.0,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tolerance=0.0 might be a cleaner test here

Comment on lines +2167 to +2173
expected = data.isel(dim2=slice(2, 7))
actual = data.sel(dim2=slice(1, 3), method="ffill")
assert_identical(expected, actual)

expected = data.isel(dim2=slice(2, 7, 2))
actual = data.sel(dim2=slice(1, 3, 2), method="ffill")
assert_identical(expected, actual)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a test for method='backfill'?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment