Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
    # /// script
    # requires-python = ">=3.13"
    # dependencies = [
    #   "pandas==2.3.3",
    # ]
    # ///
    import numpy as np
    import pandas as pd

    # Create a Series with non-contiguous integer index (step of 5)
    # Index:  0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, ...
    # Values: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
    T = np.arange(0, 100, 5)
    series = pd.Series(np.arange(len(T)), index=T)

    # Without step: returns all labels from 10 to 50 inclusive (label-based)
    result_no_step = series.loc[10:50]
    print("series.loc[10:50] (no step):")
    print(series.loc[10:50])
    print()

    # With step=1: same as no step
    print("series.loc[10:50:1] (step=1):")
    print(series.loc[10:50:1])
    print()

    # With step=2: step is applied positionally, start, stop applied to labels
    print("series.loc[10:50:2] (step=2):")
    print(series.loc[10:50:2])
    print()

    # With step=5: Same behavior as step 2
    print("series.loc[10:50:5] (step=5):")
    print(series.loc[10:50:5])
    print()

    # Using arange with same arguments as the slice
    print("series.loc[np.arange(10,50,5)] (step=5):")
    print(series.loc[np.arange(10, 50, 5)])
    print()

Issue Description
When using .loc with a slice, start and stop are applied in a different space than step, which I found very counterintuitive. I was also unable to find any documentation for this (admittedly niche) use case.
- start/stop are matched against the label values
- step is applied positionally over the index
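To make the two spaces concrete, here is a minimal sketch that reuses the series from the script above. It assumes, based on my reading of the behavior rather than on any documentation, that .loc resolves the label bounds to positions via `Index.slice_indexer` and then slices positionally with the original step:

```python
import numpy as np
import pandas as pd

T = np.arange(0, 100, 5)
series = pd.Series(np.arange(len(T)), index=T)

# Translate the label bounds 10 and 50 into positions.
# The step is carried over unchanged, so it counts positions, not labels.
positional = series.index.slice_indexer(10, 50, 5)

by_label_slice = series.loc[10:50:5]
by_position = series.iloc[positional]

# Both selections pick the same rows (labels 10 and 35).
print(by_label_slice.equals(by_position))
```

In other words, the label slice 10:50:5 ends up selecting every fifth position between the positions of labels 10 and 50, not every label that is a multiple of 5 away from 10.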
The above script prints:
    series.loc[10:50] (no step):
    10     2
    15     3
    20     4
    25     5
    30     6
    35     7
    40     8
    45     9
    50    10
    dtype: int64

    series.loc[10:50:1] (step=1):
    10     2
    15     3
    20     4
    25     5
    30     6
    35     7
    40     8
    45     9
    50    10
    dtype: int64

    series.loc[10:50:2] (step=2):
    10     2
    20     4
    30     6
    40     8
    50    10
    dtype: int64

    series.loc[10:50:5] (step=5):
    10    2
    35    7
    dtype: int64

    series.loc[np.arange(10,50,5)] (step=5):
    10    2
    15    3
    20    4
    25    5
    30    6
    35    7
    40    8
    45    9
    dtype: int64

Expected Behavior
I would have expected either of the following:
error
Throw an error saying that step is ambiguous and cannot be used here. This seems to be the approach of IntervalIndex:
pandas/pandas/core/indexes/interval.py
Lines 978 to 982 in 499c5d4
    def _convert_slice_indexer(self, key: slice, kind: Literal["loc", "getitem"]):
        if not (key.step is None or key.step == 1):
            # GH#31658 if label-based, we require step == 1,
            # if positional, we disallow float start/stop
            msg = "label-based slicing with step!=1 is not supported for IntervalIndex"
(Though, as a side note, I wasn't able to hit that code path.)
Step applies to Label Space
In my example, I would expect the slice with step=5 to behave the same as step=1, since it should hit each of the same labels. My mental model is that, for integer labels as in my example,
series.loc[slice(start, stop, step)]
should be equivalent to
series.loc[np.arange(start, stop+step, step)]
(with +step to account for pandas' inclusive slice bounds),
and in more ambiguous cases, e.g. slice("a", "f", 3), an error should be thrown.
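As a minimal sketch of that mental model (not of anything pandas currently does), the label-space interpretation can be written out explicitly for the example series. The variable names here are only for illustration:

```python
import numpy as np
import pandas as pd

T = np.arange(0, 100, 5)
series = pd.Series(np.arange(len(T)), index=T)

start, stop, step = 10, 50, 5

# Step through label values rather than positions.
# stop + step makes the upper bound inclusive for this aligned example.
labels = np.arange(start, stop + step, step)
label_space_result = series.loc[labels]

# With step=5 every generated label exists, so this matches the plain slice.
print(label_space_result.equals(series.loc[10:50]))
```

This only works cleanly because start, stop, and step all line up with the index. If step does not land exactly on stop, `np.arange(start, stop + step, step)` can overshoot stop, and any generated label that is missing from the index makes list-like .loc raise a KeyError, so a real implementation would need to decide how to handle both cases.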
Installed Versions
INSTALLED VERSIONS
------------------
commit : 9c8bc3e55188c8aff37207a74f1dd144980b8874
python : 3.13.0
python-bits : 64
OS : Darwin
OS-release : 24.6.0
Version : Darwin Kernel Version 24.6.0: Mon Jul 14 11:30:51 PDT 2025; root:xnu-11417.140.69~1/RELEASE_ARM64_T8112
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.3.3
numpy : 2.3.5
pytz : 2025.2
dateutil : 2.9.0.post0
pip : None
Cython : None
sphinx : None
IPython : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : None
lxml.etree : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : None
pyreadstat : None
pytest : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlsxwriter : None
zstandard : None
tzdata : 2025.2
qtpy : None
pyqt5 : None