Description
Pandas version checks
-  I have checked that this issue has not already been reported. 
-  I have confirmed this issue exists on the latest version of pandas. 
-  I have confirmed this issue exists on the main branch of pandas. 
Reproducible Example
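For large arrays, pd.cut with an IntervalIndex is more than 10x slower than pd.cut with explicit bin edges (2.25 s vs. 136 ms for 1,000,000 values in the timings below). For small arrays (1,000 values), the IntervalIndex is faster (3.18 ms vs. 30.2 ms).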
This was surprising to me. If this is expected behaviour, it would be nice to mention it in the docstring for pd.cut.
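A possible workaround in the meantime is to recover the bin edges from the IntervalIndex and pass those to pd.cut instead. The sketch below is only that: `cut_via_edges` is a hypothetical helper, not part of pandas, and it assumes the input is a 1-D NumPy array and that the IntervalIndex is non-empty, contiguous (as produced by `IntervalIndex.from_breaks`) and closed on the left or the right. Judging by the timings reported in this issue it may help for large arrays, but it has not been benchmarked separately.

```python
import numpy as np
import pandas as pd


def cut_via_edges(values, index):
    """Hypothetical helper: bin a 1-D array using the edges of a contiguous IntervalIndex."""
    if index.closed not in ("left", "right"):
        raise ValueError("this sketch only handles closed='left' or closed='right'")
    left = index.left.to_numpy()
    right = index.right.to_numpy()
    # The edge-based path is only equivalent when every interval starts
    # exactly where the previous one ends.
    if not np.array_equal(left[1:], right[:-1]):
        raise ValueError("intervals must be contiguous")
    edges = np.append(left, right[-1])
    binned = pd.cut(values, edges, right=(index.closed == "right"))
    # Re-attach the original intervals so the categories are the same
    # IntervalIndex that pd.cut(values, index) would use.
    return pd.Categorical.from_codes(binned.codes, categories=index, ordered=True)


# Usage: compare against the direct IntervalIndex call on the bins from this issue.
bins = np.arange(-40, 40, 0.1)
index = pd.IntervalIndex.from_breaks(bins)
x = 20 * np.random.default_rng(0).standard_normal(10_000)
same_codes = np.array_equal(cut_via_edges(x, index).codes, pd.cut(x, index).codes)
```

Going through the integer codes keeps the original intervals as categories; calling pd.cut with the edges directly would otherwise build its own labels, rounded according to its `precision` argument. For small arrays the direct IntervalIndex call was the faster one in the timings here, so the detour is probably not worthwhile there.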
import numpy as np
import pandas as pd

bins = np.arange(-40, 40, 0.1)
index = pd.IntervalIndex.from_breaks(bins)

N = 1_000
%timeit pd.cut(0 + 20 * np.random.standard_normal(N), bins)
%timeit pd.cut(0 + 20 * np.random.standard_normal(N), index)

N = 1_000_000
%timeit pd.cut(0 + 20 * np.random.standard_normal(N), bins)
%timeit pd.cut(0 + 20 * np.random.standard_normal(N), index)

30.2 ms ± 2.53 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
3.18 ms ± 180 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
136 ms ± 7.08 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
2.25 s ± 331 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Installed Versions
commit : e8093ba
 python : 3.10.2.final.0
 python-bits : 64
 OS : Darwin
 OS-release : 21.5.0
 Version : Darwin Kernel Version 21.5.0: Tue Apr 26 21:08:22 PDT 2022; root:xnu-8020.121.3~4/RELEASE_X86_64
 machine : x86_64
 processor : i386
 byteorder : little
 LC_ALL : None
 LANG : en_US.UTF-8
 LOCALE : en_US.UTF-8
pandas : 1.4.3
 numpy : 1.21.6
 pytz : 2021.3
 dateutil : 2.8.2
 setuptools : 59.8.0
 pip : 22.0.3
 Cython : None
 pytest : 6.2.5
 hypothesis : None
 sphinx : None
 blosc : None
 feather : None
 xlsxwriter : None
 lxml.etree : None
 html5lib : None
 pymysql : None
 psycopg2 : None
 jinja2 : 3.0.3
 IPython : 8.0.1
 pandas_datareader: None
 bs4 : None
 bottleneck : None
 brotli :
 fastparquet : None
 fsspec : 2022.01.0
 gcsfs : None
 markupsafe : 2.0.1
 matplotlib : 3.5.1
 numba : 0.55.0
 numexpr : None
 odfpy : None
 openpyxl : None
 pandas_gbq : None
 pyarrow : None
 pyreadstat : None
 pyxlsb : None
 s3fs : None
 scipy : None
 snappy : None
 sqlalchemy : None
 tables : None
 tabulate : None
 xarray : 2022.6.0rc1.dev16+g6c8db5ed0
 xlrd : None
 xlwt : None
 zstandard : None
Prior Performance
No response