Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
import pyarrow as pa

s = pd.Series([1, 2], dtype=pd.ArrowDtype(pa.int32()))
r1 = s.rank(method="min")
df = s.to_frame(name="a")
r2 = df.rank(method="min")

>>> s
0    1
1    2
dtype: int32[pyarrow]
>>> df.dtypes
a    int32[pyarrow]
dtype: object
>>> r1
0    1
1    2
dtype: uint64[pyarrow]
>>> r2
     a
0  1.0
1  2.0
>>> r2.dtypes
a    float64
dtype: object
Issue Description
When a DataFrame holds pyarrow-backed data, calling df.rank(method="min") returns a result that is no longer Arrow-backed: the column comes back as float64. Series.rank() does not show this behavior; its result is still an Arrow-backed Series (uint64[pyarrow] in the example above).
Incorrect:
df.dtypes
a int32[pyarrow]
dtype: object
r2 = df.rank(method="min")
r2.dtypes
a float64
dtype: object
Correct:
s
0 1
1 2
dtype: int32[pyarrow]
r1 = s.rank(method="min")
r1.dtype
uint64[pyarrow]
Expected Behavior
DataFrame.rank should return a pyarrow-backed DataFrame when the original DataFrame holds pyarrow-backed data.
Installed Versions
pd.__version__
'2.0.0'