Description
Some comparisons between different classes of string (e.g. `string[pyarrow]` and `str`) raise. Resolving this is straightforward except for the question of which class should be returned. I would expect the result to always follow the left operand, e.g. `string[pyarrow] == str` should return `string[pyarrow]`, whereas `str == string[pyarrow]` should return `str`. Is this the consensus?
We currently run into issues with how Python handles subclasses with comparison dunders.
```python
import numpy as np
import pandas as pd

lhs = pd.array(["x", pd.NA, "y"], dtype="string[pyarrow]")
rhs = pd.array(["x", pd.NA, "y"], dtype=pd.StringDtype("pyarrow", np.nan))

print(lhs.__eq__(rhs))
# <ArrowExtensionArray>
# [True, <NA>, True]
# Length: 3, dtype: bool[pyarrow]

print(lhs == rhs)
# [ True False  True]
```
The two results above differ because `ArrowStringArrayNumpySemantics` is a proper subclass of `ArrowStringArray`, so when evaluating `lhs == rhs` Python calls `rhs.__eq__(lhs)` first.
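The dispatch rule can be reproduced without pandas. The sketch below uses two hypothetical classes, `Base` and `Derived`, standing in for the parent and subclass array types; each tags its result so it is visible which `__eq__` actually ran. It only illustrates Python's operator dispatch, not pandas internals.

```python
class Base:
    """Hypothetical stand-in for the parent array class."""

    def __init__(self, values):
        self.values = list(values)

    def _compare(self, other):
        # Element-wise comparison shared by both classes.
        return [a == b for a, b in zip(self.values, other.values)]

    def __eq__(self, other):
        if not isinstance(other, Base):
            return NotImplemented
        return ("Base", self._compare(other))


class Derived(Base):
    """Hypothetical stand-in for the subclass; it overrides __eq__."""

    def __eq__(self, other):
        if not isinstance(other, Base):
            return NotImplemented
        return ("Derived", self._compare(other))


lhs = Base(["x", "y"])
rhs = Derived(["x", "y"])

explicit = lhs.__eq__(rhs)  # calls Base.__eq__ directly
via_operator = lhs == rhs   # Python tries Derived.__eq__(rhs, lhs) first

print(explicit[0])      # Base
print(via_operator[0])  # Derived
```

Because `Derived` is a proper subclass of `Base` and overrides `__eq__`, the `==` operator gives the right operand priority even though it appears on the right-hand side.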
We can avoid this by special-casing this situation in `ArrowStringArrayNumpySemantics`, but I wanted to open an issue for discussion before proceeding.
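One possible shape for such a special case, sketched in plain Python with hypothetical `Base` and `Derived` classes (this is an assumption about the approach, not the actual pandas change): the subclass returns `NotImplemented` when the other operand is exactly the parent type, so Python falls back to the parent's `__eq__`.

```python
class Base:
    """Hypothetical stand-in for the parent array class."""

    def __init__(self, values):
        self.values = list(values)

    def _compare(self, other):
        return [a == b for a, b in zip(self.values, other.values)]

    def __eq__(self, other):
        if not isinstance(other, Base):
            return NotImplemented
        return ("Base", self._compare(other))


class Derived(Base):
    """Hypothetical subclass that defers to an exact-Base operand."""

    def __eq__(self, other):
        if type(other) is Base:
            # Special case: decline, so Python falls back to Base.__eq__.
            return NotImplemented
        if not isinstance(other, Base):
            return NotImplemented
        return ("Derived", self._compare(other))


lhs = Base(["x", "y"])
rhs = Derived(["x", "y"])

left_base = lhs == rhs     # Derived declines, Base.__eq__(lhs, rhs) runs
left_derived = rhs == lhs  # Derived declines here too
both_derived = rhs == Derived(["x", "y"])

print(left_base[0])     # Base
print(left_derived[0])  # Base
print(both_derived[0])  # Derived
```

Note the limitation: `__eq__` cannot tell whether it was invoked as the reflected operation, so this simple deferral also makes `rhs == lhs` return the parent's result. Getting "the left operand decides" in both directions would need additional logic beyond this sketch.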