API (string dtype): comparisons between different string classes

@WillAyd

Some comparisons between different string classes (e.g. string[pyarrow] and str) raise. Resolving this is straightforward, except for deciding which class the result should be. I would expect it to always follow the left operand: string[pyarrow] == str should return string[pyarrow], whereas str == string[pyarrow] should return str. Is this the consensus?

We currently run into issues with how Python handles subclasses with comparison dunders.

    import numpy as np
    import pandas as pd

    lhs = pd.array(["x", pd.NA, "y"], dtype="string[pyarrow]")
    rhs = pd.array(["x", pd.NA, "y"], dtype=pd.StringDtype("pyarrow", np.nan))

    # Calling the left operand's method directly
    print(lhs.__eq__(rhs))
    # <ArrowExtensionArray>
    # [True, <NA>, True]
    # Length: 3, dtype: bool[pyarrow]

    # Using the operator
    print(lhs == rhs)
    # [ True False  True]

The two results above differ because ArrowStringArrayNumpySemantics is a proper subclass of ArrowStringArray and therefore Python first calls rhs.__eq__(lhs).
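The dispatch rule can be reproduced without pandas. The following is a minimal sketch in plain Python; `Base` and `Derived` are hypothetical stand-ins for the two array classes, not pandas code. When the right operand's type is a subclass of the left operand's type, the operator tries the right operand's `__eq__` first, while a direct call to `lhs.__eq__` bypasses that rule.

```python
class Base:
    """Hypothetical stand-in for the parent array class."""

    def __init__(self, tag):
        self.tag = tag

    def __eq__(self, other):
        return f"Base.__eq__ called on {self.tag}"


class Derived(Base):
    """Hypothetical stand-in for the subclass with different semantics."""

    def __eq__(self, other):
        return f"Derived.__eq__ called on {self.tag}"


lhs = Base("lhs")
rhs = Derived("rhs")

# A direct method call always uses the left operand's implementation.
explicit = lhs.__eq__(rhs)

# The operator gives priority to the right operand because
# type(rhs) is a subclass of type(lhs).
via_operator = lhs == rhs

print(explicit)
print(via_operator)
```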

We can avoid this by special-casing the comparison in ArrowStringArrayNumpySemantics, but I wanted to open an issue for discussion before proceeding.
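One possible shape for such a special case, sketched with hypothetical toy classes (`ParentArray` and `ChildArray` are assumptions for illustration, not the pandas implementation): the subclass returns `NotImplemented` when the other operand is not itself a subclass instance, so Python falls back to the left operand's `__eq__`.

```python
class ParentArray:
    """Hypothetical stand-in for the parent string array."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        if not isinstance(other, ParentArray):
            return NotImplemented
        matches = [a == b for a, b in zip(self.values, other.values)]
        return ("parent", matches)


class ChildArray(ParentArray):
    """Hypothetical stand-in for the subclass with different semantics."""

    def __eq__(self, other):
        # Special case: defer when the other operand is not a ChildArray,
        # so that Python tries the other operand's __eq__ instead.
        if not isinstance(other, ChildArray):
            return NotImplemented
        matches = [a == b for a, b in zip(self.values, other.values)]
        return ("child", matches)


parent = ParentArray(["x", "y"])
child = ChildArray(["x", "z"])

left_is_parent = parent == child   # falls back to ParentArray.__eq__
left_is_child = child == parent    # also ends up in ParentArray.__eq__
both_children = child == ChildArray(["x", "z"])
```

In this sketch `parent == child` now follows the left operand, but `child == parent` also resolves to the parent's semantics, because `__eq__` cannot tell whether it was invoked as the left or the reflected operand. A real fix would need some other way to distinguish the two cases.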

cc @WillAyd @jorisvandenbossche

API (string dtype): comparisons between different string classes #60639
