
Unstable hashtable / duplicated algo for object dtype #27035

@jorisvandenbossche

From a flaky test in geopandas, I observed the following behaviour:

    In [1]: pd.__version__
    Out[1]: '0.25.0.dev0+791.gf0919f272'

    In [2]: from shapely.geometry import Point

    In [3]: a = np.array([Point(1, 1), Point(1, 1)], dtype=object)

    In [4]: pd.Series(a).duplicated()
    Out[4]:
    0    False
    1     True
    dtype: bool

    In [6]: print(pd.Series(a).duplicated())
       ...: print(pd.Series(a).duplicated())
    0    False
    1     True
    dtype: bool
    0    False
    1    False
    dtype: bool

So the same call on the same data sometimes flags the second element as a duplicate and sometimes does not.

I am also not fully sure how the object hashtable works (assuming duplicated uses the hashtable), as the shapely Point objects are not hashable:

    In [9]: pd.Series(a).unique()
    ...
    TypeError: unhashable type: 'Point'
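To make the "not hashable" part concrete without depending on shapely, here is a minimal pure-Python sketch. `UnhashablePoint` and `duplicated_by_equality` are hypothetical names made up for illustration; they are not shapely or pandas APIs, and the sketch does not claim to reproduce what the pandas hashtable does internally. It only shows that a class defining `__eq__` without `__hash__` cannot be hashed, while duplicates can still be found by plain equality.

```python
class UnhashablePoint:
    """Hypothetical stand-in: compares by value but has no hash."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    # In Python 3, defining __eq__ without __hash__ sets __hash__ to None,
    # so instances of this class are unhashable.
    def __eq__(self, other):
        if not isinstance(other, UnhashablePoint):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


def duplicated_by_equality(values):
    """Flag each element that equals an earlier one, using only == (O(n^2))."""
    seen = []
    flags = []
    for value in values:
        flags.append(any(value == earlier for earlier in seen))
        seen.append(value)
    return flags


points = [UnhashablePoint(1, 1), UnhashablePoint(1, 1)]

try:
    hash(points[0])
except TypeError as exc:
    print(exc)  # the message names the unhashable type

# Equality-based detection needs no hash and gives the same answer every run.
print(duplicated_by_equality(points))
```

Any hash-based approach has to either raise on such objects (as `unique()` does above) or fall back to something else, which is why a silent and unstable result from `duplicated()` looks suspicious.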


Labels: Bug, duplicated
