
BUG: series.astype(str) vs Index.astype(str) inconsistency #45326

@jbrockmendel

Description

    ser = pd.Series([b'foo'], dtype=object)
    idx = pd.Index(ser)

    In [3]: ser.astype(str)[0]
    Out[3]: "b'foo'"

    In [4]: idx.astype(str)[0]
    Out[4]: 'foo'

Series goes through astype_nansafe and from there through ensure_string_array. Index just calls the ndarray's .astype method. The Index behavior is specifically tested in test_astype_str_from_bytes; xref #38607.
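
The two results can be reproduced without pandas. This is a minimal sketch, assuming the Series path amounts to calling `str()` on each element and the ndarray cast amounts to a strict ASCII decode of each bytes object. Both assumptions are inferred from the outputs above, not taken from the pandas or NumPy source, and the helper names are hypothetical:

```python
def via_str(value):
    # Assumed stand-in for the Series path: str() on bytes returns its repr.
    return str(value)


def via_ascii_decode(value):
    # Assumed stand-in for the ndarray cast: bytes are decoded as strict ASCII.
    if isinstance(value, bytes):
        return value.decode("ascii")
    return str(value)


print(via_str(b"foo"))           # b'foo'
print(via_ascii_decode(b"foo"))  # foo
```

The first call keeps the `b'...'` wrapper, which matches `ser.astype(str)[0]`; the second drops it, which matches `idx.astype(str)[0]`.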

Index.astype can raise on non-ascii bytestrs:

    val = 'あ'
    bval = val.encode("UTF-8")
    ser2 = pd.Series([bval], dtype=object)
    idx2 = pd.Index(ser2)

    In [21]: ser2.astype(str)[0]
    Out[21]: "b'\\xe3\\x81\\x82'"

    In [22]: idx2.astype(str)
    ---------------------------------------------------------------------------
    UnicodeDecodeError                        Traceback (most recent call last)
    [...]
    TypeError: Cannot cast Index to dtype <U0
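
The non-ASCII case can also be illustrated in plain Python: `str()` on a bytes object always succeeds and returns its repr, while a strict ASCII decode raises. This is a sketch, assuming the ndarray cast behaves like an ASCII decode of each bytes element (consistent with the `UnicodeDecodeError` in the traceback, but not confirmed against the NumPy source); `try_ascii` is a hypothetical helper:

```python
def try_ascii(raw):
    """Return (text, None) on success or (None, error) if ASCII decoding fails."""
    try:
        return raw.decode("ascii"), None
    except UnicodeDecodeError as exc:
        return None, exc


bval = "あ".encode("utf-8")

# str() never decodes; it returns the escaped repr of the bytes object.
repr_text = str(bval)

# A strict ASCII decode fails on the first byte >= 0x80.
decoded, error = try_ascii(bval)

print(repr_text)
print(type(error).__name__)
```

Decoding with the encoding that produced the bytes, `bval.decode("utf-8")`, round-trips to the original string, so the failure is specific to the ASCII assumption.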

These behaviors should match. I don't really have a preference between them.
