-
- Notifications
You must be signed in to change notification settings - Fork 19.3k
Closed
Labels
AstypeBugDtype ConversionsUnexpected or buggy dtype conversionsUnexpected or buggy dtype conversions
Milestone
Description
ser = pd.Series([b'foo'], dtype=object) idx = pd.Index(ser) In [3]: ser.astype(str)[0] Out[3]: "b'foo'" In [4]: idx.astype(str)[0] Out[4]: 'foo'Series goes through astype_nansafe and from there through ensure_string_array. Index just calls calls the ndarray's .astype method. The Index behavior is specifically tested in test_astype_str_from_bytes xref #38607.
Index.astype can raise on non-ascii bytestrs:
val = 'あ' bval = val.encode("UTF-8") ser2 = pd.Series([bval], dtype=object) idx2 = pd.Index(ser2) In [21]: ser2.astype(str)[0] Out[21]: "b'\\xe3\\x81\\x82'" In [22]: idx2.astype(str) --------------------------------------------------------------------------- UnicodeDecodeError Traceback (most recent call last) [...] TypeError: Cannot cast Index to dtype <U0 These behaviors should match. I don't really have a preference between them.
Metadata
Metadata
Assignees
Labels
AstypeBugDtype ConversionsUnexpected or buggy dtype conversionsUnexpected or buggy dtype conversions