to_dict() on a boolean series sometimes returns numpy types instead of Python types #27616

@maximz

Problem description

I construct a Series in several ways that should give the same output from to_dict(), but instead I get different output types. In my case, this breaks downstream JSON serializers.

The code sample below includes cases with correct output (bool) and incorrect (numpy.bool_) -- see inline comments.
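To make the serializer breakage concrete, here is a minimal sketch using the standard-library json module. The dictionary is built by hand from NumPy scalars to stand in for the problematic to_dict() output, so it does not depend on a particular pandas version. The helper name to_native is hypothetical, and this is only one possible workaround, not a fix for the underlying inconsistency.

```python
import json

import numpy as np

# Stand-in for the problematic to_dict() output: values are NumPy scalars.
record = {"a": np.bool_(True), "b": np.int64(0)}

try:
    json.dumps(record)
    failed = False
except TypeError:
    # The default encoder does not know how to handle numpy.bool_.
    failed = True


def to_native(value):
    """Hypothetical fallback for json.dumps: unwrap NumPy scalars."""
    if isinstance(value, np.generic):
        return value.item()  # e.g. numpy.bool_ -> bool
    raise TypeError(
        f"Object of type {type(value).__name__} is not JSON serializable"
    )


# Passing the fallback via `default` lets the same record serialize.
encoded = json.dumps(record, default=to_native)
```

With a plain Python bool in the dictionary, json.dumps succeeds without any fallback, which is why the inconsistent return type matters downstream.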

Related issues, though none seem exactly the same: #13258, #13830, #16048, #17491, #19381, #20791, #23753, #23921, #24908, #25969

Code sample

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({ 'a': [True, False], 'b': [0, 1]} )

In [3]: df
Out[3]:
       a  b
0   True  0
1  False  1

In [27]: type(df['a'].iloc[0])
Out[27]: numpy.bool_

In [48]: type(df[['a']].iloc[0, 0])
Out[48]: numpy.bool_

In [33]: type(df.iloc[0,0])
Out[33]: numpy.bool_

In [24]: type(df.iloc[0]['a'])
Out[24]: numpy.bool_

# ----

In [4]: df[['a']].iloc[0].to_dict()
Out[4]: {'a': True}

# correct
In [5]: type(df[['a']].iloc[0].to_dict()['a'])
Out[5]: bool

In [6]: df.iloc[0][['a']].to_dict()
Out[6]: {'a': True}

# this one is incorrect, should return bool
In [7]: type(df.iloc[0][['a']].to_dict()['a'])
Out[7]: numpy.bool_

# ----

In [8]: df[['a', 'b']].to_dict(orient='records')[0]
Out[8]: {'a': True, 'b': 0}

# correct
In [9]: type(df[['a', 'b']].to_dict(orient='records')[0]['a'])
Out[9]: bool

In [10]: df[['a', 'b']].iloc[0].to_dict()
Out[10]: {'a': True, 'b': 0}

# this one is incorrect, should return bool
In [11]: type(df[['a', 'b']].iloc[0].to_dict()['a'])
Out[11]: numpy.bool_

This may explain what's going on:

In [54]: df.iloc[0][['a']]
Out[54]:
a    True
Name: 0, dtype: object

In [56]: df[['a']].iloc[0]
Out[56]:
a    True
Name: 0, dtype: bool

That relates to #25969, where @mroeschke commented about a similar dtype discrepancy:

This probably occurs because s2 is object dtype and it's trying to preserve the dtype of each input argument while the arguments in s1 can both be coerced to int64.
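The dtype discrepancy can be checked directly. The sketch below rebuilds the same two-column frame and compares the two selection orders; the variable names row_first and col_first are only illustrative. It inspects dtypes only, since the element types that to_dict() returns for each case may depend on the pandas version.

```python
import pandas as pd

df = pd.DataFrame({"a": [True, False], "b": [0, 1]})

# Row first: the row mixes a bool and an int, so the row Series is
# object dtype, and selecting 'a' from it keeps that object dtype.
row_first = df.iloc[0][["a"]]

# Column first: the one-column frame is purely bool, so its row stays bool.
col_first = df[["a"]].iloc[0]

dtypes = (str(row_first.dtype), str(col_first.dtype))
```

This matches the Out[54] and Out[56] results above: the same value sits in an object Series in one case and in a bool Series in the other.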

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.7.4.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 18.6.0
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 0.25.0
numpy            : 1.16.4
pytz             : 2019.1
dateutil         : 2.8.0
pip              : 19.0.3
setuptools       : 40.8.0
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.10.1
IPython          : 7.6.1
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : None
matplotlib       : None
numexpr          : 2.6.9
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pytables         : None
s3fs             : None
scipy            : None
sqlalchemy       : None
tables           : 3.5.2
xarray           : None
xlrd             : 1.2.0
xlwt             : None
xlsxwriter       : None
