pd.read_parquet causing Python to crash

These are the commands I ran in IPython:

In [1]: import pandas as pd
In [2]: df = pd.DataFrame(data={'x': [1,2,3]})
In [3]: df.to_parquet('test.parquet')
In [4]: with open('test.parquet', 'rb') as f:
   ...:     df2 = pd.read_parquet(f)
   ...:

At this point IPython exits without displaying an error or stack trace. Maybe this is some kind of segfault? The same crash happens when running the code from a script, and when reading from a BytesIO instead of a file.
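Since the interpreter dies without a traceback, one way to get more information might be to run the reproduction in a child process with Python's faulthandler enabled. The following is only a sketch: `run_isolated` and `REPRO` are names made up for this example, and the BytesIO variant is my guess at the simplest form of the case mentioned above. If the crash is a native fault, the exit code should be nonzero and stderr may contain a low-level traceback.

```python
import subprocess
import sys

# Reproduction from the report, using a BytesIO built from the written file.
# (Assumed form of the BytesIO case; the report does not show its exact code.)
REPRO = """
import io
import pandas as pd

df = pd.DataFrame(data={'x': [1, 2, 3]})
df.to_parquet('test.parquet')
with open('test.parquet', 'rb') as f:
    buf = io.BytesIO(f.read())
print(pd.read_parquet(buf))
"""


def run_isolated(code):
    """Run `code` in a child interpreter with faulthandler enabled.

    The parent process survives even if the child crashes, so the exit
    code and anything written to stderr can still be inspected.
    """
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    result = run_isolated(REPRO)
    print("exit code:", result.returncode)
    print("stdout:", result.stdout)
    print("stderr:", result.stderr)
```

Running the snippet in a separate process also keeps the IPython session alive, which makes it easier to compare the file and BytesIO cases side by side.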

This is with pandas==1.2.0, pyarrow==2.0.0, and Python 3.7.3, run on Windows from PowerShell.

If it helps, this is the output from pd.show_versions():

INSTALLED VERSIONS
------------------
commit           : 3e89b4c4b1580aa890023fc550774e63d499da25
python           : 3.7.3.final.0
python-bits      : 64
OS               : Windows
OS-release       : 10
Version          : 10.0.17763
machine          : AMD64
processor        : Intel64 Family 6 Model 45 Stepping 7, GenuineIntel
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : None.None
pandas           : 1.2.0
numpy            : 1.19.4
pytz             : 2020.5
dateutil         : 2.8.1
pip              : 20.3.3
setuptools       : 51.1.0.post20201221
Cython           : 0.29.14
pytest           : None
hypothesis       : None
sphinx           : 2.0.0
blosc            : None
feather          : None
xlsxwriter       : 1.1.5
lxml.etree       : 4.4.2
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.11.2
IPython          : 7.4.0
pandas_datareader: None
bs4              : 4.7.1
bottleneck       : None
fsspec           : None
fastparquet      : None
gcsfs            : None
matplotlib       : 3.1.1
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : 2.0.0
pyxlsb           : None
s3fs             : None
scipy            : 1.3.1
sqlalchemy       : 1.3.2
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
numba            : None

pd.read_parquet causing Python to crash #39031