pyarrow hangs in web app

Hello,

In my application, I am reading parquet files with pandas like this:

pd.read_parquet(fp, engine='pyarrow', columns=tags) 

It works well in a Python console on PythonAnywhere, but once the function is integrated into my web app (based on plotly-dash), the app hangs at this specific line of code.

This is very strange, since the app works perfectly locally.

I suspect a conflict somewhere in PythonAnywhere that I have no access to.

The problem seems to be solved if I switch the engine to 'fastparquet', but I need specific features of pyarrow that are not available in fastparquet. For example, pyarrow can filter the data at row level, while fastparquet only filters at row-group level (see the "filters" parameter at https://pandas.pydata.org/docs/reference/api/pandas.read_parquet.html).

I have been struggling with this for days and I would really appreciate some support from the PythonAnywhere team.

Thanks a lot in advance,

Patrick

Do you see any relevant entries in your web app's error / server logs?

Thank you for your reply.

There are no app or server errors...

After many trials, I concluded that the problem seems to be that pyarrow uses threads behind the scenes, whereas fastparquet does not. I also read that PythonAnywhere does not support threads. This is really annoying as it will be a key obstacle to scaling up the app later on :-(

Can you confirm that threads are not supported by PythonAnywhere?

Threads are not supported in web apps. They will work in other contexts.