Skip to content

Conversation

@arw2019
Copy link
Contributor

@arw2019 arw2019 commented Dec 19, 2020

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

#38370 adds a pyarrow engine to the csv reader. Only a fraction of the io/parser tests pass when pyarrow is used and the rest has to be xfailed/skipped, resulting in a large diff on the PR.

xref #38370 (comment) suggests reorganizing the tests into classes so groups of tests can be xfailed with a single mark.

I'm grouping the tests logically (not based on whether or not they pass with pyarrow) but merging this this will reduce the diff in #38370 substantively.

Likely I will submit a follow-on with some further reorg. I'm happy to push that to this PR if that's preferred, though.

Verifying that total number of tests is unchanged:

(pandas-dev) andrewwieteska@Andrews-MacBook-Pro pandas % pytest pandas/tests/io/parser/test_dtypes.py pandas/tests/io/parser/test_usecols.py ========================================================================= test session starts ========================================================================= platform darwin -- Python 3.8.6, pytest-6.1.2, py-1.9.0, pluggy-0.13.1 rootdir: /Users/andrewwieteska/repos/pandas, configfile: setup.cfg plugins: forked-1.2.0, xdist-2.1.0, cov-2.10.1, asyncio-0.14.0, hypothesis-5.41.2, instafail-0.4.1 collected 372 items pandas/tests/io/parser/test_dtypes.py ......................................................................................................................... [ 32%] ............................................................................................ [ 57%] pandas/tests/io/parser/test_usecols.py ........................................................................................................................ [ 89%] ....................................... [100%] ======================================================================== 372 passed in 32.49s ========================================================================= 

versus on master:

(pandas-dev) andrewwieteska@Andrews-MacBook-Pro pandas % pytest pandas/tests/io/parser/test_dtypes.py pandas/tests/io/parser/test_usecols.py ========================================================================= test session starts ========================================================================= platform darwin -- Python 3.8.6, pytest-6.1.2, py-1.9.0, pluggy-0.13.1 rootdir: /Users/andrewwieteska/repos/pandas, configfile: setup.cfg plugins: forked-1.2.0, xdist-2.1.0, cov-2.10.1, asyncio-0.14.0, hypothesis-5.41.2, instafail-0.4.1 collected 372 items pandas/tests/io/parser/test_dtypes.py ......................................................................................................................... [ 32%] ............................................................................................ [ 57%] pandas/tests/io/parser/test_usecols.py ........................................................................................................................ [ 89%] ....................................... [100%] ======================================================================== 372 passed in 32.90s ========================================================================= 
@arw2019 arw2019 added Testing pandas testing functions or related to the test suite Clean Refactor Internal refactoring of code IO CSV read_csv, to_csv and removed Clean labels Dec 21, 2020
@jreback
Copy link
Contributor

jreback commented Dec 21, 2020

instead of using classes, can you just create / rename more module files? its the same effect.

cc @gfyoung

@jreback jreback added this to the 1.3 milestone Dec 21, 2020
@gfyoung
Copy link
Member

gfyoung commented Dec 22, 2020

100% agree with @jreback feedback. Functions are more idiomatic with pytest.

A module will achieve what you are trying to do with these classes. Otherwise, a custom fixture will do.

@arw2019
Copy link
Contributor Author

arw2019 commented Dec 22, 2020

Sgtm - will put these into modules

@jreback
Copy link
Contributor

jreback commented Dec 31, 2020

looks like something is failing, merge master. pls confrm same number of tests on master as on here.

@arw2019
Copy link
Contributor Author

arw2019 commented Dec 31, 2020

looks like something is failing

I used the same name for a base file between two folders - fixed now

@arw2019
Copy link
Contributor Author

arw2019 commented Dec 31, 2020

This patch:

(pandas-dev) andrewwieteska@Andrews-MacBook-Pro pandas % pytest pandas/tests/io/parser/dtypes pandas/tests/io/parser/usecols ===================================================== test session starts ===================================================== platform darwin -- Python 3.8.6, pytest-6.1.2, py-1.9.0, pluggy-0.13.1 rootdir: /Users/andrewwieteska/repos/pandas, configfile: setup.cfg plugins: forked-1.2.0, xdist-2.1.0, cov-2.10.1, asyncio-0.14.0, hypothesis-5.41.2, instafail-0.4.1 collected 372 items pandas/tests/io/parser/dtypes/test_categorical.py ..................................................................... [ 18%] ..................... [ 24%] pandas/tests/io/parser/dtypes/test_dtypes_basic.py .................................................................... [ 42%] ....... [ 44%] pandas/tests/io/parser/dtypes/test_empty.py ................................................ [ 57%] pandas/tests/io/parser/usecols/test_parse_dates.py ........................... [ 64%] pandas/tests/io/parser/usecols/test_strings.py .................. [ 69%] pandas/tests/io/parser/usecols/test_usecols_basic.py .................................................................. [ 87%] ................................................ [100%] ==================================================== 372 passed in 30.93s ===================================================== 

On master:

(pandas-dev) andrewwieteska@Andrews-MacBook-Pro pandas % pytest pandas/tests/io/parser/test_dtypes.py pandas/tests/io/parser/test_usecols.py ===================================================== test session starts ===================================================== platform darwin -- Python 3.8.6, pytest-6.1.2, py-1.9.0, pluggy-0.13.1 rootdir: /Users/andrewwieteska/repos/pandas, configfile: setup.cfg plugins: forked-1.2.0, xdist-2.1.0, cov-2.10.1, asyncio-0.14.0, hypothesis-5.41.2, instafail-0.4.1 collected 372 items pandas/tests/io/parser/test_dtypes.py ................................................................................. [ 21%] ....................................................................................................................... [ 53%] ............. [ 57%] pandas/tests/io/parser/test_usecols.py ................................................................................ [ 78%] ............................................................................... [100%] ==================================================== 372 passed in 28.89s ===================================================== 
@arw2019
Copy link
Contributor Author

arw2019 commented Dec 31, 2020

Green

@jreback jreback merged commit 9b47091 into pandas-dev:master Dec 31, 2020
@jreback
Copy link
Contributor

jreback commented Dec 31, 2020

thanks

luckyvs1 pushed a commit to luckyvs1/pandas that referenced this pull request Jan 20, 2021
* test reorg * test reorg * split test_dtypes.py into multiple files * split test_usecols.py into multiple files * dedeuplicate base filenames * complete file renaming
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

IO CSV read_csv, to_csv Refactor Internal refactoring of code Testing pandas testing functions or related to the test suite

3 participants