Skip to content

Conversation

@dsaxton
Copy link
Contributor

@dsaxton dsaxton commented Oct 21, 2020

Follow up to #37188

@dsaxton dsaxton added CI Continuous Integration Code Style Code style, linting, code_checks labels Oct 21, 2020
@MarcoGorelli
Copy link
Member

MarcoGorelli commented Oct 21, 2020

Nice! Would it work to make this a local pygrep hook?

Something like

 - id: unwanted-namespace-usage name: Check for unwanted namespace usage in tests language: pygrep entry: |  (?x)  pd\.DataFrame|  pd\.Series  types: [python] files: ^pandas/tests

?

(if I've understood what this does - are you jus checking for instances of pd.DataFrame in the test suite?)

@jreback jreback added this to the 1.2 milestone Oct 21, 2020
@jreback jreback merged commit 021d831 into pandas-dev:master Oct 21, 2020
@jreback
Copy link
Contributor

jreback commented Oct 21, 2020

thanks @dsaxton

yes making this a pre-commit hook would be good

@dsaxton
Copy link
Contributor Author

dsaxton commented Oct 21, 2020

Nice! Would it work to make this a local pygrep hook?

Something like

 - id: unwanted-namespace-usage name: Check for unwanted namespace usage in tests language: pygrep entry: |  (?x)  pd\.DataFrame|  pd\.Series  types: [python] files: ^pandas/tests

?

(if I've understood what this does - are you jus checking for instances of pd.DataFrame in the test suite?)

@MarcoGorelli This is trying to make sure that we aren't using both pd.DataFrame and DataFrame in the same test file. So it's okay if we use pd.DataFrame, just not in this way:

import pandas as pd from pandas import DataFrame df = pd.DataFrame(...)

As far as I can tell there's no way to write this as a single grep command, but maybe there is still a way to add it to pre-commit?

@dsaxton dsaxton deleted the ci-unwanted-pattern-dataframe branch October 21, 2020 13:58
@MarcoGorelli
Copy link
Member

@dsaxton Thanks for explaining! You could do

 - id: namespace name: namespace language: pygrep entry: |  (?x)  (  pd\.DataFrame\( # pd.DataFrame(  .* # match anything  (?<!pd\.) # negative lookbehind: pd.  (?<!\w) # negative lookbehind: any character  DataFrame\( # DataFrame(  )|  (  (?<!pd\.) # negative lookbehind: pd.  (?<!\w) # negative lookbehind: any character  DataFrame\( # DataFrame(  .* # match anything  pd\.DataFrame\( # pd.DataFrame(  )  types: [python] args: [--multiline] files: ^pandas/tests

This brings up a further inconsistency in

diff --git a/pandas/tests/io/test_compression.py b/pandas/tests/io/test_compression.py index 31e9ad4cf..8cbe8d264 100644 --- a/pandas/tests/io/test_compression.py +++ b/pandas/tests/io/test_compression.py @@ -205,8 +205,8 @@ def test_with_missing_lzma_runtime(): import sys import pytest sys.modules['lzma'] = None - import pandas - df = pandas.DataFrame() + import pandas as pd + df = pd.DataFrame()

I'd be tempted to make a custom script if we want to add this to pre-commit though. IMO it'd be worth it, as accidentally using pd.DataFrame instead of DataFrame in tests is likely quite common, and waiting for the CI checks to all pass isn't idea...I'll pick this up later

Anyway, thanks for having done this!

@dsaxton
Copy link
Contributor Author

dsaxton commented Oct 21, 2020

 - id: namespace name: namespace language: pygrep entry: |  (?x)  (  pd\.DataFrame\( # pd.DataFrame(  .* # match anything  (?<!pd\.) # negative lookbehind: pd.  (?<!\w) # negative lookbehind: any character  DataFrame\( # DataFrame(  )|  (  (?<!pd\.) # negative lookbehind: pd.  (?<!\w) # negative lookbehind: any character  DataFrame\( # DataFrame(  .* # match anything  pd\.DataFrame\( # pd.DataFrame(  )  types: [python] args: [--multiline] files: ^pandas/tests

Nice, yeah if it could be done in pre-commit that's probably better. Is there a way to make a single regex extensible to other classes (e.g., Series, Index, etc.) or would those be added hooks?

@MarcoGorelli
Copy link
Member

I'm not sure, I think we'd need a custom Python script which gets called with an argument (e.g. DataFrame), but I'll come back to this

@MarcoGorelli
Copy link
Member

@dsaxton got it, we can do

 language: pygrep entry: | (?x) ( pd\.(\w+)\( # pd.DataFrame( .* # match anything (?<!pd\.) # negative lookbehind: pd. (?<!\w) # negative lookbehind: any character \2\( # DataFrame( )| ( (?<!pd\.) # negative lookbehind: pd. (?<!\w) # negative lookbehind: any character (\w+)\( # DataFrame( .* # match anything pd\.\4\( # pd.DataFrame( ) types: [python] args: [--multiline] files: ^pandas/tests 
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Oct 26, 2020
kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CI Continuous Integration Code Style Code style, linting, code_checks

3 participants