Skip to content

Conversation

@larsoner
Copy link
Collaborator

Not sure this is the best solution but it at least allows things in MNE-Python to pass. Turns out the real issue was us subclassing list and that having to associated file. But I added another test that failed a different way along the way as well for where we create functions dynamically.

cc @stefmolin @stefanv since you've probably thought about this more than I have! And @jarrodmillman since we should probably get some form of this in for 1.6.

BTW the time to test docstrings in MNE-python went from this before #476:

5.15s call mne/tests/test_docstring_parameters.py::test_docstring_parameters 

to:

49.58s call mne/tests/test_docstring_parameters.py::test_docstring_parameters 

So we should indeed prioritize time/efficiency at some point soon :)

dict[int, list[str]]
Mapping of line number to a list of checks to ignore.
"""
with open(filepath) as file:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would tweak this slightly.

numpydoc_ignore_comments = {} try: with open(filepath) as file: last_declaration = 1 declarations = ["def", "class"] for token in tokenize.generate_tokens(file.readline): if token.type == tokenize.NAME and token.string in declarations: last_declaration = token.start[0] if token.type == tokenize.COMMENT: match = re.match(IGNORE_COMMENT_PATTERN, token.string) if match: rules = match.group(1).split(",") numpydoc_ignore_comments[last_declaration] = rules except (OSError, TypeError): # can be None, nonexistent, or unreadable pass return numpydoc_ignore_comments
Copy link
Collaborator Author

@larsoner larsoner Aug 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually prefer the current approach as I always try to limit the scope of try/except as much as possible. With your code there is a single return path, which is nice. However the scoping for the try/except is broader, which means you risk accidentally catching a TypeError somewhere in the remaining 10 lines that do more than just "try to open the file". And I don't think the interpretation of the current code is too bad: "try to open the file and if you can't, return an empty dict. if you can, proceed to parse it"

@larsoner
Copy link
Collaborator Author

I also realized that extract_ignore_validation_comments was being called for every :py:obj: to be validated, and that this is probably a big overhead when you only need to do it once per source file (as each of those could have tens/hundreds/thousands(?) of docstrings in them). Based on this:

$ find mne-python/mne -name "*.py" | wc -l 609 $ find matplotlib/lib/ -name "*.py" | wc -l 290 $ find pandas/pandas/ -name "*.py" | wc -l 1376 $ find scipy/scipy/ -name "*.py" | wc -l 924 $ find numpy/numpy/ -name "*.py" | wc -l 495 

I added a @functools.lru_cache(maxsize=2000) to extract_ignore_validation_comments. This dropped the time from ~50s to ~10s for MNE-Python, which seems well worth the slight memory usage increase required to store the output dicts.

@jarrodmillman
Copy link
Member

Thanks!

@jarrodmillman jarrodmillman merged commit d9a3dfe into numpy:main Aug 21, 2023
@stefanv stefanv added this to the 1.6.0 milestone Aug 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

4 participants