Closed
Bug report
There is a private function, _splitlines_no_ff, which is only ever called in ast.get_source_segment. This function splits the entire source given to it, but ast.get_source_segment only needs at most node.end_lineno lines to work.
Lines 308 to 330 in 1acdfec
Lines 344 to 378 in 1acdfec
If, for example, you want to extract an import line from a very long file, this can seriously degrade performance.
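The scenario above can be reproduced with a small timing sketch. The synthetic source (one import followed by many filler lines) and the line count are assumptions chosen for illustration, and the measured time will vary by machine and by whether the interpreter already limits the split.

```python
import ast
import time

# Synthetic module (assumed shape): one import on line 1, then many filler lines.
source = "import os\n" + "x = 1\n" * 50_000
tree = ast.parse(source)
import_node = tree.body[0]  # the Import node, located on line 1

start = time.perf_counter()
segment = ast.get_source_segment(source, import_node)
elapsed = time.perf_counter() - start

# Only line 1 is needed, yet an unpatched helper splits every line of the source.
print(f"segment={segment!r}, took {elapsed:.4f}s")
```

Increasing the number of filler lines should make the cost of splitting the whole source more visible on interpreters without the proposed change.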
The introduction of a max_lines kwarg in _splitlines_no_ff, which functions like maxsplit in str.split, would minimize unneeded work. An implementation of the proposed fix is below (which makes my use case twice as fast):
--- a/Lib/ast.py
+++ b/Lib/ast.py
@@ -305,11 +305,16 @@ def get_docstring(node, clean=True):
     return text
 
 
-def _splitlines_no_ff(source):
+def _splitlines_no_ff(source, max_lines=-1):
     """Split a string into lines ignoring form feed and other chars.
 
     This mimics how the Python parser splits source code.
+
+    If max_lines is given, at most max_lines will be returned. If max_lines is not
+    specified or negative, then there is no limit on the number of lines returned.
     """
+    if not max_lines:
+        return []
     idx = 0
     lines = []
     next_line = ''
@@ -323,6 +328,8 @@ def _splitlines_no_ff(source):
             idx += 1
         if c in '\r\n':
             lines.append(next_line)
+            if max_lines == len(lines):
+                return lines
             next_line = ''
 
     if next_line:
@@ -360,7 +367,7 @@ def get_source_segment(source, node, *, padded=False):
     except AttributeError:
         return None
 
-    lines = _splitlines_no_ff(source)
+    lines = _splitlines_no_ff(source, max_lines=end_lineno + 1)
     if end_lineno == lineno:
         return lines[lineno].encode()[col_offset:end_col_offset].decode()
 
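To try the proposed behaviour without patching the standard library, here is a minimal standalone sketch. The public name splitlines_no_ff is hypothetical, and the loop body is assumed to match the helper in Lib/ast.py at the time of the report; only the max_lines handling comes from the proposal above.

```python
def splitlines_no_ff(source, max_lines=-1):
    """Split source into lines, treating only CR, LF and CRLF as line breaks.

    Hypothetical standalone copy of the patched helper. Line endings are kept,
    and splitting stops once max_lines lines have been collected.
    """
    if not max_lines:
        return []
    idx = 0
    lines = []
    next_line = ''
    while idx < len(source):
        c = source[idx]
        next_line += c
        idx += 1
        # Keep a CRLF pair together as a single line ending.
        if c == '\r' and idx < len(source) and source[idx] == '\n':
            next_line += '\n'
            idx += 1
        if c in '\r\n':
            lines.append(next_line)
            # Early exit: the rest of the source is never scanned.
            if max_lines == len(lines):
                return lines
            next_line = ''
    if next_line:
        lines.append(next_line)
    return lines


# The form feed (\x0c) does not end a line here, unlike in str.splitlines.
source = "import os\nx = 1\x0cy = 2\r\nz = 3\n"
all_lines = splitlines_no_ff(source)
first_line = splitlines_no_ff(source, max_lines=1)
print(all_lines)
print(first_line)
```

One difference from maxsplit in str.split is worth noting: str.split keeps the unsplit remainder as the final element, whereas this sketch discards everything after the last requested line. That is sufficient for ast.get_source_segment, which never reads past end_lineno.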
Your environment
- CPython versions tested on: 3.11