Description
Created after this discussion - #13413
OSes, python and pytest versions
OS: macOS 15.4.1, Ubuntu 22.04
Python 3.12.8
Pytest 8.3.4
Problem description
I need to run a large number of non-Python tests that are stored in deeply nested folders, and I found that pytest struggles during collection.
Some code context:
from collections.abc import Iterable

import pytest


@pytest.hookimpl(wrapper=True)
def pytest_collection(session):
    # resolve_suites() is defined elsewhere in the plugin (not shown).
    resolved_paths = resolve_suites(session)
    session.config.args.extend(resolved_paths)
    return (yield)


def pytest_collect_file(parent, file_path):
    if file_path.suffix == ".yaml":
        return YamlFile.from_parent(parent, path=file_path)


class YamlFile(pytest.File):
    def collect(self) -> Iterable[pytest.Item | pytest.Collector]:
        # YamlTestResolver is a leftover from the previous runner, but it resolves what is needed.
        test_cases = YamlTestResolver().from_file(f"{self.path}")
        for tc in test_cases:
            yield YamlTest.from_parent(self, name=tc.name, tc_spec=tc)


class YamlTest(pytest.Item):
    # The parameter name must match the tc_spec keyword passed to from_parent() above.
    def __init__(self, tc_spec, **kwargs) -> None:
        super().__init__(**kwargs)
        self.tc_spec = tc_spec
Consider this folder structure:
root_working_folder
├── framework_repo
│   └── framework_internal_folder
└── repo_with_tests
    └── tests
        ├── test_folder_1
        │   └── inner_folder
        └── test_folder_2
            └── inner_folder
                └── even_more_depth
The real repo_with_tests contains even more subfolders than shown here.
The pytest call is the following: pytest --collect-only ${list with 1k non-python tests} (1 test per file).
When I execute the above from framework_internal_folder, the execution time is 56 minutes with cProfile and 23 minutes without. When I make the same call from root_working_folder or repo_with_tests, the execution time is ~2 minutes with cProfile and 38 seconds without.
The most significant difference between the two calls is the cumulative time of this function, nodes.py:546(_check_initialpaths_for_relpath):
# from framework_internal_folder
   ncalls  tottime  percall   cumtime  percall  filename:lineno(function)
   237033   85.508    0.000  3176.004    0.013  ../_pytest/nodes.py:546(_check_initialpaths_for_relpath)

# from root_working_folder
   ncalls  tottime  percall   cumtime  percall  filename:lineno(function)
      135    0.051    0.000     1.772    0.013  ../_pytest/nodes.py:546(_check_initialpaths_for_relpath)
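For anyone who wants to gather comparable numbers, here is a minimal sketch using only the standard library's cProfile and pstats. The helper name profile_and_report is hypothetical and not part of pytest. The pattern argument is a regular expression matched against the filename:lineno(function) column.

```python
import cProfile
import io
import pstats


def profile_and_report(func, pattern, *args, **kwargs):
    """Run func under cProfile; return (result, filtered stats text).

    profile_and_report is a hypothetical helper for illustration only.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Rows whose "filename:lineno(function)" matches the pattern,
    # sorted by cumulative time.
    stats.sort_stats("cumulative").print_stats(pattern)
    # The "was called by..." table for the same functions.
    stats.print_callers(pattern)
    return result, stream.getvalue()


def busy(n):
    # Stand-in workload so the sketch runs without pytest installed.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    _, text = profile_and_report(busy, "busy", 1000)
    print(text)
```

For a real collection run, the same helper could presumably wrap pytest.main(["--collect-only", *paths]) with "_check_initialpaths_for_relpath" or "commonpath" as the pattern.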
According to the stats, when executing from framework_internal_folder, most of the time is spent in this function:
   ncalls    tottime  percall   cumtime  percall  filename:lineno(function)
206063304    262.471    0.000  1580.672    0.000  ../_pytest/pathlib.py:990(commonpath)

# and stats for callers of that function:
Function                    was called by...
                                ncalls             tottime  cumtime
pathlib.py:990(commonpath)  <-  205937164/3722648  262.308   28.844  nodes.py:546(_check_initialpaths_for_relpath)
Possible solutions
Cache for _check_initialpaths_for_relpath
I experimented with adding lru_cache to _check_initialpaths_for_relpath:
from functools import lru_cache  # import needed for the decorator


@lru_cache(maxsize=1000)
def _check_initialpaths_for_relpath(initialpaths: frozenset[Path], path: Path) -> str | None:
    for initial_path in initialpaths:
        if commonpath(path, initial_path) == initial_path:
            rel = str(path.relative_to(initial_path))
            return "" if rel == "." else rel
    return None
That change decreased the overall collection time to 4 minutes.
Stats are also impressive:
   ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     5798    2.109    0.000   79.265    0.014  nodes.py:545(_check_initialpaths_for_relpath)
I'm not sure if commonpath needs caching as well.
Anything else on the collection mechanism?
Other optimizations in directory/file collections