Slow collection time when tests are not in a relative folder to the current working folder #13420

@sashko1988

Created after this discussion - #13413

OSes, python and pytest versions

OS: macOS 15.4.1, Ubuntu 22.04
Python 3.12.8
Pytest 8.3.4

Problem description

I need to execute a lot of non-Python tests that are stored in deeply nested folders, and I found that pytest struggles during collection.

Some code context:

    from collections.abc import Iterable

    import pytest

    # resolve_suites() and YamlTestResolver are framework helpers not shown here.


    @pytest.hookimpl(wrapper=True)
    def pytest_collection(session):
        resolved_paths = resolve_suites(session)
        session.config.args.extend(resolved_paths)
        return (yield)


    def pytest_collect_file(parent, file_path):
        if file_path.suffix == ".yaml":
            return YamlFile.from_parent(parent, path=file_path)


    class YamlFile(pytest.File):
        def collect(self) -> Iterable[pytest.Item | pytest.Collector]:
            # Leftover from the previous runner, but it resolves what is needed.
            test_cases = YamlTestResolver().from_file(f"{self.path}")
            for tc in test_cases:
                yield YamlTest.from_parent(self, name=tc.name, tc_spec=tc)


    class YamlTest(pytest.Item):
        def __init__(self, tc_spec, **kwargs) -> None:
            super().__init__(**kwargs)
            self.tc_spec = tc_spec

Consider this folder structure:

    root_working_folder
    ├── framework_repo
    │   └── framework_internal_folder
    └── repo_with_tests
        └── tests
            ├── test_folder_1
            │   └── inner_folder
            └── test_folder_2
                └── inner_folder
                    └── even_more_depth

The real repo_with_tests contains many more subfolders than shown here.

The pytest call is the following: pytest --collect-only ${list with 1k non-python tests} (1 test per file).

When I execute the above from framework_internal_folder, the execution time is 56 minutes with cProfile and 23 minutes without. When I make the same call from root_working_folder or repo_with_tests, the execution time is ~2 minutes with cProfile and 38 seconds without.

The most significant difference between the two calls is in the cumulative time of this function: nodes.py:546(_check_initialpaths_for_relpath)

    # from framework_internal_folder
    ncalls  tottime  percall   cumtime  percall filename:lineno(function)
    237033   85.508    0.000  3176.004    0.013 ../_pytest/nodes.py:546(_check_initialpaths_for_relpath)

    # from root_working_folder
    ncalls  tottime  percall   cumtime  percall filename:lineno(function)
       135    0.051    0.000     1.772    0.013 ../_pytest/nodes.py:546(_check_initialpaths_for_relpath)
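For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of gathering and reading profile data with the standard cProfile and pstats modules. The functions slow_lookup and collect are hypothetical stand-ins for pytest internals; only the cProfile/pstats calls are real APIs.

```python
import cProfile
import io
import pstats


def slow_lookup(n: int) -> int:
    # Hypothetical stand-in for a hot helper such as _check_initialpaths_for_relpath.
    return sum(i * i for i in range(n))


def collect(items: int) -> int:
    # Hypothetical stand-in for a collection loop that calls the helper once per node.
    return sum(slow_lookup(200) for _ in range(items))


def profile_report(items: int = 50) -> str:
    profiler = cProfile.Profile()
    profiler.enable()
    collect(items)
    profiler.disable()

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out).sort_stats("cumulative")
    # Per-function totals: ncalls, tottime, percall, cumtime.
    stats.print_stats("slow_lookup")
    # Caller breakdown, the "Function was called by..." view.
    stats.print_callers("slow_lookup")
    return out.getvalue()


if __name__ == "__main__":
    print(profile_report())
```

For a whole pytest run, the same data can be written to a file with `python -m cProfile -o collect.prof -m pytest --collect-only ...` and then loaded with `pstats.Stats("collect.prof")`; the file name collect.prof is only an example.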

According to the stats, when executing from framework_internal_folder, the most expensive function is this one:

    ncalls     tottime  percall   cumtime  percall filename:lineno(function)
    206063304  262.471    0.000  1580.672    0.000 ../_pytest/pathlib.py:990(commonpath)

    # and stats for callers of that function:
    Function                    was called by...
                                    ncalls             tottime  cumtime
    pathlib.py:990(commonpath)  <-  205937164/3722648  262.308   28.844  nodes.py:546(_check_initialpaths_for_relpath)
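Dividing the two call counts (206063304 commonpath calls over 237033 lookups) gives roughly 870 commonpath calls per lookup, which is what a linear scan over about 1k initial paths would produce. The sketch below illustrates that scaling with a standalone copy of the loop. It is an assumption-laden illustration, not pytest's code: commonpath here is a simplified stand-in built on posixpath, and the paths are made up.

```python
import posixpath
from pathlib import PurePosixPath

commonpath_calls = 0


def commonpath(path1, path2):
    # Simplified stand-in for pytest's helper; counts how often it is called.
    global commonpath_calls
    commonpath_calls += 1
    try:
        return PurePosixPath(posixpath.commonpath((str(path1), str(path2))))
    except ValueError:
        return None


def check_initialpaths_for_relpath(initialpaths, path):
    # Linear scan: one commonpath call per initial path until a match is found.
    for initial_path in initialpaths:
        if commonpath(path, initial_path) == initial_path:
            rel = str(path.relative_to(initial_path))
            return "" if rel == "." else rel
    return None


def demo(n_initial: int = 1000):
    global commonpath_calls
    commonpath_calls = 0
    # One made-up test file per initial path, as in "1 test per file".
    initial = frozenset(
        PurePosixPath(f"/repo/tests/suite_{i}/case_{i}.yaml") for i in range(n_initial)
    )
    # A directory above the files matches no initial path, so the whole set is scanned.
    result = check_initialpaths_for_relpath(initial, PurePosixPath("/repo/tests/suite_7"))
    return result, commonpath_calls


if __name__ == "__main__":
    print(demo())
```

With 1000 initial paths, a single lookup for a non-matching directory performs 1000 commonpath calls, so the total cost grows with the number of lookups multiplied by the number of initial paths.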

Possible solutions

Cache for _check_initialpaths_for_relpath

I experimented with adding lru_cache to _check_initialpaths_for_relpath:

    from functools import lru_cache


    @lru_cache(maxsize=1000)
    def _check_initialpaths_for_relpath(initialpaths: frozenset[Path], path: Path) -> str | None:
        for initial_path in initialpaths:
            if commonpath(path, initial_path) == initial_path:
                rel = str(path.relative_to(initial_path))
                return "" if rel == "." else rel
        return None

That change decreased the overall collection time to 4 minutes.

Stats are also impressive:

    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      5798    2.109    0.000   79.265    0.014 nodes.py:545(_check_initialpaths_for_relpath)
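To show how the effect of such a cache can be checked, here is a small sketch that uses cache_info() on a function of the same shape. relpath_from_initial is a hypothetical stand-in that uses PurePath.is_relative_to() instead of pytest's commonpath helper, and the paths are invented. The approach relies on both arguments being hashable, which holds for a frozenset of paths and a path.

```python
from functools import lru_cache
from pathlib import PurePosixPath


@lru_cache(maxsize=1000)
def relpath_from_initial(initialpaths: frozenset, path: PurePosixPath):
    # Hypothetical stand-in with the same shape as the cached function above.
    for initial_path in initialpaths:
        if path.is_relative_to(initial_path):
            rel = str(path.relative_to(initial_path))
            return "" if rel == "." else rel
    return None


def demo():
    relpath_from_initial.cache_clear()
    initial = frozenset({PurePosixPath("/repo/tests")})
    node = PurePosixPath("/repo/tests/suite/case.yaml")
    # The same (initialpaths, path) pair is looked up three times: 1 miss, 2 hits.
    results = [relpath_from_initial(initial, node) for _ in range(3)]
    # A different path is a new cache key: 1 more miss.
    results.append(relpath_from_initial(initial, PurePosixPath("/repo/tests")))
    return results, relpath_from_initial.cache_info()


if __name__ == "__main__":
    print(demo())
```

Because the cache is LRU with a fixed maxsize, it only helps while the number of distinct (initialpaths, path) pairs in active use stays near or below that limit; a larger run may need a larger maxsize.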

I'm not sure if commonpath needs caching as well.
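One alternative to caching commonpath, sketched here purely as an idea and not as pytest's implementation: walk the parents of the path and test each for membership in the frozenset, so the work depends on the path depth rather than on the number of initial paths. The name relpath_via_parents is hypothetical. Note one behavioral difference: with nested initial paths this returns the path relative to the nearest matching ancestor, whereas the loop over a frozenset returns whichever match it iterates to first.

```python
from pathlib import PurePosixPath


def relpath_via_parents(initialpaths: frozenset, path: PurePosixPath):
    # The path itself is an initial path: empty relative path.
    if path in initialpaths:
        return ""
    # Walk upward; each step is a constant-time set membership check.
    for parent in path.parents:
        if parent in initialpaths:
            return str(path.relative_to(parent))
    return None


if __name__ == "__main__":
    initial = frozenset({PurePosixPath("/repo/tests")})
    print(relpath_via_parents(initial, PurePosixPath("/repo/tests/a/b.yaml")))
    print(relpath_via_parents(initial, PurePosixPath("/repo/tests")))
    print(relpath_via_parents(initial, PurePosixPath("/other/x.yaml")))
```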

Anything else on the collection mechanism?

Other optimizations in directory/file collections


Labels: topic: collection (related to the collection phase); type: performance (performance or memory problem/improvement)
