Conversation

@pfebrer (Contributor) commented Oct 31, 2025

The PET tests were all using, and some of them modifying, the global variables DEFAULT_HYPERS and MODEL_HYPERS, which is of course very bad. I realised this because, in a branch, running the tests of a single file passed but running the tests of all files failed.

I just took the approach that some of the tests were already using: deepcopy the hypers before modifying them.
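For reference, the pattern looks like this (a minimal sketch; the import path and the hypers keys are illustrative, not the exact ones in the test files):

```python
import copy

# Illustrative import path; the hypers live somewhere in the PET test helpers
from metatrain.pet.tests.utils import DEFAULT_HYPERS


def test_model_with_modified_hypers():
    # Mutate a private copy, never the module-level DEFAULT_HYPERS
    hypers = copy.deepcopy(DEFAULT_HYPERS)
    hypers["model"]["cutoff"] = 4.0  # illustrative key
    ...
```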


📚 Documentation preview 📚: https://metatrain--882.org.readthedocs.build/en/882/

@pfebrer requested a review from abmazitov as a code owner on October 31, 2025, 16:02
@pfebrer force-pushed the fix_test_notfrozen branch from 6c5f766 to 060e9ae on October 31, 2025, 16:06
@pfebrer (Contributor, Author) commented Oct 31, 2025

Now that the tests are isolated, they fail 😆 If someone knows how to fix them, please go ahead.

@HaoZeke (Member) commented Nov 1, 2025

I did a little bit of digging into this... FWIW there are only two tests which need to be run in strict order; the rest can be run independently:

tox -e pet-tests -- src/metatrain/pet/tests/test_{finetuning,regression}.py -vvv -s

So test_finetuning must be run before test_regression, otherwise test_regression will fail. I couldn't figure out what exactly the finetuning tests mutate to cause this mess. FWIW, test_finetuning alone does work independently...

So maybe we need a more targeted fix than scattering deep copies, because that's a major performance hit.
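One way to pinpoint what exactly is being mutated would be an autouse guard fixture that snapshots the globals and fails the mutating test itself, rather than whichever test runs after it (a sketch; the module holding the hypers is an assumption):

```python
# conftest.py (sketch)
import copy

import pytest

from metatrain.pet.tests import utils  # hypothetical location of DEFAULT_HYPERS


@pytest.fixture(autouse=True)
def guard_default_hypers():
    """Fail the test that mutates DEFAULT_HYPERS, not a later victim test."""
    snapshot = copy.deepcopy(utils.DEFAULT_HYPERS)
    yield
    assert utils.DEFAULT_HYPERS == snapshot, "this test mutated DEFAULT_HYPERS"
```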

@pfebrer (Contributor, Author) commented Nov 1, 2025

Is deepcopying a very shallow dict really a "major performance hit"? I don't see how.

I think the fact that one unit test depends on having run some other tests in some other file, in a given order, is much more worrying...

@HaoZeke (Member) commented Nov 2, 2025

> Is deepcopying a very shallow dict really a "major performance hit"? I don't see how.
>
> I think the fact that one unit test depends on having run some other tests in some other file, in a given order, is much more worrying...

Oh, definitely, the order dependence is much worse. I just meant that the fix should probably be targeted at the two tests I pointed out above.

Edit: can't remember what it's called offhand, but we should use a test order randomizer.
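For example, either of the usual pytest plugins would do (a sketch; neither is currently in the pet-tests tox environment, so it would have to be added to its deps):

```console
# pytest-randomly shuffles tests by default once installed and prints the seed,
# so an order-dependent failure can be reproduced with --randomly-seed=<seed>
pip install pytest-randomly
pytest src/metatrain/pet/tests

# pytest-random-order is opt-in instead
pip install pytest-random-order
pytest src/metatrain/pet/tests --random-order
```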

@pfebrer (Contributor, Author) commented Nov 4, 2025

@abmazitov can you try to fix the PET tests (while keeping them isolated)? I don't know whether the PET tests were designed to be run in a given sequence (in which case we would need to rethink how we define that order) or whether it was just by pure chance that they were passing.

@abmazitov (Contributor) commented

What do you mean by "keeping them isolated"?

@pfebrer (Contributor, Author) commented Nov 4, 2025

That whether a test passes does not depend on having run some other test previously, unless that order is specified somehow. The isolation is already done: we just needed to deepcopy DEFAULT_HYPERS so that each test does not modify the global variable. Isolating them has now shown that some tests were passing just by chance, or because an implicit order was assumed. A fixture would make the isolation explicit instead of relying on every test remembering to deepcopy; see the sketch below.
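(A sketch of that fixture; the fixture name and import path are illustrative:)

```python
import copy

import pytest

from metatrain.pet.tests.utils import DEFAULT_HYPERS  # hypothetical import path


@pytest.fixture
def hypers():
    """Hand every test its own mutable copy of the default hypers."""
    return copy.deepcopy(DEFAULT_HYPERS)


def test_training_with_one_epoch(hypers):
    hypers["training"]["num_epochs"] = 1  # safe: only this test's copy changes
    ...
```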

@pfebrer (Contributor, Author) commented Nov 4, 2025

For full context: I was modifying functionality that is covered by a unit test. After the modification, that test still passes on its own, but not when I run the full suite, because state leaks from one test to another (through DEFAULT_HYPERS).
