- Notifications
You must be signed in to change notification settings - Fork 1.9k
add fail retry for failed tests #4510
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@ ## master #4510 +/- ## ========================================= Coverage ? 75.08% ========================================= Files ? 913 Lines ? 160347 Branches ? 17269 ========================================= Hits ? 120390 Misses ? 35140 Partials ? 4817
|
| public class FileLoaderTests | ||
| { | ||
| [Fact] | ||
| [MLNETFact] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this should only be applied to tests that are known to be flaky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+💯
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. And I think that was the second proposal. To apply this only to tests known to be flaky and if we still see failures, to retry the whole configuration.
Can you please amend the pull request as per your second proposal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. Actually in the dotnet/runtime repo we kept it simple instead of adding a FactAttribute a RetryHelper class which forces people to use that when a test is flaky and it has to be justified.
Also, that way you can also use it for theory tests not only facts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if we stick with an attribute I believe the name should express what the attribute does.
For example: RetryFactAttribute or RetryAbleFactAttribute.
| } | ||
| | ||
| //[Fact] | ||
| //[RetryFact] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
📝 Stale comment.
💡 Tests should be disabled with a Skip attribute, not by commenting out the code.
| { | ||
| /// <summary> | ||
| /// Number of retries allowed for a failed test. If unset (or set less than 1), will | ||
| /// default to 3 attempts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two issues here:
- If this attribute is used for all tests, it needs to default to 1
- If this attribute is used only for tests known to be flaky, the maximum acceptable default is 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We intend to retry all failed test cases and log all the failing for investigate. All retry is to unblock every one from passing their PR build for now as failing is random and frequent and investigating flaky test can be time consuming.
In reply to: 354508994 [](ancestors = 354508994)
| | ||
| namespace Microsoft.ML.TestFrameworkCommon | ||
| { | ||
| public static class Centrallizedlogger |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❔ Is this used by local developer builds or only on CI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| TestName = test.TestCase.TestMethod.Method.Name; | ||
| | ||
| // write to the console when a test starts and stops so we can identify any test hangs/deadlocks in CI | ||
| Console.WriteLine($"Starting test: {FullTestName}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❔ Are these getting logged anywhere else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, logged in MLNETTestCase class, not all test class inherinet from this base test class but we can change all tests to use MLNETFact to logging start and end of test
In reply to: 354511751 [](ancestors = 354511751)
| { | ||
| public class FileLoaderTests | ||
| { | ||
| [Fact] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This Fact can do more than just retry failed tests, example: better place to log start/end test, set default timeout for every test, exception handling etc...
we can filter tests known to be flaky and only retry them in MLNETTestCase Class but I found some test case are randomly failing no obvious pattern except some few tests are failing a lot. So what I do here is retry every failed tests and log all fails into a db even the test success after retry. This can be more reliable way to tell which tests are flaky.
Add MLNETFactAttribute and replace default Fact Attribute: