add fail retry for failed tests #4510

frank-dong-ms-zz · 2019-11-28T00:09:53Z

Add `MLNETFactAttribute` and use it in place of the default `Fact` attribute:

- Every failing test case is retried up to 3 times by default.
- Flaky tests are logged to a SQL database so we can better estimate which tests fail often.
- Every test gets a default timeout.
- Logging is refined.
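The retry-with-timeout behavior described above can be sketched in plain C#. This is not the xUnit extensibility point the PR actually uses (`MLNETFactAttribute` plus a custom test case). `RetryRunner`, `RunWithRetry` and `AttemptFailure` are hypothetical names, and an in-memory list stands in for the SQL database.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical record of one failed attempt (the PR wrote failures to a SQL database).
public sealed record AttemptFailure(string TestName, int Attempt, string Error);

public static class RetryRunner
{
    // In-memory stand-in for the failure database.
    public static readonly List<AttemptFailure> Failures = new List<AttemptFailure>();

    // Runs `test` up to `maxAttempts` times, enforcing `timeout` on each attempt.
    // Every failed attempt is recorded, even if a later attempt passes.
    // Returns true as soon as one attempt passes, false if all attempts fail.
    public static bool RunWithRetry(string testName, Action test, int maxAttempts, TimeSpan timeout)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                Task run = Task.Run(test);
                // Wait returns false on timeout and throws AggregateException if the test threw.
                if (!run.Wait(timeout))
                    throw new TimeoutException($"Attempt {attempt} exceeded {timeout}.");
                return true;
            }
            catch (Exception ex)
            {
                // Unwrap the AggregateException so the log shows the test's own error.
                Exception cause = ex is AggregateException agg ? agg.InnerException ?? agg : ex;
                Failures.Add(new AttemptFailure(testName, attempt, cause.Message));
            }
        }
        return false;
    }
}
```

One caveat of this design: `Task.Wait(TimeSpan)` only stops *waiting*. A timed-out test keeps running on the thread pool, so a hang can still hold resources until the process exits.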

codecov · 2019-12-04T01:27:57Z

Codecov Report

❗ No coverage uploaded for pull request base (master@9fc8f1c).
The diff coverage is 41.04%.

| Coverage Diff | master | #4510 | +/- |
|---|---|---|---|
| Coverage | ? | 75.08% | |
| Files | ? | 913 | |
| Lines | ? | 160347 | |
| Branches | ? | 17269 | |
| Hits | ? | 120390 | |
| Misses | ? | 35140 | |
| Partials | ? | 4817 | |

| Flag | Coverage Δ |
|---|---|
| #Debug | `75.08% <41.04%> (?)` |
| #production | `70.51% <ø> (?)` |
| #test | `90.08% <41.04%> (?)` |

| Impacted Files | Coverage Δ |
|---|---|
| test/Microsoft.ML.Functional.Tests/DataIO.cs | `100% <ø> (ø)` |
| test/Microsoft.ML.Functional.Tests/Training.cs | `100% <ø> (ø)` |
| ...osoft.ML.CodeAnalyzer.Tests/Code/BestFriendTest.cs | `100% <ø> (ø)` |
| ...Microsoft.ML.Tests/Transformers/NormalizerTests.cs | `100% <ø> (ø)` |
| ...crosoft.ML.AutoML.Tests/TransformInferenceTests.cs | `100% <ø> (ø)` |
| ...est/Microsoft.ML.Core.Tests/UnitTests/DataTypes.cs | `99.35% <ø> (ø)` |
| ...est/Microsoft.ML.Core.Tests/UnitTests/TestHosts.cs | `100% <ø> (ø)` |
| ...ios/IrisPlantClassificationWithStringLabelTests.cs | `98.63% <ø> (ø)` |
| ...t/Microsoft.ML.Core.Tests/UnitTests/ColumnTypes.cs | `75.3% <ø> (ø)` |
| test/Microsoft.ML.TimeSeries.Tests/TimeSeries.cs | `87.85% <ø> (ø)` |

... and 134 more

sharwell · 2019-12-05T19:32:02Z

test/Microsoft.Extensions.ML.Tests/FileLoaderTests.cs

 public class FileLoaderTests
 {
- [Fact]
+ [MLNETFact]


IMO this should only be applied to tests that are known to be flaky.

I agree, and I think that was the second proposal: apply this only to tests known to be flaky and, if we still see failures, retry the whole configuration.
Can you please amend the pull request as per your second proposal?

I agree. In the dotnet/runtime repo we kept it simple: instead of adding a FactAttribute, we added a RetryHelper class, which people have to call explicitly when a test is flaky, and each use has to be justified.

That way you can also use it for theory tests, not only facts.

https://github.com/dotnet/runtime/blob/master/src/libraries/Common/tests/CoreFx.Private.TestUtilities/System/RetryHelper.cs#L11

Also, if we stick with an attribute, I believe the name should express what the attribute does.

For example: `RetryFactAttribute` or `RetryAbleFactAttribute`.
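The explicit-helper alternative can be sketched as an ordinary method call inside the test body. It mirrors the *idea* of the dotnet/runtime `RetryHelper` linked above, but `FlakyRetry.Execute` and its `justification` parameter are hypothetical and are not that file's actual signature. The default of 2 attempts is an assumption taken from the later review comment on the attribute's default.

```csharp
using System;

// Hypothetical explicit retry helper; NOT the dotnet/runtime RetryHelper API.
public static class FlakyRetry
{
    // Runs `body` up to `maxAttempts` times; the last attempt's exception propagates.
    // A non-empty `justification` (e.g. a link to the tracking issue) is required,
    // so every retried test documents why it is allowed to be flaky.
    public static void Execute(Action body, string justification, int maxAttempts = 2)
    {
        if (string.IsNullOrWhiteSpace(justification))
            throw new ArgumentException("A justification is required to retry a test.", nameof(justification));
        if (maxAttempts < 1)
            throw new ArgumentOutOfRangeException(nameof(maxAttempts), "At least one attempt is required.");

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                body();
                return;
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                // The filter is false on the final attempt, so that exception is not swallowed.
            }
        }
    }
}
```

Because this is a plain method call, the same helper can wrap the body of a `[Theory]` as easily as a `[Fact]`, and searching for `FlakyRetry.Execute` lists every test that has been allowed to be flaky.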

sharwell · 2019-12-05T19:34:50Z

test/Microsoft.ML.Core.Tests/UnitTests/TestEntryPoints.cs

 }

- //[Fact]
+ //[RetryFact]


📝 Stale comment.

💡 Tests should be disabled with a Skip attribute, not by commenting out the code.

sharwell · 2019-12-05T19:37:53Z

test/Microsoft.ML.TestFrameworkCommon/Attributes/MLNETFactAttribute.cs

+ {
+ /// <summary>
+ /// Number of retries allowed for a failed test. If unset (or set less than 1), will
+ /// default to 3 attempts.


Two issues here:

- If this attribute is used for all tests, it needs to default to 1.
- If this attribute is used only for tests known to be flaky, the maximum acceptable default is 2.

We intend to retry all failed test cases and log every failure for investigation. Retrying everything is meant to unblock everyone's PR builds for now, since the failures are random and frequent, and investigating flaky tests can be time-consuming.

In reply to: 354508994
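The disagreement is only about the fallback value, but that value multiplies the worst-case cost of a test that fails every time. A small sketch, assuming the attribute value counts total attempts (as the doc comment's "3 attempts" suggests) and using a hypothetical 5-minute per-attempt timeout; `RetryDefaults` and its methods are illustrative names, not part of the PR.

```csharp
using System;

public static class RetryDefaults
{
    // Mirrors the doc comment in the diff: an unset (0) or sub-1 value falls back
    // to a default. The fallback is a parameter so the PR's choice (3) and the
    // reviewer's suggestions (1 for all tests, at most 2 for known-flaky tests)
    // can be compared.
    public static int ResolveMaxAttempts(int configured, int fallback)
    {
        if (fallback < 1)
            throw new ArgumentOutOfRangeException(nameof(fallback), "At least one attempt is required.");
        return configured >= 1 ? configured : fallback;
    }

    // Upper bound on wall-clock time for a test that fails or times out on every attempt.
    public static TimeSpan WorstCaseDuration(int attempts, TimeSpan perAttemptTimeout) =>
        TimeSpan.FromTicks(perAttemptTimeout.Ticks * attempts);
}
```

With a 5-minute timeout, a test that hangs every time costs 15 minutes at 3 attempts versus 5 minutes at 1, which is one reason a default of 1 matters when the attribute is applied to every test.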

sharwell · 2019-12-05T19:38:49Z

test/Microsoft.ML.TestFrameworkCommon/Centrallizedlogger.cs

+
+namespace Microsoft.ML.TestFrameworkCommon
+{
+ public static class Centrallizedlogger


❔ Is this used by local developer builds or only on CI?

Both.

In reply to: 354509412

sharwell · 2019-12-05T19:43:51Z

test/Microsoft.ML.Functional.Tests/FunctionalTestBaseClass.cs

 TestName = test.TestCase.TestMethod.Method.Name;

- // write to the console when a test starts and stops so we can identify any test hangs/deadlocks in CI
- Console.WriteLine($"Starting test: {FullTestName}");


❔ Are these getting logged anywhere else?

Yes, they are logged in the MLNETTestCase class. Not all test classes inherit from this base test class, but we can change all tests to use MLNETFact so that the start and end of every test are logged.

In reply to: 354511751
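Logging the start and end of every test from one central wrapper, roughly the role the PR gives `MLNETTestCase`, could look like the sketch below. `TestLifecycleLog.Run` is a hypothetical helper, and the "Finished test" message format is an assumption; only the "Starting test" line appears in the removed code.

```csharp
using System;

public static class TestLifecycleLog
{
    // Writes the "Starting test" line the base class used to write, plus a matching
    // "Finished test" line, around any test body. Hypothetical helper for illustration.
    public static void Run(string fullTestName, Action body)
    {
        Console.WriteLine($"Starting test: {fullTestName}");
        bool passed = false;
        try
        {
            body();
            passed = true;
        }
        finally
        {
            // finally also runs when the body throws, so only a hang or process crash
            // leaves a "Starting" line without a matching "Finished" line in CI logs.
            Console.WriteLine($"Finished test: {fullTestName} ({(passed ? "passed" : "failed")})");
        }
    }
}
```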

frank-dong-ms-zz · 2019-12-06T18:22:14Z

test/Microsoft.Extensions.ML.Tests/FileLoaderTests.cs

 {
 public class FileLoaderTests
 {
- [Fact]


This Fact can do more than just retry failed tests: it is a better place to log the start and end of each test, set a default timeout for every test, handle exceptions, etc.
We could filter for tests known to be flaky and retry only those in the MLNETTestCase class, but I found that some test cases fail randomly with no obvious pattern, apart from a few tests that fail a lot. So what I do here is retry every failed test and log every failure to a database, even when the test succeeds after a retry. This should be a more reliable way to tell which tests are flaky.
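Logged attempts could separate flaky tests from consistently broken ones, assuming each run is stored with its attempt count and final outcome. `TestRun` and `FlakinessReport` are hypothetical; the PR's database schema is not shown here, so an in-memory list stands in for it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of one logged test run: attempts used and whether the last one passed.
public sealed record TestRun(string TestName, int Attempts, bool FinallyPassed);

public static class FlakinessReport
{
    // A run is "flaky" if it failed at least once but passed on a retry,
    // and a "hard failure" if every attempt failed.
    public static IReadOnlyList<(string TestName, int FlakyRuns, int HardFailures, int FailedAttempts)>
        Summarize(IEnumerable<TestRun> runs) =>
        runs.GroupBy(r => r.TestName)
            .Select(g => (
                TestName: g.Key,
                FlakyRuns: g.Count(r => r.FinallyPassed && r.Attempts > 1),
                HardFailures: g.Count(r => !r.FinallyPassed),
                // Failed attempts = all attempts except the passing one, if any.
                FailedAttempts: g.Sum(r => r.FinallyPassed ? r.Attempts - 1 : r.Attempts)))
            .Where(s => s.FailedAttempts > 0)          // drop tests that never failed
            .OrderByDescending(s => s.FailedAttempts)  // most failure-prone first
            .ToList();
}
```

Counting failed attempts rather than failed runs is what makes logging every failure worthwhile: a test that always passes on its second attempt never breaks the build, yet it still rises to the top of this list.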

frank-dong-ms added 7 commits November 27, 2019 16:07

add retry for all fact tests, default to 3 retry times if test fails

cdcabe1

add try catch to prevent unhandle exception crash test process

b39c50b

add default timeout for each test case

c8ec3a7

add timeout for benchmark tests

9425d8e

rename retry fact to mlnet fact, add retry test list

2ee8681

remove unnecessary reference

6e2900f

logging failed test to db

42612e2

frank-dong-ms added 7 commits December 3, 2019 18:05

try catch native exception

d2f2d7e

fix build error

62d9ffa

fix and rename

2261f1f

try to catch native exception to avoid test process crash

568e708

move the test db to our team's subscription

5c59e2b

undo change to benchmark test

4f05e6c

sync and merge

e14b243

frank-dong-ms-zz marked this pull request as ready for review December 5, 2019 04:14

frank-dong-ms-zz requested review from a team as code owners December 5, 2019 04:14

frank-dong-ms-zz changed the title ~~Only for test - test add fail retry for all tests~~ add fail retry for failed tests Dec 5, 2019

frank-dong-ms-zz requested review from codemzs, eerhardt and harishsk December 5, 2019 19:33

sharwell suggested changes Dec 5, 2019


frank-dong-ms-zz closed this Dec 5, 2019

frank-dong-ms-zz reopened this Dec 5, 2019

frank-dong-ms-zz commented Dec 6, 2019


frank-dong-ms-zz closed this Dec 13, 2019

frank-dong-ms-zz deleted the retry-tests branch February 3, 2020 21:36

ghost locked as resolved and limited conversation to collaborators Mar 19, 2022

6 participants
