Skip to content

Conversation

@frank-dong-ms-zz
Copy link
Contributor

@frank-dong-ms-zz frank-dong-ms-zz commented Nov 28, 2019

Add MLNETFactAttribute and replace default Fact Attribute:

  1. every test case will retry for 3 times at max by default if the test fails
  2. log flaky tests to a sql db to better estimate which tests are fail often
  3. add default timeout for every tests
  4. refine logging
@codecov
Copy link

codecov bot commented Dec 4, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@9fc8f1c). Click here to learn what that means.
The diff coverage is 41.04%.

@@ Coverage Diff @@ ## master #4510 +/- ## ========================================= Coverage ? 75.08% ========================================= Files ? 913 Lines ? 160347 Branches ? 17269 ========================================= Hits ? 120390 Misses ? 35140 Partials ? 4817
Flag Coverage Δ
#Debug 75.08% <41.04%> (?)
#production 70.51% <ø> (?)
#test 90.08% <41.04%> (?)
Impacted Files Coverage Δ
test/Microsoft.ML.Functional.Tests/DataIO.cs 100% <ø> (ø)
test/Microsoft.ML.Functional.Tests/Training.cs 100% <ø> (ø)
...osoft.ML.CodeAnalyzer.Tests/Code/BestFriendTest.cs 100% <ø> (ø)
...Microsoft.ML.Tests/Transformers/NormalizerTests.cs 100% <ø> (ø)
...crosoft.ML.AutoML.Tests/TransformInferenceTests.cs 100% <ø> (ø)
...est/Microsoft.ML.Core.Tests/UnitTests/DataTypes.cs 99.35% <ø> (ø)
...est/Microsoft.ML.Core.Tests/UnitTests/TestHosts.cs 100% <ø> (ø)
...ios/IrisPlantClassificationWithStringLabelTests.cs 98.63% <ø> (ø)
...t/Microsoft.ML.Core.Tests/UnitTests/ColumnTypes.cs 75.3% <ø> (ø)
test/Microsoft.ML.TimeSeries.Tests/TimeSeries.cs 87.85% <ø> (ø)
... and 134 more
@frank-dong-ms-zz frank-dong-ms-zz marked this pull request as ready for review December 5, 2019 04:14
@frank-dong-ms-zz frank-dong-ms-zz requested review from a team as code owners December 5, 2019 04:14
@frank-dong-ms-zz frank-dong-ms-zz changed the title Only for test - test add fail retry for all tests add fail retry for failed tests Dec 5, 2019
public class FileLoaderTests
{
[Fact]
[MLNETFact]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this should only be applied to tests that are known to be flaky.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+💯

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. And I think that was the second proposal. To apply this only to tests known to be flaky and if we still see failures, to retry the whole configuration.
Can you please amend the pull request as per your second proposal?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Actually in the dotnet/runtime repo we kept it simple instead of adding a FactAttribute a RetryHelper class which forces people to use that when a test is flaky and it has to be justified.

Also, that way you can also use it for theory tests not only facts.

https://github.com/dotnet/runtime/blob/master/src/libraries/Common/tests/CoreFx.Private.TestUtilities/System/RetryHelper.cs#L11

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, if we stick with an attribute I believe the name should express what the attribute does.

For example: RetryFactAttribute or RetryAbleFactAttribute.

}

//[Fact]
//[RetryFact]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 Stale comment.

💡 Tests should be disabled with a Skip attribute, not by commenting out the code.

{
/// <summary>
/// Number of retries allowed for a failed test. If unset (or set less than 1), will
/// default to 3 attempts.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two issues here:

  1. If this attribute is used for all tests, it needs to default to 1
  2. If this attribute is used only for tests known to be flaky, the maximum acceptable default is 2
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We intend to retry all failed test cases and log all the failing for investigate. All retry is to unblock every one from passing their PR build for now as failing is random and frequent and investigating flaky test can be time consuming.


In reply to: 354508994 [](ancestors = 354508994)


namespace Microsoft.ML.TestFrameworkCommon
{
public static class Centrallizedlogger
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❔ Is this used by local developer builds or only on CI?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both


In reply to: 354509412 [](ancestors = 354509412)

TestName = test.TestCase.TestMethod.Method.Name;

// write to the console when a test starts and stops so we can identify any test hangs/deadlocks in CI
Console.WriteLine($"Starting test: {FullTestName}");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❔ Are these getting logged anywhere else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, logged in MLNETTestCase class, not all test class inherinet from this base test class but we can change all tests to use MLNETFact to logging start and end of test


In reply to: 354511751 [](ancestors = 354511751)

{
public class FileLoaderTests
{
[Fact]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Fact can do more than just retry failed tests, example: better place to log start/end test, set default timeout for every test, exception handling etc...
we can filter tests known to be flaky and only retry them in MLNETTestCase Class but I found some test case are randomly failing no obvious pattern except some few tests are failing a lot. So what I do here is retry every failed tests and log all fails into a db even the test success after retry. This can be more reliable way to tell which tests are flaky.

@frank-dong-ms-zz frank-dong-ms-zz deleted the retry-tests branch February 3, 2020 21:36
@ghost ghost locked as resolved and limited conversation to collaborators Mar 19, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

6 participants