[ML] ML stats failures should not stop the usage API working #91917

droberts195 · 2022-11-24T16:00:07Z

It is possible to meddle with internal ML state such that calls to the ML stats APIs return errors. It is justifiable for these single-purpose APIs to return errors when the internal state of ML is corrupted. However, it is undesirable for these low-level problems to completely prevent the overall usage API from returning, because then callers cannot find out usage information from any part of the system.

This change makes errors in the ML stats APIs non-fatal to the overall response of the usage API. When an ML stats API returns an error, the corresponding section of the ML usage information will be blank.
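As a rough illustration of the behaviour described above, here is a minimal synchronous sketch. It is not the code in this PR: the class `NonFatalUsageSketch`, the methods `sectionOrEmpty` and `buildUsage`, and the section names `jobs` and `datafeeds` are all hypothetical, and the real transport action works with asynchronous listeners rather than suppliers.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: a failing stats call leaves its section empty
// instead of failing the whole usage response.
public class NonFatalUsageSketch {

    /** Runs one stats call; on failure, logs a warning and returns an empty section. */
    public static Map<String, Object> sectionOrEmpty(String name, Supplier<Map<String, Object>> statsCall) {
        try {
            return statsCall.get();
        } catch (RuntimeException e) {
            System.err.println("Failed to get " + name + " stats to include in ML usage: " + e.getMessage());
            return Map.of();
        }
    }

    /** Builds the overall usage map; each section is collected independently. */
    public static Map<String, Object> buildUsage(
        Supplier<Map<String, Object>> jobStats,
        Supplier<Map<String, Object>> datafeedStats
    ) {
        Map<String, Object> usage = new LinkedHashMap<>();
        usage.put("jobs", sectionOrEmpty("job", jobStats));
        usage.put("datafeeds", sectionOrEmpty("datafeed", datafeedStats));
        return usage;
    }
}
```

With this shape, a caller whose job stats call throws still receives the datafeed section, and the `jobs` section is simply empty.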

Fixes #91893


elasticsearchmachine · 2022-11-24T16:00:31Z

Pinging @elastic/ml-core (Team:ML)

elasticsearchmachine · 2022-11-24T16:00:32Z

Hi @droberts195, I've created a changelog YAML for you.

davidkyle · 2022-11-24T16:33:17Z

.../plugin/ml/src/main/java/org/elasticsearch/xpack/ml/MachineLearningUsageTransportAction.java

+ })),
+ e -> {
+ logger.warn("Failed to get job stats to include in ML usage", e);
+ client.execute(GetDatafeedsStatsAction.INSTANCE, datafeedStatsRequest, datafeedStatsListener);


This is difficult to read, but are you intentionally skipping the call to get job configs if get job stats fails?

Yes, because the addJobsUsage method requires both stats and configs, so if either fails then we can't call it.
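The control flow discussed in this thread can be sketched roughly as follows. This is a simplified stand-in, not the PR's code: `AsyncCall` is a hypothetical interface used here in place of the Elasticsearch client and listener types, and the step descriptions are illustrative strings rather than real usage data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of the listener chain: if job stats fail, the job
// configs call is skipped and the chain moves on to datafeed stats.
public class ListenerChainSketch {

    /** Hypothetical async call: invokes exactly one of the two callbacks. */
    public interface AsyncCall<T> {
        void execute(Consumer<T> onResponse, Consumer<Exception> onFailure);
    }

    public static List<String> collect(
        AsyncCall<String> jobStats,
        AsyncCall<String> jobConfigs,
        AsyncCall<String> datafeedStats
    ) {
        List<String> steps = new ArrayList<>();

        // Next stage of the chain, reached whether or not the jobs stage succeeded.
        Runnable datafeedStep = () -> datafeedStats.execute(
            r -> steps.add("datafeeds usage added"),
            e -> steps.add("datafeeds usage left empty")
        );

        jobStats.execute(
            // Stats succeeded: configs are also needed before jobs usage can be added.
            stats -> jobConfigs.execute(
                configs -> { steps.add("jobs usage added"); datafeedStep.run(); },
                e -> { steps.add("jobs usage left empty (configs failed)"); datafeedStep.run(); }
            ),
            // Stats failed: skip the configs call entirely and continue.
            e -> { steps.add("jobs usage left empty (stats failed)"); datafeedStep.run(); }
        );
        return steps;
    }
}
```

Skipping the configs call on a stats failure avoids a request whose result could not be used, since jobs usage needs both inputs.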


dimitris-athanasiou

LGTM


droberts195 added the >bug, :ml (Machine learning), cloud-deploy (Publish cloud docker image for Cloud-First-Testing), v7.17.8, v8.6.1, v8.7.0 and v8.5.3 labels Nov 24, 2022

elasticsearchmachine added the Team:ML (Meta label for the ML team) label Nov 24, 2022

Update docs/changelog/91917.yaml

17ae4ed

droberts195 added the auto-backport-and-merge label Nov 24, 2022

davidkyle reviewed Nov 24, 2022


David Roberts added 2 commits November 24, 2022 16:54

Skip the YAML test with security enabled

dce724e

The X-Pack usage endpoint requires the `monitor` privilege, but we don't want to grant that in our security tests, as it would also grant access to ML endpoints, which would interfere with testing of access rights to them.

Merge remote-tracking branch 'droberts195/ml_errors_should_not_block_…

40d121c

…usage_api' into ml_errors_should_not_block_usage_api

dimitris-athanasiou approved these changes Nov 24, 2022


droberts195 mentioned this pull request Nov 24, 2022

.ml indices that are closed prevent Kibana monitoring from displaying. #91893

Closed

droberts195 merged commit 2d74bb7 into elastic:main Nov 24, 2022

droberts195 deleted the ml_errors_should_not_block_usage_api branch November 24, 2022 17:53

droberts195 mentioned this pull request Nov 24, 2022

[8.6] [ML] ML stats failures should not stop the usage API working #91932

Merged

This was referenced Nov 24, 2022

[8.5] [ML] ML stats failures should not stop the usage API working #91933

Merged

[7.17] [ML] ML stats failures should not stop the usage API working #91936

Merged
