- Notifications
You must be signed in to change notification settings - Fork 25.6k
[ML] ML stats failures should not stop the usage API working #91917
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] ML stats failures should not stop the usage API working #91917
Conversation
It is possible to meddle with internal ML state such that calls to the ML stats APIs return errors. It is justifiable for these single purpose APIs to return errors when the internal state of ML is corrupted. However, it is undesirable for these low level problems to completely prevent the overall usage API from returning, because then callers cannot find out usage information from any part of the system. This change makes errors in the ML stats APIs non-fatal to the overall response of the usage API. When an ML stats APIs returns an error, the corresponding section of the ML usage information will be blank. Fixes elastic#91893
Pinging @elastic/ml-core (Team:ML) |
Hi @droberts195, I've created a changelog YAML for you. |
})), | ||
e -> { | ||
logger.warn("Failed to get job stats to include in ML usage", e); | ||
client.execute(GetDatafeedsStatsAction.INSTANCE, datafeedStatsRequest, datafeedStatsListener); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is difficult to read but are you intentionally skipping the call to get job configs if get job stats fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, because the addJobsUsage
method requires both stats and configs, so if either fails then we can't call it.
The X-Pack usage endpoint requires the `monitor` privilege, but we don't want to grant that in our security tests as it would also grant access to ML endpoints which would interfere with testing of access rights them.
…usage_api' into ml_errors_should_not_block_usage_api
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
It is possible to meddle with internal ML state such that calls to the ML stats APIs return errors. It is justifiable for these single purpose APIs to return errors when the internal state of ML is corrupted. However, it is undesirable for these low level problems to completely prevent the overall usage API from returning, because then callers cannot find out usage information from any part of the system. This change makes errors in the ML stats APIs non-fatal to the overall response of the usage API. When an ML stats APIs returns an error, the corresponding section of the ML usage information will be blank. Backport of elastic#91917
It is possible to meddle with internal ML state such that calls to the ML stats APIs return errors. It is justifiable for these single purpose APIs to return errors when the internal state of ML is corrupted. However, it is undesirable for these low level problems to completely prevent the overall usage API from returning, because then callers cannot find out usage information from any part of the system. This change makes errors in the ML stats APIs non-fatal to the overall response of the usage API. When an ML stats APIs returns an error, the corresponding section of the ML usage information will be blank. Backport of elastic#91917
…91932) It is possible to meddle with internal ML state such that calls to the ML stats APIs return errors. It is justifiable for these single purpose APIs to return errors when the internal state of ML is corrupted. However, it is undesirable for these low level problems to completely prevent the overall usage API from returning, because then callers cannot find out usage information from any part of the system. This change makes errors in the ML stats APIs non-fatal to the overall response of the usage API. When an ML stats APIs returns an error, the corresponding section of the ML usage information will be blank. Backport of #91917
…91933) It is possible to meddle with internal ML state such that calls to the ML stats APIs return errors. It is justifiable for these single purpose APIs to return errors when the internal state of ML is corrupted. However, it is undesirable for these low level problems to completely prevent the overall usage API from returning, because then callers cannot find out usage information from any part of the system. This change makes errors in the ML stats APIs non-fatal to the overall response of the usage API. When an ML stats APIs returns an error, the corresponding section of the ML usage information will be blank. Backport of #91917
…91936) It is possible to meddle with internal ML state such that calls to the ML stats APIs return errors. It is justifiable for these single purpose APIs to return errors when the internal state of ML is corrupted. However, it is undesirable for these low level problems to completely prevent the overall usage API from returning, because then callers cannot find out usage information from any part of the system. This change makes errors in the ML stats APIs non-fatal to the overall response of the usage API. When an ML stats APIs returns an error, the corresponding section of the ML usage information will be blank. Backport of #91917
It is possible to meddle with internal ML state such that calls to the ML stats APIs return errors. It is justifiable for these single purpose APIs to return errors when the internal state of ML is corrupted. However, it is undesirable for these low level problems to completely prevent the overall usage API from returning, because then callers cannot find out usage information from any part of the system.
This change makes errors in the ML stats APIs non-fatal to the overall response of the usage API. When an ML stats APIs returns an error, the corresponding section of the ML usage information will be blank.
Fixes #91893