- Notifications
You must be signed in to change notification settings - Fork 25.6k
Correctly log exceptions that are thrown during cache prewarming #74419
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| Pinging @elastic/es-distributed (Team:Distributed) |
henningandersen left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
| } catch (IOException e) { | ||
| logger.warn(() -> new ParameterizedMessage("{} unable to prewarm file [{}]", shardId, file.physicalName()), e); | ||
| if (submitted == false) { | ||
| completionListener.onFailure(e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this attempts to ensure that the recovery state moves to DONE? I guess that is more correct than FINALIZE, but I think we should add a test for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It attempts to complete the grouped listener but has not other side effect besides logging a warning message if the shard is a relocation target:
Lines 78 to 90 in a5d2251
| if (success && shardRouting.isRelocationTarget()) { | |
| final Runnable preWarmCondition = indexShard.addCleanFilesDependency(); | |
| preWarmListener.whenComplete(v -> preWarmCondition.run(), e -> { | |
| logger.warn( | |
| new ParameterizedMessage( | |
| "pre-warm operation failed for [{}] while it was the target of primary relocation [{}]", | |
| shardRouting.shardId(), | |
| shardRouting | |
| ), | |
| e | |
| ); | |
| preWarmCondition.run(); | |
| }); |
I'm OK with leaving the recovery state in FINALIZE as this is what allowed to detect a non-fully prewarmed shard at the first place. I think that adding more logs would have helped to troubleshoot some issues when accessing repository files.
| Thanks @henningandersen - I've merged it like this, see #74419 (comment) |
…stic#74419) Today if an exception is thrown during the prewarming of (parts of) the files it is simply swallowed. This is confusing because the shard recovery state is left in a FINALIZE state but nothing in the logs can help troubleshooting why the recovery has been interrupted. This commit adds log traces to warn when an exception occurred during the prewarming of a part of a file.
…stic#74419) Today if an exception is thrown during the prewarming of (parts of) the files it is simply swallowed. This is confusing because the shard recovery state is left in a FINALIZE state but nothing in the logs can help troubleshooting why the recovery has been interrupted. This commit adds log traces to warn when an exception occurred during the prewarming of a part of a file.
#74606 Today if an exception is thrown during the prewarming of (parts of) the files it is simply swallowed. This is confusing because the shard recovery state is left in a FINALIZE state but nothing in the logs can help troubleshooting why the recovery has been interrupted. This commit adds log traces to warn when an exception occurred during the prewarming of a part of a file. Backport of #74419
…#74604 Today if an exception is thrown during the prewarming of (parts of) the files it is simply swallowed. This is confusing because the shard recovery state is left in a FINALIZE state but nothing in the logs can help troubleshooting why the recovery has been interrupted. This commit adds log traces to warn when an exception occurred during the prewarming of a part of a file. Backport of #74419
Today if an exception is thrown during the prewarming of (parts of) the files it is simply swallowed. This is confusing because the shard recovery state is left in a
FINALIZEstate but nothing in the logs can help troubleshooting why the recovery has been interrupted.This pull request adds log traces to warn when an exception occurred during the prewarming of a part of a file. In the issue we
encountered it would have logged something like this: