- Notifications
You must be signed in to change notification settings - Fork 25.5k
Track & log when there is insufficient disk space available to execute merges #131711
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Track & log when there is insufficient disk space available to execute merges #131711
Conversation
enqueued merge tasks
Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing) |
Hi @albertzaharovits, I've created a changelog YAML for you. |
assert super.lock.isHeldByCurrentThread(); | ||
Tuple<MergeTask, Long> head = enqueuedByBudget.peek(); | ||
if (head != null && head.v2() > availableBudget) { | ||
LOGGER.warn("There are merge tasks enqueued but there's insufficient disk space available to execute them"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These log messages will be printed every indices.merge.disk.check_interval
(5 sec). That sounds Ok to you, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be useful to include the availableBudget
and head.v2()
values in the log message too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, thanks for suggesting! Pushed: d334433
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable to me but I'm probably not best-placed to review this properly. Could you try someone from the distrib/indexing team instead?
Thanks for looking, David! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I left minor comments
assert super.lock.isHeldByCurrentThread(); | ||
Tuple<MergeTask, Long> head = enqueuedByBudget.peek(); | ||
if (head != null && head.v2() > availableBudget) { | ||
LOGGER.warn("There are merge tasks enqueued but there's insufficient disk space available to execute them"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be useful to include the availableBudget
and head.v2()
values in the log message too?
Locale.ROOT, | ||
"There are no merge tasks currently running, " | ||
+ "but there are [%d] enqueued ones that are blocked because of insufficient disk space", | ||
enqueuedByBudget.size() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, would it be useful to have the availableBudget and head.v2() values in the log message here?
messageBuilder.append( | ||
"], and there are [" | ||
+ enqueuedByBudget.size() | ||
+ "] additional enqueued ones that are blocked because of insufficient disk space" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And here too?
} | ||
} | ||
| ||
void postBudgetUpdate() {}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would assert super.lock.isHeldByCurrentThread(); here too
…e merges (elastic#131711) This goal of this PR is to record information about the available disk space at the time that merge tasks are scheduled (the info will show up in heap dumps). This should aid troubleshooting in cases where there are running merge tasks all while the available disk space on the node is below the watermark. In this case we should increase the threshold for scheduling merge tasks `indices.merge.disk.watermark.high`, and possibly consider implementing aborting already executing merges too. Relates elastic#88606 (comment)
…e merges (elastic#131711) This goal of this PR is to record information about the available disk space at the time that merge tasks are scheduled (the info will show up in heap dumps). This should aid troubleshooting in cases where there are running merge tasks all while the available disk space on the node is below the watermark. In this case we should increase the threshold for scheduling merge tasks `indices.merge.disk.watermark.high`, and possibly consider implementing aborting already executing merges too. Relates elastic#88606 (comment)
…e merges (#131711) (#131772) This goal of this PR is to record information about the available disk space at the time that merge tasks are scheduled (the info will show up in heap dumps). This should aid troubleshooting in cases where there are running merge tasks all while the available disk space on the node is below the watermark. In this case we should increase the threshold for scheduling merge tasks `indices.merge.disk.watermark.high`, and possibly consider implementing aborting already executing merges too. Relates #88606 (comment)
…e merges (#131711) (#131773) This goal of this PR is to record information about the available disk space at the time that merge tasks are scheduled (the info will show up in heap dumps). This should aid troubleshooting in cases where there are running merge tasks all while the available disk space on the node is below the watermark. In this case we should increase the threshold for scheduling merge tasks `indices.merge.disk.watermark.high`, and possibly consider implementing aborting already executing merges too. Relates #88606 (comment)
…e merges (#131711) (#131774) This goal of this PR is to record information about the available disk space at the time that merge tasks are scheduled (the info will show up in heap dumps). This should aid troubleshooting in cases where there are running merge tasks all while the available disk space on the node is below the watermark. In this case we should increase the threshold for scheduling merge tasks `indices.merge.disk.watermark.high`, and possibly consider implementing aborting already executing merges too. Relates #88606 (comment)
This goal of this PR is to record information about the available disk space at the time that merge tasks are scheduled (the info will show up in heap dumps).
This should aid troubleshooting in cases where there are running merge tasks all while the available disk space on the node is below the watermark. In this case we should increase the threshold for scheduling merge tasks
indices.merge.disk.watermark.high
, and possibly consider implementing aborting already executing merges too.Relates #88606 (comment)