Skip to content

Conversation

DonalEvans
Copy link
Contributor

AbstractProcessWorkerExecutorService.notifyQueueRunnables() was making an incorrect assumption that all AbstractRunnables that were submitted for execution would be queued as AbstractRunnables. However, PriorityProcessWorkerExecutorService wraps AbstractRunnables in OrderedRunnable before queueing them, and since OrderedRunnable is not an AbstractRunnable, these were skipped when notifyQueueRunnables() drained the queue, leading to potential hangs.

  • Refactor notifyQueueRunnables() to allow PriorityProcessWorkerExecutorService to notify the AbstractRunnable contained within queued OrderedRunnables
  • Ensure that notifyQueueRunnables() is called and the executor marked as shut down if an exception is thrown from start()
  • Add unit tests

Closes #134651

AbstractProcessWorkerExecutorService.notifyQueueRunnables() was making an incorrect assumption that all AbstractRunnables that were submitted for execution would be queued as AbstractRunnables. However, PriorityProcessWorkerExecutorService wraps AbstractRunnables in OrderedRunnable before queueing them, and since OrderedRunnable is not an AbstractRunnable, these were skipped when notifyQueueRunnables() drained the queue, leading to potential hangs. - Refactor notifyQueueRunnables() to allow PriorityProcessWorkerExecutorService to notify the AbstractRunnable contained within queued OrderedRunnables - Ensure that notifyQueueRunnables() is called and the executor marked as shut down if an exception is thrown from start() - Add unit tests Closes elastic#134651
@DonalEvans DonalEvans added >bug :ml Machine learning Team:ML Meta label for the ML team auto-backport Automatically create backport pull requests when merged v8.19.6 v9.1.6 v9.0.9 v8.18.9 v9.2.1 v9.3.0 labels Oct 3, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine
Copy link
Collaborator

Hi @DonalEvans, I've created a changelog YAML for you.

Copy link
Contributor

@jonathan-buttner jonathan-buttner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, left one suggestion

}
}

protected abstract void notifyIfAbstractRunnable(T runnable, Exception ex, String msg);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about having AbstractProcessWorkerExecutorService contain the logic for notifyIfAbstractRunnable and relying on an abstract method like getAsAbstractRunnable which either returns the AbstractRunnable or null. That way child classes don't need to call back into the parent to do the notification, they would only return the abstract runnable if it was one.

Something like:

 protected abstract AbstractRunnable getAsAbstractRunnable(T runnable); private void notifyIfAbstractRunnable(T runnable, Exception ex, String msg) { var abstractRunnable = getAsAbstractRunnable(runnable); if (abstractRunnable != null) { notifyAbstractRunnable(ex, msg, abstractRunnable); } } 

Then PriorityProcessWorkerExecutorService would have something like:

 @Override protected AbstractRunnable getAsAbstractRunnable(OrderedRunnable orderedRunnable, Exception ex, String msg) { // The runnable contained within OrderedRunnable is always an AbstractRunnable, so no need to check the type return orderedRunnable.runnable(); } 
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, this is better

* most cases will be the insertion order
*/
public record OrderedRunnable(RequestPriority priority, long tieBreaker, Runnable runnable)
protected record OrderedRunnable(RequestPriority priority, long tieBreaker, AbstractRunnable runnable)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option is make OrderedRunnable extend AbstractRunnable and implement the required methods as calling the runnable parameter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This solution adds more code than the original one, but I think it might be better, since it simplifies the logic for determining what we need to notify.

@DonalEvans DonalEvans merged commit d3d013e into elastic:main Oct 7, 2025
34 checks passed
DonalEvans added a commit to DonalEvans/elasticsearch that referenced this pull request Oct 7, 2025
…elastic#135966) AbstractProcessWorkerExecutorService.notifyQueueRunnables() was making an incorrect assumption that all AbstractRunnables that were submitted for execution would be queued as AbstractRunnables. However, PriorityProcessWorkerExecutorService wraps AbstractRunnables in OrderedRunnable before queueing them, and since OrderedRunnable is not an AbstractRunnable, these were skipped when notifyQueueRunnables() drained the queue, leading to potential hangs. - Make OrderedRunnable extend AbstractRunnable and pass onFailure() and onRejection() calls to the AbstractRunnable it wraps - Ensure that notifyQueueRunnables() is called and the executor marked as shut down if an exception is thrown from start() - Add unit tests - Update docs/changelog/135966.yaml Closes elastic#134651
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
9.2
8.19 Commit could not be cherrypicked due to conflicts
9.1 Commit could not be cherrypicked due to conflicts
9.0 Commit could not be cherrypicked due to conflicts
8.18 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 135966

DonalEvans added a commit to DonalEvans/elasticsearch that referenced this pull request Oct 7, 2025
…elastic#135966) AbstractProcessWorkerExecutorService.notifyQueueRunnables() was making an incorrect assumption that all AbstractRunnables that were submitted for execution would be queued as AbstractRunnables. However, PriorityProcessWorkerExecutorService wraps AbstractRunnables in OrderedRunnable before queueing them, and since OrderedRunnable is not an AbstractRunnable, these were skipped when notifyQueueRunnables() drained the queue, leading to potential hangs. - Make OrderedRunnable extend AbstractRunnable and pass onFailure() and onRejection() calls to the AbstractRunnable it wraps - Ensure that notifyQueueRunnables() is called and the executor marked as shut down if an exception is thrown from start() - Add unit tests - Update docs/changelog/135966.yaml Closes elastic#134651 (cherry picked from commit d3d013e)
@DonalEvans
Copy link
Contributor Author

💚 All backports created successfully

Status Branch Result
9.1
9.0
8.19
8.18

Questions ?

Please refer to the Backport tool documentation

DonalEvans added a commit to DonalEvans/elasticsearch that referenced this pull request Oct 7, 2025
…elastic#135966) AbstractProcessWorkerExecutorService.notifyQueueRunnables() was making an incorrect assumption that all AbstractRunnables that were submitted for execution would be queued as AbstractRunnables. However, PriorityProcessWorkerExecutorService wraps AbstractRunnables in OrderedRunnable before queueing them, and since OrderedRunnable is not an AbstractRunnable, these were skipped when notifyQueueRunnables() drained the queue, leading to potential hangs. - Make OrderedRunnable extend AbstractRunnable and pass onFailure() and onRejection() calls to the AbstractRunnable it wraps - Ensure that notifyQueueRunnables() is called and the executor marked as shut down if an exception is thrown from start() - Add unit tests - Update docs/changelog/135966.yaml Closes elastic#134651 (cherry picked from commit d3d013e)
DonalEvans added a commit to DonalEvans/elasticsearch that referenced this pull request Oct 7, 2025
…elastic#135966) AbstractProcessWorkerExecutorService.notifyQueueRunnables() was making an incorrect assumption that all AbstractRunnables that were submitted for execution would be queued as AbstractRunnables. However, PriorityProcessWorkerExecutorService wraps AbstractRunnables in OrderedRunnable before queueing them, and since OrderedRunnable is not an AbstractRunnable, these were skipped when notifyQueueRunnables() drained the queue, leading to potential hangs. - Make OrderedRunnable extend AbstractRunnable and pass onFailure() and onRejection() calls to the AbstractRunnable it wraps - Ensure that notifyQueueRunnables() is called and the executor marked as shut down if an exception is thrown from start() - Add unit tests - Update docs/changelog/135966.yaml Closes elastic#134651 (cherry picked from commit d3d013e)
elasticsearchmachine pushed a commit that referenced this pull request Oct 7, 2025
…#135966) (#136122) AbstractProcessWorkerExecutorService.notifyQueueRunnables() was making an incorrect assumption that all AbstractRunnables that were submitted for execution would be queued as AbstractRunnables. However, PriorityProcessWorkerExecutorService wraps AbstractRunnables in OrderedRunnable before queueing them, and since OrderedRunnable is not an AbstractRunnable, these were skipped when notifyQueueRunnables() drained the queue, leading to potential hangs. - Make OrderedRunnable extend AbstractRunnable and pass onFailure() and onRejection() calls to the AbstractRunnable it wraps - Ensure that notifyQueueRunnables() is called and the executor marked as shut down if an exception is thrown from start() - Add unit tests - Update docs/changelog/135966.yaml Closes #134651
elasticsearchmachine pushed a commit that referenced this pull request Oct 7, 2025
…#135966) (#136126) AbstractProcessWorkerExecutorService.notifyQueueRunnables() was making an incorrect assumption that all AbstractRunnables that were submitted for execution would be queued as AbstractRunnables. However, PriorityProcessWorkerExecutorService wraps AbstractRunnables in OrderedRunnable before queueing them, and since OrderedRunnable is not an AbstractRunnable, these were skipped when notifyQueueRunnables() drained the queue, leading to potential hangs. - Make OrderedRunnable extend AbstractRunnable and pass onFailure() and onRejection() calls to the AbstractRunnable it wraps - Ensure that notifyQueueRunnables() is called and the executor marked as shut down if an exception is thrown from start() - Add unit tests - Update docs/changelog/135966.yaml Closes #134651 (cherry picked from commit d3d013e)
elasticsearchmachine pushed a commit that referenced this pull request Oct 7, 2025
…#135966) (#136128) AbstractProcessWorkerExecutorService.notifyQueueRunnables() was making an incorrect assumption that all AbstractRunnables that were submitted for execution would be queued as AbstractRunnables. However, PriorityProcessWorkerExecutorService wraps AbstractRunnables in OrderedRunnable before queueing them, and since OrderedRunnable is not an AbstractRunnable, these were skipped when notifyQueueRunnables() drained the queue, leading to potential hangs. - Make OrderedRunnable extend AbstractRunnable and pass onFailure() and onRejection() calls to the AbstractRunnable it wraps - Ensure that notifyQueueRunnables() is called and the executor marked as shut down if an exception is thrown from start() - Add unit tests - Update docs/changelog/135966.yaml Closes #134651 (cherry picked from commit d3d013e)
elasticsearchmachine pushed a commit that referenced this pull request Oct 7, 2025
…#135966) (#136125) AbstractProcessWorkerExecutorService.notifyQueueRunnables() was making an incorrect assumption that all AbstractRunnables that were submitted for execution would be queued as AbstractRunnables. However, PriorityProcessWorkerExecutorService wraps AbstractRunnables in OrderedRunnable before queueing them, and since OrderedRunnable is not an AbstractRunnable, these were skipped when notifyQueueRunnables() drained the queue, leading to potential hangs. - Make OrderedRunnable extend AbstractRunnable and pass onFailure() and onRejection() calls to the AbstractRunnable it wraps - Ensure that notifyQueueRunnables() is called and the executor marked as shut down if an exception is thrown from start() - Add unit tests - Update docs/changelog/135966.yaml Closes #134651 (cherry picked from commit d3d013e)
elasticsearchmachine pushed a commit that referenced this pull request Oct 7, 2025
…#135966) (#136127) AbstractProcessWorkerExecutorService.notifyQueueRunnables() was making an incorrect assumption that all AbstractRunnables that were submitted for execution would be queued as AbstractRunnables. However, PriorityProcessWorkerExecutorService wraps AbstractRunnables in OrderedRunnable before queueing them, and since OrderedRunnable is not an AbstractRunnable, these were skipped when notifyQueueRunnables() drained the queue, leading to potential hangs. - Make OrderedRunnable extend AbstractRunnable and pass onFailure() and onRejection() calls to the AbstractRunnable it wraps - Ensure that notifyQueueRunnables() is called and the executor marked as shut down if an exception is thrown from start() - Add unit tests - Update docs/changelog/135966.yaml Closes #134651 (cherry picked from commit d3d013e)
@DonalEvans DonalEvans deleted the 134651-reindex-hang branch October 8, 2025 20:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged backport pending >bug :ml Machine learning Team:ML Meta label for the ML team v8.18.9 v8.19.6 v9.0.9 v9.1.6 v9.2.0 v9.2.1 v9.3.0

4 participants