
Conversation

mosche
Contributor

@mosche mosche commented Mar 13, 2025

This PR changes EsExecutors.newScaling to not use ExecutorScalingQueue if max pool size is 1 or equals core pool size, and to use a regular LinkedTransferQueue instead. This fixes the known cases of #124667.

  • The critical configuration that caused #124667 (Scaling EsExecutors with core size 0 might starve work due to missing workers) in masterService#updateTask (and similar) is core pool size = 0 and max pool size = 1. When ExecutorScalingQueue rejects a task offer while the pool is already at its max pool size (of 1), no new worker can be added. The only worker might time out at just about the same time the task is then force queued via ForceQueuePolicy, causing the task to starve on the queue forever (or until another task is submitted).

  • If core pool size = max pool size, ExecutorScalingQueue behaves the same as a regular LinkedTransferQueue. While this configuration isn't affected by the bug, we shouldn't be using ExecutorScalingQueue unless we explicitly need to scale up to max pool size.

If max pool size > 1, a probing approach is used to ensure the worker pool is adequately scaled to at least 1 worker after force queueing work in ForceQueuePolicy.
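
For illustration, the overall shape of these two changes is roughly as follows. This is a minimal sketch only; `chooseQueue`, `forceQueueAndProbe` and `HypotheticalExecutorScalingQueue` are placeholder names, not the actual Elasticsearch implementation:

```java
import java.util.concurrent.LinkedTransferQueue;
import java.util.concurrent.ThreadPoolExecutor;

class ScalingExecutorSketch {

    // Queue selection: only use the scaling queue when the pool can actually grow
    // beyond a single worker; otherwise a plain LinkedTransferQueue is sufficient
    // and avoids the rejection/timeout race described above.
    static LinkedTransferQueue<Runnable> chooseQueue(int corePoolSize, int maxPoolSize) {
        if (maxPoolSize == 1 || maxPoolSize == corePoolSize) {
            return new LinkedTransferQueue<>();
        }
        return new HypotheticalExecutorScalingQueue<>();
    }

    // Probing idea for the force-queue rejection handler: after appending the rejected
    // task to the queue, make sure at least one worker exists to drain it, e.g. by
    // submitting a cheap no-op if all workers have timed out in the meantime.
    static void forceQueueAndProbe(ThreadPoolExecutor executor, LinkedTransferQueue<Runnable> queue, Runnable task) {
        queue.add(task);
        if (executor.getPoolSize() == 0) {
            executor.execute(() -> {}); // probe that forces the pool to spawn a worker
        }
    }

    // Placeholder standing in for the real scaling queue; details omitted.
    static class HypotheticalExecutorScalingQueue<E> extends LinkedTransferQueue<E> {
    }
}
```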

Note that at its core this issue arises from using unbounded queues with scaling executors. As explained in more detail in the Javadocs, this requires error-prone customizations rather than relying on solid, proven JDK implementations. #18613 captures necessary, but long outstanding, improvements.

Additionally, validation of thread_pool settings was improved in ScalingExecutorBuilder, enforcing:

  • core size >= 0
  • max size >= 1
  • keep alive time >= 0

These are non-breaking improvements. Previously, invalid values resulted in a cryptic IllegalArgumentException when initializing the thread pool and prevented the node from starting.
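
As a rough illustration of the new constraints (a hypothetical helper mirroring the checks, not the actual ScalingExecutorBuilder code, which defines them on the settings themselves):

```java
class ScalingSettingsValidationSketch {
    // Hypothetical validation helper mirroring the constraints listed above.
    static void validateScalingPoolSettings(int coreSize, int maxSize, long keepAliveMillis) {
        if (coreSize < 0) {
            throw new IllegalArgumentException("core size must be >= 0, was [" + coreSize + "]");
        }
        if (maxSize < 1) {
            throw new IllegalArgumentException("max size must be >= 1, was [" + maxSize + "]");
        }
        if (keepAliveMillis < 0) {
            throw new IllegalArgumentException("keep alive time must be >= 0, was [" + keepAliveMillis + "]");
        }
    }
}
```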

Relates to ES-10640

@mosche mosche added the >bug, >breaking, :Core/Infra/Core (Core issues without another label) labels Mar 13, 2025
@mosche mosche requested review from a team and DaveCTurner March 13, 2025 11:18
@elasticsearchmachine elasticsearchmachine added the Team:Core/Infra (Meta label for core/infra team) label Mar 13, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@elasticsearchmachine
Collaborator

Hi @mosche, I've created a changelog YAML for you. Note that since this PR is labelled >breaking, you need to update the changelog YAML to fill out the extended information sections.

@mosche mosche added the auto-backport (Automatically create backport pull requests when merged), v8.18.1, v8.19.0, v9.0.1, v8.17.4 labels and removed the v8.17.4 label Mar 13, 2025
Contributor

@DaveCTurner DaveCTurner left a comment


This area just gets more and more mysterious tbh. I'm sure I could reason through it again, given enough time, but still I'd rather we put some effort into describing the overall design and the constraints that mean we cannot just use standard JDK stuff here throughout.

@mosche
Contributor Author

mosche commented Mar 13, 2025

Wow, this is getting more and more interesting: the test case is failing on CI even when using core=1/max=1 with allowCoreThreadTimeOut=true. In this case it's the default JDK behavior without any ES-added magic. Wondering if it's a JDK bug after all ....
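
For reference, the plain JDK configuration being referred to looks roughly like this (a minimal sketch of a JDK ThreadPoolExecutor with that shape, not the ES test code):

```java
import java.util.concurrent.LinkedTransferQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CoreThreadTimeoutSketch {
    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
            1, 1,                          // core pool size = max pool size = 1
            100, TimeUnit.MILLISECONDS,    // short keep-alive so an idle worker times out quickly
            new LinkedTransferQueue<>()    // plain unbounded JDK queue, no ES customization
        );
        executor.allowCoreThreadTimeOut(true); // let the single core worker time out when idle

        executor.execute(() -> System.out.println("task ran"));

        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.SECONDS);
    }
}
```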

@mosche mosche removed the >breaking label Mar 13, 2025
@elasticsearchmachine
Collaborator

Hi @mosche, I've updated the changelog YAML for you.

Comment on lines -266 to +271

-    return Settings.builder().put("search.default_search_timeout", "5s").build();
+    return Settings.builder()
+        .put("search.default_search_timeout", "5s")
+        .put("thread_pool.search.size", SEARCH_POOL_SIZE) // customized search pool size, reconfiguring at runtime is unsupported
+        .build();
Contributor Author


I think that without adjusting the max pool size, we may not be testing what we want to test here, because the creation of slices to parallelize search execution depends on the max pool size (as well as on other details).

@javanna max (= core) pool size is adjusted by means of the above. This is now done deterministically for all tests; previously the change applied to all tests running after testSlicingBehaviourForParallelCollection.

@mosche mosche merged commit 36874e8 into elastic:main Mar 16, 2025
16 checks passed
mosche added a commit to mosche/elasticsearch that referenced this pull request Mar 16, 2025
…h core pool size = 0 (elastic#124732) When `ExecutorScalingQueue` rejects work to make the worker pool scale up while already being at max pool size (and a new worker consequently cannot be added), available workers might timeout just about at the same time as the task is then force queued by `ForceQueuePolicy`. This has caused starvation of work as observed for `masterService#updateTask` in elastic#124667 where max pool size 1 is used. This configuration is most likely to expose the bug. This PR changes `EsExecutors.newScaling` to not use `ExecutorScalingQueue` if max pool size is 1 (and core pool size is 0). A regular `LinkedTransferQueue` works perfectly fine in this case. If max pool size > 1, a probing approach is used to ensure the worker pool is adequately scaled to at least 1 worker after force queueing work in `ForceQueuePolicy`. Fixes elastic#124667 Relates to elastic#18613
@elasticsearchmachine
Collaborator

💔 Backport failed

Status Branch Result
8.18 Commit could not be cherrypicked due to conflicts
8.x Commit could not be cherrypicked due to conflicts
9.0

You can use sqren/backport to manually backport by running `backport --upstream elastic/elasticsearch --pr 124732`

elasticsearchmachine pushed a commit that referenced this pull request Mar 16, 2025
…h core pool size = 0 (#124732) (#124965) When `ExecutorScalingQueue` rejects work to make the worker pool scale up while already being at max pool size (and a new worker consequently cannot be added), available workers might timeout just about at the same time as the task is then force queued by `ForceQueuePolicy`. This has caused starvation of work as observed for `masterService#updateTask` in #124667 where max pool size 1 is used. This configuration is most likely to expose the bug. This PR changes `EsExecutors.newScaling` to not use `ExecutorScalingQueue` if max pool size is 1 (and core pool size is 0). A regular `LinkedTransferQueue` works perfectly fine in this case. If max pool size > 1, a probing approach is used to ensure the worker pool is adequately scaled to at least 1 worker after force queueing work in `ForceQueuePolicy`. Fixes #124667 Relates to #18613
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Mar 17, 2025
* main: (95 commits) Mute org.elasticsearch.datastreams.lifecycle.DataStreamLifecycleServiceIT testLifecycleAppliedToFailureStore elastic#124999 Merge template mappings properly during validation (elastic#124784) [Build] Rework internal build plugin plugin to work with Isolated Projects (elastic#123461) [Build] Require reason for usesDefaultDistribution (elastic#124707) Mute org.elasticsearch.packaging.test.DockerTests test011SecurityEnabledStatus elastic#124990 Mute org.elasticsearch.xpack.ilm.TimeSeriesDataStreamsIT testRolloverAction elastic#124987 Mute org.elasticsearch.packaging.test.BootstrapCheckTests test10Install elastic#124957 Mute org.elasticsearch.integration.DataStreamLifecycleServiceRuntimeSecurityIT testRolloverLifecycleAndForceMergeAuthorized elastic#124978 Mute org.elasticsearch.xpack.esql.action.CrossClusterAsyncQueryStopIT testStopQuery elastic#124977 Mute org.elasticsearch.xpack.esql.action.CrossClusterAsyncQueryStopIT testStopQueryLocal elastic#121672 Mention zero-window state in networking docs (elastic#124967) Remove remoteAddress field from TransportResponse (elastic#120016) Include failures in partial response (elastic#124929) Prevent work starvation bug if using scaling EsThreadPoolExecutor with core pool size = 0 (elastic#124732) Re-enable analysis stemmer test (elastic#124961) Mute org.elasticsearch.xpack.esql.action.CrossClusterAsyncQueryStopIT testStopQueryLocalNoRemotes elastic#124959 ESQL: Catch parsing exception (elastic#124958) ESQL: Improve error message for ( and [ (elastic#124177) Mute org.elasticsearch.xpack.esql.qa.single_node.EsqlSpecIT test {lookup-join.MvJoinKeyFromRow SYNC} elastic#124951 Mute org.elasticsearch.datastreams.lifecycle.DataStreamLifecycleServiceIT testErrorRecordingOnRetention elastic#124950 ... # Conflicts: #	server/src/main/java/org/elasticsearch/index/mapper/vectors/DenseVectorFieldMapper.java #	server/src/test/java/org/elasticsearch/index/mapper/vectors/DenseVectorFieldTypeTests.java
@mosche
Contributor Author

mosche commented Mar 17, 2025

@rjernst for how far back should this be backported?

@rjernst
Member

rjernst commented Mar 18, 2025

If we can get it to 8.18/8.19 that would be good for long term maintainability.

mosche added a commit to mosche/elasticsearch that referenced this pull request Mar 18, 2025
…h core pool size = 0 (elastic#124732) When `ExecutorScalingQueue` rejects work to make the worker pool scale up while already being at max pool size (and a new worker consequently cannot be added), available workers might timeout just about at the same time as the task is then force queued by `ForceQueuePolicy`. This has caused starvation of work as observed for `masterService#updateTask` in elastic#124667 where max pool size 1 is used. This configuration is most likely to expose the bug. This PR changes `EsExecutors.newScaling` to not use `ExecutorScalingQueue` if max pool size is 1 (and core pool size is 0). A regular `LinkedTransferQueue` works perfectly fine in this case. If max pool size > 1, a probing approach is used to ensure the worker pool is adequately scaled to at least 1 worker after force queueing work in `ForceQueuePolicy`. Fixes elastic#124667 Relates to elastic#18613 (cherry picked from commit 36874e8) # Conflicts: #	test/framework/src/main/java/org/elasticsearch/test/transport/MockTransportService.java
@mosche
Contributor Author

mosche commented Mar 18, 2025

💚 All backports created successfully

Status Branch Result
8.x
8.18

Questions ?

Please refer to the Backport tool documentation

mosche added a commit to mosche/elasticsearch that referenced this pull request Mar 18, 2025
…h core pool size = 0 (elastic#124732) When `ExecutorScalingQueue` rejects work to make the worker pool scale up while already being at max pool size (and a new worker consequently cannot be added), available workers might timeout just about at the same time as the task is then force queued by `ForceQueuePolicy`. This has caused starvation of work as observed for `masterService#updateTask` in elastic#124667 where max pool size 1 is used. This configuration is most likely to expose the bug. This PR changes `EsExecutors.newScaling` to not use `ExecutorScalingQueue` if max pool size is 1 (and core pool size is 0). A regular `LinkedTransferQueue` works perfectly fine in this case. If max pool size > 1, a probing approach is used to ensure the worker pool is adequately scaled to at least 1 worker after force queueing work in `ForceQueuePolicy`. Fixes elastic#124667 Relates to elastic#18613 (cherry picked from commit 36874e8) # Conflicts: #	test/framework/src/main/java/org/elasticsearch/test/transport/MockTransportService.java
mosche added a commit to mosche/elasticsearch that referenced this pull request Mar 18, 2025
…h core pool size = 0 (elastic#124732) When `ExecutorScalingQueue` rejects work to make the worker pool scale up while already being at max pool size (and a new worker consequently cannot be added), available workers might timeout just about at the same time as the task is then force queued by `ForceQueuePolicy`. This has caused starvation of work as observed for `masterService#updateTask` in elastic#124667 where max pool size 1 is used. This configuration is most likely to expose the bug. This PR changes `EsExecutors.newScaling` to not use `ExecutorScalingQueue` if max pool size is 1 (and core pool size is 0). A regular `LinkedTransferQueue` works perfectly fine in this case. If max pool size > 1, a probing approach is used to ensure the worker pool is adequately scaled to at least 1 worker after force queueing work in `ForceQueuePolicy`. Fixes elastic#124667 Relates to elastic#18613 (cherry picked from commit 36874e8) # Conflicts: #	test/framework/src/main/java/org/elasticsearch/test/transport/MockTransportService.java
@mosche
Contributor Author

mosche commented Mar 18, 2025

💚 All backports created successfully

Status Branch Result
8.17
8.16

Questions ?

Please refer to the Backport tool documentation

mosche added a commit to mosche/elasticsearch that referenced this pull request Mar 18, 2025
…h core pool size = 0 (elastic#124732) When `ExecutorScalingQueue` rejects work to make the worker pool scale up while already being at max pool size (and a new worker consequently cannot be added), available workers might timeout just about at the same time as the task is then force queued by `ForceQueuePolicy`. This has caused starvation of work as observed for `masterService#updateTask` in elastic#124667 where max pool size 1 is used. This configuration is most likely to expose the bug. This PR changes `EsExecutors.newScaling` to not use `ExecutorScalingQueue` if max pool size is 1 (and core pool size is 0). A regular `LinkedTransferQueue` works perfectly fine in this case. If max pool size > 1, a probing approach is used to ensure the worker pool is adequately scaled to at least 1 worker after force queueing work in `ForceQueuePolicy`. Fixes elastic#124667 Relates to elastic#18613 (cherry picked from commit 36874e8) # Conflicts: #	server/src/main/java/org/elasticsearch/threadpool/ScalingExecutorBuilder.java #	test/framework/src/main/java/org/elasticsearch/test/transport/MockTransportService.java
elasticsearchmachine pushed a commit that referenced this pull request Mar 18, 2025
…h core pool size = 0 (#124732) (#125066) When `ExecutorScalingQueue` rejects work to make the worker pool scale up while already being at max pool size (and a new worker consequently cannot be added), available workers might timeout just about at the same time as the task is then force queued by `ForceQueuePolicy`. This has caused starvation of work as observed for `masterService#updateTask` in #124667 where max pool size 1 is used. This configuration is most likely to expose the bug. This PR changes `EsExecutors.newScaling` to not use `ExecutorScalingQueue` if max pool size is 1 (and core pool size is 0). A regular `LinkedTransferQueue` works perfectly fine in this case. If max pool size > 1, a probing approach is used to ensure the worker pool is adequately scaled to at least 1 worker after force queueing work in `ForceQueuePolicy`. Fixes #124667 Relates to #18613 (cherry picked from commit 36874e8) # Conflicts: #	test/framework/src/main/java/org/elasticsearch/test/transport/MockTransportService.java
mosche added a commit that referenced this pull request Mar 18, 2025
…h core pool size = 0 (#124732) (#125067) When `ExecutorScalingQueue` rejects work to make the worker pool scale up while already being at max pool size (and a new worker consequently cannot be added), available workers might timeout just about at the same time as the task is then force queued by `ForceQueuePolicy`. This has caused starvation of work as observed for `masterService#updateTask` in #124667 where max pool size 1 is used. This configuration is most likely to expose the bug. This PR changes `EsExecutors.newScaling` to not use `ExecutorScalingQueue` if max pool size is 1 (and core pool size is 0). A regular `LinkedTransferQueue` works perfectly fine in this case. If max pool size > 1, a probing approach is used to ensure the worker pool is adequately scaled to at least 1 worker after force queueing work in `ForceQueuePolicy`. Fixes #124667 Relates to #18613 (cherry picked from commit 36874e8) # Conflicts: #	test/framework/src/main/java/org/elasticsearch/test/transport/MockTransportService.java
elasticsearchmachine pushed a commit that referenced this pull request Mar 18, 2025
…tor with core pool size = 0 (#124732) (#125069) * Prevent work starvation bug if using scaling EsThreadPoolExecutor with core pool size = 0 (#124732) When `ExecutorScalingQueue` rejects work to make the worker pool scale up while already being at max pool size (and a new worker consequently cannot be added), available workers might timeout just about at the same time as the task is then force queued by `ForceQueuePolicy`. This has caused starvation of work as observed for `masterService#updateTask` in #124667 where max pool size 1 is used. This configuration is most likely to expose the bug. This PR changes `EsExecutors.newScaling` to not use `ExecutorScalingQueue` if max pool size is 1 (and core pool size is 0). A regular `LinkedTransferQueue` works perfectly fine in this case. If max pool size > 1, a probing approach is used to ensure the worker pool is adequately scaled to at least 1 worker after force queueing work in `ForceQueuePolicy`. Fixes #124667 Relates to #18613 (cherry picked from commit 36874e8) # Conflicts: #	server/src/main/java/org/elasticsearch/threadpool/ScalingExecutorBuilder.java #	test/framework/src/main/java/org/elasticsearch/test/transport/MockTransportService.java * remove timeout * [CI] Auto commit changes from spotless --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
mosche added a commit that referenced this pull request Mar 18, 2025
…tor with core pool size = 0 (#124732) (#125068) When `ExecutorScalingQueue` rejects work to make the worker pool scale up while already being at max pool size (and a new worker consequently cannot be added), available workers might timeout just about at the same time as the task is then force queued by `ForceQueuePolicy`. This has caused starvation of work as observed for `masterService#updateTask` in #124667 where max pool size 1 is used. This configuration is most likely to expose the bug. This PR changes `EsExecutors.newScaling` to not use `ExecutorScalingQueue` if max pool size is 1 (and core pool size is 0). A regular `LinkedTransferQueue` works perfectly fine in this case. If max pool size > 1, a probing approach is used to ensure the worker pool is adequately scaled to at least 1 worker after force queueing work in `ForceQueuePolicy`. Fixes #124667 Relates to #18613
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Mar 24, 2025
omricohenn pushed a commit to omricohenn/elasticsearch that referenced this pull request Mar 28, 2025
…h core pool size = 0 (elastic#124732) When `ExecutorScalingQueue` rejects work to make the worker pool scale up while already being at max pool size (and a new worker consequently cannot be added), available workers might timeout just about at the same time as the task is then force queued by `ForceQueuePolicy`. This has caused starvation of work as observed for `masterService#updateTask` in elastic#124667 where max pool size 1 is used. This configuration is most likely to expose the bug. This PR changes `EsExecutors.newScaling` to not use `ExecutorScalingQueue` if max pool size is 1 (and core pool size is 0). A regular `LinkedTransferQueue` works perfectly fine in this case. If max pool size > 1, a probing approach is used to ensure the worker pool is adequately scaled to at least 1 worker after force queueing work in `ForceQueuePolicy`. Fixes elastic#124667 Relates to elastic#18613

Labels

auto-backport (Automatically create backport pull requests when merged), backport pending, >bug, :Core/Infra/Core (Core issues without another label), Team:Core/Infra (Meta label for core/infra team), v8.16.6, v8.17.4, v8.18.1, v8.19.0, v9.0.1, v9.1.0

6 participants