
Conversation

mosche
Contributor

@mosche mosche commented Mar 13, 2025

This PR changes EsExecutors.newScaling to not use ExecutorScalingQueue if max pool size is 1 or equals core pool size, and to use a regular LinkedTransferQueue instead. This fixes the known cases of #124667.

  • The critical configuration that caused #124667 (Scaling EsExecutors with core size 0 might starve work due to missing workers) in masterService#updateTask (and similar) is core pool size = 0 and max pool size = 1. When ExecutorScalingQueue rejects a task offer while the pool is already at its max pool size (of 1), no new worker can be added. The only worker might time out at just about the same time the task is then force queued via ForceQueuePolicy, causing the task to starve on the queue forever (or until another task is submitted).

  • If core pool size = max pool size, ExecutorScalingQueue behaves the same as a regular LinkedTransferQueue. While this configuration isn't affected by the bug, we shouldn't be using ExecutorScalingQueue unless we explicitly need to scale up to max pool size.

If max pool size > 1, a probing approach is used to ensure the worker pool is adequately scaled to at least 1 worker after force queueing work in ForceQueuePolicy.
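
For illustration, the overall shape of these two changes is roughly as follows. This is a minimal sketch only; `chooseQueue`, `forceQueueAndProbe` and `HypotheticalExecutorScalingQueue` are placeholder names, not the actual Elasticsearch implementation:

```java
import java.util.concurrent.LinkedTransferQueue;
import java.util.concurrent.ThreadPoolExecutor;

class ScalingExecutorSketch {

    // Queue selection: only use the scaling queue when the pool can actually grow
    // beyond a single worker; otherwise a plain LinkedTransferQueue is sufficient
    // and avoids the rejection/timeout race described above.
    static LinkedTransferQueue<Runnable> chooseQueue(int corePoolSize, int maxPoolSize) {
        if (maxPoolSize == 1 || maxPoolSize == corePoolSize) {
            return new LinkedTransferQueue<>();
        }
        return new HypotheticalExecutorScalingQueue<>();
    }

    // Probing idea for the force-queue rejection handler: after appending the rejected
    // task to the queue, make sure at least one worker exists to drain it, e.g. by
    // submitting a cheap no-op if all workers have timed out in the meantime.
    static void forceQueueAndProbe(ThreadPoolExecutor executor, LinkedTransferQueue<Runnable> queue, Runnable task) {
        queue.add(task);
        if (executor.getPoolSize() == 0) {
            executor.execute(() -> {}); // probe that forces the pool to spawn a worker
        }
    }

    // Placeholder standing in for the real scaling queue; details omitted.
    static class HypotheticalExecutorScalingQueue<E> extends LinkedTransferQueue<E> {
    }
}
```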

Note that at its core this issue arises from using unbounded queues with scaling executors. As explained in more detail in the Javadocs, this requires error-prone customizations rather than relying on solid, proven JDK implementations. #18613 captures necessary, but long outstanding, improvements.

Additionally, validation of thread_pool settings was improved in ScalingExecutorBuilder, enforcing:

  • core size >= 0
  • max size >= 1
  • keep alive time >= 0

These are non-breaking improvements. Previously, invalid values resulted in a cryptic IllegalArgumentException when initializing the thread pool and prevented the node from starting.
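
As a rough illustration of the new constraints (a hypothetical helper mirroring the checks, not the actual ScalingExecutorBuilder code, which defines them on the settings themselves):

```java
class ScalingSettingsValidationSketch {
    // Hypothetical validation helper mirroring the constraints listed above.
    static void validateScalingPoolSettings(int coreSize, int maxSize, long keepAliveMillis) {
        if (coreSize < 0) {
            throw new IllegalArgumentException("core size must be >= 0, was [" + coreSize + "]");
        }
        if (maxSize < 1) {
            throw new IllegalArgumentException("max size must be >= 1, was [" + maxSize + "]");
        }
        if (keepAliveMillis < 0) {
            throw new IllegalArgumentException("keep alive time must be >= 0, was [" + keepAliveMillis + "]");
        }
    }
}
```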

Relates to ES-10640

@mosche mosche added the >bug, >breaking, :Core/Infra/Core (Core issues without another label) labels Mar 13, 2025
@mosche mosche requested review from a team and DaveCTurner March 13, 2025 11:18
@elasticsearchmachine elasticsearchmachine added the Team:Core/Infra (Meta label for core/infra team) label Mar 13, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@elasticsearchmachine
Collaborator

Hi @mosche, I've created a changelog YAML for you. Note that since this PR is labelled >breaking, you need to update the changelog YAML to fill out the extended information sections.

@mosche mosche added the auto-backport (Automatically create backport pull requests when merged), v8.18.1, v8.19.0, v9.0.1, v8.17.4 labels and removed the v8.17.4 label Mar 13, 2025
Contributor

@DaveCTurner DaveCTurner left a comment


This area just gets more and more mysterious tbh. I'm sure I could reason through it again, given enough time, but still I'd rather we put some effort into describing the overall design and the constraints that mean we cannot just use standard JDK stuff here throughout.

@mosche
Contributor Author

mosche commented Mar 13, 2025

Wow, this is getting more and more interesting: the test case is failing on CI even when using core=1/max=1 with allowCoreThreadTimeOut=true. In this case it's the default JDK behavior without any ES-added magic. Wondering if it's a JDK bug after all ....
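
For reference, the plain JDK configuration being referred to looks roughly like this (a minimal sketch of a JDK ThreadPoolExecutor with that shape, not the ES test code):

```java
import java.util.concurrent.LinkedTransferQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CoreThreadTimeoutSketch {
    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
            1, 1,                          // core pool size = max pool size = 1
            100, TimeUnit.MILLISECONDS,    // short keep-alive so an idle worker times out quickly
            new LinkedTransferQueue<>()    // plain unbounded JDK queue, no ES customization
        );
        executor.allowCoreThreadTimeOut(true); // let the single core worker time out when idle

        executor.execute(() -> System.out.println("task ran"));

        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.SECONDS);
    }
}
```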

@mosche mosche removed the >breaking label Mar 13, 2025
@elasticsearchmachine
Collaborator

Hi @mosche, I've updated the changelog YAML for you.

Comment on lines -266 to +271

-    return Settings.builder().put("search.default_search_timeout", "5s").build();
+    return Settings.builder()
+        .put("search.default_search_timeout", "5s")
+        .put("thread_pool.search.size", SEARCH_POOL_SIZE) // customized search pool size, reconfiguring at runtime is unsupported
+        .build();
Contributor Author


I think that without adjusting the max pool size, we may not be testing what we want to test here, because the creation of slices to parallelize search execution depends on the max pool size (as well as on other details).

@javanna max (= core) pool size is adjusted by means of the above. This is now done deterministically for all tests; previously the change applied to all tests running after testSlicingBehaviourForParallelCollection.

@mosche mosche merged commit 36874e8 into elastic:main Mar 16, 2025
16 checks passed
mosche added a commit to mosche/elasticsearch that referenced this pull request Mar 16, 2025
…h core pool size = 0 (elastic#124732) When `ExecutorScalingQueue` rejects work to make the worker pool scale up while already being at max pool size (and a new worker consequently cannot be added), available workers might timeout just about at the same time as the task is then force queued by `ForceQueuePolicy`. This has caused starvation of work as observed for `masterService#updateTask` in elastic#124667 where max pool size 1 is used. This configuration is most likely to expose the bug. This PR changes `EsExecutors.newScaling` to not use `ExecutorScalingQueue` if max pool size is 1 (and core pool size is 0). A regular `LinkedTransferQueue` works perfectly fine in this case. If max pool size > 1, a probing approach is used to ensure the worker pool is adequately scaled to at least 1 worker after force queueing work in `ForceQueuePolicy`. Fixes elastic#124667 Relates to elastic#18613
@elasticsearchmachine
Collaborator

💔 Backport failed

Status Branch Result
8.18 Commit could not be cherrypicked due to conflicts
8.x Commit could not be cherrypicked due to conflicts
9.0

You can use sqren/backport to manually backport by running `backport --upstream elastic/elasticsearch --pr 124732`

elasticsearchmachine pushed a commit that referenced this pull request Mar 16, 2025
…h core pool size = 0 (#124732) (#124965) When `ExecutorScalingQueue` rejects work to make the worker pool scale up while already being at max pool size (and a new worker consequently cannot be added), available workers might timeout just about at the same time as the task is then force queued by `ForceQueuePolicy`. This has caused starvation of work as observed for `masterService#updateTask` in #124667 where max pool size 1 is used. This configuration is most likely to expose the bug. This PR changes `EsExecutors.newScaling` to not use `ExecutorScalingQueue` if max pool size is 1 (and core pool size is 0). A regular `LinkedTransferQueue` works perfectly fine in this case. If max pool size > 1, a probing approach is used to ensure the worker pool is adequately scaled to at least 1 worker after force queueing work in `ForceQueuePolicy`. Fixes #124667 Relates to #18613
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Mar 17, 2025
* main: (95 commits) Mute org.elasticsearch.datastreams.lifecycle.DataStreamLifecycleServiceIT testLifecycleAppliedToFailureStore elastic#124999 Merge template mappings properly during validation (elastic#124784) [Build] Rework internal build plugin plugin to work with Isolated Projects (elastic#123461) [Build] Require reason for usesDefaultDistribution (elastic#124707) Mute org.elasticsearch.packaging.test.DockerTests test011SecurityEnabledStatus elastic#124990 Mute org.elasticsearch.xpack.ilm.TimeSeriesDataStreamsIT testRolloverAction elastic#124987 Mute org.elasticsearch.packaging.test.BootstrapCheckTests test10Install elastic#124957 Mute org.elasticsearch.integration.DataStreamLifecycleServiceRuntimeSecurityIT testRolloverLifecycleAndForceMergeAuthorized elastic#124978 Mute org.elasticsearch.xpack.esql.action.CrossClusterAsyncQueryStopIT testStopQuery elastic#124977 Mute org.elasticsearch.xpack.esql.action.CrossClusterAsyncQueryStopIT testStopQueryLocal elastic#121672 Mention zero-window state in networking docs (elastic#124967) Remove remoteAddress field from TransportResponse (elastic#120016) Include failures in partial response (elastic#124929) Prevent work starvation bug if using scaling EsThreadPoolExecutor with core pool size = 0 (elastic#124732) Re-enable analysis stemmer test (elastic#124961) Mute org.elasticsearch.xpack.esql.action.CrossClusterAsyncQueryStopIT testStopQueryLocalNoRemotes elastic#124959 ESQL: Catch parsing exception (elastic#124958) ESQL: Improve error message for ( and [ (elastic#124177) Mute org.elasticsearch.xpack.esql.qa.single_node.EsqlSpecIT test {lookup-join.MvJoinKeyFromRow SYNC} elastic#124951 Mute org.elasticsearch.datastreams.lifecycle.DataStreamLifecycleServiceIT testErrorRecordingOnRetention elastic#124950 ... # Conflicts: #	server/src/main/java/org/elasticsearch/index/mapper/vectors/DenseVectorFieldMapper.java #	server/src/test/java/org/elasticsearch/index/mapper/vectors/DenseVectorFieldTypeTests.java
@mosche
Contributor Author

mosche commented Mar 17, 2025

@rjernst for how far back should this be backported?

@rjernst
Member

rjernst commented Mar 18, 2025

If we can get it to 8.18/8.19 that would be good for long term maintainability.

mosche added a commit to mosche/elasticsearch that referenced this pull request Mar 18, 2025
…h core pool size = 0 (elastic#124732) When `ExecutorScalingQueue` rejects work to make the worker pool scale up while already being at max pool size (and a new worker consequently cannot be added), available workers might timeout just about at the same time as the task is then force queued by `ForceQueuePolicy`. This has caused starvation of work as observed for `masterService#updateTask` in elastic#124667 where max pool size 1 is used. This configuration is most likely to expose the bug. This PR changes `EsExecutors.newScaling` to not use `ExecutorScalingQueue` if max pool size is 1 (and core pool size is 0). A regular `LinkedTransferQueue` works perfectly fine in this case. If max pool size > 1, a probing approach is used to ensure the worker pool is adequately scaled to at least 1 worker after force queueing work in `ForceQueuePolicy`. Fixes elastic#124667 Relates to elastic#18613 (cherry picked from commit 36874e8) # Conflicts: #	test/framework/src/main/java/org/elasticsearch/test/transport/MockTransportService.java
@mosche
Contributor Author

mosche commented Mar 18, 2025

💚 All backports created successfully

Status Branch Result
8.x
8.18

Questions ?

Please refer to the Backport tool documentation

mosche added a commit to mosche/elasticsearch that referenced this pull request Mar 18, 2025
…h core pool size = 0 (elastic#124732) When `ExecutorScalingQueue` rejects work to make the worker pool scale up while already being at max pool size (and a new worker consequently cannot be added), available workers might timeout just about at the same time as the task is then force queued by `ForceQueuePolicy`. This has caused starvation of work as observed for `masterService#updateTask` in elastic#124667 where max pool size 1 is used. This configuration is most likely to expose the bug. This PR changes `EsExecutors.newScaling` to not use `ExecutorScalingQueue` if max pool size is 1 (and core pool size is 0). A regular `LinkedTransferQueue` works perfectly fine in this case. If max pool size > 1, a probing approach is used to ensure the worker pool is adequately scaled to at least 1 worker after force queueing work in `ForceQueuePolicy`. Fixes elastic#124667 Relates to elastic#18613 (cherry picked from commit 36874e8) # Conflicts: #	test/framework/src/main/java/org/elasticsearch/test/transport/MockTransportService.java
mosche added a commit to mosche/elasticsearch that referenced this pull request Mar 18, 2025
…h core pool size = 0 (elastic#124732) When `ExecutorScalingQueue` rejects work to make the worker pool scale up while already being at max pool size (and a new worker consequently cannot be added), available workers might timeout just about at the same time as the task is then force queued by `ForceQueuePolicy`. This has caused starvation of work as observed for `masterService#updateTask` in elastic#124667 where max pool size 1 is used. This configuration is most likely to expose the bug. This PR changes `EsExecutors.newScaling` to not use `ExecutorScalingQueue` if max pool size is 1 (and core pool size is 0). A regular `LinkedTransferQueue` works perfectly fine in this case. If max pool size > 1, a probing approach is used to ensure the worker pool is adequately scaled to at least 1 worker after force queueing work in `ForceQueuePolicy`. Fixes elastic#124667 Relates to elastic#18613 (cherry picked from commit 36874e8) # Conflicts: #	test/framework/src/main/java/org/elasticsearch/test/transport/MockTransportService.java
@mosche
Contributor Author

mosche commented Mar 18, 2025

💚 All backports created successfully

Status Branch Result
8.17
8.16

Questions ?

Please refer to the Backport tool documentation

mosche added a commit to mosche/elasticsearch that referenced this pull request Mar 18, 2025
…h core pool size = 0 (elastic#124732) When `ExecutorScalingQueue` rejects work to make the worker pool scale up while already being at max pool size (and a new worker consequently cannot be added), available workers might timeout just about at the same time as the task is then force queued by `ForceQueuePolicy`. This has caused starvation of work as observed for `masterService#updateTask` in elastic#124667 where max pool size 1 is used. This configuration is most likely to expose the bug. This PR changes `EsExecutors.newScaling` to not use `ExecutorScalingQueue` if max pool size is 1 (and core pool size is 0). A regular `LinkedTransferQueue` works perfectly fine in this case. If max pool size > 1, a probing approach is used to ensure the worker pool is adequately scaled to at least 1 worker after force queueing work in `ForceQueuePolicy`. Fixes elastic#124667 Relates to elastic#18613 (cherry picked from commit 36874e8) # Conflicts: #	server/src/main/java/org/elasticsearch/threadpool/ScalingExecutorBuilder.java #	test/framework/src/main/java/org/elasticsearch/test/transport/MockTransportService.java
elasticsearchmachine pushed a commit that referenced this pull request Mar 18, 2025
…h core pool size = 0 (#124732) (#125066) When `ExecutorScalingQueue` rejects work to make the worker pool scale up while already being at max pool size (and a new worker consequently cannot be added), available workers might timeout just about at the same time as the task is then force queued by `ForceQueuePolicy`. This has caused starvation of work as observed for `masterService#updateTask` in #124667 where max pool size 1 is used. This configuration is most likely to expose the bug. This PR changes `EsExecutors.newScaling` to not use `ExecutorScalingQueue` if max pool size is 1 (and core pool size is 0). A regular `LinkedTransferQueue` works perfectly fine in this case. If max pool size > 1, a probing approach is used to ensure the worker pool is adequately scaled to at least 1 worker after force queueing work in `ForceQueuePolicy`. Fixes #124667 Relates to #18613 (cherry picked from commit 36874e8) # Conflicts: #	test/framework/src/main/java/org/elasticsearch/test/transport/MockTransportService.java
mosche added a commit that referenced this pull request Mar 18, 2025
…h core pool size = 0 (#124732) (#125067) When `ExecutorScalingQueue` rejects work to make the worker pool scale up while already being at max pool size (and a new worker consequently cannot be added), available workers might timeout just about at the same time as the task is then force queued by `ForceQueuePolicy`. This has caused starvation of work as observed for `masterService#updateTask` in #124667 where max pool size 1 is used. This configuration is most likely to expose the bug. This PR changes `EsExecutors.newScaling` to not use `ExecutorScalingQueue` if max pool size is 1 (and core pool size is 0). A regular `LinkedTransferQueue` works perfectly fine in this case. If max pool size > 1, a probing approach is used to ensure the worker pool is adequately scaled to at least 1 worker after force queueing work in `ForceQueuePolicy`. Fixes #124667 Relates to #18613 (cherry picked from commit 36874e8) # Conflicts: #	test/framework/src/main/java/org/elasticsearch/test/transport/MockTransportService.java
elasticsearchmachine pushed a commit that referenced this pull request Mar 18, 2025
…tor with core pool size = 0 (#124732) (#125069) * Prevent work starvation bug if using scaling EsThreadPoolExecutor with core pool size = 0 (#124732) When `ExecutorScalingQueue` rejects work to make the worker pool scale up while already being at max pool size (and a new worker consequently cannot be added), available workers might timeout just about at the same time as the task is then force queued by `ForceQueuePolicy`. This has caused starvation of work as observed for `masterService#updateTask` in #124667 where max pool size 1 is used. This configuration is most likely to expose the bug. This PR changes `EsExecutors.newScaling` to not use `ExecutorScalingQueue` if max pool size is 1 (and core pool size is 0). A regular `LinkedTransferQueue` works perfectly fine in this case. If max pool size > 1, a probing approach is used to ensure the worker pool is adequately scaled to at least 1 worker after force queueing work in `ForceQueuePolicy`. Fixes #124667 Relates to #18613 (cherry picked from commit 36874e8) # Conflicts: #	server/src/main/java/org/elasticsearch/threadpool/ScalingExecutorBuilder.java #	test/framework/src/main/java/org/elasticsearch/test/transport/MockTransportService.java * remove timeout * [CI] Auto commit changes from spotless --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
mosche added a commit that referenced this pull request Mar 18, 2025
…tor with core pool size = 0 (#124732) (#125068) When `ExecutorScalingQueue` rejects work to make the worker pool scale up while already being at max pool size (and a new worker consequently cannot be added), available workers might timeout just about at the same time as the task is then force queued by `ForceQueuePolicy`. This has caused starvation of work as observed for `masterService#updateTask` in #124667 where max pool size 1 is used. This configuration is most likely to expose the bug. This PR changes `EsExecutors.newScaling` to not use `ExecutorScalingQueue` if max pool size is 1 (and core pool size is 0). A regular `LinkedTransferQueue` works perfectly fine in this case. If max pool size > 1, a probing approach is used to ensure the worker pool is adequately scaled to at least 1 worker after force queueing work in `ForceQueuePolicy`. Fixes #124667 Relates to #18613
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Mar 24, 2025
omricohenn pushed a commit to omricohenn/elasticsearch that referenced this pull request Mar 28, 2025
…h core pool size = 0 (elastic#124732) When `ExecutorScalingQueue` rejects work to make the worker pool scale up while already being at max pool size (and a new worker consequently cannot be added), available workers might timeout just about at the same time as the task is then force queued by `ForceQueuePolicy`. This has caused starvation of work as observed for `masterService#updateTask` in elastic#124667 where max pool size 1 is used. This configuration is most likely to expose the bug. This PR changes `EsExecutors.newScaling` to not use `ExecutorScalingQueue` if max pool size is 1 (and core pool size is 0). A regular `LinkedTransferQueue` works perfectly fine in this case. If max pool size > 1, a probing approach is used to ensure the worker pool is adequately scaled to at least 1 worker after force queueing work in `ForceQueuePolicy`. Fixes elastic#124667 Relates to elastic#18613

Labels

auto-backport (Automatically create backport pull requests when merged), backport pending, >bug, :Core/Infra/Core (Core issues without another label), Team:Core/Infra (Meta label for core/infra team), v8.16.6, v8.17.4, v8.18.1, v8.19.0, v9.0.1, v9.1.0

6 participants