- Notifications
You must be signed in to change notification settings - Fork 25.6k
Tighten on when THROTTLE decision can be returned #136794
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tighten on when THROTTLE decision can be returned #136794
Conversation
Other than HasFrozenCacheAllocationDecider, allocation decision should not be THROTTLE unless there is prior shard assignment/movement. This PR makes this statement explicit by adding relevant assertions and fixing test allocation decider. Relates: ES-12955
| Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination) |
| Initially, I wanted to ensure that no THROTTLE can be returned at all unless there is prior shard movement. But HasFrozenCacheAllocationDecider is not compatible with the idea. We also have existing code explicitly checking for it. So it seems that we have to live with this exception. As a result, I added assertions and fixed |
| // 2. Or, a partial searchable snapshot index due to HasFrozenCacheAllocationDecider | ||
| assert allocation.isSimulating() == false || indexMetadata.isPartialSearchableSnapshot() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great if we could check that the THROTTLE decision is actually from the HasFrozenCacheAllocationDecider. But it's not really feasible. Instead, we can check it's a frozen index. Not really ideal, but seems to be the best we can do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a big fan of putting knowledge of specific deciders in here, as discussed, I wonder if HasFrozenAllocationDecider is abusing THROTTLE anyway. My reasoning was:
In the
canAllocateUnassignedlogic we'll wait for the outcome of a lower-weight THROTTLE when we have higher-weight YES so that would mean this decider can defer the allocation of a partial searchable snapshot while a non-frozen node was initialising doesn't it?
I would think we'd return NO until the setting was loaded?
It seems like returning THROTTLED while we wait to find out if the node can host frozen shards could mean we defer allocating the shard to another node that we KNOW can host frozen shards, while waiting to find out if an initialising node is a frozen node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we defer allocating the shard to another node that we KNOW can host frozen shards, while waiting to find out if an initialising node is a frozen node.
That is true. I also wonder whether it is intentional since otherwise the shards will be assigned to an existing node only to relocate later momentarily. That said, after reading through code around HasFrozenAllocationDecider, I believe it is OK to return NO since it should not change the beahviour. That is because we always call allocateExistingUnassignedShards before calling into the allocator. This method uses SearchableSnapshotAllocator to allocate unassigne shards first. It also checks cache status and reject allocation when cache state is still being fetched. It in turns places the shard in the ignored list and BalancedShardAllocator ignores them as well. In short, the allocation decision for paritially mounted shards are made earlier in a different place so that it should not matter for the decider to return NO.
...g/elasticsearch/cluster/routing/allocation/decider/ConcurrentRebalanceAllocationDecider.java Show resolved Hide resolved
nicktindall left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All this change LGTM pending the question of whether we can make HasFrozenCacheAllocationDecider return NO while we're initialising the node so that we can remove the special knowledge of it from here?
DiannaHohensee left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose of this task is to gain confidence that the THROTTLE decision is not encountered, but right now this patch doesn't provide that confidence. This isn't in the hot-path, so I don't think we need to rush the change. I'd rather sort out the Decider before proceeding.
...sticsearch/xpack/searchablesnapshots/allocation/decider/HasFrozenCacheAllocationDecider.java Outdated Show resolved Hide resolved
...sticsearch/xpack/searchablesnapshots/allocation/decider/HasFrozenCacheAllocationDecider.java Outdated Show resolved Hide resolved
...ain/java/org/elasticsearch/cluster/routing/allocation/allocator/BalancedShardsAllocator.java Show resolved Hide resolved
...g/elasticsearch/cluster/routing/allocation/decider/ConcurrentRebalanceAllocationDecider.java Show resolved Hide resolved
| @nicktindall @DiannaHohensee The PR is ready for another look. In summary, I have taken Nick's suggestion for returning I spent quite some time reading through relevant code for searchable snapshot indices allocation. I believe it is safe to do so.
|
| default -> STILL_FETCHING; | ||
| case FETCHING -> allocation.isSimulating() ? NO_STILL_FETCHING : STILL_FETCHING; | ||
| case UNKNOWN -> UNKNOWN_NODE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HasFrozenAllocationDecider does not differentiate between fetching node state (FETCHING) and no-such-node (UNKNOWN). The later can only happen when a node leaves the cluster which seems more justified to say NO. In any case, they do not really impact SearchableSnapshotAllocator which is what matters for allocating unassigned searchable snapshot shards.
Btw, I can also undo the split if it is considered unrelated.
| if (hasChanges == false && info.lastAllocationStatus() == UnassignedInfo.AllocationStatus.DECIDERS_THROTTLED) { | ||
| // Unassigned ignored shards must be based on the provided set of ignoredShards | ||
| assert ignoredShards.contains(discardAllocationStatus(shard)) | ||
| || ignoredShards.stream().filter(ShardRouting::primary).anyMatch(primary -> primary.shardId().equals(shard.shardId())) | ||
| : "ignored shard " | ||
| + shard | ||
| + " unexpectedly has THROTTLE status and no counterpart in the provided ignoredShards set " | ||
| + ignoredShards; | ||
| // Simulation could not progress due to missing information in any of the deciders. | ||
| // Currently, this could happen if `HasFrozenCacheAllocationDecider` is still fetching the data. | ||
| // Progress would be made after the followup reroute call. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment about HasFrozenCacheAllocationDecider had me concerned for some time. Adding this assertion to make it clear that throttled status can only come from DesiredBalanceInput#ignoredShards, i.e. they are not produced by BalancedShardAllocator.
nicktindall left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, only comment is I wonder if we should do the change to NO for "still fetching" and the additional assertions as separate PRs, but happy with whatever you decide there.
| @nicktindall I refactor the change for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments about nested assertions. In general, can we move the assertions out of the conditional paths and assert that not encountering THROTTLE during simulation is generally true, as opposed to conditionally? Unless I've misunderstood the necessity?
| + allocation.isSimulating() | ||
| + ") when balancing index [" | ||
| + index | ||
| + "]"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seeing the assert in a conditional branch makes this seem like certain paths are okay for THROTTLE. Suppose we were in a simulation and somehow got more than 1 relocating shard at once: that would be unsafe, right? Is there any reason we can't safely always have the assert in place? Like,
assert allocation.isSimulating() == false || routingNodes.getRelocatingShardCount() <= 1 : "...."; if (routingNodes.getRelocatingShardCount() > 0) { shardBalanced = true; } if (completeEarlyOnShardAssignmentChange && shardBalanced) { .... | + allocation.isSimulating() | ||
| + ") with no prior assignment when allocating unassigned shard [" | ||
| + shard | ||
| + "]"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than putting this in a conditional, can we assert that either isSimulating==false || decision is not throttle up on line 1272? Or better yet, right after the decideAllocateUnassigned call, outside another layer of if-else condition?
| } | ||
| | ||
| // A THROTTLE allocation decision can happen when not simulating | ||
| assert moveDecision.getAllocationDecision() != AllocationDecision.THROTTLED || allocation.isSimulating() == false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be done right after the decideMove call? There's an early return path above on which we'd miss this assert.
…137232) Since SearchableSnapshotAllocator updates frozen cache service before allocation, the unknown state for a node can only happen when the node leaves the cluster. In this case, it makes more sense to reject the allocation instead of throttling since the node may not come back. This is also more consistent with the behaviour of SearchableSnapshotAllocator which does not consider any node that is not currently in the cluster (effectively a No). If the node does come back, it will go through again the cache info fetching steps. Relates: #136794
…lastic#137232) Since SearchableSnapshotAllocator updates frozen cache service before allocation, the unknown state for a node can only happen when the node leaves the cluster. In this case, it makes more sense to reject the allocation instead of throttling since the node may not come back. This is also more consistent with the behaviour of SearchableSnapshotAllocator which does not consider any node that is not currently in the cluster (effectively a No). If the node does come back, it will go through again the cache info fetching steps. Relates: elastic#136794
| @DiannaHohensee Sorry for the late reply here. Got distracted by other work and then PTO. I pushed c62be8e based on your feedback for asserting in a more general path. Please let me know if the latest change works better for you. |
| // TODO: considering returning NO as well for non-simulation https://elasticco.atlassian.net/browse/ES-13378 | ||
| case FETCHING -> allocation.isSimulating() ? NO_STILL_FETCHING : STILL_FETCHING; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed over the other PR. I am keeping throttle for non-simulation case for the time being. I added a link to the JIRA ticket for considering its removal.
DiannaHohensee left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. I left some additional recommendations. Thanks for moving up the asserts to apply more generally.
| lowIdx = 0; | ||
| highIdx = relevantNodes - 1; | ||
| | ||
| assert allocation.isSimulating() == false || routingNodes.getRelocatingShardCount() > 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I should have asked this in an earlier review round, but shouldn't we expect routingNodes.getRelocatingShardCount() to be exactly 1 when isSimulating is true? IIUC, tryRelocateShard will increment routingNodes.getRelocatingShardCount() when there's a YES answer but leave it unchanged for THROTTLE, per this code
Lines 1499 to 1502 in e8a9eb1
| canAllocateOrRebalance == Type.YES | |
| /* only allocate on the cluster if we are not throttled */ | |
| ? routingNodes.relocateShard(shard, minNode.getNodeId(), shardSize, "rebalance", allocation.changes()).v1() | |
| : shard.relocate(minNode.getNodeId(), shardSize) |
Combined with the one shard move and then exit logic added recently, the assert would then be
simulation is false
OR
relocatingShardCount == 1 (and implicitly simulation is true)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah we can tighten it further see 6cdfdc1
| // A THROTTLE allocation decision can happen when not simulating | ||
| assert moveDecision.isDecisionTaken() == false | ||
| || moveDecision.getAllocationDecision() != AllocationDecision.THROTTLED | ||
| || allocation.isSimulating() == false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
opt nit: might flip isSimulating above != THROTTLED check, seems easier to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure see 1726061
| // 2. Or, there is shard assigned before this one | ||
| assert allocationDecision.getAllocationStatus() != AllocationStatus.DECIDERS_THROTTLED | ||
| || allocation.isSimulating() == false | ||
| || shardAssignmentChanged |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unassigned replica shards can returning THROTTLE because only the primary shard path is THROTTLE free during simulation. At least until I get ES-12942 complete. I take it that's the problem here?
Since we have more or less ignored the allocateUnassigned code path for not-preferred, feel free to skip this assert: I'm not sure what benefit it adds right now. Or you can add a TODO referencing ES-12942 and I expect I should be able to tighten this up. We'll revisit allocateUnassigned for not-preferred in ES-13279, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unassigned replica shards can returning THROTTLE because only the primary shard path is THROTTLE free during simulation. At least until I get ES-12942 complete. I take it that's the problem here?
Yes ThrottlingAllocationDecider can still return throttling decision for unassigne replica shards. But ES-12942 is a separate issue. It's about early return in DesiredBalanceComputer instead of BalancedShardAllocator. It currently short-circuits only for primary shard assignement. Hence, changing ThrottlingAllocationDecider returning YES for replica shards is not really relevant since replica shards can already be assigned today but we don't publish the balance quickly enough after they are assigned.
In summary, let me raise a follow-up to fix ThrottlingAllocationDecider so that it does not return throttling in simulation. It's all part of the current JIRA ES-12955
| case NO_CACHE -> NO_FROZEN_CACHE; | ||
| case FAILED -> UNKNOWN_FROZEN_CACHE; | ||
| case FETCHING -> STILL_FETCHING; | ||
| // TODO: considering returning NO as well for non-simulation https://elasticco.atlassian.net/browse/ES-13378 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a public repo: should we shorten to ES-13378 since the link won't work for external users?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Pushed 228c35d
For allocation simulation, allocation decisions should not be THROTTLE for moveShards and balance. Allocate unassigned shards should not see THROTTLE unless there is prior shard assignment. This PR makes this statement explicit by fixing test allocation decider and HasFrozenAllocationDecider as well as adding relevant assertions. Relates: ES-12955
…-json * upstream/main: Mute org.elasticsearch.xpack.inference.action.filter.ShardBulkInferenceActionFilterBasicLicenseIT testLicenseInvalidForInference {p0=false} elastic#137691 Mute org.elasticsearch.xpack.inference.action.filter.ShardBulkInferenceActionFilterBasicLicenseIT testLicenseInvalidForInference {p0=true} elastic#137690 [LTR] Fix feature display order when using explain. (elastic#137671) Remove extra RemoteClusterService instances in unit test (elastic#137647) Fix `ComponentTemplatesFileSettingsIT.testSettingsApplied` (elastic#137669) Consolidates troubleshooting content into the "Returning semantic field embeddings in _source" section (elastic#137233) Update bundled JDK to 25.0.1 (elastic#137640) resolve indices for prefixed _all expressions (elastic#137330) ESQL: Add TopN support for exponential histograms (elastic#137313) allows field caps to be cross project (elastic#137530) ESQL: Add exponential histogram percentile function (elastic#137553) Wait for nodes to have downloaded databases in `GeoIpDownloaderIT` (elastic#137636) Tighten on when THROTTLE decision can be returned (elastic#136794) Mute org.elasticsearch.xpack.esql.qa.single_node.GenerativeMetricsIT test elastic#137655 Add a test for two little known conditional processor paths (elastic#137645) Extract a common ORIGIN constant (elastic#137612) Remove early phase failure in batched (elastic#136889) Returning correct index mode from get data streams api (elastic#137646) [ML] Manage AD results indices (elastic#136065) …-json * upstream/main: Mute org.elasticsearch.xpack.inference.action.filter.ShardBulkInferenceActionFilterBasicLicenseIT testLicenseInvalidForInference {p0=false} elastic#137691 Mute org.elasticsearch.xpack.inference.action.filter.ShardBulkInferenceActionFilterBasicLicenseIT testLicenseInvalidForInference {p0=true} elastic#137690 [LTR] Fix feature display order when using explain. (elastic#137671) Remove extra RemoteClusterService instances in unit test (elastic#137647) Fix `ComponentTemplatesFileSettingsIT.testSettingsApplied` (elastic#137669) Consolidates troubleshooting content into the "Returning semantic field embeddings in _source" section (elastic#137233) Update bundled JDK to 25.0.1 (elastic#137640) resolve indices for prefixed _all expressions (elastic#137330) ESQL: Add TopN support for exponential histograms (elastic#137313) allows field caps to be cross project (elastic#137530) ESQL: Add exponential histogram percentile function (elastic#137553) Wait for nodes to have downloaded databases in `GeoIpDownloaderIT` (elastic#137636) Tighten on when THROTTLE decision can be returned (elastic#136794) Mute org.elasticsearch.xpack.esql.qa.single_node.GenerativeMetricsIT test elastic#137655 Add a test for two little known conditional processor paths (elastic#137645) Extract a common ORIGIN constant (elastic#137612) Remove early phase failure in batched (elastic#136889) Returning correct index mode from get data streams api (elastic#137646) [ML] Manage AD results indices (elastic#136065) For allocation simulation, allocation decisions should not be THROTTLE for moveShards and balance. Allocate unassigned shards should not see THROTTLE unless there is prior shard assignment. This PR makes this statement explicit by fixing test allocation decider and HasFrozenAllocationDecider as well as adding relevant assertions. Relates: ES-12955
For allocation simulation, allocation decisions should not be THROTTLE for moveShards and balance. Allocate unassigned shards should not see THROTTLE unless there is prior shard assignment. This PR makes this statement explicit by fixing test allocation decider and HasFrozenAllocationDecider as well as adding relevant assertions.
Relates: ES-12955