Skip to content

Conversation

@ywangd
Copy link
Member

@ywangd ywangd commented Oct 28, 2025

Since SearchableSnapshotAllocator updates frozen cache service before allocation, the unknown state for a node can only happen when the node leaves the cluster. In this case, it makes more sense to reject the allocation instead of throttling since the node may not come back. This is also more consistent with the behaviour of
SearchableSnapshotAllocator which does not consider any node that is not currently in the cluster (effectively a No). If the node does come back, it will go through again the cache info fetching steps.

Relates: #136794

Since SearchableSnapshotAllocator updates frozen cache service before allocation, the unknown state for a node can only happen when the node leaves the cluster. In this case, it makes more sense to reject the allocation instead of throttling since the node may not come back. This is also more consistent with the behaviour of SearchableSnapshotAllocator which does not consider any node that is not currently in the cluster (effectively a No).
@ywangd ywangd requested a review from nicktindall October 28, 2025 07:14
@ywangd ywangd added >enhancement :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) v9.3.0 labels Oct 28, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

@elasticsearchmachine
Copy link
Collaborator

Hi @ywangd, I've created a changelog YAML for you.

@elasticsearchmachine elasticsearchmachine added the Team:Distributed Coordination Meta label for Distributed Coordination team label Oct 28, 2025
@ywangd
Copy link
Member Author

ywangd commented Oct 28, 2025

Labelling this as non-issue because:

  • Unassigned frozen shards are handled by SearchableSnapshotAllocator and hence should see no impact
  • Assigned frozen can now see No instead of Throttle for moveShards and balance where Throttle is effectively a No, i.e. it does not move shards on the cluster. With desired balance allocator and the early return mechanism, throttle becomes even less relevant.
case NO_CACHE -> NO_FROZEN_CACHE;
case FAILED -> UNKNOWN_FROZEN_CACHE;
default -> STILL_FETCHING;
case FETCHING -> STILL_FETCHING;
Copy link
Contributor

@DiannaHohensee DiannaHohensee Oct 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've only briefly looked at this code area, but I'm wondering why we'd use THROTTLE instead of NO. I would expect the FrozenCacheService to make a reroute when it receives the update, but it doesn't look like that happens. So THROTTLE looks like a hack. Could we make that reroute happen in a new FrozenCacheService#clusterChanged?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or perhaps not a clusterChanged, but whenever the node's settings become known -- I didn't actually follow the code that far.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the other PR (#136794), I indeed changed it to No when it is called in simulation. That should give us what we want. I left it unchanged for non-simulation since that is not important to us and less change.

As commented on the other PR, FrozenCacheInfoService#updateNodes is called by SearchableSnapshotAllocator on the master thread as part of a reroute call which effectively does what you are suggesting. If there is any node still fetching cache state, unassigned frozen shards are ignored which means BalancedShardAllocator ignores them as well, i.e., whether it returns No or Throttle does not matter for unassigned shards in simulation. For started frozen shards, BalancedShardAllocator can see a Throttle at balacing time with the current code. That's where the other PR comes in, i.e. it changes the decision to No which works effectively the same as Throttle in simluation especially now that we have the early return mechanism. So it is a non-issue change.

I hope this makes sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left it unchanged for non-simulation since that is not important to us and less change.

So it's safe to make it always NO, but you're trying to minimize the change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I answered in elsewhere. Reposting here for easy consumption

I looked into the simulation part more carefully so that I am pretty sure it is a non-issue (to change to N)). I believe it should be safe in both cases. But less change is my preference since non-simulation case is unrelated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, if the split between UNKNOWN and FETCHING is considered unrelated and adding confusion than its worth, I can also undo it so that the change will just be

default -> allocation.isSimulating() ? NO : STILL_FETCHING;

as suggested over the other PR. Just let me know.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The split / explicit switches per enum is all good to me, I'm more hesitant about not fully changing over to NO. I think a remaining throttle case leaves the code in a confusing state: it's technical debt.

I'm good with pushing the change as is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the discussion! I will be merging this PR since it's about the split and we are OK with it. We can continue the discussion about remaining Throttle over the other PR.

Copy link
Contributor

@nicktindall nicktindall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@DiannaHohensee DiannaHohensee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I'd like to fully remove THROTTLE later, so I filed https://elasticco.atlassian.net/browse/ES-13378

case NO_CACHE -> NO_FROZEN_CACHE;
case FAILED -> UNKNOWN_FROZEN_CACHE;
default -> STILL_FETCHING;
case FETCHING -> STILL_FETCHING;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The split / explicit switches per enum is all good to me, I'm more hesitant about not fully changing over to NO. I think a remaining throttle case leaves the code in a confusing state: it's technical debt.

I'm good with pushing the change as is.

@ywangd ywangd added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Oct 30, 2025
@elasticsearchmachine elasticsearchmachine merged commit 6c33c94 into elastic:main Oct 30, 2025
34 checks passed
@ywangd ywangd deleted the no-decision-for-unknown-node branch October 30, 2025 06:07
chrisparrinello pushed a commit to chrisparrinello/elasticsearch that referenced this pull request Nov 3, 2025
…lastic#137232) Since SearchableSnapshotAllocator updates frozen cache service before allocation, the unknown state for a node can only happen when the node leaves the cluster. In this case, it makes more sense to reject the allocation instead of throttling since the node may not come back. This is also more consistent with the behaviour of SearchableSnapshotAllocator which does not consider any node that is not currently in the cluster (effectively a No). If the node does come back, it will go through again the cache info fetching steps. Relates: elastic#136794
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) >non-issue Team:Distributed Coordination Meta label for Distributed Coordination team v9.3.0

4 participants