@parkertimmins (Contributor) commented Aug 21, 2024

Changes how we determine whether all replicas are unassigned when the primary was recently created.
Related to #107794, which did not treat a replica as unassigned if the primary was not yet active.

This change extends the window in which new replicas are not treated as unassigned. Before, the window covered only the time before the primary became active. Now it also covers a buffer period after the primary has become active. The buffer period is controlled by the setting health.shards_availability.replica_unassigned_buffer_time. The buffer is only used in serverless. In stateful, the behavior is unchanged: new unassigned replicas are exempt from being treated as unassigned only while the primary is not yet active.
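To make the decision logic concrete, here is a minimal sketch of the check as described above. It is illustrative only. ReplicaUnassignedBuffer, ReplicaView and countsAsUnassigned are hypothetical names, not the actual Elasticsearch health indicator code. The sketch also assumes the buffer is measured from the time the replica became unassigned, following the wording of the squashed commit message. A zero buffer reproduces the stateful behavior, where only an inactive new primary suppresses the unassigned status.

```java
import java.time.Duration;

/**
 * Hypothetical sketch of the replica buffer check described in this PR.
 * Names and types are illustrative assumptions, not Elasticsearch APIs.
 */
final class ReplicaUnassignedBuffer {

    /** Simplified, hypothetical snapshot of one unassigned replica and its primary. */
    record ReplicaView(
        boolean primaryRecentlyCreated, // assumption: e.g. the index was just created
        boolean primaryActive,
        long replicaUnassignedSinceMillis
    ) {}

    /**
     * Assumes the replica is currently unassigned and returns true if it should
     * count as unassigned for health reporting. With a zero buffer, only an
     * inactive new primary suppresses the unassigned status (stateful behavior).
     */
    static boolean countsAsUnassigned(ReplicaView replica, long nowMillis, Duration buffer) {
        if (replica.primaryRecentlyCreated() == false) {
            return true; // not a new primary: normal unassigned handling
        }
        if (replica.primaryActive() == false) {
            return false; // behavior from #107794: primary still initializing
        }
        // Primary is active: tolerate the replica being unassigned for less than the buffer.
        long unassignedForMillis = nowMillis - replica.replicaUnassignedSinceMillis();
        return unassignedForMillis >= buffer.toMillis();
    }

    public static void main(String[] args) {
        Duration buffer = Duration.ofSeconds(5); // example value only
        long now = 100_000L;
        ReplicaView fresh = new ReplicaView(true, true, now - 2_000L);
        ReplicaView stale = new ReplicaView(true, true, now - 6_000L);
        System.out.println(countsAsUnassigned(fresh, now, buffer)); // false
        System.out.println(countsAsUnassigned(stale, now, buffer)); // true
    }
}
```

Passing nowMillis in as an argument, rather than reading a clock inside the method, keeps the check deterministic and easy to unit test.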

@elasticsearchmachine (Collaborator)

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine (Collaborator)

Hi @parkertimmins, I've created a changelog YAML for you.

- pass now millis instead of clock
- update setting with addSettingsUpdateConsumer
- simplify cutoff condition logic
- need to add settings to mocked ClusterService
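The first two commit notes describe a common pattern: the service caches a dynamic setting that is refreshed through an update callback, and the check receives the current time as a plain long. Here is a minimal, self-contained sketch of that pattern. DynamicTimeSetting, HealthService and withinBuffer are hypothetical names. The commit note refers to Elasticsearch's addSettingsUpdateConsumer, which the addUpdateConsumer stand-in here only imitates; it is not that API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;

/**
 * Hypothetical sketch: a dynamic buffer-time setting pushed to a service via
 * an update callback, with the current time supplied by the caller.
 */
final class BufferTimeSettingSketch {

    /** Minimal stand-in for a dynamic, millisecond-valued setting (not an Elasticsearch class). */
    static final class DynamicTimeSetting {
        private final List<LongConsumer> consumers = new ArrayList<>();
        private long valueMillis;

        DynamicTimeSetting(long initialMillis) {
            this.valueMillis = initialMillis;
        }

        long getMillis() {
            return valueMillis;
        }

        void addUpdateConsumer(LongConsumer consumer) {
            consumers.add(consumer);
        }

        /** Simulates an operator changing the setting at runtime; notifies all consumers. */
        void update(long newMillis) {
            valueMillis = newMillis;
            consumers.forEach(c -> c.accept(newMillis));
        }
    }

    /** Hypothetical service that caches the latest buffer value. */
    static final class HealthService {
        // volatile, assuming updates may arrive on a different thread than the health check
        private volatile long bufferMillis;

        HealthService(DynamicTimeSetting setting) {
            this.bufferMillis = setting.getMillis();
            setting.addUpdateConsumer(newMillis -> this.bufferMillis = newMillis);
        }

        /** nowMillis comes from the caller, so tests can use fixed timestamps instead of a clock. */
        boolean withinBuffer(long replicaUnassignedSinceMillis, long nowMillis) {
            return nowMillis - replicaUnassignedSinceMillis < bufferMillis;
        }
    }
}
```

Because the service reads the cached field on every call, a changed setting takes effect on the next check without a restart. The last commit note (adding the settings to the mocked ClusterService) is the test-side counterpart: the service needs the setting to be registered before it can subscribe to updates.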
@parkertimmins requested a review from dakrone August 22, 2024 18:07

@dakrone (Member) left a comment

LGTM once the Serverless piece is merged and won't cause CI failures. I left one comment but nothing major, thanks Parker!

@parkertimmins merged commit b776cf6 into elastic:main Aug 28, 2024
16 checks passed
@parkertimmins deleted the shard-availability-replica-time-buffer branch August 28, 2024 19:18
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this pull request Sep 4, 2024
…lastic#112066) Changes how we determine whether all replicas are unassigned when the primary was recently created. This change is only used in serverless, not in stateful. When a primary is new and active, but the replica has been unassigned for less than a buffer time period, do not treat the replica as unassigned. The time period is controlled through the health.shards_availability.replica_unassigned_buffer_time setting.
parkertimmins added a commit that referenced this pull request Sep 12, 2024
Increase the default value of health.shards_availability.replica_unassigned_buffer_time to 5 seconds. This value is used when identifying unavailable shards on serverless. Increasing it to 5s keeps more shards from going red transiently, while remaining low enough that shards still go red quickly if there is an actual availability issue. Related to #112066