Do not treat replica as unassigned if primary recently created and unassigned time is below a threshold. #112066
Merged
parkertimmins merged 10 commits into elastic:main from parkertimmins:shard-availability-replica-time-buffer Aug 28, 2024
+271 −80
Pinging @elastic/es-data-management (Team:Data Management)
Hi @parkertimmins, I've created a changelog YAML for you.
dakrone requested changes Aug 21, 2024
...lasticsearch/cluster/routing/allocation/shards/ShardsAvailabilityHealthIndicatorService.java
- pass now millis instead of clock
- update setting with addSettingsUpdateConsumer
- simplify cutoff condition logic
- need to add settings to mocked ClusterService
parkertimmins commented Aug 22, 2024
parkertimmins commented Aug 22, 2024
dakrone approved these changes Aug 26, 2024
LGTM once the Serverless piece is merged and won't cause CI failures. I left one comment but nothing major, thanks Parker!
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this pull request Sep 4, 2024
…lastic#112066) Changes the way we calculate if all replicas are unassigned when primary is recently created. This change will only be used in serverless, not in stateful. When a primary is new, if the primary is active but the replica has been unassigned for less than a buffer time period, do not treat it as unassigned. The time period is controlled through the health.shards_availability.replica_unassigned_buffer_time setting.
parkertimmins added a commit that referenced this pull request Sep 12, 2024
Increase the default value of health.shards_availability.replica_unassigned_buffer_time to 5 seconds. This value is used in the identification of unavailable shards on serverless. Increasing the value to 5s keeps more shards from going red transiently, while still being low enough to go red quickly if there is an actual availability issue. Related to #112066
Labels
:Data Management/Health, >enhancement, Team:Data Management, test-update-serverless, v8.16.0
Changes the way we calculate if all replicas are unassigned when primary is recently created.
Related to #107794 which did not treat replica as unassigned if primary was not yet active.
This change extends the period during which replicas are not treated as unassigned to include a buffer time after the primary has become active. The buffer time is controlled through the setting health.shards_availability.replica_unassigned_buffer_time. It is only used in serverless; on stateful the behavior remains the same: new unassigned replicas are not treated as unassigned only while the primary is not yet active.
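The rule described above can be sketched in isolation. The following is a minimal illustration, not the code in ShardsAvailabilityHealthIndicatorService: the class name, method name, and parameters are all hypothetical, the "primary recently created" flag is assumed to be computed elsewhere, and treating a buffer of zero as equivalent to the stateful behavior is an assumption of this sketch rather than something the PR states. The current time is passed in as milliseconds, in the spirit of the "pass now millis instead of clock" commit.

```java
public class ReplicaBufferSketch {

    /**
     * Hypothetical helper: decides whether a replica should count as
     * unassigned for health reporting. All parameter names are illustrative.
     */
    static boolean treatAsUnassigned(
            boolean assigned,
            boolean primaryRecentlyCreated,
            boolean primaryActive,
            long unassignedAtMillis,
            long nowMillis,
            long bufferMillis) {
        // An assigned replica is never reported as unassigned.
        if (assigned) {
            return false;
        }
        // The grace rules only apply while the primary is new.
        if (!primaryRecentlyCreated) {
            return true;
        }
        // Behavior from #107794: a new primary that is not yet active
        // means the replica is not treated as unassigned.
        if (!primaryActive) {
            return false;
        }
        // Behavior from this PR: once the primary is active, the replica
        // is still not treated as unassigned until it has been unassigned
        // for at least the buffer time.
        long unassignedForMillis = nowMillis - unassignedAtMillis;
        return unassignedForMillis >= bufferMillis;
    }

    public static void main(String[] args) {
        long buffer = 5_000L; // 5s, the later default mentioned in the follow-up commit

        // Unassigned for 2s under a new, active primary: inside the buffer.
        System.out.println(treatAsUnassigned(false, true, true, 0L, 2_000L, buffer)); // prints false

        // Unassigned for 6s: the buffer has elapsed.
        System.out.println(treatAsUnassigned(false, true, true, 0L, 6_000L, buffer)); // prints true
    }
}
```

Taking the current time as a plain long keeps the check deterministic, so a test can exercise both sides of the cutoff without waiting or mocking a clock.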