- Notifications
You must be signed in to change notification settings - Fork 25.6k
Autoscale frozen tier into existence #73435
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Autoscale frozen tier into existence #73435
Conversation
| Pinging @elastic/es-core-features (Team:Core/Features) |
| Pinging @elastic/es-distributed (Team:Distributed) |
7642989 to 275b81a Compare This commit adds two related changes: * ILM WaitForDataTierStep * Autoscaling frozen_existence decider The first part ensures that we wait mounting an index until a node that can hold the index is available, avoiding a failed restore and red cluster state. This is in particular important for the frozen phase, but is done generically in the searchable snapshot action. The second part triggers on indices in the ILM frozen phase to scale the tier into existence by requiring a minimal amount of memory and storage. Closes elastic#72771
275b81a to b4a2df5 Compare
dakrone left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some initial comments about the ILM-specific stuff
x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ilm/LifecycleExecutionState.java Show resolved Hide resolved
x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ilm/WaitForDataTierStep.java Outdated Show resolved Hide resolved
x-pack/plugin/ilm/src/main/java/org/elasticsearch/xpack/ilm/IndexLifecycle.java Show resolved Hide resolved
| // here before generating snapshots that can't be used if the user doesn't have the right license level. | ||
| BranchingStep conditionalSkipActionStep = new BranchingStep(preActionBranchingKey, checkNoWriteIndex, nextStepKey, | ||
| (index, clusterState) -> { | ||
| XPackLicenseState licenseState = XPackPlugin.getSharedLicenseState(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain why we need to move the license state back up into the top level instead of getting it here? It added quite a few lines injecting it into every ILM action
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that XPackPlugin.getSharedLicenseState() is backed by a static SetOnce field making it unsuitable for use in tests. This pattern of passing in the licenseState was taken from ML. In CCR and Autoscaling, the licenceState is wrapped in a checker that is passed around instead. For ILM, the ML approach felt most natural since it allows any license check in the future.
I think there is no internal cluster tests today that trigger the SearchableSnapshotAction, however, I needed to add one in FrozenExistenceDeciderIT, necessitating this change.
If there is a different and/or better pattern for doing this I should be more than happy to adapt this.
x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ilm/SearchableSnapshotAction.java Outdated Show resolved Hide resolved
Now wait for hot tier in hot phase rather than any of cold,warm,hot.
| @elasticmachine run elasticsearch-ci/packaging-tests-windows-sample |
By adding a default method to LifecycleAction we can reduce the impact on every step in ILM.
dakrone left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ILM parts LGTM, thanks for iterating on this Henning
DaveCTurner left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM I think, sorry it's taken a while to get my head around this. I left only trivial and optional comments.
| | ||
| @Override | ||
| public List<Step> toSteps(Client client, String phase, StepKey nextStepKey) { | ||
| throw new UnsupportedOperationException(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: assert false here too? For the record I'd rather just add the parameter to toSteps() everywhere rather than have this uncalled overload, i.e, the opposite of what we did in d4b2e32.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See 63816a0. I left the overload as is.
| * performing a full mount the preference is cold - warm - hot. When | ||
| * performing a partial mount the preference is only frozen | ||
| */ | ||
| public static String getDataTiersPreference(Storage type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit/qq: why not an instance method on Storage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doh, thanks, fixed in 32371ac
| * | ||
| * @param client The Elasticsearch Client to use during execution of {@link AsyncActionStep} | ||
| * and {@link AsyncWaitStep} steps. | ||
| * @param licenseState |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: undocumented, suggest removal?
| * @param licenseState |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a bit of doc in aec84ec
| assertThat(steps.get(12).getKey(), is(expectedSteps.get(12))); | ||
| assertThat(steps.get(13).getKey(), is(expectedSteps.get(13))); | ||
| assertThat(steps.get(14).getKey(), is(expectedSteps.get(14))); | ||
| assertThat(steps.get(15).getKey(), is(expectedSteps.get(15))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: what's the deal with all this unrolling?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mainly followed the same methodology. I will leave it to @dakrone to comment on the idea behind this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no reason these have to be unrolled, as long as we have the expected steps comparing them all at once is fine. I don't know why they were initially unrolled.
This commit adds two related changes: * ILM WaitForDataTierStep * Autoscaling frozen_existence decider The first part ensures that we wait mounting an index until a node that can hold the index is available, avoiding a failed restore and red cluster state. This is in particular important for the frozen phase, but is done generically in the searchable snapshot action. The second part triggers on indices in the ILM frozen phase to scale the tier into existence by requiring a minimal amount of memory and storage. Closes elastic#72771
This commit adds two related changes: * ILM WaitForDataTierStep * Autoscaling frozen_existence decider The first part ensures that we wait mounting an index until a node that can hold the index is available, avoiding a failed restore and red cluster state. This is in particular important for the frozen phase, but is done generically in the searchable snapshot action. The second part triggers on indices in the ILM frozen phase to scale the tier into existence by requiring a minimal amount of memory and storage. Closes #72771
This commit adds two related changes: * ILM WaitForDataTierStep * Autoscaling frozen_existence decider The first part ensures that we wait mounting an index until a node that can hold the index is available, avoiding a failed restore and red cluster state. This is in particular important for the frozen phase, but is done generically in the searchable snapshot action. The second part triggers on indices in the ILM frozen phase to scale the tier into existence by requiring a minimal amount of memory and storage. Closes elastic#72771
This commit adds two related changes: * ILM WaitForDataTierStep * Autoscaling frozen_existence decider The first part ensures that we wait mounting an index until a node that can hold the index is available, avoiding a failed restore and red cluster state. This is in particular important for the frozen phase, but is done generically in the searchable snapshot action. The second part triggers on indices in the ILM frozen phase to scale the tier into existence by requiring a minimal amount of memory and storage. Closes #72771
This commit adds two related changes:
The first part ensures that we wait mounting an index until a node that
can hold the index is available, avoiding a failed restore and red
cluster state. This is in particular important for the frozen phase, but
is done generically in the searchable snapshot action.
The second part triggers on indices in the ILM frozen phase to scale the
tier into existence by requiring a minimal amount of memory and storage.
Closes #72771