Skip to content

Conversation

@henningandersen
Copy link
Contributor

This commit adds two related changes:

  • ILM WaitForDataTierStep
  • Autoscaling frozen_existence decider

The first part ensures that we wait mounting an index until a node that
can hold the index is available, avoiding a failed restore and red
cluster state. This is in particular important for the frozen phase, but
is done generically in the searchable snapshot action.

The second part triggers on indices in the ILM frozen phase to scale the
tier into existence by requiring a minimal amount of memory and storage.

Closes #72771

@elasticmachine elasticmachine added Team:Data Management Meta label for data/management team Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. labels May 26, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-features (Team:Core/Features)

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@henningandersen henningandersen force-pushed the fix_wait_for_frozen_tier branch from 7642989 to 275b81a Compare May 26, 2021 18:59
This commit adds two related changes: * ILM WaitForDataTierStep * Autoscaling frozen_existence decider The first part ensures that we wait mounting an index until a node that can hold the index is available, avoiding a failed restore and red cluster state. This is in particular important for the frozen phase, but is done generically in the searchable snapshot action. The second part triggers on indices in the ILM frozen phase to scale the tier into existence by requiring a minimal amount of memory and storage. Closes elastic#72771
@henningandersen henningandersen force-pushed the fix_wait_for_frozen_tier branch from 275b81a to b4a2df5 Compare May 26, 2021 19:04
@dakrone dakrone self-requested a review May 26, 2021 20:39
Copy link
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some initial comments about the ILM-specific stuff

// here before generating snapshots that can't be used if the user doesn't have the right license level.
BranchingStep conditionalSkipActionStep = new BranchingStep(preActionBranchingKey, checkNoWriteIndex, nextStepKey,
(index, clusterState) -> {
XPackLicenseState licenseState = XPackPlugin.getSharedLicenseState();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why we need to move the license state back up into the top level instead of getting it here? It added quite a few lines injecting it into every ILM action

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that XPackPlugin.getSharedLicenseState() is backed by a static SetOnce field making it unsuitable for use in tests. This pattern of passing in the licenseState was taken from ML. In CCR and Autoscaling, the licenceState is wrapped in a checker that is passed around instead. For ILM, the ML approach felt most natural since it allows any license check in the future.

I think there is no internal cluster tests today that trigger the SearchableSnapshotAction, however, I needed to add one in FrozenExistenceDeciderIT, necessitating this change.

If there is a different and/or better pattern for doing this I should be more than happy to adapt this.

Now wait for hot tier in hot phase rather than any of cold,warm,hot.
@henningandersen
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/packaging-tests-windows-sample

@henningandersen henningandersen requested a review from dakrone May 29, 2021 06:16
@csoulios csoulios added v7.13.2 and removed v7.13.1 labels Jun 2, 2021
By adding a default method to LifecycleAction we can reduce the impact on every step in ILM.
Copy link
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ILM parts LGTM, thanks for iterating on this Henning

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM I think, sorry it's taken a while to get my head around this. I left only trivial and optional comments.


@Override
public List<Step> toSteps(Client client, String phase, StepKey nextStepKey) {
throw new UnsupportedOperationException();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: assert false here too? For the record I'd rather just add the parameter to toSteps() everywhere rather than have this uncalled overload, i.e, the opposite of what we did in d4b2e32.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See 63816a0. I left the overload as is.

* performing a full mount the preference is cold - warm - hot. When
* performing a partial mount the preference is only frozen
*/
public static String getDataTiersPreference(Storage type) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit/qq: why not an instance method on Storage?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doh, thanks, fixed in 32371ac

*
* @param client The Elasticsearch Client to use during execution of {@link AsyncActionStep}
* and {@link AsyncWaitStep} steps.
* @param licenseState
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: undocumented, suggest removal?

Suggested change
* @param licenseState
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a bit of doc in aec84ec

assertThat(steps.get(12).getKey(), is(expectedSteps.get(12)));
assertThat(steps.get(13).getKey(), is(expectedSteps.get(13)));
assertThat(steps.get(14).getKey(), is(expectedSteps.get(14)));
assertThat(steps.get(15).getKey(), is(expectedSteps.get(15)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: what's the deal with all this unrolling?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mainly followed the same methodology. I will leave it to @dakrone to comment on the idea behind this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no reason these have to be unrolled, as long as we have the expected steps comparing them all at once is fine. I don't know why they were initially unrolled.

@henningandersen henningandersen merged commit 7793838 into elastic:master Jun 22, 2021
henningandersen added a commit to henningandersen/elasticsearch that referenced this pull request Jun 22, 2021
This commit adds two related changes: * ILM WaitForDataTierStep * Autoscaling frozen_existence decider The first part ensures that we wait mounting an index until a node that can hold the index is available, avoiding a failed restore and red cluster state. This is in particular important for the frozen phase, but is done generically in the searchable snapshot action. The second part triggers on indices in the ILM frozen phase to scale the tier into existence by requiring a minimal amount of memory and storage. Closes elastic#72771
henningandersen added a commit that referenced this pull request Jun 22, 2021
This commit adds two related changes: * ILM WaitForDataTierStep * Autoscaling frozen_existence decider The first part ensures that we wait mounting an index until a node that can hold the index is available, avoiding a failed restore and red cluster state. This is in particular important for the frozen phase, but is done generically in the searchable snapshot action. The second part triggers on indices in the ILM frozen phase to scale the tier into existence by requiring a minimal amount of memory and storage. Closes #72771
henningandersen added a commit to henningandersen/elasticsearch that referenced this pull request Jun 22, 2021
This commit adds two related changes: * ILM WaitForDataTierStep * Autoscaling frozen_existence decider The first part ensures that we wait mounting an index until a node that can hold the index is available, avoiding a failed restore and red cluster state. This is in particular important for the frozen phase, but is done generically in the searchable snapshot action. The second part triggers on indices in the ILM frozen phase to scale the tier into existence by requiring a minimal amount of memory and storage. Closes elastic#72771
henningandersen added a commit that referenced this pull request Jun 22, 2021
This commit adds two related changes: * ILM WaitForDataTierStep * Autoscaling frozen_existence decider The first part ensures that we wait mounting an index until a node that can hold the index is available, avoiding a failed restore and red cluster state. This is in particular important for the frozen phase, but is done generically in the searchable snapshot action. The second part triggers on indices in the ILM frozen phase to scale the tier into existence by requiring a minimal amount of memory and storage. Closes #72771
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Data Management/ILM+SLM Index and Snapshot lifecycle management :Distributed Coordination/Autoscaling >enhancement Team:Data Management Meta label for data/management team Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v7.13.3 v7.14.0 v8.0.0-alpha1

7 participants