Skip to content

Conversation

@luyuncheng
Copy link
Contributor

When data is time series indices, it may concurrent create many index and delete index using ILM

i see DeleteAction, ShrinkAction need to check ccr status in WaitForNoFollowersStep as following code shows:

public List<Step> toSteps(Client client, String phase, Step.StepKey nextStepKey) {
Step.StepKey waitForNoFollowerStepKey = new Step.StepKey(phase, NAME, WaitForNoFollowersStep.NAME);
Step.StepKey deleteStepKey = new Step.StepKey(phase, NAME, DeleteStep.NAME);
Step.StepKey cleanSnapshotKey = new Step.StepKey(phase, NAME, CleanupSnapshotStep.NAME);
if (deleteSearchableSnapshot) {
WaitForNoFollowersStep waitForNoFollowersStep = new WaitForNoFollowersStep(waitForNoFollowerStepKey, cleanSnapshotKey, client);
CleanupSnapshotStep cleanupSnapshotStep = new CleanupSnapshotStep(cleanSnapshotKey, deleteStepKey, client);
DeleteStep deleteStep = new DeleteStep(deleteStepKey, nextStepKey, client);
return Arrays.asList(waitForNoFollowersStep, cleanupSnapshotStep, deleteStep);
} else {
WaitForNoFollowersStep waitForNoFollowersStep = new WaitForNoFollowersStep(waitForNoFollowerStepKey, deleteStepKey, client);
DeleteStep deleteStep = new DeleteStep(deleteStepKey, nextStepKey, client);
return Arrays.asList(waitForNoFollowersStep, deleteStep);
}
}

WaitForNoFollowersStep do collect indices stats to evaluate the condition

getClient().admin().indices().stats(request, ActionListener.wrap((response) -> {
IndexStats indexStats = response.getIndex(indexName);
if (indexStats == null) {
// Index was probably deleted
logger.debug("got null shard stats for index {}, proceeding on the assumption it has been deleted", indexName);
listener.onResponse(true, null);
return;
}

BUT indices stats api need broadcast to many nodes and collect shardStats which is time consuming, even we know the CCR is disabled

image

i think, we can use feature api get feature info, and reduce indices stats request when the CCR is disabled

@elasticsearchmachine elasticsearchmachine added v8.8.0 needs:triage Requires assignment of a team area label external-contributor Pull request authored by a developer outside the Elasticsearch team labels Mar 15, 2023
@romseygeek romseygeek added :Data Management/ILM+SLM Index and Snapshot lifecycle management and removed needs:triage Requires assignment of a team area label labels Mar 22, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Data Management Meta label for data/management team label Mar 22, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@andreidan andreidan self-requested a review March 23, 2023 15:39
@luyuncheng
Copy link
Contributor Author

@andreidan i know there are many optimizations in ilm batch task AND master batch queue like 78547, 92021, 94325 etc. BUT when many deleteAction triggered in ILM, all phrase would wait in WaitForNoFollowersStep because it should do collect shard stats from master which is time consuming, and other phase can batch execute.

@andreidan andreidan self-assigned this Apr 21, 2023
Copy link
Contributor

@andreidan andreidan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this @luyuncheng

I left an alternative solution suggestion. Hope it makes sense.

@gmarouli gmarouli added v8.9.0 and removed v8.8.0 labels Apr 26, 2023
Copy link
Contributor

@andreidan andreidan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @luyuncheng

I've left a few rather minor suggestions (and let's see if CI is happy)

XPackInfoResponse.FeatureSetsInfo featureSetsInfo = xPackInfoResponse.getFeatureSetsInfo();
if (featureSetsInfo != null) {
XPackInfoResponse.FeatureSetsInfo.FeatureSet featureSet = featureSetsInfo.getFeatureSets().get(CCR_LEASE_KEY);
if (featureSet != null && (featureSet.available() == false || featureSet.enabled() == false)){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's please just check if the featureSet is enabled here.
e.g.

if (featureSet != null && featureSet.enabled() == false) 

We'd like to maintain the same behaviour for this step even if CCR is unavailable (e.g. license is expired)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd like to maintain the same behaviour for this step even if CCR is unavailable (e.g. license is expired)

LGTM

}

public void testConditionMet() {
public void testConditionMetWhenEnable() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we name this testConditionMetWhenCCREnabled ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

assertNull(stepInfoHolder.get());
}

public void testConditionMetWhenDisabled() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, shall we name this test to include CCRDisabled ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

@andreidan
Copy link
Contributor

@elasticmachine ok to test

@andreidan
Copy link
Contributor

@elasticmachine update branch

@luyuncheng
Copy link
Contributor Author

luyuncheng commented Apr 28, 2023

@andreidan The previous ci test https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+pull-request+part-2/33165/ fail in intenalClusterTest because ActionType not found. i think this because XpackInfoFeatureAction scan all xpack module like following code shows:

public TransportXPackInfoAction(
TransportService transportService,
ActionFilters actionFilters,
LicenseService licenseService,
NodeClient client
) {
super(XPackInfoAction.NAME, transportService, actionFilters, XPackInfoRequest::new);
this.licenseService = licenseService;
this.client = client;
this.infoActions = infoActions();
}
// overrideable for tests
protected List<XPackInfoFeatureAction> infoActions() {
return XPackInfoFeatureAction.ALL;
}

and internalClusterTest in ILM not include other module into plugins when override nodePlugins:

@Override
protected Collection<Class<? extends Plugin>> nodePlugins() {
return Arrays.asList(LocalStateCompositeXPackPlugin.class, IndexLifecycle.class);
}

in commits d3c5f99 i added dependency ccr into build.gradle. and i also added internalClusterTest to check this

Copy link
Contributor

@andreidan andreidan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for iterating on this @luyuncheng

Left a few minor comments.

Can you please also add a changelog entry ? (similar to this https://github.com/elastic/elasticsearch/blob/main/docs/changelog/93524.yaml )


@ESIntegTestCase.ClusterScope(scope = ESIntegTestCase.Scope.TEST, numDataNodes = 0, numClientNodes = 0)
public class ILMMultiNodeIT extends ESIntegTestCase {
public class ILMMultiNodeWithCCREnabledIT extends ESIntegTestCase {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't have to rename this as it's not meant to test anything CCR related

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

startWarmOnlyNode();
ensureGreen();

RolloverAction rolloverAction = new RolloverAction(null, null, null, 1L, null, null, null, null, null, null);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can skip the hot phase completely to save a few cycles in this test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can skip the hot phase completely to save a few cycles in this test

I need ShrinkAction to trigger WaitForNoFollowersStep and shrink action need a read-only index. so i combine the RolloverAction which is for enable read-only index AND ShrinkAction into one hot phase. and skip the warm phase

@luyuncheng
Copy link
Contributor Author

Can you please also add a changelog entry ?

++

@luyuncheng luyuncheng requested a review from andreidan May 9, 2023 01:31
@andreidan
Copy link
Contributor

@elasticmachine update branch

Copy link
Contributor

@andreidan andreidan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the contribution @luyuncheng

@andreidan andreidan merged commit 9f8593e into elastic:main May 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Data Management/ILM+SLM Index and Snapshot lifecycle management >enhancement external-contributor Pull request authored by a developer outside the Elasticsearch team Team:Data Management Meta label for data/management team v8.9.0

6 participants