- Notifications
You must be signed in to change notification settings - Fork 25.5k
Optimize usage calculation in ILM policies retrieval API #106953
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Pinging @elastic/es-data-management (Team:Data Management) |
Hi @nielsbauman, I've created a changelog YAML for you. |
x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ilm/LifecyclePolicyUtils.java Outdated Show resolved Hide resolved
I really like the idea of reducing the number of templates we are going through to speed up things. However, I am afraid this optimisation is not correct, I created a test that demonstrates the bug:
In this test we see a data stream whose We can explore more ideas like this, for example:
|
server/src/main/java/org/elasticsearch/cluster/metadata/MetadataIndexTemplateService.java Outdated Show resolved Hide resolved
server/src/main/java/org/elasticsearch/cluster/metadata/MetadataIndexTemplateService.java Show resolved Hide resolved
/** A map from policy name to list of data streams that use that policy. */ | ||
private final Map<String, List<String>> policyToDataStream; | ||
/** A map from composable template name to the policy name it uses (or null) */ | ||
private final Map<String, String> templateToPolicy; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The templateToPolicy
map is basically only there to save some MetadataIndexTemplateService.resolveSettings
calls (as we have to loop over all templates in calculateUsage
anyway), but resolving the settings seemed to be relatively expensive, so templateToPolicy
basically serves as a cache for that.
List<String> indices = new ArrayList<>(); | ||
for (IndexMetadata indexMetadata : state.metadata().indices().values()) { | ||
if (policyName.equals(indexMetadata.getLifecyclePolicyName())) { | ||
indices.add(indexMetadata.getIndex().getName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could pre-compute a map of policy to indices as well, but I'm not sure the memory vs. speed trade-off is worth it there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought, since we have to build this list for every policy anyway, pre-computing these lists and putting them in a map shouldn't be too much additional memory overhead. I'll wait till someone has done a first review before making more changes (in case my whole approach is off).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And the same goes for the composable templates of course.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's think about this. You only need to keep a cache of what is necessary. Right?
For example:
Composable templates:
Cache the resolved templates that match our policies and their index patterns: template -> policy
, probably you can also store policy -> templates
.
Data streams
Go over the data streams like you already do, find the template for each data stream and check the cache template -> policy
. You do not need to keep track of nulls anymore, since you have collected the relevant templates, if the template is not in your cache then the data stream shouldn't be either. So you create the policy -> data stream
.
Indices
I do not see an issue with going over the indices during initialisation because from what I saw the information is pre-calculated and then serialised. So, here you create policy -> indices
.
Then retrieving the data is just picking them up from the cache. Thoughts?
...core/src/test/java/org/elasticsearch/xpack/core/ilm/LifecyclePolicyUsageCalculatorTests.java Show resolved Hide resolved
server/src/main/java/org/elasticsearch/cluster/metadata/MetadataIndexTemplateService.java Outdated Show resolved Hide resolved
server/src/main/java/org/elasticsearch/cluster/metadata/MetadataIndexTemplateService.java Show resolved Hide resolved
.orElse(Set.of()); | ||
| ||
// Filter and sort all composable templates in advance, to speed up template retrieval later on. | ||
var templates = state.metadata() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this confused me a little bit, both the name and the comment. It wasn't clear to me what is filtered and when we use the word templates, which templates do we mean. So, what if we rename:
templateNames
torequestedTemplateNames
, signalling that these was provided by the caller.templates
torestTemplateCandidates
orotherTemplateCandidates
, signalling that these are the rest, so not the ones provided by the user and candidates becauseisGlobalAndHasIndexHiddenSetting
is filtering out the non candidates.
Names are up for debate but I hope I made clear the direction I am considering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I've renamed the templates
variable and updated the comments. Let me know if this is more clear.
server/src/main/java/org/elasticsearch/cluster/metadata/MetadataIndexTemplateService.java Show resolved Hide resolved
server/src/main/java/org/elasticsearch/cluster/metadata/MetadataIndexTemplateService.java Outdated Show resolved Hide resolved
server/src/main/java/org/elasticsearch/cluster/metadata/MetadataIndexTemplateService.java Outdated Show resolved Hide resolved
final Predicate<String> patternMatchPredicate = pattern -> Regex.simpleMatch(pattern, resolvedIndexName); | ||
final List<Tuple<String, ComposableIndexTemplate>> candidates = new ArrayList<>(); | ||
for (Map.Entry<String, ComposableIndexTemplate> entry : metadata.templatesV2().entrySet()) { | ||
outerLoop: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm..... I am not going to lie, seeing this does not give me a good feeling. Is this really the best alternative to structure the code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I've found a much better alternative, see 293ff9d. Let me know whether you agree that this approach is better.
server/src/main/java/org/elasticsearch/cluster/metadata/MetadataIndexTemplateService.java Outdated Show resolved Hide resolved
server/src/main/java/org/elasticsearch/cluster/metadata/MetadataIndexTemplateService.java Outdated Show resolved Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for the late review! Really happy to see this work @nielsbauman, this is awesome! Let me know what you think on the comments.
I like the direction that you have taken :)
server/src/main/java/org/elasticsearch/cluster/metadata/MetadataIndexTemplateService.java Outdated Show resolved Hide resolved
server/src/main/java/org/elasticsearch/cluster/metadata/MetadataIndexTemplateService.java Outdated Show resolved Hide resolved
server/src/main/java/org/elasticsearch/cluster/metadata/MetadataIndexTemplateService.java Outdated Show resolved Hide resolved
server/src/main/java/org/elasticsearch/cluster/metadata/MetadataIndexTemplateService.java Show resolved Hide resolved
...ugin/core/src/main/java/org/elasticsearch/xpack/core/ilm/LifecyclePolicyUsageCalculator.java Outdated Show resolved Hide resolved
...core/src/test/java/org/elasticsearch/xpack/core/ilm/LifecyclePolicyUsageCalculatorTests.java Outdated Show resolved Hide resolved
...core/src/test/java/org/elasticsearch/xpack/core/ilm/LifecyclePolicyUsageCalculatorTests.java Outdated Show resolved Hide resolved
...ugin/core/src/main/java/org/elasticsearch/xpack/core/ilm/LifecyclePolicyUsageCalculator.java Outdated Show resolved Hide resolved
server/src/main/java/org/elasticsearch/cluster/metadata/MetadataIndexTemplateService.java Outdated Show resolved Hide resolved
List<String> indices = new ArrayList<>(); | ||
for (IndexMetadata indexMetadata : state.metadata().indices().values()) { | ||
if (policyName.equals(indexMetadata.getLifecyclePolicyName())) { | ||
indices.add(indexMetadata.getIndex().getName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's think about this. You only need to keep a cache of what is necessary. Right?
For example:
Composable templates:
Cache the resolved templates that match our policies and their index patterns: template -> policy
, probably you can also store policy -> templates
.
Data streams
Go over the data streams like you already do, find the template for each data stream and check the cache template -> policy
. You do not need to keep track of nulls anymore, since you have collected the relevant templates, if the template is not in your cache then the data stream shouldn't be either. So you create the policy -> data stream
.
Indices
I do not see an issue with going over the indices during initialisation because from what I saw the information is pre-calculated and then serialised. So, here you create policy -> indices
.
Then retrieving the data is just picking them up from the cache. Thoughts?
@gmarouli thanks for your feedback! I've adjusted the code a bit more to just cache everything (as you proposed in one of the discussions above). I processed your other feedback as well and made some small adjustments. Let me know what you think! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still want to run some benchmarks again, to validate that my changes are still as effective as I think they are. That'll take some time because I have to configure the cluster in the right way again, so I'll get to that later.
server/src/main/java/org/elasticsearch/cluster/metadata/MetadataIndexTemplateService.java Outdated Show resolved Hide resolved
| ||
// We keep a map from composable template name to policy name to avoid having to resolve the template settings to determine | ||
// the template's policy twice. | ||
final Map<String, String> templateToPolicy = Maps.newHashMapWithExpectedSize(indexTemplates.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm worried whether this expected size is too large. We're looping over all templates in the cluster, but I don't think we'll often actually put all templates in that map -- especially when there is only one requested policy. At the same time, I don't really have any suggestions for a better expected size, so I'd be either this or just an empty map. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again I would vote for simplicity. So this has an extra statement and a comment explaining that statement, it's not super difficult but it is an extra thing to process. So if the benefit is not significant, then I would prefer to have just a map with the default size that reads effortlessly.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed this line to just use an empty map. I'm not sure what "statement" and "comment" you're referring to exactly. The comment in this diff justifies the map itself, which we need anyway.
I just ran another benchmark and with the setup I created, a call to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, great improvement @nielsbauman . I added some minor comments but they are not blocking.
server/src/main/java/org/elasticsearch/cluster/metadata/MetadataIndexTemplateService.java Show resolved Hide resolved
); | ||
for (String dataStream : allDataStreams) { | ||
// Find the index template with the highest priority that matches this data stream's name. | ||
String indexTemplate = MetadataIndexTemplateService.findV2TemplateFromSortedList(project, indexTemplates, dataStream, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am thinking out loud, shouldn't we pass here the ds.isHidden()
flag instead of just setting it to false
? I understand that for now this doesn't make any difference, but it does seem misleading. What do you think?
Disclaimer, this is not blocking considering it was already working like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, that's a good point. I think passing ds.isHidden()
is not the right way to go, as it indeed wouldn't make a difference (because we include data stream indices anyway, regardless of whether the data stream is hidden or not), so I'd argue that passing that is confusing as well. Instead, I'd probably be more inclined to extract/separate some methods (in MetadataIndexTemplateService
). We could either 1. overload the existing findV2Template
method(s) to not require the isHidden
parameter and always pass false
- with a comment explaining why we do that, or 2. we extract/refactor te findV2CandidateTemplates
method to have a version that truly doesn't use the isHidden
parameter - although that might get a bit ugly. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I do not think we need to change the code. Maybe a comment here and a mention in the javadoc of the method could be enough. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a comment to findV2CandidateTemplates
. There are several usages of that method that pass false
instead of ds.isHidden()
, so adding comment on the caller side didn't make much sense IMO. I think the comment on findV2CandidateTemplates
is a good addition, thanks for the suggestion.
Optimize calculating the usage of ILM policies in the
GET _ilm/policy
andGET _ilm/policy/<policy_id>
endpoints. I loaded some dummy data into my single-node locally-running ES cluster using the following script (numbers are inspired by a customer who was facing timeouts in Kibana due to this endpoint being slow):Bash snippet
I also played around with some aspects of the script such as making the templates use policies or not, as that represents different scenarios that users might face.
I generated a flamegraph on
main
, which showed that theMetadataIndexTemplateService.findV2Template(...)
call was by far the most expensive.Flamegraph
After some experimentation, I ended up extracting a separate class that pre-computes some parts on initialization (i.e. only once per request) and then uses those pre-computed parts when calculating the usage for an individual policy. This is of course a trade-off between memory usage and run-time performance. But, I think the added memory usage (which shouldn't be crazy much) is worth the significant speed improvement here.
Fixes #105773