Process ILM cluster state updates on another thread #123712
Conversation
Instead of processing cluster state updates on the cluster state applier thread, we fork to a different thread so that the time ILM spends processing a cluster state update does not affect the speed at which the cluster can apply new cluster states. That does not mean we no longer need to optimize ILM's cluster state processing: the overall amount of work is generally unaffected by this fork approach (unless we skip some cluster states), but it does take a significant amount of processing off the critical cluster state applier thread. Additionally, by running ILM's state processing asynchronously, we allow ILM to skip some cluster states if the management thread pool is saturated or ILM's processing is taking too long.
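The core of the change is a coalescing hand-off from the applier thread to a worker thread. The sketch below is a minimal, dependency-free illustration of that pattern, using plain Java types in place of `ClusterState` and the management thread pool; all names are illustrative and it is not the actual `IndexLifecycleService` code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

// Minimal sketch (illustrative names only): the applier thread just records the latest state
// and schedules at most one worker task; the worker processes whatever state is current when
// it runs, so intermediate states can be skipped.
public class CoalescingStateProcessor {

    private final ExecutorService workerPool = Executors.newSingleThreadExecutor();
    // Holds the most recently seen, not-yet-processed state; null means nothing is pending.
    private final AtomicReference<String> lastSeenState = new AtomicReference<>();

    /** Called on the (stand-in) applier thread; must stay cheap. */
    public void onNewState(String newState) {
        // Only schedule a worker run if there wasn't already a pending state;
        // otherwise the already-scheduled (or running) worker will pick this state up.
        if (lastSeenState.getAndSet(newState) == null) {
            workerPool.execute(this::processLatestState);
        }
    }

    private void processLatestState() {
        String state = lastSeenState.get();
        // ... potentially expensive processing of `state` goes here ...

        // Clear the pending marker only if no newer state arrived while we were processing;
        // if one did, schedule another run so it gets picked up.
        if (lastSeenState.compareAndSet(state, null) == false) {
            workerPool.execute(this::processLatestState);
        }
    }
}
```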
Pinging @elastic/es-data-management (Team:Data Management)
Hi @nielsbauman, I've created a changelog YAML for you.
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
@elasticmachine update branch
if (currentState == null) {
    return;
}
triggerPolicies(currentState, true);
What happens if this throws? I'm worried that it could leave `lastSeenState` set while processing is not actually happening, meaning that we'll never trigger another thread to do future processing. Maybe we should do something with either try-catch or try-finally here?
Also, are we okay with the fact that exceptions won't get propagated back to the `ClusterApplierService`? It looks like that would just log and discard the exception anyway, so I guess it's okay. But maybe we should also explicitly do the equivalent log-and-discard behaviour here, rather than relying on whatever the executor is going to do with unhandled exceptions?
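A minimal sketch of the try/finally plus log-and-discard shape being suggested here, reusing the illustrative names from the sketch above (hypothetical, not the actual PR code):

```java
// Hypothetical variant of processLatestState() from the sketch above: make sure a throwing
// processing step can neither wedge the pending-state marker nor escape unlogged.
private void processLatestState() {
    String state = lastSeenState.get();
    try {
        // ... potentially expensive processing of `state` goes here ...
    } catch (Exception e) {
        // Log and discard, roughly what the ClusterApplierService would have done with it.
        System.err.println("failed to process state update: " + e);
    } finally {
        // Even on failure, clear the marker (or reschedule) so future updates are not blocked.
        if (lastSeenState.compareAndSet(state, null) == false) {
            workerPool.execute(this::processLatestState);
        }
    }
}
```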
++ Let's make this an `AbstractRunnable` and also make it force execute. Otherwise, if we run into a rejection, we'll never do the work for a cluster state update and implicitly rely on the fact that there will be another update to get us out of the broken state.
TIL about `AbstractRunnable`, thanks @original-brownbear!
Good catch, @PeteGillinElastic! And thanks both! I changed the runnable into an `AbstractRunnable` to make it force execute and to take care of failures.
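For readers unfamiliar with it, `AbstractRunnable` (in `org.elasticsearch.common.util.concurrent`) lets a task opt into force execution and handle its own failures. A rough sketch of that shape, not the exact code in this PR, with `threadPool`, `logger`, and `processClusterState()` assumed from the surrounding class:

```java
import org.elasticsearch.common.util.concurrent.AbstractRunnable;
import org.elasticsearch.threadpool.ThreadPool;

// Rough sketch only; `threadPool`, `logger`, and `processClusterState()` are assumed to exist
// on the enclosing class, as they would in IndexLifecycleService-like code.
void submitClusterStateProcessing() {
    threadPool.executor(ThreadPool.Names.MANAGEMENT).execute(new AbstractRunnable() {

        @Override
        public boolean isForceExecution() {
            // Ensure a saturated executor cannot reject (and thereby silently drop) the task.
            return true;
        }

        @Override
        protected void doRun() {
            processClusterState();
        }

        @Override
        public void onFailure(Exception e) {
            // Log and discard, mirroring what would have happened to an exception thrown
            // on the cluster state applier thread.
            logger.warn("unexpected failure while processing cluster state update", e);
        }
    });
}
```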
}
// If the last seen cluster state changed while this thread was running, it means a new cluster state came in and we need to
// process it. We do that by kicking off a new thread, which will pick up the new cluster state when the thread gets executed.
processClusterState();
The use of recursion here is making me slightly nervous. I was initially worried about stack overflow scenarios, but I think we're safe here because the recursive call will just schedule the work on another thread and return immediately, right? (Assuming we never call this with a direct executor, or at least never outside tests!) Then I started worrying about things like deadlock, but again it's probably safe since nothing is blocking, the execute call should return immediately (unless we've managed to fill the thread pool's work queue, in which case it'll throw some rejected task exception, and in that case the system is in bad trouble anyway). So maybe this is all okay?
An alternative would be to have a loop here instead of recursing. That would mean that we'd start the next update on the same thread immediately, rather than putting it to the back of the pool's work queue. I don't know whether that matters, it depends what other work the pool is used for. Given that you've already done a bunch of benchmarking with the current approach, maybe it's better to stick with that.
Sorry, I realize that I'm kind of thinking aloud in this comment! I figured it's the kind of thing that's worth working through carefully...
Yeah, I understand what you're saying. I intentionally decided not to use a loop, as I wanted to avoid hoarding a thread in the management pool for a potentially long time. I preferred requesting a new thread at the end of the queue (as we're ok with skipping some cluster state updates in ILM). I tried to avoid deadlocks etc. by doing things in the right order, but I get that the recursion can seem confusing and that it's more prone to bugs, so I'm open to other suggestions.
as we're ok with skipping some cluster state updates in ILM
Just for the record, I think it would be possible to do this skipping with a loop. I wasn't thinking of putting the updates in a queue, I was thinking of having a do-while or something which gets the current state at the end and loops if it doesn't match the state it just processed.
However, I think that this:
I intentionally decided not to use a loop, as I wanted to avoid hoarding a thread in the management pool for a potentially long time
is a reasonable point. And I think I've convinced myself that your recursion approach is safe. So let's go with that.
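For illustration, the loop-based alternative discussed above would look roughly like this, again reusing the illustrative names from the earlier sketch (hypothetical, not code from the PR):

```java
// Hypothetical loop-based alternative: keep processing on the current worker thread until no
// newer state has arrived, instead of rescheduling a fresh task for each newer state. This
// trades the extra trip through the executor queue for holding on to the worker thread longer.
private void processLatestStateInLoop() {
    String state;
    do {
        state = lastSeenState.get();
        // ... potentially expensive processing of `state` goes here ...

        // Loop again only if a newer state replaced the one we just processed.
    } while (lastSeenState.compareAndSet(state, null) == false);
}
```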
Some small comments :) This looks alright to me (assuming we fix the possible rejection of the management pool task that is :)) but someone from the actual area should ✅ this as well obviously. Thanks!
LGTM with the new `AbstractRunnable` stuff. I have just one small optional suggestion. Thanks for the cool PR!
Here's a flamegraph taken while the many-shards benchmark was at 40% of the `initial-indices` phase (i.e. when it's adding the 50000 indices as part of the startup), running on commit e6eb8ef (i.e. the last commit on main before multi-project was merged), with a profiler duration of 10 minutes:



And here's one with the same profiling setup, but with these changes -- I ran this benchmark with my changes rebased on the aforementioned commit instead of the current main, as I didn't want the multi-project changes to influence my results:



The roughly 5% of the total processing time that ILM takes up is moved off of the cluster applier thread and onto a different one. For completeness, that processing does indeed still show up in the second flamegraph here:

