Skip to content

Conversation

JeremyDahlgren
Copy link
Contributor

@JeremyDahlgren JeremyDahlgren commented Mar 14, 2025

Adds a new dynamic setting
TransportGetAllocationStatsAction.CACHE_TTL_SETTING "cluster.routing.allocation.stats.cache.ttl" to configure the time to live for cached NodeAllocationStats on the master. The default value is currently 1 minute per the suggestion in issue #110716.

Closes #110716

@JeremyDahlgren JeremyDahlgren added >enhancement :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) Team:Distributed Coordination Meta label for Distributed Coordination team labels Mar 14, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @JeremyDahlgren, I've created a changelog YAML for you.

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff. I've left some suggestions.

@elasticsearchmachine
Copy link
Collaborator

Hi @JeremyDahlgren, I've created a changelog YAML for you.

Comment on lines +55 to +56
public static final Setting<TimeValue> CACHE_TTL_SETTING = Setting.timeSetting(
"cluster.routing.allocation.stats.cache.ttl",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What visibility/exposure should this setting have? I saw other cluster.routing.allocation.* settings in docs in cluster-level-shard-allocation-routing-settings.md. Should it be documented there?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we need to add documentation, either in this PR or in follow up. Both are fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ would prefer to add the docs in the same PR, otherwise it tends to get forgotten

@JeremyDahlgren JeremyDahlgren marked this pull request as ready for review March 18, 2025 19:40
@JeremyDahlgren JeremyDahlgren requested a review from a team as a code owner March 18, 2025 19:40
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

@elasticsearchmachine
Copy link
Collaborator

Hi @JeremyDahlgren, I've created a changelog YAML for you.

Copy link
Contributor

@mhl-b mhl-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Docs update can be a follow up

Comment on lines +55 to +56
public static final Setting<TimeValue> CACHE_TTL_SETTING = Setting.timeSetting(
"cluster.routing.allocation.stats.cache.ttl",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we need to add documentation, either in this PR or in follow up. Both are fine.

Comment on lines 233 to 240
Map<String, NodeAllocationStats> get() {

if (ttlMillis == 0L) {
return null;
}

final var stats = cachedStats.get();

if (stats == null || threadPool.relativeTimeInMillis() - stats.timestampMillis > ttlMillis) {
return null;
}

return stats.stats;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe remove empty lines

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny comments only, otherwise LGTM2

Comment on lines +55 to +56
public static final Setting<TimeValue> CACHE_TTL_SETTING = Setting.timeSetting(
"cluster.routing.allocation.stats.cache.ttl",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ would prefer to add the docs in the same PR, otherwise it tends to get forgotten

final var stats = cachedStats.get();

if (stats == null || threadPool.relativeTimeInMillis() - stats.timestampMillis > ttlMillis) {
return null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we clear cachedStats here if it's expired? I guess we're just about to overwrite it with a fresh copy so doesn't really matter? If so, could we have a comment just noting this decision for the next person?

Copy link
Contributor

@mhl-b mhl-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation update LGTM too

@JeremyDahlgren JeremyDahlgren added the auto-backport Automatically create backport pull requests when merged label Mar 21, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @JeremyDahlgren, I've created a changelog YAML for you.

@JeremyDahlgren JeremyDahlgren enabled auto-merge (squash) March 21, 2025 17:30
@JeremyDahlgren JeremyDahlgren merged commit d799597 into elastic:main Mar 21, 2025
16 of 17 checks passed
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

The backport operation could not be completed due to the following error:

There are no branches to backport to. Aborting. 

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 124898

JeremyDahlgren added a commit to JeremyDahlgren/elasticsearch that referenced this pull request Mar 25, 2025
Adds a new cache and setting TransportGetAllocationStatsAction.CACHE_TTL_SETTING "cluster.routing.allocation.stats.cache.ttl" to configure the max age for cached NodeAllocationStats on the master. The default value is currently 1 minute per the suggestion in issue 110716. Closes elastic#110716
JeremyDahlgren added a commit to JeremyDahlgren/elasticsearch that referenced this pull request Mar 25, 2025
Adds a new cache and setting TransportGetAllocationStatsAction.CACHE_TTL_SETTING "cluster.routing.allocation.stats.cache.ttl" to configure the max age for cached NodeAllocationStats on the master. The default value is currently 1 minute per the suggestion in issue 110716. Closes elastic#110716
JeremyDahlgren added a commit that referenced this pull request Mar 25, 2025
…5588) Adds a new cache and setting TransportGetAllocationStatsAction.CACHE_TTL_SETTING "cluster.routing.allocation.stats.cache.ttl" to configure the max age for cached NodeAllocationStats on the master. The default value is currently 1 minute per the suggestion in issue 110716. Closes #110716
omricohenn pushed a commit to omricohenn/elasticsearch that referenced this pull request Mar 28, 2025
Adds a new cache and setting TransportGetAllocationStatsAction.CACHE_TTL_SETTING "cluster.routing.allocation.stats.cache.ttl" to configure the max age for cached NodeAllocationStats on the master. The default value is currently 1 minute per the suggestion in issue 110716. Closes elastic#110716
@shainaraskas shainaraskas added the docs-missing-applies-tags PRs that are missing docs applies_to tags for an upcoming release. label Jul 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged backport pending :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) docs-missing-applies-tags PRs that are missing docs applies_to tags for an upcoming release. >enhancement Team:Distributed Coordination Meta label for Distributed Coordination team v9.1.0

5 participants