Skip to content

Conversation

JeremyDahlgren
Copy link
Contributor

@JeremyDahlgren JeremyDahlgren commented May 23, 2025

Returns an empty shard stats for shard entries where stats were unavailable in the case where a node has been restarted or left the cluster. The change adds a description field to the SnapshotIndexShardStatus class that is used to include a message indicating why the stats are empty. To reduce confusion when stats are empty the stats values will all be -1 except for the time values which will be zero. This is in addition to the new description field.

This change was motivated by a desire to reduce latency for getting the stats for currently running snapshots. The stats can still be loaded later from the repository via a _snapshot/<repository>/snapshot/_status call when the snapshot has completed.

An API docs update is in elasticsearch-specification PR #4410.

Closes ES-10982

Returns an empty shard stats for shard entries where stats were unavailable in the case where a node has been restarted or left the cluster. The change adds a 'description' field to the SnapshotIndexShardStatus class that is used to include a message indicating why the stats are empty. This change was motivated by a desire to reduce latency for getting the stats for currently running snapshots. The stats can still be loaded from the repository via a _snapshot/<repository>/snapshot/_status call. Closes ES-10982
@JeremyDahlgren JeremyDahlgren added :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs Team:Distributed Coordination Meta label for Distributed Coordination team v9.1.0 labels May 23, 2025
@JeremyDahlgren JeremyDahlgren marked this pull request as ready for review May 30, 2025 20:11
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

@elasticsearchmachine
Copy link
Collaborator

Hi @JeremyDahlgren, I've created a changelog YAML for you.

Copy link
Contributor

@DiannaHohensee DiannaHohensee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. My only remaining concern is the to/fromXContent code. I never got that familiar with the logic. I think that it might be OK as is, since it's just reading and writing values. But the testing appears to be in SnapshotStatsTests via the createTestInstance() implementation and extending AbstractXContentTestCase. I wonder what would happen if the random number generation in createTestInstance() were extended to -1? For reference I was looking at the original commit that added the XContent code

@JeremyDahlgren
Copy link
Contributor Author

JeremyDahlgren commented Jun 3, 2025

Looks good. My only remaining concern is the to/fromXContent code. I never got that familiar with the logic. I think that it might be OK as is, since it's just reading and writing values. But the testing appears to be in SnapshotStatsTests via the createTestInstance() implementation and extending AbstractXContentTestCase. I wonder what would happen if the random number generation in createTestInstance() were extended to -1? For reference I was looking at the original commit that added the XContent code

I added an explicit test case for XContent round trip serialization for the missing stats, which failed initially and I added a check for it in the deserialization. Thanks for raising this. I also added an explicit XContent serialization test for the missing stats static method in SnapshotIndexShardStatus. The random number generation in createTestInstance() seems ok with the zero lower bound for normal values, while we use the -1 values as a special case for "missing" stats.

Copy link
Contributor

@DiannaHohensee DiannaHohensee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added an explicit test case for XContent round trip serialization for the missing stats, which failed initially and I added a check for in the deserialization. Thanks for raising this. I also added an explicit XContent serialization test for the missing stats static method in SnapshotIndexShardStatus. The random number generation in createTestInstance() seems ok with the zero lower bound for normal values, while we use the -1 values as a special case for "missing" stats.

Clever test extension, I like it. Yeah, your solution makes more sense, to test the particular set of missingStats values, not randomize -1 values with a bunch of other numbers, which wouldn't happen.

My last question is about those zeros you found in the fromXContent method, wondering where they got set (or the absence of being set, really) that way.

JeremyDahlgren and others added 3 commits June 3, 2025 16:11
…apshots/status/SnapshotStats.java Adjust forMissingStats() javadoc Co-authored-by: Dianna Hohensee <artemisapple@gmail.com>
Copy link
Contributor

@DiannaHohensee DiannaHohensee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, without the pre-existing behavior fix in b1fc83c

@DiannaHohensee
Copy link
Contributor

still lgtm 👍

@JeremyDahlgren JeremyDahlgren merged commit 1b49eab into elastic:main Jun 9, 2025
18 checks passed
benchaplin pushed a commit to benchaplin/elasticsearch that referenced this pull request Jun 9, 2025
…lastic#128399) Returns an empty shard stats for shard entries where stats were unavailable in the case where a node has been restarted or left the cluster. The change adds a 'description' field to the SnapshotIndexShardStatus class that is used to include a message indicating why the stats are empty. This change was motivated by a desire to reduce latency for getting the stats for currently running snapshots. The stats can still be loaded from the repository via a _snapshot/<repository>/snapshot/_status call. Closes ES-10982 Co-authored-by: Dianna Hohensee <artemisapple@gmail.com>
valeriy42 pushed a commit to valeriy42/elasticsearch that referenced this pull request Jun 12, 2025
…lastic#128399) Returns an empty shard stats for shard entries where stats were unavailable in the case where a node has been restarted or left the cluster. The change adds a 'description' field to the SnapshotIndexShardStatus class that is used to include a message indicating why the stats are empty. This change was motivated by a desire to reduce latency for getting the stats for currently running snapshots. The stats can still be loaded from the repository via a _snapshot/<repository>/snapshot/_status call. Closes ES-10982 Co-authored-by: Dianna Hohensee <artemisapple@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement Team:Distributed Coordination Meta label for Distributed Coordination team v9.1.0

3 participants