- Notifications
You must be signed in to change notification settings - Fork 25.7k
Expose operation and request counts separately in repository stats #117530
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose operation and request counts separately in repository stats #117530
Conversation
…ng_stats_endpoints # Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java
| Hi @nicktindall, I've created a changelog YAML for you. |
| Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination) |
server/src/main/java/org/elasticsearch/common/blobstore/EndpointStats.java Outdated Show resolved Hide resolved
…ng_stats_endpoints # Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java
| It's not immediately obvious to me. Does this PR change any user visible behaviour, i.e. GetRepositoriesMetering API? If the impact is only to the internal API, we should not label it as |
It does, because now S3 |
…ng_stats_endpoints
ywangd left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great. I only have some minor comments.
| EndpointStats addTo(EndpointStats other) { | ||
| long ops = operations.sum() + other.operations(); | ||
| long reqs = requests.sum() + other.requests(); | ||
| return new EndpointStats(ops, reqs); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Why do we need this method given we already have EndpointStats#add which also uses Math.addExact for overflow detection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah it was just slightly lighter-weight, so rather than doing
return new EndpointStats(operations.sum(), requests.sum()).add(other);which would create two EndpointStats just to do the addition (seemed excessive to just add some longs), I figured this would just create a single new one with the new values. It may be premature optimisation. At very least I should be doing addExact here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed in 9824022
| Map<String, EndpointStats> toMap() { | ||
| final Map<String, EndpointStats> results = new HashMap<>(); | ||
| final long getOperations = getCount.get(); | ||
| results.put("GetObject", new EndpointStats(getOperations, getOperations)); | ||
| final long listOperations = listCount.get(); | ||
| results.put("ListObjects", new EndpointStats(listOperations, listOperations)); | ||
| final long insertOperations = postCount.get() + putCount.get(); | ||
| results.put("InsertObject", new EndpointStats(insertOperations, insertOperations)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this is temporary and needs to be fixed later to differentiate between requests and operations? If so, might worth to raise a ticket and add a TODO with the ticket number here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, will do. You're right it's just to meet the requirements of the interface, at the moment GCS doesn't track those numbers separately so we fudge it by returning the single value we do have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in 8143b65
| * @param operations The number of calls | ||
| * @param requests The number of calls (including retries) | ||
| */ | ||
| public record EndpointStats(long operations, long requests) implements Writeable, ToXContentObject { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about ActionStats? We have similar classes such as TransportActionStats. Maybe this could be BlobStoreActionStats if we don't mind verbosity. Endpoint feels a bit too generic and it does not align with the fact that we categarize based on purpose as well (in addition to operation/url).
| } | ||
| | ||
| public EndpointStats { | ||
| assert operations >= 0 && requests >= 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can also assert requests >= operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in 50a9551
| out.writeLong(operations); | ||
| out.writeLong(requests); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since these are always non-negative, we can use writeVLong to save a few bytes
| out.writeLong(operations); | |
| out.writeLong(requests); | |
| out.writeVLong(operations); | |
| out.writeVLong(requests); |
Similar change will be needed on the read side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in 0234b14
| @Override | ||
| public boolean equals(Object o) { | ||
| if (o instanceof EndpointStats other) { | ||
| return requests == other.requests && operations == other.operations; | ||
| } | ||
| return false; | ||
| } | ||
| | ||
| @Override | ||
| public int hashCode() { | ||
| return Objects.hash(operations, requests); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need these given it is a record?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good pickup, no.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in 0143979
| public static final RepositoryStats EMPTY_STATS = new RepositoryStats(Collections.emptyMap()); | ||
| | ||
| public final Map<String, Long> requestCounts; | ||
| public final Map<String, EndpointStats> requestCounts; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we rename this field accordingly? It gets a bit confusing reading through other parts of the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in 30968f5
| assertEquals(statsMapString, 1L, stats.get(statsKey(purpose, AzureBlobStore.Operation.PUT_BLOB)).operations()); | ||
| assertEquals(statsMapString, 1L, stats.get(statsKey(purpose, AzureBlobStore.Operation.LIST_BLOBS)).operations()); | ||
| assertEquals(statsMapString, 1L, stats.get(statsKey(purpose, AzureBlobStore.Operation.GET_BLOB_PROPERTIES)).operations()); | ||
| assertEquals(statsMapString, 1L, stats.get(statsKey(purpose, AzureBlobStore.Operation.PUT_BLOCK)).operations()); | ||
| assertEquals(statsMapString, 1L, stats.get(statsKey(purpose, AzureBlobStore.Operation.PUT_BLOCK_LIST)).operations()); | ||
| assertEquals(statsMapString, 1L, stats.get(statsKey(purpose, AzureBlobStore.Operation.BLOB_BATCH)).operations()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need a test for .requests() as well to maybe show how they behave when there are retries? Or is it already covered somewhere else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's kind-of implicitly tested elsewhere because we use requests() to compare against the counts maintained by org.elasticsearch.repositories.azure.AzureBlobStoreRepositoryTests.AzureHTTPStatsCollectorHandler, but I added an explicit test in 90e7061
Thanks for explanation. Since there are technically visible changes but trivial enough, maybe label it as |
| Hi @nicktindall, I've created a changelog YAML for you. |
…ng_stats_endpoints
…ng_stats_endpoints
ywangd left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| private void assertRequestCountersAccountedForReads(RepositoryStatsSnapshot statsSnapshot) { | ||
| RepositoryStats repositoryStats = statsSnapshot.getRepositoryStats(); | ||
| Map<String, Long> requestCounts = repositoryStats.requestCounts; | ||
| Map<String, BlobStoreActionStats> requestCounts = repositoryStats.actionStats; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: can we also rename the requestCounts here and other places in this test class to something like actionStats?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in ca36b82
We discovered that the "request counts" for blob store operations were inconsistently reported between Azure and S3 implementations. S3 counted the requests (including retries) and Azure counted operations (not including retries).
This PR changes the
BlobStoreimplementations to report separateoperationsandrequestscounts, so we can use operations counts for places where we're interested in reporting costs (e.g. the metering endpoint, also changed in this PR) and we can use request counts for reporting behaviour/performance (e.g. benchmarking, that might be interested in an increase in throttled requests).The stats for each
BlobStoreoperation will represented by anEndpointStatsobject which includes two values:requests, the request countoperations, the operation countRelates ES-9767
Fixes #104443