Limit number of suppressed S3 deletion errors #123630
Conversation
Hi @pxsalehi, I've created a changelog YAML for you.
     */
    static final int MAX_BULK_DELETES = 1000;

    static final int MAX_DELETE_EXCEPTIONS = 10;
I've picked this more or less randomly. Not sure what it should be, but I'm not sure having more than this is useful.
👍 fine by me
        return new S3BlobContainer(path, this);
    }

    record DeletionExceptions(Exception exception, int count) {}
Ideally we could have added this behaviour to `useOrSuppress`, but we'd need to fetch a copy of the suppressed errors each time to check the count, so I've done it only here.
TIL `Throwable#getSuppressed` is doing quite a bit more than just retrieving an array of exceptions. Still, this seems rather more complicated (and involves unnecessary allocations) vs just counting the exceptions in a separate `AtomicInteger`. We don't even need atomicity either; we could just have a class with two mutable fields to keep track of the suppression depth.
sure, see 6b85c33.
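For reference, here's a minimal sketch of what the two-mutable-fields approach could look like — the class shape and method names are assumptions for illustration, not necessarily what landed in 6b85c33; only `MAX_DELETE_EXCEPTIONS` comes from the diff above:

```java
// Hypothetical sketch: plain mutable fields instead of a record, so tracking
// the suppression depth needs no extra allocations and never calls
// Throwable#getSuppressed.
class DeletionExceptions {
    static final int MAX_DELETE_EXCEPTIONS = 10; // mirrors the constant above

    private Exception exception; // first failure seen; later ones get attached to it
    private int count;           // how many exceptions have been suppressed so far

    // Keep the first exception, attach up to MAX_DELETE_EXCEPTIONS more as
    // suppressed, and silently drop anything beyond the cap.
    void useOrMaybeSuppress(Exception e) {
        if (exception == null) {
            exception = e;
        } else if (count < MAX_DELETE_EXCEPTIONS) {
            exception.addSuppressed(e);
            count++;
        }
    }

    Exception get() {
        return exception;
    }
}
```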
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)
    var maxNoOfDeletions = 2 * S3BlobStore.MAX_DELETE_EXCEPTIONS;
    var blobs = randomList(1, maxNoOfDeletions * maxBulkDeleteSize, ESTestCase::randomIdentifier);
    blobContainer.deleteBlobsIgnoringIfNotExists(randomPurpose(), blobs.iterator());
    fail("deletion should not succeed");
I'd prefer an `expectThrows` here (around just the `deleteBlobsIgnoringIfNotExists` call).
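Something along these lines, perhaps — a sketch assuming the failure surfaces as an `IOException` (the exact exception type and assertion are assumptions, not the committed test):

```java
var maxNoOfDeletions = 2 * S3BlobStore.MAX_DELETE_EXCEPTIONS;
var blobs = randomList(1, maxNoOfDeletions * maxBulkDeleteSize, ESTestCase::randomIdentifier);
// Scope the expected failure to just the delete call, instead of a trailing fail(...)
var exception = expectThrows(
    IOException.class,
    () -> blobContainer.deleteBlobsIgnoringIfNotExists(randomPurpose(), blobs.iterator())
);
```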
also done in 6b85c33.
Production code looks good, just one more thing about the new `createBlobContainer` override. I could be persuaded if you feel strongly about it.
    final @Nullable TimeValue readTimeout,
    final @Nullable Boolean disableChunkedEncoding,
    final @Nullable ByteSizeValue bufferSize,
    final @Nullable Integer maxBulkDeletes
I'd rather just have the one method rather than having to work out which variant is being called (and what the implied default params are) in each case. I know it adds a bunch of noise here but it'll be better in the long run.
yeah, mostly because of the noise and the fact that this way of passing these arguments is terrible to extend. Speaking of the long run, how about replacing these with a parameter object?
Some kind of builder would seem in order here, but maybe that can be deferred to a followup. Or do that first, reducing the noise in this PR. But either way, let's not have two methods that do almost the same thing.
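As a rough illustration of the parameter-object direction (every name here is hypothetical, not the API that was merged), something like this would let call sites spell out only the settings they care about:

```java
// Hypothetical parameter object with wither-style defaults; the field types
// come from the quoted signature above.
record BlobContainerOptions(
    @Nullable TimeValue readTimeout,
    @Nullable Boolean disableChunkedEncoding,
    @Nullable ByteSizeValue bufferSize,
    @Nullable Integer maxBulkDeletes
) {
    static BlobContainerOptions defaults() {
        return new BlobContainerOptions(null, null, null, null);
    }

    BlobContainerOptions withMaxBulkDeletes(int newMaxBulkDeletes) {
        return new BlobContainerOptions(readTimeout, disableChunkedEncoding, bufferSize, newMaxBulkDeletes);
    }
}
```

A call site would then read `createBlobContainer(BlobContainerOptions.defaults().withMaxBulkDeletes(5))` rather than a long positional list of mostly-null arguments.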
LGTM
💔 Backport failed
You can use sqren/backport to manually backport by running
If all items fail to delete, the current warning log builds a huge string to include all of them (up to 1000). This PR limits the string to a length of 1000 characters. Relates: elastic#123630
post-merge LGTM. I raised #123953 as a minor improvement for the relevant log message.
Backporting to 7.17 seems a bit ugly. The code is too different and some settings the test relies on don't exist there. I'm gonna skip it. If you'd like to have it anyway, let me know.
If all items fail to delete, the current warning log builds a huge string to include all of them (up to 1000). This PR limits the string to the first 10 entries. Relates: #123630
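A sketch of the kind of truncation described — the helper name and output format are assumptions; only the limit of 10 entries comes from the PR:

```java
import java.util.List;

// Build the log message from at most the first 10 failed keys, instead of
// listing every key (up to 1000 per bulk delete).
final class FailedDeletionSummary {
    private static final int LIMIT = 10;

    static String describe(List<String> failedKeys) {
        if (failedKeys.size() <= LIMIT) {
            return String.join(", ", failedKeys);
        }
        return String.join(", ", failedKeys.subList(0, LIMIT))
            + " ... (" + (failedKeys.size() - LIMIT) + " more)";
    }
}
```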
We've seen this being an issue on 7.x, although it can happen on all versions (I'm pretty sure this PR doesn't cleanly back-port to 7.x though).

Closes #123354