Skip to content

Conversation

pxsalehi
Copy link
Member

@pxsalehi pxsalehi commented Feb 27, 2025

We've seen this being an issue on 7.x although can happen on all
versions (I'm pretty sure this PR doesn't cleanly back-port to 7.x though).

Closes #123354

@pxsalehi pxsalehi added >bug :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs labels Feb 27, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @pxsalehi, I've created a changelog YAML for you.

*/
static final int MAX_BULK_DELETES = 1000;

static final int MAX_DELETE_EXCEPTIONS = 10;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've picked this randomly more or less. Not sure what it should be. I'm not sure having more than this is useful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 fine by me

return new S3BlobContainer(path, this);
}

record DeletionExceptions(Exception exception, int count) {}
Copy link
Member Author

@pxsalehi pxsalehi Mar 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we could have added this behaviour to useOrSuppress but we'd need to each time fetch a copy of the suppressed errors to check the count, so I've done it only here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL Throwable#getSuppressed is doing quite a bit more than just retrieving an array of exceptions. Still, this seems rather more complicated (and involves unnecessary allocations) vs just counting the exceptions in a separate AtomicInteger. We don't even need atomicity either, we could just have a class with two mutable fields to keep track of the suppression depth.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure see 6b85c33.

@pxsalehi pxsalehi marked this pull request as ready for review March 3, 2025 09:23
@pxsalehi pxsalehi requested review from DaveCTurner and ywangd March 3, 2025 09:23
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Coordination Meta label for Distributed Coordination team label Mar 3, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

var maxNoOfDeletions = 2 * S3BlobStore.MAX_DELETE_EXCEPTIONS;
var blobs = randomList(1, maxNoOfDeletions * maxBulkDeleteSize, ESTestCase::randomIdentifier);
blobContainer.deleteBlobsIgnoringIfNotExists(randomPurpose(), blobs.iterator());
fail("deletion should not succeed");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer an expectThrows here (around just the deleteBlobsIgnoringIfNotExists call)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also done in 6b85c33.

@pxsalehi pxsalehi requested a review from DaveCTurner March 3, 2025 11:14
Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Production code looks good, just one more thing about the new createBlobContainer override. I could be persuaded if you feel strongly about it.

*/
static final int MAX_BULK_DELETES = 1000;

static final int MAX_DELETE_EXCEPTIONS = 10;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 fine by me

final @Nullable TimeValue readTimeout,
final @Nullable Boolean disableChunkedEncoding,
final @Nullable ByteSizeValue bufferSize,
final @Nullable Integer maxBulkDeletes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather just have the one method rather than having to work out which variant is being called (and what the implied default params) are in each case. I know it adds a bunch of noise here but it'll be better in the long run.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, mostly because of the noise and the fact that this way of passing these arguments is terrible to extend. Speaking of the long-run, how about replacing these with a parameter object?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some kind of builder would seem in order here, but maybe that can be deferred to a followup. Or do that first, reducing the noise in this PR. But either way, let's not have two methods that do almost the same thing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pxsalehi pxsalehi requested a review from DaveCTurner March 3, 2025 12:13
Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pxsalehi pxsalehi added auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) auto-backport Automatically create backport pull requests when merged labels Mar 3, 2025
@pxsalehi
Copy link
Member Author

pxsalehi commented Mar 3, 2025

:qa:rolling-upgrade-legacy:v9.1.0#oldClusterTest is failing. Not sure even what that is, and can't find anything besides some gradle error in the build log. I'll merge main into this to pull in a few gradle/build related commits.

@elasticsearchmachine elasticsearchmachine merged commit 113f0c1 into elastic:main Mar 3, 2025
17 checks passed
@pxsalehi pxsalehi deleted the ps250227-capDeletionSuppressedErrors branch March 3, 2025 16:06
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.x Commit could not be cherrypicked due to conflicts
7.17 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 123630

ywangd added a commit to ywangd/elasticsearch that referenced this pull request Mar 4, 2025
If all items fail to delete, the current warning log builds a huge string to include all of them (up to 1000). This PR limits the string to a length of 1000 characters. Relates: elastic#123630
@ywangd
Copy link
Member

ywangd commented Mar 4, 2025

post-merge LGTM

I raised #123953 as a minor improvement for the relevant log message.

pxsalehi added a commit to pxsalehi/elasticsearch that referenced this pull request Mar 4, 2025
pxsalehi added a commit to pxsalehi/elasticsearch that referenced this pull request Mar 4, 2025
@pxsalehi
Copy link
Member Author

pxsalehi commented Mar 4, 2025

Backporting to 7.17 seems a bit ugly. The code is too different and some settings the test relies on don't exit there. I'm gonna skip it. If you'd like to have it anyway, let me know.

@pxsalehi pxsalehi removed the v7.17.29 label Mar 4, 2025
pxsalehi added a commit to pxsalehi/elasticsearch that referenced this pull request Mar 4, 2025
elasticsearchmachine pushed a commit that referenced this pull request Mar 6, 2025
If all items fail to delete, the current warning log builds a huge string to include all of them (up to 1000). This PR limits the string length to first 10 entries. Relates: #123630
ywangd added a commit to ywangd/elasticsearch that referenced this pull request Mar 6, 2025
If all items fail to delete, the current warning log builds a huge string to include all of them (up to 1000). This PR limits the string length to first 10 entries. Relates: elastic#123630 (cherry picked from commit a9432ba) # Conflicts: #	modules/repository-s3/src/main/java/org/elasticsearch/repositories/s3/S3BlobStore.java #	modules/repository-s3/src/test/java/org/elasticsearch/repositories/s3/S3BlobContainerRetriesTests.java
elasticsearchmachine pushed a commit that referenced this pull request Mar 6, 2025
) * Limit the log line length for s3 deletion error (#123953) If all items fail to delete, the current warning log builds a huge string to include all of them (up to 1000). This PR limits the string length to first 10 entries. Relates: #123630 (cherry picked from commit a9432ba) # Conflicts: #	modules/repository-s3/src/main/java/org/elasticsearch/repositories/s3/S3BlobStore.java #	modules/repository-s3/src/test/java/org/elasticsearch/repositories/s3/S3BlobContainerRetriesTests.java * fix backport
georgewallace pushed a commit to georgewallace/elasticsearch that referenced this pull request Mar 11, 2025
If all items fail to delete, the current warning log builds a huge string to include all of them (up to 1000). This PR limits the string length to first 10 entries. Relates: elastic#123630
costin pushed a commit to costin/elasticsearch that referenced this pull request Mar 15, 2025
If all items fail to delete, the current warning log builds a huge string to include all of them (up to 1000). This PR limits the string length to first 10 entries. Relates: elastic#123630
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) >bug :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs Team:Distributed Coordination Meta label for Distributed Coordination team v8.19.0 v9.1.0

4 participants