Conversation

ywangd
Member

@ywangd ywangd commented May 2, 2025

With project secrets in place, we can now create and manage per-project repository clients in addition to the cluster-level repository clients. This PR adds a manager class for that. Note that the logic is not yet fully wired because it needs the per-project repository/object_store, which will be added in separate PRs (as part of MP snapshots and the MP objectStoreService). As such, the tests are currently unit tests.

Relates: #126584
Resolves: ES-11713

@ywangd ywangd added the >non-issue and :Distributed Coordination/Snapshot/Restore (Anything directly related to the `_snapshot/*` APIs) labels May 5, 2025
@ywangd ywangd marked this pull request as ready for review May 5, 2025 01:40
@ywangd ywangd requested a review from a team as a code owner May 5, 2025 01:40
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Coordination (Meta label for Distributed Coordination team) label May 5, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

@ywangd
Member Author

ywangd commented May 5, 2025

@elasticmachine update branch

@ywangd ywangd requested a review from a team as a code owner May 9, 2025 03:17
@ywangd ywangd force-pushed the per-project-s3-clients branch from aef9dde to fdc9c1d May 9, 2025 03:18
@ywangd ywangd removed request for a team May 9, 2025 03:18
}

@Override
public void applyClusterState(ClusterChangedEvent event) {
Member Author

Change this to be an applier instead of a listener so that the repository (also created in an applier) is fully functional in any listeners of the same cluster state. This is necessary so that the repository can be used to bootstrap a new project.
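
For context, a minimal sketch of the ordering guarantee this relies on (the handler bodies are illustrative, not this PR's actual code): all appliers for a given cluster state complete before any listeners are notified of that state.

import org.elasticsearch.cluster.service.ClusterService;

class ApplierVsListenerSketch {
    static void register(ClusterService clusterService) {
        // appliers run first for each cluster state update...
        clusterService.addStateApplier(event -> {
            // ...so per-project repository clients created here...
        });
        // ...are fully functional by the time listeners observe the same state
        clusterService.addListener(event -> {
            // e.g. bootstrap a new project against the now-usable repository
        });
    }
}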

@ywangd ywangd requested a review from pxsalehi May 9, 2025 03:20
@ywangd
Member Author

ywangd commented May 20, 2025

@DaveCTurner Could you please review this PR when convenient? It's based on the discussion we had on #126584, with a reduced scope covering just S3 first. Thanks!

Contributor

@DaveCTurner DaveCTurner left a comment


I would rather we treated a single-project cluster as a special case of a multi-project cluster that only has one project, rather than introducing a whole different way to manage clients for this case. I could perhaps be convinced otherwise, but today this all just looks like it duplicates everything (client lifecycles etc) and any differences in behaviour are lost.

Comment on lines 130 to 131
@Nullable // if multi-project is disabled
private final S3PerProjectClientManager s3PerProjectClientManager;
Contributor

Do we really need this to be @Nullable? I'd rather we didn't have to think about the null case, nor to have such different code-paths. Does something break if we use the per-project manager even in the single-project case?

Member Author

@ywangd ywangd May 20, 2025

There are a few reasons I chose to manage them separately:

  1. Existing clients are initialized via the ReloadablePlugin interface, while project clients are initialized in a cluster state applier. I didn't want to mix them, to avoid contention.
  2. Existing clients allow overriding client settings with repository settings, which makes the settings management more complex. We agreed not to support such overrides for project clients. This keeps the new code simpler when separated, and retrieving a client can be done simply by name.
  3. We have a cluster-level blobstore in the MP setup. I feel it is better for its client to be managed on its own by the existing code.

That said, I can give it a go and see whether we can always use the client manager for all clients. On the above points, I think:

  1. I used to have the synchronized keyword on the applier method. This is no longer the case, so the contention should be acceptable with just CHM operations. Also, the applier method and the ReloadablePlugin interface should never operate on the same entry.
  2. We can probably manage this complexity in the manager and the ClientsHolder, while still not supporting overrides for project clients.
  3. We can store the cluster-level clients against the default project ID. Functionally, clients for the default project store and clients for the cluster are identical (see the sketch below).

Does it make sense to you?
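
A rough sketch of what point 3 above could look like, with a single manager owning every client; all names here are hypothetical, not this PR's actual code:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: clients keyed by project id, with the cluster-level
// clients stored under the default project id so both share one lifecycle.
class S3ClientsManagerSketch {
    static final String DEFAULT_PROJECT_ID = "default";

    // project id -> (client name -> client); mutated by the cluster state
    // applier, read from anywhere, hence the ConcurrentHashMap.
    private final Map<String, Map<String, AutoCloseable>> clientsByProject = new ConcurrentHashMap<>();

    AutoCloseable client(String projectId, String clientName) {
        final String key = projectId != null ? projectId : DEFAULT_PROJECT_ID;
        final Map<String, AutoCloseable> projectClients = clientsByProject.get(key);
        if (projectClients == null || projectClients.containsKey(clientName) == false) {
            throw new IllegalArgumentException("unknown s3 client [" + clientName + "] for project [" + key + "]");
        }
        return projectClients.get(clientName);
    }
}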

Contributor

Yeah, I see that there are some differences; I'm just surprised it needs such divergent code paths. I'd rather we had an S3ClientManager that looked after all the clients, both project-level and cluster-level. In particular, I agree that forbidding repository-level overrides makes things a little simpler to reason about, but that shouldn't come at the cost of this much extra code complexity/duplication.

Member Author

The PR has been updated so that the S3ClientsManager takes care of all clients.

threadPool,
DiscoveryNodeUtils.create("node", "node"),
providedSettings,
new ClusterSettings(Settings.EMPTY, ClusterSettings.BUILT_IN_CLUSTER_SETTINGS)
Contributor

Seems a bit odd to use Settings.EMPTY here rather than providedSettings although this is what happens in other createClusterService overrides.

Member Author

Yeah, these utility methods are a bit odd. Even if we used providedSettings here, it would still not be accurate, since the final settings have more added in the delegate method. That's why I chose Settings.EMPTY: it makes it obvious that the settings are empty if this ever triggers a test failure.

ResourceWatcherService resourceWatcherService,
Supplier<Region> defaultRegionSupplier
) {
final Settings settings = clusterService.getSettings();
Contributor

nit: I liked the name nodeSettings in this context; it can otherwise be confusing whether we're talking about node-level settings or repository settings from the RepositoryMetadata.


}

private void closeClientsAsync(List<ClientsHolder> clientsHoldersToClose, ActionListener<Void> listener) {
executor.execute(() -> {
Contributor

Nit: please use an AbstractRunnable (here ActionRunnable.run) rather than executing a bare Runnable, so that exceptions (e.g. a rejection) still complete the listener.
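
The suggestion amounts to roughly the following, reusing the surrounding method's executor, listener, and clientsHoldersToClose (the loop body is illustrative): ActionRunnable.run completes the listener exceptionally on a rejection or a thrown exception instead of leaving it forever pending.

executor.execute(ActionRunnable.run(listener, () -> {
    for (ClientsHolder holder : clientsHoldersToClose) {
        holder.close(); // illustrative; the real close logic lives in ClientsHolder
    }
}));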

Member Author

I should have learnt this by now. Thanks for spotting it. See 0f1bdd5

final CountDownLatch latch = new CountDownLatch(1);
currentClientsCloseListener.addListener(ActionListener.running(latch::countDown));
try {
if (latch.await(1, TimeUnit.MINUTES) == false) {
Contributor

Are you sure we need to wait here? With S3Service today we just decref the clients and move on, allowing any ongoing uses to finish and shut down separately. Is that not ok here?

If so, why 1 minute? This needs to be configurable in case we need to change it. And if it's ok to proceed after a minute why wait at all?

Member Author

The wait here is to ensure S3Service does not close until all asynchronously closed clients (from the applier threads) have finished closing, per your comment. But maybe I misunderstood what you were suggesting?

The 1 minute is meant as a best effort to close things cleanly, similar to the 1-minute wait for IndicesService to close in Stateless.java. It's unlikely to time out. If a close call somehow does stall, it's not fatal and should not prevent progress (decRef can still trigger closeInternal and close both the S3Client and SdkHttpClient inline, which might take a bit longer?). I can make it configurable if necessary.

Contributor

Hm, I now wonder what I meant there. Though at the time I wrote that, these things had a somewhat weird lifecycle, fixed in the upgrade to SDKv2, so maybe that's what I was worried about. As things stand now it should be safe to just decref each client and move on.

Member Author

it should be safe to just decref each client and move on

The async closing is to avoid having synchronized method calls on the applier thread; I think this point still stands. Do you mean we no longer need to track the close listeners, on the assumption that close calls complete fast enough that there is no need to wait for them to finish when the node shuts down?

Contributor

Yes, +1 to async closing still; I just don't see a need to block the shutdown here. We're not assuming that close calls happen fast enough; we're relying on the refcounting and other lifecycle management to do the right thing even for clients that other subsystems happen to be using while we're shutting down.
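
The agreed shape, then, is roughly the following sketch; a ClientsHolder built on AbstractRefCounted is an assumption here, not necessarily this PR's exact structure. Shutdown simply releases its reference and moves on, and whichever party releases the last reference performs the actual close.

// hypothetical holder: in-flight users keep the clients alive via incRef/decRef
class ClientsHolder extends org.elasticsearch.core.AbstractRefCounted {
    @Override
    protected void closeInternal() {
        // close the S3Client and SdkHttpClient here; runs exactly once, on
        // whichever thread happens to release the last reference
    }
}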

@ywangd
Member Author

ywangd commented May 29, 2025

@elasticmachine update branch

@ywangd
Member Author

ywangd commented May 30, 2025

@elasticmachine update branch

Contributor

@DaveCTurner DaveCTurner left a comment


LGTM

@ywangd ywangd added the auto-merge-without-approval (Automatically merge pull request when CI checks pass; NB doesn't wait for reviews!) label May 30, 2025
@ywangd
Copy link
Member Author

ywangd commented May 30, 2025

@elasticmachine update branch

@elasticsearchmachine elasticsearchmachine merged commit 6311bb3 into elastic:main May 30, 2025
18 checks passed
@ywangd ywangd deleted the per-project-s3-clients branch May 30, 2025 12:54
joshua-adams-1 pushed a commit to joshua-adams-1/elasticsearch that referenced this pull request Jun 3, 2025
With project secrets in place, we can now create and manage per-project repository clients in addition to the cluster-level repository clients. This PR adds a manager class for that. Note that the logic is not yet fully wired because it needs the per-project repository/object_store, which will be added in separate PRs (as part of MP snapshots and the MP objectStoreService). As such, the tests are currently unit tests. Relates: elastic#126584 Resolves: ES-11713
Samiul-TheSoccerFan pushed a commit to Samiul-TheSoccerFan/elasticsearch that referenced this pull request Jun 5, 2025
With project secrets in place, we can now create and manage per-project repository clients in addition to the cluster-level repository clients. This PR adds a manager class for that. Note that the logic is not yet fully wired because it needs the per-project repository/object_store, which will be added in separate PRs (as part of MP snapshots and the MP objectStoreService). As such, the tests are currently unit tests. Relates: elastic#126584 Resolves: ES-11713
elasticsearchmachine pushed a commit that referenced this pull request Jun 19, 2025
Enables creating different s3 clients for different projects Relates: #127631
ywangd added a commit to ywangd/elasticsearch that referenced this pull request Jun 19, 2025
The stateful side tests generally do not exercise multi-project. Therefore we need to use the default project-id instead of random one. Relates: elastic#127631
elasticsearchmachine pushed a commit that referenced this pull request Jun 19, 2025
The stateful side tests generally do not exercise multi-project. Therefore we need to use the default project-id instead of random one. Relates: #127631
kderusso pushed a commit to kderusso/elasticsearch that referenced this pull request Jun 23, 2025
Enables creating different s3 clients for different projects Relates: elastic#127631
kderusso pushed a commit to kderusso/elasticsearch that referenced this pull request Jun 23, 2025
The stateful side tests generally do not exercise multi-project. Therefore we need to use the default project-id instead of random one. Relates: elastic#127631
mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Jun 25, 2025
Enables creating different s3 clients for different projects Relates: elastic#127631
mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Jun 25, 2025
The stateful side tests generally do not exercise multi-project. Therefore we need to use the default project-id instead of random one. Relates: elastic#127631
elasticsearchmachine pushed a commit that referenced this pull request Aug 4, 2025
Similar to #127631, this PR adds per project client support for GCS repository.
elasticsearchmachine pushed a commit that referenced this pull request Aug 13, 2025
This PR adds per project client support for Azure repository similar to #131899 (GCS) and #127631 (s3). Resolves: ES-11714

Labels

auto-merge-without-approval: Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!)
:Distributed Coordination/Snapshot/Restore: Anything directly related to the `_snapshot/*` APIs
>non-issue
Team:Distributed Coordination: Meta label for Distributed Coordination team
v9.1.0

4 participants