Support per-project s3 clients #127631
Conversation
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)
@elasticmachine update branch
aef9dde to fdc9c1d
}

@Override
public void applyClusterState(ClusterChangedEvent event) {
Change this to be an applier instead of a listener so that the repository (also created in an applier) is fully functional in any listeners of the same cluster state. This is necessary so that the repository can be used to bootstrap a new project.
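For context, a minimal sketch of the applier-vs-listener distinction (the manager class name comes from this PR; the registration call and interfaces are Elasticsearch's cluster-state APIs):

```java
import org.elasticsearch.cluster.ClusterChangedEvent;
import org.elasticsearch.cluster.ClusterStateApplier;
import org.elasticsearch.cluster.service.ClusterService;

// Appliers run before listeners for the same cluster state, so per-project
// clients created here are already usable by any ClusterStateListener (for
// example, code bootstrapping a new project) that observes that state.
public class S3PerProjectClientManager implements ClusterStateApplier {

    public void start(ClusterService clusterService) {
        clusterService.addStateApplier(this); // rather than clusterService.addListener(...)
    }

    @Override
    public void applyClusterState(ClusterChangedEvent event) {
        // create or close per-project clients based on the new state's project secrets
    }
}
```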
@DaveCTurner Could you please review this PR when convenient? It's based on the discussion we had on #126584 and has a reduced scope, covering just S3 first. Thanks!
I would rather we treated a single-project cluster as a special case of a multi-project cluster that only has one project, rather than introducing a whole different way to manage clients for this case. I could perhaps be convinced otherwise, but today this all just looks like it duplicates everything (client lifecycles etc) and any differences in behaviour are lost.
@Nullable // if multi-project is disabled
private final S3PerProjectClientManager s3PerProjectClientManager;
Do we really need this to be @Nullable? I'd rather we didn't have to think about the null case, nor have such different code-paths. Does something break if we use the per-project manager even in the single-project case?
There are a few reasons I chose to manage them separately:
- Existing clients are initialized via the ReloadablePlugin interface, while project clients are managed by a cluster state applier. I didn't want to mix them, to avoid contention.
- Existing clients allow overriding client settings with repository settings, which makes the settings management more complex. We agreed not to support such overrides for project clients. This makes the new code simpler when separated, and a client can be retrieved simply by name.
- We have a cluster-level blobstore in the MP setup. I feel it is better for its client to be managed on its own with the existing code.

That said, I can give it a go and see whether we can always have the client manager for all clients. On the above points, I think:
- I used to have the synchronized keyword in the applier method. This is no longer the case, so the contention should be acceptable with just CHM operations. Also, the applier method and the ReloadablePlugin interface should never operate against the same entry.
- We can probably manage this complexity in the manager and the ClientsHolder, while still not supporting overrides for project clients.
- We can store the cluster-level clients against the default project ID. Functionally, clients for the default project store and clients for the cluster are identical (see the sketch below).

Does that make sense to you?
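To make the last point concrete, a hypothetical sketch of storing cluster-level clients under the default project ID (S3ClientsManager and ClientsHolder are named in this thread; the map layout and the ProjectId.DEFAULT constant are assumptions):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.elasticsearch.cluster.metadata.ProjectId;

// Hypothetical unified shape: a single-project cluster is just the
// one-project case, with its clients stored under the default project ID.
class S3ClientsManager {

    // Stub for the holder of named clients within one project.
    static final class ClientsHolder {}

    private final Map<ProjectId, ClientsHolder> clientsByProject = new ConcurrentHashMap<>();

    ClientsHolder clientsHolder(ProjectId projectId) {
        // Cluster-level callers resolve to the default project's entry,
        // assuming a ProjectId.DEFAULT constant for the default project.
        return clientsByProject.get(projectId == null ? ProjectId.DEFAULT : projectId);
    }
}
```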
Yeah I see that there are some differences, I'm just surprised it needs such divergent code paths. I'd rather we had an S3ClientManager that looked after all the clients, both project-level and cluster-level. In particular, I agree that forbidding repository-level overrides makes things a little simpler to reason about, but that shouldn't come at the cost of this amount of extra code complexity/duplication.
The PR has been updated to let the S3ClientsManager take care of all clients.
threadPool,
DiscoveryNodeUtils.create("node", "node"),
providedSettings,
new ClusterSettings(Settings.EMPTY, ClusterSettings.BUILT_IN_CLUSTER_SETTINGS)
Seems a bit odd to use Settings.EMPTY here rather than providedSettings, although this is what happens in other createClusterService overrides.
Yeah, these utility methods are a bit odd. Even if we use providedSettings here, it is still not accurate, since the final settings have something more that gets added in the delegate method. That's the reason I chose Settings.EMPTY: it makes it obvious that the settings are empty if this ever triggers a test failure.
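For illustration only, a hedged sketch of the alternative being discussed (the two-argument ClusterSettings constructor matches the diff above; the helper wrapper is hypothetical):

```java
import org.elasticsearch.common.settings.ClusterSettings;
import org.elasticsearch.common.settings.Settings;

final class ClusterSettingsSketch {
    // Seed ClusterSettings from the provided node settings rather than
    // Settings.EMPTY; note the final settings may still differ, since the
    // delegate method adds more on top.
    static ClusterSettings fromProvided(Settings providedSettings) {
        return new ClusterSettings(providedSettings, ClusterSettings.BUILT_IN_CLUSTER_SETTINGS);
    }
}
```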
ResourceWatcherService resourceWatcherService,
Supplier<Region> defaultRegionSupplier
) {
final Settings settings = clusterService.getSettings();
nit: I liked the name nodeSettings in this context; it can otherwise be confusing whether we're talking about node-level settings or repository settings from the RepositoryMetadata.
}

private void closeClientsAsync(List<ClientsHolder> clientsHoldersToClose, ActionListener<Void> listener) {
executor.execute(() -> {
Nit: please use an AbstractRunnable (here ActionRunnable.run) rather than executing a bare Runnable, so that exceptions (e.g. a rejection) still complete the listener.
I should have learnt this by now. Thanks for spotting it. See 0f1bdd5
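For illustration, a minimal sketch of the suggested pattern (the method signature follows the diff above; the holder stub and executor wiring are assumed, and this is not necessarily exactly what 0f1bdd5 does):

```java
import java.util.List;
import java.util.concurrent.Executor;
import org.elasticsearch.action.ActionListener;
import org.elasticsearch.action.ActionRunnable;

class CloseSketch {
    // Stub for the PR's holder type; close() stands in for releasing its clients.
    static final class ClientsHolder implements AutoCloseable {
        @Override
        public void close() {}
    }

    private final Executor executor;

    CloseSketch(Executor executor) {
        this.executor = executor;
    }

    // ActionRunnable.run wraps the work in an AbstractRunnable, so on an
    // Elasticsearch thread pool even a rejection is routed to onFailure and
    // completes the listener instead of silently dropping it.
    void closeClientsAsync(List<ClientsHolder> clientsHoldersToClose, ActionListener<Void> listener) {
        executor.execute(ActionRunnable.run(listener, () -> {
            for (ClientsHolder holder : clientsHoldersToClose) {
                holder.close();
            }
        }));
    }
}
```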
final CountDownLatch latch = new CountDownLatch(1);
currentClientsCloseListener.addListener(ActionListener.running(latch::countDown));
try {
if (latch.await(1, TimeUnit.MINUTES) == false) {
Are you sure we need to wait here? With S3Service today we just decref the clients and move on, allowing any ongoing uses to finish and shut down separately. Is that not ok here?

If we do need to wait, why 1 minute? This needs to be configurable in case we need to change it. And if it's ok to proceed after a minute, why wait at all?
The wait here is to ensure S3Service does not close until all asynchronously closed clients (from the applier threads) have finished closing, per your comment. But maybe I misunderstood what you were suggesting?

The 1 minute is meant as a best effort to close things cleanly, similar to the 1-minute wait for IndicesService to close in Stateless.java. It's unlikely to be hit. If somehow a close call does stall, it's not fatal and should not prevent progress (decRef can still trigger closeInternal and close both S3Client and SdkHttpClient inline, which might take a bit longer?). I can make it configurable if necessary.
Hm, I now wonder what I meant there. Though at the time I wrote that, these things had a kinda weird lifecycle, fixed in the upgrade to SDKv2, so maybe I was worried about that. As things stand now, it should be safe to just decref each client and move on.
it should be safe to just decref each client and move on
The async closing is to avoid having synchronized method calls in the applier thread; I think this point still stands. Do you mean we no longer need to track the close listeners, and can assume close calls complete fast enough that there is no need to wait for them to finish when the node shuts down?
Yes, +1 to async closing still; I just don't see a need to block the shutdown here. We're not assuming that close calls happen fast enough; we're relying on the refcounting and other lifecycle management to do the right thing, even for clients that other subsystems happen to be using while we're shutting down.
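A sketch of the decref-and-move-on lifecycle described above, assuming Elasticsearch's AbstractRefCounted base class and the SDKv2 client types named in the thread; the holder name matches the PR, the rest is illustrative:

```java
import org.elasticsearch.core.AbstractRefCounted;
import software.amazon.awssdk.http.SdkHttpClient;
import software.amazon.awssdk.services.s3.S3Client;

// Users incRef() the holder for the duration of a request; the manager holds
// one reference of its own and simply releases it on shutdown or when a
// project's credentials change.
class ClientsHolder extends AbstractRefCounted {
    private final S3Client client;
    private final SdkHttpClient httpClient;

    ClientsHolder(S3Client client, SdkHttpClient httpClient) {
        this.client = client;
        this.httpClient = httpClient;
    }

    @Override
    protected void closeInternal() {
        // Runs on whichever thread drops the last reference -- possibly an
        // in-flight request's thread well after shutdown started -- so no
        // latch or timeout is needed.
        client.close();
        httpClient.close();
    }
}
```

With this shape, shutdown just calls decRef() on each holder instead of awaiting a close listener behind a latch.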
@elasticmachine update branch
@elasticmachine update branch
LGTM
@elasticmachine update branch
With project secrets in place, we can now create and manage per-project repository clients in addition to the cluster-level repository clients. This PR adds a manager class for that. Note that the logic is not yet fully wired, because it needs the per-project repository/object_store, which will be added in separate PRs (as part of MP snapshots and the MP objectStoreService). As such, the tests are currently unit tests. Relates: elastic#126584 Resolves: ES-11713
Enables creating different s3 clients for different projects. Relates: #127631

The stateful-side tests generally do not exercise multi-project, so we need to use the default project ID instead of a random one. Relates: #127631

Similar to #127631, this PR adds per-project client support for the GCS repository.