
Conversation

@DaveCTurner
Contributor

Today `CcrRepository#getRepositoryData` blocks the calling thread pending receipt of the full metadata of the remote cluster. It would be preferable if it didn't get the full cluster metadata at all, but we can at least remove the blocking wait here.
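For illustration, here is a minimal sketch of the blocking pattern versus its async replacement. The method bodies, the `toRepositoryData` helper, and the `remoteClient` field are assumptions for the sketch, not the actual `CcrRepository` code; only the general `ActionListener` style follows the Elasticsearch client API.

```java
import org.elasticsearch.action.ActionListener;
import org.elasticsearch.action.admin.cluster.state.ClusterStateRequest;
import org.elasticsearch.action.admin.cluster.state.ClusterStateResponse;
import org.elasticsearch.repositories.RepositoryData;

// Before (sketch): block the calling thread until the remote cluster state arrives.
public RepositoryData getRepositoryData() {
    ClusterStateResponse response = remoteClient.admin().cluster()
        .state(new ClusterStateRequest())
        .actionGet(); // blocks the calling thread
    return toRepositoryData(response.getState().metadata()); // hypothetical helper
}

// After (sketch): complete an ActionListener asynchronously instead of blocking.
public void getRepositoryData(ActionListener<RepositoryData> listener) {
    remoteClient.admin().cluster().state(
        new ClusterStateRequest(),
        listener.map(response -> toRepositoryData(response.getState().metadata())));
}
```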
@DaveCTurner DaveCTurner added the >bug, WIP, :Distributed Indexing/CCR, v7.17.5, v8.4.0, v8.2.3, and v8.3.1 labels May 31, 2022
@DaveCTurner
Contributor Author

This was #87016 but had to be reverted because it encountered #87237.

@DaveCTurner DaveCTurner marked this pull request as ready for review June 1, 2022 11:34
@elasticmachine elasticmachine added the Team:Distributed label Jun 1, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner DaveCTurner merged commit 64d716b into elastic:master Jun 1, 2022
@DaveCTurner DaveCTurner deleted the 2022-05-31-ccrrepository-blocking-redux branch June 1, 2022 11:34
@DaveCTurner
Contributor Author

See #87016 for reviews - this PR had no changes from the approved one, it just had to be delayed until a fix for #87237 was merged.

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Jun 1, 2022
@elasticsearchmachine
Collaborator

💔 Backport failed

| Status | Branch | Result |
|--------|--------|--------|
|        | 7.17   | Commit could not be cherry-picked due to conflicts |
|        | 8.2    | Commit could not be cherry-picked due to conflicts |
|        | 8.3    |        |

You can use sqren/backport to manually backport by running `backport --upstream elastic/elasticsearch --pr 87235`

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Jun 1, 2022

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Jun 1, 2022

elasticsearchmachine pushed a commit that referenced this pull request Jun 1, 2022

elasticsearchmachine pushed a commit that referenced this pull request Jun 1, 2022

elasticsearchmachine pushed a commit that referenced this pull request Jun 1, 2022
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Nov 8, 2022
We run the same request back to back for each put-follower call during the restore. Also, concurrent put-follower calls will all run the same full cluster state (CS) request concurrently. In older versions, prior to elastic#87235, the concurrency was limited by the size of the snapshot pool; with that fix, though, these requests run at almost arbitrary concurrency when many put-follow requests are executed concurrently. This is fixed by using the existing deduplicator to run only a single remote CS request at a time for each CCR repository. This also removes the needless forking in the put-follower action, which is no longer necessary now that the CCR repository is non-blocking (we do the same for normal restores, which can safely be started from a transport thread); that should fix some bad-UX situations where the snapshot threads are busy on master, making the put-follower requests not go through in time.
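The "existing deduplicator" refers to a utility in the Elasticsearch codebase; as a rough illustration of the single-flight idea it relies on, here is a generic, self-contained sketch (all names hypothetical, not the actual implementation):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Generic single-flight sketch: while one request is in flight, later callers
// subscribe to its result rather than issuing their own duplicate request.
final class SingleFlight<T> {
    private final Consumer<Consumer<T>> fetch; // starts one request, calls back with its result
    private List<Consumer<T>> waiters;         // non-null while a request is in flight

    SingleFlight(Consumer<Consumer<T>> fetch) {
        this.fetch = fetch;
    }

    synchronized void execute(Consumer<T> listener) {
        if (waiters != null) {
            waiters.add(listener); // a request is already running; just wait for it
            return;
        }
        waiters = new ArrayList<>();
        waiters.add(listener);
        fetch.accept(this::deliver); // start the single underlying request
    }

    private void deliver(T result) {
        List<Consumer<T>> toNotify;
        synchronized (this) {
            toNotify = waiters;
            waiters = null; // the next execute() call starts a fresh request
        }
        toNotify.forEach(l -> l.accept(result));
    }
}
```

Under this shape, each CCR repository would hold one such instance around its remote cluster state call, so any number of concurrent put-follower requests translate into at most one remote request at a time.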
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Nov 8, 2022
Now that elastic#87235 makes the CCR repository non-blocking, there's no need to fork in this action, just as we don't fork in a normal restore operation.
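"Forking" here means dispatching the work onto another thread pool instead of running it on the calling transport thread. A hedged before/after sketch, with `startRestore` as a hypothetical stand-in for the action's actual work:

```java
// Before (sketch): fork onto the snapshot pool because the repository could block.
threadPool.executor(ThreadPool.Names.SNAPSHOT)
    .execute(() -> startRestore(request, listener));

// After (sketch): the CCR repository no longer blocks, so the work can run
// directly on the transport thread, just like a normal restore.
startRestore(request, listener);
```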
original-brownbear added a commit that referenced this pull request Nov 17, 2022
original-brownbear added a commit that referenced this pull request Nov 20, 2022

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Apr 19, 2023

elasticsearchmachine pushed a commit that referenced this pull request Apr 19, 2023