
Conversation

@DaveCTurner
Contributor

Today `CcrRepository#getRepositoryData` blocks the calling thread pending receipt of the full metadata of the remote cluster. It would be preferable if it didn't get the full cluster metadata at all, but we can at least remove the blocking wait here.
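For illustration, here is a minimal sketch of the blocking pattern versus its async replacement. The method bodies, the `toRepositoryData` helper, and the `remoteClient` field are assumptions for the sketch, not the actual `CcrRepository` code; only the general `ActionListener` style follows the Elasticsearch client API.

```java
import org.elasticsearch.action.ActionListener;
import org.elasticsearch.action.admin.cluster.state.ClusterStateRequest;
import org.elasticsearch.action.admin.cluster.state.ClusterStateResponse;
import org.elasticsearch.repositories.RepositoryData;

// Before (sketch): block the calling thread until the remote cluster state arrives.
public RepositoryData getRepositoryData() {
    ClusterStateResponse response = remoteClient.admin().cluster()
        .state(new ClusterStateRequest())
        .actionGet(); // blocks the calling thread
    return toRepositoryData(response.getState().metadata()); // hypothetical helper
}

// After (sketch): complete an ActionListener asynchronously instead of blocking.
public void getRepositoryData(ActionListener<RepositoryData> listener) {
    remoteClient.admin().cluster().state(
        new ClusterStateRequest(),
        listener.map(response -> toRepositoryData(response.getState().metadata())));
}
```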
@DaveCTurner DaveCTurner added the >bug, WIP, :Distributed Indexing/CCR, v7.17.5, v8.4.0, v8.2.3, and v8.3.1 labels May 31, 2022
@DaveCTurner
Contributor Author

This was #87016 but had to be reverted because it encountered #87237.

@DaveCTurner DaveCTurner marked this pull request as ready for review June 1, 2022 11:34
@elasticmachine elasticmachine added the Team:Distributed label Jun 1, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner DaveCTurner merged commit 64d716b into elastic:master Jun 1, 2022
@DaveCTurner DaveCTurner deleted the 2022-05-31-ccrrepository-blocking-redux branch June 1, 2022 11:34
@DaveCTurner
Contributor Author

See #87016 for reviews - this PR had no changes from the approved one, it just had to be delayed until a fix for #87237 was merged.

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Jun 1, 2022
@elasticsearchmachine
Collaborator

💔 Backport failed

| Status | Branch | Result |
|--------|--------|--------|
|        | 7.17   | Commit could not be cherry-picked due to conflicts |
|        | 8.2    | Commit could not be cherry-picked due to conflicts |
|        | 8.3    |        |

You can use sqren/backport to manually backport by running `backport --upstream elastic/elasticsearch --pr 87235`

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Jun 1, 2022

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Jun 1, 2022

elasticsearchmachine pushed a commit that referenced this pull request Jun 1, 2022

elasticsearchmachine pushed a commit that referenced this pull request Jun 1, 2022

elasticsearchmachine pushed a commit that referenced this pull request Jun 1, 2022
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Nov 8, 2022
We run the same request back to back for each put-follower call during the restore. Also, concurrent put-follower calls will all run the same full cluster state (CS) request concurrently. In older versions, prior to elastic#87235, the concurrency was limited by the size of the snapshot pool; with that fix, though, these requests run at almost arbitrary concurrency when many put-follow requests are executed concurrently. This is fixed by using the existing deduplicator to run only a single remote CS request at a time for each CCR repository. This also removes the needless forking in the put-follower action, which is no longer necessary now that the CCR repository is non-blocking (we do the same for normal restores, which can safely be started from a transport thread); that should fix some bad-UX situations where the snapshot threads are busy on master, making the put-follower requests not go through in time.
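The "existing deduplicator" refers to a utility in the Elasticsearch codebase; as a rough illustration of the single-flight idea it relies on, here is a generic, self-contained sketch (all names hypothetical, not the actual implementation):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Generic single-flight sketch: while one request is in flight, later callers
// subscribe to its result rather than issuing their own duplicate request.
final class SingleFlight<T> {
    private final Consumer<Consumer<T>> fetch; // starts one request, calls back with its result
    private List<Consumer<T>> waiters;         // non-null while a request is in flight

    SingleFlight(Consumer<Consumer<T>> fetch) {
        this.fetch = fetch;
    }

    synchronized void execute(Consumer<T> listener) {
        if (waiters != null) {
            waiters.add(listener); // a request is already running; just wait for it
            return;
        }
        waiters = new ArrayList<>();
        waiters.add(listener);
        fetch.accept(this::deliver); // start the single underlying request
    }

    private void deliver(T result) {
        List<Consumer<T>> toNotify;
        synchronized (this) {
            toNotify = waiters;
            waiters = null; // the next execute() call starts a fresh request
        }
        toNotify.forEach(l -> l.accept(result));
    }
}
```

Under this shape, each CCR repository would hold one such instance around its remote cluster state call, so any number of concurrent put-follower requests translate into at most one remote request at a time.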
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Nov 8, 2022
Now that elastic#87235 makes the CCR repository non-blocking, there's no need to fork in this action, just as we don't fork in a normal restore operation.
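"Forking" here means dispatching the work onto another thread pool instead of running it on the calling transport thread. A hedged before/after sketch, with `startRestore` as a hypothetical stand-in for the action's actual work:

```java
// Before (sketch): fork onto the snapshot pool because the repository could block.
threadPool.executor(ThreadPool.Names.SNAPSHOT)
    .execute(() -> startRestore(request, listener));

// After (sketch): the CCR repository no longer blocks, so the work can run
// directly on the transport thread, just like a normal restore.
startRestore(request, listener);
```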
original-brownbear added a commit that referenced this pull request Nov 17, 2022
original-brownbear added a commit that referenced this pull request Nov 20, 2022

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Apr 19, 2023

elasticsearchmachine pushed a commit that referenced this pull request Apr 19, 2023