Member

@tlrx tlrx commented Nov 21, 2018

After #35332 was merged, we noticed test failures such as #35597 in which one or more replica shards failed to be promoted to primary because the primary-replica resync never succeeded.

After some digging, it turned out that execution of the resync action was blocked by a global cluster block in the cluster state (in this case, the "no master" block), causing the resync action to fail when executed on the primary.

Before #35332 such failures never happened because TransportResyncReplicationAction skips the reroute phase, which was the only place where blocks were checked. With #35332, blocks are checked during reroute and also during the execution of the transport replication action on the primary. After some internal discussion, we decided that TransportResyncReplicationAction should never be blocked. This action is part of replica-to-primary promotion and makes sure that replicas are in sync, so it should not be blocked when the cluster state has no master or when the index is read-only.
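
To illustrate why an action that skips reroute only started failing once blocks were also checked on the primary, here is a toy model of the two phases. It is a sketch under stated assumptions, not Elasticsearch code: `BlockCheckPhasesSketch`, `Phase`, `BlockedException`, and `execute` are hypothetical names made up for this example.

```java
import java.util.EnumSet;
import java.util.Set;

// Toy model only: these types are simplified stand-ins, not Elasticsearch classes.
final class BlockCheckPhasesSketch {

    /** The two phases in which a block check may happen. */
    enum Phase { REROUTE, PRIMARY }

    /** Thrown when a phase finds a cluster block (hypothetical name). */
    static final class BlockedException extends RuntimeException {
        BlockedException(Phase phase) {
            super("blocked during " + phase + " phase");
        }
    }

    /**
     * Runs a replication-style action.
     *
     * @param clusterHasBlock whether a global block (e.g. "no master") is present
     * @param skipReroute     whether the action bypasses the reroute phase
     * @param checkedPhases   phases in which blocks are checked
     */
    static String execute(boolean clusterHasBlock, boolean skipReroute, Set<Phase> checkedPhases) {
        if (!skipReroute) {
            check(Phase.REROUTE, clusterHasBlock, checkedPhases);
        }
        check(Phase.PRIMARY, clusterHasBlock, checkedPhases);
        return "executed on primary";
    }

    private static void check(Phase phase, boolean clusterHasBlock, Set<Phase> checkedPhases) {
        if (clusterHasBlock && checkedPhases.contains(phase)) {
            throw new BlockedException(phase);
        }
    }

    public static void main(String[] args) {
        // Blocks checked only during reroute: an action that skips reroute is never blocked.
        Set<Phase> rerouteOnly = EnumSet.of(Phase.REROUTE);
        System.out.println(execute(true, true, rerouteOnly));

        // Blocks also checked on the primary: the same action now fails.
        Set<Phase> rerouteAndPrimary = EnumSet.of(Phase.REROUTE, Phase.PRIMARY);
        try {
            execute(true, true, rerouteAndPrimary);
        } catch (BlockedException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model, an action that skips reroute sees a block only once `PRIMARY` is among the checked phases, which mirrors the behavior change described above.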

This pull request changes TransportResyncReplicationAction to make it obvious that it does not honor blocks. It also adds a simple test that fails if the resync action is blocked during execution of the primary action.

Closes #35597
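
For readers unfamiliar with the pattern, the following is a minimal, self-contained sketch of how an action could opt out of block checks through overridable block-level hooks, plus the kind of check a test might make. It assumes that returning `null` from a hook means "do not check this level"; this conversation does not show the actual method bodies, so treat that as an assumption. All types here (`ResyncBlockLevelSketch`, `ReplicationActionSketch`, `ResyncActionSketch`, `WriteActionSketch`, the `indexBlockLevel` hook, and the local `ClusterBlockLevel` enum) are simplified stand-ins, not the real Elasticsearch classes.

```java
import java.util.EnumSet;
import java.util.Set;

// Simplified stand-ins for illustration; not the real Elasticsearch types.
final class ResyncBlockLevelSketch {

    /** Local stand-in enum; only the name is borrowed from the reviewed snippet. */
    enum ClusterBlockLevel { READ, WRITE }

    /** Base action: checks blocks on the primary unless a hook returns null (assumed convention). */
    abstract static class ReplicationActionSketch {
        protected ClusterBlockLevel globalBlockLevel() {
            return ClusterBlockLevel.WRITE;
        }

        // Hypothetical index-level counterpart of globalBlockLevel().
        protected ClusterBlockLevel indexBlockLevel() {
            return ClusterBlockLevel.WRITE;
        }

        final boolean isBlocked(Set<ClusterBlockLevel> globalBlocks, Set<ClusterBlockLevel> indexBlocks) {
            ClusterBlockLevel global = globalBlockLevel();
            if (global != null && globalBlocks.contains(global)) {
                return true;
            }
            ClusterBlockLevel index = indexBlockLevel();
            return index != null && indexBlocks.contains(index);
        }
    }

    /** An ordinary write-like action that honors blocks. */
    static final class WriteActionSketch extends ReplicationActionSketch {
    }

    /** The resync-like action: never blocked because it is an internal action. */
    static final class ResyncActionSketch extends ReplicationActionSketch {
        @Override
        protected ClusterBlockLevel globalBlockLevel() {
            return null; // no global block level to check
        }

        @Override
        protected ClusterBlockLevel indexBlockLevel() {
            return null; // no index block level to check
        }
    }

    public static void main(String[] args) {
        // Model a "no master" style global block and a read-only index as WRITE-level blocks.
        Set<ClusterBlockLevel> globalBlocks = EnumSet.of(ClusterBlockLevel.WRITE);
        Set<ClusterBlockLevel> indexBlocks = EnumSet.of(ClusterBlockLevel.WRITE);

        System.out.println("write blocked:  " + new WriteActionSketch().isBlocked(globalBlocks, indexBlocks));
        System.out.println("resync blocked: " + new ResyncActionSketch().isBlocked(globalBlocks, indexBlocks));
    }
}
```

Putting the opt-out in overridable hooks keeps the decision visible in the action itself, which matches the stated goal of making it obvious that resync does not honor blocks.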

@tlrx tlrx added the >enhancement, :Distributed Indexing/Distributed, v7.0.0, and v6.6.0 labels Nov 21, 2018
@tlrx tlrx requested a review from ywelsch November 21, 2018 16:22
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

Contributor

@ywelsch ywelsch left a comment

LGTM


@Override
protected ClusterBlockLevel globalBlockLevel() {
// resync should never be blocked
Contributor

maybe add here "because it's an internal action"

Member Author

Sure.

@tlrx tlrx merged commit 11052b7 into elastic:master Nov 22, 2018
@tlrx tlrx deleted the resync-action-no-blocks branch November 22, 2018 09:50
Member Author

tlrx commented Nov 22, 2018

Thanks @ywelsch

tlrx added a commit that referenced this pull request Nov 22, 2018
@tlrx tlrx mentioned this pull request Dec 5, 2018
