Skip to content

Conversation

@pxsalehi
Copy link
Member

@pxsalehi pxsalehi commented Feb 10, 2023

  • Process new ClusterInfo only when the cluster state is recovered, since we need to know the indices/nodes in the cluster for this.
  • Avoid missing node IDs that could result from a REPLACE shutdown metadata.

Both of these could lead to NPEs which is not fatal in 8.7/8.8, but could lead to unresolved listeners on 7.17.

@pxsalehi pxsalehi added >bug :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) labels Feb 10, 2023
@elasticsearchmachine
Copy link
Collaborator

Hi @pxsalehi, I've created a changelog YAML for you.

@pxsalehi pxsalehi marked this pull request as ready for review February 10, 2023 13:05
@pxsalehi pxsalehi requested a review from DaveCTurner February 10, 2023 13:05
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine elasticsearchmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Feb 10, 2023
Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixes look good but I think we should have some tests too - see DiskThresholdMonitorTests

@pxsalehi pxsalehi requested a review from DaveCTurner February 13, 2023 10:26
@pxsalehi pxsalehi self-assigned this Feb 13, 2023
Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a couple of very small nits

assertEquals(Set.of("test"), result.v2());

final ClusterState blockedClusterState = ClusterState.builder(clusterState)
.blocks(ClusterBlocks.builder().addGlobalBlock(GatewayService.STATE_NOT_RECOVERED_BLOCK).build())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we should also have an empty routing table in this state (the routing table is initialised when removing GatewayService.STATE_NOT_RECOVERED_BLOCK)

DiscoveryNodes.Builder discoveryNodes = DiscoveryNodes.builder()
.add(newNormalNode("node1", "node1"))
.add(newNormalNode("node2", "node2"));
// node3 which is to replace node1 may or may not bee in the cluster
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo :)

Suggested change
// node3 which is to replace node1 may or may not bee in the cluster
// node3 which is to replace node1 may or may not be in the cluster
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐝

@pxsalehi pxsalehi merged commit b4712a5 into elastic:main Feb 13, 2023
pxsalehi added a commit to pxsalehi/elasticsearch that referenced this pull request Feb 13, 2023
- Process new ClusterInfo only when the cluster state is recovered, since we need to know the indices/nodes in the cluster for this. - Avoid missing node IDs that could result from a REPLACE shutdown metadata. Both of these could lead to NPEs which is not fatal in 8.7/8.8, but could lead to unresolved listeners on 7.17.
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
7.17 Commit could not be cherrypicked due to conflicts
8.7

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 93699

elasticsearchmachine pushed a commit that referenced this pull request Feb 13, 2023
- Process new ClusterInfo only when the cluster state is recovered, since we need to know the indices/nodes in the cluster for this. - Avoid missing node IDs that could result from a REPLACE shutdown metadata. Both of these could lead to NPEs which is not fatal in 8.7/8.8, but could lead to unresolved listeners on 7.17.
pxsalehi added a commit to pxsalehi/elasticsearch that referenced this pull request Feb 13, 2023
- Process new ClusterInfo only when the cluster state is recovered, since we need to know the indices/nodes in the cluster for this. - Avoid missing node IDs that could result from a REPLACE shutdown metadata. Both of these could lead to NPEs which is not fatal in 8.7/8.8, but could lead to unresolved listeners on 7.17.
elasticsearchmachine pushed a commit that referenced this pull request Feb 15, 2023
* Fix Gradle project evaluation when runtime java home is unset * Avoid NPE in DiskThresholdMonitor (#93699) - Process new ClusterInfo only when the cluster state is recovered, since we need to know the indices/nodes in the cluster for this. - Avoid missing node IDs that could result from a REPLACE shutdown metadata. Both of these could lead to NPEs which is not fatal in 8.7/8.8, but could lead to unresolved listeners on 7.17. --------- Co-authored-by: Mark Vieira <portugee@gmail.com> Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
carlosdelest pushed a commit to carlosdelest/elasticsearch that referenced this pull request Feb 21, 2023
- Process new ClusterInfo only when the cluster state is recovered, since we need to know the indices/nodes in the cluster for this. - Avoid missing node IDs that could result from a REPLACE shutdown metadata. Both of these could lead to NPEs which is not fatal in 8.7/8.8, but could lead to unresolved listeners on 7.17.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>bug :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v7.17.10 v8.7.1 v8.8.0

4 participants