Commit 05a3da2
vinc13e authored and dahlerlend committed
Bug#37275404 AT: testNodeRestart -n MultiCrashTest T1 fails occasionally in .4node4rpl

Context: runMultiCrashTest crashes a subset of the running data nodes in a cluster, either in a rolling-restart fashion or in parallel, to check whether the cluster survives or dies as expected in each situation.

Problem: In one of the cases (4 replicas), the test gracefully crashes 3 of the 4 replicas in a node group and expects the 4th replica to die as well (via a crash insertion in QMGR). The test then checks that all nodes expected to be dead are dead and that the remaining nodes are alive, and finally starts all nodes again. When the nodes are started (via mgmapi), the 4th node is sometimes not yet connected to the cluster, so the 'start' command fails.

Solution: Ensure that all nodes are connected to the cluster before starting them via mgmapi.

Change-Id: Ib2d4265f4816bc6b975570f69aebfe3952e9bb96

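
The "wait before start" idea in the Solution can be illustrated with a small polling loop. This is only a sketch under stated assumptions: `NodeState`, `wait_all_no_start`, and the `probe` callback are hypothetical names invented for the illustration. They stand in for whatever the test harness uses to query node status and are not part of the NDB test framework or mgmapi.

```cpp
#include <chrono>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical node states, loosely modelled on the situation described in
// the commit message (a node may not have reconnected yet).
enum class NodeState { NotConnected, NoStart, Started };

// Polls `probe` until every node reports NoStart or the attempts run out.
// Returns true only if all nodes were seen in NoStart within the budget.
// `probe` is an assumed callback that returns the current state of each node.
bool wait_all_no_start(const std::function<std::vector<NodeState>()> &probe,
                       int max_attempts,
                       std::chrono::milliseconds interval) {
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    const std::vector<NodeState> states = probe();
    bool all_ready = !states.empty();
    for (NodeState s : states) {
      if (s != NodeState::NoStart) {
        all_ready = false;
        break;
      }
    }
    if (all_ready) return true;
    std::this_thread::sleep_for(interval);  // back off before the next poll
  }
  return false;  // at least one node never reached NoStart in time
}
```

A caller would issue the start command only after this function returns true, instead of sleeping for a fixed interval and hoping every node has reconnected.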
1 parent ad16eb8 commit 05a3da2

1 file changed: +41 -1 lines changed

storage/ndb/test/ndbapi/testNodeRestart.cpp

@@ -1157,34 +1157,74 @@ int runMultiCrashTest(NDBT_Context *ctx, NDBT_Step *step) {
     }
     NdbSleep_SecSleep(2);
   }
+
   if (restarter.startNodes(dead_nodes, num_dead_nodes) != 0) return NDBT_FAILED;
   if (restarter.waitClusterStarted()) return NDBT_FAILED;
 
   if (num_replicas == 2) return NDBT_OK;
 
   ndbout_c("Crash two nodes per node group");
   if (num_replicas == 3) {
+    // Inject error 644 in all nodes. It will eventually hit in one node
+    // in Qmgr::stateArbitCrash.
     prepare_all_nodes_for_death(restarter);
+    int val[] = {DumpStateOrd::CmvmiSetRestartOnErrorInsert, 1};
+    if (restarter.dumpStateAllNodes(val, 2)) {
+      return NDBT_FAILED;
+    }
   }
+  /*
+   * Restart 2 nodes in nostart mode via error insert 1006, in a 3 replica
+   * configuration 3rd node will eventually crash as well. In a 4 replica
+   * configuration remaining nodes will survive.
+   */
   crash_x_nodes_per_node_group(restarter, dead_nodes, num_dead_nodes, 2);
   if (num_replicas == 3) {
     set_all_dead(restarter, dead_nodes, num_dead_nodes);
   }
   if (!restarter.checkClusterState(dead_nodes, num_dead_nodes)) {
     return NDBT_FAILED;
   }
-  NdbSleep_SecSleep(3);
+
+  if (num_replicas == 3) {
+    /*
+     * In 3 replica setup all 3 nodes are restarted, 2 via EI 1006 1 via EI 644.
+     * Wait until all nodes enter the NOSTART state, then we can start all nodes
+     * again.
+     */
+    if (restarter.waitClusterNoStart()) {
+      return NDBT_FAILED;
+    }
+  }
   if (restarter.startNodes(dead_nodes, num_dead_nodes) != 0) return NDBT_FAILED;
   if (restarter.waitClusterStarted()) return NDBT_FAILED;
 
   if (num_replicas == 4) {
     ndbout_c("Crash three nodes per node group");
+
+    int val[] = {DumpStateOrd::CmvmiSetRestartOnErrorInsert, 1};
+    if (restarter.dumpStateAllNodes(val, 2)) {
+      return NDBT_FAILED;
+    }
     prepare_all_nodes_for_death(restarter);
+
+    /*
+     * Restart 3 nodes in nostart mode via error insert 1006, the remaining node
+     * will eventually crash as well.
+     */
     crash_x_nodes_per_node_group(restarter, dead_nodes, num_dead_nodes, 3);
     set_all_dead(restarter, dead_nodes, num_dead_nodes);
     if (!restarter.checkClusterState(dead_nodes, num_dead_nodes)) {
       return NDBT_FAILED;
     }
+
+    /*
+     * All 4 nodes are restarted, 3 via EI 1006 1 via EI 644. Wait until all
+     * nodes enter the NOSTART state, then we can start all nodes again.
+     */
+    if (restarter.waitClusterNoStart()) {
+      return NDBT_FAILED;
+    }
     if (restarter.startNodes(dead_nodes, num_dead_nodes) != 0)
       return NDBT_FAILED;
     if (restarter.waitClusterStarted()) return NDBT_FAILED;
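
The comments in this diff describe three crash scenarios and what the test expects in each. As a reading aid, the sketch below restates those expectations as a lookup. It is an assumption-laden summary, not code from the test: `Expected`, `expected_outcome`, and `needs_no_start_wait` are hypothetical names, and any combination the diff comments do not mention is reported as `Unspecified` rather than guessed.

```cpp
// Outcome the test expects after the crash step, per the diff's comments.
enum class Expected { Unspecified, SurvivorsStayUp, AllNodesDown };

// Restates only the three cases the comments in the diff describe.
// Everything else is deliberately left Unspecified.
Expected expected_outcome(int num_replicas, int crashed_per_group) {
  // 3 replicas, 2 crashed via EI 1006: the 3rd node goes down too (EI 644).
  if (num_replicas == 3 && crashed_per_group == 2) return Expected::AllNodesDown;
  // 4 replicas, 2 crashed via EI 1006: the remaining nodes survive.
  if (num_replicas == 4 && crashed_per_group == 2) return Expected::SurvivorsStayUp;
  // 4 replicas, 3 crashed via EI 1006: the remaining node goes down too.
  if (num_replicas == 4 && crashed_per_group == 3) return Expected::AllNodesDown;
  return Expected::Unspecified;
}

// The diff waits for the NOSTART state exactly in the cases where every
// node is expected to go down, before starting the nodes again.
bool needs_no_start_wait(int num_replicas, int crashed_per_group) {
  return expected_outcome(num_replicas, crashed_per_group) ==
         Expected::AllNodesDown;
}
```

Read this way, the change replaces the fixed `NdbSleep_SecSleep(3)` with a wait that is applied only where all nodes restart, which matches the two places where `waitClusterNoStart()` was added.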