I am trying to set up an active/passive (two-node) Linux-HA cluster with corosync and pacemaker to keep a PostgreSQL database up and running. It works via DRBD and a service IP. If node1 fails, node2 should take over; the same applies if PostgreSQL runs on node2 and that node fails. Everything works fine except the STONITH part.
Between the nodes there is a dedicated HA connection (10.10.10.X), so I have the following interface configuration:
eth0            eth1            host
10.10.10.251    172.10.10.1     node1
10.10.10.252    172.10.10.2     node2

STONITH is enabled and I am testing it with an SSH agent to kill nodes.
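For reference, the corosync ring for this layout would bind to the dedicated 10.10.10.x network; a minimal sketch of such a totem section (bindnetaddr, mcastaddr, and mcastport are illustrative assumptions, not copied from this cluster):

totem {
        version: 2
        # single ring over the dedicated HA link (eth0, assuming 10.10.10.0/24)
        interface {
                ringnumber: 0
                bindnetaddr: 10.10.10.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
}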
crm configure property stonith-enabled=true
crm configure property stonith-action=poweroff
crm configure rsc_defaults resource-stickiness=100
crm configure property no-quorum-policy=ignore
crm configure primitive stonith_postgres stonith:external/ssh \
        params hostlist="node1 node2"
crm configure clone fencing_postgres stonith_postgres
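For comparison, a commonly used alternative to a single cloned fencing resource is one primitive per node, pinned away from the node it fences, so a node can never run the resource that would shoot itself. The resource and constraint names below are illustrative, not part of the configuration above:

crm configure primitive stonith_node1 stonith:external/ssh \
        params hostlist="node1" \
        op monitor interval="60s"
crm configure primitive stonith_node2 stonith:external/ssh \
        params hostlist="node2" \
        op monitor interval="60s"
# never run a fencing resource on the node it is meant to fence
crm configure location loc_stonith_node1 stonith_node1 -inf: node1
crm configure location loc_stonith_node2 stonith_node2 -inf: node2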
crm_mon -1 shows:

============
Last updated: Mon Mar 19 15:21:11 2012
Stack: openais
Current DC: node2 - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, 2 expected votes
4 Resources configured.
============

Online: [ node2 node1 ]

Full list of resources:

 Master/Slave Set: ms_drbd_postgres
     Masters: [ node1 ]
     Slaves: [ node2 ]
 Resource Group: postgres
     fs_postgres            (ocf::heartbeat:Filesystem):    Started node1
     virtual_ip_postgres    (ocf::heartbeat:IPaddr2):       Started node1
     postgresql             (ocf::heartbeat:pgsql):         Started node1
 Clone Set: fencing_postgres
     Started: [ node2 node1 ]

The problem: when I cut the connection between the eth0 interfaces, it kills both nodes. I think this is a quorum problem, because there are only two nodes. But I don't want to add a third node just to get the right quorum calculation.
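For reference, a rough sketch of a redundant-ring totem section (rrp_mode and the ring 1 values are assumptions, not something configured above); with only ring 0 on eth0, cutting that link is enough to split the corosync membership, whereas a second ring over eth1 would keep cluster communication alive:

totem {
        version: 2
        # redundant ring protocol: use both rings
        rrp_mode: passive
        # ring 0: dedicated HA link (eth0)
        interface {
                ringnumber: 0
                bindnetaddr: 10.10.10.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
        # ring 1: second corosync path over eth1 (assuming a /24 network)
        interface {
                ringnumber: 1
                bindnetaddr: 172.10.10.0
                mcastaddr: 226.94.1.2
                mcastport: 5407
        }
}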
Are there any ideas to solve this problem?
What does the output of crm_mon look like when your cluster is in a failed state?