I am trying to set up an active/passive (two-node) Linux-HA cluster with corosync and pacemaker to keep a PostgreSQL database up and running. It works via DRBD and a service IP. If node1 fails, node2 should take over; the same applies if PostgreSQL runs on node2 and that node fails. Everything works fine except the STONITH part.
Between the nodes there is a dedicated HA connection (10.10.10.X), so I have the following interface configuration:
eth0            eth1            host
10.10.10.251    172.10.10.1     node1
10.10.10.252    172.10.10.2     node2

STONITH is enabled and I am testing it with an SSH agent to kill nodes.
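For reference, the corosync ring for this layout would bind to the dedicated 10.10.10.x network; a minimal sketch of such a totem section (bindnetaddr, mcastaddr, and mcastport are illustrative assumptions, not copied from this cluster):

totem {
        version: 2
        # single ring over the dedicated HA link (eth0, assuming 10.10.10.0/24)
        interface {
                ringnumber: 0
                bindnetaddr: 10.10.10.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
}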
crm configure property stonith-enabled=true
crm configure property stonith-action=poweroff
crm configure rsc_defaults resource-stickiness=100
crm configure property no-quorum-policy=ignore
crm configure primitive stonith_postgres stonith:external/ssh \
        params hostlist="node1 node2"
crm configure clone fencing_postgres stonith_postgres
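For comparison, a commonly used alternative to a single cloned fencing resource is one primitive per node, pinned away from the node it fences, so a node can never run the resource that would shoot itself. The resource and constraint names below are illustrative, not part of the configuration above:

crm configure primitive stonith_node1 stonith:external/ssh \
        params hostlist="node1" \
        op monitor interval="60s"
crm configure primitive stonith_node2 stonith:external/ssh \
        params hostlist="node2" \
        op monitor interval="60s"
# never run a fencing resource on the node it is meant to fence
crm configure location loc_stonith_node1 stonith_node1 -inf: node1
crm configure location loc_stonith_node2 stonith_node2 -inf: node2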
crm_mon -1 shows:

============
Last updated: Mon Mar 19 15:21:11 2012
Stack: openais
Current DC: node2 - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, 2 expected votes
4 Resources configured.
============

Online: [ node2 node1 ]

Full list of resources:

 Master/Slave Set: ms_drbd_postgres
     Masters: [ node1 ]
     Slaves: [ node2 ]
 Resource Group: postgres
     fs_postgres            (ocf::heartbeat:Filesystem):    Started node1
     virtual_ip_postgres    (ocf::heartbeat:IPaddr2):       Started node1
     postgresql             (ocf::heartbeat:pgsql):         Started node1
 Clone Set: fencing_postgres
     Started: [ node2 node1 ]

The problem: when I cut the connection between the eth0 interfaces, it kills both nodes. I think this is a quorum problem, because there are only two nodes. But I don't want to add a third node just to get the right quorum calculation.
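For reference, a rough sketch of a redundant-ring totem section (rrp_mode and the ring 1 values are assumptions, not something configured above); with only ring 0 on eth0, cutting that link is enough to split the corosync membership, whereas a second ring over eth1 would keep cluster communication alive:

totem {
        version: 2
        # redundant ring protocol: use both rings
        rrp_mode: passive
        # ring 0: dedicated HA link (eth0)
        interface {
                ringnumber: 0
                bindnetaddr: 10.10.10.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
        # ring 1: second corosync path over eth1 (assuming a /24 network)
        interface {
                ringnumber: 1
                bindnetaddr: 172.10.10.0
                mcastaddr: 226.94.1.2
                mcastport: 5407
        }
}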
Are there any ideas to solve this problem?
What does the output of crm_mon look like when your cluster is in a failed state?