HA Drill: 2/3 Failure
When two nodes (majority) in a classic 3-node HA deployment fail simultaneously, automatic failover becomes impossible. Time for some manual intervention - let’s roll up our sleeves!
First, assess the status of the failed nodes. If they can be restored quickly, prioritize bringing them back online. Otherwise, initiate the Emergency Response Protocol.
The Emergency Response Protocol assumes your management node is down, leaving only a single database node alive. In this “last man standing” scenario, here’s the fastest recovery path:
- Adjust HAProxy configuration to direct traffic to the primary
- Stop Patroni and manually promote the PostgreSQL replica to primary
Adjusting HAProxy Configuration
If you’re accessing the cluster through means other than HAProxy, you can skip this part (lucky you!). If you’re using HAProxy to access your database cluster, you’ll need to adjust the load balancer configuration to manually direct read/write traffic to the primary.
- Edit /etc/haproxy/<pg_cluster>-primary.cfg, where <pg_cluster> is your PostgreSQL cluster name (e.g., pg-meta)
- Comment out the health check configuration
- Comment out the server entries for the two failed nodes, keeping only the current primary
```
listen pg-meta-primary
    bind *:5433
    mode tcp
    maxconn 5000
    balance roundrobin
    # Comment out these four health check lines
    #option httpchk                              # <---- remove this
    #option http-keep-alive                      # <---- remove this
    #http-check send meth OPTIONS uri /primary   # <---- remove this
    #http-check expect status 200                # <---- remove this
    default-server inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100
    server pg-meta-1 10.10.10.10:6432 check port 8008 weight 100
    # Comment out the failed nodes
    #server pg-meta-2 10.10.10.11:6432 check port 8008 weight 100   # <---- comment this
    #server pg-meta-3 10.10.10.12:6432 check port 8008 weight 100   # <---- comment this
```
Don’t rush to systemctl reload haproxy just yet - we’ll do that after promoting the primary. This configuration bypasses Patroni’s health checks and directs write traffic straight to our soon-to-be primary.
Manual Replica Promotion
SSH into the target server, switch to the dbsu user, execute a CHECKPOINT to flush dirty buffers to disk, stop Patroni, restart PostgreSQL, and perform the promotion:
```bash
sudo su - postgres                      # Switch to database dbsu user
psql -c 'checkpoint; checkpoint;'       # Double CHECKPOINT for good luck (and clean buffers)
sudo systemctl stop patroni             # Bid farewell to Patroni
pg-restart                              # Restart PostgreSQL
pg-promote                              # Time for a promotion!
psql -c 'SELECT pg_is_in_recovery();'   # 'f' means we're primary - mission accomplished!
```
If you modified the HAProxy config earlier, now’s the time to systemctl reload haproxy and direct traffic to our new primary.
```bash
systemctl reload haproxy   # Route write traffic to our new primary
```
Preventing Split-Brain
After stopping the bleeding, priority #2 is: Prevent Split-Brain. We need to ensure the other two servers don’t come back online and start a civil war with our current primary.
The simple approach:
- Pull the plug (power/network) on the other two servers - ensure they can’t surprise us with an unexpected comeback
- Update application connection strings to point directly to our lone survivor primary (see the example below)
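For example, switching an application from the HAProxy primary service to the survivor directly might look like the following. This is a minimal sketch using the Pigsty demo defaults (pg-meta cluster, dbuser_meta user) and assuming the survivor is 10.10.10.12 with PostgreSQL on port 5432 - substitute your own host, port, and credentials:
```bash
# Before: via the HAProxy primary service port (illustrative DSN)
#   postgres://dbuser_meta:DBUser.Meta@pg-meta:5433/meta
# After: straight to the surviving primary (assumed to be 10.10.10.12)
psql 'postgres://dbuser_meta:DBUser.Meta@10.10.10.12:5432/meta' -c 'SELECT 1'
```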
Next steps depend on your situation:
- A: The two servers have temporary issues (network/power outage) and can be restored in place
- B: The two servers are permanently dead (hardware failure) and need to be decommissioned
Recovery from Temporary Failure
If the other two servers can be restored, follow these steps:
- Handle one failed server at a time, prioritizing the management/INFRA node
- Start the failed server and immediately stop Patroni (see the sketch below)
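For example, the per-node sequence might look like this - a sketch assuming default Pigsty/systemd service names and that etcdctl on the surviving node already has the cluster endpoints and TLS credentials configured:
```bash
# On each recovered node: keep Patroni down while etcd rejoins the cluster
systemctl stop patroni              # Patroni must not start PostgreSQL yet
systemctl status etcd               # confirm the local etcd member is back up

# From the surviving node: verify quorum is restored before proceeding
etcdctl endpoint health --cluster   # all members should report healthy
```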
Once ETCD quorum is restored, start Patroni on the surviving server (current primary) to take control of PostgreSQL and reclaim cluster leadership. Put Patroni in maintenance mode:
```bash
systemctl restart patroni
pg pause <pg_cluster>      # Enter maintenance mode
```
On the other two instances, create the /pg/data/standby.signal marker file (e.g., with touch, as the postgres user) to mark them as replicas, then start Patroni:
```bash
systemctl restart patroni
```
After confirming the Patroni cluster identity/roles are correct, exit maintenance mode:
```bash
pg resume <pg_cluster>
```
Recovery from Permanent Failure
After permanent failure, first recover the ~/pigsty directory on the management node - particularly the crucial pigsty.yml and files/pki/ca/ca.key files.
No backup of these files? You might need to deploy a fresh Pigsty and migrate your existing cluster over via a backup cluster.
Pro tip: Keep your pigsty directory under version control (Git). Learn from this experience - future you will thank present you.
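If you do have such a backup, pulling the essentials back onto the survivor might look like this - a minimal sketch with a hypothetical Git remote and backup paths:
```bash
# Restore the Pigsty directory from a private Git remote (hypothetical URL)
git clone git@git.example.com:ops/pigsty.git ~/pigsty

# Or copy just the critical files from an offline backup (hypothetical host/paths)
scp backup-host:/backup/pigsty/pigsty.yml ~/pigsty/pigsty.yml
scp backup-host:/backup/pigsty/ca.key     ~/pigsty/files/pki/ca/ca.key
```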
Config Repair
Use your surviving node as the new management node. Copy the ~/pigsty directory there and adjust the configuration. For example, replace the default management node 10.10.10.10 with the surviving node 10.10.10.12:
```yaml
all:
  vars:
    admin_ip: 10.10.10.12      # New management node IP
    node_etc_hosts: [10.10.10.12 h.pigsty a.pigsty p.pigsty g.pigsty sss.pigsty]
    infra_portal: {}           # Update other configs referencing old admin_ip
  children:
    infra:                     # Adjust Infra cluster
      hosts:
        # 10.10.10.10: { infra_seq: 1 }  # Old Infra node
        10.10.10.12: { infra_seq: 3 }    # New Infra node
    etcd:                      # Adjust ETCD cluster
      hosts:
        #10.10.10.10: { etcd_seq: 1 }    # Comment out failed node
        #10.10.10.11: { etcd_seq: 2 }    # Comment out failed node
        10.10.10.12: { etcd_seq: 3 }     # Keep survivor
      vars:
        etcd_cluster: etcd
    pg-meta:                   # Adjust PGSQL cluster config
      hosts:
        #10.10.10.10: { pg_seq: 1, pg_role: primary }
        #10.10.10.11: { pg_seq: 2, pg_role: replica }
        #10.10.10.12: { pg_seq: 3, pg_role: replica , pg_offline_query: true }
        10.10.10.12: { pg_seq: 3, pg_role: primary , pg_offline_query: true }
      vars:
        pg_cluster: pg-meta
```
ETCD Repair
Reset ETCD to a single-node cluster:
```bash
./etcd.yml -e etcd_safeguard=false -e etcd_clean=true
```
Follow ETCD Config Reload to adjust ETCD endpoint references.
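The goal of that step is for every etcd consumer (Patroni, vip-manager, etcdctl) to reference only the surviving member. Conceptually, Patroni’s DCS section ends up looking roughly like this - a sketch of the intent only, since the actual file path and layout are managed by Pigsty and may differ:
```yaml
# Patroni DCS endpoints should list only the live etcd member
etcd3:
  hosts: 10.10.10.12:2379   # survivor only; failed members removed
```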
INFRA Repair
If the surviving node lacks the INFRA module, configure and install it:
```bash
./infra.yml -l 10.10.10.12
```
Fix monitoring on the current node:
```bash
./node.yml -t node_monitor
```
PGSQL Repair
```bash
./pgsql.yml -t pg_conf      # Regenerate PG config
systemctl reload patroni    # Reload Patroni config on survivor
```
After module repairs, follow the standard scale-out procedure to add new nodes and restore HA.
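For example, once replacement hardware is ready, adding it back usually means declaring the new node in pigsty.yml and running the node and pgsql playbooks against it - a sketch with a hypothetical new node 10.10.10.13; see the standard scale-out documentation for the authoritative steps:
```bash
# After declaring 10.10.10.13 as a pg-meta replica in pigsty.yml (hypothetical new node)
./node.yml  -l 10.10.10.13    # bring the new node under Pigsty management
./pgsql.yml -l 10.10.10.13    # deploy PostgreSQL/Patroni and join it as a replica
```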