Split-Brain Protection¶
pgraft protects against split-brain through the Raft consensus algorithm, which guarantees that at most one leader can exist in any given term. This page explains how that guarantee works and how you can verify it yourself.
What is Split-Brain?¶
Split-brain is a dangerous condition in distributed systems where:
- Network partition divides the cluster
- Multiple nodes believe they are the leader
- Different leaders accept conflicting writes
- Data corruption and inconsistency result
Without Protection
In traditional PostgreSQL replication, network partitions can lead to split-brain scenarios where multiple nodes accept writes, causing permanent data inconsistencies.
How pgraft Prevents Split-Brain¶
pgraft uses the Raft consensus algorithm, which provides mathematical guarantees against split-brain through four key mechanisms:
1. Quorum Requirement¶
Leader election requires a majority of votes, ⌊N/2⌋ + 1 for a cluster of N nodes:
- 3-node cluster: Needs 2 votes
- 5-node cluster: Needs 3 votes
- 7-node cluster: Needs 4 votes
Why this prevents split-brain:
Network Partition Example (3-node cluster):

    Partition 1: Node 1, Node 2
    Partition 2: Node 3

    Partition 1 (2 nodes):
    - Has majority (2 out of 3)
    - Can elect leader
    - Can accept writes

    Partition 2 (1 node):
    - No majority (1 out of 3)
    - Cannot elect leader
    - Cannot accept writes (read-only)

Mathematical guarantee: Only one partition can have a majority, therefore only one leader can be elected.
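The quorum arithmetic can be checked with a few lines of Python. This is an illustrative sketch only: `quorum` and `sides_with_quorum` are hypothetical helper names, not part of pgraft.

```python
def quorum(cluster_size: int) -> int:
    """Votes needed to win an election: floor(N / 2) + 1."""
    return cluster_size // 2 + 1


def sides_with_quorum(cluster_size: int, partition_sizes: list) -> int:
    """Count how many partitions are large enough to elect a leader."""
    need = quorum(cluster_size)
    return sum(1 for size in partition_sizes if size >= need)


for n in (3, 5, 7):
    print(f"{n}-node cluster needs {quorum(n)} votes")

# Every two-way split of a cluster leaves at most one side with a quorum.
for n in range(1, 10):
    for a in range(n + 1):
        assert sides_with_quorum(n, [a, n - a]) <= 1
```

The final loop tries every two-way split of clusters with up to 9 nodes and confirms that no split gives both sides a quorum.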
2. Term Monotonicity¶
Each leader election increments the term number:
- Terms are strictly increasing
- Higher term always wins
- Old leaders automatically step down when they see a higher term
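The step-down rule can be sketched as a small state update. The `Node` class and its `observe_term` method are hypothetical names chosen for illustration; they are a simplified model of the Raft rule, not pgraft's internal code.

```python
from dataclasses import dataclass


@dataclass
class Node:
    node_id: int
    term: int
    role: str = "follower"

    def observe_term(self, message_term: int) -> None:
        """Apply the Raft rule for a term seen in any incoming message."""
        if message_term > self.term:
            # A higher term always wins: adopt it and give up leadership.
            self.term = message_term
            self.role = "follower"
        # Messages from an equal or lower term never change the role here.


old_leader = Node(node_id=1, term=5, role="leader")
old_leader.observe_term(6)  # a message from the new leader's term arrives
print(old_leader.role, old_leader.term)
```

Because terms only ever increase, a leader from an older term cannot regain authority without winning a fresh election.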
Example:
    Time T0: Node 1 is leader in term 5
             Network partition occurs

    Time T1: Partition A: Node 1 (term 5)
             Partition B: Node 2, 3 elect new leader (term 6)
             Network heals

    Time T2: Node 1 sees messages from term 6
             Node 1 automatically steps down
             Only one leader remains (term 6)

3. Log Completeness¶
Only nodes with up-to-date logs can be elected:
- Candidates with incomplete logs lose elections
- New leader always has all committed entries
- Prevents data loss during leadership transitions
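In Raft, a voter grants its vote only if the candidate's log is at least as up to date as its own, comparing the term of the last entry first and the log length second. A minimal sketch of that comparison follows; `log_is_up_to_date` is a hypothetical helper name, not a pgraft function.

```python
def log_is_up_to_date(candidate: tuple, voter: tuple) -> bool:
    """Return True if the candidate's log is at least as up to date as the voter's.

    Each argument is a (last_log_term, last_log_index) pair. Python compares
    tuples element by element, so the term is compared first and the index
    only breaks ties, which matches the Raft rule.
    """
    return candidate >= voter


# Same last term: the longer log is more up to date.
print(log_is_up_to_date((5, 100), (5, 95)))
# Different last terms: the higher term wins even with a shorter log.
print(log_is_up_to_date((5, 95), (4, 100)))
```

Since a committed entry is stored on a majority of nodes and a candidate needs votes from a majority, at least one voter holding that entry must approve the candidate, which is why the winner always has every committed entry.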
Election Rules:
    Candidate A: Last log index = 100, Last log term = 5
    Candidate B: Last log index = 95, Last log term = 5
    Result: A wins (more up-to-date log)

    Candidate C: Last log index = 100, Last log term = 4
    Candidate D: Last log index = 95, Last log term = 5
    Result: D wins (higher term in last entry)

4. Single Leader Per Term¶
Raft's fundamental guarantee:
At most one leader can be elected in a given term.
Why?
- Leader needs majority of votes
- Each node votes for at most one candidate per term
- Two candidates cannot both get majority votes (mathematical impossibility)
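The one-vote-per-term rule can be modeled in a few lines. The `Voter` class below is a hypothetical, simplified sketch: it omits the log comparison and persistence that a real Raft node also performs, and it is not pgraft's implementation.

```python
class Voter:
    """Tracks the current term and the single vote cast in that term."""

    def __init__(self) -> None:
        self.term = 0
        self.voted_for = None

    def request_vote(self, candidate: str, term: int) -> bool:
        if term < self.term:
            return False  # stale request from an older term
        if term > self.term:
            # A new term starts with no vote cast yet.
            self.term = term
            self.voted_for = None
        if self.voted_for in (None, candidate):
            self.voted_for = candidate
            return True
        return False  # already voted for someone else in this term


voters = [Voter() for _ in range(5)]
votes_a = sum(v.request_vote("A", term=6) for v in voters[:3])
votes_b = sum(v.request_vote("B", term=6) for v in voters)
print(votes_a, votes_b)
```

Candidate A reaches three voters first and wins their votes; candidate B then asks all five voters in the same term but can only collect the two votes that are still uncast.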
Example (5-node cluster):

    Candidate A gets votes from: Node 1, 2, 3 (majority ✅)
    Candidate B can only get: Node 4, 5 (not majority ❌)

    Impossible for both A and B to get 3+ votes!

Network Partition Scenarios¶
Scenario 1: Minority Partition¶
Setup: 3-node cluster (Node 1, 2, 3), Node 1 is leader
Event: Node 3 is isolated
    Before:
    [Node 1 (Leader)] ←→ [Node 2] ←→ [Node 3]

    After Partition:
    [Node 1 (Leader)] ←→ [Node 2]   |   [Node 3 (isolated)]

Result:
- Node 1 remains leader (still has majority with Node 2)
- Cluster continues operating normally
- Node 3 cannot win an election on its own (1 of 3 votes), so it never becomes leader and cannot accept writes
- No split-brain - only one leader
Scenario 2: Equal Partition (3 nodes)¶
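Setup: 3-node cluster; the network splits into nodes 1 and 2 on one side and node 3 on the other. With an odd number of nodes, a two-way split can never be truly equal, so one side always holds the majority.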
Event: Network partition
Result:
- Partition with nodes 1, 2 keeps its leader, or elects a new one if Node 3 was the leader (has majority)
- Node 3 cannot elect new leader (no majority)
- No split-brain - only one leader
Scenario 3: Leader in Minority¶
Setup: 5-node cluster, Node 1 is leader
Event: Node 1 and 2 isolated from 3, 4, 5
    Before:
    [Node 1 (Leader)] ←→ [Node 2] ←→ [Node 3] ←→ [Node 4] ←→ [Node 5]

    After Partition:
    [Node 1] ←→ [Node 2]   |   [Node 3] ←→ [Node 4] ←→ [Node 5]

Result:
- Node 1 loses leadership: with only 2 of 5 nodes reachable it has no majority and cannot commit new writes
- Nodes 3, 4, 5 elect new leader (have majority - 3 of 5)
- Node 1 steps down after election timeout
- No split-brain - old leader steps down, new leader elected
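The outcome of each scenario follows from which side of the partition still holds a majority. The sketch below replays the scenarios with hypothetical helpers (`side_that_can_lead`, `describe`); the 4-node even split is an extra case added for illustration and does not appear in the scenarios above.

```python
from typing import List, Optional, Set


def side_that_can_lead(groups: List[Set[int]]) -> Optional[Set[int]]:
    """Return the one group holding a majority of all nodes, or None."""
    total = sum(len(group) for group in groups)
    need = total // 2 + 1
    for group in groups:
        if len(group) >= need:
            return group
    return None


def describe(groups: List[Set[int]], old_leader: int) -> str:
    side = side_that_can_lead(groups)
    if side is None:
        return "no side has a majority: no leader, no writes"
    if old_leader in side:
        return f"Node {old_leader} stays leader"
    return f"nodes {sorted(side)} elect a new leader"


print(describe([{1, 2}, {3}], old_leader=1))        # Scenarios 1 and 2
print(describe([{1, 2}, {3, 4, 5}], old_leader=1))  # Scenario 3
print(describe([{1, 2}, {3, 4}], old_leader=1))     # even split, 4 nodes
```

In the even split neither side can elect a leader, so the cluster becomes unavailable for writes rather than splitting into two leaders.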
Verification¶
You can verify split-brain protection yourself:
Test 1: Isolate Minority¶
    # 3-node cluster
    # Block Node 3's network

    # On Node 1 or 2 (majority partition):
    psql -c "SELECT pgraft_is_leader();"  # One returns true
    psql -c "SELECT pgraft_add_node(4, '127.0.0.1', 7004);"  # Works!

    # On Node 3 (minority):
    psql -c "SELECT pgraft_is_leader();"  # Returns false
    psql -c "SELECT pgraft_add_node(4, '127.0.0.1', 7004);"  # Fails!

Test 2: Leader in Minority¶
    # 5-node cluster, Node 1 is leader
    # Isolate Node 1 and 2

    # Majority partition (3, 4, 5) will elect new leader after ~1 second
    # Minority partition (1, 2) cannot elect leader

    # When network heals, Node 1 sees higher term and steps down

Mathematical Proof Sketch¶
Theorem: At most one leader per term.
Proof:
- Leader requires majority votes: ⌊N/2⌋ + 1
- Each node votes once per term
- Two majorities must overlap (Pigeonhole Principle)
- Overlapping node cannot vote for both candidates
- Therefore, at most one candidate can get majority
- QED: At most one leader per term
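The overlap step can also be confirmed by brute force for small clusters: two sets of at least ⌊N/2⌋ + 1 nodes each have a combined size greater than N, so they must share a node. The sketch below enumerates every pair of majorities; `majorities` and `all_majorities_overlap` are hypothetical helper names used only for this check.

```python
from itertools import combinations


def majorities(n: int) -> list:
    """All subsets of an n-node cluster that contain a majority of nodes."""
    nodes = range(1, n + 1)
    need = n // 2 + 1
    return [set(c) for k in range(need, n + 1) for c in combinations(nodes, k)]


def all_majorities_overlap(n: int) -> bool:
    """True if every pair of majority subsets shares at least one node."""
    return all(a & b for a, b in combinations(majorities(n), 2))


for n in range(1, 8):
    assert all_majorities_overlap(n)
print("every pair of majorities overlaps for clusters of 1 to 7 nodes")
```

The shared node is the one that would have to vote twice in the same term for two leaders to exist, which the voting rule forbids.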
Comparison with Other Systems¶
| System | Split-Brain Protection | Method |
|---|---|---|
| pgraft | 100% | Raft consensus |
| PostgreSQL streaming replication | No | Manual failover |
| MySQL replication | No | Manual failover |
| Patroni/Stolon | Partial | Requires external consensus (etcd/ZooKeeper) |
| PostgreSQL with Pacemaker | Partial | STONITH fencing |
pgraft Advantage
pgraft provides split-brain protection natively without requiring external consensus systems or STONITH devices.
Best Practices¶
1. Use Odd Number of Nodes¶
Odd numbers provide better fault tolerance:
- 3 nodes: Tolerates 1 failure
- 5 nodes: Tolerates 2 failures
- 7 nodes: Tolerates 3 failures
Even numbers waste resources:
- 4 nodes: Still tolerates only 1 failure (same as 3)
- 6 nodes: Still tolerates only 2 failures (same as 5)
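The fault-tolerance figures follow directly from the quorum size: a cluster stays available as long as a majority of nodes survives. A short sketch, with `fault_tolerance` as a hypothetical helper name:

```python
def fault_tolerance(cluster_size: int) -> int:
    """Nodes that can fail while a majority (floor(N/2) + 1) remains."""
    quorum = cluster_size // 2 + 1
    return cluster_size - quorum


for n in range(3, 8):
    print(f"{n} nodes: quorum {n // 2 + 1}, tolerates {fault_tolerance(n)} failure(s)")
```

Adding a fourth node raises the quorum from 2 to 3 without raising the number of tolerated failures, which is why the even sizes add cost but no resilience.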
2. Geographic Distribution¶
For disaster recovery, distribute nodes across:
- Different availability zones
- Different data centers
- Different geographic regions
Example (5-node cluster):
    Region A (2 nodes): Primary data center
    Region B (2 nodes): Secondary data center
    Region C (1 node): Tiebreaker

3. Monitor Term Changes¶
Frequent term changes indicate problems:
    -- Monitor term changes
    SELECT pgraft_get_term();

    -- If term increases rapidly:
    -- - Network instability
    -- - Node failures
    -- - Election timeout too low

Summary¶
pgraft provides guaranteed split-brain protection through:
- Quorum-based elections - Only majority can elect leader
- Term monotonicity - Higher term always wins
- Log completeness - Only up-to-date nodes elected
- Mathematical guarantees - Proven by Raft algorithm
You can trust pgraft to never allow split-brain scenarios.