Skip to content

Testing

This guide covers testing pgraft during development.

Test Harness

pgraft includes a comprehensive test harness in the examples/ directory.

Quick Start

cd examples  # Destroy any existing test cluster ./run.sh --destroy  # Initialize new 3-node cluster ./run.sh --init  # Check cluster status ./run.sh --status 

Test Harness Commands

Command Description
--destroy Destroy test cluster and clean up
--init Initialize new 3-node cluster
--status Check cluster status

Test Cluster Configuration

The test harness creates a 3-node cluster:

  • primary1: Node 1, PostgreSQL port 5432, Raft port 7001
  • replica1: Node 2, PostgreSQL port 5433, Raft port 7002
  • replica2: Node 3, PostgreSQL port 5434, Raft port 7003

Manual Testing

Basic Functionality

-- Test extension creation CREATE EXTENSION pgraft;  -- Test initialization SELECT pgraft_init(); -- Expected: t  -- Test worker status SELECT pgraft_get_worker_state(); -- Expected: RUNNING  -- Test leader election (wait 10 seconds first) SELECT pgraft_get_leader(); -- Expected: 1 (or another node ID)  -- Test leadership check SELECT pgraft_is_leader(); -- Expected: t or f 

Cluster Operations

-- Add node (must be on leader) SELECT pgraft_add_node(2, '127.0.0.1', 7002); -- Expected: t  -- List nodes SELECT * FROM pgraft_get_nodes(); -- Expected: Table with all nodes  -- Cluster status SELECT * FROM pgraft_get_cluster_status(); -- Expected: Status information  -- Get current term SELECT pgraft_get_term(); -- Expected: Integer term number 

Log Replication

-- Replicate entry SELECT pgraft_replicate_entry('{"test": "data"}'); -- Expected: t (if quorum reached)  -- Get log stats SELECT * FROM pgraft_log_get_stats(); -- Expected: Table with statistics 

Failover Testing

Test Leader Failure

# 1. Identify leader psql -p 5432 -c "SELECT pgraft_get_leader();"  # 2. Stop leader (e.g., if node 1 is leader) pg_ctl stop -D examples/data/primary1 -m immediate  # 3. Verify new leader elected (check on remaining nodes) sleep 2 psql -p 5433 -c "SELECT pgraft_get_leader();" psql -p 5434 -c "SELECT pgraft_get_leader();"  # 4. Restart failed node pg_ctl start -D examples/data/primary1  # 5. Verify it rejoins as follower sleep 2 psql -p 5432 -c "SELECT pgraft_is_leader();" 

Test Network Partition

Using iptables (Linux) or pfctl (macOS):

# Simulate partition by blocking Raft port # On node 3, block communication with node 1 and 2 sudo iptables -A INPUT -p tcp --sport 7001 -j DROP sudo iptables -A INPUT -p tcp --sport 7002 -j DROP sudo iptables -A OUTPUT -p tcp --dport 7001 -j DROP sudo iptables -A OUTPUT -p tcp --dport 7002 -j DROP  # Verify node 3 cannot elect itself leader psql -p 5434 -c "SELECT pgraft_is_leader();" # Should return false  # Verify nodes 1 and 2 continue with leader psql -p 5432 -c "SELECT pgraft_get_leader();" psql -p 5433 -c "SELECT pgraft_get_leader();"  # Restore network sudo iptables -D INPUT -p tcp --sport 7001 -j DROP sudo iptables -D INPUT -p tcp --sport 7002 -j DROP sudo iptables -D OUTPUT -p tcp --dport 7001 -j DROP sudo iptables -D OUTPUT -p tcp --dport 7002 -j DROP  # Verify node 3 rejoins sleep 2 psql -p 5434 -c "SELECT * FROM pgraft_get_cluster_status();" 

Performance Testing

Throughput Test

-- Test log entry replication rate DO $$ DECLARE  start_time timestamp;  end_time timestamp;  i integer; BEGIN  start_time := clock_timestamp();   FOR i IN 1..1000 LOOP  PERFORM pgraft_replicate_entry('{"test": "data"}');  END LOOP;   end_time := clock_timestamp();   RAISE NOTICE 'Time: %', end_time - start_time;  RAISE NOTICE 'Entries/sec: %', 1000.0 / extract(epoch from (end_time - start_time)); END $$; 

Latency Test

-- Measure single entry replication latency \timing on SELECT pgraft_replicate_entry('{"test": "data"}'); \timing off 

Stress Testing

Continuous Operations

# Terminal 1: Continuous writes on leader while true; do  psql -p 5432 -c "SELECT pgraft_replicate_entry(now()::text);" 2>&1 | grep -v "replicate"  sleep 0.1 done  # Terminal 2: Monitor cluster status watch -n 1 "psql -p 5432 -c 'SELECT * FROM pgraft_get_cluster_status();'"  # Terminal 3: Simulate failures # Stop/start nodes randomly 

Sustained Load

# Generate sustained load for i in {1..10000}; do  psql -p 5432 -c "SELECT pgraft_replicate_entry('entry_$i');" &  if [ $((i % 100)) -eq 0 ]; then  wait # Wait every 100 to avoid overwhelming  fi done wait 

Integration Testing

Multi-Node Test Script

Create test_cluster.sh:

#!/bin/bash  echo "Testing cluster operations..."  # Test on all nodes for PORT in 5432 5433 5434; do  echo "Node on port $PORT:"  psql -p $PORT -c "SELECT pgraft_is_leader(), pgraft_get_term();" -t done  # Get leader LEADER_PORT=$(psql -p 5432 -c "SELECT pgraft_get_leader();" -t | tr -d ' ') LEADER_PORT=$((5431 + LEADER_PORT))  echo "Leader is on port $LEADER_PORT"  # Add and remove node on leader echo "Testing add/remove node on leader..." psql -p $LEADER_PORT -c "SELECT pgraft_add_node(4, '127.0.0.1', 7004);" sleep 1 psql -p $LEADER_PORT -c "SELECT * FROM pgraft_get_nodes();" psql -p $LEADER_PORT -c "SELECT pgraft_remove_node(4);" sleep 1 psql -p $LEADER_PORT -c "SELECT * FROM pgraft_get_nodes();"  echo "Test completed!" 

Regression Testing

SQL Test Suite

Create regression_tests.sql:

-- Test 1: Extension creation CREATE EXTENSION IF NOT EXISTS pgraft;  -- Test 2: Initialization SELECT pgraft_init();  -- Wait for leader election SELECT pg_sleep(2);  -- Test 3: Worker running SELECT pgraft_get_worker_state() = 'RUNNING' AS worker_ok;  -- Test 4: Leader elected SELECT pgraft_get_leader() > 0 AS leader_elected;  -- Test 5: Get term SELECT pgraft_get_term() > 0 AS term_ok;  -- Test 6: Cluster status SELECT node_id, term, state FROM pgraft_get_cluster_status();  -- Test 7: Log operations (on leader only) DO $$ BEGIN  IF pgraft_is_leader() THEN  PERFORM pgraft_replicate_entry('test_entry');  END IF; END $$;  -- Test 8: Log statistics SELECT * FROM pgraft_log_get_stats();  -- Test 9: Version SELECT pgraft_get_version();  -- Test 10: Debug mode SELECT pgraft_set_debug(true); SELECT pgraft_set_debug(false); 

Run tests:

psql -f regression_tests.sql 

Automated Testing

GitHub Actions

The repository includes GitHub Actions workflow for CI/CD (see .github/workflows/test.yml).

Local Automation

#!/bin/bash # automated_test.sh  set -e # Exit on error  echo "Starting automated tests..."  # Clean slate cd examples ./run.sh --destroy  # Initialize ./run.sh --init sleep 5 # Wait for cluster to stabilize  # Run regression tests on all nodes for PORT in 5432 5433 5434; do  echo "Testing node on port $PORT..."  psql -p $PORT -f ../tests/regression_tests.sql done  # Failover test echo "Testing failover..." LEADER=$(psql -p 5432 -t -c "SELECT pgraft_get_leader();" | tr -d ' ') LEADER_PORT=$((5431 + LEADER))  pg_ctl stop -D data/node$LEADER -m immediate sleep 3 NEW_LEADER=$(psql -p $((LEADER_PORT + 1)) -t -c "SELECT pgraft_get_leader();" | tr -d ' ')  if [ "$NEW_LEADER" != "$LEADER" ]; then  echo "✓ Failover successful" else  echo "✗ Failover failed"  exit 1 fi  # Cleanup ./run.sh --destroy  echo "All tests passed!" 

Test Coverage

Key areas to test:

  • Extension creation and initialization
  • Background worker startup
  • Leader election
  • Node addition (on leader)
  • Node addition (on follower - should fail)
  • Node removal
  • Log replication
  • Cluster status queries
  • Leader failure and recovery
  • Network partition handling
  • Concurrent operations
  • Configuration changes
  • Snapshot creation and recovery

Debugging Test Failures

Enable Debug Logging

SELECT pgraft_set_debug(true); 

Check Logs

# View logs for all nodes tail -f examples/logs/primary1/postgresql.log & tail -f examples/logs/replica1/postgresql.log & tail -f examples/logs/replica2/postgresql.log & 

Common Issues

Leader not elected: - Check network connectivity - Verify clock synchronization - Increase election timeout

Node won't join: - Verify configuration matches - Check firewall rules - Ensure node is initialized

Replication lag: - Check disk I/O - Monitor network latency - Review snapshot settings

Reporting Issues

When reporting test failures, include:

  1. Test scenario: What were you testing?
  2. Expected result: What should happen?
  3. Actual result: What actually happened?
  4. Logs: Relevant log excerpts (with debug enabled)
  5. Configuration: postgresql.conf settings
  6. Environment: OS, PostgreSQL version, pgraft version