
Chitt Ranjan Mahto (Chirag)

Manage Replication and Failover on a PostgreSQL 16 Cluster Using repmgr on Two Nodes on Ubuntu 24.04 LTS

Chirag's PostgreSQL DBA Tutorial | inchirags@gmail.com | https://www.chirags.in


Static IP addresses for all nodes.

+------------+------------------+---------------+
| Role | IP Address | Hostname |
+------------+------------------+---------------+
| Primary DB | 192.168.224.128 | node1 |
+------------+------------------+---------------+
| Replica DB | 192.168.224.129 | node2 |
+------------+------------------+---------------+
What Is repmgr?

repmgr is an open-source suite for managing high-availability (HA) PostgreSQL clusters. It integrates closely with PostgreSQL to configure a primary node, clone replica nodes, monitor the cluster, and perform failover. repmgr supports a single read-write primary server and one or more read-only standby nodes, also known as replicas. This guide focuses on repmgr because it is an efficient and robust solution with long-standing popularity among database administrators.

The complete list of steps required to provision PostgreSQL and repmgr follows this sequence:

1. Install PostgreSQL on both nodes.
2. Access and secure PostgreSQL on both nodes.
3. Install repmgr on both nodes.
4. Enable SSH connectivity between the nodes.
5. Create a repmgr user on the primary node.
6. Configure the database replication settings in the PostgreSQL configuration file on the primary.
7. Configure the PostgreSQL authentication settings on the primary.
8. Configure the repmgr HA cluster settings on both nodes.
9. Register the primary server.
10. Clone and register the standby server.

Steps:

  1. Install PostgreSQL on both nodes.

Ensure each server is up to date. Run apt update to refresh the package lists, then apt upgrade to install any available updates. Reboot the server if necessary.

All Nodes as sudo User:

sudo apt update
sudo apt upgrade -y
Install PostgreSQL:

All Nodes as sudo User:

sudo apt -y install postgresql postgresql-contrib
Start PostgreSQL and ensure the service starts automatically at boot:

sudo systemctl start postgresql
sudo systemctl enable postgresql
Check the status:

sudo systemctl status postgresql
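Optionally, confirm that the version 16 cluster is online and check the installed version. This is only a sanity check using the postgresql-common tooling that ships with Debian/Ubuntu; it is not required for the rest of the guide.

All Nodes as sudo User:

# List the PostgreSQL clusters on this host; the "16 main" cluster should show status "online"
pg_lsclusters
# Confirm the installed client version
psql --version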

  2. Access and secure PostgreSQL on both nodes.

Change the password for the postgres Linux account:

All Nodes as sudo User:

sudo passwd postgres
When prompted, enter a new password. This guide uses admin@123 as an example; choose a unique strong password and store it in a secure location.

Output:

passwd: password updated successfully

Switch to the postgres user to access PostgreSQL Prompt:

All Nodes as sudo User:

sudo -u postgres psql
SELECT version();
Output:


PostgreSQL 16.9 (Ubuntu 16.9-0ubuntu0.24.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0, 64-bit

(1 row)

Add a password for the postgres database account using the following command.

ALTER USER postgres WITH PASSWORD 'admin@123';
Output:

ALTER ROLE

Exit:

\q

  3. Install repmgr on both nodes.

All Nodes as sudo User:

sudo apt-get install -y postgresql-16-repmgr
All Nodes as sudo User:

sudo systemctl enable repmgrd
sudo systemctl start repmgrd
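Note: Depending on the packaging, repmgrd may refuse to start until /etc/repmgr.conf exists (it is created in a later step), and on Debian/Ubuntu the daemon is typically controlled through /etc/default/repmgrd. The snippet below is a hedged sketch of the settings that file usually contains; check the file shipped on your system and adjust as needed. If repmgrd fails to start at this point, start it again after the configuration files are in place.

All Nodes as sudo User:

sudo nano /etc/default/repmgrd

# Typical contents (option names and defaults can vary between package versions):
REPMGRD_ENABLED=yes
REPMGRD_CONF="/etc/repmgr.conf"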

  4. Enable SSH connectivity between the nodes.

On the primary node, switch to the postgres user:

Primary Node as sudo User:

su - postgres
Generate an SSH key:

Primary Node as postgres User:

ssh-keygen -t rsa -b 4096
Copy the key to each standby node in the HA cluster. In the following example, replace 192.168.224.129 with the IP address of your standby node:

Primary Node as postgres User:

ssh-copy-id postgres@192.168.224.129
SSH can now be used to access a standby node without a password from the primary. The following command, when run from the postgres account on the primary, places the user in the postgres user directory on the standby:

Primary Node as postgres User:

ssh 192.168.224.129
When done, type exit to log out of the secondary server.

exit
Repeat these steps to create a key on each standby node and share it with the primary:

Standby Node as sudo User:

su - postgres
Standby Node as postgres User:

ssh-keygen -t rsa -b 4096
Standby Node as postgres User:

ssh-copy-id postgres@192.168.224.128
Output:

Number of key(s) added: 1

Now try logging into the machine

ssh postgres@192.168.224.128
and check to make sure that only the key(s) you wanted were added.
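To confirm that key-based SSH works non-interactively in both directions (repmgr relies on this when copying configuration files), a quick check such as the following can be used; each command should print the remote node's hostname without prompting for a password.

Primary Node as postgres User:

# Should print the standby's hostname without a password prompt
ssh -o BatchMode=yes postgres@192.168.224.129 hostname

Standby Node as postgres User:

# Should print the primary's hostname without a password prompt
ssh -o BatchMode=yes postgres@192.168.224.128 hostname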


  5. Create a repmgr user on the primary node (192.168.224.128).

To allow repmgr to manage PostgreSQL data replication, create a repmgr user on the primary server. Then create a new database for the repmgr data. The commands in this section must only be executed on the primary server. Do not create any database entries on the standby because this interferes with replication.

While logged in as the postgres account, create the repmgr user:

Primary Node as postgres User:

createuser -s repmgr
Now create the repmgr database, with the repmgr user as the owner:

Primary Node as postgres User:

createdb repmgr -O repmgr
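As an optional check, verify that the role and database exist before moving on; both commands assume you are still logged in as the postgres user on the primary.

Primary Node as postgres User:

# The repmgr role should be listed as a superuser
psql -c "\du repmgr"
# The repmgr database should be listed with repmgr as its owner
psql -l | grep repmgr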

  6. Configure the database replication settings in the PostgreSQL configuration file on the primary.

To configure the replication settings in postgresql.conf, follow these steps. This file must only be changed on the primary node. Repmgr copies it to the standby nodes in a later configuration stage.

Edit the /etc/postgresql/16/main/postgresql.conf file as the postgres user:

Primary Node as postgres User:

nano /etc/postgresql/16/main/postgresql.conf
Change the following settings to configure replication. In some cases, these lines only have to be uncommented. The lines are found in non-contiguous locations in the file; use the text editor's search utility (CTRL+W in nano) to find them.

File: /etc/postgresql/16/main/postgresql.conf

listen_addresses = '*'
shared_preload_libraries = 'repmgr'
wal_level = replica
archive_mode = on
archive_command = '/bin/true'
max_wal_senders = 10
max_replication_slots = 10
hot_standby = on
wal_log_hints = on

Note:

It is easiest to set listen_addresses to '*'. However, on some networks this might pose additional security concerns. For extra security, set this value to a comma-separated list consisting of localhost and the IP addresses of all nodes in the HA cluster.

When done, press CTRL+X, followed by Y then Enter to save the file and exit nano.
  7. Configure the PostgreSQL authentication settings on the primary.

Users must also add client authentication capabilities to the pg_hba.conf file. This file tells PostgreSQL what type of connections to trust and how to authenticate them. Add entries to trust repmgr connections from both the primary and standby servers. Edit this file on the primary server only.

Open the /etc/postgresql/16/main/pg_hba.conf file for editing as the postgres user:

Primary Node as postgres User:

nano /etc/postgresql/16/main/pg_hba.conf
Add the following entries:

local replication repmgr trust
host replication repmgr 127.0.0.1/32 trust
host replication repmgr 192.168.224.128/32 trust
host replication repmgr 192.168.224.129/32 trust
local repmgr repmgr trust
host repmgr repmgr 127.0.0.1/32 trust
host repmgr repmgr 192.168.224.128/32 trust
host repmgr repmgr 192.168.224.129/32 trust
When done, press CTRL+X, followed by Y then Enter to save the file and exit nano.

Restart the PostgreSQL process on the primary using an account with sudo privileges:

Primary Node as sudo User:

sudo systemctl restart postgresql
Ensure there are no error messages and PostgreSQL is still active:

Primary Node as sudo User:

sudo systemctl status postgresql
Press the "q" key to exit the systemctl status output.

Allow firewall port 5432 on both nodes.

All Nodes as sudo User:

sudo ufw allow 5432/tcp
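The rule above only matters if ufw is active on the node; if the firewall is inactive, connections on port 5432 are not blocked in the first place. The current firewall state can be checked with:

sudo ufw status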
Ensure the primary database is accessible from the standby nodes.

Standby Node as postgres User:

psql 'host=192.168.224.128 user=repmgr dbname=repmgr connect_timeout=2'
The PostgreSQL prompt should appear, indicating the repmgr database context:

Output:

psql (16.8 (Ubuntu 16.8-0ubuntu0.24.04.1))

SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)

Type "help" for help.

repmgr=#
When done, type quit or exit to leave the PostgreSQL prompt, then type exit again to return to the terminal shell as your standard Linux user with sudo privileges.

  8. Configure the repmgr HA cluster settings on both nodes.

Create the repmgr log file on both nodes.

All Nodes as sudo User:

sudo mkdir -p /var/log/postgresql
sudo touch /var/log/postgresql/repmgr.log
sudo chown postgres:postgres /var/log/postgresql/repmgr.log
Primary Node as sudo User:

sudo nano /etc/repmgr.conf
node_id=1
node_name=pg1
conninfo='host=192.168.224.128 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/postgresql/16/main'
failover=automatic
promote_command='repmgr -f /etc/repmgr.conf standby promote --log-to-file'
follow_command='repmgr -f /etc/repmgr.conf standby follow --log-to-file'
log_file='/var/log/postgresql/repmgr.log'
log_level=DEBUG
primary_unreachable_timeout=5
repmgrd_standby_timeout=10
repmgrd_failover_delay=5
repmgrd_service_start_command='sudo systemctl start repmgrd'
repmgrd_service_stop_command='sudo systemctl stop repmgrd'
monitoring_history=yes
When done, press CTRL+X, followed by Y then Enter to save the file and exit nano.

Create an /etc/repmgr.conf file on each standby node:

Standby Node as sudo User:

sudo nano /etc/repmgr.conf
node_id=2
node_name=pg2
conninfo='host=192.168.224.129 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/postgresql/16/main'
failover=automatic
promote_command='repmgr -f /etc/repmgr.conf standby promote --log-to-file'
follow_command='repmgr -f /etc/repmgr.conf standby follow --log-to-file'
log_file='/var/log/postgresql/repmgr.log'
log_level=DEBUG
primary_unreachable_timeout=5
repmgrd_standby_timeout=10
repmgrd_failover_delay=5
repmgrd_service_start_command='sudo systemctl start repmgrd'
repmgrd_service_stop_command='sudo systemctl stop repmgrd'
monitoring_history=yes
When done, press CTRL+X, followed by Y then Enter to save the file and exit nano.

  9. Register the primary server.

Primary Node as sudo User:

su - postgres
Primary Node as postgres User:

repmgr -f /etc/repmgr.conf primary register
Output:

INFO: connecting to primary database...

NOTICE: attempting to install extension "repmgr"

NOTICE: "repmgr" extension successfully installed

NOTICE: primary node record (ID: 1) registered

The repmgr utility confirms the primary is registered.

Confirm the primary is running using the cluster show command.

Primary Node as postgres User:

repmgr -f /etc/repmgr.conf cluster show
A node with an ID of 1 has the role of primary and a status of running.

Output:

ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string

----+------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------

1 | pg1 | primary | * running | | default | 100 | 1 | host=192.168.224.128 user=repmgr dbname=repmgr connect_timeout=2

  10. Clone and register the standby server.

Standby Node as sudo User:

sudo systemctl stop postgresql
Standby Node as sudo User:

su - postgres
Standby Node as postgres User:

repmgr -h 192.168.224.128 -U repmgr -d repmgr -f /etc/repmgr.conf standby clone --copy-external-config-files --dry-run
Output:

NOTICE: destination directory "/var/lib/postgresql/16/main" provided

INFO: connecting to source node

DETAIL: connection string is: host=192.168.224.128 user=repmgr dbname=repmgr

DETAIL: current installation size is 29 MB

INFO: "repmgr" extension is installed in database "repmgr"

WARNING: target data directory appears to be a PostgreSQL data directory

DETAIL: target data directory is "/var/lib/postgresql/16/main"

HINT: use -F/--force to overwrite the existing data directory

INFO: replication slot usage not requested; no replication slot will be set up for this standby

INFO: parameter "max_wal_senders" set to 10

NOTICE: checking for available walsenders on the source node (2 required)

INFO: sufficient walsenders available on the source node

DETAIL: 2 required, 10 available

NOTICE: checking replication connections can be made to the source server (2 required)

INFO: required number of replication connections could be made to the source server

DETAIL: 2 replication connections required

WARNING: data checksums are not enabled and "wal_log_hints" is "off"

DETAIL: pg_rewind requires "wal_log_hints" to be enabled

NOTICE: external configuration files detected, checking SSH connection to host "192.168.224.128"

INFO: SSH connection to host "192.168.224.128" succeeded

NOTICE: standby will attach to upstream node 1

HINT: consider using the -c/--fast-checkpoint option

INFO: would execute:

pg_basebackup -l "repmgr base backup" -D /var/lib/postgresql/16/main -h 192.168.224.128 -p 5432 -U repmgr -X stream

INFO: all prerequisites for "standby clone" are met


If the dry run is successful, run the command again without the --dry-run option:

Standby Node as postgres User:

repmgr -h 192.168.224.128 -U repmgr -d repmgr -f /etc/repmgr.conf standby clone --copy-external-config-files --force
Output:

NOTICE: destination directory "/var/lib/postgresql/16/main" provided

INFO: connecting to source node

DETAIL: connection string is: host=192.168.224.128 user=repmgr dbname=repmgr

DETAIL: current installation size is 29 MB

INFO: replication slot usage not requested; no replication slot will be set up for this standby

NOTICE: checking for available walsenders on the source node (2 required)

NOTICE: checking replication connections can be made to the source server (2 required)

WARNING: data checksums are not enabled and "wal_log_hints" is "off"

DETAIL: pg_rewind requires "wal_log_hints" to be enabled

NOTICE: copying external configuration files from upstream node "192.168.224.128"

INFO: rsync command line:

rsync --archive --checksum --compress --progress --rsh=ssh --delete --checksum 192.168.224.128:/etc/postgresql/16/main/postgresql.conf /etc/postgresql/16/main/postgresql.conf

receiving incremental file list

postgresql.conf

 29,957 100% 28.57MB/s 0:00:00 (xfr#1, to-chk=0/1) 

INFO: rsync command line:

rsync --archive --checksum --compress --progress --rsh=ssh --delete --checksum 192.168.224.128:/etc/postgresql/16/main/pg_hba.conf /etc/postgresql/16/main/pg_hba.conf

receiving incremental file list

pg_hba.conf

 6,484 100% 6.18MB/s 0:00:00 (xfr#1, to-chk=0/1) 

INFO: rsync command line:

rsync --archive --checksum --compress --progress --rsh=ssh --delete --checksum 192.168.224.128:/etc/postgresql/16/main/pg_ident.conf /etc/postgresql/16/main/pg_ident.conf

receiving incremental file list

WARNING: directory "/var/lib/postgresql/16/main" exists but is not empty

NOTICE: -F/--force provided - deleting existing data directory "/var/lib/postgresql/16/main"

NOTICE: starting backup (using pg_basebackup)...

HINT: this may take some time; consider using the -c/--fast-checkpoint option

INFO: executing:

pg_basebackup -l "repmgr base backup" -D /var/lib/postgresql/16/main -h 192.168.224.128 -p 5432 -U repmgr -X stream

NOTICE: standby clone (using pg_basebackup) complete

NOTICE: you can now start your PostgreSQL server

HINT: for example: pg_ctl -D /var/lib/postgresql/16/main start

HINT: after starting the server, you need to register this standby with "repmgr standby register"


Repeat the steps above to clone any other standby nodes.

On the standby node, make one further adjustment to postgresql.conf: confirm that data_directory points to the standby's local data directory.

Standby Node as postgres User:

nano /etc/postgresql/16/main/postgresql.conf
data_directory = '/var/lib/postgresql/16/main'
When done, press CTRL+X, followed by Y then Enter to save the file and exit nano.

Start PostgreSQL on all standby nodes:

Standby Node as sudo User:

sudo systemctl start postgresql
Verify that PostgreSQL has a status of active:

Standby Node as sudo User

sudo systemctl status postgresql
Press "q" to close the output and return to the terminal shell.

To check that replication is active, access the PostgreSQL shell on the primary node as the postgres user:

Primary Node as postgres User:

psql
Primary Node as postgres User in PostgreSQL Shell:

SELECT * FROM pg_stat_replication;
Output:

pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start

| backend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state | reply_time

------+----------+---------+------------------+-----------------+-----------------+-------------+-------------------------------+--------------+-----------+-----------+-----------+-----------+------------+-----------+-----------+------------+---------------+------------+-------------------------------

5331 | 16388 | repmgr | pg2 | 192.168.224.129 | | 53056 | 2025-03-14 14:59:55.601531+00 | | streaming | 0/30001F0 | 0/30001F0 | 0/30001F0 | 0/30001F0 | | | | 0 | async | 2025-03-14 15:00:55.710461+00

(1 row)

Each active standby node should have its own entry. Scan for the following details in the output:

application_name should contain the node_name of the standby.

The client_addr should indicate the IP address of the standby node.

The state should be streaming.

The sync_state is async.
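Replication can also be confirmed from the standby side. A minimal check, assuming the standby has been cloned and started as described above, is to verify that the node is in recovery and that its WAL receiver is streaming from the primary.

Standby Node as postgres User:

# Should return "t" on a standby
psql -c "SELECT pg_is_in_recovery();"
# status should be "streaming" and sender_host should be the primary's address
psql -c "SELECT status, sender_host FROM pg_stat_wal_receiver;"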

Return to the standby server and log in as the postgres account if not already:

Standby Node as sudo User:

su - postgres
Standby Node as postgres User:

repmgr -f /etc/repmgr.conf standby register
Output:

INFO: connecting to local node "pg2" (ID: 2)

INFO: connecting to primary database

WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID: 1)

INFO: standby registration complete

NOTICE: standby node "pg2" (ID: 2) successfully registered

Repeat this operation on each standby node.

Run the cluster show command on the standby to confirm it is registered:

Standby Node as postgres User:

repmgr -f /etc/repmgr.conf cluster show
Output:

ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string

----+------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------

1 | pg1 | primary | * running | | default | 100 | 1 | host=192.168.224.128 user=repmgr dbname=repmgr connect_timeout=2

2 | pg2 | standby | running | pg1 | default | 100 | 1 | host=192.168.224.129 user=repmgr dbname=repmgr connect_timeout=2

Ensure repmgrd is running on all nodes, including the primary and all standby nodes. The daemon monitors the nodes and responds quickly to any failures.
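To confirm that repmgrd is actually running and monitoring the nodes, the daemon status and node check commands can be used (available in recent repmgr releases); the exact output depends on your setup.

Any Node as postgres User:

# Shows whether repmgrd is running on each registered node
repmgr -f /etc/repmgr.conf daemon status
# Runs local checks such as role, replication lag, and archiving
repmgr -f /etc/repmgr.conf node check

Any Node as sudo User:

# Or check the service directly
sudo systemctl status repmgrd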

Testing a Failover Event

To confirm the HA cluster is working, follow these steps.

On the primary node, access the postgres database as the postgres database user:

Primary Node as postgres User:

psql postgres
Create a new employees table inside the postgres database:

Primary Node as postgres User in PostgreSQL Shell:

CREATE TABLE employees (employee_id int, first_name varchar(100), last_name varchar(100));
INSERT INTO employees (employee_id, first_name, last_name) VALUES
(1, 'Chirag', 'Mahto'),
(2, 'Sanju', 'Mehta');
\dt
The new table appears in the output:

 List of relations 

Schema | Name | Type | Owner

--------+-----------+-------+----------

public | employees | table | postgres

(1 row)

postgres=# SELECT * FROM employees;
employee_id | first_name | last_name

-------------+------------+-----------

           1 | Chirag     | Mahto
           2 | Sanju      | Mehta

(2 rows)

Now access PostgreSQL on the standby node:

Standby Node as postgres User:

psql postgres
\dt
The same table appears in the output. The update on the primary is replicated to this node.

List of relations

Schema | Name | Type | Owner

--------+-----------+-------+----------

public | employees | table | postgres

SELECT * FROM employees;
Output:

employee_id | first_name | last_name

-------------+------------+-----------

           1 | Chirag     | Mahto
           2 | Sanju      | Mehta

(2 rows)
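Before triggering a failover, it can be useful to confirm that the standby really is read-only. An attempted write on the standby should be rejected; this check is not part of the original steps.

Standby Node as postgres User:

# Expected to fail with an error similar to:
# ERROR:  cannot execute INSERT in a read-only transaction
psql -d postgres -c "INSERT INTO employees VALUES (3, 'Test', 'User');"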

Simulate a failure of the primary by stopping PostgreSQL on the primary node.

Primary Node as sudo User:

sudo systemctl stop postgresql
Wait a few moments for repmgrd to detect the failure, then check the cluster status from the standby.

Standby Node as postgres User:

repmgr -f /etc/repmgr.conf cluster show
The cluster status shows node 1 as failed and node 2 promoted to primary:

ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string

----+------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------

1 | pg1 | primary | - failed | ? | default | 100 | | host=192.168.224.128 user=repmgr dbname=repmgr connect_timeout=2

2 | pg2 | primary | * running | | default | 100 | 2 | host=192.168.224.129 user=repmgr dbname=repmgr connect_timeout=2

Ensure the new primary allows write operations by creating a new table in the postgres database. If repmgrd has already promoted the standby automatically (as the cluster show output above indicates), the manual standby promote command below is not required.

Standby Node as sudo User:

sudo -u postgres repmgr standby promote
sudo -u postgres psql
Standby Node as postgres User in PostgreSQL Shell

CREATE TABLE employees2 (employee_id int, first_name varchar(100), last_name varchar(100));
Standby Node as postgres User in PostgreSQL Shell

\dt
The new employees2 table is listed alongside the previous table:

 List of relations 

Schema | Name | Type | Owner

--------+------------+-------+----------

public | employees | table | postgres

public | employees2 | table | postgres

(2 rows)
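After a successful failover, the old primary (node1) remains registered as a failed node. If you later want to bring it back as a standby of the new primary, repmgr provides the node rejoin command. The following is a hedged sketch only; the exact options depend on your environment, and --force-rewind relies on wal_log_hints = on, which was set earlier.

Old Primary Node (node1) as postgres User, with PostgreSQL stopped on this node:

repmgr -f /etc/repmgr.conf node rejoin \
  -d 'host=192.168.224.129 user=repmgr dbname=repmgr connect_timeout=2' \
  --force-rewind --verbose

# Afterwards, verify the cluster state
repmgr -f /etc/repmgr.conf cluster show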

For any doubts and queries, please write in the YouTube video comments section.

Note: Follow the process shown in the video.

😉Subscribe and like for more videos:

https://www.youtube.com/@chiragstutorial

💛 Don't forget to 💘 Follow, 💝 Like, 💖 Share 💙 & Comment

Thanks & Regards,

Chitt Ranjan Mahto "Chirag"


Note: All scripts used in this demo are available on our website. The link is in the video description.
