I'm setting up PostgreSQL replication on two servers (CentOS 6.5), with HA provided by Corosync/Pacemaker.
My software info:
    postgresql91-9.1.19-1PGDG.rhel6.x86_64
    postgresql91-server-9.1.19-1PGDG.rhel6.x86_64
    postgresql91-libs-9.1.19-1PGDG.rhel6.x86_64
    postgresql91-contrib-9.1.19-1PGDG.rhel6.x86_64
    postgresql91-devel-9.1.19-1PGDG.rhel6.x86_64
    corosynclib-1.4.7-2.el6.x86_64
    corosync-1.4.7-2.el6.x86_64
    pacemaker-cli-1.1.12-8.el6_7.2.x86_64
    pacemaker-1.1.12-8.el6_7.2.x86_64
    pacemaker-cluster-libs-1.1.12-8.el6_7.2.x86_64
    pacemaker-libs-1.1.12-8.el6_7.2.x86_64
    resource-agents-3.9.5-24.el6_7.1.x86_64
Replication is working; from the master I can see the slave server connected:
    -bash-4.1$ psql -c "select client_addr,sync_state from pg_stat_replication;"
     client_addr | sync_state
    -------------+------------
     172.16.1.10 | async
    (1 row)
I have also confirmed that data created on the master is replicated to the slave.
Here is my crm configure show:
    node master
    node slave
    primitive PSQL pgsql \
        params restart_on_promote=true pgctl="/usr/pgsql-9.1/bin/pg_ctl" psql="/usr/pgsql-9.1/bin/psql" pgdata="/var/lib/pgsql/9.1/data" node_list="master slave" repuser=rep rep_mode=sync restore_command="cp /var/lib/pgsql/pg_archive/%f %p" primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" master_ip=172.16.1.100 archive_cleanup_command="/usr/pgsql-9.1/bin/pg_archivecleanup /var/lib/pgsql/pg_archive/ %r"
    primitive RepIP IPaddr2 \
        params ip=172.16.1.100 nic=eth2 cidr_netmask=24 \
        op monitor interval=30s
    primitive VirtualIP IPaddr2 \
        params ip=10.0.0.100 cidr_netmask=24 \
        op monitor interval=30s
    group psql-ha VirtualIP RepIP \
        meta target-role=Started
    property cib-bootstrap-options: \
        dc-version=1.1.11-97629de \
        cluster-infrastructure="classic openais (with plugin)" \
        expected-quorum-votes=2 \
        stonith-enabled=false \
        no-quorum-policy=ignore
But the resource PSQL cannot start. My crm status:
    Last updated: Sat Nov 28 13:09:47 2015
    Last change: Sat Nov 28 12:50:21 2015
    Stack: classic openais (with plugin)
    Current DC: master - partition with quorum
    Version: 1.1.11-97629de
    2 Nodes configured, 2 expected votes
    3 Resources configured

    Online: [ master slave ]

     Resource Group: psql-ha
         VirtualIP (ocf::heartbeat:IPaddr2): Started master
         RepIP (ocf::heartbeat:IPaddr2): Started master

    Failed actions:
        PSQL_start_0 on slave 'not configured' (6): call=60, status=complete, last-rc-change='Sat Nov 28 12:50:21 2015', queued=0ms, exec=53ms
There is an error logged in /var/log/messages:
Nov 28 12:50:21 slave pgsql(PSQL)[3387]: ERROR: Replication(rep_mode=async or sync) requires Master/Slave configuration.
Could anyone explain why I get that error?
Thanks.
UPDATED:
(hostnames changed to node1/node2)
Problem solved using the configuration suggested by @gf_.
Note: Disregard my old configuration; this deployment model uses only one virtual IP.
Current status:
    [root@node1 ~]# crm_mon -Af -1
    Last updated: Wed Dec 2 05:13:56 2015
    Last change: Wed Dec 2 05:10:06 2015
    Stack: classic openais (with plugin)
    Current DC: node2 - partition with quorum
    Version: 1.1.11-97629de
    2 Nodes configured, 2 expected votes
    3 Resources configured

    Online: [ node1 node2 ]

    VirtualIP (ocf::heartbeat:IPaddr2): Started node2
     Master/Slave Set: msPSQL [PSQL]
         Masters: [ node2 ]
         Slaves: [ node1 ]

    Node Attributes:
    * Node node1:
        + PSQL-data-status : STREAMING|SYNC
        + PSQL-status : HS:sync
        + master-PSQL : 100
    * Node node2:
        + PSQL-data-status : LATEST
        + PSQL-master-baseline : 000000000E000078
        + PSQL-status : PRI
        + master-PSQL : 1000

    Migration summary:
    * Node node1:
    * Node node2:
Working configuration:
    node node1 \
        attributes PSQL-data-status="STREAMING|SYNC"
    node node2 \
        attributes PSQL-data-status=LATEST
    primitive PSQL pgsql \
        params restart_on_promote=false pgctl="/usr/pgsql-9.1/bin/pg_ctl" psql="/usr/pgsql-9.1/bin/psql" pgdata="/var/lib/pgsql/9.1/data" node_list="node1 node2" repuser=replicate rep_mode=sync restore_command="cp /var/lib/pgsql/pg_archive/%f %p" primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" master_ip=10.0.0.100 archive_cleanup_command="/usr/pgsql-9.1/bin/pg_archivecleanup /var/lib/pgsql/pg_archive/ %r" \
        op start timeout=60s interval=0s on-fail=restart \
        op monitor timeout=60s interval=4s on-fail=restart \
        op monitor timeout=60s interval=3s on-fail=restart role=Master \
        op promote timeout=60s interval=0s on-fail=restart \
        op demote timeout=60s interval=0s on-fail=stop \
        op stop timeout=60s interval=0s on-fail=block \
        op notify timeout=60s interval=0s
    primitive VirtualIP IPaddr2 \
        params ip=10.0.0.100 nic=eth1 cidr_netmask=24 \
        op monitor interval=30s
    ms msPSQL PSQL \
        meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 target-role=Started notify=true
    colocation rsc_colocation-1 inf: VirtualIP msPSQL:Master
    order rsc_order-1 0: msPSQL:promote VirtualIP:start symmetrical=false
    order rsc_order-2 0: msPSQL:promote VirtualIP:stop symmetrical=false
    property cib-bootstrap-options: \
        dc-version=1.1.11-97629de \
        cluster-infrastructure="classic openais (with plugin)" \
        expected-quorum-votes=2 \
        no-quorum-policy=ignore \
        stonith-enabled=false \
        last-lrm-refresh=1449033003
    rsc_defaults rsc-options: \
        resource-stickiness=100
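For anyone hitting the same "requires Master/Slave configuration" error: as far as I can tell, it came from PSQL being defined only as a plain primitive while rep_mode=sync was set. The part of the working configuration that appears to address this is the ms wrapper around the primitive, together with the colocation that ties the virtual IP to the master role. The excerpt below is copied from the working configuration, not a complete setup; the comments are my own reading and the operation timeouts and order constraints are left out for brevity.

```
# Wrap the PSQL primitive in a master/slave set: one master, one instance per node.
# notify=true is what the working configuration uses for the set.
ms msPSQL PSQL \
    meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 target-role=Started notify=true
# Keep the virtual IP on whichever node currently holds the Master role.
colocation rsc_colocation-1 inf: VirtualIP msPSQL:Master
```

With only these lines added to the original configuration I would still expect the IP group to need adjusting, since master_ip has to match an address the cluster actually manages (10.0.0.100 in the working setup).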