I'm setting up PostgreSQL replication on two servers (CentOS 6.5), with HA provided by Corosync/Pacemaker.

My software info:

    postgresql91-9.1.19-1PGDG.rhel6.x86_64
    postgresql91-server-9.1.19-1PGDG.rhel6.x86_64
    postgresql91-libs-9.1.19-1PGDG.rhel6.x86_64
    postgresql91-contrib-9.1.19-1PGDG.rhel6.x86_64
    postgresql91-devel-9.1.19-1PGDG.rhel6.x86_64
    corosynclib-1.4.7-2.el6.x86_64
    corosync-1.4.7-2.el6.x86_64
    pacemaker-cli-1.1.12-8.el6_7.2.x86_64
    pacemaker-1.1.12-8.el6_7.2.x86_64
    pacemaker-cluster-libs-1.1.12-8.el6_7.2.x86_64
    pacemaker-libs-1.1.12-8.el6_7.2.x86_64
    resource-agents-3.9.5-24.el6_7.1.x86_64

Replication is working; from the master I can see the slave server connected:

    -bash-4.1$ psql -c "select client_addr,sync_state from pg_stat_replication;"
     client_addr | sync_state
    -------------+------------
     172.16.1.10 | async
    (1 row)

I have also confirmed that data created on the master is replicated to the slave.

Here is my crm configure show:

    node master
    node slave
    primitive PSQL pgsql \
        params restart_on_promote=true pgctl="/usr/pgsql-9.1/bin/pg_ctl" psql="/usr/pgsql-9.1/bin/psql" pgdata="/var/lib/pgsql/9.1/data" node_list="master slave" repuser=rep rep_mode=sync restore_command="cp /var/lib/pgsql/pg_archive/%f %p" primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" master_ip=172.16.1.100 archive_cleanup_command="/usr/pgsql-9.1/bin/pg_archivecleanup /var/lib/pgsql/pg_archive/ %r"
    primitive RepIP IPaddr2 \
        params ip=172.16.1.100 nic=eth2 cidr_netmask=24 \
        op monitor interval=30s
    primitive VirtualIP IPaddr2 \
        params ip=10.0.0.100 cidr_netmask=24 \
        op monitor interval=30s
    group psql-ha VirtualIP RepIP \
        meta target-role=Started
    property cib-bootstrap-options: \
        dc-version=1.1.11-97629de \
        cluster-infrastructure="classic openais (with plugin)" \
        expected-quorum-votes=2 \
        stonith-enabled=false \
        no-quorum-policy=ignore

But the PSQL resource cannot start. My crm status:

    Last updated: Sat Nov 28 13:09:47 2015
    Last change: Sat Nov 28 12:50:21 2015
    Stack: classic openais (with plugin)
    Current DC: master - partition with quorum
    Version: 1.1.11-97629de
    2 Nodes configured, 2 expected votes
    3 Resources configured

    Online: [ master slave ]

     Resource Group: psql-ha
         VirtualIP (ocf::heartbeat:IPaddr2): Started master
         RepIP (ocf::heartbeat:IPaddr2): Started master

    Failed actions:
        PSQL_start_0 on slave 'not configured' (6): call=60, status=complete, last-rc-change='Sat Nov 28 12:50:21 2015', queued=0ms, exec=53ms

There is an error log in /var/log/messages:

Nov 28 12:50:21 slave pgsql(PSQL)[3387]: ERROR: Replication(rep_mode=async or sync) requires Master/Slave configuration.

Could anyone explain to me why I get that error?

Thanks.

UPDATED:

(name of hosts changed to node1/node2)

Problem solved with @gf_'s configuration.

Note: Ignore my old configuration; I'm using only one virtual IP in this deployment model.

Current status:

    [root@node1 ~]# crm_mon -Af -1
    Last updated: Wed Dec 2 05:13:56 2015
    Last change: Wed Dec 2 05:10:06 2015
    Stack: classic openais (with plugin)
    Current DC: node2 - partition with quorum
    Version: 1.1.11-97629de
    2 Nodes configured, 2 expected votes
    3 Resources configured

    Online: [ node1 node2 ]

    VirtualIP (ocf::heartbeat:IPaddr2): Started node2
     Master/Slave Set: msPSQL [PSQL]
         Masters: [ node2 ]
         Slaves: [ node1 ]

    Node Attributes:
    * Node node1:
        + PSQL-data-status : STREAMING|SYNC
        + PSQL-status : HS:sync
        + master-PSQL : 100
    * Node node2:
        + PSQL-data-status : LATEST
        + PSQL-master-baseline : 000000000E000078
        + PSQL-status : PRI
        + master-PSQL : 1000

    Migration summary:
    * Node node1:
    * Node node2:

Working configuration:

    node node1 \
        attributes PSQL-data-status="STREAMING|SYNC"
    node node2 \
        attributes PSQL-data-status=LATEST
    primitive PSQL pgsql \
        params restart_on_promote=false pgctl="/usr/pgsql-9.1/bin/pg_ctl" psql="/usr/pgsql-9.1/bin/psql" pgdata="/var/lib/pgsql/9.1/data" node_list="node1 node2" repuser=replicate rep_mode=sync restore_command="cp /var/lib/pgsql/pg_archive/%f %p" primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" master_ip=10.0.0.100 archive_cleanup_command="/usr/pgsql-9.1/bin/pg_archivecleanup /var/lib/pgsql/pg_archive/ %r" \
        op start timeout=60s interval=0s on-fail=restart \
        op monitor timeout=60s interval=4s on-fail=restart \
        op monitor timeout=60s interval=3s on-fail=restart role=Master \
        op promote timeout=60s interval=0s on-fail=restart \
        op demote timeout=60s interval=0s on-fail=stop \
        op stop timeout=60s interval=0s on-fail=block \
        op notify timeout=60s interval=0s
    primitive VirtualIP IPaddr2 \
        params ip=10.0.0.100 nic=eth1 cidr_netmask=24 \
        op monitor interval=30s
    ms msPSQL PSQL \
        meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 target-role=Started notify=true
    colocation rsc_colocation-1 inf: VirtualIP msPSQL:Master
    order rsc_order-1 0: msPSQL:promote VirtualIP:start symmetrical=false
    order rsc_order-2 0: msPSQL:promote VirtualIP:stop symmetrical=false
    property cib-bootstrap-options: \
        dc-version=1.1.11-97629de \
        cluster-infrastructure="classic openais (with plugin)" \
        expected-quorum-votes=2 \
        no-quorum-policy=ignore \
        stonith-enabled=false \
        last-lrm-refresh=1449033003
    rsc_defaults rsc-options: \
        resource-stickiness=100

  • Glad it helped. Another hint: Remember to set up fencing. Commented Dec 2, 2015 at 22:01

1 Answer

  • PSQL should run on both of your nodes, master and slave, at the same time. (Just a small note: I'm not sure these terms are good choices for node names in your setup.)

  • So, you have to reflect this in your configuration. The error you got is quite clear and describes what's missing: you have to configure PSQL as a cloned (runs on multiple nodes at the same time), multi-state (runs in a master/slave setup) resource. If you're not sure what this is about, now would be a good time to look into the docs, especially "Clones - Resources That Get Active on Multiple Hosts" and "Multi-state - Resources That Have Multiple Modes".

  • So, your extended configuration could look like this:

    ms msPSQL PSQL \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"

  • Additionally, you have to specify on which of your nodes VirtualIP and RepIP should run, and you have to make sure that the resources are stopped and started in the correct order:

    colocation rsc_colocation-1 inf: psql-ha msPSQL:Master
    order rsc_order-1 0: msPSQL:promote psql-ha:start symmetrical=false
    order rsc_order-2 0: msPSQL:demote psql-ha:stop symmetrical=false

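To illustrate why a plain primitive with rep_mode=sync ends in 'not configured' (6), here is a rough shell sketch of that kind of check. It is not the pgsql resource agent's actual code: the function name check_replication_config is made up for this example, and passing master-max as a plain argument is an assumption about how the agent detects a master/slave resource. Only the error text and exit code 6 come from the log and status output in the question.

```shell
#!/bin/sh
# Simplified illustration only; NOT the pgsql resource agent's real code.
# check_replication_config is a hypothetical helper name.

OCF_SUCCESS=0
OCF_ERR_CONFIGURED=6   # shown by crm status as 'not configured' (6)

# $1: rep_mode (none, async or sync)
# $2: master-max of the enclosing ms resource; empty for a plain primitive
#     (assumption: this is how a master/slave resource is recognized)
check_replication_config() {
    rep_mode=$1
    master_max=$2

    case "$rep_mode" in
        async|sync)
            # Replication is only accepted for a master/slave (ms) resource.
            if [ -z "$master_max" ] || [ "$master_max" -le 0 ]; then
                echo "ERROR: Replication(rep_mode=async or sync) requires Master/Slave configuration." >&2
                return "$OCF_ERR_CONFIGURED"
            fi
            ;;
    esac
    return "$OCF_SUCCESS"
}

# Plain primitive, as in the question: the check fails with exit code 6.
check_replication_config sync "" || echo "exit code: $?"

# Wrapped in "ms msPSQL PSQL meta master-max=1 ...": the check passes.
check_replication_config sync 1 && echo "ok"
```

The first call corresponds to the original configuration (PSQL defined only as a primitive), the second to the same primitive wrapped in the ms resource shown above.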
  • Thank you @gf_, it works now. I had some trouble with replication and also had to clean up the resource (crm_resource --cleanup --resource msPSQL). Your configuration is totally correct (: I have also updated my configuration and current status in the question. Commented Dec 2, 2015 at 5:18