PostgreSQL Replication Tutorial

PostgreSQL: Understanding replication Hans-Jürgen Schönig www.postgresql-support.de Hans-Jürgen Schönig www.postgresql-support.de

Welcome to PostgreSQL replication Hans-J¨urgen Sch¨onig www.postgresql-support.de

What you will learn How PostgreSQL writes data What the transaction log does How to set up streaming replication How to handle Point-In-Time-Recovery Managing conflicts Monitoring replication More advanced techniques Hans-Jürgen Schönig www.postgresql-support.de

How PostgreSQL writes data Hans-J¨urgen Sch¨onig www.postgresql-support.de

Writing a row of data Understanding how PostgreSQL writes data is key to understanding replication Vital to understand PITR A lot of potential to tune the system Hans-J¨urgen Sch¨onig www.postgresql-support.de

Write the log first (1) It is not possible to send data to a data table directly. What if the system crashes during a write? A data file could end up with broken data at potentially unknown positions Corruption is not an option Hans-Jürgen Schönig www.postgresql-support.de

Write the log first (2) Data goes to the xlog (= WAL) first WAL is short for “Write Ahead Log” IMPORTANT: The xlog DOES NOT contain SQL It contains BINARY changes Hans-Jürgen Schönig www.postgresql-support.de

The xlog The xlog consists of a set of 16 MB files The xlog consists of various types of records (heap changes, btree changes, etc.) It has to be flushed to disk on commit to achieve durability Hans-Jürgen Schönig www.postgresql-support.de

Expert tip: Debugging the xlog Change WAL DEBUG in src/include/pg config manual.h Recompile PostgreSQL NOTE: This is not for normal use but just for training purposes Hans-Jürgen Schönig www.postgresql-support.de

Enabling wal debug test=# SET wal_debug TO on; SET test=# SET client_min_messages TO debug; SET Hans-J¨urgen Sch¨onig www.postgresql-support.de

Observing changes Every change will go to the screen now It helps to understand how PostgreSQL works Apart from debugging: The practical purpose is limited Hans-J¨urgen Sch¨onig www.postgresql-support.de

Making changes Data goes to the xlog first Then data is put into shared buffers At some later point data is written to the data files This does not happen instantly leaving a lot of room for optimization and tuning Hans-Jürgen Schönig www.postgresql-support.de

A consistent view of the data Data is not sent to those data files instantly. Still: End users will have a consistent view of the data When a query comes in, it checks the I/O cache (= shared buffers) and asks the data files only in case of a cache miss. xlog is about the physical not about the logical level Hans-Jürgen Schönig www.postgresql-support.de

Sustaining writes We cannot write to the xlog forever without recycling it. The xlog is recycled during a so called “checkpoint”. Before the xlog can be recycled, data must be stored safely in those data files Checkpoints have a huge impact on performance Hans-Jürgen Schönig www.postgresql-support.de

Checkpoint parameters: checkpoint timeout = 15min max wal size = 5GB min wal size = 800MB checkpoint completion target = 0.5 checkpoint warning = 30s Hans-J¨urgen Sch¨onig www.postgresql-support.de

Checkpointing to frequently Checkpointing is expensive PostgreSQL warns about too frequent checkpoints This is what checkpoint warning is good for Hans-J¨urgen Sch¨onig www.postgresql-support.de

min wal size and max wal size (1) This is a replacement for checkpoint segments Now the xlog size is auto-tuned The new configuration was introduced in PostgreSQL 9.5 Hans-Jürgen Schönig www.postgresql-support.de

min wal size and max wal size (2) Instead of having a single knob (checkpoint segments) that both triggers checkpoints, and determines how many checkpoints to recycle, they are now separate concerns. There is still an internal variable called CheckpointSegments, which triggers checkpoints. But it no longer determines how many segments to recycle at a checkpoint. That is now auto-tuned by keeping a moving average of the distance between checkpoints (in bytes), and trying to keep that many segments in reserve. Hans-J¨urgen Sch¨onig www.postgresql-support.de

min wal size and max wal size (3) The advantage of this is that you can set max wal size very high, but the system won’t actually consume that much space if there isn’t any need for it. The min wal size sets a floor for that; you can effectively disable the auto-tuning behavior by setting min wal size equal to max wal size. Hans-Jürgen Schönig www.postgresql-support.de

How does it impact replication The xlog has all the changes needed and can therefore be used for replication. Copying data files is not enough to achieve a consistent view of the data It has some implications related to base backups Hans-Jürgen Schönig www.postgresql-support.de

Setting up streaming replication Hans-J¨urgen Sch¨onig www.postgresql-support.de

The basic process S: Install PostgreSQL on the slave (no initdb) M: Adapt postgresql.conf M: Adapt pg hba.conf M: Restart PostgreSQL S: Pull a base backup S: Start the slave Hans-J¨urgen Sch¨onig www.postgresql-support.de

Changing postgresql.conf wal level: Ensure that there is enough xlog generated by the master (recovering a server needs more xlog than just simple crash-safety) max wal senders: When a slave is streaming, connects to the master and fetches xlog. A base backup will also need 1 / 2 wal senders hot standby: This is not needed because it is ignored on the master but it saves some work on the slave later on Hans-J¨urgen Sch¨onig www.postgresql-support.de

Changing pg hba.conf Rules for replication have to be added. Note that “all” databases does not include replication A separate rule has to be added, which explicitly states “replication” in the second column Replication rules work just like any other pg hba.conf rule Remember: The first line matching rules Hans-Jürgen Schönig www.postgresql-support.de

Restarting PostgreSQL To activate those settings in postgresql.conf the master has to be restarted. If only pg hba.conf is changed, a simple SIGHUP (pg ctl reload) is enough. Hans-J¨urgen Sch¨onig www.postgresql-support.de

Using pg basebackup (1) pg basebackup will fetch a copy of the data from the master While pg basebackup is running, the master is fully operational (no downtime needed) pg basebackup connects through a database connection and copies all data files as they are In most cases this does not create a consistent backup The xlog is needed to “repair” the base backup (this is exactly what happens during xlog replay anyway) Hans-Jürgen Schönig www.postgresql-support.de

Using pg basebackup (2) pg_basebackup -h master.com -D /slave --xlog-method=stream --checkpoint=fast -R Hans-J¨urgen Sch¨onig www.postgresql-support.de

xlog-method: Self-contained backups By default a base backup is not self-contained. The database does not start up without additional xlog. This is fine for Point-In-Time-Recovery because there is an archive around. For streaming it can be a problem. –xlog-method=stream opens a second connection to fetch xlog during the base backup Hans-Jürgen Schönig www.postgresql-support.de

checkpoint=fast: Instant backups By default pg basebackup starts as soon as the master checkpoints. This can take a while. –checkpoint=fast makes the master check instantly. In case of a small backup an instant checkpoint speeds things up. Hans-J¨urgen Sch¨onig www.postgresql-support.de

-R: Generating a config file For a simple streaming setup all PostgreSQL has to know is already passed to pg basebackup (host, port, etc.). -R automatically generates a recovery.conf file, which is quite ok in most cases. Hans-Jürgen Schönig www.postgresql-support.de

Backup throttling –max-rate=RATE: maximum transfer rate to transfer data directory (in kB/s, or use suffix “k” or “M”) If your master is weak a pg basebackup running at full speed can lead to high response times and disk wait. Slowing down the backup can help to make sure the master stays responsive. Hans-Jürgen Schönig www.postgresql-support.de

Adjusting recovery.conf A basic setup needs: primary conninfo: A connect string pointing to the master server standby mode = on: Tells the system to stream instantly Additional configuration parameters are available Hans-Jürgen Schönig www.postgresql-support.de

Starting up the slave Make sure the slave has connected to the master Make sure it has reached a consistent state Check for wal sender and wal receiver processes Hans-J¨urgen Sch¨onig www.postgresql-support.de

Promoting a slave to master Promoting a slave to a master is easy: pg_ctl -D ... promote After promotion recovery.conf will be renamed to recovery.done Hans-J¨urgen Sch¨onig www.postgresql-support.de

One word about security So far replication has been done as superuser This is not necessary Creating a user, which can do just replication makes sense CREATE ROLE foo ... REPLICATION ... NOSUPERUSER; Hans-J¨urgen Sch¨onig www.postgresql-support.de

Monitoring replication Hans-J¨urgen Sch¨onig www.postgresql-support.de

Simple checks The most basic and most simplistic check is to check for wal sender (on the master) wal receiver (on the slave) Without those processes the party is over Hans-J¨urgen Sch¨onig www.postgresql-support.de

More detailed analysis pg stat replication contains a lot of information Make sure an entry for each slave is there Check for replication lag Hans-J¨urgen Sch¨onig www.postgresql-support.de

Checking for replication lag A sustained lag is not a good idea. The distance between the sender and the receiver can be measured in bytes SELECT client_addr, pg_xlog_location_diff(pg_current_xlog_location(), sent_location) FROM pg_stat_replication; In asynchronous replication the replication lag can vary dramatically (for example during CREATE INDEX, etc.) Hans-J¨urgen Sch¨onig www.postgresql-support.de

Creating large clusters Hans-J¨urgen Sch¨onig www.postgresql-support.de

Handling more than 2 nodes A simple 2 node cluster is easy. In case of more than 2 servers, life is a bit harder. If you have two slaves and the master fails: Who is going to be the new master? Unless you want to resync all your data, you should better elect the server containing most of the data already Comparing xlog positions is necessary Hans-J¨urgen Sch¨onig www.postgresql-support.de

Timeline issues When a slave is promoted the timeline ID is incremented Master and slave have to be in the same timeline In case of two servers it is important to connect one server to the second one first and do the promotion AFTERWARDS. This ensures that the timeline switch is already replicated from the new master to the surviving slave. Hans-Jürgen Schönig www.postgresql-support.de

Cascading slaves Slaves can be connected to slaves Cascading can make sense to reduce bandwidth requirements Cascading can take load from the master Use pg basebackup to fetch data from a slave as if it was a master Hans-J¨urgen Sch¨onig www.postgresql-support.de

Conflicts Hans-Jürgen Schönig www.postgresql-support.de

How conflicts happen During replication conflicts can happen Example: The master might want to remove a row still visible to a reading transaction on the slave Hans-Jürgen Schönig www.postgresql-support.de

What happens during a conflict PostgreSQL will terminate a database connection after some time max standby archive delay = 30s max standby streaming delay = 30s Those settings define the maximum time the slave waits during replay before replay is resumed. In rare cases a connection might be aborted quite soon. Hans-Jürgen Schönig www.postgresql-support.de

Reducing conflicts Conflicts can be reduced nicely by setting hot standby feedback. The slave will send its oldest transaction ID to tell the master that cleanup has to be deferred. Hans-Jürgen Schönig www.postgresql-support.de

Making replication more reliable Hans-J¨urgen Sch¨onig www.postgresql-support.de

What happens if a slave reboots? If a slave is gone for too long, the master might recycle its transaction log The slave needs a full history of the xlog Setting wal keep segments on the master helps to prevent the master from recycling transaction log too early I recommend to always use wal keep segments to make sure that a slave can be started after a pg basebackup Hans-J¨urgen Sch¨onig www.postgresql-support.de

Making use of replication slots Replication slots have been added in PostgreSQL 9.4 There are two types of replication slots: Physical replication slots (for streaming) Logical replication slots (for logical decoding) Hans-J¨urgen Sch¨onig www.postgresql-support.de

Configuring replication slots Change max replication slots and restart the master Run . . . test=# SELECT * FROM pg_create_physical_replication_slot(’some_name’); slot_name | xlog_position -----------+--------------- some_name | (1 row) Hans-Jürgen Schönig www.postgresql-support.de

Tweaking the slave Add this replication slot to primary slot name on the slave: primary_slot_name = ’some_name’ The master will ensure that xlog is only recycled when it has been consumed by the slave. Hans-J¨urgen Sch¨onig www.postgresql-support.de

A word of caution If a slave is removed make sure the replication slot is dropped. Otherwise the master might run out of disk space. NEVER use replication slots without monitoring the size of the xlog on the sender. Hans-J¨urgen Sch¨onig www.postgresql-support.de

Key advantages of replication slots The difference between master and slave can be arbitrary. During bulk load or CREATE INDEX this can be essential. It can help to overcome the problems caused by slow networks. It can help to avoid resyncs. Hans-Jürgen Schönig www.postgresql-support.de

Moving to synchronous replication Hans-J¨urgen Sch¨onig www.postgresql-support.de

Synchronous vs. asynchronous Asynchronous replication: Commits on the slave can happen long after the commit on the master. Synchronous replication: A transaction has to be written to a second server. Synchronous replication potentially adds some network latency to the scenery Hans-J¨urgen Sch¨onig www.postgresql-support.de

The application name During normal operations the application name setting can be used to assign a name to a database connection. In case of synchronous replication this variable is used to determine synchronous candidates. Hans-J¨urgen Sch¨onig www.postgresql-support.de

Configuring synchronous replication: Master: add names to synchronous standby names Slave: add an application name to your connect string in primary conninfo Hans-Jürgen Schönig www.postgresql-support.de

Fail safety Synchronous replication needs 2 active servers If no two servers are left, replication will wait until a second server is available. Use AT LEAST 3 servers for synchronous replication to avoid risk. Hans-J¨urgen Sch¨onig www.postgresql-support.de

Point-In-Time-Recovery Hans-J¨urgen Sch¨onig www.postgresql-support.de

What it does PITR can be used to reach (almost) any point after a base backup. It is more of a backup strategy than a replication thing. Replication and PITR can be combined. Hans-J¨urgen Sch¨onig www.postgresql-support.de

Configuring for PITR S: create an archive (ideally this is not on the master) M: Change postgresql.conf set wal level set max wal senders (if pg basebackup is desired) set archive mode to on set a proper archive command to archive xlog M: adapt pg hba.conf (if pg basebackup is desired) M: restart the master Hans-Jürgen Schönig www.postgresql-support.de

pg basebackup, etc. Perform a pg basebackup as performed before –xlog-method=stream and -R are not needed In the archive a .backup file will be available after pg basebackup You can delete all xlog files older than the oldest base backup you want to keep. The .backup file will guide you Hans-Jürgen Schönig www.postgresql-support.de

Restoring from a crash Take a base backup. Write a recovery.conf file: restore command: Tell PostgreSQL where to find xlog recovery target time (optional): Use a timestamp to tell the system how far to recover Start the server Make sure the system has reached consistent state Hans-Jürgen Schönig www.postgresql-support.de

More config options Hans-Jürgen Schönig www.postgresql-support.de

recovery min apply delay: Delayed replay This settings allows you to tell the slave that a certain delay is desired. Example: A stock broker might want to provide you with 15 minute old data Hans-J¨urgen Sch¨onig www.postgresql-support.de

pause at recovery target Make sure that the recovery does not stop at a specified point in time. Make PostgreSQL wait when a certain point is reached. This is essential in case you do not know precisely how far to recover Hans-Jürgen Schönig www.postgresql-support.de

recovery target name Sometimes you want to recover to a certain point in time, which has been specified before. To specify a point in time run . . . SELECT pg_create_restore_point(’some_name’); Use this name in recovery.conf to recover to this very specific point Hans-Jürgen Schönig www.postgresql-support.de

Finally . . . Hans-J¨urgen Sch¨onig www.postgresql-support.de

Contact us . . . Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Austria More than 15 years of PostgreSQL experience: Training Consulting 24x7 support Hans-Jürgen Schönig www.postgresql-support.de

PostgreSQL Replication Tutorial

More Related Content

What's hot

Viewers also liked

Similar to PostgreSQL Replication Tutorial

Recently uploaded

In this document

PostgreSQL Replication Tutorial