© 2023 NTT DATA Group Corporation PostgreSQL on Kubernetes: Realizing High Availability with PGO Aug. 30th, 2023 at Postgres Ibiza 2023 Shinya Kato NTT DATA Group Corporation
© 2023 NTT DATA Group Corporation 2 About me • Shinya Kato @ShinyaKato_ • NTT DATA Group Corporation • Jobs • Research & development on PostgreSQL • PostgreSQL support • Occasional development in the PostgreSQL community
© 2023 NTT DATA Group Corporation 3 About this talk • Presentation slides are available on SlideShare • https://www.slideshare.net/nttdata-tech • The slides were prepared on Aug. 10th, 2023 and may not be up to date
© 2023 NTT DATA Group Corporation 4 Topics PostgreSQL Operator Basic features of PGO Configuration about high availability Backup/Restore Disaster recovery Encountered challenges Summary
© 2023 NTT DATA Group Corporation 5 01 PostgreSQL Operator
© 2023 NTT DATA Group Corporation 6 Tasks of DBA Setting up HA cluster Monitoring metrics and logs Managing backups
© 2023 NTT DATA Group Corporation 7 What is PostgreSQL Operator? • A PostgreSQL Operator makes it easy to realize such a PostgreSQL environment on Kubernetes • Features vary depending on the PostgreSQL Operator Backup Metrics Logging PostgreSQL HA PostgreSQL PostgreSQL HA Setup/ Monitoring User
© 2023 NTT DATA Group Corporation 8 Popular PostgreSQL Operators
• PGO: main development company Crunchy Data, latest version 5.4.1, Apache 2.0 license, 3,372 GitHub stars, implemented in Go
• postgres-operator: main development company Zalando, latest version 1.10.0, MIT license, 3,469 GitHub stars, implemented in Go
• CloudNativePG: main development company EnterpriseDB, latest version 1.20.2, Apache 2.0 license, 1,650 GitHub stars, implemented in Go
• StackGres: main development company OnGres, latest version 1.5.0, AGPLv3 license, 636 GitHub stars / 90 GitLab stars, implemented in Java
© 2023 NTT DATA Group Corporation 9 02 Basic features of PGO
© 2023 NTT DATA Group Corporation 10 PGO • Developed and provided as OSS by Crunchy Data • Several PostgreSQL core developers belong • One of the most popular PostgreSQL Operators • 3,372 stars on GitHub • Supports PostgreSQL 11 to 15
© 2023 NTT DATA Group Corporation 11 Architecture • PGO consists of multiple software Client PgBouncer PVC AWS GCP Azure Connection pooling Monitoring postgres- exporter Patroni pgBackRest postgres- exporter Patroni pgBackRest postgres- exporter Patroni pgBackRest Backup HA
© 2023 NTT DATA Group Corporation 12 Architecture • PGO consists of multiple software Client PgBouncer PVC AWS GCP Azure Connection pooling Monitoring postgres- exporter Patroni pgBackRest postgres- exporter Patroni pgBackRest postgres- exporter Patroni pgBackRest Backup HA
© 2023 NTT DATA Group Corporation 13 Version • Version used in this talk • Kubernetes 1.26 • PGO 5.4.0
© 2023 NTT DATA Group Corporation 14 Install PGO • Use examples from Crunchy Data GitHub • https://github.com/CrunchyData/postgres-operator-examples
$ git clone https://github.com/CrunchyData/postgres-operator-examples.git
$ cd postgres-operator-examples
$ git checkout 9a3b808   # PGO 5.4.0 commit hash
$ kubectl apply -k kustomize/install/namespace
$ kubectl apply --server-side -k kustomize/install/default
$ kubectl get pods -n postgres-operator
NAME                   READY   STATUS    RESTARTS   AGE
pgo-7574d677f7-jbpl5   1/1     Running   0          15s
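As an extra check, not shown on the slide, one can confirm that the PostgresCluster CRD installed by PGO 5.x is registered before creating clusters:
$ kubectl get crd postgresclusters.postgres-operator.crunchydata.com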
© 2023 NTT DATA Group Corporation 15 Create PostgreSQL Cluster • Use examples from Crunchy Data GitHub
$ kubectl apply --server-side -k kustomize/postgres
$ kubectl get postgrescluster -n postgres-operator
NAME    AGE
hippo   71s
$ kubectl get pods -n postgres-operator -l postgres-operator.crunchydata.com/instance-set=instance1
NAME                     READY   STATUS    RESTARTS   AGE
hippo-instance1-pdcg-0   4/4     Running   0          75s
PostgreSQL cluster named “hippo” is created
© 2023 NTT DATA Group Corporation 16 Change the Number of Replicas • Set the number of replicas in the manifest
spec:
  image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi8-15.3-2
  postgresVersion: 15
  instances:
    - name: instance1
      replicas: 3
      dataVolumeClaimSpec:
© 2023 NTT DATA Group Corporation 17 Change the Number of Replicas • Apply the manifest
$ kubectl apply --server-side -k kustomize/postgres
$ kubectl get pods -n postgres-operator -l postgres-operator.crunchydata.com/instance-set=instance1
NAME                     READY   STATUS    RESTARTS   AGE
hippo-instance1-pdcg-0   4/4     Running   0          80s
hippo-instance1-6rqk-0   4/4     Running   0          18s
hippo-instance1-l5sk-0   4/4     Running   0          18s
3 PostgreSQL Pods started
© 2023 NTT DATA Group Corporation 18 Failover • Automatically restarts a Pod when it goes down DEMO
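A minimal sketch of how this failover demo could be reproduced, assuming the PGO 5.x role label (postgres-operator.crunchydata.com/role=master) marks the current primary Pod:
# Delete the Pod currently acting as primary
$ kubectl delete pod -n postgres-operator -l postgres-operator.crunchydata.com/role=master
# Watch the deleted Pod being recreated and a replica being promoted to primary
$ kubectl get pods -n postgres-operator -l postgres-operator.crunchydata.com/instance-set=instance1 -w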
© 2023 NTT DATA Group Corporation 19 Rolling Update • Change to new image tag and apply DEMO
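A minimal sketch of the rolling update demo, assuming a newer crunchy-postgres image tag is available (the tag below is only illustrative): change spec.image in the manifest, re-apply, and PGO replaces the Pods one at a time.
spec:
  image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi8-15.4-0   # illustrative tag
$ kubectl apply --server-side -k kustomize/postgres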
© 2023 NTT DATA Group Corporation 20 03 Configuration about high availability
© 2023 NTT DATA Group Corporation 21 Patroni • Software for high availability PostgreSQL DCS DCS DCS Get status GET /leader Primary Get status Standby Get status Standby
© 2023 NTT DATA Group Corporation 22 Patroni • Software for high availability PostgreSQL DCS DCS DCS Get status GET /leader Primary Get status Standby Get status Standby TABLE pg_stat_replication SELECT pg_current_wal_lsn() SELECT pg_last_wal_replay_lsn() SHOW synchronous_commit SELECT pg_stat_get_wal_receiver() SELECT pg_last_wal_receive_lsn() SELECT pg_is_wal_replay_paused()
© 2023 NTT DATA Group Corporation 23 Patroni • Software for high availability PostgreSQL DCS DCS DCS Get status GET /leader Primary Get status Standby Get status Standby Stored - Leader - Members - Config
© 2023 NTT DATA Group Corporation 24 Patroni Configuration • PGO uses Patroni to realize high availability • Patroni configuration parameters can be set in the manifest • synchronous_mode • loop_wait • ttl • master_start_timeout • etc.
© 2023 NTT DATA Group Corporation 25 synchronous_mode • By default, Patroni uses asynchronous replication • If synchronous_mode is on, one standby uses synchronous replication • 2 nodes: primary and sync standby • 3 nodes: primary, sync standby, and async standby Primary Sync standby Primary Sync standby Async standby Streaming Replication Streaming Replication
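The resulting replication mode can be checked with a standard catalog query on the primary; with synchronous_mode on and 3 nodes, one standby should report sync and the other async:
-- Run on the primary
SELECT application_name, state, sync_state FROM pg_stat_replication;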
© 2023 NTT DATA Group Corporation 26 loop_wait and ttl • loop_wait: The interval between status and DCS checks • ttl: The TTL to acquire the leader lock DCS DCS DCS Get status GET /leader Primary Get status Standby Get status Standby Default is to check every 10 seconds. By default, the leader lock expires in 30 seconds.
© 2023 NTT DATA Group Corporation 27 loop_wait and ttl • loop_wait: The interval between status and DCS checks • ttl: The TTL to acquire the leader lock DCS DCS DCS Get status GET /leader Primary Get status Standby Get status Standby crash Leader is not updated If the leader hasn’t been updated after the TTL has passed, the leader lock expires
© 2023 NTT DATA Group Corporation 28 loop_wait and ttl • loop_wait: The interval between status and DCS checks • ttl: The TTL to acquire the leader lock DCS DCS DCS Get status GET /leader Primary Get status Standby Get status Standby crash New leader has been elected
© 2023 NTT DATA Group Corporation 29 master_start_timeout • The time allowed for primary recovery before failover DCS DCS DCS Get status GET /leader Primary Get status Standby Get status Standby Default is to wait 300 seconds for recovery. In the worst case, failover takes master_start_timeout + 2 * loop_wait seconds
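For example, with the defaults (master_start_timeout = 300 and loop_wait = 10), the worst case is roughly 300 + 2 * 10 = 320 seconds before failover; setting master_start_timeout to 0, as in the manifest on the next slide, reduces the worst case to about 2 * 10 = 20 seconds, failing over immediately instead of waiting for the primary to recover.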
© 2023 NTT DATA Group Corporation 30 Setting in the manifest • My settings
spec:
  patroni:
    dynamicConfiguration:
      synchronous_mode: true       # Default is false
      master_start_timeout: 0      # Default is 300
    leaderLeaseDurationSeconds: 30 # Default value
    syncPeriodSeconds: 10          # Default value
Note that in PGO, Patroni’s loop_wait and ttl are set via syncPeriodSeconds and leaderLeaseDurationSeconds, respectively.
© 2023 NTT DATA Group Corporation 31 Kubernetes Configuration • Distribute PostgreSQL Pods to the appropriate worker nodes using two functions • Node Affinity • Topology Spread Constraints
© 2023 NTT DATA Group Corporation 32 Node Affinity • Assign PostgreSQL Pods to specific worker nodes • High-performance SSD • High-speed network Worker for PostgreSQL Worker for application servers Assign Pods
© 2023 NTT DATA Group Corporation 33 Node Affinity • Labeling worker nodes
$ kubectl label nodes kind-worker diskType=ssd
$ kubectl label nodes kind-worker2 diskType=ssd
$ kubectl label nodes kind-worker3 diskType=ssd
$ kubectl label nodes kind-worker4 diskType=hdd
$ kubectl get nodes --show-labels
NAME                 STATUS   LABELS
kind-control-plane   Ready    ...
kind-worker          Ready    diskType=ssd,...
kind-worker2         Ready    diskType=ssd,...
kind-worker3         Ready    diskType=ssd,...
kind-worker4         Ready    diskType=hdd,...
© 2023 NTT DATA Group Corporation 34 Node Affinity • Configure in the manifest
spec:
  instances:
    - affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: diskType
                    operator: In
                    values:
                      - ssd
Assign to a worker with diskType=ssd
© 2023 NTT DATA Group Corporation 35 Node Affinity • Apply the manifest
$ kubectl apply --server-side -k kustomize/postgres
$ kubectl get pods -n postgres-operator -l postgres-operator.crunchydata.com/instance-set=instance1 -o wide
NAME                     READY   STATUS    NODE
hippo-instance1-j4rd-0   4/4     Running   kind-worker
hippo-instance1-jqz5-0   4/4     Running   kind-worker
hippo-instance1-pslx-0   4/4     Running   kind-worker2
Not assigned to worker 4
© 2023 NTT DATA Group Corporation 36 Topology Spread Constraints • Pods can be distributed by region/zone/node • Ensures availability in case of region/zone/node failure Not distributed AvailabilityZone A Distributed AvailabilityZone A AvailabilityZone B Running in AvailabilityZone B
© 2023 NTT DATA Group Corporation 37 Topology Spread Constraints • Labeling nodes
$ kubectl label nodes kind-worker availabilityZone=A
$ kubectl label nodes kind-worker2 availabilityZone=B
$ kubectl label nodes kind-worker3 availabilityZone=C
$ kubectl get nodes --show-labels
NAME                 STATUS   LABELS
kind-control-plane   Ready    ...
kind-worker          Ready    availabilityZone=A,...
kind-worker2         Ready    availabilityZone=B,...
kind-worker3         Ready    availabilityZone=C,...
© 2023 NTT DATA Group Corporation 38 Topology Spread Constraints • Configure in the manifest
spec:
  instances:
    - topologySpreadConstraints:
        - maxSkew: 1                          # Distributed in availabilityZone
          topologyKey: availabilityZone
          whenUnsatisfiable: DoNotSchedule    # Not assigned if not satisfiable
          labelSelector:
            matchLabels:                      # Specify the label of Pods to be assigned
              postgres-operator.crunchydata.com/instance-set: instance
              postgres-operator.crunchydata.com/cluster: hippo
© 2023 NTT DATA Group Corporation 39 Topology Spread Constraints • Apply the manifest
$ kubectl apply --server-side -k kustomize/postgres
$ kubectl get pods -n postgres-operator -l postgres-operator.crunchydata.com/instance-set=instance1 -o wide
NAME                     READY   STATUS    NODE
hippo-instance1-bzkw-0   4/4     Running   kind-worker3
hippo-instance1-cmkl-0   4/4     Running   kind-worker
hippo-instance1-m82d-0   4/4     Running   kind-worker2
Distributed to worker 1, 2, and 3
© 2023 NTT DATA Group Corporation 40 04 Backup/Restore
© 2023 NTT DATA Group Corporation 41 pgBackRest • Multi-functional backup tool for PostgreSQL pgBackRest Backup Server pgBackRest Backups WAL archives Backups WAL archives Backup PostgreSQL #1 pgBackRest PostgreSQL #2 PostgreSQL #1 PostgreSQL #2
© 2023 NTT DATA Group Corporation 42 pgBackRest Configuration
backups:
  pgbackrest:
    image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.45-2
    repos:                         # Multiple repositories
      - name: repo1
        schedules:                 # Backup type / Backup scheduling
          full: "0 15 * * *"
        s3:
          bucket: pgo-auto-backups
          endpoint: s3.ap-northeast-1.amazonaws.com
          region: ap-northeast-1
      - name: repo2
        gcs:
          bucket: pgo-manual-backups
    configuration:                 # Secrets to store authentication for s3 and gcs
      - secret:
          name: pgo-s3-creds
      - secret:
          name: pgo-gcs-creds
    global:                        # Retention policy
      repo1-retention-full: "30"
    manual:                        # Manual backups
      repoName: repo2
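The two Secrets referenced above are not shown on the slide. A minimal sketch of how the S3 one could be created, assuming pgBackRest-style credentials in an s3.conf file (the option names follow pgBackRest's repo1-s3-* settings; the values are placeholders):
# s3.conf (placeholder credentials)
[global]
repo1-s3-key=<YOUR_AWS_ACCESS_KEY_ID>
repo1-s3-key-secret=<YOUR_AWS_SECRET_ACCESS_KEY>
$ kubectl create secret generic pgo-s3-creds -n postgres-operator --from-file=s3.conf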
© 2023 NTT DATA Group Corporation 43 Multiple Repositories • Backups can be stored on several (up to 4) different storage locations, such as Amazon S3 • pgBackRest starts up manually or at scheduled times
© 2023 NTT DATA Group Corporation 44 Backup type/Backup scheduling • 3 backup types • full • differential • incremental • Backup scheduling • cron-formatted string
backups:
  pgbackrest:
    repos:
      - name: repo1
        schedules:
          full: "0 1 * * 0"
          differential: "0 1 * * 1-6"
Weekly full backup at 1am on Sundays. Daily differential backups at 1am, except Sundays.
© 2023 NTT DATA Group Corporation 45 Retention policy • There are two different types of backup retention you can set
backups:
  pgbackrest:
    global:
      repo1-retention-full: "30"
      repo1-retention-full-type: count   # Default
Retain 30 full backups
backups:
  pgbackrest:
    global:
      repo1-retention-full: "30"
      repo1-retention-full-type: time
Retain full backups for 30 days
© 2023 NTT DATA Group Corporation 46 Manual backups • Configure in the manifest • Execute kubectl annotate command to trigger manual backup
backups:
  pgbackrest:
    manual:
      repoName: repo2
      options:
        - --type=full
$ kubectl annotate -n postgres-operator postgrescluster hippo \
    postgres-operator.crunchydata.com/pgbackrest-backup="$(date)"
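Two ways to check that the manual backup actually ran; the Job listing is generic, while the repo host Pod name below is an assumption based on PGO 5.x naming (<cluster>-repo-host-0):
# The manual backup runs as a Kubernetes Job
$ kubectl get jobs -n postgres-operator
# Backup details can also be inspected with pgBackRest on the repo host Pod
$ kubectl exec -it -n postgres-operator hippo-repo-host-0 -- pgbackrest info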
© 2023 NTT DATA Group Corporation 47 Manual backups DEMO
© 2023 NTT DATA Group Corporation 48 Restore • Clone a PostgreSQL cluster • Perform a point-in-time-recovery (PITR) • Perform an in-place point-in-time-recovery (PITR) • Restore individual databases
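A minimal sketch of the first two options combined: cloning a new cluster from hippo's backups with point-in-time recovery through spec.dataSource (field names follow the PGO 5.x PostgresCluster spec; the target timestamp is illustrative). Applying a new PostgresCluster manifest containing this block restores the backups and replays WAL up to the target time.
spec:
  dataSource:
    postgresCluster:
      clusterName: hippo    # clone from the existing cluster's pgBackRest repository
      repoName: repo1
      options:              # omit the options for a plain clone without PITR
        - --type=time
        - --target="2023-08-10 15:00:00+00"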
© 2023 NTT DATA Group Corporation 49 05 Disaster Recovery
© 2023 NTT DATA Group Corporation 50 Standby Cluster • Consists only of standbys on another Kubernetes cluster • Sync methods with Primary are repo-based and/or streaming Primary Standby Standby Standby AWS, Azure, GCP Kubernetes cluster #1 Kubernetes cluster #2 repo-based streaming spec: standby: enabled: true
© 2023 NTT DATA Group Corporation 51 Standby Cluster • Consists only of standbys on another Kubernetes cluster • Sync methods with Primary are repo-based and/or streaming Primary Standby Standby Primary Standby AWS, Azure, GCP Kubernetes cluster #1 Kubernetes cluster #2 repo-based streaming spec: standby: enabled: false
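A slightly fuller sketch of the standby cluster spec on Kubernetes cluster #2 (repoName, host, and port follow the PGO 5.x standby spec; the host value is a placeholder for an externally reachable primary service). Promotion is then just a matter of setting enabled to false and re-applying, as on this slide.
spec:
  standby:
    enabled: true
    repoName: repo1      # repo-based: replay WAL from the shared pgBackRest repository
    host: "192.0.2.10"   # streaming: placeholder address of the primary cluster
    port: 5432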
© 2023 NTT DATA Group Corporation 52 06 Encountered Challenges
© 2023 NTT DATA Group Corporation 53 Missing WAL Archive • Missing WAL because PGO always sets archive_mode=on • archive_mode=always cannot be set 01 Primary Standby WAL archive 02 03 04 05 06 07 06 07 WALs waiting for transfer Replication 06 07
© 2023 NTT DATA Group Corporation 54 Missing WAL Archive • Missing WAL because PGO sets archive_mode=on • archive_mode=always cannot be set 01 Primary New primary WAL archive 02 03 04 05 06 07 08 Failover 06 07 Missing WAL 08 Resume transfer from WAL 08
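The setting itself can be confirmed from inside any PostgreSQL Pod with standard commands; they only show the configuration, since the unarchived segments are simply absent from the archive:
-- archive_mode is 'on', not 'always', so standbys do not archive WAL
SHOW archive_mode;
-- on the old primary, pg_stat_archiver shows the last segment that was archived
SELECT archived_count, last_archived_wal, failed_count FROM pg_stat_archiver;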
© 2023 NTT DATA Group Corporation 55 Forced to Use pg_rewind • Patroni has a parameter use_pg_rewind • Controls how a standby is recovered after failover • PGO forces use_pg_rewind to on • Sometimes pg_rewind fails and the standby cannot be recovered • Manual handling is required
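A sketch of what the manual handling can look like, using Patroni's own CLI from inside the affected Pod; the Pod name, the 'database' container name, and the 'hippo-ha' Patroni scope are assumptions based on PGO 5.x defaults, and patronictl reinit rebuilds the standby's data directory from the current primary:
$ kubectl exec -it -n postgres-operator hippo-instance1-abcd-0 -c database -- patronictl list
$ kubectl exec -it -n postgres-operator hippo-instance1-abcd-0 -c database -- patronictl reinit hippo-ha hippo-instance1-abcd-0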
© 2023 NTT DATA Group Corporation 56 07 Summary
© 2023 NTT DATA Group Corporation 57 Summary • Introduced what a PostgreSQL Operator is • Explained how to realize high availability with PGO • Introduced the challenges we encountered
© 2023 NTT DATA Group Corporation 58 References • https://access.crunchydata.com/documentation/postgres-operator/latest • https://access.crunchydata.com/documentation/postgres-operator/latest/architecture/high-availability • https://access.crunchydata.com/documentation/postgres-operator/latest/architecture/backups • https://access.crunchydata.com/documentation/postgres-operator/latest/architecture/disaster-recovery • https://github.com/CrunchyData/postgres-operator • https://github.com/CrunchyData/postgres-operator-examples • https://patroni.readthedocs.io/en/latest/ • https://pgbackrest.org/ • https://kubernetes.io/docs/tasks/configure-pod-container/assign-pods-nodes-using-node-affinity/ • https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/
All other company or product names mentioned herein are trademarks or registered trademarks of their respective owners.
