As we know, in a Redis Cluster the data is divided into shards, each managed by a master node and one or more replica nodes. The problem I want to tackle is that a master and its replica should never be taken down together, since I would lose the keys stored there and reduce the overall availability of the system. I have already set podAntiAffinity to prefer scheduling pods on different nodes, so now I want to make sure that the nodes holding a master-replica "pair" of a shard are never taken down at the same time. There is the PodDisruptionBudget option with maxUnavailable set to 1, or some preStop hook magic where we block if one of the paired nodes is not available, but neither feels natural to me. Is there a concept like failure domains or upgrade domains that I am missing?

1 Answer

Pod Topology Spread Constraints, PodDisruptionBudget, podAntiAffinity, and preStop hooks are typically combined to build highly available applications (not just Redis) on Kubernetes.

Their functionalities:

  • Use Pod Topology Spread Constraints (PTSC) to distribute pods across failure domains (e.g., nodes or zones).
  • Configure a PodDisruptionBudget (PDB) to ensure a minimum number of pods remain available.
  • Use podAntiAffinity (PAA) to prevent master and replica pods from being scheduled on the same node.
  • Use preStop hooks to delay termination until the counterpart pod is available.

How they relate to failure domains and upgrade domains:

  • PTSC: A failure domain is a scope in which a single failure (e.g., a node crash or a zone outage) affects all resources within it. Spreading pods across failure domains reduces the blast radius of a failure.
  • PDB: Without a PDB, a node drain (e.g., for an upgrade) could evict all pods of an app and cause downtime. A PDB ensures that enough replicas stay running.
  • PAA: It is a stricter way to enforce separation than PTSC, which allows some skew.
  • preStop hook: It lets the app exit cleanly, reducing the chance of data corruption or client errors during pod termination.
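
To make the PAA and PTSC points above concrete for the sharded case in the question, here is a minimal sketch of a pod that uses both. It assumes every Redis pod carries a `redis-shard` label naming its shard; that label, the pod name, and the shard value are illustrative conventions, not something Kubernetes or Redis sets for you.

```yaml
# Sketch only: the redis-shard label and all names are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: redis-shard-0-replica
  labels:
    app: redis
    redis-shard: shard-0
spec:
  affinity:
    podAntiAffinity:
      # Hard rule: never share a node with another pod of the same shard.
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              redis-shard: shard-0
          topologyKey: kubernetes.io/hostname
  topologySpreadConstraints:
    # Softer rule: keep the shard's pods balanced across zones (skew of at most 1).
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          redis-shard: shard-0
  containers:
    - name: redis
      image: redis:6.2
```

Scoping the selectors to the shard label rather than to `app: redis` means the rules only separate a master from its own replicas, so pods of different shards can still share a node.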

Kubernetes manifest examples:

PodDisruptionBudget configuration that keeps at least two Redis pods available during a disruption (e.g. an upgrade):

    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: redis-pdb
    spec:
      minAvailable: 2
      selector:
        matchLabels:
          app: redis
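
The PDB above counts every pod labelled `app: redis` together, so on its own it does not stop the master and the replica of the same shard from being evicted at once. One possible refinement, sketched here, is a PDB per shard. It assumes the pods of each shard share a `redis-shard` label, which is a hypothetical convention you would have to apply yourself (or through your operator); the resource name is illustrative too.

```yaml
# Sketch only: one such PDB would be created for each shard.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: redis-shard-0-pdb      # hypothetical name
spec:
  # An integer minAvailable is used rather than maxUnavailable because it
  # does not depend on the replica count of the owning controller.
  minAvailable: 1
  selector:
    matchLabels:
      app: redis
      redis-shard: shard-0     # hypothetical per-shard label
```

Keep in mind that a PDB only limits voluntary disruptions that go through the eviction API, such as `kubectl drain`; it does not protect against a node crashing.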

Deployment manifest that (1) spreads pods evenly across zones using PTSC, (2) keeps Redis pods, and therefore any master and its replica, on different nodes using PAA, and (3) performs a graceful shutdown before a pod terminates using a preStop hook:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: redis
      labels:
        app: redis
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: redis
      template:
        metadata:
          labels:
            app: redis
        spec:
          topologySpreadConstraints:
            - maxSkew: 1
              topologyKey: topology.kubernetes.io/zone
              whenUnsatisfiable: DoNotSchedule
              labelSelector:
                matchLabels:
                  app: redis
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                - labelSelector:
                    matchLabels:
                      app: redis
                  topologyKey: kubernetes.io/hostname
          containers:
            - name: redis
              image: redis:6.2
              ports:
                - containerPort: 6379
              lifecycle:
                preStop:
                  exec:
                    command:
                      - /bin/sh
                      - -c
                      - |
                        echo "Waiting for replicas to sync before shutdown..."
                        # Example preStop logic, e.g., ensuring no data loss
                        sleep 10
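
The `sleep 10` in the manifest above is only a placeholder. A preStop hook that actually waits for the counterpart pod could look roughly like the sketch below. It assumes the peer is reachable under a DNS name passed in through a `PEER_HOST` environment variable and that Redis runs without authentication; the variable, the pod name, and the peer name are all hypothetical.

```yaml
# Sketch only: PEER_HOST and all names are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: redis-shard-0-master
  labels:
    app: redis
spec:
  # The preStop hook runs inside this grace period, so it must be
  # longer than the wait loop below.
  terminationGracePeriodSeconds: 90
  containers:
    - name: redis
      image: redis:6.2
      env:
        - name: PEER_HOST
          value: redis-shard-0-replica   # hypothetical name of the counterpart
      lifecycle:
        preStop:
          exec:
            command:
              - /bin/sh
              - -c
              - |
                # Poll the peer for up to about 60 attempts; stop waiting
                # as soon as it answers PONG.
                i=0
                while [ "$i" -lt 60 ]; do
                  if [ "$(redis-cli -h "$PEER_HOST" -p 6379 ping 2>/dev/null)" = "PONG" ]; then
                    exit 0
                  fi
                  i=$((i + 1))
                  sleep 1
                done
                # Peer never answered; let termination continue anyway.
                exit 0
```

A hook like this can only delay termination, not veto it: once `terminationGracePeriodSeconds` runs out the container is killed regardless, which is why it works best alongside a PDB rather than instead of one.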

Reference: https://kubernetes.io/docs/concepts

    While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - From Review Commented Apr 2 at 16:51
