DEV Community

Shubham

Kubernetes Node Management - Drain, Cordon and Uncordon

Most Kubernetes engineers don’t start their day expecting to drain a node, but they often end up doing just that. Managing node availability becomes routine work that directly affects workload stability and uptime. You’ll often use drain, cordon, and uncordon when:

  • Scaling a cluster
  • Preparing for a node upgrade
  • Patching OS-level vulnerabilities
  • Replacing underlying infrastructure
  • Investigating issues on a specific node

Let’s go through how these actually behave, with visual examples.

1. Drain

The kubectl drain command is used when you want to safely evict all running pods from a node and prevent new ones from being scheduled on it. This is typically used during node maintenance, upgrades, or when preparing to decommission a node.

When you run kubectl drain node2, Kubernetes performs two actions:

  • It marks the node as unschedulable (SchedulingDisabled).
  • It evicts every pod on the node except DaemonSet-managed pods and static (mirror) pods.

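Before drain: All three nodes are healthy and ready to accept pods. node2 is running Pod C and Pod D.

Once kubectl drain node2 is executed:

  • Pod C is evicted, and its controller creates a replacement that is scheduled on node1.
  • Pod D is evicted, and its controller creates a replacement that is scheduled on node3.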
  • node2 is marked as SchedulingDisabled so no new pods are placed there.
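
The before and after states described above can be checked from the command line. Here is a minimal sketch, assuming a node named node2 as in the example and a kubectl context that already points at your cluster. The helper name drain_and_inspect is made up for illustration.

```shell
# Hypothetical helper: drain a node, then show where its pods ended up.
drain_and_inspect() {
  local node="$1"

  # Pods currently running on the node, across all namespaces.
  kubectl get pods --all-namespaces -o wide --field-selector "spec.nodeName=${node}"

  # Cordon the node and evict its pods; DaemonSet-managed pods are skipped.
  kubectl drain "${node}" --ignore-daemonsets || return 1

  # The node's STATUS column should now include SchedulingDisabled.
  kubectl get node "${node}"

  # Replacement pods should appear on the remaining nodes.
  kubectl get pods --all-namespaces -o wide
}
```

Call it as drain_and_inspect node2. Which node each replacement lands on is up to the scheduler, so the node1/node3 split in the example is one possible outcome, not a guarantee.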

Things I learnt the hard way:

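  • Pass --ignore-daemonsets, or the command fails when DaemonSet-managed pods are present.
  • Pods using emptyDir lose all data when evicted, even if they come back quickly. Drain refuses to evict them unless you pass --delete-emptydir-data.
  • If a PodDisruptionBudget covers a pod, drain keeps retrying the eviction until the budget allows it, so it can block for a long time. Set --timeout to bound the wait.
  • Hanging drains are usually due to finalizers or stuck shutdown hooks. --force does not fix those; it lets drain delete pods that have no controller, and those pods are not recreated. Use --force only if you understand the risk.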
  • Drain marks the node as unschedulable. You must run uncordon manually to bring it back.
  • Draining a node that hosts system pods (CoreDNS, for example) can silently break networking or DNS if those pods lack the tolerations they need to reschedule on the remaining nodes.
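
The flags from the list above can be combined into one wrapper. This is a sketch under stated assumptions, not a recommended standard: the function names safe_drain and finish_maintenance and the 300s default timeout are invented here, and you should only pass --delete-emptydir-data if losing emptyDir contents is acceptable for your workloads.

```shell
# Hypothetical wrapper around the drain flags discussed above.
safe_drain() {
  local node="$1"
  local timeout="${2:-300s}"   # assumed default; tune for your workloads

  # List PodDisruptionBudgets first: a budget with 0 allowed disruptions
  # will make the drain wait.
  kubectl get pdb --all-namespaces

  # --ignore-daemonsets     : do not fail on DaemonSet-managed pods
  # --delete-emptydir-data  : accept that emptyDir contents are lost
  # --timeout               : give up instead of waiting forever
  if ! kubectl drain "${node}" \
      --ignore-daemonsets \
      --delete-emptydir-data \
      --timeout="${timeout}"; then
    echo "drain of ${node} failed or timed out; the node may still be cordoned" >&2
    return 1
  fi
}

# After maintenance, make the node schedulable again.
finish_maintenance() {
  kubectl uncordon "$1"
}
```

For example, safe_drain node2 120s followed later by finish_maintenance node2. The wrapper leaves out --force on purpose, so pods without a controller still stop the drain and you get to decide what to do with them.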

2. Cordon

The kubectl cordon command is used when you want to stop new pods from being scheduled on a node, but keep existing pods running. This is often done before maintenance, scaling operations, or selective upgrades where you don’t want to disrupt workloads immediately.
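
To see the difference from drain, you can cordon a node and confirm that only scheduling changed. A minimal sketch, again assuming a node named node2; the helper name cordon_and_verify is hypothetical.

```shell
# Hypothetical helper: cordon a node and confirm the scheduler will skip it.
cordon_and_verify() {
  local node="$1"

  # Mark the node unschedulable; pods already running there are left alone.
  kubectl cordon "${node}" || return 1

  # .spec.unschedulable reads "true" on a cordoned node.
  local state
  state="$(kubectl get node "${node}" -o jsonpath='{.spec.unschedulable}')"
  if [ "${state}" = "true" ]; then
    echo "${node} is cordoned"
  else
    echo "${node} is still schedulable" >&2
    return 1
  fi
}
```

Run cordon_and_verify node2, and reverse it with kubectl uncordon node2 when you are done.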
