
Cheedge Lee

Posted on • Edited on • Originally published at notes-renovation.hashnode.dev

Etcd Backup and Restore (2)

1. Backup

For this part, refer to my previous post, here.

2. Restore

Following the official procedure [1]:
"If any API servers are running in your cluster, you should not attempt to restore instances of etcd."
Therefore, to restore an etcd backup, we need to stop all API server instances, restore the etcd state, and then restart the API servers:

  1. stop all API server instances
  2. restore state in all etcd instances
  3. restart all API server instances

2.1 Stop all API server instances

  1. check the api server
# check the api server
$ k get pods -n kube-system
NAME                                      READY   STATUS    RESTARTS      AGE
calico-kube-controllers-94fb6bc47-wr56s   1/1     Running   5 (14m ago)   21d
canal-cgrhr                               2/2     Running   2 (60m ago)   21d
canal-jb5rr                               2/2     Running   2 (60m ago)   21d
coredns-57888bfdc7-895dj                  1/1     Running   1 (60m ago)   21d
coredns-57888bfdc7-9rjt5                  1/1     Running   1 (60m ago)   21d
etcd-controlplane                         1/1     Running   2 (60m ago)   21d
kube-apiserver-controlplane               1/1     Running   0             21d
kube-controller-manager-controlplane      1/1     Running   3 (17m ago)   21d
kube-proxy-5xtp7                          1/1     Running   1 (60m ago)   21d
kube-proxy-bt2pv                          1/1     Running   2 (60m ago)   21d
kube-scheduler-controlplane               1/1     Running   3 (17m ago)   21d

We can see that kube-apiserver-controlplane is running, so we need to stop it first.

  2. Move the kube-apiserver manifest file to a temporary location to stop the API server:
sudo mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/ 

Note that this causes a temporary loss of kubectl functionality, so plan for it.

# temporary loss of kubectl functionality.
controlplane $ k get pods -n kube-system
The connection to the server 172.30.1.2:6443 was refused - did you specify the right host or port?

As the output above shows, stopping the API server makes kubectl unusable. kubectl communicates with the API server, so once the API server is stopped there is nothing left to handle kubectl requests.

Moving the kube-apiserver.yaml manifest to a temporary location stops the API server in a kubeadm-based Kubernetes cluster because the kubelet watches the /etc/kubernetes/manifests directory for static pod definitions and removes a pod when its manifest file is removed or moved.
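The manifest move and its later reversal can be wrapped in two small helpers. This is only a sketch: `park_manifest` and `unpark_manifest` are hypothetical names of my own, and the directories are passed as arguments so the functions can be tried on scratch directories before touching a real control plane node.

```shell
# Sketch only: park_manifest / unpark_manifest are hypothetical helper names.

# Move <name>.yaml out of the static pod directory so the kubelet stops the pod.
park_manifest() {
  local manifest_dir="$1" name="$2" parked_dir="$3"
  local src="${manifest_dir}/${name}.yaml"
  if [ ! -f "$src" ]; then
    echo "manifest not found: $src" >&2
    return 1
  fi
  mkdir -p "$parked_dir" && mv "$src" "$parked_dir/"
}

# Move the manifest back so the kubelet recreates the pod.
unpark_manifest() {
  local manifest_dir="$1" name="$2" parked_dir="$3"
  local src="${parked_dir}/${name}.yaml"
  if [ ! -f "$src" ]; then
    echo "parked manifest not found: $src" >&2
    return 1
  fi
  mv "$src" "$manifest_dir/"
}

# Example (run as root on the control plane node):
#   park_manifest   /etc/kubernetes/manifests kube-apiserver /tmp
#   unpark_manifest /etc/kubernetes/manifests kube-apiserver /tmp
```

Refusing to continue when the manifest is missing avoids a silent no-op if the API server was already stopped or the path is different on your node.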

While kubectl is unavailable, we can still work with the containers directly.
If the cluster uses containerd, we can use crictl to find, stop, and remove the API server container:

crictl ps | grep kube-apiserver
crictl stop <container-id>
crictl rm <container-id>
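To avoid copying the container ID by hand, the grep step can be turned into a small filter. This is a sketch under two assumptions: `container_ids_matching` is a hypothetical helper name, and the `crictl ps` table starts with a header row and has the container ID in its first column.

```shell
# Sketch only: container_ids_matching is a hypothetical helper name.
# Reads a `crictl ps` style table from stdin and prints the first column
# (assumed to be the container ID) of every non-header row containing <name>.
container_ids_matching() {
  awk -v name="$1" 'NR > 1 && index($0, name) > 0 { print $1 }'
}

# Example (run as root on the control plane node):
#   for id in $(crictl ps | container_ids_matching kube-apiserver); do
#     crictl stop "$id" && crictl rm "$id"
#   done
```

The match is a plain substring test on the whole row, so check the printed IDs before stopping anything if other containers might share the name.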

2.2 Restore state in all etcd instances

sudo ETCDCTL_API=3 etcdctl snapshot restore /path/to/snapshot.db \
  --data-dir=/var/lib/etcd_restore
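Before running the restore, it can help to check the inputs first. The sketch below uses a hypothetical helper, `preflight_restore`, and rests on my own assumption that we want a non-empty snapshot file and a target data directory that does not exist yet, so the restore never mixes with old data.

```shell
# Sketch only: preflight_restore is a hypothetical helper name.
# Succeeds when the snapshot file is non-empty and the target directory is absent.
preflight_restore() {
  local snapshot="$1" data_dir="$2"
  if [ ! -s "$snapshot" ]; then
    echo "snapshot missing or empty: $snapshot" >&2
    return 1
  fi
  if [ -e "$data_dir" ]; then
    echo "target data dir already exists: $data_dir" >&2
    return 1
  fi
  return 0
}

# Example (paths as used in this post):
#   preflight_restore /path/to/snapshot.db /var/lib/etcd_restore && \
#     sudo ETCDCTL_API=3 etcdctl snapshot restore /path/to/snapshot.db \
#       --data-dir=/var/lib/etcd_restore
```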

Then edit /etc/kubernetes/manifests/etcd.yaml so that the etcd-data volume points to the restored directory:

  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd_restore # change this to the restored etcd data path on the host machine
      type: DirectoryOrCreate
    name: etcd-data
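The same edit can be scripted instead of done by hand. This is a minimal sketch: `retarget_hostpath` is a hypothetical helper name, and it assumes the manifest currently has a line ending in `path: /var/lib/etcd` (the usual kubeadm location, but check your own file). It works as a stdin-to-stdout filter so the original manifest is never modified in place.

```shell
# Sketch only: retarget_hostpath is a hypothetical helper name.
# Rewrites lines that end in "path: <old>" to "path: <new>"; other lines pass through.
# The end-of-line anchor keeps /etc/kubernetes/pki/etcd and an already
# rewritten /var/lib/etcd_restore line untouched.
retarget_hostpath() {
  local old="$1" new="$2"
  sed "s|\(path:[[:space:]]*\)${old}[[:space:]]*\$|\1${new}|"
}

# Example (keep the backup outside the manifests directory):
#   sudo cp /etc/kubernetes/manifests/etcd.yaml /tmp/etcd.yaml.bak
#   retarget_hostpath /var/lib/etcd /var/lib/etcd_restore < /tmp/etcd.yaml.bak \
#     | sudo tee /etc/kubernetes/manifests/etcd.yaml > /dev/null
```

The paths are spliced into a sed pattern as-is, so this only suits plain paths without `|` or regex metacharacters.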

This may be confusing, but we will discuss it later (here).

2.3 Restart all API server instances

sudo mv /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/ 

Verify Cluster Health:
Use kubectl to check the state of the cluster after the API server is back up.

# check API server running
kubectl get pods -n kube-system
# check API server health
kubectl get --raw /healthz
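The API server takes a moment to come back after the manifest is restored, so the health check may fail on the first try. A small retry loop can cover that gap. This is a sketch: `wait_for` is a hypothetical helper name, and the attempt count and delay in the example are arbitrary values, not recommendations.

```shell
# Sketch only: wait_for is a hypothetical helper name.
# Usage: wait_for <attempts> <delay-seconds> <command> [args...]
# Runs the command until it succeeds or the attempts are used up.
wait_for() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@" > /dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example: poll the health endpoint for up to about a minute.
#   wait_for 30 2 kubectl get --raw /healthz && echo "API server is healthy"
```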

Reference

[1] Restoring an etcd cluster
