According to the official project page, Kured (KUbernetes REboot Daemon) is a Kubernetes DaemonSet that performs safe automatic node reboots when the package management system of the underlying OS indicates that one is needed.
By rebooting nodes when updates require it, Kured ensures that pending updates or configuration changes actually take effect, which keeps the cluster more secure and reliable.
Here are the key points to note:
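Kured watches each Kubernetes node for a signal from the operating system that a reboot is needed, typically after security patches, kernel updates, or other system-level changes have been installed. By default this signal is the sentinel file /var/run/reboot-required, which the package manager creates on Debian- and Ubuntu-based systems.
When a reboot is required, Kured cordons the node, marking it as unschedulable so that no new pods land on it, without disturbing the pods already running. It then drains the node, evicting the existing pods in a controlled manner, and only then reboots it. Once the node is back up, Kured uncordons it.
Kured includes built-in safety mechanisms to prevent unnecessary or badly timed reboots: by default it reboots only one node at a time, it can hold off while Prometheus alerts are firing, and it lets users define maintenance windows so that reboots happen only at acceptable times.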
Because Kured checks for pending reboots at a regular interval, nodes do not stay for long on updates that have been installed but not yet activated, which improves security and stability.
This lets organizations run Kubernetes clusters with less manual maintenance while minimizing the risks associated with outdated software and configurations.
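The flow described in the points above can be sketched as a small simulation. This is a simplified model, not Kured's actual implementation (Kured is written in Go and talks to the Kubernetes API). The Node class, the maybe_reboot and blocking_alerts functions, and the in-memory sentinel_exists flag are hypothetical names introduced only for illustration; the ^Watchdog$ pattern mirrors the --alert-filter-regexp flag used in the manifest later in this article.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory stand-in for a Kubernetes node."""
    name: str
    sentinel_exists: bool = False  # models the OS's reboot-required marker
    schedulable: bool = True
    pods: list = field(default_factory=list)
    log: list = field(default_factory=list)


def blocking_alerts(active_alerts, ignore_pattern):
    """Return the active alert names that the filter regexp does not ignore."""
    ignore = re.compile(ignore_pattern)
    return [name for name in active_alerts if not ignore.search(name)]


def maybe_reboot(node, active_alerts, ignore_pattern=r"^Watchdog$"):
    """Run one check cycle; return True if the node was rebooted."""
    if not node.sentinel_exists:
        node.log.append("no reboot required")
        return False

    blockers = blocking_alerts(active_alerts, ignore_pattern)
    if blockers:
        node.log.append(f"reboot blocked by alerts: {blockers}")
        return False

    node.schedulable = False  # cordon: no new pods are scheduled here
    node.log.append("cordoned")

    evicted, node.pods = node.pods, []  # drain: evict the running pods
    node.log.append(f"drained {len(evicted)} pods")

    node.sentinel_exists = False  # the reboot clears the marker
    node.log.append("rebooted")

    node.schedulable = True  # uncordon once the node is back
    node.log.append("uncordoned")
    return True
```

Calling maybe_reboot on a node with a pending reboot while an alert such as a hypothetical "DiskFull" is firing leaves the node untouched; once only the ignored Watchdog alert remains, the node goes through cordon, drain, reboot, and uncordon in that order.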
Setting up Kured is a straightforward process that involves deploying it as a DaemonSet in the Kubernetes cluster. This deployment strategy ensures that Kured runs on every node within the cluster, effectively monitoring and managing the rebooting process for each individual node.
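Here is a manifest that does that. The {{ kured_version }} and {{ prometheus_url }} placeholders are template variables that you need to fill in for your environment: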
# ClusterRole for kured
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kured
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "patch"]
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["list", "delete", "get"]
  - apiGroups: ["apps"]
    resources: ["daemonsets"]
    verbs: ["get"]
  - apiGroups: [""]
    resources: ["pods/eviction"]
    verbs: ["create"]
---
# ClusterRoleBinding for kured
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kured
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kured
subjects:
  - kind: ServiceAccount
    name: kured
    namespace: kube-system
---
# Role for kured in kube-system namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: kube-system
  name: kured
rules:
  - apiGroups: ["apps"]
    resources: ["daemonsets"]
    resourceNames: ["kured"]
    verbs: ["update"]
---
# RoleBinding for kured in kube-system namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: kube-system
  name: kured
subjects:
  - kind: ServiceAccount
    namespace: kube-system
    name: kured
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kured
---
# ServiceAccount for kured
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kured
  namespace: kube-system
---
# DaemonSet for kured
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kured
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: kured
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        name: kured
    spec:
      serviceAccountName: kured
      tolerations:
        - key: node-role.kubernetes.io/master
          effect: NoSchedule
        - key: node-role.kubernetes.io/control-plane
          effect: NoSchedule
        - key: "node-role.kubernetes.io/mysql"
          operator: "Equal"
          effect: "NoSchedule"
      hostPID: true
      restartPolicy: Always
      containers:
        - name: kured
          image: ghcr.io/kubereboot/kured:{{ kured_version }}
          imagePullPolicy: IfNotPresent
          securityContext:
            privileged: true
          env:
            - name: KURED_NODE_ID
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          command:
            - /usr/bin/kured
            - --reboot-days=mon,tue,wed,thu
            - --reboot-delay=90s
            - --start-time=3am
            - --end-time=5am
            - --time-zone=UTC
            - --prometheus-url={{ prometheus_url }}
            - --alert-filter-regexp=^Watchdog$
            - --period=15m
Explanation:
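This part of the DaemonSet lets you define a maintenance window so that reboots avoid disruptive times. With the flags shown here, reboots are allowed only Monday through Thursday between 3am and 5am UTC. Kured checks for a pending reboot every 15 minutes (--period), and it holds off while any Prometheus alert is firing, ignoring alerts whose names match --alert-filter-regexp (here the always-firing Watchdog alert).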
command:
  - /usr/bin/kured
  - --reboot-days=mon,tue,wed,thu
  - --reboot-delay=90s
  - --start-time=3am
  - --end-time=5am
  - --time-zone=UTC
  - --prometheus-url={{ prometheus_url }}
  - --alert-filter-regexp=^Watchdog$
  - --period=15m
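To make the effect of --reboot-days, --start-time, --end-time, and --time-zone more concrete, here is a minimal sketch of a maintenance-window check. It is an illustration and not Kured's own code: the in_reboot_window function and DAY_NAMES list are hypothetical, the end time is treated as exclusive (an assumption), and windows that cross midnight are not handled.

```python
from datetime import datetime, time, timezone, tzinfo

# Index matches datetime.weekday(): Monday is 0, Sunday is 6.
DAY_NAMES = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]


def in_reboot_window(now: datetime, days: set, start: time, end: time,
                     tz: tzinfo = timezone.utc) -> bool:
    """Return True if the timezone-aware `now` falls inside the window.

    Assumes start < end on the same day; the end bound is exclusive.
    """
    local = now.astimezone(tz)  # evaluate the window in the configured zone
    if DAY_NAMES[local.weekday()] not in days:
        return False
    return start <= local.time() < end


# Mirrors the flags above: mon,tue,wed,thu between 3am and 5am UTC.
allowed_days = {"mon", "tue", "wed", "thu"}
window_start, window_end = time(3, 0), time(5, 0)

# 2024-01-01 was a Monday, so 03:30 UTC on that day is inside the window.
monday_night = datetime(2024, 1, 1, 3, 30, tzinfo=timezone.utc)
print(in_reboot_window(monday_night, allowed_days, window_start, window_end))
```

A node that needs a reboot on a Friday would, under this model, simply wait until the next allowed day and time before being cordoned and drained.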
Top comments (1)
Welcome here, and thank you for sharing!
Does this mean that nodes are constantly updating? Is this the default behavior? And what about public cloud providers?
Side note: when editing an article you can format code and specify a language (for example, yaml) to get syntax highlighting.