When configuring workloads (Deployments, StatefulSets, DaemonSets, Jobs, and CronJobs) in a Container Service for Kubernetes (ACK) cluster, you must consider multiple factors to ensure that applications run stably and reliably.
Declare requests and limits for each pod
In Kubernetes clusters, scheduling too many pods onto a node can overload it and prevent it from providing services normally. When you configure a pod, declare the resource requests and limits it needs, so that the scheduler can place the pod on a node with sufficient resources.
In the example below, the Nginx Pod's resource configuration is:
CPU request: 1 core. Memory request: 1024 MiB
CPU limit: 2 cores. Memory limit: 4096 MiB
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    resources: # Resource claim
      requests:
        memory: "1024Mi"
        cpu: "1000m"
      limits:
        memory: "4096Mi"
        cpu: "2000m"
```

Kubernetes uses a static resource scheduling mechanism. The remaining resources on each node are calculated as: Remaining resources = Total resources on the node - Allocated resources. If you manually run a resource-intensive program on a node, Kubernetes cannot detect its actual resource usage. The formula is based on allocated resources, not actual used resources.
Additionally, all pods should declare resources. If a pod does not declare resources, after being scheduled to a node, Kubernetes will not reserve resources for it. This may lead to too many pods being scheduled on the node, causing resource contention issues.
You can use the resource profiling feature provided by ACK to obtain container-level resource specification recommendations based on historical resource usage data, simplifying the complexity of configuring container requests and limits.
Wait until dependencies are ready instead of terminating an application during startup
Some applications have external dependencies, such as reading data from a database (DB) or calling interfaces of another service. When an application starts, these dependencies might not yet be satisfied. In traditional manual operations, the typical approach is "exit if dependencies are not met" (known as fail-fast). However, in Kubernetes, most operations are automated without human intervention. For example, the system automatically selects nodes and starts applications during deployment, automatically restarts applications when they fail, and can automatically scale out through HPA (Horizontal Pod Autoscaler) when load increases.
Imagine two applications A and B, where A depends on B, and they run on the same node. If the node restarts for some reason, and after restart A starts first while B has not yet started, A's dependencies cannot be satisfied. If A exits directly following the traditional approach, even after B starts, A will not automatically recover without manual intervention.
In Kubernetes clusters, your application should check its dependencies during startup. If the dependencies are not met, poll and wait instead of exiting directly. You can implement this check with an init container.
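The polling-and-wait pattern can be sketched with an init container that blocks pod startup until a dependency accepts TCP connections. In this sketch, the dependency name `my-db`, its port `3306`, and the application image `my-app:latest` are placeholders for illustration; substitute your actual services.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-a
spec:
  initContainers:
  # Runs to completion before the main container starts:
  # loop until the dependency resolves and accepts connections.
  - name: wait-for-db
    image: busybox
    command: ['sh', '-c', 'until nc -z my-db 3306; do echo "waiting for my-db"; sleep 2; done']
  containers:
  - name: app
    image: my-app:latest   # placeholder application image
```

Because the init container retries instead of exiting, the pod recovers automatically once the dependency becomes available, with no manual intervention.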
Configure restart policy
During pod operation, processes may terminate unexpectedly. Code defects or excessive memory usage can cause an application process to exit, which terminates the pod. Configure restartPolicy for pods so that they automatically restart after exiting.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-test
spec:
  restartPolicy: OnFailure
  containers:
  - name: nginx
    image: anolis-registry.cn-zhangjiakou.cr.aliyuncs.com/openanolis/nginx:1.14.1-8.6
```

The available values for restartPolicy are:

Always: always restarts the pod automatically after it terminates.
OnFailure: restarts the pod only upon failure (the process exits with a non-zero status).
Never: never restarts the pod.
Configure liveness probes and readiness probes
A pod being in the Running state does not necessarily mean it can provide services normally. For pods in the Running state, internal processes might deadlock, preventing normal service provision. However, because the pod remains in the Running state, Kubernetes will not automatically restart it.
Therefore, you should configure probes for all pods:
Liveness Probe is used to detect whether a pod is truly alive and capable of providing services. When a Liveness Probe detects problems, Kubernetes automatically restarts the pod.
Readiness Probe is used to detect whether a pod is ready to provide external services. During application startup, the initialization process may take time, during which the pod cannot provide external services. Readiness Probe can inform Ingress or Service whether traffic can be forwarded to the pod. When problems occur with the pod, it prevents new traffic from being forwarded to that pod.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tomcat
spec:
  containers:
  - name: tomcat
    image: tomcat
    livenessProbe:
      httpGet:
        path: /index.jsp
        port: 8080
      initialDelaySeconds: 3
      periodSeconds: 3
    readinessProbe:
      httpGet:
        path: /index.jsp
        port: 8080
```

Run only one process in each container
Some developers are accustomed to using containers as virtual machines (VMs) and running multiple processes in a single container, such as monitoring processes, logging processes, sshd processes, or even the entire Systemd. This practice causes the following issues:
Determining the overall resource usage of a pod becomes more complex, making it difficult to properly configure Requests and Limits.
If a container runs only a single process, the container runtime can immediately detect when that process exits and restart the container. If a container runs multiple processes, the runtime monitors only the main process; when another process crashes, the runtime cannot detect it, and the container may keep running in a broken state.
For scenarios that genuinely need cooperating processes, Kubernetes supports collaboration between multiple containers in a pod. For example, Nginx and PHP-FPM can communicate over a Unix domain socket: create a pod that contains two containers and place the socket in a volume shared between them.
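A minimal sketch of this pattern, assuming stock `nginx` and `php:fpm` images, and assuming that the Nginx configuration proxies to a PHP-FPM socket under the shared path `/var/run/php` (both assumptions are for illustration; the images require additional configuration to actually use the socket):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-php
spec:
  containers:
  - name: nginx
    image: nginx            # assumes an nginx.conf that proxies to the shared socket
    volumeMounts:
    - name: php-socket
      mountPath: /var/run/php
  - name: php-fpm
    image: php:fpm          # assumes php-fpm is configured to listen on a socket in /var/run/php
    volumeMounts:
    - name: php-socket
      mountPath: /var/run/php
  volumes:
  # emptyDir volume shared by both containers to hold the Unix domain socket
  - name: php-socket
    emptyDir: {}
```

Each container still runs a single process, so the runtime can detect and restart either one independently.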
Eliminate single points of failure (SPOF)
If an application has only one instance, when that instance fails, although Kubernetes can automatically restart the instance, there will inevitably be a brief service interruption during this period. Similar service interruptions may occur even when updating applications or releasing new versions.
In Kubernetes, avoid managing pods directly. Instead, manage them through Deployments or StatefulSets, and run at least two pod instances for each application. This effectively improves system availability and prevents service interruptions caused by the failure of a single instance.
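As a sketch, the following Deployment runs two replicas and adds a soft pod anti-affinity rule so that the replicas prefer to land on different nodes, avoiding a second single point of failure at the node level. The image name `my-app:latest` is a placeholder.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2               # at least two instances to eliminate the SPOF
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      affinity:
        podAntiAffinity:
          # Soft rule: prefer, but do not require, spreading across nodes.
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: my-app
              topologyKey: kubernetes.io/hostname
      containers:
      - name: app
        image: my-app:latest   # placeholder image
```

A soft (preferred) rule is used rather than a hard (required) one so that scheduling still succeeds in a cluster with fewer nodes than replicas.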
References
For more information about performing canary releases and blue-green releases for applications in ACK, see Deployment and release.
For more information about the best practices for application management, see Best practices for workload management.
If pods are abnormal, first refer to Pod troubleshooting and FAQ about workloads for self-troubleshooting.