This article shows how to create an autoscaling Kubernetes deployment that uses Keda for the horizontal scaling of the Spring Boot container of the AngularPortfolioMgr project. It extends the deployment setup that is shown in this article.
Deployment architecture

Keda will scale the WebApp horizontally, up and down, according to the configured metrics. The metrics are CPU usage in percent and the number of active HTTP(S) requests. Vertical scaling is provided by Kubernetes itself, via the resource requests and limits of the containers. Kubernetes will try to provide the resources based on the available hardware.
The deployment of Zookeeper, Kafka, PostgreSQL, and the WebApp is described in this article. Here, the horizontal and vertical scaling of the WebApp is shown, along with how Keda can be configured to use metrics provided by Spring Actuator with Micrometer.
Kubernetes resources
The Helm Chart for the deployment can be found in the angularportfoliomgr directory. The template can be found in kubTemplate.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Values.webAppName }}
  labels:
    app: {{ .Values.webAppName }}
spec:
  selector:
    matchLabels:
      app: {{ .Values.webAppName }}
  template:
    metadata:
      labels:
        app: {{ .Values.webAppName }}
    spec:
      terminationGracePeriodSeconds: 15
      containers:
        - name: {{ .Values.webAppName }}
          image: "{{ .Values.webImageName }}:{{ .Values.webImageVersion }}"
          imagePullPolicy: Always
          resources:
            limits:
              memory: "786M"
              cpu: "1.0"
            requests:
              memory: "512M"
              cpu: "0.1"
          env:
            {{- include "helpers.list-envApp-variables" . | indent 10 }}
          ports:
            - containerPort: 8080
          livenessProbe:
            httpGet:
              path: "/actuator/health/livenessState"
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
          startupProbe:
            httpGet:
              path: "/actuator/health/readinessState"
              port: 8080
            failureThreshold: 60
            periodSeconds: 5

The containers section has the requested CPU and memory resources that Kubernetes tries to provide on deployment. The limits section caps the CPU and memory resources that Kubernetes will grant the container under load. The ‘terminationGracePeriodSeconds’ gives the pod time to finish open requests on termination.
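The probe paths in the Deployment assume that the Spring Boot application exposes matching actuator health groups. A minimal, hypothetical application.properties sketch (the exact property set in the AngularPortfolioMgr project may differ):

```properties
# Hedged sketch: expose health and metrics endpoints over HTTP and
# enable the liveness/readiness probe indicators.
management.endpoints.web.exposure.include=health,metrics
management.endpoint.health.probes.enabled=true
# Health groups matching the probe paths used in the Deployment above:
management.endpoint.health.group.livenessState.include=livenessState
management.endpoint.health.group.readinessState.include=readinessState
```

With these groups in place, the paths ‘/actuator/health/livenessState’ and ‘/actuator/health/readinessState’ answer the kubelet probes.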
Implementing the shutdown hook for long tasks
A task that takes longer to finish than the ‘terminationGracePeriodSeconds’ needs to implement a shutdown hook, like the FileClientBean:
public Boolean importZipFile(String filename) {
    this.importDone = false;
    Thread shutDownThread = createShutDownThread();
    Runtime.getRuntime().addShutdownHook(shutDownThread);
    ...
    Runtime.getRuntime().removeShutdownHook(shutDownThread);
    this.importDone = true;
    return true;
}

private Thread createShutDownThread() {
    return new Thread(() -> {
        while (!this.importDone) {
            try {
                Thread.sleep(5000);
            } catch (InterruptedException e) {
                LOGGER.warn("ShutdownHook Thread interrupted.", e);
            }
        }
        LOGGER.info("ShutdownHook Thread is Done.");
    });
}

The shutdown hook creates a thread that lives until the task sets ‘this.importDone’ to true and keeps the application from terminating. After the task is done, the shutdown hook is removed and ‘this.importDone’ is set to true. Then the application will terminate, and Kubernetes will terminate the pod. That stops Kubernetes from terminating pods with active tasks.
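The same guard pattern can be sketched as a small self-contained class. The names LongTaskGuard and runGuarded are illustrative, not part of the project; the volatile flag and the hook registration mirror the FileClientBean code above:

```java
// Illustrative sketch of the shutdown-hook guard pattern
// (class and method names are hypothetical, not from the project).
public class LongTaskGuard {

    private volatile boolean importDone = false;

    // The hook thread blocks JVM shutdown until the task flags completion.
    private Thread createShutDownThread() {
        return new Thread(() -> {
            while (!this.importDone) {
                try {
                    Thread.sleep(50);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
    }

    // Runs the task with the shutdown hook registered for its duration.
    public boolean runGuarded(Runnable task) {
        this.importDone = false;
        Thread hook = createShutDownThread();
        Runtime.getRuntime().addShutdownHook(hook);
        try {
            task.run();
        } finally {
            Runtime.getRuntime().removeShutdownHook(hook);
            this.importDone = true;
        }
        return true;
    }
}
```

The volatile flag matters: the hook thread reads ‘importDone’ that the task thread writes, so the field must be visible across threads.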
Keda setup on Minikube
Minikube needs enough memory and CPU for the deployment and the ‘metrics-server’. That can be set up with these commands:
minikube config set memory 16384
minikube config set cpus 4
minikube addons enable metrics-server

This sets the memory to 16 GB and the CPU count to 4 cores, and enables the ‘metrics-server’ addon.
helm repo add kedacore https://kedacore.github.io/charts
kubectl create namespace keda
helm install keda kedacore/keda --namespace keda

Keda is installed in the ‘keda’ namespace that is created in the second line. Then the Helm Chart from the kedacore repository, added in the first line, is used to install Keda.
Keda setup
The ScaledObject for the Keda setup can be found in the kubTemplate.yaml:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: keda-scaled-webapp
  namespace: default
  labels:
    deploymentName: {{ .Values.webAppName }}
spec:
  scaleTargetRef:
    name: {{ .Values.webAppName }}
  pollingInterval: 10  # Optional. Default: 30 seconds
  minReplicaCount: 1   # Optional. Default: 0
  maxReplicaCount: 3   # Optional. Default: 100
  triggers:
    - type: metrics-api
      metadata:
        targetValue: "{{ .Values.kedaRequestLimit }}"
        url: "http://{{ .Values.webServiceName }}.default.svc.cluster.local:8080/actuator/metrics/http.server.requests.active"
        valueLocation: "measurements.0.value"
    - type: cpu
      metricType: Utilization  # Allowed types are 'Utilization' or 'AverageValue'
      metadata:
        value: "{{ .Values.kedaCpuLimit }}"

The ‘deploymentName’ label and ‘scaleTargetRef/name’ set up the deployment that Keda should scale. The ‘pollingInterval’ sets the polling interval to 10 seconds. ‘minReplicaCount’ and ‘maxReplicaCount’ set the minimum and maximum number of pods in the deployment.
The trigger of the metrics-api section sets the ‘targetValue’ at which the deployment count increases. The ‘url’ is used to request the Spring Actuator Micrometer value. The ‘valueLocation’ provides the path to the value in the JSON returned by the application.
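To illustrate what ‘valueLocation’ points at, here is a sketch of the JSON shape the actuator metric endpoint returns (the statistic name and values are illustrative) and a stdlib-only extraction of ‘measurements.0.value’, the same path Keda reads:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ValueLocationDemo {

    // Illustrative payload shape from
    // /actuator/metrics/http.server.requests.active (values are made up).
    static final String SAMPLE = """
        {"name":"http.server.requests.active",
         "measurements":[{"statistic":"ACTIVE_TASKS","value":7.0}],
         "availableTags":[]}""";

    // Extracts measurements[0].value, the field addressed by
    // valueLocation "measurements.0.value" (regex keeps it stdlib-only;
    // Keda itself uses a real JSON path lookup).
    static double activeRequests(String json) {
        Matcher m = Pattern
            .compile("\"measurements\":\\[\\{[^}]*\"value\":([0-9.]+)")
            .matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("no measurement value found");
        }
        return Double.parseDouble(m.group(1));
    }

    public static void main(String[] args) {
        System.out.println(activeRequests(SAMPLE));
    }
}
```

When the extracted value exceeds ‘targetValue’ (kedaRequestLimit), Keda scales the deployment out.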
The trigger of the cpu section sets the ‘metricType’ to ‘Utilization’ to get the percentage of the used CPU(s). The ‘value’ sets the threshold that triggers Keda to scale.
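Under the hood, Keda's triggers feed the Kubernetes HorizontalPodAutoscaler, which uses the standard formula desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric); for the cpu trigger, ‘Utilization’ is measured relative to the container's CPU request (0.1 cores above). A small sketch of the arithmetic:

```java
public class HpaMath {

    // Standard HPA scaling formula that Keda's triggers feed into:
    // desired = ceil(current * currentMetric / targetMetric).
    static int desiredReplicas(int currentReplicas, double currentValue,
            double targetValue) {
        return (int) Math.ceil(currentReplicas * currentValue / targetValue);
    }

    public static void main(String[] args) {
        // With kedaCpuLimit = 80 (%) and a measured average utilization
        // of 200 %, one pod becomes ceil(1 * 200 / 80) = 3 pods.
        System.out.println(desiredReplicas(1, 200.0, 80.0));
    }
}
```

With ‘maxReplicaCount: 3’ in the ScaledObject, the result is additionally clamped to at most three pods.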
Conclusion
Kubernetes with Keda provides automatic horizontal scaling. If required, Keda can even scale down to zero instances. That provides an important feature to handle load spikes in a Kubernetes cluster. The large number of scalers enables flexible scaling based on metrics that suit the requirements. It would be possible to scale based on consumer lag on a Kafka topic or on the results of PostgreSQL queries. Keda provides a user-friendly way to solve the scaling problem with stateless Kubernetes pods.