Description
I have two deployments, each behind a service of type LoadBalancer with external traffic policy set to Cluster: deployment A behind LoadBalancer service A, and deployment B behind LoadBalancer service B. I am also using cluster-autoscaler to scale my worker nodes.
Deployment A is my app server and deployment B is my web server, which forwards all requests to Load Balancer A in front of deployment A's workloads. The RTT for each request is around 10-20 seconds. (To reproduce the issue, I wrote a sample app that includes a 20 second sleep.)
Whenever I add a new deployment workload (say C) to my cluster, cluster-autoscaler adds new nodes to satisfy the workload's resource requests. Whenever I delete deployment C, cluster-autoscaler scales the worker nodes back down (drain -> terminate).
Because the external traffic policy is set to Cluster, every new node that joins the cluster is also registered with the load balancer. However, when cluster-autoscaler deletes a node (say node 10), every request still being served through node 10 is cut off: the node is marked for termination because no workload is scheduled on it, but cluster-autoscaler is not aware of the active/in-flight requests that node 10 is still proxying for service A and service B. Those requests are interrupted by the node drain/termination, and clients see 502s from service A.
Workaround: change the external traffic policy to Local (see the sketch below).
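For reference, a minimal sketch of the workaround applied to service A. It assumes the only change is adding the standard spec.externalTrafficPolicy field; everything else is copied unchanged from the Service A manifest file further down. With Local, the load balancer only routes to nodes that actually run the service's pods, so draining a node that has no such pods no longer cuts off proxied connections.
Service A manifest file with workaround (sketch)
===
{
  "kind": "Service",
  "apiVersion": "v1",
  "metadata": {
    "name": "sameple-go-app",
    "annotations": {
      "service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled": "true",
      "service.beta.kubernetes.io/aws-load-balancer-connection-draining-enabled": "true"
    },
    "labels": {
      "run": "limits-nginx"
    }
  },
  "spec": {
    "externalTrafficPolicy": "Local",
    "ports": [
      {
        "port": 80,
        "targetPort": 8080
      }
    ],
    "selector": {
      "run": "limits-nginx"
    },
    "type": "LoadBalancer"
  }
}
===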
Ask: make cluster-autoscaler more resilient by making it aware of in-flight requests when the external traffic policy is set to Cluster.
Deployment A manifest file
===
apiVersion: apps/v1
kind: Deployment
metadata:
  name: limits-nginx
spec:
  selector:
    matchLabels:
      run: limits-nginx
  replicas: 2
  template:
    metadata:
      labels:
        run: limits-nginx
    spec:
      containers:
      - name: limits-nginx
        image: nithmu/nithish:sample-golang-app
        env:
        - name: MSG_ENV
          value: "Hello from the environment"
        ports:
        - containerPort: 8080
        resources:
          requests:
            memory: "264Mi"
            cpu: "250m"
          limits:
            memory: "300Mi"
            cpu: "300m"
===
Service A manifest file
===
{
  "kind": "Service",
  "apiVersion": "v1",
  "metadata": {
    "name": "sameple-go-app",
    "annotations": {
      "service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled": "true",
      "service.beta.kubernetes.io/aws-load-balancer-connection-draining-enabled": "true"
    },
    "labels": {
      "run": "limits-nginx"
    }
  },
  "spec": {
    "ports": [
      {
        "port": 80,
        "targetPort": 8080
      }
    ],
    "selector": {
      "run": "limits-nginx"
    },
    "type": "LoadBalancer"
  }
}
===
Deployment B manifest file
===
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx
spec:
  selector:
    matchLabels:
      run: my-nginx
  replicas: 2
  template:
    metadata:
      labels:
        run: my-nginx
    spec:
      containers:
      - name: my-nginx
        image: nithmu/nithish:nginx_echo
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
===
Service B manifest file
===
{
  "kind": "Service",
  "apiVersion": "v1",
  "metadata": {
    "name": "my-nginx",
    "annotations": {
      "service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled": "true",
      "service.beta.kubernetes.io/aws-load-balancer-connection-draining-enabled": "true"
    },
    "labels": {
      "run": "my-nginx"
    }
  },
  "spec": {
    "ports": [
      {
        "port": 80,
        "targetPort": 8080
      }
    ],
    "selector": {
      "run": "my-nginx"
    },
    "type": "LoadBalancer"
  }
}
===
Deployment C manifest file
===
apiVersion: apps/v1
kind: Deployment
metadata:
  name: l-nginx
spec:
  selector:
    matchLabels:
      run: l-nginx
  replicas: 2
  template:
    metadata:
      labels:
        run: l-nginx
    spec:
      containers:
      - name: l-nginx
        image: nginx
        env:
        - name: MSG_ENV
          value: "Hello from the environment"
        ports:
        - containerPort: 80
        resources:
          requests:
            memory: "1564Mi"
            cpu: "2000m"
          limits:
            memory: "1600Mi"
            cpu: "2500m"
===