AWS Application Load Balancer sends traffic to unhealthy target group

Question

I have 3 instances (node-0, node-1, node-2), each running 2 services: a websocket service and an API.

Target Group Setup:

Target Group	Instance	Health Check Path
api-node-0	node-0	/some-path/api/v1/ping
api-node-1	node-1	/some-path/api/v1/ping
api-node-2	node-2	/some-path/api/v1/ping
websocket-node-0	node-0	/some-path/websocket/v1/ping
websocket-node-1	node-1	/some-path/websocket/v1/ping
websocket-node-2	node-2	/some-path/websocket/v1/ping

Listener and Rules:

HTTPS:443 Listener

Rules:

api

Condition: Path /some-path/api/*
Action: Forward to target group:
- api-node-0 (33.33%)
- api-node-1 (33.33%)
- api-node-2 (33.33%)
- Stickiness: Off

websocket

Condition: Path /some-path/websocket/*
Action: Forward to target group:
- websocket-node-0 (33.33%)
- websocket-node-1 (33.33%)
- websocket-node-2 (33.33%)
- Stickiness: Off

default

Condition: No other rule applies
Action: Forward to target group:
- api-node-0 (100%)

Health Check attributes:

Interval: 30 seconds
Timeout: 5 seconds
Healthy threshold: 2 consecutive health check successes
Unhealthy threshold: 2 consecutive health check failures
Success codes: 200

Load Balancer attributes:

HTTP client keepalive duration: 3600 seconds
Connection idle timeout: 60 seconds
X-Forwarded-For header: Append
Cross-zone load balancing: On

P.S. If you need any more information regarding the setup please let me know.

During normal testing, when all target groups are healthy, the ALB operates as expected. The issue arises when I simulate one of the services on a node becoming unhealthy: I changed the health check path of e.g. api-node-1, and it shows up as unhealthy (Error 404), but traffic is still being sent to it. I confirmed this via both access logs and CloudWatch metrics (RequestCountPerTarget). I also tried simulating an unhealthy group by blocking the ALB's access, removing the relevant security group from the instance (Error 400).

Testing methods (with an unhealthy target group): I used curl (10-20 times) or a Grafana k6 load test and monitored traffic in both access logs and CloudWatch. Traffic was still being routed to all the instances, even though one of them was shown as unhealthy.

You can find another question that discussed this issue linked here.

Michael - sqlbot · Accepted Answer · 2024-10-05 19:06:41Z

The answer to the question you linked to is also the answer to the question you are asking, because health check status is disregarded whenever 100% of the targets in the group are failing their health checks.

With only one target in each group, a failing target means 100% of its group is failing, so the health check status is ignored and the out-of-service node still receives traffic.
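To illustrate the behavior described above, here is a minimal Python sketch of the routing decision. It is a simplified model, not the ALB's actual implementation: the function names `eligible_targets` and `simulate` are made up for this example, and it assumes that a weighted forward action picks a target group by weight without looking at group health, then applies the fail-open rule inside the chosen group.

```python
import random
from collections import Counter


def eligible_targets(targets):
    """Return the targets a group may route to.

    targets maps a target name to True (healthy) or False (unhealthy).
    Fail-open rule: if no target is healthy, every target stays eligible.
    """
    healthy = [name for name, is_healthy in targets.items() if is_healthy]
    return healthy if healthy else list(targets)


def simulate(groups, requests=3000, seed=0):
    """Count requests per target for a list of (weight, targets) groups.

    Assumption: the group is chosen by weight alone, regardless of health.
    """
    rng = random.Random(seed)
    weights = [weight for weight, _ in groups]
    hits = Counter()
    for _ in range(requests):
        _, targets = rng.choices(groups, weights=weights, k=1)[0]
        hits[rng.choice(eligible_targets(targets))] += 1
    return hits


# Setup from the question: one target per group, node-1 failing its check.
per_node_groups = [
    (1, {"node-0": True}),
    (1, {"node-1": False}),
    (1, {"node-2": True}),
]
# Alternative: a single group containing all three instances.
single_group = [(1, {"node-0": True, "node-1": False, "node-2": True})]

separate_hits = simulate(per_node_groups)
shared_hits = simulate(single_group)
print("separate groups:", dict(separate_hits))
print("single group:", dict(shared_hits))
```

In this model, node-1 keeps receiving roughly a third of the requests with one group per node, and none once all three instances share a group.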

Now I get it, so the solution would be to have a single target group with all 3 instances instead of separate target groups for each instance, correct?
– root69, Commented Oct 7, 2024 at 6:19
The issue with this is that I will not have any visibility in CloudWatch metrics, since I can only graph per target group, which means I will not see where traffic is being routed within that target group.
– root69, Commented Oct 7, 2024 at 6:38
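
One possible way to keep per-instance visibility with a single target group is to count requests per target from the ALB access logs, which the question already uses. The sketch below assumes the standard access log layout, where the fifth space-separated field is `target:port`. The function name `count_requests_per_target` and the sample lines are made up for illustration, and the lines are shortened. Downloading and decompressing the real log files from S3 is left out.

```python
from collections import Counter


def count_requests_per_target(log_lines):
    """Count access log entries per target:port (fifth field of each line)."""
    counts = Counter()
    for line in log_lines:
        # The first five fields contain no spaces, so a bounded split is safe
        # even though later fields (request, user agent) are quoted strings.
        fields = line.split(" ", 5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        counts[fields[4]] += 1
    return counts


# Made-up, truncated sample lines; real entries carry many more fields.
sample_lines = [
    "https 2024-10-07T06:00:00.000000Z app/example-alb/abc 203.0.113.10:50000 10.0.1.10:8080 0.001 0.002 0.000 200 200",
    "https 2024-10-07T06:00:01.000000Z app/example-alb/abc 203.0.113.10:50001 10.0.1.11:8080 0.001 0.002 0.000 200 200",
    "https 2024-10-07T06:00:02.000000Z app/example-alb/abc 203.0.113.10:50002 10.0.1.10:8080 0.001 0.002 0.000 200 200",
]

counts = count_requests_per_target(sample_lines)
for target, total in counts.most_common():
    print(target, total)
```

A target value of `-` in the logs means the load balancer could not send the request to any target, so it is worth reporting that bucket separately.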
