
I have 3 instances (node-0, node-1, node-2) running 2 services: a WebSocket service and an API. Both services run on each instance.

Target Group Setup:

| Target Group | Instance | Health Check Path |
|---|---|---|
| api-node-0 | node-0 | /some-path/api/v1/ping |
| api-node-1 | node-1 | /some-path/api/v1/ping |
| api-node-2 | node-2 | /some-path/api/v1/ping |
| websocket-node-0 | node-0 | /some-path/websocket/v1/ping |
| websocket-node-1 | node-1 | /some-path/websocket/v1/ping |
| websocket-node-2 | node-2 | /some-path/websocket/v1/ping |

Listener and Rules:

HTTPS:443 Listener

Rules:

api

  • Condition: Path /some-path/api/*
  • Action: Forward to target group:
    • api-node-0 (33.33%)
    • api-node-1 (33.33%)
    • api-node-2 (33.33%)
    • Stickiness: Off

websocket

  • Condition: Path /some-path/websocket/*
  • Action: Forward to target group:
    • websocket-node-0 (33.33%)
    • websocket-node-1 (33.33%)
    • websocket-node-2 (33.33%)
    • Stickiness: Off

default

  • Condition: No other rule applies
  • Action: Forward to target group:
    • api-node-0 (100%)

Health Check attributes:

  • Interval: 30 seconds
  • Timeout: 5 seconds
  • Healthy threshold: 2 consecutive health check successes
  • Unhealthy threshold: 2 consecutive health check failures
  • Success codes: 200
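Given these settings, a rough bound on how long the ALB needs before it marks a failing target unhealthy is handy when timing the simulation. This is a simplified sketch that assumes checks run exactly every `Interval` seconds with no jitter; `unhealthy_detection_window` is a hypothetical helper, not an AWS API.

```python
# Rough bounds on how long the ALB takes to mark a target unhealthy,
# assuming health checks fire exactly every `interval` seconds (no jitter).

def unhealthy_detection_window(interval: float, timeout: float,
                               unhealthy_threshold: int,
                               fails_by_timeout: bool = False) -> tuple[float, float]:
    """Return (best_case, worst_case) seconds from the moment a target
    starts failing until it is marked unhealthy."""
    # A fast failure (e.g. a 404) resolves almost immediately; a check
    # that times out takes `timeout` seconds to count as a failure.
    resolve = float(timeout) if fails_by_timeout else 0.0
    # Best case: the failure begins just before a check, so the first
    # failing check happens right away.
    best = (unhealthy_threshold - 1) * interval + resolve
    # Worst case: the failure begins just after a successful check, so a
    # full interval passes before the first failing check.
    worst = unhealthy_threshold * interval + resolve
    return best, worst

# Interval 30 s, timeout 5 s, unhealthy threshold 2 (settings above)
print(unhealthy_detection_window(30, 5, 2))                         # (30.0, 60.0)
print(unhealthy_detection_window(30, 5, 2, fails_by_timeout=True))  # (35.0, 65.0)
```

So with a fast 404, expect roughly 30 to 60 seconds before the target shows as unhealthy.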

Load Balancer attributes:

  • HTTP client keepalive duration: 3600 seconds
  • Connection idle timeout: 60 seconds
  • X-Forwarded-For header: Append
  • Cross-zone load balancing: On

P.S. If you need any more information about the setup, please let me know.

During normal testing, when all target groups are healthy, the ALB operates as expected. The issue arises when I simulate one of the services on a node becoming unhealthy. For example, I changed the health check path of api-node-1: the target shows up as unhealthy (Error 404), but traffic is still being sent to it. I confirmed this both via the access logs and via the CloudWatch metric RequestCountPerTarget. As another simulation of an unhealthy group, I also blocked the ALB's access by removing the relevant security group from the instance (Error 400).

Testing method (with an unhealthy target group): I sent requests using curl (10-20 times) or a Grafana k6 load test and monitored the traffic in both the access logs and CloudWatch. Traffic was still routed to all the instances, even though one of them was shown as unhealthy.

Another question that discusses this issue is linked here.

1 Answer


The answer to the question you linked to is also the answer to the question you are asking: health check status is disregarded whenever 100% of the targets in a group are failing their health checks. In that case the group "fails open" and routes requests to all of its targets regardless of health.

With only one target in each group, a single failing target means 100% of that group is failing, so the group's health check status is always ignored and the out-of-service node still receives traffic.
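To make the fail-open behaviour concrete, here is a simplified model of the routing (an assumption for illustration, not AWS's actual implementation): the forward action picks a target group by weight, and the group then picks among its healthy targets, or among all of them when none are healthy. It compares the per-node layout from the question with the single-group layout proposed in the comments; `eligible` and `traffic_share` are hypothetical names.

```python
from fractions import Fraction

def eligible(targets: dict[str, bool]) -> list[str]:
    """Targets a group may route to: healthy ones, or all of them if
    none are healthy (fail open)."""
    healthy = [t for t, ok in targets.items() if ok]
    return healthy if healthy else list(targets)

def traffic_share(groups: dict[str, tuple[int, dict[str, bool]]]) -> dict[str, Fraction]:
    """Expected fraction of requests each target receives.
    `groups` maps group name -> (weight, {target: is_healthy})."""
    total_weight = sum(weight for weight, _ in groups.values())
    share: dict[str, Fraction] = {}
    for weight, targets in groups.values():
        picks = eligible(targets)
        for t in picks:
            # Group's weighted share, split evenly across its eligible targets
            share[t] = share.get(t, Fraction(0)) + Fraction(weight, total_weight * len(picks))
    return share

# Layout from the question: one target per group, node-1 failing.
per_node = {
    "api-node-0": (1, {"node-0": True}),
    "api-node-1": (1, {"node-1": False}),
    "api-node-2": (1, {"node-2": True}),
}
# Layout from the comments: one group holding all three nodes.
single = {"api": (1, {"node-0": True, "node-1": False, "node-2": True})}

print(traffic_share(per_node))  # node-1 still receives 1/3 of requests
print(traffic_share(single))    # node-1 receives nothing
```

In the single-group layout the unhealthy node only drops out while at least one other node is healthy; if all three failed, that group would fail open too.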

  • Now I get it, so the solution would be to have a single target group with all 3 instances instead of separate target groups for each instance, correct? Commented Oct 7, 2024 at 6:19
  • The issue with this is that I will lose visibility in CloudWatch metrics, since I can only graph per target group, which means I will not see which instance within that target group traffic is being routed to. Commented Oct 7, 2024 at 6:38
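One possible way to keep per-instance visibility with a single target group is the ALB access logs, which already record the `target:port` that handled each request (the 5th field of each entry). A minimal sketch, assuming the log files have already been downloaded from S3 and decompressed into lines; the sample entries, IP addresses and load balancer name are made up and cut off after the `user_agent` field, and `requests_per_target` is a hypothetical helper.

```python
import csv
from collections import Counter

# Made-up sample entries in the ALB access log layout, truncated after the
# user_agent field. Index 4 is target:port.
SAMPLE_LOG = [
    'https 2024-10-07T06:00:00.000000Z app/my-alb/0123456789abcdef 203.0.113.10:51000 10.0.1.10:8080 0.000 0.002 0.000 200 200 120 340 "GET https://example.com:443/some-path/api/v1/items HTTP/1.1" "curl/8.4.0"',
    'https 2024-10-07T06:00:01.000000Z app/my-alb/0123456789abcdef 203.0.113.10:51001 10.0.1.11:8080 0.000 0.002 0.000 200 200 120 340 "GET https://example.com:443/some-path/api/v1/items HTTP/1.1" "curl/8.4.0"',
    'https 2024-10-07T06:00:02.000000Z app/my-alb/0123456789abcdef 203.0.113.10:51002 10.0.1.11:8080 0.000 0.002 0.000 200 200 120 340 "GET https://example.com:443/some-path/api/v1/items HTTP/1.1" "curl/8.4.0"',
]

def requests_per_target(lines):
    """Count requests per target IP from ALB access log lines."""
    counts = Counter()
    # Fields are space-separated; the request and user-agent fields are
    # wrapped in double quotes and contain spaces themselves.
    for fields in csv.reader(lines, delimiter=" ", quotechar='"'):
        if len(fields) < 5 or fields[4] == "-":
            continue  # malformed line, or request never reached a target
        target_ip = fields[4].rsplit(":", 1)[0]  # drop the port
        counts[target_ip] += 1
    return counts

print(requests_per_target(SAMPLE_LOG))  # Counter({'10.0.1.11': 2, '10.0.1.10': 1})
```

Mapping each private IP back to node-0, node-1 or node-2 then shows where traffic within the single target group is going.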
