I have two WebSocket servers running with ZooKeeper & Curator in an active/passive setup: if one server fails, the second backend takes over.
I configured it the following way:
upstream backend {
    server 172.31.9.1:8080 max_fails=1 fail_timeout=5s;
    server 172.31.9.0:8080 max_fails=1 fail_timeout=5s;
}

server {
    listen 443;

    # host name to respond to
    server_name xxxxxx.compute.amazonaws.com;

    ssl on;
    ssl_certificate /etc/ssl/certs/wildcard.dev.xxxx.net.crt;
    ssl_certificate_key /etc/ssl/certs/wildcard.dev.xxxx.net.key;

    location / {
        # switch off logging
        access_log off;

        proxy_pass http://backend;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # WebSocket support (nginx 1.4)
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}

What I expect is that when the active and passive servers switch places, Nginx takes about 5 seconds to recognize the failure and redirects all traffic to the new active server.
What actually happens is that it takes up to 25 seconds to recognize the new active server and switch all the traffic over.
In a real scenario, I can handle up to 10 seconds of downtime during the switchover.
What am I missing?
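In case it's relevant, below is a minimal sketch (not my current config) of the proxy directives that, as far as I understand, bound how long Nginx keeps trying an unreachable peer before moving on to the next upstream; the values are only illustrative, and proxy_connect_timeout defaults to 60 seconds if I read the docs correctly.

# Illustrative sketch, not my actual config: directives that (as I understand it)
# control how quickly Nginx gives up on a dead upstream and retries the next one.
location / {
    proxy_pass http://backend;

    # Time allowed for establishing a TCP connection to the upstream;
    # the 60s default would dominate failover time if the peer is unreachable.
    proxy_connect_timeout 5s;

    # Retry the request on the next server in the upstream group
    # when the current one errors out or times out.
    proxy_next_upstream error timeout;

    # Same WebSocket handshake headers as in my config above.
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
}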