I have an Ubuntu 10.10 server with plenty of RAM, bandwidth and CPU. I'm seeing a strange, repeatable pattern in the distribution of latencies when serving static files from both Apache and nginx. Because the problem is common to both HTTP servers, I'm wondering whether I have misconfigured or poorly tuned Ubuntu's networking or cache parameters.
ab -n 1000 -c 4 http://apache-host/static-file.jpg:

    Percentage of the requests served within a certain time (ms)
      50%      5
      66%   3007
      75%   3009
      80%   3011
      90%   9021
      95%   9032
      98%  21068
      99%  45105
     100%  45105 (longest request)

ab -n 1000 -c 4 http://nginx-host/static-file.jpg:

    Percentage of the requests served within a certain time (ms)
      50%     19
      66%     19
      75%   3011
      80%   3017
      90%   9021
      95%  12026
      98%  12028
      99%  18063
     100%  18063 (longest request)

The results consistently follow this kind of pattern: 50% or more of requests are served as expected, then the remainder fall into discrete bands, with the slowest a few orders of magnitude slower.
Apache is 2.x and has mod_php installed. nginx is 1.0.x and has Passenger installed (but neither app server should be in the critical path for a static file). Load average was around 1 when each test was run (the server has 12 physical cores). 5GB free RAM, 7GB cached swap. Tests were run from localhost.
Here are the configuration changes I have made from Ubuntu server 10.10 defaults:
/etc/sysctl.conf:

    net.core.rmem_default = 65536
    net.core.wmem_default = 65536
    net.core.rmem_max = 16777216
    net.core.wmem_max = 16777216
    net.ipv4.tcp_rmem = 4096 87380 16777216
    net.ipv4.tcp_wmem = 4096 65536 16777216
    net.ipv4.tcp_mem = 16777216 16777216 16777216
    net.ipv4.tcp_window_scaling = 1
    net.ipv4.route.flush = 1
    net.ipv4.tcp_no_metrics_save = 1
    net.ipv4.tcp_moderate_rcvbuf = 1
    net.core.somaxconn = 8192

/etc/security/limits.conf:

    * hard nofile 65535
    * soft nofile 65535
    root hard nofile 65535
    root soft nofile 65535

Other config:

    ifconfig eth0 txqueuelen 1000

Please let me know if this kind of problem rings any bells, or if more information about the config would be helpful. Thanks for your time.
Update: Here's what I'm seeing after increasing net.netfilter.nf_conntrack_max as suggested below:
    Percentage of the requests served within a certain time (ms)
      50%      2
      66%      2
      75%      2
      80%      2
      90%      3
      95%      3
      98%      3
      99%      3
     100%      5 (longest request)
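For anyone wondering about the discrete bands in the original runs: the values around 3, 9, 21 and 45 seconds look like what TCP SYN retransmission with exponential backoff would produce when the initial packet of a connection is dropped. The sketch below only does the arithmetic. It assumes a 3-second initial timeout that doubles on every retry, which I believe was the default on kernels of that era; `syn_retry_times` is a made-up helper name, not a real tool.

```shell
#!/bin/sh
# Hypothetical helper: print the cumulative delay (in seconds) after each
# retransmission, given an initial timeout that doubles on every retry.
# Assumption: 3 s initial timeout, not read from the running kernel.
syn_retry_times() {
    rto=$1        # initial timeout in seconds
    retries=$2    # how many retransmissions to list
    elapsed=0
    i=0
    out=""
    while [ "$i" -lt "$retries" ]; do
        elapsed=$(( elapsed + rto ))   # time at which this retry is sent
        out="$out $elapsed"
        rto=$(( rto * 2 ))             # exponential backoff
        i=$(( i + 1 ))
    done
    echo "${out# }"                    # strip the leading space
}

syn_retry_times 3 4   # prints: 3 9 21 45
```

That series lines up with the 3007, 9021, 21068 and 45105 ms figures from the Apache run. The 12026 and 18063 ms values from the nginx run do not fit it directly, so treat this as a plausible reading rather than a full explanation.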
dmesg?

ab side of things....

http://localhost? Could there be a DNS bottleneck? How big is this static file? Even your 50% doesn't add up to me, should be 0-1 ms for localhost

dmesg told the tale: nf_conntrack: table full, dropping packet. Did sudo sysctl -w net.netfilter.nf_conntrack_max=131072 and the problem is gone: 100% of requests in 6ms. Thank you, @KyleBrandt!
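A rough way to check whether the connection tracking table is close to full is to compare the current entry count with the limit. This is a minimal sketch, assuming the nf_conntrack module is loaded so that the two files under /proc/sys/net/netfilter exist; `conntrack_usage_pct` is a made-up helper name for illustration.

```shell
#!/bin/sh
# Hypothetical helper: integer percentage of the conntrack table in use.
conntrack_usage_pct() {
    used=$1
    limit=$2
    echo $(( used * 100 / limit ))
}

count_file=/proc/sys/net/netfilter/nf_conntrack_count
max_file=/proc/sys/net/netfilter/nf_conntrack_max

# These files are only present when the nf_conntrack module is loaded.
if [ -r "$count_file" ] && [ -r "$max_file" ]; then
    count=$(cat "$count_file")
    max=$(cat "$max_file")
    echo "conntrack entries: $count of $max ($(conntrack_usage_pct "$count" "$max")% used)"
else
    echo "nf_conntrack does not appear to be loaded; nothing to check"
fi

# Count kernel log lines reporting drops (reading dmesg may need root).
dmesg 2>/dev/null | grep -c 'nf_conntrack: table full' || true
```

Running this while the benchmark is in progress should show the percentage climbing toward 100 if the table is the bottleneck. A value set with sysctl -w lasts only until the next reboot; adding the same key to /etc/sysctl.conf is the usual way to keep it, provided the module is already loaded when that file is applied.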