I've got a legacy system running Debian 7 (Proxmox) hosting OpenVZ containers, and I'm seeing a troublesome problem where the system is being overwhelmed by open connections to the VZ container running the Apache frontend.
When this is happening, the log on the server fills with thousands of "TCP: time wait bucket table overflow (CT233)" errors. This is coupled with slow responses from the webserver. Is there anything I can do to mitigate this problem?
After googling around, I've made some tweaks to various conntrack settings, but I've been reluctant to do anything too radical without a better understanding of what the repercussions could be (or, indeed, whether it is likely to help at all).
To get an idea what the situation is, here is the output of "sysctl -a | grep conntrack" when this was happening today:
net.netfilter.nf_conntrack_generic_timeout = 480
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 120
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 60
net.netfilter.nf_conntrack_tcp_timeout_established = 345600
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 60
net.netfilter.nf_conntrack_tcp_timeout_last_ack = 30
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_close = 10
net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 300
net.netfilter.nf_conntrack_tcp_loose = 1
net.netfilter.nf_conntrack_tcp_be_liberal = 0
net.netfilter.nf_conntrack_tcp_max_retrans = 3
net.netfilter.nf_conntrack_udp_timeout = 30
net.netfilter.nf_conntrack_udp_timeout_stream = 180
net.netfilter.nf_conntrack_icmp_timeout = 30
net.netfilter.nf_conntrack_acct = 0
net.netfilter.nf_conntrack_events = 1
net.netfilter.nf_conntrack_events_retry_timeout = 15
net.netfilter.nf_conntrack_max = 131072
net.netfilter.nf_conntrack_count = 128397
net.netfilter.nf_conntrack_buckets = 32768
net.netfilter.nf_conntrack_checksum = 1
net.netfilter.nf_conntrack_log_invalid = 0
net.netfilter.nf_conntrack_expect_max = 256
net.nf_conntrack_max = 131072

This includes a few changes that I made today: I doubled nf_conntrack_buckets from 16384 to 32768, I shrank conntrack_generic_timeout from 600s to 480s, and I shrank conntrack_tcp_timeout_established from 5d to 4d.
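To put those numbers in perspective, here is a minimal Python sketch that works out how full the conntrack table is from the values above. The function names (parse_sysctl, conntrack_usage) are my own, and the sketch assumes the "key = value" format that sysctl prints; it only does arithmetic on the figures already shown.

```python
import re


def parse_sysctl(text):
    """Extract 'key = integer' pairs from sysctl-style output.

    Works whether the pairs are one per line or run together on one line.
    """
    return {key: int(value) for key, value in re.findall(r"([\w.]+) = (\d+)", text)}


def conntrack_usage(settings):
    """Return (fraction of the table in use, entries per bucket when full)."""
    count = settings["net.netfilter.nf_conntrack_count"]
    maximum = settings["net.netfilter.nf_conntrack_max"]
    buckets = settings["net.netfilter.nf_conntrack_buckets"]
    return count / maximum, maximum / buckets


# Values copied from the sysctl output in this post.
sample = """
net.netfilter.nf_conntrack_max = 131072
net.netfilter.nf_conntrack_count = 128397
net.netfilter.nf_conntrack_buckets = 32768
"""

fill, per_bucket = conntrack_usage(parse_sysctl(sample))
print(f"table fill: {fill:.1%}, entries per bucket when full: {per_bucket:.0f}")
```

With the figures above this comes out at roughly 98% of nf_conntrack_max in use, with 4 entries per hash bucket when the table is full.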
The vast majority of the open connections at any given time are in TIME_WAIT.
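For anyone wanting to check the same thing, this is a rough Python sketch of how the per-state breakdown can be tallied from /proc/net/tcp. It assumes the usual Linux layout, where the fourth whitespace-separated field is the socket state as a hex code (06 being TIME_WAIT), and it covers IPv4 sockets only. The function name count_tcp_states is just illustrative, and it would need to be run in whichever context (host or container) owns the sockets of interest.

```python
from collections import Counter

# Hex state codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text):
    """Count sockets per TCP state given the contents of /proc/net/tcp."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        code = fields[3].upper()
        counts[TCP_STATES.get(code, code)] += 1
    return counts


if __name__ == "__main__":
    try:
        with open("/proc/net/tcp") as handle:
            text = handle.read()
    except OSError:
        text = ""  # not on Linux, or /proc is unavailable
    for state, number in count_tcp_states(text).most_common():
        print(f"{state:12s} {number}")
```

The states are printed most common first, so if TIME_WAIT dominates it shows up on the first line.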
I'm hoping someone with more knowledge of TCP/kernel tuning than I have can recommend something.
Thanks!