Overwhelmed by "TCP: time wait bucket table overflow" errors -- What can I do to mitigate?

Question

I've got a legacy system running Debian 7 (Proxmox) hosting OpenVZ containers, and I'm seeing a troublesome problem where the system is overwhelmed by open connections to the VZ container running the Apache frontend.

When this is happening, the log on the server fills with thousands of "TCP: time wait bucket table overflow (CT233)" errors. This is coupled with slow responses from the webserver. Is there anything I can do to mitigate this problem?

After googling around, I've made some tweaks to various conntrack settings, but I've been reluctant to do anything too radical without a better understanding of what the repercussions could be (or, indeed, whether the changes were likely to help at all).

To get an idea what the situation is, here is the output of "sysctl -a | grep conntrack" when this was happening today:

net.netfilter.nf_conntrack_generic_timeout = 480
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 120
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 60
net.netfilter.nf_conntrack_tcp_timeout_established = 345600
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 60
net.netfilter.nf_conntrack_tcp_timeout_last_ack = 30
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_close = 10
net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 300
net.netfilter.nf_conntrack_tcp_loose = 1
net.netfilter.nf_conntrack_tcp_be_liberal = 0
net.netfilter.nf_conntrack_tcp_max_retrans = 3
net.netfilter.nf_conntrack_udp_timeout = 30
net.netfilter.nf_conntrack_udp_timeout_stream = 180
net.netfilter.nf_conntrack_icmp_timeout = 30
net.netfilter.nf_conntrack_acct = 0
net.netfilter.nf_conntrack_events = 1
net.netfilter.nf_conntrack_events_retry_timeout = 15
net.netfilter.nf_conntrack_max = 131072
net.netfilter.nf_conntrack_count = 128397
net.netfilter.nf_conntrack_buckets = 32768
net.netfilter.nf_conntrack_checksum = 1
net.netfilter.nf_conntrack_log_invalid = 0
net.netfilter.nf_conntrack_expect_max = 256
net.nf_conntrack_max = 131072
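One thing the output makes visible is how close the conntrack table is to its limit: nf_conntrack_count is 128397 against an nf_conntrack_max of 131072. As a rough sketch (the function name conntrack_utilization is made up for illustration, and the sample text is just the two relevant values from the output), the ratio can be computed from "name = value" pairs like this:

```python
def conntrack_utilization(sysctl_text):
    """Return nf_conntrack_count / nf_conntrack_max as a fraction.

    Expects text containing 'name = value' pairs, one per line,
    in the format printed by 'sysctl -a'.
    """
    values = {}
    for line in sysctl_text.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # skip lines without an '=' sign
            values[name.strip()] = value.strip()
    count = int(values["net.netfilter.nf_conntrack_count"])
    limit = int(values["net.netfilter.nf_conntrack_max"])
    return count / limit


# The two values below are copied from the output in the question.
sample = """\
net.netfilter.nf_conntrack_max = 131072
net.netfilter.nf_conntrack_count = 128397
"""

print("conntrack table in use: {:.1%}".format(conntrack_utilization(sample)))
```

Note that this measures the netfilter connection-tracking table only. It is a separate limit from the TIME_WAIT bucket table named in the log message.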

This includes a few changes that I made today: I doubled nf_conntrack_buckets from 16384 to 32768, I shrank nf_conntrack_generic_timeout from 600s to 480s, and I shrank nf_conntrack_tcp_timeout_established from 5d (432000s) to 4d (345600s).

The vast majority of the open connections at any given time are in TIME_WAIT.
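For anyone wanting to check the same thing, here is a minimal sketch of tallying socket states from text in the /proc/net/tcp format, where the fourth column is the state as a hex code and 06 means TIME_WAIT. The function name count_tcp_states and the sample rows are illustrative, not taken from the affected server, and only a handful of state codes are given names here.

```python
from collections import Counter

# Hex state codes as they appear in the 'st' column of /proc/net/tcp.
# Only a few are named; any other code is reported as-is.
TCP_STATES = {
    "01": "ESTABLISHED",
    "06": "TIME_WAIT",
    "08": "CLOSE_WAIT",
    "0A": "LISTEN",
}


def count_tcp_states(proc_net_tcp):
    """Count sockets per TCP state in /proc/net/tcp-formatted text."""
    counts = Counter()
    for line in proc_net_tcp.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) > 3:
            counts[TCP_STATES.get(fields[3], fields[3])] += 1
    return counts


# Hypothetical sample rows, shortened to the leading columns.
sample = """\
  sl  local_address rem_address   st tx_queue rx_queue
   0: 00000000:0050 00000000:0000 0A 00000000:00000000
   1: 0100007F:0050 0100007F:C350 06 00000000:00000000
   2: 0100007F:0050 0100007F:C351 06 00000000:00000000
   3: 0100007F:0050 0100007F:C352 01 00000000:00000000
"""

for state, n in count_tcp_states(sample).most_common():
    print(state, n)
```

On a live Linux system the input would come from reading /proc/net/tcp (and /proc/net/tcp6 for IPv6) instead of the sample string.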

I'm hoping someone with more knowledge of TCP and kernel tuning than I have can recommend something.

Thanks!

Ross Messiah · Accepted Answer · 2016-10-26 09:03:49Z

I ended up adjusting two other variables, doubling each of them: "net.ipv4.tcp_max_tw_buckets" and "net.ipv4.tcp_max_tw_buckets_ub". Since making those changes, the "time wait bucket table overflow" errors have not recurred. I'm going to keep an eye on it over the next week or so to see whether this has actually fixed the issue.
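As a sketch of what that doubling looks like in practice, the snippet below builds the corresponding "sysctl -w" commands from the current values. The starting values are placeholders, since the original numbers are not given here, and the helper name doubled_sysctl_commands is made up for illustration. It only prints the commands and does not change anything.

```python
def doubled_sysctl_commands(current):
    """Build 'sysctl -w' commands that double each given setting."""
    return [
        "sysctl -w {}={}".format(name, value * 2)
        for name, value in current.items()
    ]


# Placeholder starting values; read the real ones with 'sysctl <name>' first.
current = {
    "net.ipv4.tcp_max_tw_buckets": 262144,
    "net.ipv4.tcp_max_tw_buckets_ub": 16536,
}

for command in doubled_sysctl_commands(current):
    print(command)
```

Values set with "sysctl -w" do not survive a reboot; to keep them, the same name=value pairs would also need to go into /etc/sysctl.conf.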
