We have a cluster of 3 hosts (actually an Apache Doris cluster, with the servers transmitting data via a third-party library brpc). When transmitting data between servers via TCP, we often encounter segment retransmission problems (perhaps packet loss, the reason is currently unknown).
Through tcpdump packet capture, we found that the system has SACK enabled, and the sender initiates retransmission only after receiving hundreds of dup ACKs.
I understand that, according to the basic fast retransmit rule, the sender initiates a fast retransmission after receiving 3 dup ACKs. I'm also aware that some RFCs allow the system to adjust DupThresh dynamically, but in my tcpdump capture the retransmission was initiated only after 429 dup ACKs (it may actually be an RTO timeout retransmission). Can the dynamic DupThresh really be that large? Or are there other retransmission rules? Please enlighten me!
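To make the rule I am referring to concrete, here is a minimal Python sketch of dup-ACK counting against a threshold. It is a simplified model, not how the Linux kernel actually decides: it treats any ACK with the same acknowledgment number as a duplicate and ignores SACK blocks, window updates, and timers. The function name `first_fast_retransmit` and the ACK value `1000` are made up for illustration.

```python
def first_fast_retransmit(acks, dupthresh=3):
    """Return the index of the ACK that would trigger fast retransmit.

    `acks` is a list of acknowledgment numbers in arrival order.
    Simplified model (assumption): an ACK counts as a duplicate when its
    acknowledgment number equals the previous one. Returns None if the
    threshold is never reached.
    """
    last_ack = None
    dup_count = 0
    for i, ack in enumerate(acks):
        if ack == last_ack:
            dup_count += 1
            if dup_count == dupthresh:
                return i
        else:
            # A new cumulative ACK resets the duplicate counter.
            last_ack = ack
            dup_count = 0
    return None


# One original ACK followed by 429 duplicates, as in my capture.
trace = [1000] * 430
print(first_fast_retransmit(trace))                 # 3
print(first_fast_retransmit(trace, dupthresh=300))  # 300
print(first_fast_retransmit(trace, dupthresh=430))  # None
```

Under this model a threshold of 3 would fire on the third duplicate, and even a threshold of 300 would fire well before the 429th duplicate, which is why the behavior in my capture puzzles me.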
Below are some screenshots of the tcpdump results. I will post the complete packet capture files later. Take the packet numbered 486831 as an example, which is a packet sent from server 64 to 67. Packet No. 486888 is the first ACK from 67 for this packet.
Then 64 sent several more packets to 67. Packet No. 486991 is the second ACK for packet No. 486831, packet No. 486995 is the third, packet No. 487045 is the fourth, and so on. (Screenshot: dup-ACKs)
It was not until packet No. 496896 that server 64 retransmitted packet No. 486831, after receiving the 430th ACK, by which point more than 7 seconds had passed since the first transmission of this data packet. (Screenshot: many-dup-ACKs-and-retransmit)
I set `/proc/sys/net/ipv4/tcp_max_reordering` to a small value of `10`, but `ss -i` shows that `reordering` of new TCP connections is still `300`, which is the old value. The OS kernel was upgraded from `3.10.0-1160.el7.x86_64` to `5.15.0-1.el7.elrepo.x86_64`. The IT team said they only upgraded the kernel (I'm not sure how they did it). Is it possible that a kernel and library mismatch prevents the new `tcp_max_reordering` value from taking effect for new TCP connections?
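For anyone who wants to reproduce the comparison, this is roughly how the check can be scripted in Python. It is only a sketch: the helper names `read_sysctl` and `reordering_values` are my own, and the `ss -i` text in the example is a made-up sample rather than real output from my hosts. It assumes `ss -i` prints the per-connection value as a `reordering:N` token.

```python
import re
from pathlib import Path

SYSCTL_DIR = Path("/proc/sys/net/ipv4")


def read_sysctl(name, base=SYSCTL_DIR):
    """Return the integer value of a sysctl file, or None if unreadable."""
    try:
        return int((base / name).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def reordering_values(ss_output):
    """Extract every 'reordering:N' value from `ss -i` style text."""
    return [int(n) for n in re.findall(r"\breordering:(\d+)", ss_output)]


# Hypothetical sample of `ss -i` text, for illustration only.
sample = (
    "cubic wscale:7,7 rto:204 rtt:0.3/0.1 mss:1448 cwnd:10 reordering:300\n"
    "cubic wscale:7,7 rto:201 rtt:0.2/0.1 mss:1448 cwnd:10\n"
)

limit = read_sysctl("tcp_max_reordering")  # None if the file is not present
for value in reordering_values(sample):
    if limit is not None and value > limit:
        print(f"connection reordering {value} exceeds tcp_max_reordering {limit}")
    else:
        print(f"connection reordering {value}, tcp_max_reordering {limit}")
```

On a real host, the `sample` string would be replaced with the captured output of `ss -i` for the connections in question.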