According to https://lwn.net/Articles/629155/ the Linux kernel "can only forward something between 1M and 2M packets per core every second". How well does Linux scale across dozens of cores, then?

Say I have a multiqueue (128 rx+tx pairs) 100 Gbps NIC on a multicore CPU. Will Linux be able to saturate the 100 Gbps NIC, i.e. scale "between 1M and 2M packets per core every second" across several dozen cores without much regression, to reach an overall throughput of around 10 to 20M packets/sec?

I've also read somewhere that Linux has a hard time scaling network performance beyond 4 cores. Is that still true for the latest kernel versions?

1 Answer

Realize that 10 Mpps on a single host is pushing scalability and will require tuning to perform well. The RHEL Network Performance Tuning Guide goes into depth on some of this, from NIC offloading to NUMA effects.

At 10 Mpps, even small 1000-byte packets add up to 80 Gbps, so a 100 Gb NIC is required.

10 Mpps leaves about 100 ns per packet. That is not a lot, only a few hundred CPU cycles.
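The arithmetic behind these budget figures can be checked with a short sketch. The function names and the 3 GHz clock are assumptions chosen for illustration, and the bandwidth figure counts packet bytes only, ignoring Ethernet framing overhead.

```python
def per_packet_budget_ns(packets_per_second: float) -> float:
    """Average time between packets, in nanoseconds, at a given aggregate rate."""
    return 1e9 / packets_per_second


def cycles_per_packet(packets_per_second: float, clock_hz: float, cores: int = 1) -> float:
    """CPU cycles available per packet if the load is spread evenly over `cores`.

    Even spreading is an idealisation; real RSS/queue distribution is rarely perfect.
    """
    return clock_hz * cores / packets_per_second


def payload_gbps(packets_per_second: float, packet_bytes: int) -> float:
    """Bandwidth in Gbps for a packet rate and size (framing overhead ignored)."""
    return packets_per_second * packet_bytes * 8 / 1e9


if __name__ == "__main__":
    rate = 10e6          # 10 Mpps, the figure discussed above
    clock = 3e9          # assumed 3 GHz core clock
    print(per_packet_budget_ns(rate))
    print(cycles_per_packet(rate, clock))
    print(payload_gbps(rate, 1000))
```

For 10 Mpps this works out to 100 ns per packet, 300 cycles on one assumed 3 GHz core, and 80 Gbps with 1000-byte packets. Passing `cores=16` shows how the per-packet cycle budget grows if the queues really do spread evenly.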


Just to drop 10 Mpps, Cloudflare experimented with bypassing netfilter entirely and used XDP. This is a bit exotic if your standard host security model assumes netfilter with conntrack. It also cheats a bit on bandwidth: they used 10 Gb NICs, on the assumption of a denial-of-service attack made of tiny packets.

ESnet has achieved 78 Gbps single flows with everyone's favorite do-nothing benchmarks, iperf and nuttcp. Notably, they used 9000-byte packets, so this is "only" about 1 million PPS. Still, some tuning was required, of the sort you do for high-end databases:

  • Use the correct PCI-E slot, at least version 3 x16.
  • Set CPUs to performance rather than power saving.
  • Bind CPUs to the same NUMA node as the NIC. Socket interconnect speed matters.
  • Maximize Linux TCP buffers.
  • Update the NIC driver and firmware.

Not too bad, considering further tuning was required on older kernels.
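Before changing anything, it can help to record where a host stands on the tuning items listed above. The following is a minimal sketch, not ESnet's procedure: it reads a few sysfs and procfs files that typical Linux systems expose for PCI NICs, and reports `None` for anything missing. The function names, the `root` parameter, and the interface name `eth0` are hypothetical, and some files (for example the cpufreq governor) may not exist on every machine.

```python
from pathlib import Path


def read_value(path: Path):
    """Return the stripped contents of a small sysfs/procfs file, or None if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def tuning_snapshot(iface: str, root: str = "/") -> dict:
    """Collect current settings related to the tuning checklist.

    `root` exists only so the function can be pointed at a copied or fake
    tree for testing; on a live host leave it as "/".
    """
    base = Path(root)
    nic = base / "sys/class/net" / iface / "device"
    return {
        # PCIe link actually negotiated by the NIC
        "pcie_link_speed": read_value(nic / "current_link_speed"),
        "pcie_link_width": read_value(nic / "current_link_width"),
        # NUMA node the NIC is attached to (-1 means no NUMA information)
        "nic_numa_node": read_value(nic / "numa_node"),
        # Frequency governor of CPU 0 only; other CPUs may differ
        "cpu0_governor": read_value(
            base / "sys/devices/system/cpu/cpu0/cpufreq/scaling_governor"
        ),
        # Socket and TCP buffer limits
        "rmem_max": read_value(base / "proc/sys/net/core/rmem_max"),
        "wmem_max": read_value(base / "proc/sys/net/core/wmem_max"),
        "tcp_rmem": read_value(base / "proc/sys/net/ipv4/tcp_rmem"),
        "tcp_wmem": read_value(base / "proc/sys/net/ipv4/tcp_wmem"),
    }


if __name__ == "__main__":
    for key, value in tuning_snapshot("eth0").items():  # "eth0" is a placeholder
        print(f"{key}: {value}")
```

The sketch only reads values. Binding work to the NIC's NUMA node or raising buffer limits is left to the usual tools, and driver and firmware versions are not covered here.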

Neither Cloudflare nor ESnet is doing significant computation within these packet flow benchmarks. Doing useful work would be another variable in scalability. Perhaps scale out instead: haproxy in front of a dozen backend hosts, each handling an easier-to-achieve 1 Mpps. Maybe that still hits some of the forwarding scalability limits mentioned by LWN; it is hard to say.


Regarding the latest kernel, that LWN article is from 5 years ago and will not be up to date. Run newer kernels on the latest NICs and tune your own benchmarks, especially if you want to write your own "How to drop 20 Mpps" article.
