Realize that 10 Mpps on a single host is pushing scalability, and will require tuning to perform well. The RHEL Network Performance Tuning Guide goes into depth on some of this, from NIC offloading to NUMA effects.
Even small 1000 byte packets at that rate work out to 80 Gb/s, so a 100 Gb NIC is required.
10 Mpps also leaves roughly 100 ns per packet. Not a lot: at 3 GHz that is only a couple hundred CPU cycles.
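The same arithmetic as a minimal C sketch, in case you want to plug in your own numbers. The 1000-byte packet size and the 3 GHz core are assumptions for illustration, not figures from any of the write-ups below:

```c
/* Budget check for 10 Mpps: wire bandwidth, time per packet, cycles per packet. */
#include <stdio.h>

int main(void)
{
    double pps       = 10e6;    /* target packet rate */
    double pkt_bytes = 1000.0;  /* assumed packet size */
    double cpu_hz    = 3e9;     /* assumed clock of a single core */

    double gbps       = pps * pkt_bytes * 8 / 1e9; /* bits on the wire per second */
    double ns_per_pkt = 1e9 / pps;                 /* time budget per packet */
    double cycles     = cpu_hz / pps;              /* cycle budget per packet per core */

    printf("%.0f Gb/s, %.0f ns and %.0f cycles per packet\n",
           gbps, ns_per_pkt, cycles);              /* 80 Gb/s, 100 ns, 300 cycles */
    return 0;
}
```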
To simply drop 10 Mpps, Cloudflare experimented with bypassing netfilter entirely and used XDP. This is a bit exotic if your standard host security model assumes netfilter with conntrack. It also cheats a bit on the bandwidth: they used 10 Gb NICs, assuming a denial of service via tiny packets.
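For flavor, here is roughly the smallest possible XDP program: it drops everything on the interface it is attached to. This is only a sketch of the mechanism, not Cloudflare's actual filter, which parses headers and matches attack traffic before deciding to drop. It should build with clang -O2 -target bpf against the libbpf headers and attach with something like `ip link set dev eth0 xdp obj drop.o sec xdp`.

```c
/* drop.c: drop every packet in the NIC driver, before netfilter or the
 * TCP/IP stack ever sees it. A real filter would inspect the packet
 * (ctx->data .. ctx->data_end) and only return XDP_DROP on a match. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_drop_all(struct xdp_md *ctx)
{
    return XDP_DROP;
}

char _license[] SEC("license") = "GPL";
```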
ESnet has achieved 78 Gbps single flows with everyone's favorite do-nothing benchmarks, iperf and nuttcp. Notably, they used 9000 byte jumbo frames, so this is "only" about 1 million PPS. Still, some tuning was required, of the sort you do for high-end databases:
- Use the correct PCI-E slot, at least version 3 x16.
- Set the CPU frequency governor to performance rather than power saving.
- Bind CPUs to the same NUMA node as the NIC; socket interconnect speed matters.
- Max out Linux TCP buffers (see the sketch below).
- Update the NIC driver and firmware.
Not too bad, considering that older kernels required even more tuning.
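On the TCP buffer item: the system-wide ceilings are the net.core.rmem_max / wmem_max and net.ipv4.tcp_rmem / tcp_wmem sysctls, and a program can also request a bigger buffer per socket. A minimal sketch of the per-socket side; the 64 MB value is an arbitrary illustration, not ESnet's figure:

```c
/* Request large send/receive buffers on one TCP socket. The kernel silently
 * caps the request at net.core.wmem_max / rmem_max (SO_SNDBUFFORCE with
 * CAP_NET_ADMIN can exceed them), and the value read back is doubled to
 * account for bookkeeping overhead. */
#include <stdio.h>
#include <sys/socket.h>

int main(void)
{
    int size = 64 * 1024 * 1024;              /* 64 MB, arbitrary example */
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &size, sizeof(size));
    setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &size, sizeof(size));

    socklen_t len = sizeof(size);
    getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &size, &len);
    printf("effective receive buffer: %d bytes\n", size);
    return 0;
}
```

Note that setting SO_RCVBUF explicitly disables the kernel's receive-buffer autotuning, so for long-lived bulk flows raising the tcp_rmem ceiling is usually the better lever.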
Neither Cloudflare nor ESnet is doing significant computation within these packet flow benchmarks. Doing useful work would add another variable to scalability. Perhaps scale out: haproxy in front of a dozen backend hosts, each doing an easier-to-achieve 1 Mpps. Maybe that still hits some of the forwarding scalability limits mentioned by LWN; hard to say.
Regarding the latest kernel: that LWN article is from 5 years ago and will not be up to date. Put newer kernels on the latest NICs and tune your own benchmarks. Especially if you want to write your own "How to drop 20 Mpps" article.