Can VM networking benefit from DPDK? Virtio/Vhost-user status & updates Maxime Coquelin – Victor Kaplansky 2017-01-27
2 AGENDA Can VM networking benefit from DPDK? ● Overview ● Challenges ● New & upcoming features
Overview
4 DPDK – project overview Overview DPDK is a set of userspace libraries aimed at fast packet processing. ● Data Plane Development Kit ● Goal: ● Benefit from software flexibility ● Achieve performance close to dedicated HW solutions
5 DPDK – project overview Overview ● License: BSD ● CPU architectures: x86, Power8, TILE-Gx & ARM ● NICs: Intel, Mellanox, Broadcom, Cisco,... ● Other HW: Crypto, SCSI for SPDK project ● Operating systems: Linux, BSD
6 DPDK – project history Overview 1st release by Intel as a Zip file (2012) → 6WIND initiates the dpdk.org community (2013) → Packaged in Fedora → Power8 & TILE-Gx support → ARM support / Crypto → Moving to the Linux Foundation (2017) Releases: v1.2 v1.3 v1.4 v1.5 v1.6 v1.7 v1.8 v2.0 v2.1 v2.2 v16.04 v16.07 v16.11 v17.02 v17.05 v17.08 v17.11 v16.11: ~750K LoC / ~6000 commits / ~350 contributors
7 DPDK – comparison Overview [diagram: kernel path – Application (user) → Socket → Net stack → Driver (kernel) → NIC; DPDK path – Application + DPDK (user) → NIC, with VFIO providing device access]
8 DPDK – performance Overview DPDK uses: ● CPU isolation/partitioning & polling → Dedicated CPU cores poll the device ● VFIO/UIO → Direct access to device registers from user space ● NUMA awareness → Resources local to the Poll-Mode Driver’s (PMD) CPU ● Hugepages → Fewer TLB misses, no swap
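To make the polling model concrete, here is a minimal sketch of a DPDK poll-mode receive loop. It assumes the EAL is initialized and the port already configured and started elsewhere; the port/queue IDs are placeholders and error handling is omitted.

```c
#include <stdint.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

static void rx_loop(uint16_t port_id, uint16_t queue_id)
{
    struct rte_mbuf *bufs[BURST_SIZE];

    for (;;) {
        /* Busy-poll the NIC queue from a dedicated, isolated core:
         * no interrupts, no syscalls, no kernel/user copies. */
        uint16_t nb_rx = rte_eth_rx_burst(port_id, queue_id,
                                          bufs, BURST_SIZE);

        for (uint16_t i = 0; i < nb_rx; i++)
            rte_pktmbuf_free(bufs[i]);   /* a real app would process or forward */
    }
}
```

In a real application this loop runs on a core reserved with the EAL core mask, which is exactly the CPU-for-bandwidth trade-off discussed later.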
9 DPDK – performance Overview To avoid: ● Interrupt handling → Kernel’s NAPI polling mode is not enough ● Context switching ● Kernel/user data copies ● Syscall overhead → A single syscall costs more than the ~67 ns time budget per 64B packet at 14.88 Mpps
10 DPDK - components Overview Credits: Tim O’Driscoll - Intel
11 Virtio/Vhost Overview ● Device emulation, direct assignment, VirtIO ● Vhost: in-kernel virtio device emulation ● Lets device emulation code call directly into kernel subsystems ● Vhost worker thread in the host kernel ● Bypasses system calls from user to kernel space on the host
12 Vhost driver model [diagram: VM and QEMU in userspace; kvm.ko and the Vhost module in the host kernel, connected through ioeventfd and irqfd]
13 In-kernel device emulation ● The in-kernel part is restricted to virtqueue emulation ● QEMU handles the control plane: feature negotiation, migration, etc. ● File descriptor polling done by vhost in the kernel ● Buffers moved between the tap device and the virtqueues by a kernel worker thread Overview
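For reference, a hedged sketch of what the userspace VMM (conceptually, QEMU) does to hand a virtqueue to the in-kernel vhost-net worker. The ioctls come from <linux/vhost.h>; memory-table and vring-address setup, the KVM ioeventfd/irqfd wiring and error handling are all omitted.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/eventfd.h>
#include <linux/vhost.h>

/* Minimal sketch: attach virtqueue 0 to the in-kernel vhost-net worker.
 * tap_fd is an already-opened tap device; VHOST_SET_MEM_TABLE and
 * VHOST_SET_VRING_ADDR calls are left out for brevity. */
static int setup_vhost_net(int tap_fd)
{
    int vhost_fd = open("/dev/vhost-net", O_RDWR);

    ioctl(vhost_fd, VHOST_SET_OWNER);                  /* bind the worker thread */

    int kick_fd = eventfd(0, 0);                       /* guest -> vhost kick */
    int call_fd = eventfd(0, 0);                       /* vhost -> guest interrupt */

    struct vhost_vring_file kick = { .index = 0, .fd = kick_fd };
    struct vhost_vring_file call = { .index = 0, .fd = call_fd };
    ioctl(vhost_fd, VHOST_SET_VRING_KICK, &kick);      /* KVM ioeventfd side */
    ioctl(vhost_fd, VHOST_SET_VRING_CALL, &call);      /* KVM irqfd side */

    struct vhost_vring_file backend = { .index = 0, .fd = tap_fd };
    ioctl(vhost_fd, VHOST_NET_SET_BACKEND, &backend);  /* attach the tap device */

    return vhost_fd;
}
```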
14 Vhost as user space interface ● The Vhost architecture is not tied to KVM ● Backend: a Vhost instance in user space ● An eventfd is set up to signal the backend when new buffers are placed by the guest (kickfd) ● An irqfd is set up to signal the guest about new buffers placed by the backend (callfd) ● The beauty: the backend only knows about the guest memory mapping, the kick eventfd and the call eventfd ● Vhost-user implemented in DPDK in v16.07 Overview
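A sketch of the backend side of that eventfd contract: the guest kicks through kickfd, the backend interrupts the guest through callfd. Virtqueue processing itself is elided; the function name is illustrative only.

```c
#include <stdint.h>
#include <unistd.h>

static void backend_poll_once(int kick_fd, int call_fd)
{
    uint64_t kicks;

    /* Blocks until the guest has placed new buffers and kicked. */
    read(kick_fd, &kicks, sizeof(kicks));

    /* ... dequeue descriptors from the shared virtqueue, accessing
     *     guest memory through the mapping received over the
     *     vhost-user socket ... */

    /* Signal the guest (via KVM irqfd) that buffers were used. */
    uint64_t one = 1;
    write(call_fd, &one, sizeof(one));
}
```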
Challenges
16 Performance Challenges DPDK is about performance, which is a trade-off between: ● Bandwidth → achieving line rate even for small packets ● Latency → as low as possible (of course) ● CPU utilization → $$$ → Prefer bandwidth & latency at the expense of CPU utilization → Take into account HW architectures as much as possible
17 Reliability Challenges 0% packet loss • Some use cases of Virtio, such as NFV, cannot afford packet loss • Hard to achieve max performance without loss, as Virtio is CPU intensive → Scheduling “glitches” may cause packet drops Migration • Requires restoration of internal state, including the backend • The interface exposed by QEMU must stay unchanged for cross-version migration • The interface exposed to the guest depends on the capabilities of the third-party application • Support from the management tool is required
18 Security Challenges • Isolation of untrusted guests • Direct access to the device from untrusted guests • Current implementations require a mediator for guest-to-guest communication • Zero-copy is problematic from a security point of view
New & upcoming features
20 Rx mergeable buffers New & upcoming features Pro: • Allows receiving packets larger than the descriptors’ buffer size Con: • Introduces an extra cache miss in the dequeue path [diagram: descriptor ring Desc 0 … Desc n; a packet fitting one buffer has num_buffers = 1, a packet merged across three buffers has num_buffers = 3]
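The num_buffers count lives in the virtio-net header defined in <linux/virtio_net.h>: when VIRTIO_NET_F_MRG_RXBUF is negotiated, every received packet is prefixed by struct virtio_net_hdr_mrg_rxbuf, and reading that field is the extra cache miss mentioned above. A minimal sketch (byte-order handling ignored for brevity):

```c
#include <linux/virtio_net.h>
#include <stdint.h>

static inline uint16_t rx_num_buffers(const void *rx_buf)
{
    const struct virtio_net_hdr_mrg_rxbuf *hdr = rx_buf;

    /* 1 when the packet fits in a single buffer, >1 when it spans
     * several descriptor chains (e.g. 3 in the slide's example). */
    return hdr->num_buffers;
}
```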
21 Indirect descriptors (DPDK v16.11) New & upcoming features [diagram: direct descriptor chaining within the ring vs. a single ring entry pointing to an indirect descriptor table (iDesc 2-0 … iDesc 2-3)]
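To make the layout concrete, here is a hedged sketch of how a driver could fill such an indirect table, using the standard vring definitions from <linux/virtio_ring.h>. Physical addresses are placeholders and little-endian byte order is assumed.

```c
#include <linux/virtio_ring.h>
#include <stdint.h>

/* One slot of the main ring points, with VRING_DESC_F_INDIRECT, to a
 * separate table of descriptors describing the actual buffers. */
static void fill_indirect_slot(struct vring_desc *ring_slot,
                               struct vring_desc *table,
                               uint64_t table_phys, uint16_t nb_bufs,
                               const uint64_t *buf_phys,
                               const uint32_t *buf_len)
{
    for (uint16_t i = 0; i < nb_bufs; i++) {
        table[i].addr  = buf_phys[i];
        table[i].len   = buf_len[i];
        table[i].flags = (i + 1 < nb_bufs) ? VRING_DESC_F_NEXT : 0;
        table[i].next  = i + 1;
    }

    /* The ring entry now describes the table itself, not a data buffer,
     * so a whole request consumes a single ring slot. */
    ring_slot->addr  = table_phys;
    ring_slot->len   = nb_bufs * sizeof(struct vring_desc);
    ring_slot->flags = VRING_DESC_F_INDIRECT;
}
```

This is why ring capacity increases (next slide): each request occupies one ring entry regardless of how many buffers it carries, at the cost of one extra pointer dereference.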
22 Indirect descriptors (DPDK v16.11) New & upcoming features Pros: • Increases ring capacity • Improves performance for large numbers of large requests • Improves 0% packet-loss performance even for small requests → If the system is not fine-tuned → If Virtio headers are in a dedicated descriptor Cons: • One more level of indirection → Impacts raw performance (~ -3%)
23 Vhost dequeue 0-copy (DPDK v16.11) New & upcoming features [diagram: default dequeue copies the descriptor’s buffer into the mbuf’s buffer (memcpy); zero-copy dequeue makes the mbuf point directly at the descriptor’s buffer]
24 Vhost dequeue 0-copy (DPDK v16.11) New & upcoming features Pros: • Big performance improvement for standard & large packet sizes → More than +50% for VM-to-VM with iperf benchmarks • Reduces memory footprint Cons: • Performance degradation for small packets → But disabled by default • Only for VM-to-VM using the Vhost lib API (no PMD support) • Does not work for VM-to-NIC → The mbuf lacks a release notification mechanism / no headroom
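Since the feature is off by default, a backend opts in per vhost-user socket at registration time. A hedged sketch against the DPDK vhost library API of that era (the flag was introduced in v16.11; the header is rte_vhost.h in later releases, rte_virtio_net.h back then; the socket path is a placeholder):

```c
#include <rte_vhost.h>

int register_backend(void)
{
    const char *path = "/tmp/vhost-user0.sock";

    /* Dequeue zero-copy: mbufs point at guest buffers instead of
     * receiving a copy; only worthwhile for VM-to-VM with larger packets. */
    return rte_vhost_driver_register(path, RTE_VHOST_USER_DEQUEUE_ZERO_COPY);
}
```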
25 MTU feature (DPDK v17.05?) New & upcoming features A way for the host to share its max supported MTU • Can be used to set consistent MTU values across the infrastructure • Can improve performance for small packets → If the MTU fits in the Rx buffer size, Rx mergeable buffers can be disabled → Saves one cache miss when parsing the virtio-net header
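On the guest side this maps to the VIRTIO_NET_F_MTU feature bit and the mtu field of the device config space (struct virtio_net_config in <linux/virtio_net.h>). A hedged sketch of how a driver would pick it up; read_cfg16() is a hypothetical accessor standing in for whatever config-space access the transport provides.

```c
#include <linux/virtio_net.h>
#include <stdint.h>
#include <stddef.h>

extern uint16_t read_cfg16(size_t offset);   /* hypothetical transport hook */

static uint16_t negotiated_mtu(uint64_t device_features)
{
    /* If the device advertises its max MTU, use it... */
    if (device_features & (1ULL << VIRTIO_NET_F_MTU))
        return read_cfg16(offsetof(struct virtio_net_config, mtu));

    /* ...otherwise fall back to the classic Ethernet default. */
    return 1500;
}
```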
26 Vhost-pci (DPDK v17.05?) New & upcoming features [diagram: traditional VM-to-VM communication goes through the host vSwitch, with a Virtio/Vhost pair per VM; direct VM-to-VM communication connects VM1’s Vhost-pci to VM2’s Virtio]
27 Vhost-pci (DPDK v17.05?) New & upcoming features [diagram: VM1 runs a Vhost-pci driver on a Vhost-pci device, VM2 runs a Virtio-pci driver on a Virtio-pci device; the two sides are linked through the Vhost-pci protocol, with a Vhost-pci client on one end and a Vhost-pci server on the other]
28 Vhost-pci (DPDK v17.05?) New & upcoming features Pros: • Performance improvement → The two VMs share the same virtqueues → Packets don’t go through the host’s vSwitch • No change needed in the guest’s Virtio drivers
29 Vhost-pci (DPDK v17.05?) New & upcoming features Cons: • Security → The Vhost-pci VM maps the Virtio-pci VM’s entire memory space → Could be solved with IOTLB support • Live migration → Not supported in the current version → Hard to implement, as the VMs are connected to each other through a socket
30 IOTLB in kernel New & upcoming features [diagram: the guest IOMMU driver talks to QEMU’s virtual IOMMU; vhost keeps a device IOTLB, reports IOTLB misses and receives IOTLB updates/invalidations over syscalls/eventfds; IOVAs are translated to HVAs before accessing guest memory]
31 IOTLB for vhost-user New & upcoming features [diagram: same flow, but the backend maintains an IOTLB cache and the IOTLB miss/update/invalidate messages are exchanged with QEMU over the vhost-user Unix socket]
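The messages exchanged over that socket mirror the kernel’s vhost IOTLB message (struct vhost_iotlb_msg from <linux/vhost.h>). At the time of the talk the vhost-user side was still upcoming, so take this as a sketch of the kernel UAPI it builds on: an "update" entry maps a guest IOVA range to the host virtual address the backend may touch, and the backend sends the same structure back with type VHOST_IOTLB_MISS when a translation is missing.

```c
#include <linux/vhost.h>
#include <stdint.h>
#include <string.h>

static struct vhost_iotlb_msg make_iotlb_update(uint64_t iova, uint64_t size,
                                                uint64_t uaddr)
{
    struct vhost_iotlb_msg msg;

    memset(&msg, 0, sizeof(msg));
    msg.iova  = iova;               /* I/O virtual address seen by the guest */
    msg.size  = size;
    msg.uaddr = uaddr;              /* host virtual address in the backend */
    msg.perm  = VHOST_ACCESS_RW;
    msg.type  = VHOST_IOTLB_UPDATE;

    return msg;
}
```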
32 Conclusions • DPDK support for VMs is in active development • 10 Mpps is achievable in a VM with DPDK • New features keep boosting the performance of VM networking • Accelerating the transition to NFV / SDN
33 Q / A
THANK YOU
