Docker Networking: Control Plane & Data Plane
@MadhuVenugopal @mrjana
Agenda
• Docker Networking
  • Features
  • Control plane & data plane
• Deep Dive
  • Control plane
  • Data plane
• Q & A
Docker Networking timeline
• 1.7: Libnetwork; CNM; migrated the bridge, host, and none drivers to CNM
• 1.8/1.9: Overlay driver; network plugins; IPAM plugins; network UX/API; service discovery (using /etc/hosts)
• 1.10: Distributed DNS; aliases
• 1.11: DNS round-robin LB
• 1.12: Load balancing; encrypted control and data plane; routing mesh; built-in swarm-mode networking
Networking planes
• Management plane: user/operator/tools managing network infrastructure (UX, CLI, REST API, SNMP, ...)
• Control plane: signaling between network entities to exchange reachability state; distributed (OSPF, BGP, gossip-based) or centralized (OpenFlow, OVSDB)
• Data plane: actual movement of application data packets (iptables, IPVS, OVS-DP, DPDK, BPF, routing tables, ...)
Docker networking planes
• Management plane: Docker network UX, APIs, and network management plugins
• Control plane: Libnetwork core & SwarmKit allocator; network-scoped gossip, service discovery, encryption key distribution
• Data plane: network plugins and built-in drivers (bridge, overlay, macvlan, ipvlan, host, and all other plugins...)
Deep Dive - Control Plane
Control plane components
• Centralized resources and policies
• De-centralized events
Centralized resources and policies
[Diagram: a Network Create or Service Create request flows through the manager's Orchestrator → Allocator → Scheduler → Dispatcher, which dispatches tasks to the engines (Libnetwork) on Worker1 and Worker2; the workers gossip among themselves]
• Resources and policies are defined centrally
• Networks are a definition of policy
• Central resource allocation (IP subnets, addresses, VNIs)
• State can be mutated as long as managers are available
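A minimal sketch of central allocation (the network name and subnet here are hypothetical): creating an overlay network on a manager records the policy centrally, and the allocator hands out the subnet, container addresses, and a VXLAN VNI when tasks are scheduled.

    # On a manager: define the network (pure policy; nothing is plumbed on workers yet)
    docker network create -d overlay --subnet 10.0.9.0/24 mynet

    # The allocator's choices (subnet, gateway, VXLAN ID) show up in the metadata
    docker network inspect mynet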
• State is learned through de- centralized dissemination of events • Gossip based protocol • Fast convergence • Highly scalable • Continues to function even if all managers are Down De-centralized events Swarm Scope Gossip W1 W2 W3 W1 W5 W4 Network Scope Gossip Network Scope Gossip
Gossip in detail
• Completely de-centralized discovery of cluster nodes
• Cluster membership is discovered using an implementation of the Scalable Weakly-consistent Infection-style Process Group Membership Protocol (SWIM)
• Two kinds of cluster membership: swarm level and network level
• Sequentially consistent state dissemination, ordered by a Lamport clock
• Single writer at the record/entry level
• Convergence time is roughly O(log n)
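Both membership scopes can be observed on a live swarm (node and network names below are hypothetical): `docker node ls` reflects swarm-level membership, while an overlay network's Peers list shows only the nodes participating in that network's gossip group.

    # Swarm-level membership
    docker node ls

    # Network-level membership: only nodes currently running tasks on mynet
    docker network inspect mynet --format '{{json .Peers}}'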
Failure detection
[Diagram: Node A periodically probes a node chosen by randomized round robin; Node B fails to ack, so A marks it Suspect and rebroadcasts; 9 more nodes receive the rebroadcast; when the suspect timeout expires, Node B is marked Dead, and that is rebroadcast until the entire cluster has received it]
State dissemination
[Diagram: Node A broadcasts a state change to up to 3 nodes that participate in the network the entry belongs to; 9 more nodes receive the rebroadcast, and eventually the entire cluster does; a receiver accepts the update only if the entry's Lamport time is greater than that of its existing entry; separately, the entire state for a single network is periodically bulk-synced to a random node participating in that network]
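The records being disseminated (service VIPs, task addresses, DNS entries) can be dumped on any participating node; on newer engines, `docker network inspect -v` adds this diagnostic detail for swarm-scope overlay networks. The network name is hypothetical.

    # Service-discovery state this node has learned via network-scoped gossip
    docker network inspect -v mynet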
Deep Dive - Data Plane: Overlay Driver
Overlay Networking Under the Hood
• Virtual eXtensible Local Area Network (VXLAN) data transport
• An L2 network over an L3 network (overlay); RFC 7348
• The host acts as the VXLAN Tunnel End Point (VTEP)
• Point-to-multipoint tunnels
• Proxy ARP
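The VTEP Docker creates can be inspected from the host by entering the overlay network's namespace; the namespace name below is illustrative (they live under /var/run/docker/netns/ and vary per host).

    # List overlay/network namespaces on this host
    ls /var/run/docker/netns/

    # Enter one and show its VXLAN device; '-d' prints the VNI and tunnel details
    nsenter --net=/var/run/docker/netns/1-3x2wlmlxnf ip -d link show type vxlan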
Overlay Networking Under the Hood
• One Linux bridge per subnet, per overlay network, per host
• One VXLAN interface per subnet, per overlay network, per host
• One Linux bridge per host for default traffic (docker_gwbridge)
• Lazy creation: these are created only when a container on the host is attached to the network
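Lazy creation is easy to see: an overlay network defined on the managers does not exist on a worker until a task lands there. A sketch, with hypothetical names:

    # On a worker with no tasks on mynet yet: the network is not instantiated
    docker network ls --filter name=mynet

    # After the manager schedules a task for mynet onto this worker:
    docker network ls --filter name=mynet    # mynet now appears
    ip link show docker_gwbridge             # per-host bridge for default traffic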
Overlay Networking Under the Hood
[Diagram: on Docker Host 1, containers C1-C3 attach via veth pairs to bridge br0, which feeds a VXLAN interface over the host NIC; on Docker Host 2, containers C4-C5 attach the same way; the two VXLAN interfaces tunnel traffic between the hosts]
Linux Kernel NetFilter dataflow
Service, Port-Publish & Network

    docker service create --name=test --network=mynet -p 8080:80 --replicas=2 xxx

[Diagram: on Host1, published-port traffic arrives on eth0, passes host iptables into the ingress-sbox (iptables + IPVS, listening on Host1:8080), then crosses the ingress overlay bridge and VXLAN tunnels to host2 and host3 (VNI 100); the container sandbox has interfaces on mynet (via mynet-br, with a VXLAN tunnel to host2, VNI 101), on the ingress network, and on default_gwbridge; the container's DNS resolver points at the daemon's embedded DNS server, which maps service name → VIP]
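Given the hypothetical service above, the allocated VIPs and the embedded-DNS mapping can be checked directly:

    # VIPs the allocator assigned on each attached network (ingress and mynet)
    docker service inspect --format '{{json .Endpoint.VirtualIPs}}' test

    # From inside any container on mynet (tooling depends on the image):
    # the embedded DNS server at 127.0.0.11 resolves the service name to its VIP
    docker exec <container-on-mynet> nslookup test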
Day in life of a packet - Internal LB eth0 Host1 container-sbox (service1) eth1 iptables MANGLE table OUTPUT MARK : VIP -> <fw-mark-id> IPVS Match <fw-mark-id> -> Masq {RR across container-IPs) mynet-overlay-bridge mynet eth2 Host2 mynet-overlay-bridgevxlan tunnel with vni mynet eth2 Container-sbox (service2) Application looks up service2 (using embedded-DNS @ 127.0.0.11) DNS Resolver daemon embedded DNS server service2 -> VIP2 vxlan tunnel with vni
• Builtin routing mesh for edge routing • Worker nodes themselves participate in ingress routing mesh • All worker nodes accept connection requests on PublishedPort • Port translation happens at the worker node • Same internal load balancing mechanism used to load balance external requests Routing mesh External Loadbalancer (optional) Task1 ServiceA Task1 ServiceA Task1 ServiceA Worker1 Worker2 Ingress Network 8080 8080 VIP LB VIP LB 8080->80 8080->80 8080->80
Day in life of a packet - Routing Mesh & Ingress LB iptables NAT table DOCKER-INGRESS DNAT : Published-Port -> ingress-sbox eth0 Host1 default_gwbridge ingress-sboxeth1 iptables MANGLE table PREROUTING MARK : Published-Port -> <fw-mark-id> IPVS Match <fw-mark-id> -> Masq {RR across container-IPs) ingress-overlay-bridge Ingress- Network eth0 iptables NAT table DOCKER-INGRESS DNAT : Published-Port -> ingress-sbox eth0 Host2 default_gwbridge ingress-sbox eth1 ingress-overlay-bridge eth0 vxlan tunnel with vni Ingress- Network eth0 Container-sbox (backs a task/service) eth1 iptables NAT table PREROUTING Redirect -> target-port
Q&A
