1 Kai Waehner | Technology Evangelist, Confluent contact@kai-waehner.de | LinkedIn | @KaiWaehner | www.confluent.io | www.kai-waehner.de Architecture patterns for distributed, hybrid, edge and global Apache Kafka deployments
2Abstract Architecture patterns for distributed, hybrid, edge and global Apache Kafka deployments Multi-cluster and cross-data center deployments of Apache Kafka have become the norm rather than an exception. This session gives an overview of several scenarios that may require multi-cluster solutions and discusses real-world examples with their specific requirements and trade-offs, including disaster recovery, aggregation for analytics, cloud migration, mission-critical stretched deployments and global Kafka. Key takeaways: • In many scenarios, one Kafka cluster is not enough. Understand different architectures and alternatives for multi-cluster deployments. • Zero data loss and high availability are two key requirements. Understand how to realize this, including trade-offs. • Learn about features and limitations of Kafka for multi cluster deployments • Global Kafka and mission-critical multi-cluster deployments with zero data loss and high availability became the normal, not an exception. www.kai-waehner.de | @KaiWaehner
3 Agenda 1) Definition ‘Kafka Cluster’ 2) One Kafka Cluster 3) Multiple Kafka Clusters 4) Multi-Region / Global Kafka Cluster 5) Infrastructure Options www.kai-waehner.de | @KaiWaehner
4 Agenda 1) Definition ‘Kafka Cluster’ 2) One Kafka Cluster 3) Multiple Kafka Clusters 4) Multi-Region / Global Kafka Cluster 5) Infrastructure Options www.kai-waehner.de | @KaiWaehner
5 www.kai-waehner.de | @KaiWaehner
6 The Beginning of a New Era www.kai-waehner.de | @KaiWaehner https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying The first use case: Log Analytics. This is why Kafka was created!
7 Event Streaming Platform – The Commit Log www.kai-waehner.de | @KaiWaehner Time P C1 C2 C3
8 Event Streaming Platform – A Distributed System www.kai-waehner.de | @KaiWaehner Broker 1 Topic1 partition1 Broker 2 Broker 3 Broker 4 Topic1 partition1 Topic1 partition1 Leader Follower Topic1 partition2 Topic1 partition2 Topic1 partition2 Topic1 partition3 Topic1 partition4 Topic1 partition3 Topic1 partition3 Topic1 partition4 Topic1 partition4
9Apache Kafka (kafka.apache.org) includes Kafka Connect and Kafka Streams www.kai-waehner.de | @KaiWaehner Kafka Streams Your app sinksource KafkaConnect KafkaConnect Kafka Cluster
10A Streaming Platform is the Underpinning of an Event-driven Architecture www.kai-waehner.de | @KaiWaehner Microservices DBs SaaS apps Customer 360 Real-time fraud detection Data warehouse Producers Consumers Database change Microservices events SaaS data Customer experiences Streams of real time events Stream processing apps Connectors Connectors Stream processing apps
11 Apache Kafka at Scale at Tech Giants www.kai-waehner.de | @KaiWaehner > 7 trillion messages / day > 6 Petabytes / day “You name it” * Kafka Is not just used by tech giants ** Kafka is not just used for big data
12 www.kai-waehner.de | @KaiWaehner Improve Customer Experience (CX) Increase Revenue (make money) Business Value Decrease Costs (save money) Core Business Platform Increase Operational Efficiency Migrate to Cloud Mitigate Risk (protect money) Key Drivers Strategic Objectives (sample) Fraud Detection IoT sensor ingestion Digital replatforming/ Mainframe Offload Connected Car: Navigation & improved in-car experience: Audi Customer 360 Simplifying Omni-channel Retail at Scale: Target Faster transactional processing / analysis incl. Machine Learning / AI Mainframe Offload: RBC Microservices Architecture Online Fraud Detection Online Security (syslog, log aggregation, Splunk replacement) Middleware replacement Regulatory Digital Transformation Application Modernization: Multiple Examples Website / Core Operations (Central Nervous System) The [Silicon Valley] Digital Natives; LinkedIn, Netflix, Uber, Yelp... Predictive Maintenance: Audi Streaming Platform in a regulated environment (e.g. Electronic Medical Records): Celmatix Real-time app updates Real Time Streaming Platform for Communications and Beyond: Capital One Developer Velocity - Building Stateful Financial Applications with Kafka Streams: Funding Circle Detect Fraud & Prevent Fraud in Real Time: PayPal Kafka as a Service - A Tale of Security and Multi-Tenancy: Apple Example Use Cases $↑ $↓ $ Example Case Studies (of many)
13 A Kafka Cluster www.kai-waehner.de | @KaiWaehner Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect
14 A Kafka Cluster www.kai-waehner.de | @KaiWaehner Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect “Server Side” of a Kafka Cluster 1 - * Kafka Broker 1 - * ZooKeeper 0 - * Schema Registry 0 - * Kafka Connect 0 - * REST Proxy Security, Ops, Monitoring, … “Client Side” of a Kafka Cluster Kafka Clients (Java, C, C++, Python, Go, JavaScript, …) Kafka Stream Processing Apps (Kafka Streams, ksqlDB) External Producers / Consumers (Oracle, Hadoop, Flink, …)
15 Why Multiple Kafka Clusters? www.kai-waehner.de | @KaiWaehner * Not a representative survey J ** Many DCs does NOT necessarily mean more than one Kafka Cluster
16 Disaster Recovery – RPO and RTO www.kai-waehner.de | @KaiWaehner RPO = Recovery Point Objective RTO = Recovery Time Objective
17 Agenda 1) Definition ‘Kafka Cluster’ 2) One Kafka Cluster 3) Multiple Kafka Clusters 4) Multi-Region / Global Kafka Cluster 5) Infrastructure Options www.kai-waehner.de | @KaiWaehner
18 A Kafka Cluster www.kai-waehner.de | @KaiWaehner Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect
19 A Kafka Cluster for High Availability www.kai-waehner.de | @KaiWaehner Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect PROD / Pre-PROD / TEST 3 - * Kafka Broker 3 / 5 / 7 ZooKeeper 2 Schema Registry 2 - * Kafka Connect 2 - * REST Proxy Security, Ops, Monitoring, … DEV / Functional TEST 1 Kafka Broker 1 ZooKeeper 0 - 1 Schema Registry 0 - 1 Kafka Connect 0 - 1 REST Proxy Security, Ops, Monitoring, …
20 A Stretched Kafka Cluster over 3DC www.kai-waehner.de | @KaiWaehner Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Kafka Connect Schema Registry Producer Consumer Producer Consumer Kafka Connect Schema Registry DC1 DC2 DC3
21 A Stretched Kafka Cluster over 3DC www.kai-waehner.de | @KaiWaehner Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Kafka Connect Schema Registry Producer Consumer Producer Consumer Kafka Connect Schema Registry DC1 DC2 DC3 High availability (Survives DC outage) Zero data loss and zero downtime Automatic client fail-over Works well in cloud (3 AZs in 1 region) Requires “good” latency (à DCs ”close” to each other) Requires three DCs (Quorum / split brain) Complex to configure and operate
22 A Stretched Kafka Cluster over 2DC www.kai-waehner.de | @KaiWaehner Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Kafka Connect Schema Registry Producer Consumer Producer Consumer Kafka Connect Schema Registry DC1 DC2 Kafka Broker
23 A Stretched Kafka Cluster over 2DC www.kai-waehner.de | @KaiWaehner High availability (Survives DC outage) Zero data loss or zero downtime Automatic client fail-over Stopgap solution for on premise (if only 2 DCs available) à 2.5 DC deployment as workaround Requires “good” latency (à DCs ”close” to each other) Quorum in 2 DCs not possible Complex to configure and operate Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Kafka Connect Schema Registry Producer Consumer Producer Consumer Kafka Connect Schema Registry DC1 DC2 Kafka Broker
24 A Single Kafka Cluster www.kai-waehner.de | @KaiWaehner Zookeeper Kafka Broker Schema Registry OPC-UA MQTT PLC4X KSQL Grafana Postgres Kafka Connect Simple setup Works Often used “at the Edge” No high availability
25 Agenda 1) Definition ‘Kafka Cluster’ 2) One Kafka Cluster 3) Multiple Kafka Clusters 4) Multi-Region / Global Kafka Cluster 5) Infrastructure Options www.kai-waehner.de | @KaiWaehner
26 Independent Kafka Clusters www.kai-waehner.de | @KaiWaehner Total Independence Owned by the project teams, central ops or SaaS Different sizing, security, infrastructure Related projects should run on the same Kafka cluster Independent projects can run on the same Kafka cluster • similar SLAs and requirements • e.g. NOT Instant payment vs. log analytics vs. file transfer vs. video streaming • ACLs / RBAC for fine-grained authentication and authorization • throughput typically no issue (Confluent Cloud processes 1 Gigabyte / sec and more in one cluster) • reduce overhead (operations, hardware, …) Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect
27 Hybrid Integration of 2 Kafka Clusters www.kai-waehner.de | @KaiWaehner Hybrid integration On premise and cloud or multi-cloud scenarios (due to technical, business or legal reasons) Uni- or bi-directional Know the best practices or get help Know your SLAs and timelines Choose the right (battle-tested?) tool Works Relatively easy to setup (some tools are complex / not up-to-date / not mature / not documented well) Example: Replicate data from production to analytics cluster Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect MirrorMaker 1 / 2 Confluent Replicator uReplicator (Uber) Mirus (Salesforce) Brooklin (LinkedIn) Custom Replication DC1 DC2 Streaming Replication
28 Migration of Kafka Clusters www.kai-waehner.de | @KaiWaehner Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect MirrorMaker 1 / 2 Confluent Replicator uReplicator (Uber) Mirus (Salesforce) Brooklin (LinkedIn) Custom Replication DC1 DC2 Streaming Replication Common migration scenarios On premise à Cloud Cloud A à Cloud B Vendor 1 à Vendor 2 Self-Managed à SaaS Migration steps: 1) Create new Kafka cluster 2) Producer / Consumer re-configuration 3) Shutdown of old Cluster Know the best practices or get help Know your SLAs and timelines Choose the right (battle-tested?) tool
29 Disaster Recovery with 2 Kafka Clusters www.kai-waehner.de | @KaiWaehner Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect MirrorMaker 1 / 2 Confluent Replicator uReplicator (Uber) Mirus (Salesforce) Brooklin (LinkedIn) Custom Replication DC1 DC2 Streaming Replication Know the trade-offs! If Kafka Cluster 1 is down, Kafka Cluster 2 is still live and running Timestamp preservation Offset translation Manual client-failover / custom client code Data loss in case of DC outage (asynchronous replication)
30 Disaster Recovery @ JPMorgan www.kai-waehner.de | @KaiWaehner https://www.confluent.io/kafka-summit-san-francisco-2019/secure-kafka-at-scale-in-true-multi-tenant-environment
31 Aggregation of Kafka Clusters www.kai-waehner.de | @KaiWaehner Local smaller Kafka Clusters in each site for critical real time applications (high SLAs) Central bigger Kafka Cluster for analytics use cases (often less critical SLAs) Works Relatively easy to setup (some tools are complex / not up-to-date / not mature / not documented well) Some tools do not support same topic name in each DC Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Kafka Connect Kafka Connect Kafka Broker Kafka Broker Kafka Broker
32 Aggregation Cluster @ Royal Caribbean www.kai-waehner.de | @KaiWaehner https://www.confluent.io/kafka-summit-lon19/seamless-guest-experience-with-kafka-streams/
33 Aggregation of Edge Kafka Clusters www.kai-waehner.de | @KaiWaehner Small Kafka clusters in each site for data collection (often low SLAs, sometimes single Kafka broker) Kafka at the edge sometimes OEM / hardware appliance Central big Kafka cluster for critical use cases and edge integration (high SLAs) Zookeeper Kafka Broker Schema Registry OPC-UA MQTT PLC4X KSQL Grafana Postgres Kafka Connect Zookeeper Kafka Broker Schema Registry OPC-UA MQTT PLC4X KSQL Grafana Postgres Kafka Connect Zookeeper Kafka Broker Schema Registry OPC-UA MQTT PLC4X KSQL Grafana Postgres Kafka Connect Zookeeper Kafka Broker Schema Registry OPC-UA MQTT PLC4X KSQL Grafana Postgres Kafka Connect Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Kafka Connect Kafka Connect Kafka Broker Kafka Broker Kafka Broker
34 Real Time Streaming ML at the Edge @ Severstal www.kai-waehner.de | @KaiWaehner https://www.confluent.io/customers/severstal/
35 Cross-Company Kafka Integration (Special Case of Hybrid Integration) www.kai-waehner.de | @KaiWaehner Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect MirrorMaker 1 / 2 Confluent Replicator uReplicator (Uber) Mirus (Salesforce) Brooklin (LinkedIn) Custom Replication Company A Company B Streaming Replication Streaming integration between companies API Management (REST et al) not appropriate for streaming data Infosec and politics are your biggest enemy
36 Agenda 1) Definition ‘Kafka Cluster’ 2) One Kafka Cluster 3) Multiple Kafka Clusters 4) Multi-Region / Global Kafka Cluster 5) Infrastructure Options www.kai-waehner.de | @KaiWaehner
37 A Stretched Kafka Cluster over 3 Regions? www.kai-waehner.de | @KaiWaehner Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Kafka Connect Schema Registry Producer Consumer Producer Consumer Kafka Connect Schema Registry US-East US-Central US-West No! Fail! Error! Sorry!
38Replication Between Kafka Clusters over Multiple Regions or Continents www.kai-waehner.de | @KaiWaehner Streaming replication works (MirrorMaker 2, Confluent Replicator) Same challenges as in one region (data loss, custom code for fail-over, offset translation, etc.) Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect China USA Europe
39 A Single Kafka Cluster over 3 Regions with Multi-Region Replication www.kai-waehner.de | @KaiWaehner Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Kafka Connect Schema Registry Producer Consumer Producer Consumer Kafka Connect Schema Registry US-East US-Central US-West
40A Single Multi Region Kafka Cluster (MRC) www.kai-waehner.de | @KaiWaehner High availability (Survives region outage) Zero data loss and zero downtime Automatic client fail-over over regions Works well in cloud and on premise No external tools (like MirrorMaker) needed Not part of Open Source Kafka à Build vs. Buy Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Kafka Connect Schema Registry Producer Consumer Producer Consumer Kafka Connect Schema Registry US-East US-Central US-West How does this work? Region-awareness Synchronous or asynchronous replication per Topic Follower-fetching Regional topic locality Replication rules … (Confluent Platform)
41A Single Multi Region Kafka Cluster (MRC) www.kai-waehner.de | @KaiWaehner Broker 1 Broker 2 Broker 3 ZK1 Broker 4 Broker 5 Broker 6 Broker 1 Broker 2 ZK2 Client D Client F Client G Failover site ZK3 Broker 3 Broker 4 Broker 5 Broker 6 Client A Client B us-central-1 Client A Client B automated client failover Observer replicas us-west-1 us-east-1 Site failure! “tie-breaker” datacenter Single Kafka Cluster (Confluent Platform) $ bin/kafka-topics.sh --bootstrap-servers localhost:2181 --create --topic trades-west --partitions 3 --config replication-factor={us-west: 2} --config min.insync.replicas=2 --config async.replication-factor={us-east: 2} --config max.async.time.behind.min=5 --config replay.truncated.messages=true
42 Vision: One Global Kafka Cluster www.kai-waehner.de | @KaiWaehner Topic‘pos_payments’
43 ZooKeeper Removal (KIP-500) www.kai-waehner.de | @KaiWaehner https://cwiki.apache.org/confluence/display/KAFKA/KIP-500%3A+Replace+ZooKeeper+with+a+Self-Managed+Metadata+Quorum
44 Architecture patterns for distributed, hybrid, edge and global Apache Kafka deployments… www.kai-waehner.de | @KaiWaehner à There is no best solution. It depends!
45 Agenda 1) Definition ‘Kafka Cluster’ 2) One Kafka Cluster 3) Multiple Kafka Clusters 4) Multi-Region / Global Kafka Cluster 5) Infrastructure Options www.kai-waehner.de | @KaiWaehner
46 Infrastructure Options www.kai-waehner.de | @KaiWaehner Infrastructure is your choice! à Bare metal vs. VM vs. container vs. cloud… Software is your choice! à Open source vs. commercial vs. SaaS Ops and management are your choice! à Self-Managed vs. PaaS vs. fully-managed Integration is your choice! à Kafka-native vs. other tools / services Find the right solution for your business case and for your SLAs…
4747 CONFLUENT PLATFORM EFFICIENT OPERATIONS AT SCALE PRODUCTION-STAGE PREREQUISITES UNRESTRICTED DEVELOPER PRODUCTIVITY Multi-language Development Rich Pre-built Ecosystem SQL-based Stream Processing GUI-driven Mgmt & Monitoring Flexible DevOps Automation Dynamic Performance & Elasticity Enterprise-grade Security Data Compatibility Global Availability APACHE KAFKA Fully Managed Cloud ServiceSelf Managed Software FREEDOM OF CHOICE COMMITTER-LED EXPERTISE PartnersTraining Professional Services Enterprise Support DEVELOPER OPERATOR ARCHITECT Hybrid Infrastructure
48 Kafka as a Service – Fully Managed? Infrastructure management (commodity) Scaling ● Upgrades (latest stable version of Kafka) ● Patching ● Maintenance ● Sizing (retention, latency, throughput, storage, etc.) ● Data balancing for optimal performance ● Performance tuning for real-time and latency requirements ● Fixing Kafka bugs ● Uptime monitoring and proactive remediation of issues ● Recovery support from data corruption ● Scaling the cluster as needed ● Data balancing the cluster as nodes are added ● Support for any Kafka issue with less than X minutes response time Infra-as-a-Service Harness full power of Kafka Kafka-specific management Platform-as-a-Service Evolve as you need Future-proof Mission-critical reliability Most Kafka-as-a-Service offerings are partially-managed Kafka as a Service should be a serverless experience with consumption-based pricing!
4949 I N V E S T M E N T & T I M E VALUE 3 4 5 1 2 Event Streaming Maturity Model 49 Initial Awareness / Pilot (1 Kafka Cluster) Start to Build Pipeline / Deliver 1 New Outcome (1 Kafka Cluster) Mission-Critical Deployment (Stretched, Hybrid, Multi-Region) Build Contextual Event-Driven Apps (Stretched, Hybrid, Multi-Region) Central Nervous System (Global Kafka) Product, Support, Training, Partners, Technical Account Management...
50End-to-End Integration with 24/7 Uptime and Zero Data Loss? www.kai-waehner.de | @KaiWaehner …. more components, clusters, technologies means more risks, conflicts, incompatibilities, operations burden! ETL MQ Storage Streaming Messaging: Kafka Core Storage: Kafka Core Caching: Kafka Core Real-Time, Batch: Kafka Clients Integration: Kafka Connect Stream Processing: Kafka Streams / KSQL Request-Response: REST Proxy Replication between Kafka Clusters (Edge, On Premises, Hybrid, Cloud) Multi-Region Kafka Cluster ”Eat your own dog food” vs. http://www.kai-waehner.de/blog/2019/03/07/apache-kafka-middleware-mq-etl-esb-comparison/
51 Questions? Let’s connect... Kai Waehner Technology Evangelist kai.waehner@confluent.io @KaiWaehner www.confluent.io www.kai-waehner.de LinkedIn

Architecture patterns for distributed, hybrid, edge and global Apache Kafka deployments

  • 1.
    1 Kai Waehner |Technology Evangelist, Confluent contact@kai-waehner.de | LinkedIn | @KaiWaehner | www.confluent.io | www.kai-waehner.de Architecture patterns for distributed, hybrid, edge and global Apache Kafka deployments
  • 2.
    2Abstract Architecture patterns fordistributed, hybrid, edge and global Apache Kafka deployments Multi-cluster and cross-data center deployments of Apache Kafka have become the norm rather than an exception. This session gives an overview of several scenarios that may require multi-cluster solutions and discusses real-world examples with their specific requirements and trade-offs, including disaster recovery, aggregation for analytics, cloud migration, mission-critical stretched deployments and global Kafka. Key takeaways: • In many scenarios, one Kafka cluster is not enough. Understand different architectures and alternatives for multi-cluster deployments. • Zero data loss and high availability are two key requirements. Understand how to realize this, including trade-offs. • Learn about features and limitations of Kafka for multi cluster deployments • Global Kafka and mission-critical multi-cluster deployments with zero data loss and high availability became the normal, not an exception. www.kai-waehner.de | @KaiWaehner
  • 3.
    3 Agenda 1) Definition ‘KafkaCluster’ 2) One Kafka Cluster 3) Multiple Kafka Clusters 4) Multi-Region / Global Kafka Cluster 5) Infrastructure Options www.kai-waehner.de | @KaiWaehner
  • 4.
    4 Agenda 1) Definition ‘KafkaCluster’ 2) One Kafka Cluster 3) Multiple Kafka Clusters 4) Multi-Region / Global Kafka Cluster 5) Infrastructure Options www.kai-waehner.de | @KaiWaehner
  • 5.
  • 6.
    6 The Beginning ofa New Era www.kai-waehner.de | @KaiWaehner https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying The first use case: Log Analytics. This is why Kafka was created!
  • 7.
    7 Event Streaming Platform– The Commit Log www.kai-waehner.de | @KaiWaehner Time P C1 C2 C3
  • 8.
    8 Event Streaming Platform– A Distributed System www.kai-waehner.de | @KaiWaehner Broker 1 Topic1 partition1 Broker 2 Broker 3 Broker 4 Topic1 partition1 Topic1 partition1 Leader Follower Topic1 partition2 Topic1 partition2 Topic1 partition2 Topic1 partition3 Topic1 partition4 Topic1 partition3 Topic1 partition3 Topic1 partition4 Topic1 partition4
  • 9.
    9Apache Kafka (kafka.apache.org)includes Kafka Connect and Kafka Streams www.kai-waehner.de | @KaiWaehner Kafka Streams Your app sinksource KafkaConnect KafkaConnect Kafka Cluster
  • 10.
    10A Streaming Platform isthe Underpinning of an Event-driven Architecture www.kai-waehner.de | @KaiWaehner Microservices DBs SaaS apps Customer 360 Real-time fraud detection Data warehouse Producers Consumers Database change Microservices events SaaS data Customer experiences Streams of real time events Stream processing apps Connectors Connectors Stream processing apps
  • 11.
    11 Apache Kafka atScale at Tech Giants www.kai-waehner.de | @KaiWaehner > 7 trillion messages / day > 6 Petabytes / day “You name it” * Kafka Is not just used by tech giants ** Kafka is not just used for big data
  • 12.
    12 www.kai-waehner.de | @KaiWaehner Improve Customer Experience (CX) Increase Revenue (makemoney) Business Value Decrease Costs (save money) Core Business Platform Increase Operational Efficiency Migrate to Cloud Mitigate Risk (protect money) Key Drivers Strategic Objectives (sample) Fraud Detection IoT sensor ingestion Digital replatforming/ Mainframe Offload Connected Car: Navigation & improved in-car experience: Audi Customer 360 Simplifying Omni-channel Retail at Scale: Target Faster transactional processing / analysis incl. Machine Learning / AI Mainframe Offload: RBC Microservices Architecture Online Fraud Detection Online Security (syslog, log aggregation, Splunk replacement) Middleware replacement Regulatory Digital Transformation Application Modernization: Multiple Examples Website / Core Operations (Central Nervous System) The [Silicon Valley] Digital Natives; LinkedIn, Netflix, Uber, Yelp... Predictive Maintenance: Audi Streaming Platform in a regulated environment (e.g. Electronic Medical Records): Celmatix Real-time app updates Real Time Streaming Platform for Communications and Beyond: Capital One Developer Velocity - Building Stateful Financial Applications with Kafka Streams: Funding Circle Detect Fraud & Prevent Fraud in Real Time: PayPal Kafka as a Service - A Tale of Security and Multi-Tenancy: Apple Example Use Cases $↑ $↓ $ Example Case Studies (of many)
  • 13.
    13 A Kafka Cluster www.kai-waehner.de| @KaiWaehner Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect
  • 14.
    14 A Kafka Cluster www.kai-waehner.de| @KaiWaehner Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect “Server Side” of a Kafka Cluster 1 - * Kafka Broker 1 - * ZooKeeper 0 - * Schema Registry 0 - * Kafka Connect 0 - * REST Proxy Security, Ops, Monitoring, … “Client Side” of a Kafka Cluster Kafka Clients (Java, C, C++, Python, Go, JavaScript, …) Kafka Stream Processing Apps (Kafka Streams, ksqlDB) External Producers / Consumers (Oracle, Hadoop, Flink, …)
  • 15.
    15 Why Multiple KafkaClusters? www.kai-waehner.de | @KaiWaehner * Not a representative survey J ** Many DCs does NOT necessarily mean more than one Kafka Cluster
  • 16.
    16 Disaster Recovery –RPO and RTO www.kai-waehner.de | @KaiWaehner RPO = Recovery Point Objective RTO = Recovery Time Objective
  • 17.
    17 Agenda 1) Definition ‘KafkaCluster’ 2) One Kafka Cluster 3) Multiple Kafka Clusters 4) Multi-Region / Global Kafka Cluster 5) Infrastructure Options www.kai-waehner.de | @KaiWaehner
  • 18.
    18 A Kafka Cluster www.kai-waehner.de| @KaiWaehner Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect
  • 19.
    19 A Kafka Clusterfor High Availability www.kai-waehner.de | @KaiWaehner Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect PROD / Pre-PROD / TEST 3 - * Kafka Broker 3 / 5 / 7 ZooKeeper 2 Schema Registry 2 - * Kafka Connect 2 - * REST Proxy Security, Ops, Monitoring, … DEV / Functional TEST 1 Kafka Broker 1 ZooKeeper 0 - 1 Schema Registry 0 - 1 Kafka Connect 0 - 1 REST Proxy Security, Ops, Monitoring, …
  • 20.
    20 A Stretched KafkaCluster over 3DC www.kai-waehner.de | @KaiWaehner Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Kafka Connect Schema Registry Producer Consumer Producer Consumer Kafka Connect Schema Registry DC1 DC2 DC3
  • 21.
    21 A Stretched KafkaCluster over 3DC www.kai-waehner.de | @KaiWaehner Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Kafka Connect Schema Registry Producer Consumer Producer Consumer Kafka Connect Schema Registry DC1 DC2 DC3 High availability (Survives DC outage) Zero data loss and zero downtime Automatic client fail-over Works well in cloud (3 AZs in 1 region) Requires “good” latency (à DCs ”close” to each other) Requires three DCs (Quorum / split brain) Complex to configure and operate
  • 22.
    22 A Stretched KafkaCluster over 2DC www.kai-waehner.de | @KaiWaehner Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Kafka Connect Schema Registry Producer Consumer Producer Consumer Kafka Connect Schema Registry DC1 DC2 Kafka Broker
  • 23.
    23 A Stretched KafkaCluster over 2DC www.kai-waehner.de | @KaiWaehner High availability (Survives DC outage) Zero data loss or zero downtime Automatic client fail-over Stopgap solution for on premise (if only 2 DCs available) à 2.5 DC deployment as workaround Requires “good” latency (à DCs ”close” to each other) Quorum in 2 DCs not possible Complex to configure and operate Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Kafka Connect Schema Registry Producer Consumer Producer Consumer Kafka Connect Schema Registry DC1 DC2 Kafka Broker
  • 24.
    24 A Single KafkaCluster www.kai-waehner.de | @KaiWaehner Zookeeper Kafka Broker Schema Registry OPC-UA MQTT PLC4X KSQL Grafana Postgres Kafka Connect Simple setup Works Often used “at the Edge” No high availability
  • 25.
    25 Agenda 1) Definition ‘KafkaCluster’ 2) One Kafka Cluster 3) Multiple Kafka Clusters 4) Multi-Region / Global Kafka Cluster 5) Infrastructure Options www.kai-waehner.de | @KaiWaehner
  • 26.
    26 Independent Kafka Clusters www.kai-waehner.de| @KaiWaehner Total Independence Owned by the project teams, central ops or SaaS Different sizing, security, infrastructure Related projects should run on the same Kafka cluster Independent projects can run on the same Kafka cluster • similar SLAs and requirements • e.g. NOT Instant payment vs. log analytics vs. file transfer vs. video streaming • ACLs / RBAC for fine-grained authentication and authorization • throughput typically no issue (Confluent Cloud processes 1 Gigabyte / sec and more in one cluster) • reduce overhead (operations, hardware, …) Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect
  • 27.
    27 Hybrid Integration of2 Kafka Clusters www.kai-waehner.de | @KaiWaehner Hybrid integration On premise and cloud or multi-cloud scenarios (due to technical, business or legal reasons) Uni- or bi-directional Know the best practices or get help Know your SLAs and timelines Choose the right (battle-tested?) tool Works Relatively easy to setup (some tools are complex / not up-to-date / not mature / not documented well) Example: Replicate data from production to analytics cluster Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect MirrorMaker 1 / 2 Confluent Replicator uReplicator (Uber) Mirus (Salesforce) Brooklin (LinkedIn) Custom Replication DC1 DC2 Streaming Replication
  • 28.
    28 Migration of KafkaClusters www.kai-waehner.de | @KaiWaehner Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect MirrorMaker 1 / 2 Confluent Replicator uReplicator (Uber) Mirus (Salesforce) Brooklin (LinkedIn) Custom Replication DC1 DC2 Streaming Replication Common migration scenarios On premise à Cloud Cloud A à Cloud B Vendor 1 à Vendor 2 Self-Managed à SaaS Migration steps: 1) Create new Kafka cluster 2) Producer / Consumer re-configuration 3) Shutdown of old Cluster Know the best practices or get help Know your SLAs and timelines Choose the right (battle-tested?) tool
  • 29.
    29 Disaster Recovery with2 Kafka Clusters www.kai-waehner.de | @KaiWaehner Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect MirrorMaker 1 / 2 Confluent Replicator uReplicator (Uber) Mirus (Salesforce) Brooklin (LinkedIn) Custom Replication DC1 DC2 Streaming Replication Know the trade-offs! If Kafka Cluster 1 is down, Kafka Cluster 2 is still live and running Timestamp preservation Offset translation Manual client-failover / custom client code Data loss in case of DC outage (asynchronous replication)
  • 30.
    30 Disaster Recovery @JPMorgan www.kai-waehner.de | @KaiWaehner https://www.confluent.io/kafka-summit-san-francisco-2019/secure-kafka-at-scale-in-true-multi-tenant-environment
  • 31.
    31 Aggregation of KafkaClusters www.kai-waehner.de | @KaiWaehner Local smaller Kafka Clusters in each site for critical real time applications (high SLAs) Central bigger Kafka Cluster for analytics use cases (often less critical SLAs) Works Relatively easy to setup (some tools are complex / not up-to-date / not mature / not documented well) Some tools do not support same topic name in each DC Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Kafka Connect Kafka Connect Kafka Broker Kafka Broker Kafka Broker
  • 32.
    32 Aggregation Cluster @Royal Caribbean www.kai-waehner.de | @KaiWaehner https://www.confluent.io/kafka-summit-lon19/seamless-guest-experience-with-kafka-streams/
  • 33.
    33 Aggregation of EdgeKafka Clusters www.kai-waehner.de | @KaiWaehner Small Kafka clusters in each site for data collection (often low SLAs, sometimes single Kafka broker) Kafka at the edge sometimes OEM / hardware appliance Central big Kafka cluster for critical use cases and edge integration (high SLAs) Zookeeper Kafka Broker Schema Registry OPC-UA MQTT PLC4X KSQL Grafana Postgres Kafka Connect Zookeeper Kafka Broker Schema Registry OPC-UA MQTT PLC4X KSQL Grafana Postgres Kafka Connect Zookeeper Kafka Broker Schema Registry OPC-UA MQTT PLC4X KSQL Grafana Postgres Kafka Connect Zookeeper Kafka Broker Schema Registry OPC-UA MQTT PLC4X KSQL Grafana Postgres Kafka Connect Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Kafka Connect Kafka Connect Kafka Broker Kafka Broker Kafka Broker
  • 34.
    34 Real Time StreamingML at the Edge @ Severstal www.kai-waehner.de | @KaiWaehner https://www.confluent.io/customers/severstal/
  • 35.
    35 Cross-Company Kafka Integration (SpecialCase of Hybrid Integration) www.kai-waehner.de | @KaiWaehner Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect MirrorMaker 1 / 2 Confluent Replicator uReplicator (Uber) Mirus (Salesforce) Brooklin (LinkedIn) Custom Replication Company A Company B Streaming Replication Streaming integration between companies API Management (REST et al) not appropriate for streaming data Infosec and politics are your biggest enemy
  • 36.
    36 Agenda 1) Definition ‘KafkaCluster’ 2) One Kafka Cluster 3) Multiple Kafka Clusters 4) Multi-Region / Global Kafka Cluster 5) Infrastructure Options www.kai-waehner.de | @KaiWaehner
  • 37.
    37 A Stretched KafkaCluster over 3 Regions? www.kai-waehner.de | @KaiWaehner Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Kafka Connect Schema Registry Producer Consumer Producer Consumer Kafka Connect Schema Registry US-East US-Central US-West No! Fail! Error! Sorry!
  • 38.
    38Replication Between KafkaClusters over Multiple Regions or Continents www.kai-waehner.de | @KaiWaehner Streaming replication works (MirrorMaker 2, Confluent Replicator) Same challenges as in one region (data loss, custom code for fail-over, offset translation, etc.) Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Schema Registry Schema Registry Producer Consumer Kafka Connect Kafka Connect China USA Europe
  • 39.
    39 A Single KafkaCluster over 3 Regions with Multi-Region Replication www.kai-waehner.de | @KaiWaehner Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Kafka Connect Schema Registry Producer Consumer Producer Consumer Kafka Connect Schema Registry US-East US-Central US-West
  • 40.
    40A Single MultiRegion Kafka Cluster (MRC) www.kai-waehner.de | @KaiWaehner High availability (Survives region outage) Zero data loss and zero downtime Automatic client fail-over over regions Works well in cloud and on premise No external tools (like MirrorMaker) needed Not part of Open Source Kafka à Build vs. Buy Zookeeper Zookeeper Zookeeper Kafka Broker Kafka Broker Kafka Broker Kafka Connect Schema Registry Producer Consumer Producer Consumer Kafka Connect Schema Registry US-East US-Central US-West How does this work? Region-awareness Synchronous or asynchronous replication per Topic Follower-fetching Regional topic locality Replication rules … (Confluent Platform)
  • 41.
    41A Single MultiRegion Kafka Cluster (MRC) www.kai-waehner.de | @KaiWaehner Broker 1 Broker 2 Broker 3 ZK1 Broker 4 Broker 5 Broker 6 Broker 1 Broker 2 ZK2 Client D Client F Client G Failover site ZK3 Broker 3 Broker 4 Broker 5 Broker 6 Client A Client B us-central-1 Client A Client B automated client failover Observer replicas us-west-1 us-east-1 Site failure! “tie-breaker” datacenter Single Kafka Cluster (Confluent Platform) $ bin/kafka-topics.sh --bootstrap-servers localhost:2181 --create --topic trades-west --partitions 3 --config replication-factor={us-west: 2} --config min.insync.replicas=2 --config async.replication-factor={us-east: 2} --config max.async.time.behind.min=5 --config replay.truncated.messages=true
  • 42.
    42 Vision: One GlobalKafka Cluster www.kai-waehner.de | @KaiWaehner Topic‘pos_payments’
  • 43.
    43 ZooKeeper Removal (KIP-500) www.kai-waehner.de| @KaiWaehner https://cwiki.apache.org/confluence/display/KAFKA/KIP-500%3A+Replace+ZooKeeper+with+a+Self-Managed+Metadata+Quorum
  • 44.
    44 Architecture patterns for distributed,hybrid, edge and global Apache Kafka deployments… www.kai-waehner.de | @KaiWaehner à There is no best solution. It depends!
  • 45.
    45 Agenda 1) Definition ‘KafkaCluster’ 2) One Kafka Cluster 3) Multiple Kafka Clusters 4) Multi-Region / Global Kafka Cluster 5) Infrastructure Options www.kai-waehner.de | @KaiWaehner
  • 46.
    46 Infrastructure Options www.kai-waehner.de |@KaiWaehner Infrastructure is your choice! à Bare metal vs. VM vs. container vs. cloud… Software is your choice! à Open source vs. commercial vs. SaaS Ops and management are your choice! à Self-Managed vs. PaaS vs. fully-managed Integration is your choice! à Kafka-native vs. other tools / services Find the right solution for your business case and for your SLAs…
  • 47.
    4747 CONFLUENT PLATFORM EFFICIENT OPERATIONS ATSCALE PRODUCTION-STAGE PREREQUISITES UNRESTRICTED DEVELOPER PRODUCTIVITY Multi-language Development Rich Pre-built Ecosystem SQL-based Stream Processing GUI-driven Mgmt & Monitoring Flexible DevOps Automation Dynamic Performance & Elasticity Enterprise-grade Security Data Compatibility Global Availability APACHE KAFKA Fully Managed Cloud ServiceSelf Managed Software FREEDOM OF CHOICE COMMITTER-LED EXPERTISE PartnersTraining Professional Services Enterprise Support DEVELOPER OPERATOR ARCHITECT Hybrid Infrastructure
  • 48.
    48 Kafka as aService – Fully Managed? Infrastructure management (commodity) Scaling ● Upgrades (latest stable version of Kafka) ● Patching ● Maintenance ● Sizing (retention, latency, throughput, storage, etc.) ● Data balancing for optimal performance ● Performance tuning for real-time and latency requirements ● Fixing Kafka bugs ● Uptime monitoring and proactive remediation of issues ● Recovery support from data corruption ● Scaling the cluster as needed ● Data balancing the cluster as nodes are added ● Support for any Kafka issue with less than X minutes response time Infra-as-a-Service Harness full power of Kafka Kafka-specific management Platform-as-a-Service Evolve as you need Future-proof Mission-critical reliability Most Kafka-as-a-Service offerings are partially-managed Kafka as a Service should be a serverless experience with consumption-based pricing!
  • 49.
    4949 I N VE S T M E N T & T I M E VALUE 3 4 5 1 2 Event Streaming Maturity Model 49 Initial Awareness / Pilot (1 Kafka Cluster) Start to Build Pipeline / Deliver 1 New Outcome (1 Kafka Cluster) Mission-Critical Deployment (Stretched, Hybrid, Multi-Region) Build Contextual Event-Driven Apps (Stretched, Hybrid, Multi-Region) Central Nervous System (Global Kafka) Product, Support, Training, Partners, Technical Account Management...
  • 50.
    50End-to-End Integration with24/7 Uptime and Zero Data Loss? www.kai-waehner.de | @KaiWaehner …. more components, clusters, technologies means more risks, conflicts, incompatibilities, operations burden! ETL MQ Storage Streaming Messaging: Kafka Core Storage: Kafka Core Caching: Kafka Core Real-Time, Batch: Kafka Clients Integration: Kafka Connect Stream Processing: Kafka Streams / KSQL Request-Response: REST Proxy Replication between Kafka Clusters (Edge, On Premises, Hybrid, Cloud) Multi-Region Kafka Cluster ”Eat your own dog food” vs. http://www.kai-waehner.de/blog/2019/03/07/apache-kafka-middleware-mq-etl-esb-comparison/
  • 51.
    51 Questions? Let’s connect... Kai Waehner TechnologyEvangelist kai.waehner@confluent.io @KaiWaehner www.confluent.io www.kai-waehner.de LinkedIn