Paul Brebner, Technology Evangelist. www.instaclustr.com, sales@instaclustr.com. © Instaclustr Pty Limited, 2022 [https://www.instaclustr.com/company/policies/terms-conditions/]. Except as permitted by the copyright law applicable to you, you may not reproduce, distribute, publish, display, communicate or transmit any of the content of this document, in any form, by any means, without the prior written permission of Instaclustr Pty Limited.
In this Visual Introduction to Kafka, we're going to build a Postal Service. We'll learn about Kafka Producers, Consumers, Topics, Partitions, Keys, Records, Delivery Semantics (guaranteed delivery, and who gets which messages), Consumer Groups, Kafka Connect and Streams!
What is Kafka? Kafka is a distributed stream processing system. It allows distributed producers to send messages to distributed consumers via a Kafka cluster.
Kafka has lots of benefits. It's Fast: it has high throughput and low latency. It's Scalable: it's horizontally scalable; to scale, just add nodes and partitions. It's Reliable: it's distributed and fault tolerant. It's Durable (zero data loss): messages are persisted to disk in an immutable log. It's Open Source: an Apache project. And it's available as an Instaclustr Managed Service on multiple cloud platforms.
But the usual Kafka diagram (right) is a bit monochrome and boring.
This visual introduction will be more colourful, and it's going to be an extended story…
Let's build a modern-day, fully electronic postal service to send messages from A to B. [Diagram: A, Postal Service, B]
First, we need an "A". A is a producer; it sends a message… to B, the consumer, the recipient of the message.
Due to the decline in "snail mail" volumes, direct deliveries have been canceled. Actually, not.
Instead we have "Poste Restante". Poste Restante is not a post office in a restaurant; it's called general delivery in the US. The mail is delivered to a post office, and they hold it for you until you call for it. Consumers poll for messages by visiting the counter at the post office. Image: La Poste Restante, Francois-Auguste Biard (Wikimedia)
Kafka topics act like a Post Office. What are the benefits? Disconnected delivery: the consumer doesn't need to be available to receive messages. There's less effort for the messaging service: it only has to deliver to a few locations, not to many consumer addresses. And it can scale better and handle more complex delivery semantics!
First let's see how it scales. What if there are many consumers for a topic? A single counter introduces delays and limits concurrency. More counters increase concurrency and reduce delays. Kafka Topics have 1 or more Partitions. Partitions function like multiple counters and enable high concurrency.
Let's see what a message looks like. In Kafka a message is called a Record and is a bit like a letter. The topic is the destination, the North Pole. [Envelope: Santa, North Pole]
The "Postmark" includes a timestamp, the offset (the record's position within its partition), and the partition it was sent to. Time semantics are flexible: the timestamp can be the time of event creation, ingestion, or processing. [Envelope labels: Topic; timestamp, offset, partition]
There's also a thing called a Key, which is optional. It refines the destination, so it's a bit like the rest of the address. We want this letter sent to Santa, not just a random Elf. [Envelope labels: Topic; Key (optional); timestamp, offset, partition]
And the value is the contents (just a byte array). Kafka producers and consumers need to have a shared serializer and deserializer for both the key and the value. [Envelope labels: Topic; Key (optional); Value (Content); timestamp, offset, partition]
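To make the letter analogy concrete, here is a minimal sketch of a record whose key and value are plain bytes, with a serializer and deserializer shared by both ends. The `Record` class, the topic name "north-pole" and the helper functions are made up for illustration; they are not the Kafka client API.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    """A simplified stand-in for a Kafka record (hypothetical, not the client class)."""
    topic: str              # the destination
    key: Optional[bytes]    # optional; opaque bytes as far as Kafka is concerned
    value: bytes            # the contents; also opaque bytes


def serialize_str(text: str) -> bytes:
    return text.encode("utf-8")


def deserialize_str(data: bytes) -> str:
    return data.decode("utf-8")


def serialize_json(obj) -> bytes:
    return json.dumps(obj).encode("utf-8")


def deserialize_json(data: bytes):
    return json.loads(data.decode("utf-8"))


# Producer side: turn the key and value into bytes before sending.
letter = Record(
    topic="north-pole",
    key=serialize_str("Santa"),
    value=serialize_json({"wish": "a pony"}),
)

# Consumer side: the same (de)serializers are needed to read the letter.
received_key = deserialize_str(letter.key)
received_value = deserialize_json(letter.value)
```

If the consumer used a different deserializer than the producer's serializer, it would still receive the bytes but could not make sense of them.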
Kafka doesn't look inside the value, but the Producer and Consumer do, and the Consumer can try to make sense of the message. (Can you?!) Image: Dear Santa by Zack Poitras / http://theinclusive.net/article.php?id=268
Next, let's look at delivery semantics. For example, do we care if the message actually arrives or not?
Yes we do! Guaranteed message delivery is desirable. Last century, homing pigeons were prone to getting lost or eaten by predators, so the same message was sent with several pigeons.
How does Kafka guarantee delivery? A message (M1) is written to a broker (Broker 2). The message is always persisted to disk. This makes it resilient to power failure. [Diagram: Producer sends M1 to Broker 2; Brokers 1 and 3 alongside]
The message is also replicated on multiple brokers; 3 is typical. [Diagram: M1 now on Brokers 1, 2 and 3]
This makes it resilient to the loss of some servers (all but one). [Diagram: only Broker 1 remains, still holding M1]
Finally, the producer gets an acknowledgement once the message is persisted and replicated (configurable for the number of replicas, and sync or async). The message is now available from more than one broker in case some fail. This also increases the read concurrency, as partitions are spread over multiple brokers. [Diagram: Acknowledgement returned to the Producer; M1 on Brokers 1, 2 and 3]
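The persist, replicate and acknowledge steps can be sketched as a toy model. Everything here (the `Broker` class, `produce`, `read_from_any`) is hypothetical and runs in memory; the `acks` argument is only loosely modelled on the producer setting of the same name (0, 1 or all), and real replication is not a simple loop like this.

```python
class Broker:
    """A toy broker: the list stands in for the on-disk, append-only log."""

    def __init__(self, name):
        self.name = name
        self.log = []
        self.alive = True


def produce(message, leader, followers, acks="all"):
    """Append to the leader, copy to the followers, and return how many
    copies the producer has been promised when it is acknowledged."""
    leader.log.append(message)
    for follower in followers:
        follower.log.append(message)
    if acks == 0:
        return 0                   # fire and forget: no acknowledgement awaited
    if acks == 1:
        return 1                   # only the leader's copy is confirmed
    return 1 + len(followers)      # "all": every replica is confirmed


def read_from_any(brokers):
    """Return the log from the first broker that is still alive."""
    for broker in brokers:
        if broker.alive:
            return list(broker.log)
    raise RuntimeError("no broker holding the data is available")


b1, b2, b3 = Broker("Broker 1"), Broker("Broker 2"), Broker("Broker 3")
confirmed = produce("M1", leader=b2, followers=[b1, b3], acks="all")

# Lose all but one server: M1 is still readable from Broker 1.
b2.alive = False
b3.alive = False
survivors = read_from_any([b1, b2, b3])
```

With three copies confirmed, the message survives the loss of two brokers, which is the "all but one" case from the slides.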
Now let's look at another aspect of delivery semantics: who gets the messages, and how many times are messages delivered?
Kafka is "pub-sub". It's loosely coupled: producers and consumers don't know about each other. [Diagram: one Producer, four Consumers]
Filtering, or which consumers get which messages, is topic based. Producers send messages to topics. Consumers subscribe to topics of interest, e.g. "Parties". When they poll, they only receive messages sent to those topics. None of these consumers will receive messages sent to the "Work" topic. [Diagram: a Producer writes to Topic "Parties" and Topic "Work"; the Consumers are subscribed to "Parties" and poll to receive its messages; they are not subscribed to "Work"]
A few more details and we can see how this works. Kafka works like an Amish barn raising. Partitions and a consumer group share work across multiple consumers; the more partitions a topic has, the more consumers it supports. Image: Paul Cyr ©2018 NorthernMainePhotos.com
Kafka also works like Clones. It supports delivery of the same message to multiple consumers with consumer groups. Kafka doesn't throw messages away immediately after they are delivered, so the same message can be delivered to multiple consumer groups. Image: Shutterstock.com
Consumers subscribed to the "Parties" topic are allocated partitions. When they poll, they will only get messages from their allocated partitions. [Diagram: a Producer writes to Topic "Parties" (Partitions 1, 2, ... n); two Consumer Groups of Consumers read from it]
This enables consumers in the same group to share the work around. Each consumer gets only a subset of the available messages. [Diagram: consumers share work within groups]
Multiple groups enable message broadcasting. Messages are duplicated across groups, as each consumer group receives a copy of each message. [Diagram: messages are duplicated across Consumer Groups]
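Sharing within a group and broadcasting across groups can be sketched in a few lines. This is an in-memory toy, not the Kafka consumer API: the group names, consumer names and the simple round-robin `assign_partitions` are assumptions for illustration (Kafka offers several assignment strategies).

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over a group's consumers, round-robin style."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


def deliver(topic, groups):
    """Return, per group and per consumer, the messages each one would read."""
    result = {}
    for group, consumers in groups.items():
        assignment = assign_partitions(sorted(topic), consumers)
        result[group] = {
            consumer: [msg for p in parts for msg in topic[p]]
            for consumer, parts in assignment.items()
        }
    return result


# A topic with three partitions, each holding its own messages.
topic = {0: ["m0", "m3"], 1: ["m1", "m4"], 2: ["m2", "m5"]}
groups = {"group-a": ["a1", "a2", "a3"], "group-b": ["b1"]}

delivered = deliver(topic, groups)
```

In `group-a` each consumer reads one partition, so the work is shared; `group-b` has a single consumer, which reads every partition. Both groups see every message once.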
The final aspect of delivery semantics is to do with message keys: which messages are delivered to which consumers? If a message has a key, then Kafka uses partition-based delivery. Messages with the same key are always sent to the same partition, and therefore to the same consumer in each group. And the order (within a partition) is guaranteed.
But if the key is null, then Kafka uses round robin delivery. Each message is delivered to the next partition. (Since Kafka 2.4 the default producer partitioner fills a batch for one partition before moving to the next, so the spread is per batch rather than strictly per message.)
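Here is a minimal sketch of the two cases: keyed records are hashed to a partition, and records without a key take the next partition in turn. The `Partitioner` class is hypothetical, and `zlib.crc32` is only a stand-in hash chosen because it is in the standard library; Kafka's own partitioner uses a different hash function.

```python
import itertools
import zlib


class Partitioner:
    """Toy partitioner: hash the key if there is one, otherwise round robin."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._next = itertools.cycle(range(num_partitions))

    def partition(self, key):
        if key is None:
            return next(self._next)                    # no key: next partition in turn
        return zlib.crc32(key) % self.num_partitions   # key: same key, same partition


partitioner = Partitioner(num_partitions=3)

no_key_partitions = [partitioner.partition(None) for _ in range(4)]
first = partitioner.partition(b"Pool Party")
second = partitioner.partition(b"Pool Party")
```

The useful property is that `first` and `second` are always equal, which is what keeps records with the same key in order on one partition.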
Let's look at a concrete example with two consumer groups. Group 1: the Nerds, which has multiple consumers (Bill, Paul, Penny, Kate, Millie and Jenny). Group 2: The Pugsters, which has a single consumer, Zug. Images: Shutterstock.com; Nenad Aksic / Shutterstock.com
Looking at the case where there's no key first (round robin): each message (1, 2, etc.) is sent to the next partition, and consumers allocated to that partition will receive the message when they next poll. [Diagram: Producer; Topic "Parties" with Partitions 1, 2, ... n; Group "Nerds" with Consumer 1 (Bill), Consumer 2 (Jenny), ... Consumer n; Group "Pugsters" with Consumer 1 (Zug); consumers subscribe to "Parties"]
Here's what actually happens. We're not showing the producer, topics, or partitions for simplicity. You'll have to imagine them.
Step 1 (no key): Both groups subscribe to the topic "Parties". Assuming 6 partitions, each consumer in the Nerds group gets 1 partition; Zug gets them all.
Step 2 (no key): The Producer sends a record with the value "Pool party - Invitation" to the "Parties" topic (there's no key).
Step 3 (no key): Bill and Zug each receive a copy of the invitation and plan to attend.
Step 4 (no key): The Producer sends another record with the value "Pool party - Canceled".
Step 5 (no key): In the Nerds group, Jenny gets the message this time as it's round robin, and Zug gets it as he's the only consumer in his group. Jenny ignores it as she didn't get the original invite. Bill wastes his time trying to go (as he doesn't know it's canceled). The rest of the gang aren't surprised at not receiving any invites and stay home to do some hacking.
Zug plans something else fun instead… a jam session with his band. Image: Shutterstock.com
How does it work if there is a key? The key is hashed to a partition, so messages with that key are always sent to that partition. Assume there are 3 messages, and the keys of messages 1 and 2 hash to the same partition. [Diagram: Producer; Topic "Parties" with Partitions 1, 2, ... n; messages 1 and 2 on one partition, message 3 on another; Group "Nerds" with Consumer 1 (Bill), Consumer 2 (Jenny), ... Consumer n; Group "Pugsters" with Consumer 1 (Zug)]
Here's what happens with a key, assuming that the key is the "title" of the message ("Pool Party"), and the value is "Invitation" or "Canceled". Step 1: As before, both groups subscribe to the topic "Parties". Step 2: The Producer sends a record with the key "Pool Party" and the value "Invitation" to the "Parties" topic.
Step 3: As before, Bill and Zug receive a copy of the invitation and plan to attend.
Step 4: The Producer sends another record with the same key but with the value "Canceled" to the "Parties" topic.
Step 5: This time, Bill and Zug receive the cancellation (the same consumers, as the key is identical).
Step 6: The Producer sends out another invitation, to a Halloween party. The key is different this time.
Step 7: Jenny receives the Halloween invitation, as the key is different and the record is sent to Jenny's partition. Zug is the only consumer in his group, so he gets every record no matter what partition it's sent to.
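The pool party story can be replayed as a small simulation, with and without a key. This is a sketch under assumptions: six partitions with one Nerd each (in the order listed below, so that round robin reaches Bill and then Jenny), Zug owning every partition, and `zlib.crc32` standing in for Kafka's real key hash. Which Nerd a given key lands on therefore depends on the stand-in hash; in the slides it happens to be Bill.

```python
import zlib

# Assumed order of partition ownership in the Nerds group: one partition each.
NERDS = ["Bill", "Jenny", "Paul", "Penny", "Kate", "Millie"]
NUM_PARTITIONS = len(NERDS)


def run(records):
    """Deliver (key, value) records and return what each consumer received."""
    inbox = {name: [] for name in NERDS + ["Zug"]}
    next_partition = 0
    for key, value in records:
        if key is None:
            partition = next_partition                       # round robin
            next_partition = (next_partition + 1) % NUM_PARTITIONS
        else:
            partition = zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS
        inbox[NERDS[partition]].append(value)   # Nerds: only the partition's owner
        inbox["Zug"].append(value)              # Pugsters: Zug owns every partition
    return inbox


no_key = run([(None, "Invitation"), (None, "Canceled")])
keyed = run([("Pool Party", "Invitation"), ("Pool Party", "Canceled")])
```

Without a key, Bill gets the invitation and Jenny the cancellation, so nobody in the Nerds group has the full picture. With the key "Pool Party", both records land in the same Nerd's inbox, in order. Zug gets everything either way.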
This time Zug gets dressed up and has fun at the party. Image: Shutterstock.com
But wait! There's more: event reprocessing (time travel)! Kafka stores message streams on disk, so Consumers can go back and request the same messages they've already received, request earlier messages, or skip some messages. Image: Shutterstock.com
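Time travel works because a consumer only tracks a position (an offset) in a log that stays on disk. Here is a toy sketch; the `PartitionLog` and `Consumer` classes are hypothetical, and although the method names `poll` and `seek` echo the real consumer's vocabulary, the signatures here are simplified.

```python
class PartitionLog:
    """Toy append-only log for one partition; records are never removed on read."""

    def __init__(self):
        self._records = []

    def append(self, value):
        self._records.append(value)
        return len(self._records) - 1      # the new record's offset

    def read(self, offset, max_records=10):
        return self._records[offset:offset + max_records]


class Consumer:
    """Toy consumer that remembers only its position in the log."""

    def __init__(self, log):
        self.log = log
        self.position = 0

    def poll(self, max_records=10):
        batch = self.log.read(self.position, max_records)
        self.position += len(batch)
        return batch

    def seek(self, offset):
        self.position = offset             # jump backwards (replay) or forwards (skip)


log = PartitionLog()
for value in ["Invitation", "Canceled", "Halloween"]:
    log.append(value)

zug = Consumer(log)
first_pass = zug.poll()     # reads everything once
nothing_new = zug.poll()    # caught up, so nothing more to read
zug.seek(1)                 # go "back to the future"
replayed = zug.poll()       # the same records again, from offset 1
```

Reading does not consume anything from the log, so seeking back simply returns the same records again.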
So Zug can go "back to the future"!
But! The postal system is global and heterogeneous.
How can post offices be connected?
Compressed air? Underground pneumatic tubes delivered mail between postal facilities in US cities in the 1900s. (Source: Wikimedia Commons)
Kafka Connect enables message flows across heterogeneous systems: from Sources to Sinks via a Lake (Kafka).
Kafka Connect Architecture: Source and Sink Connectors
Form Pipelines (Berlin). (Source: Paul Brebner)
For Beer? (Source: Paul Brebner)
Example of a Kafka Connect IoT pipeline: Tidal Data -> Kafka -> Elasticsearch -> Kibana. A REST source connector makes a REST call and writes the JSON result to the Tides topic; an Elasticsearch sink connector writes it to the Tides index. Example JSON result: {"metadata": {"id": "8724580", "name": "Key West", "lat": "24.5508", "lon": "-81.8081"}, "data": [{"t": "2020-09-24 04:18", "v": "0.597"}]}
Connectors require configuration.
What's Still Missing? Mail Sorting. Source: https://commons.wikimedia.org/wiki/File:Royal_mail_sorting.jpg
Automated! Source: https://commons.wikimedia.org/wiki/File:Post_Sorting_Machine_(4479045801).jpg
At Scale! (Source: https://commons.wikimedia.org/w/index.php?search=mail+sorting&title=Special:MediaSearch&go=Go&type=image)
Kafka Streams: Topics in, Topics out, via Streams. All Kafka APIs: Producer, Consumer, Connect, Streams.
Simple Streams Topology. (Source: Shutterstock)
Complex Streams Topology: Join, Group, Filter, Aggregate, etc.
Kafka Streams = Rapids!? (Source: Shutterstock)
Streams get complicated quickly! One way to keep dry… (Source: Shutterstock)
Or, this diagram, which explains the order of Streams DSL operations. (Source: https://kafka.apache.org/20/documentation/streams/developer-guide/dsl-api.html)
Cluedo Kafka Streams Example: tracks who's in what rooms and when, and emits a list of suspects without an alibi. Dr Black has been murdered in the Billiard Room with a Candlestick! Whodunnit?! Output: [KSTREAM-FILTER-0000000024]: Conservatory: Professor Plum has no alibi [KSTREAM-FILTER-0000000024]: Library: Colonel Mustard has no alibi [KSTREAM-FILTER-0000000024]: Billiard Room: Mrs White has no alibi
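The group-then-filter shape of the Cluedo example can be sketched in plain Python. This is not Kafka Streams code and not the original application: the alibi rule used here (a suspect has an alibi only if someone other than the victim shares their room) is an assumption made for illustration, as are the function name and the extra guests in the Kitchen.

```python
from collections import defaultdict


def suspects_without_alibi(locations, victim):
    """Group people by room, then keep the rooms with exactly one suspect.

    Assumed rule: sharing a room with another suspect counts as an alibi.
    """
    by_room = defaultdict(list)
    for person, room in locations:            # "group" step
        if person != victim:
            by_room[room].append(person)
    return [
        f"{room}: {people[0]} has no alibi"   # "filter" step
        for room, people in by_room.items()
        if len(people) == 1
    ]


# Who was in which room at the time of the murder (made-up input events).
locations = [
    ("Professor Plum", "Conservatory"),
    ("Colonel Mustard", "Library"),
    ("Mrs White", "Billiard Room"),
    ("Dr Black", "Billiard Room"),
    ("Miss Scarlett", "Kitchen"),
    ("Reverend Green", "Kitchen"),
]

suspects = suspects_without_alibi(locations, victim="Dr Black")
```

In a Streams topology the grouping and filtering would be separate processing nodes reading from and writing to topics; here they are just two steps in one function.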
Topology of Cluedo Streams Example. This tool is very useful for visualizing and debugging streams: https://zz85.github.io/kafka-streams-viz/
Some Kafka Use Cases
Example 1 - Kafka "Kongo" Logistics IoT Application: Goods, Warehouses, Trucks, Sensors and Rules
Detect transportation and storage violations in real time.
And Kafka Streams to prevent Truck Overloading. (Source: Shutterstock)
Example 2 - One of these things is not like the others. (Source: Shutterstock)
Massively Scalable Anomaly Detection with Kafka and Cassandra
19 Billion Checks/day with 470 CPU Cores. [Chart: anomaly checks/day (billion) against total CPU cores]
Example 3 - Which Came First? State or Events? (Source: Shutterstock)
Change Data Capture (CDC) with Debezium and Kafka Connect: State to Events and back to State again.
That's it for this short visual introduction to Apache Kafka. For more information please have a look at the Apache Kafka docs (https://kafka.apache.org/), the Instaclustr Blogs, and check out our free Kafka trial. "Gently down the Stream" (www.gentlydownthe.stream) is another "Visual" introduction to Kafka, with Otters!
Instaclustr Blogs
All of my blogs (Cassandra, Kafka, MirrorMaker, Spark, Zookeeper, OpenSearch, Redis, PostgreSQL, Debezium, Cadence, etc): www.instaclustr.com/paul-brebner/
Kafka Streams Cluedo Example (part of "Kongo" Kafka intro series): www.instaclustr.com/blog/kongo-5-3-apache-kafka-streams-examples/
Kafka Connect Pipeline Series (Tides data processing): www.instaclustr.com/blog/kafka-connect-pipelines-conclusion-pipeline-series-part-10/
Kafka Xmas Tree Lights Simulation (my 1st Kafka program): www.instaclustr.com/blog/seasons-greetings-instaclustr-kafka-christmas-tree-light-simulation/
Instaclustr's Managed Kafka (Free Trial): www.instaclustr.com/platform/managed-apache-kafka/

A Visual Introduction to Apache Kafka

  • 1.
    Paul Brebner Technology Evangelist www.instaclustr.com sales@instaclustr.com ©Instaclustr Pty Limited, 2022 [https://www.instaclustr.com/ company/policies/terms-conditions/]. Except as permitted by the copyright law applicable to you, you may not reproduce, distribute, publish, display, communicate or transmit any of the content of this document, in any form, but any means, without the prior written permission of Instaclustr Pty Limited.
  • 2.
    In this VisualIntroduction to Kafka, we’re going to build a Postal Service We’ll learn about Kafka Producers, Consumers, Topics, Partitions, Keys, Records, Delivery Semantics (Guaranteed delivery, and who gets what messages), Consumer Groups, Kafka Connect and Streams! ©Instaclustr Pty Limited 2019, 2021, 2022
  • 3.
    Kafka is adistributed streams processing system, it allows distributed producers to send messages to distributed consumers via a Kafka cluster. What is ©Instaclustr Pty Limited 2019, 2021, 2022 Kafka?
  • 4.
    Kafka has lotsof benefits: It’s Fast: It has high throughput and low latency It’s Scalable: It’s horizontally scalable, to scale just add nodes and partitions It’s Reliable: It’s distributed and fault tolerant It has Zero Data Loss: Messages are persisted to disk with an immutable log It’s Open Source: An Apache project And it’s available as an Instaclustr Managed Service: On multiple cloud platforms Managed Service Fast Scalable Reliable Durable Open Source ©Instaclustr Pty Limited 2019, 2021, 2022
  • 5.
    But the usual Kafkadiagram (right) is a bit monochrome and boring. ©Instaclustr Pty Limited 2019, 2021, 2022
  • 6.
    This visual introduction willbe more colourful and it’s going to be an extended story… ©Instaclustr Pty Limited 2019, 2021, 2022
  • 7.
    Let’s build amodern day fully electronic postal service T o send messages from A to B Postal Service A B ©Instaclustr Pty Limited 2019, 2021, 2022
  • 8.
    T o B, theconsumer, the recipient of the message. A is a producer, it sends a message… First, we need an “A”. ©Instaclustr Pty Limited 2019, 2021, 2022
  • 9.
    Due to thedecline in “snail mail” volumes, direct deliveries have been canceled. CANCELED Actually, not. ©Instaclustr Pty Limited 2019, 2021, 2022
  • 10.
    Consumers poll formessages by visiting the counter at the post office. Poste Restante is not a post office in a restaurant, it’s called general delivery (in the US). The mail is delivered to a post office, and they hold it for you until you call for it. Instead we have “Poste Restante” Image: La Poste Restante, Francois-Auguste Biard (Wikimedia) ©Instaclustr Pty Limited 2019, 2021, 2022
  • 11.
    Disconnected delivery—consumer doesn’tneed to be available to receive messages There’s less effort for the messaging service— only has to deliver to a few locations not many consumer addresses And it can scale better and handle more complex delivery semantics! Postal Service Kafka topics act like a Post Office. What are the benefits? ©Instaclustr Pty Limited 2019, 2021, 2022
  • 12.
    Kafka Topics have1 or more Partitions. Partitions function like multiple counters and enable high concurrency. A single counter introduces delays and limits concurrency. More counters increases concurrency and reduces delays. First lets see how it scales. What if there are many consumers for a topic? ©Instaclustr Pty Limited 2019, 2021, 2022
  • 13.
    Santa North Pole Let’s see whata message looks like. In Kafka a message is called a Record and is a bit like a letter. The topic is the destination, The North Pole. ©Instaclustr Pty Limited 2019, 2021, 2022
  • 14.
    Santa North Pole Time semanticsare flexible, either the time of event creation, ingestion, or processing. timestamp, offset, partition T opic The “Postmark” includes a timestamp, offset in the topic, and the partition it was sent to. ©Instaclustr Pty Limited 2019, 2021, 2022
  • 15.
    Santa North Pole We want thisletter sent to Santa not just a random Elf. timestamp, offset, partition T opic Key Partition (optional) There’s also a thing called a Key, which is optional. It refines the destination so it’s a bit like the rest of the address. ©Instaclustr Pty Limited 2019, 2021, 2022
  • 16.
    Santa North Pole And the valueis the contents (just a byte array). Kafka Producers and consumers need to have a shared serializer and de-serializer for both the key and value. timestamp, offset, partition T opic Key Partition (optional) Value (Content) ©Instaclustr Pty Limited 2019, 2021, 2022
  • 17.
    Kafka doesn’t look insidethe value, but the Producer and Consumer do, and the Consumer can try and make sense of the message (Can you?!) Image: Dear Santa by Zack Poitras / http://theinclusive.net/article.php?id=268 ©Instaclustr Pty Limited 2019, 2021, 2022
  • 18.
    let’s look at deliverysemantics For example, do we care if the message actually arrives or not? Next ©Instaclustr Pty Limited 2019, 2021, 2022
  • 19.
    Last century, homingpigeons were prone to getting lost or eaten by predators, so the same message was sent with several pigeons. Yes we do! Guaranteed message delivery is desirable. ©Instaclustr Pty Limited 2019, 2021, 2022
  • 20.
    How does Kafkaguarantee delivery? The message is always persisted to disk. This makes it resilient to power failure A Message (M1) is written to a broker (2). Producer M1 M1 Broker 1 Broker 2 Broker 3 ©Instaclustr Pty Limited 2019, 2021, 2022
  • 21.
    Producer Broker 1 Broker Broker 3 M1 M1M1 The message is also replicated on multiple brokers, 3 is typical. 2 ©Instaclustr Pty Limited 2019, 2021, 2022
  • 22.
    Producer M1 M1 M1 And makesit resilient to loss of some servers (all but one). Broker 1 ©Instaclustr Pty Limited 2019, 2021, 2022
  • 23.
    Finally the producergets acknowledgement once the message is persisted and replicated (configurable for number, and sync or async). Producer M1 Broker 1 Broker 2 Broker 3 M1 M1 Acknowledgement This also increases the read concurrency as partitions are spread over multiple brokers. The message is now available from more than one broker in case some fail. ©Instaclustr Pty Limited 2019, 2021, 2022
  • 24.
    let’s look atanother aspect of delivery semantics Who gets the messages and how many times are messages delivered? Now ©Instaclustr Pty Limited 2019, 2021, 2022
  • 25.
    Producer Consumer Consumer Consumer Consumer ? Kafka is “pub-sub”.It’s loosely coupled, producers and consumers don’t know about each other. ©Instaclustr Pty Limited 2019, 2021, 2022
  • 26.
    Filtering, or whichconsumers get which messages, is topic based. - Producers send messages to topics. - Consumers subscribe to topics of interest, e.g. parties. - When they poll they only receive messages sent to those topics. None of these consumers will receive messages sent to the “Work” topic. Producer Consumer Consumer Consumer Consumer Topic “Parties” Topic “Work” Consumers subscribed to Topic “Parties” Consumers poll to receive messages from “Parties” Consumers not subscribed to “Work” messages ©Instaclustr Pty Limited 2019, 2021, 2022
  • 27.
    A few moredetails and we can see how this works. Kafka works like Amish Barn raising. Partitions and a consumer group share work across multiple consumers, the more partitions a topic has the more consumers it supports. Image: Paul Cyr ©2018 NorthernMainePhotos.com ©Instaclustr Pty Limited 2019, 2021, 2022
  • 28.
    Kafka also workslike Clones. It supports delivery of the same message to multiple consumers with consumer groups. Kafka doesn’t throw messages away immediately they are delivered, so the same message can be delivered to multiple consumer groups. Image: Shutterstock.com ©Instaclustr Pty Limited 2019, 2021, 2022
  • 29.
    Consumers subscribed to”parties” topic are allocated partitions. When they poll they will only get messages from their allocated partitions. Consumer Partition n Topic “Parties” Partition 1 Producer Partition 2 Consumer Group Consumer Consumer Group Consumer Consumer ©Instaclustr Pty Limited 2019, 2021, 2022
  • 30.
    This enables consumersin the same group to share the work around. Each consumer gets only a subset of the available messages. Partition n Topic “Parties” Partition 1 Producer Partition 2 Consumer Group Consumer Consumer Consumers share work within groups Consumer ©Instaclustr Pty Limited 2019, 2021, 2022
  • 31.
    Multiple groups enablemessage broadcasting. Messages are duplicated across groups, as each consumer group receives a copy of each message. Consumer Consumer Consumer Consumer Topic “Parties” Partition 1 Partition 2 Partition n Producer Consumer Group Consumer Group Messages are duplicated across Consumer groups ©Instaclustr Pty Limited 2019, 2021, 2022
  • 32.
    Which messages aredelivered to which consumers? The final aspect of delivery semantics is to do with message keys. If a message has a key, then Kafka uses Partition based delivery. Messages with the same key are always sent to the same partition and therefore the same consumer. And the order (within partitions) is guaranteed. Key ©Instaclustr Pty Limited 2019, 2021, 2022
  • 33.
    But if thekey is null, then Kafka uses round robin delivery. Each message is delivered to the next partition. Round robin delivery ©Instaclustr Pty Limited 2019, 2021, 2022
  • 34.
    Let’s look ata concrete example with two consumer groups: Group 1: Nerds which has multiple consumers Group 2: The Pugsters which has a single consumer, Zug Image: Shutterstock.com Bill Paul Penny Kate Millie Jenny Image: Nenad Aksic / Shutterstock.com ©Instaclustr Pty Limited 2019, 2021, 2022
  • 35.
    Consumer 1 (Bill) Consumer 2 (Jenny) Consumer1 (Zug from The Pugsters) Topic “Parties” Partition 1 Partition 2 Partition n Producer Group “Nerds” Group “Pugsters” Consumers subscribe to “Parties” Each message (1, 2, etc.) is sent to the next partition, and consumers allocated to that partition will receive the message when they poll next. Looking at the case where there’s No Keyfirst Round robin No Key 1 2 etc 1 2 1 2 Consumer n ©Instaclustr Pty Limited 2019, 2021, 2022
  • 36.
    Here’s what actuallyhappens. We’re not showing the producer, topics, or partitions for simplicity. You’ll have to imagine them. Bill Paul Penny Kate Millie Jenny No Key ©Instaclustr Pty Limited 2019, 2021, 2022
  • 37.
    Bill Penny Kate Millie Jenny Both Groups subscribeto T opic“parties” (assuming 6 partitions, each consumer in the Nerds group gets 1 partition each; Zug gets them all) 1 Paul Subscribe to “Parties” No Key ©Instaclustr Pty Limited 2019, 2021, 2022
  • 38.
    Bill Pau l Penny Kate Millie Jenny Producer sends recordwith the value “Pool party—Invitation” to “parties” topic (there’s no key) 2 Invitation No Key ©Instaclustr Pty Limited 2019, 2021, 2022 Value
  • 39.
    Bill Paul Penny Kate Millie Jenny Bill and Zugreceive a copy of the invitation and plan to attend 3 Invitation Invitation No Key ©Instaclustr Pty Limited 2019, 2021, 2022
  • 40.
    Bill Pen ny Pau l Kate Millie Jenny The Producersends another record with the value “Pool party—Canceled” 4 No Key Invitation Canceled ©Instaclustr Pty Limited 2019, 2021, 2022 Invitation
  • 41.
    Bill Paul Penny Kate Millie Jenny In the Nerdsgroup, Jenny gets the message this time as it’s round robin, and Zug gets it as he’s the only consumer in his group: ▶ Jenny ignores it as she didn’t get the original invite ▶ Bill wastes his time trying to go (as he doesn’t know it’s canceled) ▶ The rest of the gang aren’t surprised at not receiving any invites and stay home to do some hacking 5 Invitation Canceled No Key Invitation Canceled ©Instaclustr Pty Limited 2019, 2021, 2022
  • 42.
    Zug plans something else funinstead… A jam session with his band Image: Shutterstock.com ©Instaclustr Pty Limited 2019, 2021, 2022
  • 43.
    How does it work if there is a key? The key is hashed to a partition, so the message is always sent to that partition. Assume there are 3 messages, and messages 1 and 2 are hashed to the same partition.
    [Diagram: a Producer sends messages 1, 2 and 3 to Topic “Parties” (Partition 1, Partition 2, … Partition n); messages 1 and 2 go to one partition and message 3 to another. Group “Nerds” (Consumer 1 (Bill), Consumer 2 (Jenny), … Consumer n) and Group “Pugsters” (Consumer 1 (Zug)) subscribe to “Parties”.]
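A minimal sketch of key-based partitioning, assuming 6 partitions as in the earlier slides. Kafka's default partitioner hashes the key bytes with murmur2; CRC32 from the Python standard library is used here purely as a stand-in, so the partition numbers will not match a real cluster. What carries over is the property that matters: the same key always maps to the same partition.

```python
import zlib


def partition_for_key(key, num_partitions):
    """Map a record key to a partition (stand-in hash, NOT Kafka's murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


records = [
    ("Pool Party", "Invitation"),
    ("Pool Party", "Canceled"),
    ("Halloween Party", "Invitation"),  # hypothetical key for the later invitation
]

placements = [(key, value, partition_for_key(key, 6)) for key, value in records]
for key, value, partition in placements:
    print(f"{key!r}: {value!r} -> partition {partition}")
```

Both “Pool Party” records land on the same partition, so whichever consumer owns that partition sees the invitation and the cancelation in order. A different key may or may not land on a different partition; that depends on the hash.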
  • 44.
    Here’s what happens with a key, assuming that the key is the “title” of the message (“Pool Party”), and the value is “Invitation” or “Canceled”.
    Step 1 (key): As before, both groups subscribe to topic “parties”.
    Step 2 (key): The Producer sends a record with the key equal to “Pool Party” and the value equal to “Invitation” to the “parties” topic.
  • 45.
    Step 3 (key): As before, Bill and Zug receive a copy of the invitation and plan to attend.
  • 46.
    Step 4 (key): The Producer sends another record with the same key but with the value “Canceled” to the “parties” topic.
  • 47.
    Step 5 (key): This time, Bill and Zug receive the cancelation (the same consumers, as the key is identical).
  • 48.
    Step 6 (key): The Producer sends out another invitation, this time to a Halloween party. The key is different this time.
  • 49.
    Step 7 (key): Jenny receives the Halloween invitation, as the new key hashes to a different partition, which happens to be Jenny’s. Zug is the only consumer in his group so he gets every record no matter what partition it’s sent to.
  • 50.
    This time Zug gets dressed up and has fun at the party. (Image: Shutterstock.com)
  • 51.
    But wait! There’s more: event reprocessing (time travel)! Kafka stores message streams on disk, so Consumers can go back and request the same messages they’ve already received, earlier messages, or ignore some messages etc. (Image: Shutterstock.com)
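The “time travel” idea can be sketched with an in-memory append-only log where each record's position is its offset. The `PartitionLog` and `Reader` classes below are hypothetical illustrations rather than Kafka classes (the real consumer API offers a comparable seek operation), and a real cluster only keeps records for as long as its retention settings allow.

```python
class PartitionLog:
    """Append-only list of records; a record's index is its offset."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # offset of the new record

    def read_from(self, offset):
        return self.records[offset:]


class Reader:
    """Tracks its own position, so it can move back or forward at will."""

    def __init__(self, log):
        self.log = log
        self.position = 0

    def poll(self):
        batch = self.log.read_from(self.position)
        self.position += len(batch)
        return batch

    def seek(self, offset):
        self.position = offset


log = PartitionLog()
for value in ["Invitation", "Canceled", "Halloween"]:
    log.append(value)

zug = Reader(log)
first_pass = zug.poll()      # everything so far
nothing_new = zug.poll()     # already caught up
zug.seek(0)                  # go "back to the future"
replay = zug.poll()          # the same messages again
zug.seek(2)                  # or ignore the earlier ones
skipped_ahead = zug.poll()
```

Reading does not remove anything from the log, which is why several consumers, or the same consumer at different times, can process the same records.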
  • 52.
    So Zug can go “back to the future”!
  • 53.
    But! The postal system is global and heterogeneous.
  • 54.
    How can post offices be connected?
  • 55.
    Compressed air? Underground pneumatic tubes delivered mail between postal facilities in US cities in the 1900s. (Source: Wikimedia Commons)
  • 56.
    Kafka Connect enables message flows across heterogeneous systems: from Sources to Sinks via a Lake (Kafka).
  • 57.
    Kafka Connect Architecture: Source and Sink Connectors
  • 58.
    Form Pipelines (Berlin) (Source: Paul Brebner)
  • 59.
    For Beer? (Source: Paul Brebner)
  • 60.
    Example of a Kafka Connect IoT pipeline: Tidal Data → Kafka → Elasticsearch → Kibana
    [Diagram: REST call → REST source connector → Tides Topic → Elasticsearch sink connector → Tides Index]
    REST call JSON result (the same record appears in the Tides Topic):
    {"metadata": {"id":"8724580", "name":"Key West", "lat":"24.5508", "lon":"-81.8081"}, "data":[{"t":"2020-09-24 04:18", "v":"0.597"}]}
  • 61.
    Connectors require configuration: in the same pipeline (Tidal Data → Kafka → Elasticsearch → Kibana), both the REST source connector and the Elasticsearch sink connector must be configured.
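As a rough idea of what “configuration” means here, the sketch below builds the kind of JSON body that Kafka Connect's REST interface accepts when a connector is created (a `name` plus a `config` map). Only framework-level keys are shown. The connector name, the topic name, and the class name are placeholders invented for this example, and any connector-specific settings (such as where Elasticsearch lives) are deliberately left out because they depend on the connector actually used.

```python
import json

# Hypothetical sink connector definition; all values are illustrative placeholders.
sink_connector_request = {
    "name": "tides-elasticsearch-sink",  # assumed connector name
    "config": {
        # Placeholder class name: substitute the class of the connector you install.
        "connector.class": "com.example.ElasticsearchSinkConnector",
        "tasks.max": "1",
        "topics": "tides",  # assumed name of the Tides topic
    },
}

request_body = json.dumps(sink_connector_request, indent=2)
print(request_body)
```

A source connector is described the same way, except that instead of a `topics` list to read from it needs connector-specific settings saying where to fetch data and which topic to write to.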
  • 62.
    What’s still missing? Mail sorting. (Source: https://commons.wikimedia.org/wiki/File:Royal_mail_sorting.jpg)
  • 63.
  • 64.
  • 65.
    Kafka Streams: Topics in, Topics out, via Streams. All Kafka APIs: Producer, Consumer, Connect, Streams
  • 66.
    Simple Streams Topology (Source: Shutterstock)
  • 67.
  • 68.
    Kafka Streams = Rapids!? (Source: Shutterstock)
  • 69.
    Streams get complicated quickly! One way to keep dry… (Source: Shutterstock)
  • 70.
    Or, this diagram, which explains the order of Streams DSL operations (Source: https://kafka.apache.org/20/documentation/streams/developer-guide/dsl-api.html)
  • 71.
    Cluedo Kafka Streams Example: tracks who’s in what rooms and when, and emits a list of suspects without an alibi.
    Dr Black has been murdered in the Billiard Room with a Candlestick! Whodunnit?!
    [KSTREAM-FILTER-0000000024]: Conservatory: Professor Plum has no alibi
    [KSTREAM-FILTER-0000000024]: Library: Colonel Mustard has no alibi
    [KSTREAM-FILTER-0000000024]: Billiard Room: Mrs White has no alibi
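The filtering step behind that output can be imitated in plain Python. This is not the Kafka Streams code from the example: it assumes, purely for illustration, that a person has an alibi only if somebody else was in the same room, and the two extra characters in the Kitchen are hypothetical additions so that the filter has something to exclude.

```python
from collections import defaultdict


def suspects_without_alibi(locations):
    """Return 'Room: Person has no alibi' for everyone who was alone in a room."""
    occupants = defaultdict(list)
    for person, room in locations.items():
        occupants[room].append(person)
    # Assumed rule: being alone in a room means nobody can vouch for you.
    return [
        f"{room}: {people[0]} has no alibi"
        for room, people in occupants.items()
        if len(people) == 1
    ]


# Last known room per person at the time of the murder.
locations = {
    "Professor Plum": "Conservatory",
    "Colonel Mustard": "Library",
    "Mrs White": "Billiard Room",
    "Miss Scarlett": "Kitchen",   # hypothetical: shares a room, so has an alibi
    "Reverend Green": "Kitchen",  # hypothetical: shares a room, so has an alibi
}

suspects = suspects_without_alibi(locations)
for line in suspects:
    print(line)
```

In a streams topology the same idea would be expressed as a continuous computation: room-change events flow in on one topic, the latest room per person is kept as state, and a filter emits suspects to an output topic.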
  • 72.
    Topology of Cluedo Streams Example. This tool is very useful for visualizing and debugging streams: https://zz85.github.io/kafka-streams-viz/
  • 73.
    Some Kafka Use Cases
  • 74.
    Example 1 - Kafka “Kongo” Logistics IoT Application: Goods, Warehouses, Trucks, Sensors and Rules
  • 75.
    Detect transportation and storage violations in real time
  • 76.
    And Kafka Streams to prevent Truck Overloading (Source: Shutterstock)
  • 77.
    Example 2 - One of these things is not like the others (Source: Shutterstock)
  • 78.
    Massively Scalable Anomaly Detection with Kafka and Cassandra
  • 79.
    19 Billion Checks/day with 470 CPU Cores
    [Chart: anomaly checks/day (billion, axis 0 to 20) against total CPU cores (axis 0 to 500), reaching 19 billion.]
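To put the headline number in more familiar units, the daily figure can be converted into per-second rates. The calculation below uses only the two numbers on the slide (19 billion checks per day, 470 cores) and assumes the load is spread evenly over the day and over the cores, which a real system would only approximate.

```python
CHECKS_PER_DAY = 19e9          # from the slide
CPU_CORES = 470                # from the slide
SECONDS_PER_DAY = 24 * 60 * 60

checks_per_second = CHECKS_PER_DAY / SECONDS_PER_DAY
checks_per_second_per_core = checks_per_second / CPU_CORES

print(f"{checks_per_second:,.0f} checks/s in total")
print(f"{checks_per_second_per_core:,.0f} checks/s per core")
```

That works out to roughly 220 thousand checks per second overall, or a few hundred per second per core.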
  • 80.
    Example 3 - Which Came First? State or Events? (Source: Shutterstock)
  • 81.
    Change Data Capture (CDC) with Debezium and Kafka Connect: State to Events and back to State again
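“State to Events and back to State again” can be sketched by replaying a list of change events into a dictionary. The events below are a simplified imitation of Debezium's change-event envelope (an `op` code of `c`, `u` or `d` with `before` and `after` row images); the table contents and the `id` and `status` fields are made up for the example.

```python
def apply_change(state, event):
    """Apply one simplified change event to an in-memory copy of the table."""
    op = event["op"]
    if op in ("c", "u"):          # create or update: take the new row image
        row = event["after"]
        state[row["id"]] = row
    elif op == "d":               # delete: drop the row named in the old image
        state.pop(event["before"]["id"], None)
    return state


# Hypothetical change events captured from a source table of party invitations.
events = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "invited"}},
    {"op": "u", "before": {"id": 1, "status": "invited"},
     "after": {"id": 1, "status": "canceled"}},
    {"op": "c", "before": None, "after": {"id": 2, "status": "invited"}},
    {"op": "d", "before": {"id": 2, "status": "invited"}, "after": None},
]

state = {}
for event in events:
    apply_change(state, event)

print(state)
```

Replaying the same events in the same order always rebuilds the same state, which is what lets a sink system downstream of Kafka stay in step with the source database.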
  • 82.
    That’s it for this short visual introduction to Apache Kafka. For more information please have a look at the Apache Kafka docs, the Instaclustr Blogs, and check out our free Kafka trial.
    ▶ Apache Kafka: https://kafka.apache.org/
    ▶ “Gently down the Stream”, another “Visual” introduction to Kafka, with Otters: www.gentlydownthe.stream
  • 83.
    Instaclustr Blogs
    ▶ All of my blogs (Cassandra, Kafka, MirrorMaker, Spark, Zookeeper, OpenSearch, Redis, PostgreSQL, Debezium, Cadence, etc.): www.instaclustr.com/paul-brebner/
    ▶ Kafka Streams Cluedo Example (part of “Kongo” Kafka intro series): www.instaclustr.com/blog/kongo-5-3-apache-kafka-streams-examples/
    ▶ Kafka Connect Pipeline Series (Tides data processing): www.instaclustr.com/blog/kafka-connect-pipelines-conclusion-pipeline-series-part-10/
    ▶ Kafka Xmas Tree Lights Simulation (my 1st Kafka program): www.instaclustr.com/blog/seasons-greetings-instaclustr-kafka-christmas-tree-light-simulation/
    ▶ Instaclustr’s Managed Kafka (Free Trial): www.instaclustr.com/platform/managed-apache-kafka/