1 What is Apache Kafka and What is an Event Streaming Platform? Bern Apache Kafka® Meetup
2 Join the Confluent Community Slack Channel Subscribe to the Confluent blog cnfl.io/community-slack cnfl.io/read Welcome to the Apache Kafka® Meetup in Bern! 6:00pm Doors open 6:00pm - 6:30pm Food, Drinks and Networking 6:30pm – 6:50pm Matthias Imsand, Amanox Solutions 6:50pm - 7:35pm Gabriel Schenker, Confluent 7:35pm - 8:00pm Additional Q&A & Networking Apache, Apache Kafka, Kafka and the Kafka logo are trademarks of the Apache Software Foundation. The Apache Software Foundation has no affiliation with and does not endorse the materials provided at this event.
3 About Me ● Gabriel N. Schenker ● Lead Curriculum Developer @ Confluent ● Formerly at Docker, Alienvault, … ● Lives in Appenzell, AI ● GitHub: github.com/gnschenker ● Twitter: @gnschenker
4 What is an Event Streaming Platform?
5 Event Streaming Platforms should do two things: Reliably store streams of events Process streams of events
6 The Event Streaming Paradigm
7 ETL/Data Integration vs. Messaging: ETL/Data Integration is Durable, Persistent, and Maintains Order, but it is Batch, Expensive, Time Consuming, and Difficult to Scale; Messaging offers High Throughput and is Fast (Low Latency), but there is No Persistence, Data Loss, and No Replay.
9 The Event Streaming Paradigm: instead of stored records (ETL/Data Integration) or transient messages (Messaging), it combines the strengths of both: High Throughput, Durable, Persistent, Maintains Order, Fast (Low Latency).
10 Event Streaming Paradigm: rethink data not as stored records or transient messages, but as a continually updating stream of events.
16 Universal Event Pipeline: data stores (Mainframes, Hadoop, Data Warehouse, ...), logs (Device Logs, Splunk, ...), custom apps/microservices, and 3rd-party apps all feed Apache Kafka® (CONNECT, CLIENTS), which powers contextual event-driven apps (STREAMS): Real-Time Inventory, Real-Time Fraud Detection, Real-Time Customer 360, Machine Learning Models, Real-Time Data Transformation, ...
18 A Modern, Distributed Platform for Data Streams
19 Apache Kafka® is made up of distributed, immutable, append-only commit logs
20 Writers → Kafka cluster → Readers
21 Kafka: Scalability of a filesystem • hundreds of MB/s throughput • many TB per server • commodity hardware
22 Kafka: Guarantees of a Database • Strict ordering • Persistence
23 Kafka: Rewind and Replay • Reset to any point in the shared narrative
24 Kafka: Distributed by design • Replication • Fault Tolerance • Partitioning • Elastic Scaling
25 Kafka Topics: a topic (my-topic) is divided into partitions (my-topic-partition-0, -1, -2) that are spread across brokers (broker-1, broker-2, broker-3)
26 Creating a topic
$ kafka-topics --bootstrap-server broker101:9092 --create --topic my-topic --replication-factor 3 --partitions 3
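The same topic can also be created from code; a minimal sketch (not from the deck) using the Java AdminClient, with the broker address and counts mirroring the CLI example above:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopic {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker101:9092");

    try (AdminClient admin = AdminClient.create(props)) {
      // my-topic with 3 partitions and replication factor 3, as in the CLI command
      NewTopic topic = new NewTopic("my-topic", 3, (short) 3);
      admin.createTopics(Collections.singletonList(topic)).all().get();
    }
  }
}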
27 Producing to Kafka
29 Partition Leadership and Replication: each partition of Topic1 (partition1 through partition4) has one leader replica and follower replicas spread across Brokers 1 to 4.
30 Partition Leadership and Replication, node failure: if a broker fails, a follower on another broker takes over leadership for the affected partitions.
31 Producing to Kafka
32 Producer Clients, Producer Design: a ProducerRecord (Topic, [Partition], [Timestamp], [Headers], [Key], Value) is passed to send(), run through the Serializer and Partitioner, collected into per-partition batches (Topic A / Partition 0, Topic B / Partition 1, ...), and sent to the Kafka broker. On success the producer returns the record metadata; on a retriable failure it retries; if it can't retry, it throws an exception.
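As a minimal sketch of that flow (not part of the original deck; topic, key, and value are placeholders and the stock string serializers are assumed), a producer builds a ProducerRecord and sends it asynchronously:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SimpleProducer {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker101:9092");
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

    try (Producer<String, String> producer = new KafkaProducer<>(props)) {
      // Key and value are serialized, the partitioner picks a partition,
      // the record is batched and sent; the callback reports metadata or an error.
      ProducerRecord<String, String> record =
          new ProducerRecord<>("my-topic", "some-key", "some-value");
      producer.send(record, (metadata, exception) -> {
        if (exception != null) {
          exception.printStackTrace(); // send failed after any retries
        } else {
          System.out.printf("wrote to %s-%d @ offset %d%n",
              metadata.topic(), metadata.partition(), metadata.offset());
        }
      });
    }
  }
}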
33 The Serializer: Kafka doesn't care what you send to it, as long as it has been converted to bytes beforehand. Serializers exist for JSON, CSV, Avro, Protobuf, XML (if you must), and more. Reference: https://kafka.apache.org/10/documentation/streams/developer-guide/datatypes.html
34 The Serializer
private Properties settings = new Properties();
settings.put("bootstrap.servers", "broker1:9092,broker2:9092");
// Keys are plain strings, values are Avro-encoded Invoice records
settings.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
settings.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
// The Avro serializer registers and looks up schemas in the Schema Registry
settings.put("schema.registry.url", "https://schema-registry:8083");
producer = new KafkaProducer<String, Invoice>(settings);
Reference: https://kafka.apache.org/10/documentation/streams/developer-guide/datatypes.html
35 Record Keys and why they're important, Ordering: with the default Kafka partitioner, the record key determines the partition (ProducerRecord: Topic, [Partition], [Key], Value). If no key is provided, records are distributed across partitions in a round-robin fashion.
36 Record Keys and why they're important, Ordering: keys are used in the default partitioning algorithm, partition = hash(key) % numPartitions, so all records with the same key (AAAA, BBBB, CCCC, DDDD, ...) land in the same partition, which guarantees ordering per key.
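A rough sketch of that default behaviour (illustrative only, not the deck's code): the Java client's default partitioner hashes the serialized key bytes with murmur2 and takes the result modulo the partition count, so the same key always maps to the same partition:

import java.nio.charset.StandardCharsets;
import org.apache.kafka.common.utils.Utils;

public class PartitionForKey {
  // Same idea as the default partitioner: partition = hash(key) % numPartitions
  static int partitionFor(String key, int numPartitions) {
    byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
    return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
  }

  public static void main(String[] args) {
    // Every record keyed AAAA lands in the same partition, so its order is preserved
    System.out.println(partitionFor("AAAA", 3));
    System.out.println(partitionFor("BBBB", 3));
  }
}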
40 Record Keys and why they're important, Key Cardinality: key cardinality (the number of distinct keys) affects how much work each consumer in a group has to do; a poor key choice can lead to uneven workloads. Keys in Kafka don't have to be primitives like strings or ints; like values, they can be anything: JSON, Avro, etc. So choose a key that distributes groups of records evenly across the partitions. Car·di·nal·i·ty /ˌkärdəˈnalədē/ Noun: the number of elements in a set or other grouping, as a property of that grouping.
41 You don't have to, but... use a Schema! The Data Producer Service sends JSON such as { "Name": "John Smith", "Address": "123 Apple St.", "Zip": "19101" }, while the Data Consumer Service expects { "Name": "John Smith", "Address": "123 Apple St.", "City": "Philadelphia", "State": "PA", "Zip": "19101" } and asks: "Where's record.City?" Reference: https://www.confluent.io/blog/schema-registry-kafka-stream-processing-yes-virginia-you-really-need-one/
42 Schema Registry: Make Data Backwards Compatible and Future-Proof (open source feature) ● Define the expected fields for each Kafka topic ● Automatically handle schema changes (e.g. new fields) ● Prevent backwards-incompatible changes ● Support multi-datacenter environments. Example consumers reading with registered schemas: Elastic, Cassandra, HDFS.
43 Developing with Confluent Schema Registry: Confluent provides a Maven plugin with several goals for developing against the Schema Registry ● download - download a subject's schema to your project ● register - register a new schema to the Schema Registry from your development environment ● test-compatibility - test changes made to a schema against the compatibility rules set by the Schema Registry
<plugin>
  <groupId>io.confluent</groupId>
  <artifactId>kafka-schema-registry-maven-plugin</artifactId>
  <version>5.0.0</version>
  <configuration>
    <schemaRegistryUrls>
      <param>http://192.168.99.100:8081</param>
    </schemaRegistryUrls>
    <outputDirectory>src/main/avro</outputDirectory>
    <subjectPatterns>
      <param>^TestSubject000-(key|value)$</param>
    </subjectPatterns>
  </configuration>
</plugin>
Reference: https://docs.confluent.io/current/schema-registry/docs/maven-plugin.html
44 Avro allows for evolution of schemas: the Data Producer Service and Data Consumer Service can use different versions of a schema (both registered in the Schema Registry) and still interoperate, because fields missing from the writer's version are filled with the defaults declared in the newer version. For example, a record written as { "Name": "John Smith", "Address": "123 Apple St.", "Zip": "19101" } can be read with the newer schema as { "Name": "John Smith", "Address": "123 Apple St.", "Zip": "19101", "City": "NA", "State": "NA" }. Reference: https://www.confluent.io/blog/schema-registry-kafka-stream-processing-yes-virginia-you-really-need-one/
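A sketch of that evolution using Avro's SchemaBuilder (the Customer record name and the "NA" defaults are illustrative, not taken from the deck): version 2 adds City and State with defaults, which keeps the change backward compatible:

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

public class CustomerSchemas {
  public static void main(String[] args) {
    // Version 1: Name, Address, Zip
    Schema v1 = SchemaBuilder.record("Customer").fields()
        .requiredString("Name")
        .requiredString("Address")
        .requiredString("Zip")
        .endRecord();

    // Version 2 adds City and State, each with a default, so older records can still be read
    Schema v2 = SchemaBuilder.record("Customer").fields()
        .requiredString("Name")
        .requiredString("Address")
        .requiredString("Zip")
        .name("City").type().stringType().stringDefault("NA")
        .name("State").type().stringType().stringDefault("NA")
        .endRecord();

    System.out.println(v1.toString(true));
    System.out.println(v2.toString(true));
  }
}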
45 Use Kafka's Headers: headers are key/value pairs (a String key and a byte[] value) carried on the ProducerRecord ([Headers] alongside Topic, [Partition], [Timestamp], [Key], Value) and exposed as an iterable collection. Example use cases: ● Data lineage: reference previous topic partitions/offsets ● Producing host/application/owner ● Message routing ● Encryption metadata (which key pair was this message payload encrypted with?) Reference: https://cwiki.apache.org/confluence/display/KAFKA/KIP-82+-+Add+Record+Headers
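A small sketch of attaching headers to a record (the header names and values are made up for illustration):

import java.nio.charset.StandardCharsets;
import org.apache.kafka.clients.producer.ProducerRecord;

public class RecordWithHeaders {
  public static void main(String[] args) {
    ProducerRecord<String, String> record =
        new ProducerRecord<>("my-topic", "some-key", "some-value");

    // Headers are String -> byte[] pairs carried alongside the key and value
    record.headers()
          .add("producing-host", "app-host-01".getBytes(StandardCharsets.UTF_8))
          .add("lineage-source", "orders-0:3345".getBytes(StandardCharsets.UTF_8));
  }
}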
46 Producer Guarantees, acks=0: the producer sends to the leader for Topic1 partition1 and does not wait for any acknowledgement (fire and forget). Reference: https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/
47 Producer Guarantees, acks=1: the leader acknowledges the write as soon as it has it locally, without waiting for the followers.
48 Producer Guarantees, acks=all with min.insync.replicas=2: the leader acknowledges only after the write has been replicated to the required number of in-sync replicas.
49 Producer Guarantees, without exactly-once guarantees: even with acks=all and min.insync.replicas=2, a write ({key: 1234, data: abcd} at offset 3345) can succeed on the brokers while the ack back to the producer is lost.
50 The producer retries, and the same record is written again at offset 3346: a duplicate. Reference: https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/
51 Producer Guarantees, with exactly-once guarantees: enable.idempotence=true, max.in.flight.requests.per.connection<=5, acks=all, retries>0 (preferably MAX_INT). The broker tracks a (producer id, sequence number) pair per partition: (100, 1) {key: 1234, data: abcd} is written at offset 3345; when the retry arrives with the same (100, 1) it is rejected and only the ack is re-sent; (100, 2) {key: 5678, data: efgh} is written at offset 3346. No dupe! Reference: https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/
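Expressed as producer configuration, the idempotent setup sketched above looks roughly like this (a sketch with a placeholder broker address; min.insync.replicas is a broker/topic-side setting and appears only as a comment):

import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

public class IdempotentProducerConfig {
  public static Properties build() {
    Properties props = new Properties();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker101:9092");
    props.put(ProducerConfig.ACKS_CONFIG, "all");                       // wait for all in-sync replicas
    props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);          // broker de-duplicates on (pid, seq)
    props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);        // keep retrying until delivery or a fatal error
    props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 5); // must stay <= 5 with idempotence
    // Broker/topic side: min.insync.replicas=2
    return props;
  }
}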
52 Transactional Producer
Producer:
KafkaProducer producer = createKafkaProducer(
  "bootstrap.servers", "broker:9092",
  "transactional.id", "my-transactional-id");
producer.initTransactions();
-- send some records --
producer.commitTransaction();
Consumer:
KafkaConsumer consumer = createKafkaConsumer(
  "bootstrap.servers", "broker:9092",
  "group.id", "my-group-id",
  "isolation.level", "read_committed");
Reference: https://www.confluent.io/blog/transactions-apache-kafka/
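A slightly fuller sketch of the producer side (serializer choices, topic name, and record contents are assumptions, not from the deck):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.serialization.StringSerializer;

public class TransactionalProducerSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
    props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "my-transactional-id");
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

    try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
      producer.initTransactions();
      try {
        producer.beginTransaction();
        for (int i = 0; i < 5; i++) {
          producer.send(new ProducerRecord<>("my-topic", Integer.toString(i), "value-" + i));
        }
        producer.commitTransaction(); // all five records become visible atomically to read_committed consumers
      } catch (KafkaException e) {
        // For fatal errors (e.g. ProducerFencedException) a real application closes the producer instead
        producer.abortTransaction();
        throw e;
      }
    }
  }
}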
53 Consuming from Kafka
54 A basic Java Consumer
final Consumer<String, String> consumer = new KafkaConsumer<String, String>(props);
consumer.subscribe(Arrays.asList(topic));
try {
  while (true) {
    // Poll for up to 100 ms, then process whatever records arrived
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
      // Do Some Work …
    }
  }
} finally {
  consumer.close();
}
55 Consuming From Kafka - Single Consumer: a single consumer (C) reads from all partitions of the topic
56 Consuming From Kafka - Grouped Consumers: independent consumer groups (C1, C2) each receive the full stream
57 Consuming From Kafka - Grouped Consumers: consumers within one group divide the topic's partitions among themselves
58 Consuming From Kafka - Grouped Consumers: with four partitions (0, 1, 2, 3) and four consumers in the group, each consumer owns one partition
60 Consuming From Kafka - Grouped Consumers: if a consumer drops out, its partitions are reassigned, so one consumer may own several (e.g. partitions 0 and 3)
61 Resources: Free E-Books from Confluent! https://www.confluent.io/apache-kafka-stream-processing-book-bundle Confluent Blog: https://www.confluent.io/blog Thank You! gabriel@confluent.io @gnschenker
62 Thank You!
63 25% off! KS19Comm25