Introducing Exactly-Once Semantics in Apache® Kafka™
Jason Gustafson, Apurva Mehta, Guozhang Wang, and Sriram Subramaniam
Matthias J. Sax | Software Engineer
matthias@confluent.io
@MatthiasJSax
Confidential

Outline
• Kafka’s existing delivery semantics.
• What’s new?
• How do you use it?
• Summary.

What Kafka offers today
• At-least-once, in-order delivery per partition.
• Producer retries can introduce duplicates.

Stream Processing with Apache Spark
Read-process-write pattern:

Example: duplicate write
Producer → Broker → Topic Partition
• Producer: send (k,v)
• Broker: append (k,v); the partition holds (k,v)
• Broker: ack
• Producer: send (k,v) again (a retry, for example because the ack never reached the producer)
• Broker: append (k,v); the partition now holds (k,v) (k,v)
• Broker: ack

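The duplicate-write sequence can be reproduced with a small in-memory model. This is only a sketch: `NaivePartition` and `DuplicateWriteDemo` are hypothetical names, not Kafka classes, and the lost ack is simulated with a boolean flag rather than a real network failure.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a single topic partition (not a Kafka API).
class NaivePartition {
    final List<String> log = new ArrayList<>();

    // Appends unconditionally and reports whether the ack reached the producer.
    boolean append(String record, boolean ackDelivered) {
        log.add(record);
        return ackDelivered;
    }
}

class DuplicateWriteDemo {
    // Sends one record and retries until an ack arrives; the first ack is lost.
    static List<String> sendWithRetry(NaivePartition partition, String record) {
        boolean acked = partition.append(record, false); // appended, but the ack is lost
        while (!acked) {
            acked = partition.append(record, true);       // retry: appended a second time
        }
        return partition.log;
    }

    public static void main(String[] args) {
        // The log ends up with the same record twice.
        System.out.println(sendWithRetry(new NaivePartition(), "(k,v)"));
    }
}
```

The broker in this model has no way to tell a retry from a new record, which is the gap the idempotent producer closes.
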
Why improve?
• Stream processing is becoming a bigger part of the data landscape.
• Apache Kafka is the foundation for such stream processing.
• Strengthening Kafka’s semantics expands the universe of streaming applications.

What’s new?

What’s new
• Idempotent producer: exactly-once writes.
• Transactional producer: atomic writes across multiple partitions.
• Exactly-once stream processing: read-process-write.

What’s new: Idempotent Producer

Example: Idempotent Producer
Producer → Broker → Topic Partition
• Producer: send (k,v), seq = 0, pid = 73
• Broker: append (k,v), seq = 0, pid = 73; the partition holds (k,v), seq = 0, pid = 73
• Broker: ack
• Producer: send (k,v), seq = 0, pid = 73 again (a retry)
• Broker: ack (dup); nothing is appended, and the partition still holds a single (k,v), seq = 0, pid = 73

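The deduplication step can be sketched as a broker that remembers the last sequence number it appended for each producer id. `DedupPartition` is a hypothetical name, and the model is simplified on purpose: it assumes one partition and treats any sequence number at or below the last one as a duplicate, without the handling of gaps or producer epochs that a real broker needs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of one partition that tracks the last sequence number per producer id.
class DedupPartition {
    final List<String> log = new ArrayList<>();
    private final Map<Long, Integer> lastSeqByPid = new HashMap<>();

    // Returns "ack" for a new record and "ack (dup)" for a retransmission.
    String append(long pid, int seq, String record) {
        int lastSeq = lastSeqByPid.getOrDefault(pid, -1);
        if (seq <= lastSeq) {
            return "ack (dup)"; // already in the log: acknowledge without appending
        }
        log.add(record);
        lastSeqByPid.put(pid, seq);
        return "ack";
    }

    public static void main(String[] args) {
        DedupPartition partition = new DedupPartition();
        System.out.println(partition.append(73L, 0, "(k,v)")); // first send
        System.out.println(partition.append(73L, 0, "(k,v)")); // retry of the same send
        System.out.println(partition.log);                     // one copy of the record
    }
}
```
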
What’s new: Atomic Multi-Partition Writes (aka “transactions”)

Atomic Multi-Partition Writes
• The producer writes to three partitions: Topic A, Partition 0; Topic B, Partition 0; Topic B, Partition 1.
• After the writes:
  Topic A, Partition 0: m1 m5
  Topic B, Partition 0: m3 m4 m6
  Topic B, Partition 1: m2
• Atomic commit: a commit marker (C) is appended to each partition:
  Topic A, Partition 0: m1 m5 C
  Topic B, Partition 0: m3 m4 m6 C
  Topic B, Partition 1: m2 C
• Consumer: read committed.

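The effect of the commit marker on a read-committed consumer can be sketched with a toy filter over one partition's log. `ReadCommittedDemo` and the `"C"` marker string are hypothetical, and the model is simplified: it assumes one open transaction per partition at a time and ignores aborted transactions, so it illustrates visibility only, not how the consumer is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ReadCommittedDemo {
    static final String COMMIT = "C"; // hypothetical commit marker, as drawn on the slide

    // Returns the records a read-committed reader may see:
    // only those that are followed by a commit marker in the log.
    static List<String> readCommitted(List<String> partitionLog) {
        List<String> visible = new ArrayList<>();
        List<String> pending = new ArrayList<>();
        for (String entry : partitionLog) {
            if (entry.equals(COMMIT)) {
                visible.addAll(pending); // transaction committed: release its records
                pending.clear();
            } else {
                pending.add(entry);      // not yet committed: keep hidden
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        System.out.println(readCommitted(Arrays.asList("m1", "m5")));      // before the commit
        System.out.println(readCommitted(Arrays.asList("m1", "m5", "C"))); // after the commit
    }
}
```
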
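Transactional API

    producer.initTransactions();              // register the transactional.id, fence older instances
    try {
        producer.beginTransaction();
        producer.send(record0);
        producer.send(record1);
        producer.sendOffsetsToTransaction(…); // commit consumed offsets as part of the transaction
        producer.commitTransaction();
    } catch (ProducerFencedException e) {
        producer.close();                     // a newer producer with the same transactional.id took over
    } catch (KafkaException e) {
        producer.abortTransaction();          // discard the writes of this transaction
    }
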
How to use exactly-once capabilities:
• Streams API (the easiest way to use exactly-once semantics)
  • Config parameter processing.guarantee = "exactly_once"
• Idempotent Producer
  • Config parameter enable.idempotence = true
• Transactional Producer
  • Config parameter transactional.id = "my-unique-tid"
  • And the Transactional API (hard to use!)
• Transactional Consumer
  • Config parameter isolation.level = "read_committed" (default: "read_uncommitted")

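The settings above can be collected into `java.util.Properties` objects before they are handed to the respective client. This is a sketch of the exactly-once keys only: the class and method names are hypothetical, and required settings such as `bootstrap.servers`, serializers, and `application.id` are left out. The Streams key is written as `processing.guarantee`, which is the name the Kafka Streams configuration uses.

```java
import java.util.Properties;

// Hypothetical helper that gathers only the exactly-once related settings.
class ExactlyOnceConfigs {
    // Idempotent producer: broker-side deduplication of retried sends.
    static Properties idempotentProducer() {
        Properties props = new Properties();
        props.setProperty("enable.idempotence", "true");
        return props;
    }

    // Transactional producer: idempotence plus a stable, unique transactional.id.
    static Properties transactionalProducer(String transactionalId) {
        Properties props = idempotentProducer();
        props.setProperty("transactional.id", transactionalId);
        return props;
    }

    // Transactional consumer: only read records of committed transactions.
    static Properties readCommittedConsumer() {
        Properties props = new Properties();
        props.setProperty("isolation.level", "read_committed");
        return props;
    }

    // Kafka Streams: switch the processing guarantee to exactly-once.
    static Properties exactlyOnceStreams() {
        Properties props = new Properties();
        props.setProperty("processing.guarantee", "exactly_once");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(transactionalProducer("my-unique-tid"));
    }
}
```
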
Stream Processing with Kafka’s Streams API
Transactional read-process-write-commit pattern:

Stream Processing with Apache Spark
Transactional read-process-write-commit pattern:

When to use this?
Available in Kafka 0.11, June 2017. Try it out!

Putting it together
• We understood Kafka’s existing delivery semantics.
• Learned how these have been strengthened.
• Learned how the new semantics work.
• Saw that they are easy to use with higher-level APIs like Kafka Streams or Apache Spark.

Thank You
We are hiring!

Introducing Exactly Once Semantics in Apache Kafka with Matthias J. Sax