Apache Kafka Event Streaming Platform. March 2019 / Boston, MA. @gamussa | #BostonKafka | @ConfluentINc
Raffle, yeah 🚀 Follow @gamussa 📸 🖼 🏋 Tag @gamussa With #BostonKafka
A company is built on DATA FLOWS, but all we have is DATA STORES
Pre-Streaming
New World: Streaming first
• DB/DWH + many more distributed data systems
• Monolith -> Microservices
• Batch -> Real-time
Origins in Stream Processing [Figure labels: Streaming platform; High Throughput; Continuous Computation; API based clustering; Java Apps with Kafka Streams or KSQL; Serving Layer (Microservices, Elastic, etc.)]
Streaming Platform: Storage, Pub / Sub, Processing
Storage
Core Abstraction
● DB - table
● Hadoop - file
● Kafka - ?
LOG
The log is a simple idea: messages are added at the end of the log (Old -> New)
Pub / Sub
[Figure: Time axis]
[Figure: C1, C2, C3; Time axis]
[Figure: A, B, C, D; Time axis] hash(key) % numPartitions = N
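The formula on this slide can be sketched in a few lines of Python. This is only an illustration: `choose_partition` is a hypothetical helper, and `zlib.crc32` is a stand-in hash picked for the sketch (the Java client's default partitioner uses its own hash function on the serialized key).

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Return N = hash(key) % numPartitions for the given key."""
    # crc32 is an assumption made for this sketch; it is deterministic
    # across runs, unlike Python's built-in hash() for bytes/str.
    return zlib.crc32(key) % num_partitions


# The keys A, B, C, D from the slide, spread over 4 partitions.
keys = [b"A", b"B", b"C", b"D"]
assignments = {key: choose_partition(key, 4) for key in keys}

for key, partition in assignments.items():
    print(key.decode(), "->", partition)
```

The property that matters is that the same key always maps to the same partition, so all messages for one key stay in order within that partition.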
Messages will be produced in a round robin fashion [Figure: Time axis]
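A minimal sketch of round-robin partition selection, assuming the slide refers to messages produced without a key. `RoundRobinPartitioner` is a hypothetical class written for this illustration, not a Kafka client API.

```python
from itertools import count


class RoundRobinPartitioner:
    """Hand out partition numbers 0, 1, ..., n-1, 0, 1, ... in turn."""

    def __init__(self, num_partitions: int) -> None:
        self._num_partitions = num_partitions
        self._counter = count()  # 0, 1, 2, ...

    def next_partition(self) -> int:
        # Each call advances the counter and wraps around the partition count.
        return next(self._counter) % self._num_partitions


partitioner = RoundRobinPartitioner(num_partitions=3)
sequence = [partitioner.next_partition() for _ in range(6)]
print(sequence)  # [0, 1, 2, 0, 1, 2]
```

Round robin spreads load evenly across partitions, at the cost of giving up per-key ordering.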
Consumers have a position all of their own [Figure: log from Old to New; "Robin is here", "Viktor is here", "Ricardo is here", each followed by "Scan"]
Only Sequential Access: read to offset & scan (Old -> New)
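The log and the per-consumer positions can be modelled with a small in-memory sketch. `Log` and `Consumer` are hypothetical classes for illustration only; they are not Kafka client types, and a real log lives on disk on the brokers.

```python
class Log:
    """Append-only sequence of messages; an offset is just a list index."""

    def __init__(self) -> None:
        self._messages = []

    def append(self, message: str) -> int:
        # New messages always go at the end; return the new message's offset.
        self._messages.append(message)
        return len(self._messages) - 1

    def read_from(self, offset: int, max_messages: int) -> list:
        # Sequential access only: jump to an offset, then scan forward.
        return self._messages[offset:offset + max_messages]


class Consumer:
    """Each consumer tracks its own position; reading removes nothing."""

    def __init__(self, log: Log) -> None:
        self._log = log
        self.position = 0

    def poll(self, max_messages: int) -> list:
        batch = self._log.read_from(self.position, max_messages)
        self.position += len(batch)
        return batch


log = Log()
for i in range(5):
    log.append(f"msg-{i}")

robin, viktor, ricardo = Consumer(log), Consumer(log), Consumer(log)
robin.poll(2)    # Robin has read two messages
viktor.poll(10)  # Viktor has caught up with the end of the log
print(robin.position, viktor.position, ricardo.position)  # 2 5 0
```

Because the position belongs to the consumer rather than the log, a slow reader does not hold back a fast one, and any reader can start again from an earlier offset.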
CONSUMER GROUP: CONSUMERS and the CONSUMER GROUP COORDINATOR
[Figure sequence: consumers (C, C1, C2) joining a consumer group; then partitions 0, 1, 2, 3 across the consumers; final frame labelled "0, 3", "1", "2", "3"]
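One way to read the last frames is as a rebalance: four partitions spread over the group's consumers, then redistributed when a consumer leaves. The sketch below assumes that reading and uses a simple round-robin spread; `assign` and the consumer names are hypothetical, and real Kafka clients offer several configurable assignment strategies.

```python
def assign(partitions, consumers):
    """Spread partitions over consumers in round-robin order."""
    members = sorted(consumers)
    assignment = {member: [] for member in members}
    if not members:
        return assignment  # nobody to assign to
    for index, partition in enumerate(sorted(partitions)):
        owner = members[index % len(members)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2, 3]

# Four consumers: one partition each.
before = assign(partitions, ["c1", "c2", "c3", "c4"])

# Assume c4 leaves the group: its partition is handed to a survivor.
after = assign(partitions, ["c1", "c2", "c3"])

print(before)  # {'c1': [0], 'c2': [1], 'c3': [2], 'c4': [3]}
print(after)   # {'c1': [0, 3], 'c2': [1], 'c3': [2]}
```

Within a group each partition has exactly one owner at a time, which is why adding consumers beyond the partition count adds no extra parallelism.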
Linearly Scalable Architecture
Single topic:
- Many producer machines
- Many consumer machines
- Many broker machines
No Bottleneck!! [Figure: Producers, Consumers]
Replicate to get fault tolerance [Figure: msg on the leader (Machine A) replicated to Machine B]
Partition Leadership and Replication [Figure: Brokers 1-4 holding Leader and Follower replicas of Topic1 partition1 through partition4; each partition appears three times]
Replication provides resiliency: a replica takes over on machine failure
Partition Leadership and Replication - node failure [Figure: the same Broker 1-4 layout after a node failure]
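The failover idea can be sketched as follows. The replica placement below is hypothetical (the exact layout in the diagram is not recoverable from the text), and `elect_leaders` is a simplification: it picks the first replica on a live broker, whereas a real cluster only considers replicas that are in sync with the old leader.

```python
def elect_leaders(replicas, live_brokers):
    """For each partition, pick the first replica hosted on a live broker."""
    leaders = {}
    for partition, brokers in replicas.items():
        # None means every replica of the partition is currently offline.
        leaders[partition] = next(
            (broker for broker in brokers if broker in live_brokers), None
        )
    return leaders


# Assumed layout: 4 partitions of Topic1, 3 replicas each, over brokers 1-4.
replicas = {
    "partition1": [1, 2, 3],
    "partition2": [2, 3, 4],
    "partition3": [3, 4, 1],
    "partition4": [4, 1, 2],
}

healthy = elect_leaders(replicas, live_brokers={1, 2, 3, 4})
after_failure = elect_leaders(replicas, live_brokers={1, 2, 3})  # broker 4 down

print(healthy)        # each broker leads one partition
print(after_failure)  # partition4 is now led by broker 1
```

Only the partition whose leader was on the failed broker changes hands; the other partitions keep their leaders.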
The log is a type of durable messaging system. Similar to a traditional messaging system (ActiveMQ, Rabbit, etc.) but with: (a) Far better scalability (b) Built-in fault tolerance / HA (c) Storage
Stop! Demo time!
Processing
Streaming is the toolset for dealing with events as they move!
What exactly is Stream Processing? [Figure: authorization_attempts -> possible_fraud]
@gamussa | #BostonKafka | @ConfluentINc 42 CREATE STREAM possible_fraud AS SELECT card_number, count(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count(*) > 3; authorization_attempts possible_fraud What exactly is Stream Processing?
@gamussa | #BostonKafka | @ConfluentINc 43 CREATE STREAM possible_fraud AS SELECT card_number, count(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count(*) > 3; authorization_attempts possible_fraud What exactly is Stream Processing?
@gamussa | #BostonKafka | @ConfluentINc 44 CREATE STREAM possible_fraud AS SELECT card_number, count(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count(*) > 3; authorization_attempts possible_fraud What exactly is Stream Processing?
@gamussa | #BostonKafka | @ConfluentINc 45 CREATE STREAM possible_fraud AS SELECT card_number, count(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count(*) > 3; authorization_attempts possible_fraud What exactly is Stream Processing?
@gamussa | #BostonKafka | @ConfluentINc 46 CREATE STREAM possible_fraud AS SELECT card_number, count(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count(*) > 3; authorization_attempts possible_fraud What exactly is Stream Processing?
@gamussa | #BostonKafka | @ConfluentINc 47 CREATE STREAM possible_fraud AS SELECT card_number, count(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count(*) > 3; authorization_attempts possible_fraud What exactly is Stream Processing?
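To make the windowing logic concrete, here is a rough Python equivalent of what the query computes, run over a finite list instead of a continuous stream. The function name, the `(timestamp_ms, card_number)` tuple shape, and the sample data are assumptions made for this sketch.

```python
from collections import Counter

WINDOW_MS = 5 * 60 * 1000  # tumbling window size: 5 minutes


def possible_fraud(attempts, threshold=3):
    """Count attempts per (card_number, window) and keep counts > threshold.

    attempts: iterable of (timestamp_ms, card_number) tuples.
    """
    counts = Counter()
    for timestamp_ms, card_number in attempts:
        # Tumbling windows do not overlap: each event falls into exactly
        # one window, identified here by the window's start time.
        window_start = timestamp_ms - timestamp_ms % WINDOW_MS
        counts[(card_number, window_start)] += 1
    return {key: n for key, n in counts.items() if n > threshold}


authorization_attempts = [
    (1_000, "1234"), (2_000, "1234"), (3_000, "1234"), (4_000, "1234"),
    (5_000, "5678"), (6_000, "5678"),
    (WINDOW_MS + 1_000, "1234"),  # falls into the next window
]

flagged = possible_fraud(authorization_attempts)
print(flagged)  # {('1234', 0): 4}
```

The main difference from the KSQL version is that the query never finishes: it keeps updating its result as new authorization attempts arrive.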
Lower the bar to enter the world of streaming [Figure: User Population vs. Coding Sophistication; Core developers who use Java/Scala; Core developers who don't use Java/Scala; Data engineers, architects, DevOps/SRE; BI analysts; streams]
KSQL #FTW: 1 UI, 2 CLI (ksql>), 3 REST (POST /query), 4 Headless
Interaction with Kafka: Kafka (data); KSQL (processing) does not run on Kafka brokers; JVM application with Kafka Streams (processing) does not run on Kafka brokers
Standing on the shoulders of Streaming Giants [Figure labels: Producer, Consumer APIs; Kafka Streams; KSQL; KSQL UDFs; Ease of use; Flexibility; Powered by (twice)]
Find your local Meetup Group: https://cnfl.io/kafka-meetups
Join us in Slack: http://cnfl.io/slack
Grab Stream Processing books: https://cnfl.io/book-bundle
One more thing…
@gamussa | @tlberglund | #DEVnexus https://kafka-summit.org Gamov30
Thanks! @gamussa viktor@confluent.io We are hiring! https://www.confluent.io/careers/

What is Apache Kafka, and What is an Event Streaming Platform?
