1 Building Microservices with Apache Kafka™
By Colin McCabe
2 About Me
3 Roadmap
● Example network service
  • Why microservices?
  • Why Kafka?
● Apache Kafka background
● How Kafka helps scale microservices
● Kafka APIs
  • Kafka Connect API
  • Kafka Streams API
● Wrap up
● New Kafka features and improvements
4 Newsfeed Application
5 First Try: Monolithic Service
[diagram: the whole application in a single process]
6 Second Try: Microservices with REST
[diagram: Frontend, Emailer, HDFS Connector, Metrics Connector]
7 Third Try: Microservices with Kafka
[diagram: Frontend]
8 Themes
● Improving Decoupling
  • Everything in one big app: no decoupling
  • Microservices with REST: multiple services
  • Microservices with Kafka: decoupled services sharing data
● Improving Scalability
  • Everything in one big app: single node
  • Microservices with REST: one node per service
  • Microservices with Kafka: scalable microservices
9 Apache Kafka
● A distributed streaming platform
● https://kafka.apache.org/intro
● Kafka was built at LinkedIn around 2010
● Multi-platform: clients in Java, Scala, C, C++, Python, Go, C#, …
10 Kafka Adoption
11 Kafka Concepts: the 10,000-foot view
● 4 APIs
  • Producer
  • Consumer
  • Connector
  • Stream Processor
12 Producers and Consumers
[diagram: several producers write messages to Kafka; several consumers read messages from it]
● Each message has:
  • key
  • value
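To make the producer/consumer split concrete, here is a toy, plain-Java sketch of one topic. It is not the Kafka client API: `ToyTopic`, `Message`, `Consumer`, `append` and `poll` are hypothetical names chosen for illustration, and it needs Java 16+ for records. It shows the property the slide relies on: producers only append key/value messages, and each consumer tracks its own position, so one consumer reading a message does not take it away from another.

```java
import java.util.ArrayList;
import java.util.List;

// Toy in-memory model of a single topic (hypothetical; NOT the Kafka client API).
public class ToyTopic {
    // Each message is a key/value pair, as on the slide.
    public record Message(String key, String value) {}

    private final List<Message> log = new ArrayList<>();

    // Any number of producers may append; the returned index plays the role of an offset.
    public synchronized long append(String key, String value) {
        log.add(new Message(key, value));
        return log.size() - 1;
    }

    // Each consumer keeps its own position, so reading never removes data for others.
    public static class Consumer {
        private final ToyTopic topic;
        private long position = 0;

        public Consumer(ToyTopic topic) { this.topic = topic; }

        // Return every message appended since this consumer's last poll.
        public List<Message> poll() {
            List<Message> batch;
            synchronized (topic) {
                batch = new ArrayList<>(topic.log.subList((int) position, topic.log.size()));
            }
            position += batch.size();
            return batch;
        }
    }

    public static void main(String[] args) {
        ToyTopic views = new ToyTopic();
        views.append("foo", "my news story");   // producer 1
        views.append("bar", "another story");   // producer 2
        Consumer emailer = new Consumer(views);
        Consumer metrics = new Consumer(views);
        System.out.println(emailer.poll().size()); // 2
        System.out.println(metrics.poll().size()); // 2: each consumer sees every message
    }
}
```

In real Kafka the log lives on the brokers and consumers fetch over the network, but the independence of consumer positions is the same idea.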
13 Topics
[diagram: the Frontend writes view events to the ‘views’ topic; the Backend reads them]
Example message: { "story": "my news story", "user": "foo", "timestamp": <time> }
14 Kafka is Durable
[diagram: the Frontend writes to the ‘views’ topic]
● Data is replicated to multiple servers and persisted to disk.
● Configurable log retention.
● Consumers can start reading from any position (offset) still retained in the log.
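Retention and "read from any part of the log" fit together through offsets. The plain-Java sketch below is a toy, not Kafka's storage layer: `RetainedLog`, `readFrom` and the count-based `maxEntries` limit are hypothetical. Real Kafka retention is time- or size-based (for example the `retention.ms` and `retention.bytes` topic configs) and deletes whole log segments. The point it illustrates is that offsets are never reused: old entries age out, and a consumer may start at any offset that is still retained.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model of a log with retention (hypothetical; NOT Kafka's storage layer).
public class RetainedLog {
    private final Deque<String> entries = new ArrayDeque<>();
    private long startOffset = 0;   // offset of the oldest retained entry
    private final int maxEntries;   // simplified stand-in for a retention limit (must be >= 1)

    public RetainedLog(int maxEntries) { this.maxEntries = maxEntries; }

    public long append(String value) {
        entries.addLast(value);
        long offset = startOffset + entries.size() - 1;
        // Enforce retention by dropping the oldest entries; offsets are never reused.
        while (entries.size() > maxEntries) {
            entries.removeFirst();
            startOffset++;
        }
        return offset;
    }

    public long startOffset() { return startOffset; }
    public long endOffset() { return startOffset + entries.size(); }

    // Read everything from an arbitrary retained offset onward.
    public List<String> readFrom(long offset) {
        if (offset < startOffset || offset > endOffset()) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        List<String> all = new ArrayList<>(entries);
        return all.subList((int) (offset - startOffset), all.size());
    }

    public static void main(String[] args) {
        RetainedLog views = new RetainedLog(3);
        for (int i = 0; i < 5; i++) views.append("view-" + i);
        // Offsets 0 and 1 have aged out; a consumer can still start at 2, 3 or 4.
        System.out.println(views.readFrom(3)); // [view-3, view-4]
    }
}
```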
15 Scaling with Kafka
● Can have multiple producers writing to a topic
● Can have multiple consumers reading from a topic
● Can add new microservices to consume data easily
  • Example: add more microservices processing views
  • Organize microservices around data, rather than APIs
● Can add more Kafka brokers to handle more messages and topics
  • Horizontal scalability
16 Scaling a Topic with Multiple Partitions
[diagram: the Frontend writes to a multi-partition ‘events’ topic, which is read by three Backend instances]
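Partitioning by key is what lets a topic spread across brokers while keeping each key's messages in order. The sketch below is a simplified stand-in: `KeyPartitioner.partitionFor` is a hypothetical helper that uses `String.hashCode()`, whereas Kafka's default partitioner hashes the serialized key bytes with murmur2. The property it shares with Kafka is that the same key always maps to the same partition, for a fixed partition count.

```java
// Simplified key-to-partition mapping (hypothetical helper). Kafka's default
// partitioner hashes the serialized key with murmur2 rather than String.hashCode().
public class KeyPartitioner {
    public static int partitionFor(String key, int numPartitions) {
        // Math.floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int partitions = 3;
        // "alice" appears twice and lands on the same partition both times,
        // so all of one user's events stay in order within that partition.
        for (String user : new String[] {"alice", "bob", "alice"}) {
            System.out.println(user + " -> partition " + partitionFor(user, partitions));
        }
    }
}
```

Changing the number of partitions changes the mapping, which is why adding partitions to a keyed topic breaks per-key ordering for existing keys.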
17 Load Balancing with Multiple Consumers
[diagram: the Frontend writes to the story_emails topic; its partitions are shared among the consumers in the emailer consumer group]
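Load balancing in a consumer group comes from giving each partition to exactly one member. The plain-Java sketch below approximates a range-style assignment for a single topic. `GroupAssignment.assign` is hypothetical: real assignment is done by Kafka's pluggable assignors (for example the range and round-robin assignors), which also sort members and handle multiple topics. It assumes a non-empty member list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified range-style assignment of one topic's partitions to the members
// of a consumer group (hypothetical helper; real Kafka uses pluggable assignors).
public class GroupAssignment {
    public static Map<String, List<Integer>> assign(List<String> members, int numPartitions) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        int n = members.size();                 // assumed to be >= 1
        int base = numPartitions / n;
        int extra = numPartitions % n;
        int next = 0;
        for (int i = 0; i < n; i++) {
            // The first `extra` members each take one additional partition.
            int count = base + (i < extra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int j = 0; j < count; j++) owned.add(next++);
            result.put(members.get(i), owned);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(assign(List.of("emailer-1", "emailer-2"), 5));
        // {emailer-1=[0, 1, 2], emailer-2=[3, 4]}
        // When membership changes, the assignment is simply recomputed:
        System.out.println(assign(List.of("emailer-1"), 5));
        // {emailer-1=[0, 1, 2, 3, 4]}
    }
}
```

With more consumers than partitions, some members receive nothing, so the partition count caps a group's parallelism.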
18 Partition Reassignment
[diagram: the story_emails topic's partitions are reassigned among the emailer consumer group]
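Reassignment only works smoothly because consumers commit their position per partition, so a partition's new owner knows where to resume. The toy below models that hand-off. `OffsetHandoff`, `commit` and `startingOffset` are hypothetical names; in Kafka the committed offsets are stored in the internal `__consumer_offsets` topic, and the fallback to offset 0 here is a simplification of `auto.offset.reset=earliest`.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of committed offsets surviving a partition reassignment
// (hypothetical class; Kafka stores commits in the __consumer_offsets topic).
public class OffsetHandoff {
    private final Map<Integer, Long> committed = new HashMap<>();

    // Record the next offset to read for a partition.
    public void commit(int partition, long nextOffset) {
        committed.put(partition, nextOffset);
    }

    // A newly assigned owner resumes from the committed offset, or from 0 if
    // nothing was committed (simplified stand-in for auto.offset.reset=earliest).
    public long startingOffset(int partition) {
        return committed.getOrDefault(partition, 0L);
    }

    public static void main(String[] args) {
        OffsetHandoff group = new OffsetHandoff();
        // emailer-1 owns partition 0, processes offsets 0..41, and commits 42.
        group.commit(0, 42L);
        // emailer-1 leaves; partition 0 moves to emailer-2, which resumes here.
        System.out.println(group.startingOffset(0)); // 42
    }
}
```

If emailer-1 had processed offsets 42 to 45 without committing them, emailer-2 would process them again, which is why this style of consumption is at-least-once unless further measures are taken.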
19 Connecting to External Services
[diagram: the Frontend and external services, linked through the Kafka Connect API]
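20 Kafka Connect API
docs.confluent.io/current/connect/
● Connector Instance: responsible for copying data between Kafka and an external system
● Connector Task: does the actual copying; a connector instance splits its work across one or more tasks
● Connector Plugin: the packaged classes that implement a connector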
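The split between a connector instance and its tasks is mostly about dividing work. Real connectors do this in `taskConfigs(int maxTasks)` on `org.apache.kafka.connect.connector.Connector`, returning one config map per task. The sketch below is a plain-Java stand-in for that idea only: `TaskSplitter`, the list of table names, and the `"tables"` config key are all hypothetical, and it assumes `maxTasks` is at least 1 (as `tasks.max` would be).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper mirroring what a source connector's taskConfigs(maxTasks)
// typically does: divide the external resources (here, table names) among tasks.
public class TaskSplitter {
    public static List<Map<String, String>> taskConfigs(List<String> tables, int maxTasks) {
        // Never start more tasks than there are tables to copy.
        int numTasks = Math.min(maxTasks, tables.size());
        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) groups.add(new ArrayList<>());
        // Deal tables out round-robin so task loads differ by at most one table.
        for (int i = 0; i < tables.size(); i++) groups.get(i % numTasks).add(tables.get(i));
        List<Map<String, String>> configs = new ArrayList<>();
        for (List<String> group : groups) {
            Map<String, String> config = new HashMap<>();
            config.put("tables", String.join(",", group)); // "tables" is a hypothetical key
            configs.add(config);
        }
        return configs;
    }

    public static void main(String[] args) {
        System.out.println(taskConfigs(List.of("users", "stories", "views"), 2));
        // [{tables=users,views}, {tables=stories}]
    }
}
```

The Connect framework then runs each task config on a worker, which is how a single connector instance scales across a Connect cluster.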
21 Kafka Streams API
kafka.apache.org/documentation/streams
● Process streams of data.
● Fault-tolerant and scalable.
22 Calculating News Reader Metrics
clicks + locations = clicks per location
● clicks: Alice 13, Bob 4, Chao 25, Bob 19, Dave 55, ...
● locations: Alice europe, Bob us, Chao asia, Bob us, Dave europe, ...
● clicks per location: europe 68, us 23, asia 25, ...
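The arithmetic on this slide is a join followed by a sum: look up each click's user in the locations table, then add the clicks per location (europe = 13 + 55 = 68, us = 4 + 19 = 23, asia = 25). The batch, plain-Java version below reproduces those numbers. It is not Kafka Streams; `ClicksPerLocation`, `Click` and `compute` are hypothetical names, records need Java 16+, and the `"UNKNOWN"` fallback mirrors the left join used in the Streams code on slide 24.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Batch version of the slide's calculation (plain Java, not Kafka Streams):
// join each click with the user's location, then sum clicks per location.
public class ClicksPerLocation {
    public record Click(String user, long clicks) {}

    public static Map<String, Long> compute(List<Click> clicks, Map<String, String> locations) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (Click c : clicks) {
            // Users with no known location are counted under "UNKNOWN".
            String location = locations.getOrDefault(c.user(), "UNKNOWN");
            totals.merge(location, c.clicks(), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Click> clicks = List.of(new Click("Alice", 13), new Click("Bob", 4),
                new Click("Chao", 25), new Click("Bob", 19), new Click("Dave", 55));
        Map<String, String> locations = Map.of("Alice", "europe", "Bob", "us",
                "Chao", "asia", "Dave", "europe");
        System.out.println(compute(clicks, locations)); // {europe=68, us=23, asia=25}
    }
}
```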
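23 Kafka Streams API
● Inputs and outputs are Kafka streams
● Fault tolerance, rebalancing, and scalability are provided by Kafka
● KStream: a stream of records, each one an independent event
● KTable: a changelog stream that holds the latest value for each key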
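The KStream/KTable distinction is about how the same records are interpreted. The plain-Java sketch below (hypothetical names `StreamVsTable`, `Update`, `asStream`, `asTable`; Java 16+ for records) reads one sequence of (user, region) records both ways: as a stream, every record is kept as its own event; as a table, each record upserts its key, so only the latest value survives, and a `null` value deletes the key, which is how Kafka treats tombstone records in a table.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Two readings of the same records (plain Java sketch, not the Kafka Streams API).
public class StreamVsTable {
    public record Update(String key, String value) {}

    // Stream view: every record is an independent event, so all are kept.
    public static List<Update> asStream(List<Update> updates) {
        return new ArrayList<>(updates);
    }

    // Table view: each record upserts its key; a null value deletes the key.
    public static Map<String, String> asTable(List<Update> updates) {
        Map<String, String> table = new LinkedHashMap<>();
        for (Update u : updates) {
            if (u.value() == null) {
                table.remove(u.key());
            } else {
                table.put(u.key(), u.value());
            }
        }
        return table;
    }

    public static void main(String[] args) {
        List<Update> updates = List.of(new Update("Bob", "us"),
                new Update("Alice", "europe"), new Update("Bob", "asia"));
        System.out.println(asStream(updates).size()); // 3
        System.out.println(asTable(updates));         // {Bob=asia, Alice=europe}
    }
}
```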
24 Joining the Clicks and Location Streams in KStreams
    KStream<String, Long> userClicksStream = builder.stream(..., "user-clicks-topic");
    KTable<String, String> userRegionsTable = builder.table(..., "user-regions-topic");
    // For each click, look up the user's region; users with no region map to "UNKNOWN".
    KTable<String, Long> clicksPerRegion = userClicksStream
        .leftJoin(userRegionsTable,
                  (c, r) -> new RegionWithClicks(r == null ? "UNKNOWN" : r, c))
        // Re-key each record by region so that clicks can be summed per region.
        .map((user, regionWithClicks) ->
                 new KeyValue<>(regionWithClicks.getRegion(), regionWithClicks.getClicks()))
        // reduceByKey is the Kafka 0.10.0 API; from 0.10.1 on, use groupByKey(...) then reduce(...).
        .reduceByKey((c1, c2) -> c1 + c2, ...);
    clicksPerRegion.to("clicks-per-region-topic", ...);
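One subtlety of the KStream-KTable `leftJoin` above is timing: each click is joined against the table as it stands when the click is processed, and a later region update does not rewrite earlier results. The single-threaded toy below illustrates that ordering. It is plain Java, not Kafka Streams; `IncrementalClicksPerRegion`, `onRegionUpdate`, `onClick` and `totals` are hypothetical names.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy, single-threaded model of the stream-table leftJoin plus per-region sum,
// processing records one at a time in arrival order (not the Kafka Streams API).
public class IncrementalClicksPerRegion {
    private final Map<String, String> userRegions = new HashMap<>();        // table state
    private final Map<String, Long> clicksPerRegion = new LinkedHashMap<>(); // aggregate state

    // A user-regions record updates the table; it does not touch past results.
    public void onRegionUpdate(String user, String region) {
        userRegions.put(user, region);
    }

    // A click is joined against the table as it is now; a missing region
    // becomes "UNKNOWN", mirroring the leftJoin lambda on the slide.
    public void onClick(String user, long clicks) {
        String region = userRegions.getOrDefault(user, "UNKNOWN");
        clicksPerRegion.merge(region, clicks, Long::sum);
    }

    public Map<String, Long> totals() {
        return Collections.unmodifiableMap(clicksPerRegion);
    }

    public static void main(String[] args) {
        IncrementalClicksPerRegion app = new IncrementalClicksPerRegion();
        app.onClick("Eve", 7);             // Eve's region is not known yet
        app.onRegionUpdate("Eve", "asia");
        app.onClick("Eve", 3);
        System.out.println(app.totals());  // {UNKNOWN=7, asia=3}
    }
}
```

In Kafka Streams, only records on the stream side trigger this join, so if users' regions may arrive late, the `"UNKNOWN"` bucket is where those early clicks end up.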
25 Wrap-Up
[diagram: Frontend, Kafka Connect, Kafka Streams]
● Load balancing and scalability
● Decouple front-end and back-end
26 New Kafka Features and Improvements
● Exactly-once semantics in Kafka 0.11
  • https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/
● Consumer and producer performance improvements
  • Up to +20% producer throughput
  • Up to +50% consumer throughput
● Better CLASSPATH isolation for Kafka Connect connectors
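One building block of exactly-once semantics is the idempotent producer: each producer gets an ID and numbers its records, so the broker can drop a retried duplicate after a lost acknowledgement. The sketch below is a highly simplified plain-Java model of that deduplication only. `IdempotentPartition` and `append` are hypothetical names; real Kafka tracks sequences per producer and partition, rejects sequence gaps rather than accepting them, and adds transactions on top for full exactly-once processing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Highly simplified model of idempotent writes to one partition: the broker
// remembers the last sequence number per producer ID and drops retried duplicates.
public class IdempotentPartition {
    private final Map<Long, Integer> lastSequence = new HashMap<>();
    private final List<String> log = new ArrayList<>();

    // Returns true if the record was appended, false if it was a duplicate retry.
    public boolean append(long producerId, int sequence, String value) {
        int last = lastSequence.getOrDefault(producerId, -1);
        if (sequence <= last) {
            return false; // already written: a retry after a lost acknowledgement
        }
        lastSequence.put(producerId, sequence);
        log.add(value);
        return true;
    }

    public int size() { return log.size(); }

    public static void main(String[] args) {
        IdempotentPartition partition = new IdempotentPartition();
        partition.append(7L, 0, "view-1");
        // The ack was lost, so the producer retries with the same sequence number.
        partition.append(7L, 0, "view-1");
        System.out.println(partition.size()); // 1
    }
}
```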
27 Conclusion
● The loose coupling, deployability, and testability of microservices make them a great way to scale.
● Apache Kafka is an incredibly useful building block for many different microservices.
● Kafka is reliable and does the heavy lifting.
● Kafka Connect is a great API for connecting with external databases, Hadoop clusters, and other external systems.
● Kafka Streams can process data in real time.
● https://www.confluent.io/solutions/microservices/
28 Thank You! https://www.confluent.io/download https://www.confluent.io/careers
