The best of Apache Kafka Architecture

The Best of Apache Kafka Architecture Ranganathan Balashanmugam @ran_than Apache: Big Data 2015

About Me ❏ Graduated as Civil Engineer. ❏ <dev> 10+ years </dev> ❏ <Thoughtworker from=”India”/> ❏ Organizer of Hyderabad Scalability Meetup with 2000+ members.

“Form follows function.” - Louis Sullivan

Gravity Dam Indirasagar Dam, India img src: http://www.montanhydraulik.in

Forces on a gravity dam: dam weight, head water, tail water, uplift

❏ publish-subscribe messaging service ❏ distributed commit/write-ahead log “Producers produce, consumers consume, in a large, distributed, reliable way, in real time.”

Why Kafka? ❏ DBs ❏ Logs ❏ Brokers ❏ HDFS “For highly distributed messages, Kafka stands out.”

Kafka Vs ________ src: https://softwaremill.com/mqperf/

Timeline 2011 2012 2013 2014 2015: Open sourced by LinkedIn, as version 0.6; graduated from the Apache Incubator; latest stable - 0.8.2.1; several engineers who built Kafka create Confluent

A Kafka Message (kafka.message.Message): CRC, magic, attributes, key length, key, message length, message content. Change requested: KAFKA-2511
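
The field list on the slide above can be made concrete with a small encoder. This is a minimal sketch, not Kafka's own code: it assumes big-endian 4-byte length prefixes, a length of -1 for a missing key or value, and a CRC32 computed over everything after the CRC field. The function names `encode_message` and `decode_message` are made up for illustration.

```python
import struct
import zlib


def encode_message(key, value, magic=0, attributes=0):
    """Lay out one message as: CRC | magic | attributes | key length | key | message length | message."""

    def sized(blob):
        # Assumption: 4-byte big-endian length prefix, -1 meaning "no bytes".
        if blob is None:
            return struct.pack(">i", -1)
        return struct.pack(">i", len(blob)) + blob

    body = struct.pack(">bb", magic, attributes) + sized(key) + sized(value)
    crc = zlib.crc32(body) & 0xFFFFFFFF  # checksum of everything after the CRC field
    return struct.pack(">I", crc) + body


def decode_message(buf):
    """Inverse of encode_message; raises ValueError if the checksum does not match."""
    crc, magic, attributes = struct.unpack_from(">Ibb", buf, 0)
    if zlib.crc32(buf[4:]) & 0xFFFFFFFF != crc:
        raise ValueError("corrupt message: CRC mismatch")
    pos = 6  # 4 bytes CRC + 1 byte magic + 1 byte attributes
    fields = []
    for _ in range(2):  # key first, then message content
        (length,) = struct.unpack_from(">i", buf, pos)
        pos += 4
        if length < 0:
            fields.append(None)
        else:
            fields.append(buf[pos:pos + length])
            pos += length
    key, value = fields
    return magic, attributes, key, value
```

The CRC lets a broker or consumer detect a message that was corrupted on disk or in transit before acting on its content.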

Producers - push Kafka Broker org.apache.kafka.clients.producer.KafkaProducer Request => RequiredAcks Timeout [TopicName [Partition MessageSetSize MessageSet]] Response => [TopicName [Partition ErrorCode Offset]]

Topic (kafka.common.Topic) Remove messages based on: number of messages, time, size
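
One way to picture time- and size-based removal is as a policy over a list of log segments, oldest first. This is a hypothetical sketch: `Segment`, `segments_to_delete`, and both limit parameters are invented here, and it always keeps the newest segment on the assumption that it is still being written.

```python
from collections import namedtuple

# Hypothetical segment record: first offset, size on disk, last write time (seconds).
Segment = namedtuple("Segment", "base_offset size_bytes last_modified")


def segments_to_delete(segments, now, max_age_seconds=None, max_total_bytes=None):
    """Return the segments (oldest first) that the retention limits allow removing."""
    remaining = list(segments)
    doomed = []

    # Time-based retention: drop old segments, but never the newest one.
    if max_age_seconds is not None:
        while len(remaining) > 1 and now - remaining[0].last_modified > max_age_seconds:
            doomed.append(remaining.pop(0))

    # Size-based retention: drop oldest segments until the log fits the budget.
    if max_total_bytes is not None:
        total = sum(s.size_bytes for s in remaining)
        while len(remaining) > 1 and total > max_total_bytes:
            oldest = remaining.pop(0)
            total -= oldest.size_bytes
            doomed.append(oldest)

    return doomed
```

Deleting whole segments from the old end of the log keeps removal cheap, because no data has to be rewritten.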

Partitions kafka.cluster.Partition Serves: Horizontal scaling, Parallel consumer reads
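
Partitions scale horizontally only if messages are spread across them. A common approach is to hash the message key, so that all messages with the same key land in the same partition. The sketch below assumes CRC32 as the hash and a caller-supplied counter for keyless messages; both are illustrative choices, not necessarily what Kafka's own partitioners do, and `choose_partition` is a made-up name.

```python
import zlib


def choose_partition(key, num_partitions, counter=0):
    """Pick a partition index in [0, num_partitions) for a message."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    if key is None:
        # No key: spread messages round-robin using the caller's counter.
        return counter % num_partitions
    # Same key always maps to the same partition (illustrative hash: CRC32).
    return zlib.crc32(key) % num_partitions
```

Because each partition is an independent log, different consumers can read different partitions in parallel.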

Consumers - pull kafka.consumer.ConsumerConnector, kafka.consumer.SimpleConsumer Consumer 1 Consumer 2

Consumer offsets committing and fetching consumer offsets img src: http://www.reynanprinting.com/photos/undefined/impresion-offset1.jpg
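
Committing and fetching offsets amounts to a small key-value store keyed by consumer group, topic, and partition. The in-memory class below is a hypothetical stand-in for wherever the offsets are really kept; `OffsetStore` and its method names are invented for this sketch.

```python
class OffsetStore:
    """Remembers, per (group, topic, partition), the next offset to read."""

    def __init__(self):
        self._offsets = {}

    def commit(self, group, topic, partition, offset):
        # Record the position this consumer group has reached.
        self._offsets[(group, topic, partition)] = offset

    def fetch(self, group, topic, partition):
        # None means this group has never committed for this partition.
        return self._offsets.get((group, topic, partition))
```

After a restart, a consumer fetches its last committed offset and resumes pulling from there.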

kafka:// - protocol ● Metadata ● Send ● Fetch ● Offsets ● Offset commit ● Offset fetch “Binary protocol over TCP”

Mechanical Sympathy "The most amazing achievement of the computer software industry is its continuing cancellation of the steady and staggering gains made by the computer hardware industry." - Henry Petroski Image source: http://www.theguide2surrey.com

Persistence “Everything is faster till the disk IO.”

Sequential disk access can be faster than random RAM access src: http://queue.acm.org/detail.cfm?id=1563874

Linear Reads & Writes At a high level there are only two operations: (1) append to the end of the log; (2) fetch messages from a partition, beginning from a particular message id. Both are sequential file I/O.
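
The two operations can be sketched as a tiny append-only log. This is an illustration only: a Python list stands in for the on-disk file, and `PartitionLog` with its methods is a made-up name, not a Kafka class.

```python
class PartitionLog:
    """Append-only sequence of messages addressed by offset."""

    def __init__(self):
        self._messages = []  # stand-in for a sequentially written file

    def append(self, message):
        """Add a message at the end of the log and return its offset."""
        self._messages.append(message)
        return len(self._messages) - 1

    def fetch(self, offset, max_messages=100):
        """Return (offset, message) pairs starting at the given offset."""
        if offset < 0 or offset > len(self._messages):
            raise IndexError("offset out of range")
        chunk = self._messages[offset:offset + max_messages]
        return list(enumerate(chunk, start=offset))
```

Neither operation ever modifies the middle of the log, which is what keeps the access pattern sequential.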

Linux Page Cache “Kafka ate my RAM”

ZeroCopy src: http://www.ibm.com/developerworks/library/j-zerocopy/
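
The linked article describes zero copy in Java; the same idea can be tried from Python's standard library. The sketch below uses `socket.sendfile`, which uses `os.sendfile` where the platform supports it and falls back to ordinary sends otherwise. The function name `send_log_segment` is hypothetical.

```python
import socket


def send_log_segment(sock, path):
    """Send a whole file over a connected stream socket; return bytes sent."""
    with open(path, "rb") as segment:
        # Where os.sendfile is available, the kernel moves the bytes from the
        # page cache to the socket without copying them through user space.
        return sock.sendfile(segment)
```

Skipping the round trip through application buffers matters most when the same log data is served to many consumers.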

Batching: accept a small amount of latency to improve throughput img src: https://prashanthpanduranga.files.wordpress.com/2015/05/tirupati.jpg
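
The latency-for-throughput trade can be sketched as an accumulator that sends when a batch is full or when the oldest pending message has waited long enough. Everything here is hypothetical: `Batcher`, its parameters, and the choice to check the timer only when a message is added (a real producer would more likely use a background timer).

```python
import time


class Batcher:
    """Collects messages and hands them to `send` in groups."""

    def __init__(self, send, max_batch_size=3, max_wait_seconds=0.05,
                 clock=time.monotonic):
        self._send = send
        self._max_batch_size = max_batch_size
        self._max_wait_seconds = max_wait_seconds
        self._clock = clock
        self._pending = []
        self._oldest = None  # arrival time of the first pending message

    def add(self, message):
        if not self._pending:
            self._oldest = self._clock()
        self._pending.append(message)
        full = len(self._pending) >= self._max_batch_size
        waited = self._clock() - self._oldest >= self._max_wait_seconds
        if full or waited:
            self.flush()

    def flush(self):
        """Send whatever is pending as one batch."""
        if self._pending:
            self._send(list(self._pending))
            self._pending.clear()
```

Larger batches mean fewer network round trips and larger sequential writes, at the cost of each message waiting up to `max_wait_seconds`.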

Compression: bandwidth between facilities is more expensive per byte to scale than disk I/O, CPU, or network bandwidth capacity within a facility kafka.message.CompressionCodec
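
Compression pays off most when similar messages are compressed together, because the redundancy between them can be shared. The sketch below compares the two approaches using `zlib` from the standard library as a stand-in codec; `compressed_sizes` is a made-up helper.

```python
import zlib


def compressed_sizes(messages):
    """Return (bytes when each message is compressed alone, bytes as one batch)."""
    individually = sum(len(zlib.compress(m)) for m in messages)
    as_batch = len(zlib.compress(b"".join(messages)))
    return individually, as_batch
```

For a stream of small, similarly shaped messages the batch figure is usually much smaller, since each message on its own carries too little repetition to compress well.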

Log compaction img src: http://kafka.apache.org/083/documentation.html kafka.log.LogCleaner, LogCleanerManager
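
Log compaction keeps the most recent value for each key and discards the older ones. The sketch below is a simplified model, not the LogCleaner itself: it assumes the log is a list of (key, value) pairs in offset order and treats a value of `None` as a delete marker that is dropped immediately.

```python
def compact(log):
    """Return (offset, key, value) for the latest surviving record of each key."""
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # later records overwrite earlier ones

    survivors = [
        (offset, key, value)
        for key, (offset, value) in latest.items()
        if value is not None  # assumption: delete markers are removed at once
    ]
    return sorted(survivors)  # keep original offset order
```

Surviving records keep their original offsets, so consumers positioned in the log are unaffected by compaction.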

Message Delivery: at least once, at most once, exactly once
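
On the consumer side, the difference between the first two guarantees comes down to whether the offset is committed before or after a message is processed. The simulation below is a hypothetical sketch: `consume` and its `crash_at` parameter are invented to show what a crash between the two steps does.

```python
def consume(messages, committed, commit_first, crash_at=None):
    """Process messages from the committed offset; return (processed, committed).

    commit_first=True commits the offset, then processes (at most once).
    commit_first=False processes, then commits (at least once).
    crash_at simulates a crash between the two steps at that offset.
    """
    processed = []
    offset = committed
    while offset < len(messages):
        if commit_first:
            committed = offset + 1
            if crash_at == offset:
                return processed, committed  # message was never processed
            processed.append(messages[offset])
        else:
            processed.append(messages[offset])
            if crash_at == offset:
                return processed, committed  # message will be processed again
            committed = offset + 1
        offset += 1
    return processed, committed
```

Running it twice, once with a crash and once resuming from the returned offset, shows a lost message in the first mode and a duplicate in the second.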

Replication un-replicated = replication factor of one

Quorum based ● Better latency ● To tolerate “f” failures, need “2f+1” replicas
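
The 2f+1 figure follows from requiring a majority: after f failures, f+1 replicas must still be alive to form one. A few helper functions (hypothetical names) make the arithmetic explicit.

```python
def quorum_replicas_needed(failures):
    """Replicas required to tolerate the given number of failures: 2f + 1."""
    return 2 * failures + 1


def majority(replicas):
    """Smallest group that is more than half of the replicas."""
    return replicas // 2 + 1


def failures_tolerated(replicas):
    """Failures after which a majority of replicas is still alive."""
    return (replicas - 1) // 2
```

For example, tolerating 2 failures takes 5 replicas, and a write needs acknowledgement from 3 of them.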

Primary-backup replication (diagram: Brokers 1-4 hosting three replicas each of Topic 1, Topic 2, and Topic 3)

THANK YOU For questions or suggestions: Ran.ga.na.than B ranganab@thoughtworks.com @ran_than
