1. Introduction to Apache Kafka and Confluent ... and why they matter! Kafka Meetup - Johannesburg, Tuesday, March 20th 2018, 18:00 – 20:00, SSA - Maxwell Office Park, Magwa Cres, Waterfall City, Midrand, 2090 https://www.meetup.com/Johannesburg-Kafka-Meetup/events/248465767/
2. How Organizations Handle Data Flows: a Giant Mess — Data Warehouse, Hadoop, NoSQL, Oracle, SFDC, Logging, Bloomberg, OLTP databases, ActiveMQ, caches, Web, Custom Apps, Microservices, Monitoring, Analytics, …any sink/source, and more, all wired together point to point
3. Apache Kafka™: A Distributed Streaming Platform — Offline Batch (> 1 hour) · Near-Real Time (> 100s ms) · Real Time (0-100 ms) — connecting Data Warehouse, Hadoop, NoSQL, Oracle, SFDC, Twitter, Bloomberg, Web, Custom Apps, Microservices, Monitoring, Analytics, …any sink/source, and more
4. More than 1 petabyte of data in Kafka · Over 1.2 trillion messages per day · Thousands of data streams · Source of all data warehouse & Hadoop data · Over 300 billion user-related events per day
5. Over 35% of the Fortune 500 are using Apache Kafka™ — 6 of the top 10 travel companies · 7 of the top 10 global banks · 8 of the top 10 insurance companies · 9 of the top 10 telecom companies
6. Industry Trends… and why Apache Kafka matters! 1. From ‘big data’ (batch) to ‘fast data’ (stream processing) 2. Internet of Things (IoT) and sensor data 3. Microservices and asynchronous communication (coordination messages and data streams) between loosely coupled and fine-grained services
7. Apache Kafka APIs – A UNIX Analogy: $ cat < in.txt | grep "apache" | tr a-z A-Z > out.txt — Connect APIs · Streams APIs · Producer / Consumer APIs
8. Apache Kafka APIs – An ETL Analogy: Source → Connect API (Extract) → Streams API (Transform) → Connect API (Load) → Sink
9. Apache Kafka 101: Internals and Core Concepts
10. Apache Kafka Concepts: Persistent Log — a Data Producer writes records at offsets 0 through 12; one Data Consumer reads at offset = 7 while another reads at offset = 11, each tracking its own position
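The log abstraction on this slide can be sketched in a few lines: an append-only list where the broker assigns offsets, reads are non-destructive, and each consumer keeps its own position (a Python sketch of the concept, not the real client API):

```python
class PartitionLog:
    """Append-only log: producers write to the end, consumers read by offset."""
    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # the offset assigned to this record

    def read(self, offset, max_records=100):
        # Reading never removes data; each consumer tracks its own offset.
        return self.records[offset:offset + max_records]

log = PartitionLog()
for i in range(13):           # offsets 0..12, as on the slide
    log.append(f"event-{i}")

slow_consumer = log.read(7)   # the consumer at offset = 7
fast_consumer = log.read(11)  # the consumer at offset = 11, same log
```

Because reads are just index lookups, any number of consumers can share one log without interfering with each other.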
11. Apache Kafka Concepts: Anatomy of a Topic — partition 0: offsets 0-12 · partition 1: offsets 0-7 · partition 2: offsets 0-5; writes always append to the end of a partition
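Which partition a record lands in is derived from its key, so records with the same key always go to the same partition and stay ordered relative to each other. A sketch of that mapping (Kafka's Java client actually uses murmur2 hashing; crc32 here is only a stand-in for a deterministic hash):

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    # Deterministic: the same key always maps to the same partition,
    # which is what preserves per-key ordering.
    return zlib.crc32(key) % num_partitions
```

Records without a key are instead spread across partitions (round-robin or sticky, depending on client version).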
12. Apache Kafka Concepts: Log Storage — the log is split into segments, each with an offset index and a timestamp index: offsets 0 - 10000 · offsets 10001 - 20000 · offsets 20001 - 30000
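Finding the segment that holds a given offset is a binary search over the segments' base offsets; only then is the per-segment offset index consulted. A sketch using the base offsets from the slide:

```python
import bisect

# Base offsets of the three segments shown on the slide
segment_base_offsets = [0, 10001, 20001]

def segment_for(offset: int) -> int:
    """Return the base offset of the segment that contains `offset`."""
    i = bisect.bisect_right(segment_base_offsets, offset) - 1
    return segment_base_offsets[i]
```

In the real broker, each segment is a file named after its base offset, so this lookup maps an offset straight to a file on disk.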
13. Apache Kafka Concepts: Message Format — offset (8 bytes) · length (4 bytes) · CRC (4 bytes) · magic byte (1 byte) · attributes (1 byte) · timestamp (8 bytes) · key length (4 bytes) · key content (varies) · value length (4 bytes) · value content (varies)
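The field list above (the pre-0.11, magic=1 message format) can be packed with plain struct code; in the real format the CRC covers everything after the CRC field, which this simplified sketch mirrors (it is not the actual Kafka serializer):

```python
import struct
import zlib

def encode_message_v1(offset: int, timestamp_ms: int, key: bytes, value: bytes) -> bytes:
    # magic=1, attributes=0, then the timestamp and length-prefixed key/value
    body = struct.pack(">bbq", 1, 0, timestamp_ms)
    body += struct.pack(">i", len(key)) + key
    body += struct.pack(">i", len(value)) + value
    crc = zlib.crc32(body) & 0xFFFFFFFF          # covers everything after the CRC field
    payload = struct.pack(">I", crc) + body
    # Framing: 8-byte offset, then the 4-byte length of the rest
    return struct.pack(">qi", offset, len(payload)) + payload
```

Kafka 0.11+ replaced this per-message layout with record batches, but the fixed-header-plus-length-prefixed-payload idea is the same.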
14. Apache Kafka Concepts: Producers and Consumers — multiple Producers write to a cluster of Brokers, from which multiple Consumers read
15. Apache Kafka Concepts: Topics and Partitions — partitions T0:P0, T0:P1, T0:P2, T0:P3, T1:P0 and T1:P1 are spread across the Brokers
16. Apache Kafka Concepts: Fault Tolerance and Replication — T0:P0 and T1:P0 each have a Replica 1 hosted on a different Broker
17. Apache Kafka Concepts: Consumer Groups — the partitions T0:P0-P3, T1:P0 and T1:P1 are divided among the Consumers of a group
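Within a consumer group, each partition is assigned to exactly one member, so a group scales reads up to the partition count. A round-robin assignment sketch (one of the strategies Kafka ships; the real rebalance is coordinated through a broker acting as group coordinator):

```python
def assign_partitions(partitions, members):
    """Round-robin: each partition goes to exactly one group member."""
    assignment = {m: [] for m in members}
    for i, p in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(p)
    return assignment

# Four partitions of topic T0 shared by a two-member group
groups = assign_partitions(["T0-P0", "T0-P1", "T0-P2", "T0-P3"], ["c1", "c2"])
```

With more members than partitions, the extra members would simply sit idle, which is why partition count caps a group's parallelism.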
18. The Connect API of Apache Kafka® — Reliable and scalable integration of Kafka with other systems – no coding required. • Centralized management and configuration • Support for hundreds of technologies including RDBMS, Elasticsearch, HDFS, S3 • Supports CDC ingest of events from RDBMS • Preserves data schema • Fault tolerant and automatically load balanced • Extensible API • Single Message Transforms • Part of Apache Kafka, included in Confluent Open Source. Example configuration: { "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector", "connection.url": "jdbc:mysql://localhost:3306/demo?user=rmoff&password=foo", "table.whitelist": "sales,orders,customers" } https://docs.confluent.io/current/connect/
19. Build Applications, not Clusters: <dependency> <groupId>org.apache.kafka</groupId> <artifactId>kafka-streams</artifactId> <version>1.0.0</version> </dependency>
20. Spot the Difference(s)!
21. How do I run in production?
22. How do I run in production? Uncool vs. Cool
23. How do I run in production? http://docs.confluent.io/current/streams/introduction.html
24–26. Elastic and Scalable: http://docs.confluent.io/current/streams/developer-guide.html#elastic-scaling-of-your-application
27–30. Typical High Level Architecture: Real-time Data Ingestion → Stream Processing → Storage → Data Publishing / Visualization
31. How many clusters do you count? NoSQL (Cassandra, HBase, Couchbase, MongoDB, …) or Elasticsearch, Solr, … · Storm, Flink, Spark Streaming, Ignite, Akka Streams, Apex, … · HDFS, NFS, Ceph, GlusterFS, Lustre, … · Apache Kafka
32. Simplicity is the Ultimate Sophistication — Apache Kafka, a Distributed Streaming Platform: Publish & Subscribe to streams of data like a messaging system · Store streams of data safely in a distributed replicated cluster · Process streams of data efficiently and in real-time
33–34. Duality of Streams and Tables: http://docs.confluent.io/current/streams/concepts.html#duality-of-streams-and-tables
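The duality in one sketch: replaying a stream of key/value updates (a changelog) yields a table, and every table update is itself a changelog record, so you can go back and forth between the two views (a conceptual helper, not a Kafka API):

```python
def materialize(changelog):
    """Stream -> table: replay the updates; the latest value per key wins."""
    table = {}
    for key, value in changelog:
        if value is None:
            table.pop(key, None)   # a tombstone record deletes the key
        else:
            table[key] = value
    return table

# Four stream records collapse into a one-row table
changelog = [("alice", 1), ("bob", 2), ("alice", 3), ("bob", None)]
table = materialize(changelog)
```

This is exactly why a KTable can be rebuilt from its changelog topic after a restart: the table is just the stream, folded.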
35–36. Interactive Queries: http://docs.confluent.io/current/streams/developer-guide.html#streams-developer-guide-interactive-queries
37. Kafka Streams DSL: http://docs.confluent.io/current/streams/developer-guide.html#kafka-streams-dsl
38. WordCount (and Java 8+) — WordCountLambdaExample.java:
final Properties streamsConfiguration = new Properties();
streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-lambda-example");
streamsConfiguration.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
...
final Serde<String> stringSerde = Serdes.String();
final Serde<Long> longSerde = Serdes.Long();
final KStreamBuilder builder = new KStreamBuilder();
final KStream<String, String> textLines = builder.stream(stringSerde, stringSerde, "TextLinesTopic");
final Pattern pattern = Pattern.compile("\\W+", Pattern.UNICODE_CHARACTER_CLASS);
final KTable<String, Long> wordCounts = textLines
    .flatMapValues(value -> Arrays.asList(pattern.split(value.toLowerCase())))
    .groupBy((key, word) -> word)
    .count("Counts");
wordCounts.to(stringSerde, longSerde, "WordsWithCountsTopic");
final KafkaStreams streams = new KafkaStreams(builder, streamsConfiguration);
streams.cleanUp();
streams.start();
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
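What that topology computes, stripped of the Kafka machinery: each input record updates a running per-word count, and every update is emitted downstream as a (word, new_count) record, KTable-changelog style (a plain-Python sketch):

```python
import re
from collections import Counter

def word_count_updates(lines):
    """Yield (word, updated_count) for each word occurrence, in arrival order."""
    counts = Counter()
    pattern = re.compile(r"\W+", re.UNICODE)
    for line in lines:                          # each line = one input record
        for word in pattern.split(line.lower()):
            if word:
                counts[word] += 1
                yield (word, counts[word])      # changelog-style update

updates = list(word_count_updates(["hello kafka streams", "hello kafka"]))
```

Note the output is a stream of revisions, not final totals; downstream, the latest update per word is the table.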
39. Easy to Develop with, Easy to Test — WordCountLambdaIntegrationTest.java:
EmbeddedSingleNodeKafkaCluster CLUSTER = new EmbeddedSingleNodeKafkaCluster();
...
CLUSTER.createTopic(inputTopic);
...
Properties producerConfig = new Properties();
producerConfig.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, CLUSTER.bootstrapServers());
40. The Streams API of Apache Kafka® — Write standard Java applications and microservices to process your data in real-time. • No separate processing cluster required • Develop on Mac, Linux, Windows • Deploy to containers, VMs, bare metal, cloud • Powered by Kafka: elastic, scalable, distributed, battle-tested • Perfect for small, medium, large use cases • Fully integrated with Kafka security • Exactly-once processing semantics • Part of Apache Kafka, included in Confluent Open Source.
KStream<User, PageViewEvent> pageViews = builder.stream("pageviews-topic");
KTable<Windowed<User>, Long> viewsPerUserSession = pageViews
    .groupByKey()
    .count(SessionWindows.with(TimeUnit.MINUTES.toMillis(5)), "session-views");
https://docs.confluent.io/current/streams/
41. KSQL: a Streaming SQL Engine for Apache Kafka® from Confluent — KSQL is the simplest way to process streams of data in real-time. • No coding required, all you need is SQL • No separate processing cluster required • Powered by Kafka: elastic, scalable, distributed, battle-tested • Perfect for streaming ETL, anomaly detection, event monitoring, and more • Part of Confluent Open Source.
CREATE TABLE possible_fraud AS
  SELECT card_number, count(*)
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 SECONDS)
  GROUP BY card_number
  HAVING count(*) > 3;
CREATE STREAM vip_actions AS
  SELECT userid, page, action
  FROM clickstream c
  LEFT JOIN users u ON c.userid = u.userid
  WHERE u.level = 'Platinum';
https://github.com/confluentinc/ksql
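The TUMBLING window in the fraud query buckets events into fixed, non-overlapping intervals: each event belongs to exactly one window. A sketch of counting attempts per card per 5-second window (illustrative only; KSQL maintains these counts continuously as events arrive):

```python
def tumbling_counts(events, window_size_ms):
    """events: (timestamp_ms, key) pairs -> {(window_start_ms, key): count}."""
    counts = {}
    for ts, key in events:
        window_start = ts - ts % window_size_ms   # floor to the window boundary
        counts[(window_start, key)] = counts.get((window_start, key), 0) + 1
    return counts

# Three attempts in the first 5s window, one in the next
attempts = [(1000, "card-1"), (2000, "card-1"), (4500, "card-1"), (6000, "card-1")]
per_window = tumbling_counts(attempts, 5000)
```

The HAVING count(*) > 3 clause in the slide's query then fires only for (window, card) pairs whose count exceeds the threshold.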
42. Do you think that’s a table you are querying?
43. KSQL in less than 5 minutes: https://www.youtube.com/watch?v=A45uRzJiv7I
44. Confluent Enterprise: Logical Architecture — Kafka Cluster (Zookeeper, Kafka Brokers); Kafka Connect Servers for Mainframe, RDBMS and Files, and for Hadoop, Cassandra and Elasticsearch; Producer and Consumer Applications (Kafka Producer/Consumer APIs); REST Proxy Servers with REST Clients; Control Center Servers; Schema Registry Servers; Stream Processing Applications 1 and 2 (Stream Clients)
45. Confluent Enterprise: Physical Architecture — Rack 1: Kafka Brokers #1 and #4, Zookeepers #1 and #4, Schema Registry #1, Kafka Connect #1, REST Proxy #1, ToR Switches · Rack 2: Kafka Brokers #2 and #5, Zookeepers #2 and #5, Schema Registry #2, Kafka Connect #2, ToR Switches · Rack 3: Kafka Broker #3, Zookeeper #3, Kafka Connect #3, REST Proxy #2, ToR Switches · plus Core Switches, Load Balancers, and Control Centers #1 and #2
46. Confluent Completes Kafka — Feature | Benefit (availability: Apache Kafka / Confluent Open Source / Confluent Enterprise):
• Apache Kafka — High throughput, low latency, high availability, secure distributed streaming platform
• Kafka Connect API — Advanced API for connecting external sources/destinations into Kafka
• Kafka Streams API — Simple library that enables streaming application development within the Kafka framework
• Additional Clients — Supports non-Java clients: C, C++, Python, .NET and several others
• REST Proxy — Provides universal access to Kafka from any network-connected device via HTTP
• Schema Registry — Central registry for the format of Kafka data – guarantees all data is always consumable
• Pre-Built Connectors — HDFS, JDBC, Elasticsearch, Amazon S3 and other connectors fully certified and supported by Confluent
• JMS Client — Support for legacy Java Message Service (JMS) applications consuming and producing directly from Kafka (Enterprise)
• Confluent Control Center — Enables easy connector management, monitoring and alerting for a Kafka cluster (Enterprise)
• Auto Data Balancer — Rebalances data across the cluster to remove bottlenecks (Enterprise)
• Replicator — Multi-datacenter replication that simplifies and automates multi-DC Kafka clusters (Enterprise)
• Support — Community for Apache Kafka and Confluent Open Source; 24x7x365 enterprise-class support with Confluent Enterprise
47. Big Data and Fast Data Ecosystems — Synchronous Req/Response (0 – 100s ms) · Near Real Time (> 100s ms) · Offline Batch (> 1 hour). The Apache Kafka Stream Data Platform feeds Search, RDBMS, Apps, Monitoring, Real-time Analytics, NoSQL and Stream Processing; the Confluent HDFS Connector (exactly-once semantics) loads the Apache Hadoop Data Lake (Impala, DWH, Hive, Spark, Map-Reduce). https://www.confluent.io/blog/the-value-of-apache-kafka-in-big-data-ecosystem/
48. Building a Microservices Ecosystem with Kafka Streams and KSQL — https://www.confluent.io/blog/building-a-microservices-ecosystem-with-kafka-streams-and-ksql/ and https://github.com/confluentinc/kafka-streams-examples/tree/3.3.0-post/src/main/java/io/confluent/examples/streams/microservices
49. Microservices: References — Blog post series:
Part 1: The Data Dichotomy: Rethinking the Way We Treat Data and Services https://www.confluent.io/blog/data-dichotomy-rethinking-the-way-we-treat-data-and-services/
Part 2: Build Services on a Backbone of Events https://www.confluent.io/blog/build-services-backbone-events/
Part 3: Using Apache Kafka as a Scalable, Event-Driven Backbone for Service Architectures https://www.confluent.io/blog/apache-kafka-for-service-architectures/
Part 4: Chain Services with Exactly Once Guarantees https://www.confluent.io/blog/chain-services-exactly-guarantees/
Part 5: Messaging as the Single Source of Truth https://www.confluent.io/blog/messaging-single-source-truth/
Part 6: Leveraging the Power of a Database Unbundled https://www.confluent.io/blog/leveraging-power-database-unbundled/
Part 7: Building a Microservices Ecosystem with Kafka Streams and KSQL https://www.confluent.io/blog/building-a-microservices-ecosystem-with-kafka-streams-and-ksql/
Whitepaper: Microservices in the Apache Kafka™ Ecosystem https://www.confluent.io/resources/microservices-in-the-apache-kafka-ecosystem/
50. Apache Kafka Security — Kafka helps meet security requirements by supporting:
• Security drivers — processing customer data, regulatory requirements, legal compliance, internal security policies; the need is not limited to industries such as finance, healthcare, or governmental services
• Authentication — e.g. “Only certain applications may talk to the production Kafka cluster”; client authentication via SASL, e.g. Kerberos / Active Directory
• Authorization — e.g. “Only certain applications may read data from sensitive Kafka topics”; restrict who can create, write to, read from topics, and more
• Encryption — e.g. “Data-in-transit between apps and Kafka clusters must be encrypted”; SSL encrypts data exchanged between Kafka brokers and between brokers and Kafka clients/apps
51. Enterprise-Ready Multi-Datacenter Replication for Kafka — Data Center in USA: Kafka Cluster (USA) with Kafka Brokers 1-3, ZooKeepers 1-3, Control Center, and a Kafka Connect Cluster running Replicators 1 and 2; Data Center in EMEA: Kafka Cluster (EU) with the same layout. Replicator is available only with Confluent Enterprise; the clusters themselves run Apache Kafka / Confluent Open Source.
52. Cloud Synchronization and Migrations with Confluent Enterprise: Before — DC1 (DB1-DB3, KV, KV2, KV3, DWH, App1-App4) and AWS (App1-v2, App2-v2, App5, App7, App8, DWH) linked by ad-hoc pipelines. Challenges: • Each team/department must execute its own cloud migration • The same data may be moved multiple times • Each pipeline requires its own development, testing, deployment, monitoring and maintenance
53. Cloud Synchronization and Migrations with Confluent Enterprise: After — a Kafka cluster in DC1 and a Kafka cluster in AWS carry all data flows between the same systems. Benefits: • Continuous low-latency synchronization • Centralized manageability and monitoring – track data produced in all data centers at the event level • Security and governance – track and control where data comes from and who is accessing it • Cost savings – move data once
54. About Confluent and Apache Kafka™ — Founded September 2014 by the creators of Apache Kafka; technology developed while at LinkedIn; employs 70% of active Kafka committers
55. Apache Kafka: PMC members and committers — https://kafka.apache.org/committers
56. Download the Confluent Platform: the easiest way to get started — https://www.confluent.io/download/
57. Books: get all three in PDF format from the Confluent website! https://www.confluent.io/apache-kafka-stream-processing-book-bundle
58. Kafka Summit — https://kafka-summit.org/ — Discount code: kacom17
