1 KSQL and Kafka Streams: When to use which, and when to use both. Dr. Michael Noll, Product Manager, Confluent. Munich, 9 October 2018
2 Agenda ● KSQL and Kafka Streams in 3 minutes ● Example Use Cases ● Similarities & Differences ● Guidance Duration: 20m
3 KSQL and Kafka Streams in 3 minutes
4 In a nutshell. KSQL: the streaming SQL engine for Apache Kafka® to write real-time applications in SQL. Kafka Streams: the Apache Kafka® library to write real-time applications and microservices in Java and Scala.
5 Hello, Streaming World. KSQL: You write only SQL. No Java, Python, or other boilerplate to wrap around it! CREATE STREAM fraudulent_payments AS SELECT * FROM payments WHERE fraudProbability > 0.8; But you can create KSQL User Defined Functions in Java, if you want to.
6 Interaction with Kafka. Kafka holds the data; KSQL and JVM applications with Kafka Streams do the processing. Neither KSQL nor Kafka Streams applications run on the Kafka brokers.
7 KSQL can be used interactively + programmatically: (1) UI, (2) CLI (the ksql> prompt), (3) REST (POST /query), (4) Headless
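A minimal sketch of the REST option, in Python since the deck names it as a non-JVM client language. It only builds the POST /query request and does not send it. The server address (localhost:8088) is an assumption about a local setup, and the helper name build_query_request is hypothetical; the JSON body with a "ksql" field and the application/vnd.ksql.v1+json content type follow the KSQL REST API as I understand it, so check them against the documentation for your version.

```python
import json
import urllib.request

# Assumption: a KSQL server listening on its usual local port.
KSQL_SERVER = "http://localhost:8088"


def build_query_request(statement, server=KSQL_SERVER):
    """Build (but do not send) a POST /query request for one KSQL statement."""
    body = json.dumps({"ksql": statement, "streamsProperties": {}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{server}/query",
        data=body,
        headers={"Content-Type": "application/vnd.ksql.v1+json; charset=utf-8"},
        method="POST",
    )


request = build_query_request("SELECT * FROM payments WHERE fraudProbability > 0.8;")
print(request.get_method(), request.full_url)
```

Sending the request, for example with urllib.request.urlopen(request), would stream results back from a running server; that step is left out so the sketch runs without one.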
8 Example Use Cases (focus on KSQL)
9 KSQL for Data Exploration An easy way to inspect your data in Kafka SHOW TOPICS; SELECT page, user_id, status, bytes FROM clickstream WHERE user_agent LIKE 'Mozilla/5.0%'; PRINT 'my-topic' FROM BEGINNING;
10 KSQL for Data Transformation Quickly make derivations of existing data in Kafka CREATE STREAM clicks_by_user_id WITH (PARTITIONS=6, TIMESTAMP='view_time', VALUE_FORMAT='JSON') AS SELECT * FROM clickstream PARTITION BY user_id; (1) Change number of partitions (2) Convert data to JSON (3) Repartition the data
11 KSQL for Real-Time, Streaming ETL Filter, cleanse, process data while it is in motion CREATE STREAM clicks_from_vip_users AS SELECT user_id, u.country, page, action FROM clickstream c LEFT JOIN users u ON c.user_id = u.user_id WHERE u.level = 'Platinum'; (1) Pick only VIP users
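What repartitioning by user_id achieves can be sketched in plain Python: records with the same key always land in the same partition. This is an illustration of the idea only. The function partition_for is hypothetical, and zlib.crc32 is a stand-in hash chosen for the sketch, not the hash Kafka's partitioner actually uses.

```python
import zlib


def partition_for(key, num_partitions):
    """Stand-in partitioner: stable hash of the key modulo the partition count."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


clicks = [
    {"user_id": "alice", "page": "/home"},
    {"user_id": "bob", "page": "/cart"},
    {"user_id": "alice", "page": "/checkout"},
]

# Re-key every record by user_id and assign it to one of 6 partitions,
# mirroring PARTITIONS=6 and PARTITION BY user_id in the statement above.
repartitioned = {}
for click in clicks:
    partition = partition_for(click["user_id"], 6)
    repartitioned.setdefault(partition, []).append(click)

for partition, rows in sorted(repartitioned.items()):
    print(partition, [row["page"] for row in rows])
```

Because all clicks of one user end up in one partition, a downstream consumer can process that user's events in order without coordinating with other consumers.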
12 Example: CDC from DB via Kafka to Elastic
13 KSQL for Real-time Data Enrichment Join data from a variety of sources to see the full picture CREATE STREAM enriched_payments AS SELECT payment_id, c.country, total FROM payments_stream p LEFT JOIN customers_table c ON p.user_id = c.user_id; (1) Stream-table join
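The semantics of a stream-table left join can be sketched in plain Python: the table keeps the latest row per key, and each stream event is enriched with whatever the table holds at that moment. This is a simplified model, not KSQL's implementation; the function enrich_payments and the sample events are hypothetical.

```python
def enrich_payments(events):
    """Join payments against the latest customer row per user_id.

    events is an ordered list of ("customer", row) or ("payment", row) tuples.
    """
    customers = {}  # the table: latest customer row per user_id
    enriched = []
    for kind, row in events:
        if kind == "customer":
            customers[row["user_id"]] = row  # upsert into the table
        else:
            customer = customers.get(row["user_id"])
            enriched.append({
                "payment_id": row["payment_id"],
                # LEFT JOIN: keep the payment even without a matching customer
                "country": customer["country"] if customer else None,
                "total": row["total"],
            })
    return enriched


events = [
    ("customer", {"user_id": "u1", "country": "DE"}),
    ("payment", {"payment_id": "p1", "user_id": "u1", "total": 10.0}),
    ("payment", {"payment_id": "p2", "user_id": "u2", "total": 5.0}),
    ("customer", {"user_id": "u1", "country": "FR"}),
    ("payment", {"payment_id": "p3", "user_id": "u1", "total": 7.5}),
]
result = enrich_payments(events)
print([row["country"] for row in result])  # ['DE', None, 'FR']
```

Note that p1 keeps the country that was current when it arrived; a later table update affects only later stream events.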
14 Example: Retail
15 KSQL for Real-Time Monitoring Derive insights from events (IoT, sensors, etc.) and turn them into actions CREATE TABLE failing_vehicles AS SELECT vehicle, COUNT(*) FROM vehicle_monitoring_stream WINDOW TUMBLING (SIZE 1 MINUTE) WHERE event_type = 'ERROR' GROUP BY vehicle HAVING COUNT(*) >= 5; (1) Now we know to alert, and whom
16 Example: IoT, Automotive, Connected Cars
17 KSQL for Anomaly Detection Aggregate data to identify patterns and anomalies in real-time CREATE TABLE possible_fraud AS SELECT card_number, COUNT(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 30 SECONDS) GROUP BY card_number HAVING COUNT(*) > 3; (1) Aggregate data (2) … per 30-sec windows
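The tumbling-window count with a HAVING filter can be sketched in plain Python over a finished batch of events. KSQL computes this incrementally on a live stream, so treat this only as a model of the result; the function possible_fraud and the sample timestamps (in seconds) are hypothetical.

```python
from collections import Counter


def possible_fraud(attempts, window_size_s=30, threshold=3):
    """Count attempts per (card_number, window) and keep counts above threshold.

    attempts is an iterable of (timestamp_seconds, card_number) pairs.
    """
    counts = Counter()
    for timestamp, card_number in attempts:
        # Tumbling windows do not overlap: each event falls into exactly one.
        window_start = (timestamp // window_size_s) * window_size_s
        counts[(card_number, window_start)] += 1
    # Equivalent of HAVING COUNT(*) > 3
    return {key: n for key, n in counts.items() if n > threshold}


attempts = [
    (0, "A"), (5, "A"), (10, "A"), (29, "A"),  # four attempts in window [0, 30)
    (31, "A"),                                  # one attempt in window [30, 60)
    (2, "B"), (40, "B"),                        # one attempt per window
]
print(possible_fraud(attempts))  # {('A', 0): 4}
```

The attempt at second 31 starts a new window for card A, which is why it does not add to the count of the first window.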
18 Workflow Comparison
19 Typical developer interaction (Kafka Streams, KSQL)
20 KSQL: typical workflow from development to production. Interactive KSQL for development; headless KSQL in production.
21 Kafka Streams: typical workflow from development to production. Local development and testing with a Java/Scala IDE; then production.
22 Similarities
23 Shoulders of Streaming Giants. Kafka producer and consumer clients: subscribe(), poll(), send(), flush(), beginTransaction(), … Kafka Streams: KStream, KTable, filter(), map(), flatMap(), join(), aggregate(), transform(), … KSQL: CREATE STREAM, CREATE TABLE, SELECT, JOIN, GROUP BY, SUM, … KSQL UDFs
24 Similarities of KSQL & Kafka Streams: Open Source; Elastic, Scalable, Fault-tolerant; Supports Streams and Tables; Runs Everywhere; Exactly-Once Processing; Event-Time Processing; Kafka Security Integration; Powerful Processing incl. Filters, Transforms, Joins, Aggregations, Windowing; Enterprise Support; Can Be Used Together
25 Runs Everywhere, Integrates Smoothly with What You Have ...and many more...
26 Fault-Tolerance, powered by Kafka (here: KSQL)
27 Differences
28 Differences
Aspect | KSQL | Kafka Streams
You write... | KSQL statements | JVM applications
UI included for human interaction | Yes, in Confluent Enterprise | No
CLI included for human interaction | Yes | No
Data formats | Avro, JSON, CSV | Any data format, including Avro, JSON, CSV, Protobuf, XML
REST API | Yes | No
Runtime included | Yes, the KSQL server | Not needed, applications run as standard JVM processes
Queryable state | Not yet | Yes
29 Guidance
30 Start with KSQL when...
● New to streaming and Kafka
● To quicken and broaden the adoption & value of Kafka in your organization
● Prefer an interactive experience with UI and CLI
● Prefer SQL to writing code in Java or Scala
● Use cases include enriching data; joining data sources; filtering, transforming, and masking data; identifying anomalous events
● Use case is naturally expressible through SQL, with optional help from User Defined Functions as a “get out of jail free” card
● Want the power of Kafka Streams but you are not on the JVM: use the KSQL REST API from Python, Go, C#, JavaScript, shell
Start with Kafka Streams when...
● Prefer writing and deploying JVM applications, e.g. in Java or Scala, due to people skills or tech environment
● Use case is not naturally expressible through SQL, e.g. finite state machines
● Building microservices
● Must integrate with external services, or use 3rd-party libraries (but KSQL UDFs may help)
● To customize or fine-tune a use case, e.g. with Kafka Streams’ Processor API
● Need for queryable state, which is not yet supported by KSQL
KSQL is usually not yet a good fit for: BI reports & ad-hoc querying, queries with random access patterns (because no indexes, no native JDBC)
31 Thank you! Learn more: confluent.io/download, confluent.io/product/ksql/, confluent.io/confluent-cloud/
