© 2024 NetApp, Inc. All rights reserved.
Kafka Summit, Bangalore 2024
Superpower your Apache Kafka® applications development with complementary open source technologies
Paul Brebner, Instaclustr Technology Evangelist
Focus on complementary technologies – different to Kafka. “Colours seem more brilliant when they are in contrast with their complementary colours.” (Monet)
Complementary Colours: Matisse, Goldfish - Red/Green complementary colours (Source: Wikimedia)
Contrasting flowers from the Bengaluru market (Source: Paul Brebner)
Complementary Kafka Technologies: Cassandra, PostgreSQL, Superset, Camel, Cadence, OpenTelemetry, TensorFlow, RisingWave, LLMs, Guava EventBus, Kubernetes, Prometheus, Grafana, Parallel Consumer, OpenSearch + Dashboard. Matisse, Goldfish - Red/Green complementary colours (Source: Wikimedia)
Cf. analogous Kafka technologies
• Apache Pulsar, Flink, Storm, Spark Streaming, Beam, ActiveMQ, RocketMQ, StreamPark, RisingWave, etc.
• But we will look at RisingWave
Van Gogh, Sunflowers on Yellow Background (Source: Wikimedia)
Approach: Use Cases, Technologies, Superpowers
0. Apache Kafka®
Apache Kafka®: Postal Delivery Service. Railway Post Office: mail bags snatched by speeding train (Source: Wikimedia CCL)
Apache Kafka visual introduction. My first Kafka talk: Visual introduction to a Kafka postal service
Christmas tree lights simulation (Christmas 2017). My first Kafka demo application: 100% Kafka. A simple simulation – to start with
Use case 1: “Kongo” IoT logistics simulation
• Real-time logistics
• IoT transportation and rules checking
• Complex simulation
Design 1: Pure Kafka, many topics
• 1000s of locations (warehouses, trucks) and millions of goods
• Each location has a topic and multiple consumer groups (all goods at that location)
• 7,000 TPS → SLOW!
• Many topics/partitions (without increasing cluster resources) reduced throughput on older versions of Kafka
1. Guava EventBus
Guava EventBus: Telegram messengers (Source: Wikimedia CCL)
Design 2: One topic + Guava EventBus for notifications
• Single topic, one consumer group
• Kafka supplemented with Guava EventBus to handle high fan-out notifications
• 1.2M TPS → FAST!
• Uber’s Cadence can be/has been used for scalable notifications
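A minimal sketch of the Design 2 idea, in Python rather than Java: one consumer reads a single topic and hands each event to an in-process bus, which fans it out to every subscriber at that location. This is not the Guava EventBus API; the `LocationBus` class, the event fields, and the location names are hypothetical.

```python
from collections import defaultdict


class LocationBus:
    """Tiny in-process publish/subscribe bus (hypothetical stand-in for Guava EventBus)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # location -> list of callbacks

    def register(self, location, callback):
        self._subscribers[location].append(callback)

    def post(self, location, event):
        """Deliver the event to every subscriber at that location; return the fan-out count."""
        callbacks = self._subscribers[location]
        for callback in callbacks:
            callback(event)
        return len(callbacks)


bus = LocationBus()
received = []

# Three goods in one warehouse all want that warehouse's sensor events.
for goods_id in ("goods-1", "goods-2", "goods-3"):
    bus.register(
        "warehouse-7",
        lambda event, g=goods_id: received.append((g, event["temp_c"])),
    )

# Stand-in for records polled from the single Kafka topic.
records = [
    {"location": "warehouse-7", "temp_c": 21.5},
    {"location": "truck-42", "temp_c": 4.0},
]
delivered = sum(bus.post(r["location"], r) for r in records)
```

The fan-out happens in memory, so Kafka only needs one topic and one consumer group regardless of how many goods are listening.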
Use case 2: Anomaly detection at scale. One of these things is not like the others… (Source: Shutterstock)
Streaming anomaly detection
• Incoming event stream
• Run anomaly check – quickly!
  o Persist new event
  o Get previous 50 events for key
  o Run algorithm
• Fast writes → Cassandra
• Application scaling → Kubernetes
• Initially single-threaded consumers
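The per-event check can be sketched as follows. The slide does not name the detection algorithm, so the z-score rule and its threshold are assumptions, and an in-memory `deque` stands in for the Cassandra table. The sketch also reads the previous events before storing the new one so that an event is not compared with itself; the slide lists persisting first.

```python
from collections import defaultdict, deque
from statistics import mean, pstdev

WINDOW = 50  # "previous 50 events for key" from the slide

# Stand-in for the per-key event table in Cassandra.
history = defaultdict(lambda: deque(maxlen=WINDOW))


def check_and_store(key, value, threshold=3.0):
    """Return True if value looks anomalous against the key's previous events, then persist it."""
    previous = list(history[key])
    anomalous = False
    if len(previous) >= 2:
        mu, sigma = mean(previous), pstdev(previous)
        # Hypothetical rule: more than `threshold` standard deviations from the mean.
        anomalous = sigma > 0 and abs(value - mu) > threshold * sigma
    history[key].append(value)  # the deque drops the oldest event beyond 50
    return anomalous


readings = [10.0, 11.0, 10.0, 11.0, 10.0, 11.0, 50.0]
flags = [check_and_store("sensor-1", r) for r in readings]
```

Only the final reading (50.0) is flagged here; the earlier ones stay within three standard deviations of the values seen so far.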
2. Apache Cassandra®
Apache Cassandra®: Fast Writes. Office typing pool, 1918 (Source: Wikimedia)
Apache Cassandra®
What?
• NoSQL horizontally scalable key-value database
Superpowers
• Fast writes (lots of typewriters)
• Wide column store
• Good for ML feature stores
• Clustering columns
• Good for hierarchical data modeling (e.g. geospatial)
• In-built multi-DC replication
3. Kubernetes
Kubernetes: Greek Triremes ruled the seas, captained by Helmsmen (Kubernetes) (Source: Wikimedia)
Kubernetes
What?
• Automation of containerized applications
Superpowers
• Available on public clouds (e.g. AWS EKS)
• Ephemeral Pods are the unit of concurrency
• Easy to scale applications with more or fewer Pods
But scalability isn’t great
4. Prometheus 5. Grafana
Kubernetes: Abacus counting (Source: Wikimedia)
Prometheus + Grafana
What?
• Prometheus: Monitoring and alerting
• Grafana: Graphing
Superpowers
• Instrumentation or agents (exporters) to expose application metrics
• Time series data with counter, gauge, histogram, and summary metrics
• Instaclustr monitoring API supports Prometheus metrics for Apache Kafka clusters
• Integration of Kafka cluster metrics and Kafka application metrics (e.g. producers and consumers) is powerful → Metrics suggested optimizations
Slow Kafka consumers problem
• Slow consumers require more partitions/consumers (Source: Getty Images)
• Little’s Law: Concurrency (Partitions = Consumers) = Time × Throughput
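Little’s Law as quoted on the slide can be worked through with a few lines of arithmetic. The workload numbers below (10,000 events/s, 50 ms and 10 ms per event) are hypothetical, chosen only to show the calculation in both directions.

```python
import math


def required_concurrency(throughput_per_s, time_ms):
    """Little's Law: concurrency = time x throughput (time given in milliseconds)."""
    return math.ceil(throughput_per_s * time_ms / 1000)


def max_throughput_per_s(concurrency, time_ms):
    """The same law rearranged: throughput = concurrency / time."""
    return concurrency * 1000 / time_ms


# Hypothetical: 10,000 events/s at 50 ms per event, one consumer per partition.
partitions_needed = required_concurrency(10_000, 50)

# Hypothetical: 100 consumers, each taking 10 ms per event.
throughput = max_throughput_per_s(100, 10)
```

Halving the per-event time halves the partitions and consumers needed for the same throughput, which is why slow consumers are expensive.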
2 pool solution: The famous Bondi Ocean Pool in Sydney, Australia has 2 pools (Source: Shutterstock)
Optimize consumer speed/concurrency using 2 stage pipeline
1. Minimize polling time (thread pool 1)
2. Maximize anomaly detector concurrency (thread pool 2)
Fewer consumers (around 100) give higher throughput: a surprise! Hint: Fewer partitions
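A rough sketch of the two-stage idea using two Python thread pools: a small pool that only polls, and a larger pool that runs the slow checks. The functions `poll_batch` and `detect`, the pool sizes, and the flagging rule are hypothetical stand-ins; the real application used Kafka consumers and a database-backed detector.

```python
from concurrent.futures import ThreadPoolExecutor


def poll_batch(partition):
    """Stage 1 stand-in for consumer.poll(): return quickly with a batch of events."""
    return [(partition, offset) for offset in range(5)]


def detect(event):
    """Stage 2 stand-in for the slower anomaly check."""
    partition, offset = event
    return (partition, offset, offset == 3)  # hypothetical rule: flag offset 3


# Few polling threads, many detector threads.
with ThreadPoolExecutor(max_workers=2) as polling_pool, \
        ThreadPoolExecutor(max_workers=8) as detector_pool:
    batches = polling_pool.map(poll_batch, range(2))
    futures = [detector_pool.submit(detect, e) for batch in batches for e in batch]
    results = [f.result() for f in futures]

flagged = sorted((p, o) for p, o, is_anomaly in results if is_anomaly)
```

Because the polling threads never wait on the detector, the number of consumers (and partitions) can stay small while the detector pool is sized to the slow work.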
19 billion checks/day after tuning
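For scale, the daily figure converts to a per-second rate as follows (simple arithmetic on the slide's number, assuming the load is spread evenly over the day):

```python
CHECKS_PER_DAY = 19_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60

# Average rate, assuming an even spread across 24 hours.
checks_per_second = CHECKS_PER_DAY / SECONDS_PER_DAY
print(f"{checks_per_second:,.0f} checks/s")
```

That is roughly 220 thousand anomaly checks per second.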
6. Kafka Parallel Consumer
Kafka Parallel Consumer: Jacquard Loom, Berlin. Makes multiple ribbons concurrently (Source: Paul Brebner)
Kafka Parallel Consumer: Multi-threaded consumer
• Multiple ordering options (cf. default Kafka, which only guarantees order within partitions!)
  PARTITION → KEY → UNORDERED (increasing concurrency →)
• Concurrency from 1 to lots, depending on client resources and partition/key space sizes
• KEY has higher concurrency than PARTITION and is ordered by KEY: a reasonable compromise
• Higher concurrency for fewer partitions/consumers
Experimental results: 3, 50, and 200 times improvement, unordered best (1 consumer, 10 partitions, 100 keys, 10 ms latency)
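Why the three ordering modes allow increasing concurrency can be illustrated with a simple upper-bound calculation. This is a conceptual sketch, not the Parallel Consumer API: the function name and the 1,000 in-flight records are hypothetical, and the bounds are not the measured improvements quoted above (real results also depend on client threads and latency).

```python
def max_concurrency(ordering, partitions, keys, in_flight_records):
    """Upper bound on records that can be processed at once under each ordering mode."""
    if ordering == "PARTITION":
        limit = partitions          # one record at a time per partition
    elif ordering == "KEY":
        limit = keys                # one record at a time per key
    elif ordering == "UNORDERED":
        limit = in_flight_records   # no ordering constraint at all
    else:
        raise ValueError(f"unknown ordering: {ordering}")
    return min(limit, in_flight_records)


# Setup from the slide: 10 partitions, 100 keys; 1,000 in-flight records is assumed.
bounds = {
    mode: max_concurrency(mode, partitions=10, keys=100, in_flight_records=1000)
    for mode in ("PARTITION", "KEY", "UNORDERED")
}
```

With these inputs the bound grows from 10 (PARTITION) to 100 (KEY) to 1,000 (UNORDERED), all from a single consumer.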
Use case 3: Pipelines. Berlin “Beer” (?) Pipeline (Source: Paul Brebner)
Kafka® Connect data pipelines
• REST Tidal Data to OpenSearch
• REST Tidal Data to PostgreSQL + Superset
• Alternative sinks
• Kafka Connectors
7. OpenSearch 8. Dashboard
OpenSearch + Dashboard: Library of Congress Card Division, 1919 (city block long) (Source: Wikimedia)
OpenSearch + Dashboard
What?
• Open source fork of Elasticsearch
• Based on Lucene: powerful and scalable text searching
Superpowers
• Ingestion, indexing, and searching of JSON documents
• Complex linguistic and geospatial queries
• Integrated dashboard for visualization
9. PostgreSQL®
PostgreSQL®: Elephant vs. tree. Elephants are powerful (Source: Adobe Stock)
PostgreSQL®
What?
• Powerful SQL database
Superpowers
• Extensible
• JSONB+GIN indexes (efficient storage and search of JSON)
10. Apache Superset™
Apache Superset™: Superhero Supersets. All superheroes (B) are a superset of those who use weapons (A) (Source: Adobe Stock)
Apache Superset™
What?
• Powerful data visualization tool
Superpowers
• Reads from SQL sources
• Lots of visualization and graph types, including geospatial
11. Apache Camel™
Apache Camel™: Camel train (Source: Adobe Stock)
Apache Camel™
What?
• Apache Camel: integration framework
• Apache Camel Kafka Connectors
Superpowers
• Large number of open source Kafka Connectors: 179 sources and sinks
• Auto-generated from Camel components
Use case 4: Drone delivery (Source: Adobe Stock)
12. Uber’s Cadence®
Cadence®: Railway signal“man” (signalwoman!) (Source: Wikimedia)
Uber’s Cadence®
What?
• Scalable code-as-workflows engine
Superpowers
• Sequenced, stateful, long-running, scheduled steps
• Scalable and reliable using event-sourcing
  o Workflows are failproof: history is replayed up to the point of failure, then execution resumes
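The replay-then-resume behaviour can be illustrated with a toy event-sourced runner. This is a conceptual sketch only, not the Cadence client API; `run_workflow`, the `fail_at` switch, and the three arithmetic steps are hypothetical.

```python
def run_workflow(steps, history, fail_at=None):
    """Run steps in order, recording each result; steps already in history are replayed."""
    state = 0
    for index, step in enumerate(steps):
        if index < len(history):
            state = history[index]      # replay: reuse the recorded result
            continue
        if index == fail_at:
            raise RuntimeError(f"worker crashed before step {index}")
        state = step(state)
        history.append(state)           # record the event before moving on
    return state


steps = [lambda s: s + 1, lambda s: s * 10, lambda s: s - 3]
history = []                            # stand-in for the durable event history

try:
    run_workflow(steps, history, fail_at=2)   # first worker dies before step 2
except RuntimeError:
    pass

result = run_workflow(steps, history)         # a new worker replays, then resumes
```

The second run does not re-execute the first two steps; it reads their results from the history and only runs the step that had not completed.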
Drone delivery application
• Computationally expensive mission critical calculations
• Kafka microservices integration of fast/slow systems
Drone way point flight calculations (returning to base leg)
• Drone flight path is computed in an activity
• Using location, distance, bearing, speed, and charge
• Every 10 seconds
• On failure, the drone won’t crash and will continue flying from the last location
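One way to compute the next way point from location, bearing, and speed every 10 seconds is the standard great-circle destination formula, sketched below. The talk does not show the actual calculation, so this formula choice, the spherical Earth radius, and the function name are assumptions; battery charge is left out.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation


def next_location(lat_deg, lon_deg, bearing_deg, speed_m_s, step_s=10):
    """Return (lat, lon) in degrees after flying step_s seconds on a constant bearing."""
    lat1, lon1 = math.radians(lat_deg), math.radians(lon_deg)
    theta = math.radians(bearing_deg)
    delta = speed_m_s * step_s / EARTH_RADIUS_M  # angular distance travelled

    lat2 = math.asin(
        math.sin(lat1) * math.cos(delta)
        + math.cos(lat1) * math.sin(delta) * math.cos(theta)
    )
    lon2 = lon1 + math.atan2(
        math.sin(theta) * math.sin(delta) * math.cos(lat1),
        math.cos(delta) - math.sin(lat1) * math.sin(lat2),
    )
    return math.degrees(lat2), math.degrees(lon2)


# Hypothetical drone: starts near Bengaluru, flies north-east at 20 m/s.
position = (12.97, 77.59)
for _ in range(3):  # three 10-second steps
    position = next_location(*position, bearing_deg=45.0, speed_m_s=20.0)
```

Because each step depends only on the last recorded location, a restarted activity can continue from that location rather than from the start of the leg.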
Uber’s Cadence + Apache Kafka = similarities
Cadence (Workflows) | Kafka (Streaming Events)
Scalable (event sourcing) | Scalable (partitions, cluster)
Persistent (event sourcing) | Persistent (event replaying)
Reliable workflow execution (event sourcing) | Reliable event delivery
Asynchronous signals | Asynchronous events
Open source | Open source
Available as a managed service | Available as a managed service
Uber’s Cadence = Orchestration (synchronous/timed sequences). Different architectural (musical) styles (Source: Getty Images)
Apache Kafka = Choreography (asynchronous). Different architectural (musical) styles (Source: Getty Images)
Combined Cadence + Kafka = Ballet! Integrated in a new style
Cadence + Kafka = Complementary timescales (Source: Getty Images)
Cadence + Kafka = Complementary timescales
Cadence (Slow Workflows) | Kafka (Fast Streaming Events)
Synchronous events | Asynchronous events
Stateful flows | Stateless events
Sequences | One-off events
Slow/long running flows | Fast/instantaneous events
Sleep/schedule events | Real-time processing of events
Complex flow logic | Complex stream processing (Kafka Streams)
Cadence + Kafka = Integration → Drone Ballet. Drone show, Japan (Source: Getty Images)
How many drones can we fly? (Source: Shutterstock)
Cluster details (vCPUs): Client (8), Cadence (6), Cassandra (18)
Load test: 2,000 drones + 2,000 orders = 4,000 workflows
20 drones flying. Purple = base, Black = drone, Orange = shop, Red = delivery location, Green = successful delivery
Use case 5: Streaming ML. Shop busy/not busy prediction: Busy! Not Busy! (Source: Getty Images)
Drone learning problem
• Simulation produces streaming spatiotemporal data (drone and order state and locations)
• Kafka Streams computes aggregated hourly shop and order details → Busy/NotBusy categorization
• Sent to TensorFlow: train model to predict shop busy/not busy an hour ahead
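The hourly aggregation and Busy/NotBusy labelling step can be sketched in plain Python. In the talk this is done by Kafka Streams; the code below is only a stand-in for the logic, and the threshold, event shape, and shop names are hypothetical.

```python
from collections import Counter

BUSY_THRESHOLD = 3  # hypothetical: orders per shop per hour that count as "Busy"


def hourly_labels(orders):
    """orders: iterable of (epoch_seconds, shop_id). Returns {(hour, shop_id): label}."""
    # Tumbling one-hour windows: integer-divide the timestamp by 3600.
    counts = Counter((ts // 3600, shop) for ts, shop in orders)
    return {
        window: ("Busy" if n >= BUSY_THRESHOLD else "NotBusy")
        for window, n in counts.items()
    }


orders = [
    (10, "shop-a"), (900, "shop-a"), (3500, "shop-a"),  # hour 0, shop-a: 3 orders
    (1200, "shop-b"),                                   # hour 0, shop-b: 1 order
    (3700, "shop-a"),                                   # hour 1, shop-a: 1 order
]
labels = hourly_labels(orders)
```

Each (hour, shop) label becomes one training example; to predict an hour ahead, the features from hour h would be paired with the label from hour h + 1.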
13. TensorFlow
TensorFlow: What does the future hold? (Source: Adobe Stock)
TensorFlow
What?
• Neural network ML library
Superpowers
• Supports incremental ML
• From streaming Kafka data
TensorFlow
Watch out for
• ML over streaming spatiotemporal data with concept drifts is tricky
  o Time/space bias
    - Wild model accuracy oscillation
  o Concept shift can result in very low-accuracy models initially
    - Train/use multiple models
Use case 6: Santa’s elves’ toy and box packing
• Kafka Streams, ChatGPT, RisingWave, and OpenTelemetry
• Streaming joins to match toys and boxes (Source: Adobe Stock)
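The toy-and-box matching can be pictured as a join between two streams on a shared key. The sketch below is a simplified stand-in in plain Python, not Kafka Streams or RisingWave code: the join key (a size), the one-to-one pairing, and the absence of a time window are all assumptions made for illustration.

```python
from collections import defaultdict, deque


def match_stream(events):
    """events: iterable of (kind, key, item_id), where kind is 'toy' or 'box'.

    Yields (key, toy_id, box_id) as soon as both sides are available for a key.
    """
    waiting = {"toy": defaultdict(deque), "box": defaultdict(deque)}
    for kind, key, item_id in events:
        other = "box" if kind == "toy" else "toy"
        if waiting[other][key]:
            partner = waiting[other][key].popleft()   # oldest unmatched partner
            toy, box = (item_id, partner) if kind == "toy" else (partner, item_id)
            yield key, toy, box
        else:
            waiting[kind][key].append(item_id)        # buffer until a partner arrives


events = [
    ("toy", "small", "t1"),
    ("box", "large", "b1"),
    ("box", "small", "b2"),
    ("toy", "large", "t2"),
    ("toy", "small", "t3"),
]
packed = list(match_stream(events))
```

Toy `t3` stays buffered because no small box has arrived yet; a windowed join would eventually expire it instead of holding it forever.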
14. OpenTelemetry
OpenTelemetry: X-ray vision! (Source: Wikimedia Public Domain)
OpenTelemetry
• OpenTelemetry is the new standard for distributed tracing
• Combines tracing (OpenTracing), metrics, and logs
• Automatic instrumentation
• Lots of open source visualization tools
  - Jaeger, SigNoz, Uptrace, etc.
• Used in the new client monitoring (KIP-714, Kafka 3.7.0)
SigNoz service map for toy+boxes application
15. RisingWave
RisingWave: Wave processing (Source: Adobe Stock)
RisingWave
What?
• Stream processing database, also available as a service
Superpowers
• Stateful stream processing
  o SQL syntax
  o Using cloud native storage
  o Potential replacement for Kafka Streams
• PostgreSQL compatible
  o Works with Apache Superset for visualization
16. LLMs
LLMs: The Answer? (Source: Wikimedia)
LLMs/GenAI
• E.g. ChatGPT - not open source
  + There may be suitable open source alternatives for code generation
• Worked well to generate
  + Kafka clients
  + Kafka Streams DSL
  + and test cases
• Not as accurate for RisingWave - lack of examples?
Bonus Technologies from my Instaclustr colleagues
● Kafka benchmarking
  ○ Apache JMeter for Kafka benchmarking (Thanks to Anup Shirolkar)
  ○ OpenMessaging (Thanks to Alastair Daivis)
● Strimzi – a Kafka Operator for Kubernetes, and Debezium (CDC using Kafka Connect) (Thanks to Felix Alipaz-Dicke)
● Kafka GUIs (Thanks to Ana-Maria Minda)
  ○ Kafdrop
  ○ AKHQ
  ○ UI for Apache Kafka
  ○ These all work with Kafka + Instaclustr console and provide complementary features
Ballet pattern → Hanoi street intersection pattern
● A working integrated synchronous + asynchronous system
I survived as a pedestrian!
Try us out
• We offer Apache Kafka and these open source technologies as a managed service
• You can use the others with our managed services
• FREE 30-day trial of developer-sized clusters
Thank You! Paul Brebner | Instaclustr Technology Evangelist. www.Instaclustr.com/paul-brebner → All my blogs
