The Rise of Data in Motion: Serverless and Cloud-Native Event Streaming on AWS with Confluent Cloud
© 2021, Amazon Web Services, Inc. or its Affiliates. Customers want more value from their data. Data is growing exponentially, comes from new sources, is increasingly diverse, is used by many people, and is analyzed by many applications.
The volume of data being produced is increasing
• The number of “smart” devices is projected to be 200 billion by 2020 (over 100X increase in ten years)
• 90% of the data in the world was generated in the last 2 years
• There are 2.5 quintillion bytes of data created each day, and this pace is accelerating
Customers are moving from the traditional data warehouse approach (data silos) to a data lake. Data silos: OLTP, ERP, CRM, and LOB systems; devices, web, sensors, and social sources; DW Silo 1 and DW Silo 2, each with its own business intelligence. Data lake: non-relational databases, machine learning, data warehousing, log analytics, big data processing, relational databases.
Lake House architecture: scalable data lakes, purpose-built data services, seamless data movement, unified governance, performant and cost-effective. Around the data lake: non-relational databases, machine learning, data warehousing, log analytics, big data processing, relational databases.
Lake House architecture on AWS: scalable data lakes, purpose-built data services, seamless data movement, unified governance, performant and cost-effective. Services: Amazon DynamoDB, Amazon SageMaker, Amazon Redshift, Amazon Elasticsearch Service, Amazon EMR, Amazon S3, Amazon Aurora, Amazon Athena.
The value of data diminishes over time
This is a fundamental paradigm shift... Future of the datacenter: Cloud (infrastructure as code). Future of data: Event Streaming (data in motion as continuous streams of events).
An Event Streaming Platform is the Underpinning of an Event-driven Architecture. Producers (MES, ERP, sensors, mobile) and consumers (Customer 360, real-time alerting system, data warehouse) are linked by connectors and stream processing apps over streams of real-time events such as Supplier, Alert, Forecast, Inventory, Customer, and Order.
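The producer and consumer roles on this slide can be illustrated with a minimal in-memory sketch. It is not Kafka and not a Confluent API: `EventLog`, `produce`, and `poll` are hypothetical names chosen for illustration, and the log stands in for a single topic partition. The point it shows is that events are appended once and each consumer reads them independently by tracking its own offset.

```python
from collections import defaultdict


class EventLog:
    """Append-only, in-memory log standing in for one topic partition (illustrative only)."""

    def __init__(self):
        self._events = []
        # Hypothetical bookkeeping: consumer name -> next offset to read.
        self._offsets = defaultdict(int)

    def produce(self, event):
        """Append an event and return the offset it was stored at."""
        self._events.append(event)
        return len(self._events) - 1

    def poll(self, consumer, max_events=10):
        """Return the next unread events for one consumer and advance its offset."""
        start = self._offsets[consumer]
        batch = self._events[start:start + max_events]
        self._offsets[consumer] = start + len(batch)
        return batch


log = EventLog()
for order_id in (1, 2, 3):
    log.produce({"type": "order", "id": order_id})

# Two consumers read the same events at their own pace.
alerting = log.poll("alerting")
warehouse_first = log.poll("warehouse", max_events=2)
warehouse_rest = log.poll("warehouse")
```

Because reading does not remove events, adding another consumer later does not affect the existing ones.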
Confluent Completes Apache Kafka: Car Engine, Car, Self-driving Car
Truly cloud-native experience at the edge, in the data center, and in the cloud. Confluent Cloud: a fully managed, cloud-native service for Apache Kafka (public cloud workloads). Confluent Platform: a complete, enterprise-grade distribution of Apache Kafka (edge and on-premise workloads), on Kubernetes with Confluent for Kubernetes, or on VMs / bare metal with Ansible Playbooks and packages (Docker, RPMs, Tarball). Also shown: Wavelength.
Example Architecture for Event Streaming: Processing Data in Motion with Confluent Cloud on AWS. Connectors: Oracle CDC connector (Oracle DB), Salesforce CDC connector and Salesforce source / sink connector (Salesforce). Stream processing: ksqlDB, KStreams. Applications: dashboard, fraud detection app.
Context-specific Customer 360: electrical retailer. Hyper-personalized online retail experience, turning each customer visit into a one-on-one marketing opportunity. Correlation of historical customer data with real-time digital signals. Maximize customer satisfaction and revenue growth, increased customer conversions. https://www.confluent.io/customers/ao/
Derive insights from data in real-time (real-time analytics). Ingest & Process: capture event streams with a consistent data structure using Schema Registry, develop real-time ETL pipelines with a lightweight SQL syntax using ksqlDB, and unify real-time streams with batch processing using 100+ Confluent connectors. Store & Analyze: stream data with Confluent pre-built connectors into your AWS data lake or data warehouse to execute queries on vast amounts of streaming data for real-time and batch analytics. Diagram: mobile, web, IoT, and data stores (AWS & on-prem) send events through source connectors, Schema Registry, and ksqlDB; S3 Sink to Amazon S3; Redshift Sink to Amazon Redshift. Analyze: Amazon Redshift, AWS Lake Formation, Amazon Athena. Transform: Amazon EMR, AWS Data Pipeline, AWS Glue. Visualize: Amazon Elasticsearch.
Serverless app integration: increase developer agility & speed of innovation. Serverless integration: connect existing apps & data stores in a repeatable way without having to manage Apache Kafka, with Schema Registry to maintain app compatibility, ksqlDB to develop real-time apps with SQL syntax, and Connect for effortless integrations with Lambda & data stores. AWS serverless platform: stop provisioning, maintaining, or administering servers for backend components such as compute, databases, and storage so that you can focus on increasing agility and innovation for your developer teams. Diagram: apps, microservices, and data stores connect through REST Proxy & clients and source connectors to ksqlDB and Schema Registry; Lambda Sink and S3 Sink connect to compute (AWS Lambda), data stores (Amazon DynamoDB, Amazon Aurora), storage (Amazon S3), and analytics (Amazon Athena, Amazon Redshift).
On-prem to AWS modernization: accelerate modernization from on-prem to AWS. Connect: leverage 100+ Confluent pre-built connectors to continuously bring valuable data from existing on-prem services, including enterprise data warehouses, databases, and mainframes. Bridge: hybrid cloud streaming with a consistent, event-driven architecture for modern apps. Modernize: increase agility in getting applications to market and reduce TCO by freeing up resources to focus on value-generating activities rather than on managing servers. Diagram: on-prem sources (legacy EDW, mainframe, legacy DB) with JDBC / CDC connectors and AWS Direct Connect; data streams, apps, ksqlDB, and Cluster Linking between on-prem and the AWS Cloud; Redshift Sink, Lambda Sink, and S3 Sink; Amazon Athena, AWS Glue, SageMaker, Lake Formation, Amazon DynamoDB, Amazon Aurora.
Low Latency 5G Use Cases with AWS Wavelength (based on AWS Outposts) and Confluent
Global Event Streaming: Stream Data Globally with Replication and Cluster Linking. Streaming replication between clusters across cloud, on-prem, and edge. Bridge to databases, data lakes, apps, APIs, SaaS. Aggregate small-footprint edge deployments with replication (aggregation). Simplify disaster recovery operations with Multi-Region Clusters for RPO=0 and RTO~0.
Omnichannel Retail: Customer 360 (Website, Mobile App, On Site in Store, In-Car). Timeline (P, C1, C2, C3): context-specific marketing campaign (90 and 60 days ago), car configurator (10 and 8 days ago), sales talk on site in car dealership (right now), location-based customer action. Service shown: AWS Lambda.
Omnichannel Retail: Customer 360 (Business Intelligence, Machine Learning). Timeline (P, C1, C2, C3): reporting on all customer interactions, machine learning to train the recommendation engine, machine learning for context-specific recommendations, location-based customer action. Services shown: Amazon Athena, Amazon SageMaker.
Hybrid Retail Architecture. Streams of real-time events (customer data, train schedule, payment data, loyalty information) appear at three points in the diagram. Labeled elements: CRM; 3rd party payment provider; context-specific real-time upsell; customer data; payment processing and fraud detection as a service; manager; get report; API; customers.
Event Streaming at the Edge in the Smart Retail Store. In the store: point of sale (POS), loyalty system, local inventory management, payment, discount, item availability. Streams of real-time events: customer data, train schedule, payment data, loyalty information. Beyond the store: global inventory management.
Disconnected Edge. Timeline (P, C1, C2, C3): context-specific advertisement in real time (milliseconds), payment processing in near real time (seconds), replication to cloud in batch (depending on network bandwidth), location-based customer action. Benefits: always on (even “offline”), replayability, reduced traffic cost, better latency.
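The "always on (even offline)" and "replication to cloud in batch" ideas can be sketched as a store-and-forward buffer. This is an illustration under stated assumptions, not how Confluent replication is implemented: `EdgeBuffer`, `replicate`, `replay`, and the `send` callback are hypothetical names, and the cloud side is just a Python list.

```python
class EdgeBuffer:
    """Local append-only log that keeps accepting events while the cloud link is down."""

    def __init__(self):
        self._log = []
        # Offset of the first event that has not yet been replicated.
        self._replicated = 0

    def append(self, event):
        """Store an event locally; this never depends on connectivity."""
        self._log.append(event)

    def pending(self):
        """Number of events still waiting to be replicated."""
        return len(self._log) - self._replicated

    def replicate(self, send, batch_size=100):
        """Send pending events in batches; stop at the first batch that fails."""
        sent = 0
        while self._replicated < len(self._log):
            batch = self._log[self._replicated:self._replicated + batch_size]
            if not send(batch):
                break
            self._replicated += len(batch)
            sent += len(batch)
        return sent

    def replay(self, from_offset=0):
        """Re-read local events from any offset, for example to rebuild local state."""
        return list(self._log[from_offset:])


cloud = []
link = {"online": False}  # hypothetical stand-in for network state


def send(batch):
    """Pretend uplink: succeeds only while the link is online."""
    if not link["online"]:
        return False
    cloud.extend(batch)
    return True


buffer = EdgeBuffer()
for i in range(5):
    buffer.append({"sale": i})

sent_offline = buffer.replicate(send, batch_size=2)  # link is down, nothing leaves
link["online"] = True
sent_online = buffer.replicate(send, batch_size=2)   # backlog drains in batches
```

Local consumers can keep calling `replay` throughout, which is the sense in which the edge stays usable while disconnected.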
Live Demo
Confluent Schema Registry: the de facto schema metadata repository for data in motion. Components: producer (Kafka serializer), Kafka, Schema Registry, consumer (Kafka deserializer).
1. Register schema
2. Return schema ID
3. Produce message with schema ID
4. Is this a valid schema ID?
5. Consume message with schema ID
6. Ask for schema given schema ID
7. Return schema
Failure path (shown twice in the diagram): invalid message
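The flow on this slide can be sketched end to end in a few lines. The sketch assumes the commonly documented Confluent wire format (a zero magic byte followed by a 4-byte big-endian schema ID, then the payload). Everything else is a stand-in: `InMemorySchemaRegistry`, `serialize`, and `deserialize` are hypothetical names, the schema is a plain dict rather than Avro, and the payload is JSON for readability.

```python
import json
import struct

MAGIC_BYTE = 0  # first byte of the framing assumed in this sketch


class InMemorySchemaRegistry:
    """Hypothetical stand-in for a schema registry: maps schemas to IDs and back."""

    def __init__(self):
        self._ids = {}    # canonical schema text -> schema ID
        self._by_id = {}  # schema ID -> schema

    def register(self, schema):
        """Register a schema (idempotent) and return its ID."""
        key = json.dumps(schema, sort_keys=True)
        if key not in self._ids:
            schema_id = len(self._ids) + 1
            self._ids[key] = schema_id
            self._by_id[schema_id] = schema
        return self._ids[key]

    def get(self, schema_id):
        """Return the schema for an ID; raises KeyError if the ID is unknown."""
        return self._by_id[schema_id]


def serialize(registry, schema, record):
    """Producer side: validate the record, then prefix the payload with the schema ID."""
    missing = [field for field in schema["fields"] if field not in record]
    if missing:
        raise ValueError(f"invalid message, missing fields: {missing}")
    schema_id = registry.register(schema)
    payload = json.dumps(record).encode("utf-8")
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + payload


def deserialize(registry, message):
    """Consumer side: read the schema ID, fetch the schema, decode the payload."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError("invalid message, unexpected magic byte")
    schema = registry.get(schema_id)
    return schema, json.loads(message[5:].decode("utf-8"))


registry = InMemorySchemaRegistry()
order_schema = {"name": "Order", "fields": ["id", "amount"]}  # illustrative schema

message = serialize(registry, order_schema, {"id": 1, "amount": 9.5})
schema_out, record_out = deserialize(registry, message)
```

Carrying only the ID in each message keeps messages small, while the consumer can still look up the full schema when it needs it.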
Confluent Cloud Data Governance (now in early access)
Data Quality: increase data trust
● Schemas management UI
● Broker-side schema ID validation
Data Catalog: classify, organize, discover
● Search and discover schemas metadata
● Manage data classifications
● Classify schemas with tags
Data Lineage: turn data visibility on
● Visualize complex data in motion pipelines
● Audit data movement across systems
Confluent Completes Apache Kafka: Car Engine, Car, Self-driving Car
Confluent Cloud + AWS: Accelerate Business Value for Customers
Topline Impacting New Experiences
● Event-driven & real-time
● Unify data across org. w/ Kafka data fabric (Schema Reg,..)
● AWS Analytics, Redshift, ML connectors
Mitigate Risk
● Higher Service Quality & Resilience with 99.95% SLA
● Deep Kafka expertise & innovation
● Elastic billing/pricing
Developer Agility
● Focus on innovation (not data infrastructure)
● Leverage full Kafka OSS ecosystem + AWS services
Faster Time to Market
● ~50-75% faster time to market*
● Streamline hybrid cloud migration with no complex lift-n-shift
● Maintain business continuity
Lower Kafka TCO
● ~25-50% lower TCO*
● GBps-scale & fast deployments for global expansion
● Deploy Kafka at scale in 1 week
Maximize ROI
● ~200% ROI per Forrester study
● Save 10s of $Ms with legacy offload to AWS with Confluent Replicator
* For customers that don’t already have a Kafka-based system in-market
* TCO assessment to be analyzed for specific customer scenarios
Questions? Feedback?

Serverless Kafka on AWS as Part of a Cloud-native Data Lake Architecture
