Kafka & Couchbase Integration Patterns Manuel Hurtado Couchbase Solutions Engineer
©2016 Couchbase Inc. 2 Agenda • Couchbase Introduction • Kafka Introduction • Kafka Connect & Couchbase Kafka Connector • Use cases
©2016 Couchbase Inc. 3 Agenda • Couchbase Introduction • Kafka Introduction • Kafka Connect & Couchbase Kafka Connector • Use cases
©2016 Couchbase Inc. 4 What is Couchbase? Couchbase delivers the Data Platform for the Digital Economy • Products: Couchbase Server & Couchbase Mobile • Open source NoSQL, JSON document database • Founded 2010 • 500+ enterprise customers, including 20+ Fortune 100 UNIFIED ADMINISTRATION UNIFIED PROGRAMMING INTERFACE Data Query Index SearchMobileReplication Analytics {N1QL}
©2016 Couchbase Inc. 5 Growing number of use cases 5 Catalog Metadata Operational Dashboarding User Profile Database Session Database Inventory & Availability Entitlement Management Field Service Enablement Customer 360 Asset/Resource Management Device User Data Management Endpoint Data Management
©2016 Couchbase Inc. 6 Why customers choose Couchbase? 6 Memory-first Architecture Full SQL Query Language Active-Active Global Data Replication Multi-dimensional scaling Mobile
©2016 Couchbase Inc. 7 Agenda • Couchbase Introduction • Kafka Introduction • Kafka Connect & Couchbase Kafka Connector • Use cases
©2016 Couchbase Inc. 8 Kafka Introduction What is Kafka? • Kafka is a distributed, partitioned, replicated, log service developed by LinkedIn and open sourced in 2011. • Basically it is a massively scalable pub/sub message queue architected as a distributed transaction log. It was created to provide “a unified platform for handling all the real-time data feeds a large company might have”. • It powers a large number of high-profile companies including LinkedIn, Yahoo and Netflix.
©2016 Couchbase Inc. 9 Kafka Architecture
©2016 Couchbase Inc. 10 Kafka Architecture
©2016 Couchbase Inc. 11 Agenda • Couchbase Introduction • Kafka Introduction • Kafka Connect & Couchbase Kafka Connector • Use cases
©2016 Couchbase Inc. 12 The challenge: Stream Data Platform with Kafka Source: https://www.confluent.io/blog/announcing-kafka-connect-building-large-scale-low-latency-data-pipelines/
©2016 Couchbase Inc. 13 Kafka Connect Source: https://www.confluent.io/blog/announcing-kafka-connect-building-large-scale-low-latency-data-pipelines/
©2016 Couchbase Inc. 14 Kafka Connect Source: https://www.confluent.io/blog/announcing-kafka-connect-building-large-scale-low-latency-data-pipelines/
©2016 Couchbase Inc. 15 Couchbase Connector in Kafka
©2016 Couchbase Inc. 16 Database Change Protocol (DCP)
©2016 Couchbase Inc. 17 Agenda • Couchbase Introduction • Kafka Introduction • Kafka Connect & Couchbase Kafka Connector • Use cases
©2016 Couchbase Inc. 18 Couchbase & Kafka Use Cases
©2016 Couchbase Inc. 19 Couchbase & Kafka Use Cases Currently using Kafka, but not Couchbase Reading from Couchbase with almost no code (Source) ○ Adopt default message format given by connector ○ Write custom converter ○ Use Kafka Streams to process events from Couchbase, and write them back. Writing to Couchbase without touching SDK (Sink) ○ Provide document in the generic form (it will be converted into JSON)
©2016 Couchbase Inc. 20 Couchbase & Kafka Use Cases Currently using Couchbase, but not Kafka Backup of the data ○ Need to be filtered/analyzed/transformed by the business logic ○ Checking integrity Turn realtime data into history ○ e.g. Couchbase keeps the exchange rates, but it is necessary to emit stream of the deltas, or log of the changes.
©2016 Couchbase Inc. 21 Couchbase & Kafka Use Cases Currently using Kafka and Couchbase Integrate more naturally with Kafka infrastructure. A lot of the custom code may be moved into connector. ○ Filtering ○ Transformation ○ Process/Task management
©2016 Couchbase Inc. 22
©2016 Couchbase Inc. 23 Demo: Couchbase as Data Source Basic Scenario ConsoleConsumer • OOTB Couchbase Connector • Bucket “travel-sample” • Output all-content • Consume messages to console • Insert/Update/Delete • JMX Monitor
©2016 Couchbase Inc. 24 Demo: Couchbase as Data Source + Data Sink Basic Scenario • OOTB Couchbase Connector • Bucket “travel-sample” to “receiver” • Raw content • Write messages to Couchbase • JMX Monitor
©2016 Couchbase Inc. 25 Demo: Couchbase as Data Source + Data Sink Custom Scenario • Custom Schema Converter • Custom Filter • “travel-sample” to “receiver”: only “airlines” • JSON content • Write messages to Couchbase • JMX Monitor AirlineConverter AirlineFilter
©2016 Couchbase Inc. 26 Demo: Twitter Data Source + Couchbase as Data Sink • Custom Kafka Producer & Consumer • Bucket “twitter” • Write messages to Couchbase • JMX Monitor TwitterProducer Couchbase Consumer
©2016 Couchbase Inc. 27 Couchbase Connector in Confluent Platform
©2016 Couchbase Inc. 28 Confluent Platform
©2016 Couchbase Inc. 29 Resources • Couchbase Kafka Connector https://developer.couchbase.com/documentation/server/current/connectors/kafka- 3.1/kafka-intro.html • Couchbase Kafka Connector (code) https://github.com/couchbase/kafka-connect-couchbase • Demo code repositories https://github.com/mahurtado/KafkaCouchbaseConnectorSample https://github.com/mahurtado/TwitterKafkaCouchbasePipeline • Kafka Connect Confluent https://www.confluent.io/product/connectors/
Q & A

Kafka & Couchbase Integration Patterns

  • 1.
    Kafka & Couchbase IntegrationPatterns Manuel Hurtado Couchbase Solutions Engineer
  • 2.
    ©2016 Couchbase Inc.2 Agenda • Couchbase Introduction • Kafka Introduction • Kafka Connect & Couchbase Kafka Connector • Use cases
  • 3.
    ©2016 Couchbase Inc.3 Agenda • Couchbase Introduction • Kafka Introduction • Kafka Connect & Couchbase Kafka Connector • Use cases
  • 4.
    ©2016 Couchbase Inc.4 What is Couchbase? Couchbase delivers the Data Platform for the Digital Economy • Products: Couchbase Server & Couchbase Mobile • Open source NoSQL, JSON document database • Founded 2010 • 500+ enterprise customers, including 20+ Fortune 100 UNIFIED ADMINISTRATION UNIFIED PROGRAMMING INTERFACE Data Query Index SearchMobileReplication Analytics {N1QL}
  • 5.
    ©2016 Couchbase Inc.5 Growing number of use cases 5 Catalog Metadata Operational Dashboarding User Profile Database Session Database Inventory & Availability Entitlement Management Field Service Enablement Customer 360 Asset/Resource Management Device User Data Management Endpoint Data Management
  • 6.
    ©2016 Couchbase Inc.6 Why customers choose Couchbase? 6 Memory-first Architecture Full SQL Query Language Active-Active Global Data Replication Multi-dimensional scaling Mobile
  • 7.
    ©2016 Couchbase Inc.7 Agenda • Couchbase Introduction • Kafka Introduction • Kafka Connect & Couchbase Kafka Connector • Use cases
  • 8.
    ©2016 Couchbase Inc.8 Kafka Introduction What is Kafka? • Kafka is a distributed, partitioned, replicated, log service developed by LinkedIn and open sourced in 2011. • Basically it is a massively scalable pub/sub message queue architected as a distributed transaction log. It was created to provide “a unified platform for handling all the real-time data feeds a large company might have”. • It powers a large number of high-profile companies including LinkedIn, Yahoo and Netflix.
  • 9.
    ©2016 Couchbase Inc.9 Kafka Architecture
  • 10.
    ©2016 Couchbase Inc.10 Kafka Architecture
  • 11.
    ©2016 Couchbase Inc.11 Agenda • Couchbase Introduction • Kafka Introduction • Kafka Connect & Couchbase Kafka Connector • Use cases
  • 12.
    ©2016 Couchbase Inc.12 The challenge: Stream Data Platform with Kafka Source: https://www.confluent.io/blog/announcing-kafka-connect-building-large-scale-low-latency-data-pipelines/
  • 13.
    ©2016 Couchbase Inc.13 Kafka Connect Source: https://www.confluent.io/blog/announcing-kafka-connect-building-large-scale-low-latency-data-pipelines/
  • 14.
    ©2016 Couchbase Inc.14 Kafka Connect Source: https://www.confluent.io/blog/announcing-kafka-connect-building-large-scale-low-latency-data-pipelines/
  • 15.
    ©2016 Couchbase Inc.15 Couchbase Connector in Kafka
  • 16.
    ©2016 Couchbase Inc.16 Database Change Protocol (DCP)
  • 17.
    ©2016 Couchbase Inc.17 Agenda • Couchbase Introduction • Kafka Introduction • Kafka Connect & Couchbase Kafka Connector • Use cases
  • 18.
    ©2016 Couchbase Inc.18 Couchbase & Kafka Use Cases
  • 19.
    ©2016 Couchbase Inc.19 Couchbase & Kafka Use Cases Currently using Kafka, but not Couchbase Reading from Couchbase with almost no code (Source) ○ Adopt default message format given by connector ○ Write custom converter ○ Use Kafka Streams to process events from Couchbase, and write them back. Writing to Couchbase without touching SDK (Sink) ○ Provide document in the generic form (it will be converted into JSON)
  • 20.
    ©2016 Couchbase Inc.20 Couchbase & Kafka Use Cases Currently using Couchbase, but not Kafka Backup of the data ○ Need to be filtered/analyzed/transformed by the business logic ○ Checking integrity Turn realtime data into history ○ e.g. Couchbase keeps the exchange rates, but it is necessary to emit stream of the deltas, or log of the changes.
  • 21.
    ©2016 Couchbase Inc.21 Couchbase & Kafka Use Cases Currently using Kafka and Couchbase Integrate more naturally with Kafka infrastructure. A lot of the custom code may be moved into connector. ○ Filtering ○ Transformation ○ Process/Task management
  • 22.
  • 23.
    ©2016 Couchbase Inc.23 Demo: Couchbase as Data Source Basic Scenario ConsoleConsumer • OOTB Couchbase Connector • Bucket “travel-sample” • Output all-content • Consume messages to console • Insert/Update/Delete • JMX Monitor
  • 24.
    ©2016 Couchbase Inc.24 Demo: Couchbase as Data Source + Data Sink Basic Scenario • OOTB Couchbase Connector • Bucket “travel-sample” to “receiver” • Raw content • Write messages to Couchbase • JMX Monitor
  • 25.
    ©2016 Couchbase Inc.25 Demo: Couchbase as Data Source + Data Sink Custom Scenario • Custom Schema Converter • Custom Filter • “travel-sample” to “receiver”: only “airlines” • JSON content • Write messages to Couchbase • JMX Monitor AirlineConverter AirlineFilter
  • 26.
    ©2016 Couchbase Inc.26 Demo: Twitter Data Source + Couchbase as Data Sink • Custom Kafka Producer & Consumer • Bucket “twitter” • Write messages to Couchbase • JMX Monitor TwitterProducer Couchbase Consumer
  • 27.
    ©2016 Couchbase Inc.27 Couchbase Connector in Confluent Platform
  • 28.
    ©2016 Couchbase Inc.28 Confluent Platform
  • 29.
    ©2016 Couchbase Inc.29 Resources • Couchbase Kafka Connector https://developer.couchbase.com/documentation/server/current/connectors/kafka- 3.1/kafka-intro.html • Couchbase Kafka Connector (code) https://github.com/couchbase/kafka-connect-couchbase • Demo code repositories https://github.com/mahurtado/KafkaCouchbaseConnectorSample https://github.com/mahurtado/TwitterKafkaCouchbasePipeline • Kafka Connect Confluent https://www.confluent.io/product/connectors/
  • 30.

Editor's Notes

  • #6 Talk about Data Platform, not just a database. Customers are building Service Endpoints and multiple types of applications and use cases per customer.
  • #7 Memory-first architecture Integrated, distributed caching tier High performant, in-memory streaming across all database services Full SQL query language (N1QL) Best-in-class In-Memory Indexing Active-Active global data replication High speed Intra and Inter-cluster In-memory replication Big Data integrations Leverage In-Memory replication (Spark & Kafka) Mobile On-device local cache / storage (Offline & Peer-to-Peer operations) Automatic multi-master replication Full stack secure data management
  • #13 Image from: https://www.confluent.io/blog/announcing-kafka-connect-building-large-scale-low-latency-data-pipelines/
  • #14 Image from: https://www.confluent.io/blog/announcing-kafka-connect-building-large-scale-low-latency-data-pipelines/
  • #15 at-most-once delivery means that for each message handed to the mechanism, that message is delivered zero or one times; in more casual terms it means that messages may be lost. at-least-once delivery means that for each message handed to the mechanism potentially multiple attempts are made at delivering it, such that at least one succeeds; again, in more casual terms this means that messages may be duplicated but not lost. exactly-once delivery means that for each message handed to the mechanism exactly one delivery is made to the recipient; the message can neither be lost nor duplicated. https://dzone.com/articles/kafka-clients-at-most-once-at-least-once-exactly-o The first one is the cheapest—highest performance, least implementation overhead—because it can be done in a fire-and-forget fashion without keeping state at the sending end or in the transport mechanism. The second one requires retries to counter transport losses, which means keeping state at the sending end and having an acknowledgement mechanism at the receiving end. The third is most expensive—and has consequently worst performance—because in addition to the second it requires state to be kept at the receiving end in order to filter out duplicate deliveries. An at-most-once scenario happens when the commit interval has occurred, and that in turn triggers Kafka to automatically commit the last used offset. Meanwhile, let us say the consumer did not get a chance to complete the processing of the messages and consumer has crashed. Now when consumer restarts, it starts to receive messages from the last committed offset, in essence consumer could lose a few messages in between. At-least-once scenario happens when consumer processes a message and commits the message into its persistent store and consumer crashes at that point. Meanwhile, let us say Kafka could not get a chance to commit the offset to the broker since commit interval has not passed. Now when the consumer restarts, it gets delivered with a few older messages from the last committed offset.
  • #24 KEY POINT: Applications communicate directly to the services they need to fulfill the application request and the application does not need to be topology aware as the SDK has that already. Single node type, services defined dynamically One node acts the same as 100, just the services are spread out in the cluster Query service accesses Index and Data to formulate response All query and document access is topology aware and dynamically scalable Develop with one node, deploy against multiple production nodes The Couchbase SDK handles knowing about where in the cluster it needs to go to satisfy whatever the application is requesting, be I CRUD or cluster management.
  • #25 KEY POINT: Applications communicate directly to the services they need to fulfill the application request and the application does not need to be topology aware as the SDK has that already. Single node type, services defined dynamically One node acts the same as 100, just the services are spread out in the cluster Query service accesses Index and Data to formulate response All query and document access is topology aware and dynamically scalable Develop with one node, deploy against multiple production nodes The Couchbase SDK handles knowing about where in the cluster it needs to go to satisfy whatever the application is requesting, be I CRUD or cluster management.
  • #26 KEY POINT: Applications communicate directly to the services they need to fulfill the application request and the application does not need to be topology aware as the SDK has that already. Single node type, services defined dynamically One node acts the same as 100, just the services are spread out in the cluster Query service accesses Index and Data to formulate response All query and document access is topology aware and dynamically scalable Develop with one node, deploy against multiple production nodes The Couchbase SDK handles knowing about where in the cluster it needs to go to satisfy whatever the application is requesting, be I CRUD or cluster management.
  • #27 KEY POINT: Applications communicate directly to the services they need to fulfill the application request and the application does not need to be topology aware as the SDK has that already. Single node type, services defined dynamically One node acts the same as 100, just the services are spread out in the cluster Query service accesses Index and Data to formulate response All query and document access is topology aware and dynamically scalable Develop with one node, deploy against multiple production nodes The Couchbase SDK handles knowing about where in the cluster it needs to go to satisfy whatever the application is requesting, be I CRUD or cluster management.
  • #28 Image from http://docs.confluent.io/3.0.0/platform.html + couchbase logo instead of ERP :) Link to Kafka intro video https://www.youtube.com/watch?v=wMLAlJimPzk ?
  • #29 Source of image: https://www.confluent.io/product/compare/