Building an Event Streaming Architecture with Apache Pulsar
Tim Spann, Developer Advocate at StreamNative
Welcome! Tim Spann Developer Advocate StreamNative tim@streamnative.io @PaasDev
Agenda
● Apache Pulsar Basics
  ○ Features
  ○ Architecture
  ○ Connectivity
  ○ Protocols
● Stream Storage
  ○ In-Memory
  ○ Apache BookKeeper
  ○ ScyllaDB
● Demo
● Unified Messaging Platform
● Guaranteed Message Delivery
● Resiliency
● Infinite Scalability
Pulsar Cluster (diagram): Metadata & Service Discovery; Store Messages
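Unified Messaging Model
A Pulsar topic/partition serves both streaming and messaging consumers through four subscription types:
● Exclusive (streaming): one consumer per subscription; a second consumer is rejected (X).
● Failover (streaming): one active consumer; a standby takes over in case of failure in Consumer B-0.
● Shared (messaging): messages are spread across all consumers on the subscription.
● Key-Shared (messaging): messages are spread across consumers, with each key going to one consumer.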
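To make the four subscription types concrete, here is a minimal sketch in plain Python. It is a toy model, not the Pulsar client: the `dispatch` function, the consumer names, and the CRC32 key hash are all assumptions made for illustration (the real broker uses its own consumer selection and hashing).

```python
from collections import defaultdict
from itertools import cycle
from zlib import crc32


def dispatch(messages, consumers, mode):
    """Toy model of how one subscription hands messages to its consumers.

    messages  -- list of (key, payload) tuples
    consumers -- consumer names, in the order they connected
    mode      -- "exclusive", "failover", "shared" or "key_shared"
    """
    received = defaultdict(list)
    if mode == "exclusive":
        # Only a single consumer may attach to the subscription.
        if len(consumers) != 1:
            raise ValueError("exclusive subscription allows one consumer")
        received[consumers[0]] = [payload for _, payload in messages]
    elif mode == "failover":
        # Assumption: the first listed consumer is active, the rest are standbys.
        received[consumers[0]] = [payload for _, payload in messages]
    elif mode == "shared":
        # Messages are spread round-robin; per-key ordering is not kept.
        for (_, payload), consumer in zip(messages, cycle(consumers)):
            received[consumer].append(payload)
    elif mode == "key_shared":
        # Every message with the same key lands on the same consumer.
        # CRC32 is a stand-in hash for this sketch only.
        for key, payload in messages:
            index = crc32(key.encode("utf-8")) % len(consumers)
            received[consumers[index]].append(payload)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return dict(received)
```

Calling `dispatch(msgs, ["c1", "c2"], "failover")` and then again with `["c2"]` models the standby taking over after the active consumer fails.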
Connectivity
• Libraries - (Java, Python, Go, NodeJS, WebSockets, C++, C#, Scala, Kotlin,...)
• Functions - Lightweight Stream Processing (Java, Python, Go)
• Connectors - Sources & Sinks (Cassandra, Kafka, …)
• Protocol Handlers - AoP (AMQP), KoP (Kafka), MoP (MQTT), RoP (RocketMQ)
• Processing Engines - Flink, Spark, Presto/Trino via Pulsar SQL
• Data Offloaders - Tiered Storage - (S3)
hub.streamnative.io
ScyllaDB Compatible Sink Connector
https://pulsar.apache.org/docs/en/io-quickstart/
Kafka On Pulsar (KoP)
Diagram: Pulsar producers (Pulsar lib) and Kafka producers and consumers (Kafka lib) connect to the same Pulsar broker. Inside the broker, the Pulsar protocol handler and the Kafka protocol handler share the managed ledger, Pulsar topic, BK client, geo replicator and load balancer. The Pulsar cluster also contains ZooKeeper and the bookies.
MQTT On Pulsar (MoP)
Diagram: Pulsar producers (Pulsar lib) connect through the Pulsar proxy, and MQTT producers and consumers (MQTT lib) connect through the MQTT proxy. Inside the Pulsar broker, the Pulsar protocol handler and the MQTT protocol handler share the managed ledger, Pulsar topic, BK client, geo replicator and load balancer, backed by ZooKeeper and the bookies.
AMQP On Pulsar (AoP)
Diagram: Pulsar producers (Pulsar lib) connect through the Pulsar proxy, and RabbitMQ producers and consumers (RabbitMQ lib) connect through the AMQP proxy. Inside the Pulsar broker, the Pulsar protocol handler and the AMQP protocol handler share the managed ledger, Pulsar topic, BK client, geo replicator and load balancer, backed by ZooKeeper and the bookies.
Schema Registry
Diagram: the registry holds schema-1, schema-2 and schema-3 (value = Avro/Protobuf/JSON). Producers and consumers each keep a local cache of schemas keyed by schema ID.
● Producers: send (register) the schema if it is not in the local cache, then send schema-1 data serialized per schema ID.
● Consumers: get the schema by ID if it is not in the local cache, then read schema-1 data deserialized per schema ID.
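The register-once, look-up-once flow in the diagram can be sketched in plain Python. Everything here is hypothetical and for illustration only: `ToySchemaRegistry`, `ToyProducer` and `ToyConsumer` are not Pulsar classes, a "schema" is just a dict with a field list, and JSON stands in for Avro/Protobuf.

```python
import json


class ToySchemaRegistry:
    """Stand-in for the broker-side registry: maps schema IDs to definitions."""

    def __init__(self):
        self._by_id = {}
        self._ids = {}
        self.lookups = 0  # counts consumer fetches, to show the cache working

    def register(self, definition):
        canonical = json.dumps(definition, sort_keys=True)
        if canonical not in self._ids:
            schema_id = len(self._ids) + 1
            self._ids[canonical] = schema_id
            self._by_id[schema_id] = definition
        return self._ids[canonical]

    def get(self, schema_id):
        self.lookups += 1
        return self._by_id[schema_id]


class ToyProducer:
    def __init__(self, registry):
        self.registry = registry
        self.cache = {}  # canonical schema text -> schema ID

    def send(self, definition, record):
        canonical = json.dumps(definition, sort_keys=True)
        if canonical not in self.cache:  # register only on a cache miss
            self.cache[canonical] = self.registry.register(definition)
        return {"schema_id": self.cache[canonical],
                "data": json.dumps(record).encode("utf-8")}


class ToyConsumer:
    def __init__(self, registry):
        self.registry = registry
        self.cache = {}  # schema ID -> definition

    def receive(self, message):
        schema_id = message["schema_id"]
        if schema_id not in self.cache:  # fetch only on a cache miss
            self.cache[schema_id] = self.registry.get(schema_id)
        fields = self.cache[schema_id]["fields"]
        record = json.loads(message["data"].decode("utf-8"))
        return {name: record[name] for name in fields}
```

Each message carries only the schema ID, so after the first message neither side needs to contact the registry again for that schema.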
Pulsar Functions
● Consume messages from one or more Pulsar topics.
● Apply user-supplied processing logic to each message.
● Publish the results of the computation to another topic.
● Support multiple programming languages (Java, Python, Go).
● Can leverage 3rd-party libraries to support the execution of ML models on the edge.
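As a sketch of the "apply user-supplied processing logic" step, Pulsar's Python runtime can run a plain module-level `process` function, with the return value published to the output topic. The example below assumes the input arrives as a JSON string; the `pm25` field and the category cut-offs are hypothetical and not taken from the demo.

```python
import json


def process(input):
    """Add an air-quality category to one JSON-encoded sensor reading."""
    reading = json.loads(input)
    pm25 = float(reading["pm25"])
    # Hypothetical cut-offs, chosen only to make the example concrete.
    if pm25 <= 12.0:
        reading["category"] = "good"
    elif pm25 <= 35.4:
        reading["category"] = "moderate"
    else:
        reading["category"] = "unhealthy"
    return json.dumps(reading)
```

Because the function is ordinary Python, it can be unit-tested locally before being deployed; the input and output topics are supplied at deployment time rather than in the code.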
Function Mesh
Pulsar Functions, along with Pulsar IO/Connectors, provide a powerful API for ingesting, transforming, and outputting data. Function Mesh, another StreamNative project, makes it easier for developers to create entire applications built from sources, functions, and sinks all through a declarative API.
Multi-Cloud Deployment
● Buffer
● Batch
● Route
● Filter
● Aggregate
● Enrich
● Replicate
● Dedupe
● Decouple
● Distribute
Demo
https://github.com/tspannhw/airquality-datastore
https://streamnative.io/blog/engineering/2022-03-17-streaming-real-time-chat-messages-into-scylla-with-apache-pulsar/
Passionate and dedicated team. Founded by the original developers of Apache Pulsar. StreamNative helps teams to capture, manage, and leverage data using Pulsar’s unified messaging and streaming platform.
Powered by Apache Pulsar, StreamNative provides a cloud-native, real-time messaging and streaming platform to support multi-cloud and hybrid cloud strategies.
● Built for Containers
● Cloud Native
● StreamNative Cloud
● Flink SQL
Keep in touch! Tim Spann Developer Advocate StreamNative tim@streamnative.io @PaasDev
Who?
Executive Team
✓ Data veterans with extensive industry experience
✓ Original creators of Apache Pulsar & BookKeeper
✓ Operated the largest Pulsar/BookKeeper cluster
Sijie Guo: Founder and CEO; Apache Software Foundation Member; Pulsar / BookKeeper PMC
Matteo Merli: CTO; Co-creator and PMC Chair of Pulsar; BookKeeper PMC
Jia Zhai: Co-Founder; Pulsar / BookKeeper PMC
Pulsar Global Adoption
Pulsar Adoption Use Cases
● Adopted Pulsar to replace Kafka in their DSP (Data Streaming Platform).
  ○ 1.5-2x lower in capex cost
  ○ 5-50x improvement in latency
  ○ 2-3x lower in opex due …
● Adopted Pulsar to power their billing platform, Midas, which processes hundreds of billions of financial transactions daily. Adoption then expanded to Tencent’s Federated Learning Platform and Tencent Gaming.
● Applied Materials is one of the biggest semiconductor hardware and software suppliers in the industry. They adopted Pulsar to build a message bus that ties all of their data together. They previously used Tibco.
Learn more about Apache Pulsar with StreamNative Academy
➔ Pulsar expert instructor-led courses
➔ On-demand learning with labs
➔ 300+ engineers, admins and architects trained!
Academy.StreamNative.io
Pulsar Summit San Francisco
➔ August 18th, 2022
➔ Hotel Nikko San Francisco
Get early bird tickets now! Pulsar-Summit.org
Pulsar Summit San Francisco Sponsorship Prospectus
Community sponsorships available: help engage and connect the Apache Pulsar community by becoming an official sponsor for Pulsar Summit San Francisco 2022! Learn more about the requirements and benefits of becoming a community sponsor: https://hubs.ly/Q01dJ3ly0
Welcome to Pulsar!
Scan the QR code to sign up for the StreamNative Newsletter for Apache Pulsar. You will get the latest Pulsar news and have the chance to win a Pulsar Giveaway Bundle valued at $500!
$500 Pulsar Giveaway Bundle:
■ 1 pair of Apple AirPods Pro
■ StreamNative and We <3 Pulsar swag
■ 2 free registration tickets to Pulsar Summit San Francisco 2022!

Editor's Notes

  • #6 The compute is done in Pulsar, while the state is stored in BookKeeper, leveraging an external metadata store (e.g. ZooKeeper). This allows for scaling to support (use case) (e.g. commodity storage, Flink, Spark, etc). To show it in a diagram…
  • #17 ScyllaDB Cloud T3 cluster for testing (Free for 30 days!)
  • #18 Pulsar is a unified Messaging and Event Streaming Platform
  • #23 Instructor Notes: Data veterans with extensive industry experience; Original creators of Apache Pulsar & BookKeeper; Operated the largest Pulsar/BookKeeper cluster; More Apache Pulsar PMC/Committers than any other company; Lead the day-to-day development of Apache Pulsar; Provide services and support to help drive Pulsar adoption
  • #29 LEARN MORE ABOUT PULSAR AND ENTER TO WIN! https://share.hsforms.com/1dAyP4l3jQey3KBWH8Q9EGQ3x5r4