This document discusses building modern data streaming apps with Python and provides an overview of Apache Pulsar. It introduces Tim Spann and his experience with streaming technologies. It then covers key Pulsar concepts like tenants, namespaces, topics and messages. It provides examples of building Python producers and consumers and integrating Pulsar with other technologies like Kafka, MQTT and websockets. It also demonstrates deploying Pulsar Functions with Python.
2 Tim Spann Developer Advocate at StreamNative FLiP(N)Stack = Flink, Pulsar and NiFi Stack Streaming Systems & Data Architecture Expert Experience: ● 15+ years of experience with streaming technologies including Pulsar, Flink, Spark, NiFi, Big Data, Cloud, MXNet, IoT, Python and more. ● Today, he helps to grow the Pulsar community sharing rich technical knowledge and experience at both global conferences and through individual conversations.
Apache Pulsar hasa vibrant community 560+ Contributors 10,000+ Commits 7,000+ Slack Members 1,000+ Organizations Using Pulsar
6.
Cloud native withdecoupled storage and compute layers. Built-in compatibility with your existing code and messaging infrastructure. Geographic redundancy and high availability included. Centralized cluster management and oversight. Elastic horizontal and vertical scalability. Seamless and instant partitioning rebalancing with no downtime. Flexible subscription model supports a wide array of use cases. Compatible with the tools you use to store, analyze, and process data. Pulsar Features
7.
Messages - thebasic unit of Pulsar 7 Component Description Value / data payload The data carried by the message. All Pulsar messages contain raw bytes, although message data can also conform to data schemas. Key Messages are optionally tagged with keys, used in partitioning and also is useful for things like topic compaction. Properties An optional key/value map of user-defined properties. Producer name The name of the producer who produces the message. If you do not specify a producer name, the default name is used. Sequence ID Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of the message is its order in that sequence.
8.
Integrated Schema Registry SchemaRegistry schema-1 (value=Avro/Protobuf/JSON) schema-2 (value=Avro/Protobuf/JSON) schema-3 (value=Avro/Protobuf/JSON) Schema Data ID Local Cache for Schemas + Schema Data ID + Local Cache for Schemas Send schema-1 (value=Avro/Protobuf/JSON) data serialized per schema ID Send (register) schema (if not in local cache) Read schema-1 (value=Avro/Protobuf/JSON) data deserialized per schema ID Get schema by ID (if not in local cache) Producers Consumers 8
Data Offloaders (Tiered Storage) ClientLibraries StreamNative Pulsar ecosystem hub.streamnative.io Connectors (Sources & Sinks) Protocol Handlers Pulsar Functions (Lightweight Stream Processing) Processing Engines … and more! … and more!
13.
13 Pulsar Functions ● Consumemessages from one or more Pulsar topics. ● Apply user-supplied processing logic to each message. ● Publish the results of the computation to another topic. ● Support multiple programming languages (Java, Python, Go) ● Can leverage 3rd-party libraries to support the execution of ML models on the edge.
14.
from pulsar importFunction from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer import json class Chat(Function): def __init__(self): pass def process(self, input, context): fields = json.loads(input) sid = SentimentIntensityAnalyzer() ss = sid.polarity_scores(fields["comment"]) row = { } row['id'] = str(msg_id) if ss['compound'] < 0.00: row['sentiment'] = 'Negative' else: row['sentiment'] = 'Positive' row['comment'] = str(fields["comment"]) json_string = json.dumps(row) return json_string Entire Function ML Function 14
15.
Starting a Function- Distributed Cluster Once compiled into a JAR, start a Pulsar Function in a distributed cluster: 15
16.
Building Tenant, Namespace,Topics bin/pulsar-admin tenants create meetup bin/pulsar-admin namespaces create meetup/newjersey bin/pulsar-admin tenants list bin/pulsar-admin namespaces list meetup bin/pulsar-admin topics create persistent://meetup/newjersey/first bin/pulsar-admin topics list meetup/newjersey 16
17.
Install Python 3Pulsar Client pip3 install pulsar-client=='2.9.1[all]' # Depending on Platform May Need C++ Client Built For Python on Pulsar on Pi https://github.com/tspannhw/PulsarOnRaspberryPi https://pulsar.apache.org/docs/en/client-libraries-python/
18.
Building a Python3Producer import pulsar client = pulsar.Client('pulsar://localhost:6650') producer client.create_producer('persistent://conf/ete/first') producer.send(('Simple Text Message').encode('utf-8')) client.close()