Utilizing Apache Pulsar, Apache NiFi and MiNiFi for EdgeAI IoT at Scale
Tim Spann | Developer Advocate
streamnative.io
Tim Spann, Developer Advocate
● https://www.datainmotion.dev/
● https://github.com/tspannhw/SpeakerProfile
● https://dev.to/tspannhw
● https://sessionize.com/tspann/
DZone Zone Leader and Big Data MVB
Data DJay
Founded by the original developers of Apache Pulsar and Apache BookKeeper, StreamNative builds a cloud-native event streaming platform that enables enterprises to easily access data as real-time event streams.
Apache Pulsar
Apache Pulsar is an open source, cloud-native distributed messaging and streaming platform.
What are the Benefits of Pulsar?
● Data Durability
● Scalability
● Geo-Replication
● Multi-Tenancy
● Unified Messaging Model
Apache Pulsar
A Unified Messaging Platform
● Message Queuing
● Data Streaming
Apache Pulsar Overview
Enable Geo-Replicated Messaging
● Pub-Sub
● Geo-Replication
● Pulsar Functions
● Horizontal Scalability
● Multi-tenancy
● Tiered Persistent Storage
● Pulsar Connectors
● REST API
● CLI
● Many clients available
● Four Different Subscription Types
● Multi-Protocol Support
  ○ MQTT
  ○ AMQP
  ○ JMS
  ○ Kafka
  ○ ...
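The four subscription types mentioned above are Exclusive, Failover, Shared and Key_Shared. The following is a toy model of how each one hands messages to consumers, not Pulsar's actual dispatcher. The `dispatch` function, the consumer names and the sensor keys are all hypothetical, and Key_Shared is simplified to a hash modulo the consumer count (real brokers assign hash ranges to consumers).

```python
import zlib
from collections import defaultdict


def dispatch(subscription_type, messages, consumers):
    """Toy model: return {consumer: [payloads]} for a single subscription.

    messages is a list of (key, payload) pairs; consumers is a list of names.
    """
    delivered = defaultdict(list)
    for index, (key, payload) in enumerate(messages):
        if subscription_type in ("exclusive", "failover"):
            # One active consumer receives everything, in order.
            # Exclusive allows only that one consumer to attach; failover
            # keeps the others as standbys. Both are modeled as consumers[0].
            target = consumers[0]
        elif subscription_type == "shared":
            # Round-robin across consumers: queue-style work sharing.
            target = consumers[index % len(consumers)]
        elif subscription_type == "key_shared":
            # Messages with the same key always reach the same consumer.
            # crc32 is used so the mapping is stable between runs.
            target = consumers[zlib.crc32(key.encode("utf-8")) % len(consumers)]
        else:
            raise ValueError(f"unknown subscription type: {subscription_type}")
        delivered[target].append(payload)
    return dict(delivered)


readings = [("sensor-a", "a1"), ("sensor-b", "b1"),
            ("sensor-a", "a2"), ("sensor-b", "b2")]
workers = ["c1", "c2"]

for kind in ("exclusive", "failover", "shared", "key_shared"):
    print(kind, dispatch(kind, readings, workers))
```

Shared gives queue semantics (work is spread out, per-key order is not preserved), while Exclusive and Failover give streaming semantics (one ordered reader). Key_Shared sits between the two: parallel consumers with ordering kept per key.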
What is the Pulsar Ecosystem?
● Functions and Connectors
  ○ Functions: Lightweight stream processing
  ○ Connectors: Part of “Pulsar IO”, includes “Source” and “Sink” APIs
    ■ Files, Databases, Data tools, Cloud Services, etc.
● Protocol Handlers
  ○ Allow Pulsar to handle additional protocols through an extensible API running in the broker
    ■ AoP (AMQP), KoP (Kafka), MoP (MQTT)
What is the Pulsar Ecosystem? (cont’d)
● Processing Engines
  ○ Supports modern processing engines
    ■ Flink and Spark, as well as Pulsar SQL (Presto/Trino)
● Offloaders
  ○ Allows data to be offloaded to cloud storage and used with existing Pulsar APIs
    ■ S3, GCP Cloud Storage, HDFS, File (NFS), Azure Blob Storage (in Pulsar 2.7.0)
Pulsar Functions
Provides a simple API to:
● Receive a message (consume)
● Process the message using your own code
● Send a message (produce)
Takes care of the boilerplate code so there is no need to create producers and consumers.
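A minimal sketch of that consume, process, produce cycle, written as a "native" Python Pulsar Function (a module that exposes a `process` function). The Celsius-to-Fahrenheit logic is a hypothetical example and is not taken from the talk; it assumes each incoming message is a plain-text number.

```python
def process(input):
    """Called by the Functions runtime once per consumed message.

    The return value is published to the function's output topic, so no
    producer or consumer is created in user code.
    Assumption: the message payload is a Celsius temperature as text.
    """
    celsius = float(input)
    fahrenheit = celsius * 9 / 5 + 32
    return f"{fahrenheit:.1f}"


if __name__ == "__main__":
    # Local check without a Pulsar cluster.
    print(process("100"))
```

Input and output topics are supplied when the function is deployed rather than in the code, which is why the function body contains only the transformation.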
Moving Data In and Out of Pulsar
Pulsar IO connectors are a simple way to integrate with external systems and move data in and out of Pulsar.
● Built on top of Pulsar Functions
● Built-in connectors - hub.streamnative.io
[Diagram labels: Source, Sink]
MQTT on Pulsar (MoP)
Pulsar SQL
Presto/Trino workers can read segments directly from bookies (or offloaded storage) in parallel.
[Diagram: a producer and a consumer connect to Brokers 1-3, which serve Topic1-Part1, Topic1-Part2 and Topic1-Part3; Segments 1 to X are replicated across Bookies 1-3; a Query Coordinator reads topic metadata and SQL Workers read the segments.]
Ingesting IoT Data via Java Pulsar
https://github.com/tspannhw/StreamingAnalyticsUsingFlinkSQL/
Ingesting IoT Data via Java Pulsar (cont’d)
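The demo referenced on these slides uses the Java client. As a rough illustration of the same ingestion step, here is a sketch using the Python `pulsar-client` package instead. The field names, device ID, topic name and broker URL are assumptions made for illustration and do not come from the talk's repository.

```python
import json
import time


def build_reading(device_id, temperature_c, humidity_pct, ts=None):
    """Serialize one sensor reading as UTF-8 JSON bytes.

    All field names here are hypothetical.
    """
    record = {
        "device_id": device_id,
        "temperature_c": temperature_c,
        "humidity_pct": humidity_pct,
        # Epoch milliseconds; defaults to the current time.
        "ts": int(ts if ts is not None else time.time() * 1000),
    }
    return json.dumps(record, sort_keys=True).encode("utf-8")


def publish(payloads,
            service_url="pulsar://localhost:6650",  # assumed local broker
            topic="persistent://public/default/iot-readings"):  # assumed topic
    """Send pre-serialized payloads to a Pulsar topic.

    Requires `pip install pulsar-client` and a reachable broker.
    """
    import pulsar  # imported lazily so build_reading works without the client

    client = pulsar.Client(service_url)
    try:
        producer = client.create_producer(topic)
        for payload in payloads:
            producer.send(payload)
    finally:
        client.close()


if __name__ == "__main__":
    # Print one serialized reading; call publish([...]) when a broker is running.
    print(build_reading("rpi-001", 21.5, 40.0, ts=1636000000000).decode("utf-8"))
```

Serialization is kept separate from sending so the payload format can be checked on an edge device without a broker available.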
Why Apache NiFi?
• Guaranteed delivery
• Data buffering
  - Backpressure
  - Pressure release
• Prioritized queuing
• Flow-specific QoS
  - Latency vs. throughput
  - Loss tolerance
• Data provenance
• Supports push and pull models
• Hundreds of processors
• Visual command and control
• Over sixty sources
• Flow templates
• Pluggable/multi-role security
• Designed for extension
• Clustering
• Version control
Architecture
https://nifi.apache.org/docs/nifi-docs/html/overview.html
End-to-End Streaming FLiP(N) IoT Apps
[Architecture diagram. Labels: StreamNative Hub; StreamNative Cloud; Unified Batch and Stream Computing (Batch + Stream); Unified Batch and Stream Storage (Queuing + Streaming) with offload to tiered storage; Apache Pulsar - Apache NiFi - MiNiFi <-> Events/Messages <-> Data Stores; Streaming Edge Gateway Protocols: Pulsar, KoP, MoP, WebSocket, HTTP; Pulsar Sink.]
Demo
Wrap-Up
Interested In Learning More?
Resources
● Flink SQL Cookbook
● The GitHub Source for Flink SQL Demo
● The GitHub Source for Demo
Free eBooks
● Manning's Apache Pulsar in Action
● O’Reilly Book
Upcoming Events
● [11/8] PASS Data Community
● [11/18] Developer Week Austin
● [11/19] Porto Tech Hub Con
● [12/3] Data Science Camp
Let’s Keep in Touch!
Timothy Spann, Developer Advocate
● @PaasDev
● https://www.linkedin.com/in/timothyspann
● https://github.com/tspannhw
