1 Processing IoT Data with MQTT and Apache Kafka: Kafka-Native End-to-End IoT Data Integration and Processing. Kai Waehner, Technology Evangelist, kontakt@kai-waehner.de, LinkedIn, @KaiWaehner, www.confluent.io, www.kai-waehner.de
3 Agenda 1) IoT Use Cases 2) MQTT Standard 3) Apache Kafka Ecosystem 4) Kafka-Native End-to-End IoT Integration Architecture(s) 5) IoT Data Processing
6 Connected Intelligence (Cars, Machines, Robots, …)
7 Smart Cities
8 Smart Retail and Customer 360
9 Intelligent Applications (Early Part Scrapping, Predictive Maintenance, …)
10 Architecture (High Level). Edge: Devices, IoT Gateway. Data Center / Cloud: Streaming Platform (Kafka Brokers), Connect w/ MQTT connector. Consumers: Device Tracking (Real Time), Predictive Maintenance (Near Real Time), Log Analytics (Batch). Open question (?): How to integrate?
11 Poll Which IoT scenarios do you see in your company? 1) IoT ingestion into analytics cluster 2) Bi-directional communication to control IoT devices (e.g. connected cars, fleet management, logistics) 3) Real time stream processing using machine learning (e.g. predictive maintenance, early part scrapping) 4) No IoT scenarios today; maybe in the future
13 Agenda 1) IoT Use Cases 2) MQTT Standard 3) Apache Kafka Ecosystem 4) Kafka-Native End-to-End IoT Integration Architecture(s) 5) IoT Data Processing
14 MQTT - Publish / subscribe messaging protocol • Built on top of TCP/IP for constrained devices and unreliable networks • Many (open source) broker implementations • Many client libraries • IoT-specific features for bad network / connectivity • Widely used (mostly IoT, but also web and mobile apps via MQTT over WebSockets)
15 MQTT Server 1 Processor 1 Processor 2 ... MQTT Architecture (no scale) topic: [deviceid]/car
16 MQTT Server Coordinator MQTT Server 1 MQTT Server 2 MQTT Server 3 MQTT Server 4 topic: [deviceid]/car ... MQTT Architecture (clustering depends on broker implementation) Processor 1 Processor 2 Processor 3 Processor 4
17 MQTT Architecture (clustering depends on broker implementation) Load Balancer MQTT Server 1 MQTT Server 2 MQTT Server 3 MQTT Server 4 topic: [deviceid]/car ... Processor 1 Processor 2 Processor 3 Processor 4
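The `[deviceid]/car` topic layout in the architecture slides relies on MQTT topic filters: a processor can subscribe to `+/car` to receive the car topic of every device. The following is a minimal sketch of how a broker might match a topic name against a filter with the standard `+` (single-level) and `#` (multi-level) wildcards. The function name `topic_matches` and the device IDs are illustrative, and the special handling of topics starting with `$` is left out.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic name matches a subscription filter."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: only valid as the last level;
            # it matches the parent level and everything below it.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            # The filter has more levels than the topic.
            return False
        if level != "+" and level != topic_levels[i]:
            # "+" matches exactly one level; anything else must be equal.
            return False
    # Without "#", filter and topic must have the same number of levels.
    return len(filter_levels) == len(topic_levels)


# A processor subscribed to "+/car" sees the car topic of every device.
print(topic_matches("+/car", "device-42/car"))
print(topic_matches("+/car", "device-42/truck"))
print(topic_matches("device-42/#", "device-42/car/engine"))
```

Whether several processors can share such a subscription depends on the broker implementation, as the slides note.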
18 MQTT Trade-Offs Pros • Lightweight • Simple API • Built for poor connectivity / high-latency scenarios • Many client connections (tens of thousands per MQTT server) Cons • Queuing, not stream processing • Can’t handle usage surges (no buffering) • No high scalability • Very asynchronous processing (devices are often offline for a long time) • No good integration with the rest of the enterprise • No reprocessing of events
19 Agenda 1) IoT Use Cases 2) MQTT Standard 3) Apache Kafka Ecosystem 4) Kafka-Native End-to-End IoT Integration Architecture(s) 5) IoT Data Processing
20 Apache Kafka – The Rise of a Streaming Platform The Log ConnectorsConnectors Producer Consumer Streaming Engine
24 Apache Kafka == Distributed Commit Log with Replication
26 Kafka Trade-Offs (from IoT perspective) Pros • Stream processing, not just queuing • High throughput • Large scale • High availability • Long-term storage and buffering • Reprocessing of events • Good integration with the rest of the enterprise Cons • Not built for tens of thousands of connections • Requires stable network and good infrastructure • No IoT-specific features like keep alive or Last Will and Testament
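The "commit log" idea behind buffering and reprocessing can be sketched in a few lines. This is a toy in-memory model, not Kafka's implementation: the `CommitLog` and `Consumer` classes are hypothetical, and replication, partitions, and persistence are left out. It shows only that records stay in the log after being read, so each consumer tracks its own offset and can rewind.

```python
from dataclasses import dataclass, field


@dataclass
class CommitLog:
    """Append-only list of records; reading never removes anything."""
    records: list = field(default_factory=list)

    def append(self, record) -> int:
        self.records.append(record)
        return len(self.records) - 1  # offset of the new record

    def read(self, offset: int, max_records: int = 10) -> list:
        return self.records[offset:offset + max_records]


@dataclass
class Consumer:
    """Each consumer keeps its own position in the shared log."""
    log: CommitLog
    offset: int = 0

    def poll(self, max_records: int = 10) -> list:
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch

    def seek(self, offset: int) -> None:
        self.offset = offset  # rewind (or skip ahead) for reprocessing


log = CommitLog()
for temperature in (20.5, 21.0, 85.3):
    log.append({"device": "device-42", "temperature": temperature})

tracking = Consumer(log)    # reads everything as fast as it can
analytics = Consumer(log)   # slower consumer, reads at its own pace

first_pass = tracking.poll()
slow_batch = analytics.poll(max_records=2)

tracking.seek(0)            # reprocess from the beginning
replay = tracking.poll()
```

A queue, by contrast, typically hands each message to a consumer once and then discards it, which is the "no reprocessing of events" limitation listed for MQTT.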
27 (De facto) Standards for Processing IoT Data A Match Made In Heaven + =
28 Agenda 1) IoT Use Cases 2) MQTT Standard 3) Apache Kafka Ecosystem 4) Kafka-Native End-to-End IoT Integration Architecture(s) 5) IoT Data Processing
29 Architecture (High Level). Edge: Devices, Gateway. Data Center / Cloud: Streaming Platform (Kafka Brokers), Connect w/ MQTT connector. Consumers: Device Tracking (Real Time), Predictive Maintenance (Near Real Time), Log Analytics (Batch). Open question (?): How to integrate?
30 MQTT Server Coordinator MQTT Server 1 MQTT Server 2 MQTT Server 3 MQTT Server 4 topic: [deviceid]/car Kafka Integration Sensor Data Stream processing Kafka Cluster End-to-End Integration from MQTT to Apache Kafka
31 Design Questions for End-to-End Integration • How much throughput? • Ingest-only vs. processing of data? • Analytical vs. operational deployments? • Device publish only vs. device pub/sub? • Pull vs. push? • Low-level client vs. integration framework vs. proxy? • Integration patterns needed (transform, route, …)? • IoT-specific features required (Last Will and Testament, …)?
32 Kafka-Native Integration Options between MQTT and Apache Kafka Kafka Connect MQTT Proxy REST Proxy
34 MQTT Source and Sink Connectors for Kafka Connect https://www.confluent.io/hub/ https://www.confluent.io/connector/kafka-connect-mqtt/
35 Integration with Kafka Connect (Source and Sink). Components: MQTT Devices, MQTT Broker, Connect w/ MQTT connector (source and sink), Kafka Brokers, Kafka Consumer. MQTT Broker • Persistent + offers MQTT-specific features • Consumes push data from IoT devices. Kafka Connect • Kafka Consumer + Kafka Producer under the hood • Pull-based (at own pace, without overwhelming the source or getting overwhelmed by the source) • Out-of-the-box scalability and integration features (like connectors, converters, SMTs)
38 Kafka Connect components
39 Connector for MQTT Source + Single Message Transformation (SMT)
curl -s -X POST -H 'Content-Type: application/json' http://localhost:8083/connectors -d '{
  "name" : "mqtt-source",
  "config" : {
    "connector.class" : "io.confluent.connect.mqtt.MqttSourceConnector",
    "tasks.max" : "1",
    "mqtt.server.uri" : "tcp://127.0.0.1:1883",
    "mqtt.topics" : "temperature",
    "kafka.topics" : "mqtt.",
    "transforms" : "filter",
    "transforms.filter.type" : "com.github.kaiwaehner.kafka.connect.smt.StringFilter",
    "transforms.filter.topic.format" : "fraud"
  }
}'
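To make the idea of a Single Message Transformation more concrete, here is a small conceptual sketch of an SMT chain. It is not Kafka Connect's Java `Transformation` interface and does not reproduce what the `StringFilter` SMT in the slide does; the record shape and the two transforms (`drop_if_missing`, `prefix_topic`) are assumptions made for illustration. What it mirrors is the general contract: each transform receives one record and returns either a (possibly modified) record or nothing, in which case the record is dropped.

```python
from typing import Callable, Optional

# Hypothetical stand-in for a Connect record: {"topic", "key", "value"}.
Record = dict
Transform = Callable[[Record], Optional[Record]]


def drop_if_missing(field_name: str) -> Transform:
    """Filter-style transform: drop records whose value lacks a field."""
    def apply(record: Record) -> Optional[Record]:
        return record if field_name in record["value"] else None
    return apply


def prefix_topic(prefix: str) -> Transform:
    """Routing-style transform: rewrite the destination topic."""
    def apply(record: Record) -> Optional[Record]:
        return {**record, "topic": prefix + record["topic"]}
    return apply


def apply_chain(record: Record, transforms: list) -> Optional[Record]:
    """Run transforms in order; stop as soon as one drops the record."""
    for transform in transforms:
        record = transform(record)
        if record is None:
            return None
    return record


chain = [drop_if_missing("temperature"), prefix_topic("mqtt.")]

kept = apply_chain(
    {"topic": "temperature", "key": "device-42", "value": {"temperature": 21.5}},
    chain,
)
dropped = apply_chain(
    {"topic": "temperature", "key": "device-43", "value": {"humidity": 40}},
    chain,
)
```

Because transforms run per record inside the Connect worker, they suit lightweight filtering and routing; heavier logic usually belongs in stream processing.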
40 Kafka Connect - Converters. Source side: MQTT Broker (JSON message) → MQTT Source Connector (Connect data API format) → AvroConverter → byte[] (Avro). Sink side: byte[] (Avro) → AvroConverter (Connect data API format) → S3 Sink Connector (Amazon S3 object) → S3 Object Storage
41 Live Demo … MQTT Integration with Kafka Connect …
42 Kafka Connect + MQTT Connector https://github.com/kaiwaehner/kafka-connect-iot-mqtt-connector-example
43 Kafka-Native Integration Options between MQTT and Apache Kafka Kafka Connect MQTT Proxy REST Proxy
44 MQTT Proxy. Components: MQTT Devices, MQTT Proxy, Kafka Brokers, Kafka Consumer. MQTT Proxy • MQTT is push-based • Horizontally scalable • Consumes push data from IoT devices and forwards it to Kafka Broker at low latency • Kafka Producer under the hood • No MQTT Broker needed. Kafka Broker • Source of truth • Responsible for persistence, high availability, reliability
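Since the proxy sits between MQTT topics and Kafka topics, it needs some rule for mapping one to the other. The sketch below shows one plausible approach, a list of regular expressions checked in order. The rule table, the Kafka topic names, and the choice to use the device ID as the record key are all assumptions for illustration, not a description of how Confluent's MQTT Proxy is configured.

```python
import re
from typing import Optional

# Hypothetical mapping rules: (Kafka topic, pattern for MQTT topic names).
TOPIC_RULES = [
    ("car-sensors", re.compile(r"^[^/]+/car$")),
    ("truck-sensors", re.compile(r"^[^/]+/truck$")),
]


def route(mqtt_topic: str) -> Optional[tuple]:
    """Map an MQTT topic to (kafka_topic, record_key), or None if unmapped."""
    for kafka_topic, pattern in TOPIC_RULES:
        if pattern.match(mqtt_topic):
            # Assumed design choice: key by device ID so that all readings
            # of one device land in the same partition, preserving order.
            device_id = mqtt_topic.split("/", 1)[0]
            return kafka_topic, device_id
    return None


print(route("device-42/car"))
print(route("device-42/boat"))
```

Collapsing many per-device MQTT topics into a few Kafka topics keeps the topic count manageable on the Kafka side.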
45 Details of Confluent’s MQTT Proxy Implementation General and modular framework • Based on Netty to not re-invent the wheel (network layer handling, thread pools) • Scalable with standard load balancer • Internally uses Kafka Connect formats (allows re-using transformations and other Connect constructs → coming soon) Three pipeline stages • Network (Netty) • Protocol (MQTT with QoS 0, 1, 2 today; later others, e.g. WebSockets) • Stream (Kafka clients: producers today, later also consumers) Missing parts in first release • Only MQTT Publish; MQTT Subscribe coming soon • MQTT-specific features like Last Will and Testament
46 Kafka-Native Integration Options between MQTT and Apache Kafka Kafka Connect MQTT Proxy REST Proxy
47 Confluent REST Proxy. Components: Non-Java Applications → REST / HTTP(S) → REST Proxy; Native Kafka Java Applications → TCP; Schema Registry. The "simple alternative" for IoT • Simple and understood • HTTP(S) Proxy → push-based • Security "easier" • Scalable with standard load balancer (still synchronous HTTP) • Not for very high throughput • Implement Connect features in your client app
48 Confluent REST Proxy - Produce and Consume Messages
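Producing a message through the REST Proxy boils down to one HTTP POST. The sketch below builds such a request with the Python standard library, using the v2 JSON embedded format (`application/vnd.kafka.json.v2+json`) and the `/topics/{topic}` endpoint. The host and port, the topic name `car-sensors`, and the record contents are assumptions. The request is only constructed, not sent, so the sketch runs without a REST Proxy.

```python
import json
import urllib.request


def build_produce_request(rest_proxy_url: str, topic: str, records: list):
    """Build a REST Proxy v2 produce request for JSON-encoded records."""
    body = json.dumps({"records": records}).encode("utf-8")
    return urllib.request.Request(
        url=f"{rest_proxy_url}/topics/{topic}",
        data=body,
        headers={
            "Content-Type": "application/vnd.kafka.json.v2+json",
            "Accept": "application/vnd.kafka.v2+json",
        },
        method="POST",
    )


request = build_produce_request(
    "http://localhost:8082",   # assumed REST Proxy address
    "car-sensors",             # hypothetical topic name
    [{"key": "device-42", "value": {"temperature": 21.5}}],
)

# Sending is one more call, left commented out here:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Each produce call is a synchronous HTTP round trip, which is why the slide lists the REST Proxy as unsuitable for very high throughput. Batching several records into one request reduces the overhead.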
49 Agenda 1) IoT Use Cases 2) MQTT Standard 3) Apache Kafka Ecosystem 4) Kafka-Native End-to-End IoT Integration Architecture(s) 5) IoT Data Processing
50 Processing Options for MQTT Data with Apache Kafka Streams Kafka native vs. additional big data cluster and technology (or others, you name it …)
53 Example: Anomaly Detection System to Predict Defects in Car Engine. At the edge: Car Sensors. On-premise DC (Kubernetes + Confluent Operator): MQTT Proxy, Kafka Cluster, KSQL, Kafka Connect (Kafka Ecosystem); Elasticsearch, Grafana, Real Time Emergency System (Other Components). Flow: All Data → Apply Analytic Model → Filter Anomalies → Potential Defect
54 KSQL and Deep Learning (Auto Encoder) for Anomaly Detection, using a User Defined Function (UDF): "CREATE STREAM AnomalyDetection AS SELECT sensor_id, detectAnomaly(sensor_values) FROM car_engine;"
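An autoencoder flags anomalies by reconstruction error: it learns to reproduce normal readings, so a reading it reproduces badly is suspicious. The sketch below shows only that thresholding logic. In place of a trained neural network it uses a deliberately trivial stand-in that "reconstructs" every reading as the per-sensor mean of the normal training data. The function names, sample values, and threshold are all hypothetical and unrelated to the model used in the demo.

```python
from statistics import mean


def fit_baseline(normal_readings: list) -> list:
    """Stand-in for training: per-sensor mean over normal readings."""
    return [mean(column) for column in zip(*normal_readings)]


def reconstruction_error(reading: list, baseline: list) -> float:
    """Mean squared error between a reading and its 'reconstruction'."""
    return mean((x - b) ** 2 for x, b in zip(reading, baseline))


def detect_anomaly(reading: list, baseline: list, threshold: float) -> bool:
    """Flag a reading whose reconstruction error exceeds the threshold."""
    return reconstruction_error(reading, baseline) > threshold


# Each reading: [engine temperature, vibration] (hypothetical sensor values).
normal = [[20.0, 1.0], [22.0, 1.2], [21.0, 1.1]]
baseline = fit_baseline(normal)

normal_flag = detect_anomaly([21.5, 1.1], baseline, threshold=5.0)
defect_flag = detect_anomaly([95.0, 3.0], baseline, threshold=5.0)
```

In the KSQL statement above, a function like `detect_anomaly` plays the role of the `detectAnomaly` UDF: it is applied to every incoming event of the stream, and the result forms a new stream.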
55 Live Demo … Processing IoT data with MQTT Proxy and KSQL …
56 Deep Learning UDF for KSQL for Streaming Anomaly Detection of MQTT IoT Sensor Data https://github.com/kaiwaehner/ksql-udf-deep-learning-mqtt-iot
57 Poll What is the best choice for your IoT integration between MQTT and Kafka? 1. Kafka Connect 2. MQTT Proxy 3. REST Proxy 4. Custom Kafka Client (Java Client, Nifi, StreamSets, non-MQTT technology, …)
58 Kai Waehner Technology Evangelist kontakt@kai-waehner.de @KaiWaehner www.kai-waehner.de www.confluent.io LinkedIn Questions? Feedback? Please contact me!

IoT Integration with MQTT and Apache Kafka
