Apache NiFi 101
Timothy Spann | Developer Advocate
streamnative.io
Tim Spann, Developer Advocate
● https://www.datainmotion.dev/
● https://github.com/tspannhw/SpeakerProfile
● https://dev.to/tspannhw
● https://sessionize.com/tspann/
● https://streamnative.io/
DZone Zone Leader and Big Data MVB, Data DJay
Apache NiFi
Don’t Be Afraid of Open Source
Why Apache NiFi?
• Guaranteed delivery
• Data buffering
  - Backpressure
  - Pressure release
• Prioritized queuing
• Flow-specific QoS
  - Latency vs. throughput
  - Loss tolerance
• Data provenance
• Supports push and pull models
• Hundreds of processors
• Visual command and control
• Over sixty sources
• Flow templates
• Pluggable/multi-role security
• Designed for extension
• Clustering
• Version control
Architecture https://nifi.apache.org/docs/nifi-docs/html/overview.html
Provenance https://www.datainmotion.dev/2021/01/automating-starting-services-in-apache.html
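Conceptually, provenance is an append-only log of events (receive, modify, send, and so on) keyed by FlowFile, which lets you reconstruct any FlowFile's lineage after the fact. The Python sketch below is a toy model under that assumption, not NiFi's provenance repository: the `ProvenanceLog` class and its methods are hypothetical, while the event type names (RECEIVE, ATTRIBUTES_MODIFIED, SEND) and processor names are borrowed from NiFi purely for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Tuple


@dataclass(frozen=True)
class ProvenanceEvent:
    flowfile_uuid: str   # which FlowFile the event describes
    event_type: str      # e.g. "RECEIVE", "ATTRIBUTES_MODIFIED", "SEND"
    component: str       # name of the processor that emitted the event
    timestamp: datetime


class ProvenanceLog:
    """Hypothetical append-only provenance store (toy model, not NiFi's)."""

    def __init__(self) -> None:
        self._events: List[ProvenanceEvent] = []

    def record(self, flowfile_uuid: str, event_type: str, component: str) -> ProvenanceEvent:
        # Events are only ever appended, never updated, so history is preserved.
        event = ProvenanceEvent(flowfile_uuid, event_type, component,
                                datetime.now(timezone.utc))
        self._events.append(event)
        return event

    def lineage(self, flowfile_uuid: str) -> List[Tuple[str, str]]:
        # Replay every event for one FlowFile, in the order it happened.
        return [(e.event_type, e.component)
                for e in self._events if e.flowfile_uuid == flowfile_uuid]
```

For example, recording RECEIVE from ConsumeMQTT, ATTRIBUTES_MODIFIED from UpdateAttribute, and SEND from PutFile for one FlowFile lets `lineage` replay that path even when other FlowFiles' events are interleaved in the log.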
Backpressure & Prioritizers https://www.datainmotion.dev/2019/11/exploring-apache-nifi-110-parameters.html
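Backpressure and prioritizers both live on a connection, the queue between two processors. The Python sketch below models a single connection: once the queued object count or byte size reaches a threshold, back pressure engages. In NiFi the upstream processor would simply stop being scheduled; here `offer` returns False instead. The `Connection` class is hypothetical, the defaults mirror NiFi's usual per-connection thresholds (10,000 FlowFiles, 1 GB), and the ordering by a numeric "priority" attribute (smaller first) is an assumption, not a copy of any NiFi prioritizer.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    content: bytes
    attributes: dict = field(default_factory=dict)


class Connection:
    """Toy model of a NiFi connection: a prioritized queue with back pressure."""

    def __init__(self, max_objects=10_000, max_bytes=1024 ** 3):
        self.max_objects = max_objects
        self.max_bytes = max_bytes
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: equal priorities stay first-in, first-out
        self._bytes = 0

    def back_pressure_engaged(self):
        # Once either threshold is reached, upstream should stop producing.
        return len(self._heap) >= self.max_objects or self._bytes >= self.max_bytes

    def offer(self, flowfile):
        """Enqueue a FlowFile; return False if back pressure is engaged."""
        if self.back_pressure_engaged():
            return False
        # Assumed prioritizer: smaller numeric "priority" attribute is served first;
        # FlowFiles without the attribute go after all prioritized ones.
        priority = float(flowfile.attributes.get("priority", float("inf")))
        heapq.heappush(self._heap, (priority, next(self._seq), flowfile))
        self._bytes += len(flowfile.content)
        return True

    def poll(self):
        """Dequeue the highest-priority FlowFile, or None if the queue is empty."""
        if not self._heap:
            return None
        _, _, flowfile = heapq.heappop(self._heap)
        self._bytes -= len(flowfile.content)
        return flowfile
```

With `max_objects=2`, a third `offer` is refused until `poll` drains one FlowFile, which is the "pressure release" half of the buffering story.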
System Diagnostics
FlowFile https://nifi.apache.org/docs/nifi-docs/html/overview.html
A FlowFile represents each event, message, or file introduced into NiFi. It consists of content (the data itself) plus attributes (key/value pairs describing that data).
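A FlowFile can be pictured as a small immutable record pairing content bytes with an attribute map. The Python sketch below is an illustrative model only: the `FlowFile` class, `with_attribute`, and `ingest` are hypothetical names, though `uuid` and `filename` are real NiFi core attributes. Updating an attribute returns a new object, much as NiFi's `ProcessSession.putAttribute` hands back an updated FlowFile reference.

```python
import uuid
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class FlowFile:
    content: bytes                                   # the payload itself
    attributes: dict = field(default_factory=dict)   # key/value metadata

    def with_attribute(self, key: str, value: str) -> "FlowFile":
        # Copy the attribute map so the original FlowFile is left untouched.
        new_attrs = dict(self.attributes)
        new_attrs[key] = value
        return replace(self, attributes=new_attrs)


def ingest(content: bytes, filename: str) -> FlowFile:
    """Hypothetical entry point: wrap incoming data with core attributes."""
    return FlowFile(content, {"uuid": str(uuid.uuid4()), "filename": filename})
```

Adding `mime.type` to an ingested FlowFile yields a second FlowFile with the same content and `uuid`, while the first one keeps its original attributes.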
Version Control (GitHub and Beyond)
Repositories https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html#repositories
Record Processors https://www.datainmotion.dev/2019/03/advanced-xml-processing-with-apache.html
● XML, CSV, JSON, Avro, and more
● Schemas or inferred schemas
● Easily convert between them
● Support SQL with Apache Calcite
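Converting between record formats with an inferred schema boils down to: read every record, infer a type per field, then write the records out in the target format. The Python sketch below loosely mirrors what ConvertRecord does with a CSVReader that infers its schema and a JsonRecordSetWriter. The `csv_to_json` function and its three-type inference rule (int, then float, then string, widening when values disagree) are simplifying assumptions, not NiFi's inference logic.

```python
import csv
import io
import json

# Wider types can represent narrower ones: int < float < string.
_ORDER = {"int": 0, "float": 1, "string": 2}
_CASTS = {"int": int, "float": float, "string": str}


def _type_of(text):
    """Guess the narrowest type that can hold one CSV value."""
    try:
        int(text)
        return "int"
    except ValueError:
        pass
    try:
        float(text)
        return "float"
    except ValueError:
        return "string"


def csv_to_json(csv_text):
    """Infer a schema from all rows, then emit the records as a JSON array."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    schema = {}
    for row in rows:
        for name, value in row.items():
            inferred = _type_of(value)
            # Widen the field type if this row needs a wider one.
            if name not in schema or _ORDER[inferred] > _ORDER[schema[name]]:
                schema[name] = inferred
    records = [{name: _CASTS[schema[name]](value) for name, value in row.items()}
               for row in rows]
    return schema, json.dumps(records)
```

A column holding both `21.5` and `22` is widened to float, so every record in the output uses the same type for that field, which is the point of having one schema per record set.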
Caching https://dev.to/tspannhw/flank-using-apache-kudu-as-a-cache-for-fda-updates-4knj
Metrics, Status, Charts https://www.clouddataops.dev/data-flow-experience
DevOps https://www.datainmotion.dev/2021/01/automating-starting-services-in-apache.html https://nipyapi.readthedocs.io/en/latest/
nifi-toolkit/bin/cli.sh nifi list-param-contexts -u http://edge2ai-1.dim.local:8080
nifi-toolkit/bin/cli.sh nifi pg-list -u http://edge2ai-1.dim.local:8080
nifi-toolkit/bin/cli.sh nifi pg-set-param-context -u http://edge2ai-1.dim.local:8080 ...
DevOps https://dev.to/tspannhw/automating-starting-services-in-apache-nifi-and-applying-parameters-5h4n https://github.com/tspannhw/ApacheConAtHome2020/blob/main/scripts/setupnifi.sh
nifi pg-list
nifi pg-status
nifi pg-get-services
nifi pg-enable-services -u http://edge2ai-1.dim.local:8080 --processGroupId root
nifi pg-start -u http://edge2ai-1.dim.local:8080 -pgid LOOKTHISUP
nifi list-param-contexts -u http://edge2ai-1.dim.local:8080 -verbose
nifi create-reporting-task -u http://edge2ai-1.dim.local:8080 -verbose -i
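Because these CLI calls are order-sensitive (controller services should be enabled before the process group that uses them is started), it can help to script them. The Python sketch below only assembles argument lists and, when `dry_run` is False, runs them with `subprocess`. The helpers `nifi_cmd`, `startup_plan`, and `run` are hypothetical; the CLI path, host URL, flags, and the `LOOKTHISUP` placeholder are copied from the commands on the slides and must be replaced with your own values.

```python
import shlex
import subprocess

# Values taken from the slide examples; substitute your own installation and host.
CLI = "nifi-toolkit/bin/cli.sh"
BASE_URL = "http://edge2ai-1.dim.local:8080"


def nifi_cmd(command, *args, base_url=BASE_URL):
    """Hypothetical helper: build argv for one NiFi CLI command."""
    return [CLI, "nifi", command, "-u", base_url, *args]


def startup_plan(pg_id):
    """Enable controller services first, then start the process group."""
    return [
        nifi_cmd("pg-enable-services", "--processGroupId", "root"),
        nifi_cmd("pg-start", "-pgid", pg_id),
    ]


def run(plan, dry_run=True):
    for argv in plan:
        print(shlex.join(argv))  # log the exact command line
        if not dry_run:
            # check=True stops the sequence if any command fails.
            subprocess.run(argv, check=True)
```

Calling `run(startup_plan(...))` with the default `dry_run=True` only prints the commands, which makes it safe to review a migration before pointing it at a live cluster.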
Consume MQTT
This could also read from Apache Pulsar through MoP (MQTT on Pulsar).
Listen FTP
Let Apache NiFi be your FTP server.
No More Spaghetti Flows - DO NOT https://dev.to/tspannhw/no-more-spaghetti-flows-2emd
Do Not
● Do not put 1,000 flows on one workspace.
● If your flow has hundreds of steps, that is a flow smell. Investigate why.
● Do not use ExecuteProcess, ExecuteScript, or lots of Groovy scripts as a default; look for existing processors first.
● Do not use random custom processors that have no documentation or are unknown.
● Do not forget to upgrade: if you are running anything before Apache NiFi 1.14, upgrade now!
● Do not run on the default 512 MB of JVM heap.
● Do not run one node and think you have a highly available cluster.
● Do not split a file with millions of records into individual records in one shot without checking available space/memory and back pressure.
● Use Split processors only as an absolute last resort. Many processors are designed to work on FlowFiles that contain many records or many lines of text. Keeping records together in one FlowFile instead of splitting them apart can often improve performance by one to two orders of magnitude.
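When a large file really must be divided, splitting into batches rather than single records keeps the number of FlowFiles (and their per-FlowFile overhead) small; SplitRecord's "Records Per Split" property exists for this. The Python generator below is a minimal sketch of that idea, not NiFi code: `batches` is a hypothetical helper that streams any iterable into lists of at most `size` records without loading the whole input first.

```python
from itertools import islice


def batches(records, size):
    """Yield lists of up to `size` records, streaming from any iterable."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(records)
    # islice pulls at most `size` items; an empty chunk means the input is exhausted.
    while chunk := list(islice(it, size)):
        yield chunk
```

For 2,500,000 records with `size=100_000`, this produces 25 units of work instead of 2,500,000 single-record FlowFiles.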
No More Spaghetti Flows - DO https://dev.to/tspannhw/no-more-spaghetti-flows-2emd
Do
● Reduce, reuse, recycle. Use Parameters to reuse common modules.
● Put flows and reusable chunks (write to Slack, a database, Kafka) into separate Process Groups.
● Write custom processors if you need new or specialized features.
● Use Record processors everywhere.
● Read the docs!
● Use the NiFi Registry for version control.
● Use the NiFi CLI and DevOps tooling for migrations.
● Walk through your flow and make sure you understand every step and that it’s easy to read and follow. Is every processor used? Are there dead ends?
● Run ZooKeeper on different nodes from Apache NiFi.
● Use routing based on content and attributes so that one flow can handle multiple nearly identical flows; this is better than deploying the same flow many times in the same cluster with tweaked parameters.
● Use the correct driver for your database; there are often several different JDBC drivers to choose from.
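Routing lets one flow serve several near-duplicate use cases: each FlowFile is tested against a set of named rules and sent down every matching path, or to "unmatched". The Python sketch below loosely mirrors RouteOnAttribute (attribute rules) and RouteOnContent (content rules) together; the `route` function, the `RULES` table, and the `source` attribute values are hypothetical examples, not NiFi configuration.

```python
from typing import Callable, Dict, List

# A rule inspects a FlowFile's attributes and content and returns True on a match.
Rule = Callable[[dict, bytes], bool]


def route(attributes: dict, content: bytes, rules: Dict[str, Rule]) -> List[str]:
    """Return every matching rule name, or ["unmatched"] if none match."""
    matched = [name for name, rule in rules.items() if rule(attributes, content)]
    return matched or ["unmatched"]


# Hypothetical rules: one shared flow handles two sensor feeds plus alert detection.
RULES: Dict[str, Rule] = {
    "sensor_a": lambda attrs, body: attrs.get("source") == "sensor-a",
    "sensor_b": lambda attrs, body: attrs.get("source") == "sensor-b",
    "alerts": lambda attrs, body: b'"alert"' in body,
}
```

Adding a third sensor feed then means adding one rule to the table rather than deploying another copy of the whole flow.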
Apache MXNet Native Processor through DJL.AI for Apache NiFi
This processor uses the DJL.AI Java interface.
https://github.com/tspannhw/nifi-djl-processor
https://dev.to/tspannhw/easy-deep-learning-in-apache-nifi-with-djl-2d79
What are the Benefits of Pulsar?
● Data durability
● Scalability
● Geo-replication
● Multi-tenancy
● Unified messaging model
Apache Pulsar
A Unified Messaging Platform: Message Queuing and Data Streaming
Demo
Wrap-Up
Founded by the original developers of Apache Pulsar and Apache BookKeeper, StreamNative builds a cloud-native event streaming platform that enables enterprises to easily access data as real-time event streams.
Interested In Learning More?
Resources: Flink SQL Cookbook; The GitHub Source for Flink SQL Demo; The GitHub Source for Demo
Free eBooks: Manning's Apache Pulsar in Action; O’Reilly Book
Upcoming Events: [11/8] PASS Data Community; [11/18] Developer Week Austin; [11/19] Porto Tech Hub Con; [12/3] Data Science Camp
We’re Hiring streamnative.io/careers/
Let’s Keep in Touch!
@PassDev
https://www.linkedin.com/in/timothyspann
https://github.com/tspannhw

API World: Apache NiFi 101