November 8, 2024 Unstructured Data Processing with a Raspberry Pi AI Kit and Python
Speaker
Tim Spann
Principal Developer Advocate, Zilliz
tim.spann@zilliz.com
https://www.linkedin.com/in/timothyspann/
https://x.com/paasdev
https://github.com/tspannhw
https://github.com/milvus-io/milvus
Today's Slides
Code 1 https://bit.ly/4ftn04t
Code 2 https://bit.ly/4ebEPUJ
Walk Through Article https://bit.ly/4hxjvvF
Agenda
01 Introduction
● Unstructured Data
● Vector Databases
● Similarity Search
● Milvus
02 Overview of the Raspberry Pi 5 AI Kit
● Human Pose Estimation
● Processing images with pre-trained models from Hailo
03 App and Demo
● Running an edge AI application connected to the cloud
● Integrating AI models with Ollama
● Utilizing, querying, and visualizing data with Milvus, Slack, and other tools
04 Next Steps
● Challenges, Limitations, and Alternatives
8 | © Copyright Zilliz 8 01 Introduction
The Challenge of Unstructured Data
● Problem: Unstructured data comes in many forms, and there is no easy way to interact with all of it
● Solution: Vector embeddings
● How: Neural networks, e.g. embedding models
Vector Databases
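To make "vector embeddings" concrete, here is a minimal sketch of how similarity between two embeddings can be scored. The four-dimensional vectors are made up for illustration (real embedding models emit hundreds of dimensions), and cosine similarity is only one of several common metrics.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, invented for this example.
cat = [0.9, 0.1, 0.0, 0.2]
kitten = [0.8, 0.2, 0.1, 0.3]
truck = [0.0, 0.9, 0.8, 0.1]

# Semantically close items should score higher than unrelated ones.
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, truck))
```

An embedding model places similar inputs near each other, so ranking by a score like this is what turns raw images or text into something searchable.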
Why is Semantic Search so important?
● 90% of newly generated data in 2025 will be unstructured data
● 10% other
Data source: The Digitization of the World by IDC
What is Milvus ideal for?
Purpose-built to store, index and query vector embeddings from unstructured data at scale.
• Advanced filtering
• Hybrid search
• Durability and backups
• Replication/High Availability
• Sharding
• Aggregations
• Lifecycle management
• Multi-tenancy
• High query load
• High insertion/deletion
• Full precision/recall
• Accelerator support (GPU, FPGA)
• Billion-scale storage
We've built technologies for various types of use cases
Compute Types
• Designed for various compute hardware, such as AVX512 and Neon for SIMD, quantization, cache-aware optimization, and GPU
• Leverage the strengths of each hardware type, ensuring high-speed processing and cost-effective scalability for different application needs
Search Types
• Support multiple search types such as top-K ANN, range ANN, sparse and dense, multi-vector, grouping, and metadata filtering
• Enable query flexibility and accuracy, allowing developers to tailor their information retrieval needs
Multi-tenancy
• Enable multi-tenancy through collection and partition management
• Allow for efficient resource utilization and customizable data segregation, ensuring secure and isolated data handling for each tenant
Index Types
• Offer support for a wide range of 15 index types, including popular ones like HNSW, PQ, Binary, Sparse, DiskANN, and GPU indexes
• Empower developers with tailored search optimizations, catering to performance, accuracy, and cost needs
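The search types named above differ mainly in how the result set is bounded. The sketch below shows top-K, range, and metadata-filtered search as an exact brute-force scan over a few made-up rows. It is not how Milvus implements them (ANN indexes trade a little recall for speed), and the names `ROWS`, `top_k`, and `range_search` are hypothetical.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical rows: a tiny embedding plus scalar metadata.
ROWS = [
    {"id": 1, "vector": [0.0, 0.0], "label": "person"},
    {"id": 2, "vector": [1.0, 0.0], "label": "person"},
    {"id": 3, "vector": [0.0, 2.0], "label": "dog"},
    {"id": 4, "vector": [3.0, 3.0], "label": "person"},
]


def top_k(query, k, predicate=None):
    """Top-K search: the k nearest rows, optionally restricted by a metadata predicate."""
    candidates = [r for r in ROWS if predicate is None or predicate(r)]
    ranked = sorted(candidates, key=lambda r: l2(query, r["vector"]))
    return [r["id"] for r in ranked[:k]]


def range_search(query, radius):
    """Range search: every row within `radius` of the query, nearest first."""
    hits = [r for r in ROWS if l2(query, r["vector"]) <= radius]
    return [r["id"] for r in sorted(hits, key=lambda r: l2(query, r["vector"]))]


query = [0.1, 0.0]
print(top_k(query, 2))                                   # bounded by count
print(range_search(query, 2.5))                          # bounded by distance
print(top_k(query, 2, lambda r: r["label"] == "dog"))    # bounded by metadata first
```

Top-K always returns a fixed number of hits even when none are close, while range search returns only what falls inside the radius, which may be nothing.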
Vector Database: making sense of unstructured data
A vector database stores embedding vectors and allows for semantic retrieval of various types of unstructured data.
02 Overview of Pi
Raspberry Pi 5 AI Kit
Raspberry Pi 5 with 8 GB of RAM
The AI Kit adds a neural network inference accelerator capable of 13 tera-operations per second (TOPS), which is pretty good for $70 US.
Attached to the M.2 HAT is the Hailo-8L M.2 Entry-Level Acceleration Module, which gives us our AI powers.
What is it?
Human Pose Estimation is a computer vision technique that locates and estimates body keypoints such as eyes, ears, shoulders, and joints in motion.
It looks pretty cool and has some interesting applications in medicine and robotics. For me, it was one of the cool examples that runs on the AI Kit.
https://paperswithcode.com/task/pose-estimation (1,431 papers with code)
Pose Estimation by Hailo-8L
● Each person is identified and represented by 17 keypoints
● Examples: nose, eyes, ears, shoulders, elbows, wrists, hips, knees, and ankles
● We are tracking eyes and more (updated today)
https://github.com/hailo-ai/hailo_model_zoo/blob/master/docs/public_models/HAILO8/HAILO8_pose_estimation.rst
https://github.com/tensorboy/centerpose
https://softwaremill.com/human-pose-estimation-2023-guide/
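The 17 keypoints follow the COCO ordering, so a raw model output can be labeled by position. The sketch below assumes each keypoint arrives as an (x, y) pair normalized to the frame; the actual Hailo pipeline may normalize differently (for example relative to the detection box), and `label_keypoints` is a hypothetical helper, not part of any Hailo library.

```python
# COCO keypoint order: index i in the model output corresponds to COCO_KEYPOINTS[i].
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def label_keypoints(points, width, height):
    """Map 17 normalized (x, y) pairs to named pixel-coordinate strings."""
    if len(points) != len(COCO_KEYPOINTS):
        raise ValueError(f"expected {len(COCO_KEYPOINTS)} keypoints, got {len(points)}")
    labeled = {}
    for name, (x, y) in zip(COCO_KEYPOINTS, points):
        # Assumption: x and y are fractions of the frame width and height.
        labeled[name] = f"x: {x * width:.2f} y: {y * height:.2f}"
    return labeled


# Made-up input: every keypoint at the same spot, just to show the mapping.
points = [(0.5, 0.25)] * 17
print(label_keypoints(points, width=640, height=640)["nose"])
```

The string format mirrors the "x: ... y: ..." fields used in the talk's own code. The field names here use COCO's underscore style rather than the talk's `lefteye` style.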
Pose Estimation on Hailo-8L
Task: Pose Estimation
Dataset: COCO
Model: yolov8s_pose
Device: Hailo-8L
https://github.com/ultralytics/ultralytics
HAILO Raspberry Pi 5 Example Apps
https://github.com/hailo-ai/hailo-rpi5-examples
New: CLIP Zero Shot Inference Application
03 App and Demo
Show Me The Source Code
https://github.com/tspannhw/AIM-RPIAIKit-PoseEstimation
https://github.com/hailo-ai/hailo-rpi5-examples

    # Format the eye keypoints as readable coordinate strings
    lefteye = f"x: {left_eye_x:.2f} y: {left_eye_y:.2f}"
    righteye = f"x: {right_eye_x:.2f} y: {right_eye_y:.2f}"

    try:
        # Embed the captured image, then store the vector with its pose metadata
        imageembedding = extractor(strfilename)
        milvus_client.insert(
            COLLECTION_NAME,
            {"vector": imageembedding, "lefteye": lefteye, "righteye": righteye,
             "label": label, "confidence": confidence})
    except Exception as e:
        print("An error:", e)
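One way to harden the insert step is to validate the row before it leaves the device. The sketch below is an assumption-laden variation, not the repository's code: `build_row`, `insert_pose`, and `RecordingClient` are hypothetical names, and the only thing assumed about the real client is an `insert(collection_name=..., data=...)` method like the one pymilvus's `MilvusClient` provides.

```python
def build_row(embedding, keypoints, label, confidence, dim):
    """Build one row; `keypoints` maps a field name to an (x, y) pixel pair."""
    if len(embedding) != dim:
        raise ValueError(f"embedding has {len(embedding)} values, collection expects {dim}")
    row = {
        "vector": [float(v) for v in embedding],
        "label": label,
        "confidence": float(confidence),
    }
    for name, (x, y) in keypoints.items():
        row[name] = f"x: {x:.2f} y: {y:.2f}"
    return row


def insert_pose(client, collection_name, row):
    """Insert one row; report failure instead of raising so a capture loop keeps running."""
    try:
        client.insert(collection_name=collection_name, data=row)
        return True
    except Exception as e:
        print("An error:", e)
        return False


class RecordingClient:
    """Stand-in for a Milvus client (hypothetical); it only records what it is given."""

    def __init__(self):
        self.rows = []

    def insert(self, collection_name, data):
        self.rows.append((collection_name, data))


client = RecordingClient()
row = build_row([0.1, 0.2, 0.3], {"lefteye": (246, 60), "righteye": (213, 64)},
                label="person", confidence=0.84, dim=3)
print(insert_pose(client, "pose_demo", row), row["lefteye"])
```

Passing the client in as a parameter keeps the row-building logic testable on a laptop with no Milvus server or Hailo hardware attached.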
Edge Unstructured Data
• Vision to Images and Videos
• Audio from Cameras and Microphones
• Raw Text
• Edge Neural Networks and Gen AI
• Unstructured Data Processing and Vector DB
Video
Video
"rank": 8
"id": 451727117998321522
"score": "0.81195295"
"lefteye": "x: 246.00 y: 60.00"
"leftshoulder": "x: 292.00 y: 111.00"
"ogfilename": "personpose.jpg"
"leftwrist": "x: 521.00 y: 176.00"
"righthip": "x: 218.00 y: 409.00"
"rightankle": "x: 260.00 y: 535.00"
"rightknee": "x: 246.00 y: 559.00"
"rightear": "x: 171.00 y: 55.00"
"leftankle": "x: 381.00 y: 545.00"
"height": "640"
"nose": "x: 236.00 y: 83.00"
"leftear": "x: 255.00 y: 60.00"
"lefthip": "x: 339.00 y: 386.00"
"confidence": 0.8405382
"rightwrist": "x: 106.00 y: 423.00"
"label": "person"
"url": "https://iili.io/dEqvDdl.jpg"
"sizeformatted": "84 KB"
"righteye": "x: 213.00 y: 64.00"
"rightshoulder": "x: 101.00 y: 139.00"
"filename": "dEqvDdl.jpg"
"size": "84036"
"width": "640"
"leftelbow": "x: 395.00 y: 195.00"
"mimetype": "image/jpeg"
"rightelbow": "x: 64.00 y: 302.00"
"leftknee": "x: 395.00 y: 531.00"
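Because the keypoints in this record are stored as strings, any downstream geometry needs them parsed back into numbers. The following is a small sketch using two fields copied from the record above; `parse_point` is a hypothetical helper written for this example.

```python
import math
import re

# Matches strings such as "x: 246.00 y: 60.00".
COORD = re.compile(r"x:\s*(-?\d+(?:\.\d+)?)\s+y:\s*(-?\d+(?:\.\d+)?)")


def parse_point(text):
    """Turn an 'x: <num> y: <num>' string into an (x, y) tuple of floats."""
    match = COORD.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a coordinate string: {text!r}")
    return float(match.group(1)), float(match.group(2))


# Two fields taken from the search hit shown above.
hit = {"lefteye": "x: 246.00 y: 60.00", "righteye": "x: 213.00 y: 64.00"}

left = parse_point(hit["lefteye"])
right = parse_point(hit["righteye"])
eye_distance = math.dist(left, right)  # pixel distance between the eyes
print(left, right, round(eye_distance, 2))
```

Storing x and y as separate numeric fields would avoid the parsing step and allow filtering on them, at the cost of a wider schema. Whether that trade is worthwhile depends on how the data is queried.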
04 Next Steps
Challenges
● Reduced Memory
● Limited Processing Power
● New Kit and Library
Alternatives
● Just-released Raspberry Pi AI HAT+ with 26 TOPS
● NVIDIA Jetson Series
● Smart Cameras like OAK-D
● Specialized Devices
Takeaways
● Closer is better
● Empowering AI Robots
● Vector Search Everywhere
● Keep your data and computation close
Q&A
Questions
● Edge AI
● Edge Hardware
● Milvus
● Vector Databases
RESOURCES
Vector Database Resources
Give Milvus a Star!
Chat with me on Discord!
https://github.com/milvus-io/milvus
Unstructured Data Meetup
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working with unstructured data. Speakers present on related topics such as vector databases, LLMs, and managing data at scale. The intended audience includes machine learning engineers, data scientists, data engineers, software engineers, and PMs.
This meetup was formerly the Milvus Meetup and is sponsored by Zilliz, the maintainers of Milvus.
https://medium.com/@tspann/unstructured-data-processing-with-a-raspberry-pi-ai-kit-c959dd7fff47 Raspberry Pi AI Kit Hailo Edge AI
https://medium.com/@tspann/from-the-edge-to-the-cloud-and-back-again-01095e95a783 Raspberry Pi AI Kit Hailo Edge AI Pose Estimation
https://medium.com/@tspann/unstructured-street-data-in-new-york-8d3cde0a1e5b
Extracting Value from Unstructured Data
Example
• A company has hundreds of thousands of pages of proprietary documentation to enable its staff to service customers.
Problem
• Searching can be slow, inefficient, or lack context.
Solution
• Create an internal chatbot with ChatGPT and a vector database enriched with company documentation to provide direction and support to employees and customers.
https://osschat.io/chat
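The chatbot pattern above comes down to retrieving relevant documentation chunks and placing them in the prompt. The sketch below covers only the prompt-assembly half, under the assumption that retrieval has already returned the chunks, best match first. `build_prompt`, the character budget, and the sample chunks are all hypothetical.

```python
def build_prompt(question, chunks, max_chars=1500):
    """Assemble a grounded prompt from retrieved documentation chunks."""
    context_parts = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        entry = f"[{i}] {chunk.strip()}"
        # Stop once the next chunk would exceed the context budget.
        if used + len(entry) > max_chars:
            break
        context_parts.append(entry)
        used += len(entry)
    context = "\n".join(context_parts)
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Made-up chunks standing in for vector-search results, best match first.
chunks = [
    "Reset the router by holding the button for 10 seconds.",
    "Warranty claims need a receipt.",
]
print(build_prompt("How do I reset the router?", chunks))
```

Numbering the chunks lets the model cite which passage an answer came from, and the budget keeps long documents from crowding out the question.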
Well-connected in LLM infrastructure to enable RAG use cases
[Diagram: Framework, Hardware Infrastructure, Embedding Models, LLMs, Software Infrastructure, Vector Database]
Thank You

tspann08-Nov-2024_PyDataNYC_Unstructured Data Processing with a Raspberry Pi AI Kit and Python
