A scalable inference platform that provides multi-node management and control for AI/ML inference workloads.
It enables easy deployment and management of inference pipelines across distributed nodes with auto-discovery, telemetry, and flexible result publishing.
(Demo video: quick demo of the computer vision inference pipeline builder.)
Key features:

- Multi-engine support: Ultralytics YOLO, Geti, ONNX Runtime, and custom engines
- Auto-discovery: Nodes automatically discover each other on the network
- Real-time telemetry: System monitoring and performance metrics via MQTT
- Flexible result publishing: MQTT, webhooks, serial, and custom destinations
- RESTful API: Complete HTTP API for remote management
- Rate limiting: Built-in rate limiting for all result destinations
Supported inference engines:

- Ultralytics: YOLO object detection models (YOLOv8, YOLOv11, etc.)
- Geti: Intel's computer vision platform
- ONNX Runtime: Cross-platform ML model inference with CPU, OpenVINO, and GPU acceleration
- Pass-through: For testing and development
- Custom: Extensible framework for custom implementations
Supported result destinations (a configuration sketch follows the list):

- MQTT: Publish results to MQTT brokers
- Webhook: HTTP POST to custom endpoints
- Serial: Output to serial ports (RS-232, USB)
- OPC UA: Industrial automation protocol
- ROS2: Robot Operating System 2
- ZeroMQ: High-performance messaging
- Folder: Save to local/network filesystem
- Roboflow: Integration with Roboflow platform
- Geti: Geti platform integration
- Custom: Implement your own destinations
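All destinations follow the same configure-and-add pattern shown in the quick-start examples further down. The sketch below is illustrative only; the folder/serial option names (`path`, `port`, `baudrate`) are assumptions rather than the plugins' documented parameters, so check the plugin source for the exact options.

```python
from ResultPublisher import ResultPublisher, ResultDestination

rp = ResultPublisher()

# Save results to a local folder (the 'path' option name is hypothetical)
rd_folder = ResultDestination('folder')
rd_folder.configure(path='./results')
rp.add(rd_folder)

# Stream results over a serial port (the 'port'/'baudrate' option names are hypothetical)
rd_serial = ResultDestination('serial')
rd_serial.configure(port='COM3', baudrate=115200, rate_limit=0.1)
rp.add(rd_serial)

rp.publish({"inference_results": "data"})
```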
Requirements:

- Python 3.10+
- Compatible with Windows, Linux
- Optional: CUDA for GPU acceleration
- Optional: MQTT broker for telemetry and result publishing
Note: Only tested on a limited set of configurations so far, (Windows / Ubuntu) x (Intel / NVIDIA); AMD and other configurations are on the to-do list.
Clone the repository and run the setup script:

```bash
# Clone the repository
git clone https://github.com/olkham/inference_node.git
cd inference_node

# Run the setup script (Windows)
setup.bat

# Or on Linux/macOS
chmod +x setup.sh
./setup.sh
```

Install the Python dependencies:

```bash
# Install core dependencies
pip install -r requirements.txt

# Optional: Install AI/ML frameworks (if not already in requirements.txt)
pip install torch torchvision ultralytics geti-sdk

# Optional: Install ONNX Runtime (choose based on your hardware)
pip install "onnxruntime>=1.16.0"                      # CPU version
pip install "onnxruntime-openvino>=1.16.0"             # Intel OpenVINO acceleration
pip install "onnxruntime-gpu[cuda12,cudnn]>=1.16.0"    # NVIDIA GPU acceleration

# Or use optional dependency groups from pyproject.toml
pip install -e .[onnx]           # CPU version
pip install -e .[onnx-openvino]  # Intel OpenVINO
pip install -e .[onnx-gpu]       # NVIDIA GPU

# Optional: Install GPU monitoring (uses nvidia-ml-py, not deprecated pynvml)
pip install "nvidia-ml-py>=12.0.0"

# Optional: Install serial communication
pip install "pyserial>=3.5"
```

Start a node from Python:

```python
from InferenceNode import InferenceNode

# Create and start a node
node = InferenceNode("MyNode", port=5555)
node.start(enable_discovery=True, enable_telemetry=True)
```

Or use the command line:
```bash
# Start full node with all services using Flask
python main.py

# Start full node with all services using waitress (production mode)
python main.py --production

# Start with custom settings
python main.py --port 8080 --name "ProductionNode" --no-telemetry
```

Create and use inference engines:

```python
from InferenceEngine import InferenceEngine

# Create different engine types
ie_ultralytics = InferenceEngine('ultralytics')
ie_onnx = InferenceEngine('onnx')
ie_geti = InferenceEngine('geti')
ie_custom = InferenceEngine('custom')

# Upload and load a model (Ultralytics example)
model_id = ie_ultralytics.upload('path/to/model.pt')
ie_ultralytics.load(model_id, device='cuda')

# Upload and load an ONNX model
onnx_model_id = ie_onnx.upload('path/to/model.onnx')
ie_onnx.load(onnx_model_id, device='cpu')

# Run inference
result = ie_ultralytics.infer('path/to/image.jpg')
onnx_result = ie_onnx.infer('path/to/image.jpg')
```

Publish results to one or more destinations:

```python
from ResultPublisher import ResultPublisher, ResultDestination

# Create result publisher
rp = ResultPublisher()

# Configure MQTT destination
rd_mqtt = ResultDestination('mqtt')
rd_mqtt.configure(
    server='localhost',
    topic='infernode/results',
    rate_limit=1.0  # 1 second between publishes
)
rp.add(rd_mqtt)

# Configure webhook destination
rd_webhook = ResultDestination('webhook')
rd_webhook.configure(
    url='http://myserver.com/webhook',
    rate_limit=0.5
)
rp.add(rd_webhook)

# Publish results
rp.publish({"inference_results": "data"})
```

Node information:

```
GET /api/info
```

Returns node capabilities and status.
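The endpoint can be queried with any HTTP client; a minimal sketch using the `requests` library, assuming a node running on the quick-start port 5555:

```python
import requests

# Query node capabilities and status (address/port assumed from the quick-start example)
response = requests.get("http://localhost:5555/api/info", timeout=5)
response.raise_for_status()
print(response.json())
```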
Engine management:

```
# Load an inference engine
POST /api/engine/load
{
    "engine_type": "ultralytics",
    "model_id": "model_123",
    "device": "cuda"
}

# Upload a model
POST /api/engine/upload
# Form data with file upload
```

Run inference:

```
POST /api/inference
# Form data with image file or JSON with image_path
```

Configure result publishing:

```
POST /api/publisher/configure
{
    "type": "mqtt",
    "config": {
        "server": "localhost",
        "topic": "results",
        "rate_limit": 1.0
    }
}
```

Control telemetry:

```
# Start telemetry
POST /api/telemetry/start
{
    "mqtt": {
        "mqtt_server": "localhost",
        "mqtt_topic": "telemetry"
    }
}

# Stop telemetry
POST /api/telemetry/stop
```

Project structure:

```
inference_node/
├── InferenceEngine/                    # Inference engine implementations
│   ├── engines/
│   │   ├── base_engine.py              # Base class for all engines
│   │   ├── ultralytics_engine.py       # Ultralytics YOLO support
│   │   ├── geti_engine.py              # Geti support
│   │   ├── onnx_engine.py              # ONNX Runtime support
│   │   ├── pass_engine.py              # Pass-through engine
│   │   └── example_engine_template.py  # Custom engine template
│   ├── inference_engine_factory.py
│   └── result_converters.py
├── InferenceNode/                      # Main node implementation
│   ├── inference_node.py               # Core node class
│   ├── pipeline_manager.py             # Pipeline orchestration
│   ├── pipeline.py                     # Pipeline definitions
│   ├── discovery_manager.py            # Network discovery
│   ├── telemetry.py                    # System telemetry
│   ├── model_repo.py                   # Model repository
│   ├── hardware_detector.py            # Hardware detection
│   ├── log_manager.py                  # Logging
│   ├── static/                         # Web UI assets
│   └── templates/                      # Web UI templates
├── ResultPublisher/                    # Result publishing system
│   ├── publisher.py                    # Main publisher class
│   ├── base_destination.py             # Base destination class
│   ├── result_destinations.py          # Built-in destinations
│   └── plugins/                        # Pluggable destinations
│       ├── mqtt_destination.py
│       ├── webhook_destination.py
│       ├── serial_destination.py
│       ├── opcua_destination.py
│       ├── ros2_destination.py
│       ├── zeromq_destination.py
│       ├── folder_destination.py
│       ├── roboflow_destination.py
│       ├── geti_destination.py
│       └── null_destination.py
├── main.py                             # Entry point
├── setup.bat                           # Windows setup script
├── setup.sh                            # Linux/macOS setup script
├── requirements.txt                    # Dependencies
├── pyproject.toml                      # Project configuration
├── Dockerfile                          # Docker container
├── docker-compose.yml                  # Docker compose configuration
└── readme.md                           # This file
```

The node can be configured through:
- Command-line arguments: `python main.py --port 8080 --name "MyNode"`
- Web UI: Access the dashboard at `http://localhost:8080`
- REST API: Configure via API endpoints (see the sketch below)
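As an example of the REST API route, the publisher configuration payload shown earlier can be posted to a running node; a minimal sketch assuming a node on port 8080 and the `requests` library:

```python
import requests

# Add an MQTT result destination on a running node via the REST API
payload = {
    "type": "mqtt",
    "config": {
        "server": "localhost",
        "topic": "results",
        "rate_limit": 1.0
    }
}
response = requests.post("http://localhost:8080/api/publisher/configure", json=payload, timeout=5)
response.raise_for_status()
print(response.status_code)  # the exact response body depends on the API
```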
Default settings:
- Node Port: 5555
- Discovery: Enabled
- Telemetry: Disabled by default
- Model Repository: `InferenceNode/model_repository/models/`
- Pipelines: `InferenceNode/pipelines/`
InferNode provides comprehensive system monitoring:
- CPU usage and frequency
- Memory utilization
- Disk usage
- Network statistics
- GPU information (NVIDIA)
- Inference performance metrics
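The node's own collection code lives in `InferenceNode/telemetry.py`; the sketch below is only an illustration of how metrics like these can be gathered with `psutil` and the optional `nvidia-ml-py` package (`psutil` is an assumption here, not a documented dependency), not the project's actual implementation:

```python
import psutil

# CPU, memory, disk, and network counters via psutil
cpu = {"usage_percent": psutil.cpu_percent(interval=1), "count": psutil.cpu_count()}
mem = psutil.virtual_memory()
memory = {"usage_percent": mem.percent, "total_gb": round(mem.total / 1024**3, 1)}
disk = {"usage_percent": psutil.disk_usage("/").percent}
network = psutil.net_io_counters()._asdict()

# GPU information via nvidia-ml-py (exposes the pynvml module); optional
try:
    import pynvml
    pynvml.nvmlInit()
    devices = []
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        devices.append({
            "name": pynvml.nvmlDeviceGetName(handle),
            "utilization_percent": pynvml.nvmlDeviceGetUtilizationRates(handle).gpu,
        })
    gpu = {"available": True, "devices": devices}
except Exception:
    gpu = {"available": False, "devices": []}

print({"cpu": cpu, "memory": memory, "disk": disk, "network": network, "gpu": gpu})
```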
Telemetry data is published to MQTT in JSON format:
{ "node_id": "uuid-here", "timestamp": "2025-07-28T10:30:00Z", "cpu": {"usage_percent": 45.2, "count": 8}, "memory": {"usage_percent": 67.3, "total_gb": 16}, "gpu": {"available": true, "devices": [...]} }Nodes automatically discover each other using UDP broadcasts:
Nodes automatically discover each other using UDP broadcasts:

```python
from discovery import NodeDiscovery

# Discover nodes on network
discovered = NodeDiscovery.discover_nodes(timeout=5.0)
for node_id, info in discovered.items():
    print(f"Found node: {node_id} at {info['address']}")
```

To add a custom inference engine, subclass the base engine:

```python
from InferenceEngine.base_engine import BaseInferenceEngine

class MyCustomEngine(BaseInferenceEngine):
    def _load_model(self):
        # Load your model
        pass

    def _preprocess(self, image):
        # Preprocess input
        return processed_image

    def _infer(self, preprocessed_input):
        # Run inference
        return raw_output

    def _postprocess(self, raw_output):
        # Process results
        return final_results
```

To add a custom result destination, subclass the base destination:

```python
from ResultPublisher.result_destinations import BaseResultDestination

class MyCustomDestination(BaseResultDestination):
    def configure(self, **kwargs):
        # Configure your destination
        self.is_configured = True

    def _publish(self, data):
        # Publish data to your destination
        return True  # Success
```

To contribute:

- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
Known issues:

- Issue: Geti SDK support is limited to Python 3.10-3.13
- Impact: Users running Python 3.14+ cannot use Geti integration features
- Workaround:
- Use Python 3.10-3.13 for Geti functionality
- Or install in a separate virtual environment with a compatible Python version
- Geti SDK is an optional dependency and won't block installation on incompatible Python versions
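One way to keep a deployment working on an unsupported interpreter is to gate Geti usage on the Python version; an illustrative sketch using the `InferenceEngine` factory from the quick-start examples:

```python
import sys

# Geti SDK currently supports Python 3.10-3.13
GETI_SUPPORTED = (3, 10) <= sys.version_info[:2] <= (3, 13)

if GETI_SUPPORTED:
    from InferenceEngine import InferenceEngine
    ie_geti = InferenceEngine('geti')
else:
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: Geti integration unavailable")
```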
- Issue: On first run with Ultralytics models on Intel hardware, nodes may report failure to start
- Cause: Extra dependencies and model downloads required for OpenVINO conversion are not pre-installed
- Impact: Initial startup may fail or take longer than expected
- Workaround:
- Re-run the node after the initial failure - subsequent starts should work correctly
- The required dependencies will be downloaded automatically on first run
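Alternatively, the OpenVINO export can be triggered ahead of time so the first real start finds the converted model in place; a sketch assuming the `ultralytics` package and an illustrative model file:

```python
from ultralytics import YOLO

# Load (and download if needed) a YOLO model, then export it to OpenVINO format up front
# so the node does not have to perform the conversion on its first start.
model = YOLO("yolov8n.pt")        # illustrative model; use the model your pipeline loads
model.export(format="openvino")   # writes an <model>_openvino_model/ directory next to the .pt file
```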
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
For questions and support:
- Create an issue on GitHub
- Check the documentation
- Review the example code
Planned features:

- Web-based management interface
- Integration with FrameSource library
- Docker containers and orchestration
- Advanced load balancing
- Model versioning and A/B testing
- Enhanced pipeline builder UI
- Additional inference engine integrations