GPU-accelerated LLaMA inference wrapper for legacy Vulkan-capable systems: a Pythonic way to run AI with knowledge (Ilm) on fire (Vulkan).
🚀 ClipServe: A fast API server for embedding text, images, and performing zero-shot classification using OpenAI’s CLIP model. Powered by FastAPI, Redis, and CUDA for lightning-fast, scalable AI applications. Transform texts and images into embeddings or classify images with custom labels—all through easy-to-use endpoints. 🌐📊
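The zero-shot classification that ClipServe exposes boils down to comparing an image embedding against text embeddings for each candidate label and picking the closest match. A minimal sketch of that principle, using random toy vectors in place of real CLIP outputs (the function name and data here are illustrative assumptions, not ClipServe's actual API):

```python
import numpy as np

def zero_shot_classify(image_emb, label_embs, labels):
    """CLIP-style zero-shot classification: cosine similarity between the
    image embedding and each label's text embedding, softmax to scores."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = label_embs / np.linalg.norm(label_embs, axis=1, keepdims=True)
    sims = txt @ img                             # cosine similarity per label
    scores = np.exp(sims) / np.exp(sims).sum()   # softmax over labels
    return labels[int(np.argmax(sims))], scores

# Toy embeddings stand in for real CLIP model outputs (hypothetical data).
rng = np.random.default_rng(0)
label_embs = rng.normal(size=(3, 8))
labels = ["cat", "dog", "car"]
image_emb = label_embs[1] + 0.05 * rng.normal(size=8)  # near the "dog" text
best, scores = zero_shot_classify(image_emb, label_embs, labels)
print(best)  # → dog
```

In a real deployment the embeddings would come from a CLIP model on the GPU, with the similarity step cheap enough to run per request.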
Docker-based GPU inference of machine learning models.
A tiny Python / ZMQ server to facilitate experimenting with DeepSeek-OCR.