Run local LLMs on iGPU and APU (AMD, Intel, and Qualcomm (Coming Soon))
| Support matrix | Supported now | Under Development | On the roadmap |
|---|---|---|---|
| Model architectures | Gemma<br>Llama \*<br>Mistral \+<br>Phi | | |
| Platform | Linux<br>Windows | | |
| Architecture | x86<br>x64 | Arm64 | |
| Hardware Acceleration | CUDA<br>DirectML | QNN<br>ROCm | OpenVINO |
\* The Llama model architecture supports similar model families such as CodeLlama, Vicuna, Yi, and more.
\+ The Mistral model architecture supports similar model families such as Zephyr.
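To verify which of these acceleration backends your local ONNX Runtime build actually exposes, you can list its execution providers. The snippet below is a minimal sketch using the standard `onnxruntime` Python API; the provider names reported (`DmlExecutionProvider` for DirectML, `CUDAExecutionProvider` for CUDA) depend on which onnxruntime package is installed, e.g. `onnxruntime-directml` or `onnxruntime-gpu`.

```python
# Minimal check of which ONNX Runtime execution providers are available.
# Install e.g. `pip install onnxruntime-directml` (Windows iGPU/GPU via DirectML)
# or `pip install onnxruntime-gpu` (CUDA).
import onnxruntime as ort

providers = ort.get_available_providers()
print(providers)  # e.g. ['DmlExecutionProvider', 'CPUExecutionProvider']

if "DmlExecutionProvider" in providers:
    print("DirectML (iGPU/GPU) acceleration is available.")
elif "CUDAExecutionProvider" in providers:
    print("CUDA acceleration is available.")
else:
    print("Falling back to CPU execution.")
```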
- [2024/06] Added support for chat inference on iGPU and CPU.
| Models | Parameters | Context Length (tokens) | Hugging Face Repository |
|---|---|---|---|
| Gemma-2b-Instruct v1 | 2B | 8192 | EmbeddedLLM/gemma-2b-it-onnx |
| Llama-2-7b-chat | 7B | 4096 | EmbeddedLLM/llama-2-7b-chat-int4-onnx-directml |
| Llama-2-13b-chat | 13B | 4096 | EmbeddedLLM/llama-2-13b-chat-int4-onnx-directml |
| Llama-3-8b-chat | 8B | 8192 | EmbeddedLLM/mistral-7b-instruct-v0.3-onnx |
| Mistral-7b-v0.3-instruct | 7B | 32768 | EmbeddedLLM/mistral-7b-instruct-v0.3-onnx |
| Phi3-mini-4k-instruct | 3.8B | 4096 | microsoft/Phi-3-mini-4k-instruct-onnx |
| Phi3-mini-128k-instruct | 3.8B | 131072 | microsoft/Phi-3-mini-128k-instruct-onnx |
| Phi3-medium-4k-instruct | 14B | 4096 | microsoft/Phi-3-medium-4k-instruct-onnx-directml |
| Phi3-medium-128k-instruct | 14B | 131072 | microsoft/Phi-3-medium-128k-instruct-onnx-directml |
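Every entry above is an ONNX checkpoint hosted on Hugging Face, so one way to try a model is to download its repository and run a chat turn through onnxruntime-genai (one of the projects acknowledged below). The sketch below is an illustration under assumptions rather than this project's own CLI: the repo id comes from the table, the local path and prompt template are examples (check the model card for the exact template), and the onnxruntime-genai calls follow the 0.2/0.3-era Python bindings, which may differ in newer releases.

```python
# Hedged sketch: download one of the ONNX models listed above and run a
# single chat turn with onnxruntime-genai.
# pip install huggingface_hub onnxruntime-genai-directml
from huggingface_hub import snapshot_download
import onnxruntime_genai as og

# Repo id comes from the table above; local_dir is an arbitrary example path.
model_dir = snapshot_download(
    repo_id="EmbeddedLLM/mistral-7b-instruct-v0.3-onnx",
    local_dir="./mistral-7b-instruct-v0.3-onnx",
)

# og.Model expects the folder containing genai_config.json; for some repos
# that is a subfolder of the download rather than its root.
model = og.Model(model_dir)
tokenizer = og.Tokenizer(model)

# Mistral-style instruction prompt; other model families use other templates.
prompt = "[INST] Explain what an APU is in one sentence. [/INST]"

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)
params.input_ids = tokenizer.encode(prompt)

output_tokens = model.generate(params)
print(tokenizer.decode(output_tokens[0]))
```

On an AMD or Intel iGPU, installing the DirectML flavor of onnxruntime-genai is what routes the `generate()` call to the GPU; with the plain CPU package, the same script runs on the CPU instead.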
- Excellent open-source projects: vLLM, onnxruntime-genai, and many others.
- Thanks to all the contributors.