Run local LLMs on iGPU, APU and CPU (AMD, Intel, and Qualcomm (coming soon)). The easiest way to launch an OpenAI-API-compatible server on Windows, Linux and macOS.
| Support matrix | Supported now | Under Development | On the roadmap |
|---|---|---|---|
| Model architectures | Gemma, Llama *, Mistral +, Phi | | |
| Platform | Linux, Windows | | |
| Architecture | x86, x64 | Arm64 | |
| Hardware Acceleration | CUDA, DirectML | QNN, ROCm | OpenVINO |
* The Llama model architecture supports similar model families such as CodeLlama, Vicuna, Yi, and more.
+ The Mistral model architecture supports similar model families such as Zephyr.
- [2024/06] Support Phi-3 (mini, small, medium), Phi-3-Vision-Mini, Llama-2, Llama-3, Gemma (v1), Mistral v0.3, Starling-LM, Yi-1.5.
- [2024/06] Support vision/chat inference on iGPU, APU, CPU and CUDA.
| Models | Parameters | Context Length | Link |
|---|---|---|---|
| Gemma-2b-Instruct v1 | 2B | 8192 | EmbeddedLLM/gemma-2b-it-onnx |
| Llama-2-7b-chat | 7B | 4096 | EmbeddedLLM/llama-2-7b-chat-int4-onnx-directml |
| Llama-2-13b-chat | 13B | 4096 | EmbeddedLLM/llama-2-13b-chat-int4-onnx-directml |
| Llama-3-8b-chat | 8B | 8192 | EmbeddedLLM/mistral-7b-instruct-v0.3-onnx |
| Mistral-7b-v0.3-instruct | 7B | 32768 | EmbeddedLLM/mistral-7b-instruct-v0.3-onnx |
| Phi3-mini-4k-instruct | 3.8B | 4096 | microsoft/Phi-3-mini-4k-instruct-onnx |
| Phi3-mini-128k-instruct | 3.8B | 128k | microsoft/Phi-3-mini-128k-instruct-onnx |
| Phi3-medium-4k-instruct | 14B | 4096 | microsoft/Phi-3-medium-4k-instruct-onnx-directml |
| Phi3-medium-128k-instruct | 14B | 128k | microsoft/Phi-3-medium-128k-instruct-onnx-directml |
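To run one of these models locally, download its ONNX weights from Hugging Face and point the server at the resulting folder. Below is a minimal sketch using the `huggingface_hub` package; the repository and local directory are only examples, and some repositories keep the weights in a per-backend subfolder, in which case pass the folder that actually contains the ONNX files to `--model_path`.

```python
# Sketch: download ONNX weights for one of the models listed above.
# The repo_id and local_dir below are examples, not the only options.
from huggingface_hub import snapshot_download

model_dir = snapshot_download(
    repo_id="EmbeddedLLM/mistral-7b-instruct-v0.3-onnx",
    local_dir="models/mistral-7b-instruct-v0.3-onnx",
)
print(model_dir)  # use this path (or the relevant subfolder) as --model_path for ellm_server
```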
Windows
- Install the embeddedllm package, e.g. `$env:ELLM_TARGET_DEVICE='directml'; pip install -e .`. Note: the currently supported targets are `cpu`, `directml` and `cuda`.
  - DirectML: `$env:ELLM_TARGET_DEVICE='directml'; pip install -e .[directml]`
  - CPU: `$env:ELLM_TARGET_DEVICE='cpu'; pip install -e .[cpu]`
  - CUDA: `$env:ELLM_TARGET_DEVICE='cuda'; pip install -e .[cuda]`
  - With Web UI:
    - DirectML: `$env:ELLM_TARGET_DEVICE='directml'; pip install -e .[directml,webui]`
    - CPU: `$env:ELLM_TARGET_DEVICE='cpu'; pip install -e .[cpu,webui]`
    - CUDA: `$env:ELLM_TARGET_DEVICE='cuda'; pip install -e .[cuda,webui]`
Linux
- Install the embeddedllm package, e.g. `ELLM_TARGET_DEVICE='directml' pip install -e .`. Note: the currently supported targets are `cpu`, `directml` and `cuda`.
  - DirectML: `ELLM_TARGET_DEVICE='directml' pip install -e .[directml]`
  - CPU: `ELLM_TARGET_DEVICE='cpu' pip install -e .[cpu]`
  - CUDA: `ELLM_TARGET_DEVICE='cuda' pip install -e .[cuda]`
  - With Web UI:
    - DirectML: `ELLM_TARGET_DEVICE='directml' pip install -e .[directml,webui]`
    - CPU: `ELLM_TARGET_DEVICE='cpu' pip install -e .[cpu,webui]`
    - CUDA: `ELLM_TARGET_DEVICE='cuda' pip install -e .[cuda,webui]`
```
usage: ellm_server.exe [-h] [--port int] [--host str] [--response_role str] [--uvicorn_log_level str]
                       [--served_model_name str] [--model_path str] [--vision bool]

options:
  -h, --help               show this help message and exit
  --port int               Server port. (default: 6979)
  --host str               Server host. (default: 0.0.0.0)
  --response_role str      Server response role. (default: assistant)
  --uvicorn_log_level str  Uvicorn logging level: debug, info, trace, warning, critical. (default: info)
  --served_model_name str  Model name. (default: phi3-mini-int4)
  --model_path str         Path to model weights. (required)
  --vision bool            Enable vision capability, only if the model supports vision input. (default: False)
```

- Launch the server with `ellm_server --model_path <path/to/model/weight>`.
- Example code to connect to the API server can be found in `scripts/python`; a minimal client sketch is also shown below.
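For reference, here is a minimal client sketch using the `openai` Python package. It assumes the server is running locally on the default port 6979 and serving the default model name `phi3-mini-int4`; adjust both if you changed `--port` or `--served_model_name`.

```python
# Sketch: talk to the local OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:6979/v1",  # default --host/--port
    api_key="EMPTY",                      # placeholder; assumes no auth is enforced locally
)

response = client.chat.completions.create(
    model="phi3-mini-int4",  # must match --served_model_name
    messages=[{"role": "user", "content": "What is an iGPU?"}],
)
print(response.choices[0].message.content)
```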
- Launch the Chatbot Web UI with `ellm_chatbot --port 7788 --host localhost --server_port <ellm_server_port> --server_host localhost`.
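If you prefer scripting over the Web UI, token-by-token output can be consumed through the same endpoint, assuming the server implements the standard OpenAI streaming protocol; a sketch under that assumption:

```python
# Sketch: stream a chat completion from the local server (assumes standard streaming support).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:6979/v1", api_key="EMPTY")

stream = client.chat.completions.create(
    model="phi3-mini-int4",
    messages=[{"role": "user", "content": "Explain DirectML in one paragraph."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```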
- Excellent open-source projects: vLLM, onnxruntime-genai and many others.