ExecuTorch

On-device AI inference powered by PyTorch


ExecuTorch is PyTorch's unified solution for deploying AI models on-device, from smartphones to microcontrollers, built for privacy, performance, and portability. It powers Meta's on-device AI across Instagram, WhatsApp, Quest 3, Ray-Ban Meta Smart Glasses, and more.

Deploy LLMs, vision, speech, and multimodal models with the same PyTorch APIs you already know, and move from research to production with seamless model export, optimization, and deployment. No manual C++ rewrites. No format conversions. No vendor lock-in.


Why ExecuTorch?

  • 🔒 Native PyTorch Export – Direct export from PyTorch. No .onnx, .tflite, or intermediate format conversions. Preserve model semantics.
  • ⚡ Production-Proven – Serves billions of users at Meta with real-time on-device inference.
  • 💾 Tiny Runtime – 50 KB base footprint. Runs on everything from microcontrollers to high-end smartphones.
  • 🚀 12+ Hardware Backends – Open-source acceleration for Apple, Qualcomm, ARM, MediaTek, Vulkan, and more.
  • 🎯 One Export, Multiple Backends – Switch hardware targets with a single line change. Deploy the same model everywhere.

How It Works

ExecuTorch uses ahead-of-time (AOT) compilation to prepare PyTorch models for edge deployment:

  1. 🧩 Export – Capture your PyTorch model graph with torch.export()
  2. ⚙️ Compile – Quantize, optimize, and partition to hardware backends → .pte
  3. 🚀 Execute – Load the .pte on-device via the lightweight C++ runtime

Models use a standardized Core ATen operator set. Partitioners delegate subgraphs to specialized hardware (NPU/GPU) with CPU fallback.
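This Core ATen-level graph is easy to inspect from Python before any backend is involved. The sketch below is illustrative only: the toy AddMul module is made up for this example, and it uses the to_edge API from executorch.exir to show the exported program being lowered to the Edge dialect, where every operator not later claimed by a partitioner falls back to the portable CPU kernels.

import torch
from executorch.exir import to_edge

# Toy model used only for illustration.
class AddMul(torch.nn.Module):
    def forward(self, x, y):
        return (x + y) * y

example_inputs = (torch.randn(4), torch.randn(4))

# 1. Export: capture the graph with torch.export()
exported_program = torch.export.export(AddMul().eval(), example_inputs)

# 2. Compile: lower to the Edge dialect, expressed in Core ATen operators.
edge_program = to_edge(exported_program)
print(edge_program.exported_program().graph)  # inspect the Core ATen-level graph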

Learn more: How ExecuTorch Works • Architecture Guide

Quick Start

Installation

pip install executorch

For platform-specific setup (Android, iOS, embedded systems), see the Quick Start documentation.

Export and Deploy in 3 Steps

import torch
from executorch.exir import to_edge_transform_and_lower
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner

# 1. Export your PyTorch model
model = MyModel().eval()
example_inputs = (torch.randn(1, 3, 224, 224),)
exported_program = torch.export.export(model, example_inputs)

# 2. Optimize for target hardware (switch backends with one line)
program = to_edge_transform_and_lower(
    exported_program,
    partitioner=[XnnpackPartitioner()],  # CPU | CoreMLPartitioner() for iOS | QnnPartitioner() for Qualcomm
).to_executorch()

# 3. Save for deployment
with open("model.pte", "wb") as f:
    f.write(program.buffer)

# Test locally via ExecuTorch runtime's pybind API (optional)
from executorch.runtime import Runtime

runtime = Runtime.get()
method = runtime.load_program("model.pte").load_method("forward")
outputs = method.execute([torch.randn(1, 3, 224, 224)])
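As a quick sanity check, you can compare the runtime's result against eager PyTorch. This is a minimal sketch that reuses model, example_inputs, and method from the snippet above; the tolerances are arbitrary and assume the model returns a single tensor.

# Compare the ExecuTorch runtime result with eager PyTorch execution.
eager_output = model(*example_inputs)
runtime_output = method.execute([*example_inputs])[0]
torch.testing.assert_close(runtime_output, eager_output, rtol=1e-3, atol=1e-3)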

Run on Device

C++

#include <executorch/extension/module/module.h>
#include <executorch/extension/tensor/tensor.h>

using namespace ::executorch::extension;

Module module("model.pte");
auto tensor = make_tensor_ptr({2, 2}, {1.0f, 2.0f, 3.0f, 4.0f});
auto outputs = module.forward(tensor);

Swift (iOS)

import ExecuTorch

let module = Module(filePath: "model.pte")
let input = Tensor<Float>([1.0, 2.0, 3.0, 4.0], shape: [2, 2])
let outputs = try module.forward(input)

Kotlin (Android)

val module = Module.load("model.pte")
val inputTensor = Tensor.fromBlob(floatArrayOf(1.0f, 2.0f, 3.0f, 4.0f), longArrayOf(2, 2))
val outputs = module.forward(EValue.from(inputTensor))

LLM Example: Llama

Export Llama models using the export_llm script or Optimum-ExecuTorch:

# Using export_llm
python -m executorch.extension.llm.export.export_llm --model llama3_2 --output llama.pte

# Using Optimum-ExecuTorch
optimum-cli export executorch \
  --model meta-llama/Llama-3.2-1B \
  --task text-generation \
  --recipe xnnpack \
  --output_dir llama_model

Run on-device with the LLM runner API:

C++

#include <executorch/extension/llm/runner/text_llm_runner.h>

auto runner = create_llama_runner("llama.pte", "tiktoken.bin");
executorch::extension::llm::GenerationConfig config{
    .seq_len = 128, .temperature = 0.8f};
runner->generate("Hello, how are you?", config);

Swift (iOS)

import ExecuTorchLLM

let runner = TextRunner(modelPath: "llama.pte", tokenizerPath: "tiktoken.bin")
try runner.generate("Hello, how are you?", Config {
  $0.sequenceLength = 128
}) { token in
  print(token, terminator: "")
}

Kotlin (Android) – API Docs • Demo App

val llmModule = LlmModule("llama.pte", "tiktoken.bin", 0.8f)
llmModule.load()
llmModule.generate("Hello, how are you?", 128, object : LlmCallback {
    override fun onResult(result: String) { print(result) }
    override fun onStats(stats: String) { }
})

For multimodal models (vision, audio), use the MultiModal runner API, which extends the LLM runner to handle image and audio inputs alongside text. See the Llava and Voxtral examples.

See examples/models/llama for the complete workflow, including quantization, mobile deployment, and advanced options.

Next Steps:

Platform & Hardware Support

Platform          Supported Backends
Android           XNNPACK, Vulkan, Qualcomm, MediaTek, Samsung Exynos
iOS               XNNPACK, MPS, CoreML (Neural Engine)
Linux / Windows   XNNPACK, OpenVINO, CUDA (experimental)
macOS             XNNPACK, MPS, Metal (experimental)
Embedded / MCU    XNNPACK, ARM Ethos-U, NXP, Cadence DSP

See Backend Documentation for detailed hardware requirements and optimization guides.

Production Deployments

ExecuTorch powers on-device AI at scale across Meta's family of apps, VR/AR devices, and partner deployments. View success stories →

Examples & Models

LLMs: Llama 3.2/3.1/3, Qwen 3, Phi-4-mini, LiquidAI LFM2

Multimodal: Llava (vision-language), Voxtral (audio-language), Gemma (vision-language)

Vision/Speech: MobileNetV2, DeepLabV3, Whisper

Resources: examples/ directory • executorch-examples out-of-tree demos • Optimum-ExecuTorch for HuggingFace models

Key Features

ExecuTorch provides advanced capabilities for production deployment:

  • Quantization – Built-in support via torchao for 8-bit, 4-bit, and dynamic quantization (see the quantization sketch below)
  • Memory Planning – Optimize memory usage with ahead-of-time allocation strategies
  • Developer Tools – ETDump profiler, ETRecord inspector, and model debugger
  • Selective Build – Strip unused operators to minimize binary size
  • Custom Operators – Extend with domain-specific kernels
  • Dynamic Shapes – Support variable input sizes with bounded ranges (see the sketch after this list)
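As a rough illustration of bounded dynamic shapes, the sketch below marks the batch dimension as dynamic with torch.export.Dim before lowering. The TinyModel module, dimension bounds, and tensor shapes are made up for this example.

import torch
from executorch.exir import to_edge_transform_and_lower

class TinyModel(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.relu(x)

# Allow the batch dimension to vary between 1 and 8 at runtime.
batch = torch.export.Dim("batch", min=1, max=8)
exported = torch.export.export(
    TinyModel().eval(),
    (torch.randn(2, 16),),
    dynamic_shapes={"x": {0: batch}},
)

# Lower and serialize as usual; the bounds are recorded in the .pte.
program = to_edge_transform_and_lower(exported).to_executorch()
with open("tiny_dynamic.pte", "wb") as f:
    f.write(program.buffer)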

See Advanced Topics for quantization techniques, custom backends, and compiler passes.
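For example, a post-training quantization pass with torchao might look roughly like the following. This is a sketch only: it assumes a torchao release that exposes quantize_ and int8_dynamic_activation_int8_weight (names and recommended configs have shifted across torchao versions), and MyModel and the example inputs are placeholders.

import torch
from executorch.exir import to_edge_transform_and_lower
from torchao.quantization import quantize_, int8_dynamic_activation_int8_weight

model = MyModel().eval()                      # placeholder model
example_inputs = (torch.randn(1, 3, 224, 224),)

# Apply 8-bit dynamic-activation / 8-bit weight quantization in place.
quantize_(model, int8_dynamic_activation_int8_weight())

# Export and lower the quantized model the same way as the float one.
exported = torch.export.export(model, example_inputs)
program = to_edge_transform_and_lower(exported).to_executorch()
with open("model_int8.pte", "wb") as f:
    f.write(program.buffer)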

Documentation

Community & Contributing

We welcome contributions from the community!

  • 💬 GitHub Discussions – Ask questions and share ideas
  • 🎮 Discord – Chat with the team and community
  • 🐛 Issues – Report bugs or request features
  • 🤝 Contributing Guide – Guidelines and codebase structure

License

ExecuTorch is BSD licensed, as found in the LICENSE file.




Part of the PyTorch ecosystem

GitHub • Documentation