A high-throughput and memory-efficient inference and serving engine for LLMs
Easy, fast, and cost-efficient LLM serving for everyone.
Easy
Deploy the widest range of open-source models on any hardware. Includes a drop-in OpenAI-compatible API for instant integration (see the example after these highlights).
Fast
Maximize throughput with PagedAttention. Advanced scheduling and continuous batching ensure peak GPU utilization.
Cost Efficient
Slash inference costs by maximizing hardware efficiency. We make high-performance LLMs affordable and accessible to everyone.
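As a minimal sketch of the OpenAI-compatible API mentioned above, the snippet below queries a locally running vLLM server with the official `openai` Python client. The port and model name are assumptions that depend on how the server was started; by default, `vllm serve <model>` listens on port 8000.

```python
from openai import OpenAI

# Point the official OpenAI client at a local vLLM server.
# Assumes the server was started with `vllm serve <model>` on the default port 8000.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-1.5B-Instruct",  # placeholder: use whichever model the server loaded
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```

Because the endpoint mirrors the OpenAI API, existing clients and SDKs can typically be pointed at vLLM by changing only the base URL and API key.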
Quick Start
Select your preferences and run the install command. Stable is the most recently tested and supported version of vLLM; Nightly is available if you want the latest builds. A minimal install-and-run example follows the notes below.
📦 Requires Python 3.10+. Python 3.12+ recommended.
⚡ We recommend uv for faster and more reliable installation.
🔧 For other platforms, see docs.vllm.ai
🎉 See what's new in the latest release
💡 Compatible with all CUDA 12.x versions (12.0 - 12.9)
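As a quick smoke test after installation (for example via `pip install vllm` or `uv pip install vllm`), the offline inference sketch below loads a small model and generates completions for a couple of prompts. The model name is only illustrative; any supported Hugging Face model ID can be used.

```python
from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The capital of France is",
]
# Sampling settings for generation; adjust to taste.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Load a small model for a quick test (illustrative choice).
llm = LLM(model="facebook/opt-125m")

# Generate completions for all prompts in a single batched call.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(f"Prompt: {output.prompt!r} -> {output.outputs[0].text!r}")
```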
Sponsors
vLLM is a community project. Compute resources for development and testing are provided by the following organizations. Thank you for your support!
Cash Donations
Compute Resources
We collect donations through GitHub and OpenCollective and plan to use the funds to support the development, maintenance, and adoption of vLLM.
Universal Compatibility
One engine, endless possibilities. Run any model on any hardware.
Open Models
Latest trending open-source models, optimized & production-ready
Got questions?
We're here to help.
Whether you're just getting started or debugging a complex deployment, our community is open to everyone. No question is too basic!
Resources
Explore recipes, benchmarks, and roadmap