Docker Model Runner Cheatsheet 2025

📑 Table of Contents

  1. What is Docker Model Runner?
  2. 🚀 Quick Setup Guide
  3. 📋 Essential Commands
  4. 🔗 API Integration
  5. 🐳 Docker Compose Integration
  6. 🐳 Docker Model Management Endpoints

What is Docker Model Runner?

Docker Model Runner is a new feature integrated into Docker Desktop that enables developers to run AI models locally with zero setup complexity. Built into Docker Desktop 4.40+, it brings LLM (Large Language Model) inference directly into your GenAI development workflow.

Key Benefits

  • ✅ No extra infrastructure - Runs natively on your machine
  • ✅ OpenAI-compatible API - Drop-in replacement for OpenAI calls
  • ✅ GPU acceleration - Optimized for Apple Silicon and NVIDIA GPUs
  • ✅ OCI artifacts - Models are packaged and distributed as standard OCI artifacts
  • ✅ Host-based execution - Maximum performance, no VM overhead

🚀 Quick Setup Guide

Prerequisites

  • Docker Desktop 4.40+ (4.41+ for Windows GPU support)
  • macOS: Apple Silicon (M1/M2/M3) for optimal performance
  • Windows: NVIDIA GPU (for GPU acceleration)
  • Linux: Docker Engine with the Model Runner plugin (docker-model-plugin)

Enable Docker Model Runner

Docker Desktop (GUI)

  1. Open Docker Desktop Settings
  2. Navigate to Features in development → Beta
  3. Enable "Docker Model Runner"
  4. Apply & Restart

Docker Desktop (CLI)

```bash
# Enable Model Runner
docker desktop enable model-runner

# Enable with TCP support (for host access)
docker desktop enable model-runner --tcp 12434

# Check status
docker desktop status
```
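With TCP enabled, you can sanity-check the endpoint from the host before wiring up any client. A minimal sketch in Python, assuming the default port 12434 used above:

```python
import json
import urllib.request

# With TCP enabled, the OpenAI-compatible endpoint is reachable from the host.
# A successful JSON response listing models confirms Model Runner is up.
with urllib.request.urlopen("http://localhost:12434/engines/llama.cpp/v1/models") as resp:
    print(json.dumps(json.load(resp), indent=2))
```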

Docker Engine (Linux)

```bash
sudo apt-get update
sudo apt-get install docker-model-plugin
```

📋 Essential Commands

Model Management

Pull Models

```bash
# Pull latest version
docker model pull ai/smollm2
```

List Models

```bash
# List all local models
docker model ls
```

Remove Models

```bash
# Remove specific model
docker model rm ai/smollm2
```

Running Models

Interactive Mode

```bash
# Quick inference
docker model run ai/smollm2 "Explain Docker in one sentence"
```

Model Information

```bash
# Inspect model details
docker model inspect ai/smollm2
```

🔗 API Integration

OpenAI-Compatible Endpoints

From Containers

```
# Base URL for container access
http://model-runner.docker.internal/engines/llama.cpp/v1/
```

From Host (with TCP enabled)

```
# Base URL for host access
http://localhost:12434/engines/llama.cpp/v1/
```
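Since the base URL differs between containers and the host, it helps to make it configurable. A small sketch; `MODEL_RUNNER_BASE_URL` is an illustrative variable name for this example, not something Docker sets for you:

```python
import os

# MODEL_RUNNER_BASE_URL is a hypothetical convention for this sketch,
# not an environment variable Docker sets. Export it inside containers
# to point at model-runner.docker.internal; otherwise the host TCP
# endpoint is used.
BASE_URL = os.environ.get(
    "MODEL_RUNNER_BASE_URL",
    "http://localhost:12434/engines/llama.cpp/v1",
)
```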

Chat Completions API

cURL Example

```bash
curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful coding assistant."
      },
      {
        "role": "user",
        "content": "Write a Docker Compose file for a web app"
      }
    ],
    "temperature": 0.7,
    "max_tokens": 500
  }'
```

Python Example

```python
import openai

# Configure client for local Model Runner
client = openai.OpenAI(
    base_url="http://model-runner.docker.internal/engines/llama.cpp/v1",
    api_key="not-needed"  # Local inference doesn't need an API key
)

# Chat completion
response = client.chat.completions.create(
    model="ai/smollm2",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain containerization benefits"}
    ],
    temperature=0.7,
    max_tokens=200
)

print(response.choices[0].message.content)
```
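If you want output as it is generated rather than all at once, the same call can be streamed. A sketch, assuming the llama.cpp engine honors the OpenAI streaming protocol (`stream=True`):

```python
import openai

client = openai.OpenAI(
    base_url="http://model-runner.docker.internal/engines/llama.cpp/v1",
    api_key="not-needed"
)

# Streaming variant of the chat completion above; tokens arrive as deltas.
stream = client.chat.completions.create(
    model="ai/smollm2",
    messages=[{"role": "user", "content": "Explain containerization benefits"}],
    stream=True
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```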

Node.js Example

```javascript
import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'http://model-runner.docker.internal/engines/llama.cpp/v1',
  apiKey: 'not-needed'
});

async function chatWithModel() {
  const completion = await openai.chat.completions.create({
    model: 'ai/smollm2',
    messages: [
      { role: 'system', content: 'You are a DevOps expert.' },
      { role: 'user', content: 'Best practices for Docker in production?' }
    ],
    temperature: 0.8,
    max_tokens: 300
  });

  console.log(completion.choices[0].message.content);
}

chatWithModel();
```

🐳 Docker Compose Integration

```yaml
services:
  chat:
    image: my-chat-app
    depends_on:
      - ai_runner

  ai_runner:
    provider:
      type: model
      options:
        model: ai/smollm2
```
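With this file, Compose provisions the model before `chat` starts. Inside the `chat` container, the app can then call the internal endpoint from the API Integration section; here is a hypothetical sketch of what `my-chat-app` might do:

```python
import openai

# Inside the chat service, reach Model Runner via the internal hostname
# from the API Integration section above.
client = openai.OpenAI(
    base_url="http://model-runner.docker.internal/engines/llama.cpp/v1",
    api_key="not-needed"
)

reply = client.chat.completions.create(
    model="ai/smollm2",
    messages=[{"role": "user", "content": "Hello from my-chat-app"}]
)
print(reply.choices[0].message.content)
```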

🐳 Docker Model Management Endpoints

```
POST   /models/create
GET    /models
GET    /models/{namespace}/{name}
DELETE /models/{namespace}/{name}
```
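These management routes live at the root of the Model Runner API, not under /engines/llama.cpp/v1. A sketch that lists local models from the host, assuming TCP access on port 12434 is enabled:

```python
import json
import urllib.request

# GET /models on the management API lists locally available models.
with urllib.request.urlopen("http://localhost:12434/models") as resp:
    print(json.dumps(json.load(resp), indent=2))
```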

OpenAI Endpoints:

```
GET  /engines/llama.cpp/v1/models
GET  /engines/llama.cpp/v1/models/{namespace}/{name}
POST /engines/llama.cpp/v1/chat/completions
POST /engines/llama.cpp/v1/completions
POST /engines/llama.cpp/v1/embeddings
```
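The embeddings route takes an OpenAI-style request body. A sketch with the Python client; the model name is illustrative, so substitute any embedding-capable model you have pulled:

```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:12434/engines/llama.cpp/v1",
    api_key="not-needed"
)

# POST /engines/llama.cpp/v1/embeddings with an OpenAI-style request body.
emb = client.embeddings.create(
    model="ai/mxbai-embed-large",  # illustrative; use any embedding-capable model you have pulled
    input="Docker Model Runner runs LLMs locally."
)
print(len(emb.data[0].embedding))  # dimensionality of the embedding vector
```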
