# Run Large Language Models on Apple Silicon with Core ML
CoreMLPipelines is an experimental Swift library for running pretrained Core ML models across a variety of AI tasks, providing high-performance inference with minimal memory usage on Apple Silicon devices.
## Table of Contents

- [Features](#features)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Supported Models](#supported-models)
- [Architecture](#architecture)
- [CLI Usage](#cli-usage)
- [Model Conversion](#model-conversion)
- [Requirements](#requirements)
- [Contributing](#contributing)
- [License](#license)
## Features

- 🚀 High Performance: Optimized Core ML inference on Apple Silicon
- 💾 Memory Efficient: 4-bit and 8-bit quantization support
- 🔄 Streaming: Real-time text generation with async streams
- 🛠️ CLI Tools: Command-line interface for text generation and chat
- 🔧 Model Conversion: Python tools to convert Hugging Face models to Core ML
- 📱 Cross-Platform: Supports iOS 18+ and macOS 15+
Supported tasks:

- Text Generation: Generate text using causal language models
- Chat: Interactive conversational AI
## Installation

Add CoreMLPipelines to your Xcode project via File > Add Package Dependencies, or declare it as a Swift Package Manager dependency (see the manifest sketch below).
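If you manage dependencies in a `Package.swift` manifest, a minimal setup might look like the following. This is a sketch: the `branch: "main"` requirement and the product name are assumptions, so check the repository for tagged releases and the actual product name.

```swift
// swift-tools-version:6.0
import PackageDescription

let package = Package(
    name: "MyApp",
    platforms: [.macOS(.v15), .iOS(.v18)],
    dependencies: [
        // Branch dependency is an assumption; pin a tagged release if one exists.
        .package(url: "https://github.com/pywind/CoreMLPipelines.git", branch: "main")
    ],
    targets: [
        .executableTarget(
            name: "MyApp",
            dependencies: [
                // Product name assumed to match the package name.
                .product(name: "CoreMLPipelines", package: "CoreMLPipelines")
            ]
        )
    ]
)
```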
Or clone and build locally:

```bash
git clone https://github.com/pywind/CoreMLPipelines.git
cd CoreMLPipelines
swift build -c release
cp .build/release/coremlpipelines-cli /usr/local/bin/
```

The Python conversion tools require Python 3.11+ and can be installed using uv:

```bash
cd coremlpipelinestools
uv sync
```

If you want to upload models to your Hugging Face Hub account, create a `.env` file containing:

```
HF_TOKEN=hf_your_token
```

Please read the coreml tools README for more details.
## Quick Start

### Basic usage

```swift
import CoreMLPipelines

// Create a text generation pipeline
let pipeline = try await TextGenerationPipeline(model: .llama_3_2_1B_Instruct_4bit)

// Generate text with streaming
let stream = pipeline(
    messages: [[
        "role": "user",
        "content": "Write a haiku about programming"
    ]]
)

for try await text in stream {
    print(text, terminator: "")
}
```

### Generation parameters

```swift
import CoreMLPipelines

let pipeline = try await TextGenerationPipeline(model: .qwen2_5_0_5B_Instruct_4bit)

// Configure generation parameters
let config = GenerationConfig(
    maxNewTokens: 100,
    temperature: 0.7,
    topP: 0.9,
    repetitionPenalty: 1.1
)

let stream = pipeline(
    messages: [["role": "user", "content": "Explain quantum computing simply"]],
    generationConfig: config
)

var fullResponse = ""
for try await text in stream {
    fullResponse += text
    print(text, terminator: "")
}
```

### Custom models

```swift
import CoreMLPipelines

// Use any Hugging Face model (must be converted to Core ML first)
let pipeline = try await TextGenerationPipeline(
    model: "your-username/your-coreml-model"
)
```

## Supported Models

CoreMLPipelines supports various quantized language models optimized for Apple Silicon:
- `llama_3_2_1B_Instruct_4bit` - Meta's Llama 3.2 1B parameter model (4-bit quantized)
- `qwen2_5_0_5B_Instruct_4bit` - Alibaba's Qwen2.5 0.5B model (4-bit quantized)
- `qwen2_5_Coder_0_5B_Instruct_4bit` - Qwen2.5 Coder 0.5B for code generation (4-bit quantized)
- `smolLM2_135M_Instruct_4bit` - SmolLM2 135M model (4-bit quantized)
- `smolLM2_135M_Instruct_8bit` - SmolLM2 135M model (8-bit quantized)
- `lgai_exaone_4_0_1_2B_4bit` - LG AI's EXAONE 4.0 1.2B model (4-bit quantized)
Note: Model weights, tokenizers, and chat templates are automatically downloaded from Hugging Face on first use, so make sure you have a stable internet connection.
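With several model sizes available, an app might pick a case based on the device's available memory. The helper below is purely illustrative: the `TextGenerationPipeline.Model` type name and the 8 GB threshold are assumptions for the sake of the sketch, not part of the documented API.

```swift
import CoreMLPipelines
import Foundation

// Illustrative helper (not part of the library): choose a smaller model
// on memory-constrained devices. The enum type name and the 8 GB cutoff
// are assumptions.
func defaultModel() -> TextGenerationPipeline.Model {
    let physicalGB = Double(ProcessInfo.processInfo.physicalMemory) / 1_073_741_824
    return physicalGB >= 8
        ? .llama_3_2_1B_Instruct_4bit
        : .smolLM2_135M_Instruct_4bit
}
```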
## Architecture

```
CoreMLPipelines/
├── Models/        # Model definitions and configurations
├── Pipelines/     # Pipeline implementations
│   ├── TextGenerationPipeline.swift
│   └── TextGenerationPipeline+Models.swift
├── Samplers/      # Token sampling strategies
│   ├── GreedySampler.swift
│   └── Sampler.swift
└── Extensions/    # Core ML tensor utilities
```

- Unified API: Consistent interface across different model architectures
- Memory Management: Efficient memory usage with Core ML's MLModel
- Async/Await: Modern Swift concurrency support
- Streaming: Real-time token generation with AsyncSequence (see the sketch after this list)
- Type Safety: Strong typing with Swift's type system
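Because generation is exposed as an AsyncSequence, it composes naturally with Swift's structured concurrency. The sketch below, built on the API shown in Quick Start, wraps stream consumption in a `Task` so a caller can stop reading tokens early; whether cancellation also halts the underlying Core ML inference depends on the pipeline implementation.

```swift
import CoreMLPipelines

// Usage sketch based on the streaming API from Quick Start.
let pipeline = try await TextGenerationPipeline(model: .smolLM2_135M_Instruct_4bit)

let generation = Task {
    let stream = pipeline(messages: [["role": "user", "content": "Tell me a short story"]])
    for try await token in stream {
        print(token, terminator: "")
    }
}

// generation.cancel() // stop consuming tokens early if needed
try await generation.value
```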
## CLI Usage

The command-line interface provides convenient tools for testing and development.
### Generate text

```bash
coremlpipelines-cli generate-text --model finnvoorhees/coreml-Llama-3.2-1B-Instruct-4bit "Hello, world!" --max-new-tokens 50
```

Options:

- `--model <model>`: Hugging Face model repository ID
- `--max-new-tokens <int>`: Maximum number of tokens to generate (default: 100)
- `<prompt>`: Text prompt (default: "Hello")
### Chat

```bash
coremlpipelines-cli chat --model finnvoorhees/coreml-Llama-3.2-1B-Instruct-4bit
```

Starts an interactive chat session with the specified model.
### Profile

```bash
coremlpipelines-cli profile --model finnvoorhees/coreml-Llama-3.2-1B-Instruct-4bit
```

Profiles model performance and memory usage.
## Model Conversion

Convert Hugging Face models to Core ML format using the Python tools:
```bash
cd coremlpipelinestools
uv run convert_causal_llm.py --model microsoft/DialoGPT-medium --quantize --half --compile --context-size 512 --batch-size 1
```

Key Options:

- `--model`: Hugging Face model ID
- `--quantize`: Apply 4-bit linear quantization
- `--half`: Load model in float16 precision
- `--compile`: Save as optimized .mlmodelc format
- `--context-size`: Maximum context length
- `--batch-size`: Batch size for inference
- `--upload`: Upload converted model to Hugging Face
```bash
# Convert Llama model with 4-bit quantization
uv run convert_causal_llm.py \
    --model meta-llama/Llama-3.2-1B-Instruct \
    --quantize \
    --half \
    --compile \
    --context-size 2048 \
    --batch-size 1 \
    --upload

# Convert SmolLM model with 8-bit quantization
uv run convert_causal_llm.py \
    --model HuggingFaceTB/SmolLM2-135M-Instruct \
    --half \
    --compile \
    --context-size 1024 \
    --batch-size 1
```

## Requirements

- macOS: 15.0 or later
- iOS: 18.0 or later
- Xcode: 16.0 or later
- Swift: 6.0 or later
For the Python conversion tools:

- Python 3.11+
- uv package manager
- Core ML Tools
- Transformers library
## Contributing

We welcome contributions! Please see our Contributing Guide for details.
1. Clone the repository:

   ```bash
   git clone https://github.com/pywind/CoreMLPipelines.git
   cd CoreMLPipelines
   ```

2. Open in Xcode:

   ```bash
   open Package.swift
   ```

3. Run tests:

   ```bash
   swift test
   ```

4. Build the CLI tool:

   ```bash
   swift build -c release --product coremlpipelines-cli
   ```
This project follows Swift's official style guidelines. Use swiftformat to format code:
```bash
swiftformat .
```

## License

This project is licensed under the CC0 1.0 Universal License - see the LICENSE file for details.
This project is based on finnvoor's original repository. Consider becoming a sponsor of the original author at https://github.com/finnvoor.
