Self-host Resemble AI's Chatterbox open-source TTS family (Original + Chatterbox-Turbo) behind an OpenAI-compatible API and a modern Web UI. Chatterbox-Turbo is a streamlined 350M-parameter model with dramatically improved throughput and native paralinguistic tags like [laugh], [cough], and [chuckle] for more expressive voice agents and narration. Features voice cloning, large text processing via intelligent chunking, audiobook generation, and consistent, reproducible voices using built-in ready-to-use voices and a generation seed feature.
Try it now! Test the full TTS server with voice cloning and audiobook generation in Google Colab - no installation required! To use it, please run cells 1 through 4 one at a time. After running cell 4, click on the "https://localhost:8004" link that appears in the output, and your web browser will open the UI from the .colab.dev domain. Read the instructions here.
This server is based on the architecture and UI of our Dia-TTS-Server project but uses the distinct chatterbox-tts engine. Runs accelerated on NVIDIA (CUDA), AMD (ROCm), and Apple Silicon (MPS) GPUs, with a fallback to CPU. Make sure you also check our Kitten-TTS-Server project.
- Added full support for Chatterbox-Turbo, Resemble AI's latest efficiency-focused Chatterbox model.
- Turbo is built on a streamlined 350M-parameter architecture, designed to use less compute/VRAM while keeping high-fidelity output.
- Turbo distills the speech-token-to-mel "audio diffusion decoder" from 10 steps down to 1 step, removing a major inference bottleneck.
- Resemble positions Turbo for real-time/agent workflows and highlights significantly faster-than-real-time performance on GPU (performance varies by hardware/settings).
- Added a new engine selector dropdown at the top of the Web UI.
- Instantly hot-swap between Original Chatterbox and Chatterbox-Turbo; the backend auto-loads the selected engine.
- All UI + API requests route through the active engine so you can A/B test quality vs latency without changing client code.
- Turbo adds native paralinguistic tags you can write directly into your text, e.g. "…calling you back [chuckle]…".
- Supported tags include `[laugh]`, `[cough]`, and `[chuckle]`, plus text-based prompting for reactions like sigh, gasp, and cough.
- Added new presets in `ui/presets.yaml` demonstrating paralinguistic prompting for agent-style scripts and expressive reads.
- The original Chatterbox model remains available, with high-quality English output, a 0.5B LLaMA backbone, emotion exaggeration control, and training on 0.5M hours of cleaned data.
- Updated to support NVIDIA CUDA 12.8 and RTX 5090 / Blackwell generation GPUs.
- New Automated Launcher (Windows + Linux) that creates/activates a venv, installs the right dependencies, downloads model files, starts the server, and opens the Web UI.
- Easy maintenance commands:
  - `--upgrade` to update code + dependencies.
  - `--reinstall` for a clean reinstall when environments get messy.
The Chatterbox TTS model by Resemble AI provides capabilities for generating high-quality speech. This project builds upon that foundation by providing a robust FastAPI server that makes Chatterbox significantly easier to use and integrate.
Want to try it instantly? Launch the live demo in Google Colab - no installation needed!
The server expects plain text input for synthesis, and it removes the complexity of setting up and running the model by offering:
- A modern Web UI for easy experimentation, preset loading, reference audio management, and generation parameter tuning.
- Multi-engine support (Original + Turbo): Choose the TTS engine directly in the Web UI, then generate via the same UI/API surface.
- Paralinguistic prompting (Turbo): Native tags like `[laugh]`, `[cough]`, and `[chuckle]` for natural non-speech reactions inside the same generated voice.
- Original Chatterbox strengths: High quality English output plus unique "emotion exaggeration control" and 0.5B LLaMA backbone.
- Multi-Platform Acceleration: Full support for NVIDIA (CUDA), AMD (ROCm), and Apple Silicon (MPS) GPUs, with an automatic fallback to CPU, ensuring you can run on any hardware.
- Large Text Handling: Intelligently splits long plain text inputs into manageable chunks based on sentence structure, processes them sequentially, and seamlessly concatenates the audio.
- Audiobook Generation: Perfect for creating complete audiobooks - simply paste an entire book's text and the server automatically processes it into a single, seamless audio file with consistent voice quality throughout.
- Predefined Voices: Select from curated, ready-to-use synthetic voices for consistent and reliable output without cloning setup.
- Voice Cloning: Generate speech using a voice similar to an uploaded reference audio file.
- Consistent Generation: Achieve consistent voice output across multiple generations or text chunks by using the "Predefined Voices" or "Voice Cloning" modes, optionally combined with a fixed integer Seed.
- Docker support for easy, reproducible containerized deployment on any platform.
This server is your gateway to leveraging Chatterbox's TTS capabilities seamlessly, with enhanced stability, voice consistency, and large text support for plain text inputs.
Live Demo Available:
- One-Click Google Colab Demo: Try the full server with voice cloning and audiobook generation instantly in your browser - no local installation required!
This server application enhances the underlying chatterbox-tts engine with the following:
Core Functionality:
- Multi-Engine Support:
  - Choose between Original Chatterbox and Chatterbox-Turbo via a hot-swappable engine selector in the Web UI.
- Turbo offers significantly faster inference with a streamlined 350M-parameter architecture.
- Original Chatterbox provides multilingual support (23 languages) and emotion exaggeration control.
- Paralinguistic Tags (Turbo):
  - Write native tags like `[laugh]`, `[cough]`, and `[chuckle]` directly in your text when using Chatterbox-Turbo.
  - New presets demonstrate paralinguistic prompting for agent-style scripts and expressive narration.
- Large Text Processing (Chunking):
- Automatically handles long plain text inputs by intelligently splitting them into smaller chunks based on sentence boundaries.
  - Processes each chunk individually and seamlessly concatenates the resulting audio, overcoming potential generation limits of the TTS engine (a simplified sketch of this chunking idea appears just after this Core Functionality list).
- Ideal for audiobook generation - paste entire books and get professional-quality audiobooks with consistent narration.
- Configurable via UI toggle ("Split text into chunks") and chunk size slider.
- Predefined Voices:
  - Allows usage of curated, ready-to-use synthetic voices stored in the `./voices` directory.
  - Selectable via UI dropdown ("Predefined Voices" mode).
  - Provides reliable voice output without manual cloning setup.
- Voice Cloning:
  - Supports voice cloning using a reference audio file (`.wav` or `.mp3`).
  - The server processes the reference audio for the engine.
- Generation Seed: Added `seed` parameter to UI and API for influencing generation results. Using a fixed integer seed in combination with Predefined Voices or Voice Cloning helps maintain consistency.
- API Endpoint (`/tts`):
  - The primary API endpoint, offering fine-grained control over TTS generation.
  - Supports parameters for text, voice mode (predefined/clone), reference/predefined voice selection, chunking control (`split_text`, `chunk_size`), generation settings (temperature, exaggeration, CFG weight, seed, speed factor, language), and output format.
- UI Configuration Management: Added UI section to view/edit `config.yaml` settings (server, model, paths) and save generation defaults.
- Configuration System: Uses `config.yaml` for all runtime configuration, managed via `config.py` (`YamlConfigManager`). If `config.yaml` is missing, it's created with default values from `config.py`.
- Audio Post-Processing (Optional): Includes utilities for silence trimming, internal silence reduction, and (if `parselmouth` is installed) unvoiced segment removal to improve audio quality. These are configurable.
- UI State Persistence: Web UI now saves/restores text input, voice mode selection, file selections, and generation parameters (seed, chunking, sliders) in `config.yaml` (`ui_state` section).
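As referenced in the Large Text Processing item above, the server splits long inputs on sentence boundaries before synthesis. The snippet below is only a simplified sketch of that idea (the real logic lives in `utils.py`); the function name and regex are illustrative assumptions, not the server's actual code.

```python
import re

def chunk_text(text: str, chunk_size: int = 120) -> list[str]:
    """Illustrative only: group sentences into chunks of roughly chunk_size characters."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Start a new chunk once adding the next sentence would exceed the target size.
        if current and len(current) + len(sentence) + 1 > chunk_size:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

print(chunk_text("First sentence. Second one! A third, much longer sentence follows?", 40))
```

Each chunk is synthesized separately and the resulting audio segments are concatenated, which is why a fixed seed plus a predefined or cloned voice is recommended for consistent output across chunks.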
General Enhancements:
- Easy Installation & Management:
  - Automated Launcher (`start.bat` / `start.sh`) - One-command setup with automatic hardware detection
  - Multiple GPU Support - NVIDIA CUDA 12.1, NVIDIA CUDA 12.8 (Blackwell), AMD ROCm, Apple MPS
  - Easy Updates - Simple `--upgrade` and `--reinstall` commands
  - Isolated Environment - Automatic virtual environment management
  - Skip Menu Options - Direct installation with `--cpu`, `--nvidia`, `--nvidia-cu128`, `--rocm` flags
- Performance: Optimized for speed and efficient VRAM usage on GPU.
- Web Interface: Modern, responsive UI for plain text input, parameter adjustment, preset loading, reference/predefined audio management, and audio playback.
- Model Loading: Uses `ChatterboxTTS.from_pretrained()` for robust model loading from Hugging Face Hub, utilizing the standard HF cache.
- Dependency Management: Clear `requirements.txt`.
- Utilities: Comprehensive `utils.py` for audio processing, text handling, and file management.
- Core Chatterbox Capabilities (via Resemble AI Chatterbox):
  - High-quality single-speaker voice synthesis from plain text.
  - Perform voice cloning using reference audio prompts.
  - Chatterbox-Turbo for significantly faster inference with paralinguistic tag support.
  - Original Chatterbox with high quality English output and emotion exaggeration control.
- Enhanced Server & API:
  - Built with the high-performance FastAPI framework.
  - Custom API Endpoint (`/tts`) as the primary method for programmatic generation, exposing all key parameters.
  - Interactive API documentation via Swagger UI (`/docs`).
  - Health check endpoint (`/api/ui/initial-data` also serves as a comprehensive status check).
- Advanced Generation Features:
  - Hot-Swappable Engines: Switch between Original Chatterbox and Chatterbox-Turbo directly in the Web UI.
  - Paralinguistic Tags (Turbo): Native support for `[laugh]`, `[cough]`, `[chuckle]` and other expressive tags.
  - Large Text Handling: Intelligently splits long plain text inputs into chunks based on sentences, generates audio for each, and concatenates the results seamlessly. Configurable via `split_text` and `chunk_size`.
  - Audiobook Creation: Perfect for generating complete audiobooks from full-length texts with consistent voice quality and automatic chapter handling.
  - Predefined Voices: Select from curated synthetic voices in the `./voices` directory.
  - Voice Cloning: Simple voice cloning using an uploaded reference audio file.
  - Consistent Generation: Use Predefined Voices or Voice Cloning modes, optionally with a fixed integer Seed, for consistent voice output.
  - Audio Post-Processing: Optional automatic steps to trim silence, fix internal pauses, and remove long unvoiced segments/artifacts (configurable via `config.yaml`).
- Intuitive Web User Interface:
  - Modern, easy-to-use interface.
  - Engine Selector: Hot-swap between Original Chatterbox and Chatterbox-Turbo.
  - Presets: Load example text and settings dynamically from `ui/presets.yaml`.
  - Reference/Predefined Audio Upload: Easily upload `.wav`/`.mp3` files.
  - Voice Mode Selection: Choose between Predefined Voices or Voice Cloning.
  - Parameter Control: Adjust generation settings (Temperature, Exaggeration, CFG Weight, Speed Factor, Seed, etc.) via sliders and inputs.
  - Configuration Management: View and save server settings (`config.yaml`) and default generation parameters directly in the UI.
  - Session Persistence: Remembers your last used settings via `config.yaml`.
  - Chunking Controls: Enable/disable text splitting and adjust approximate chunk size.
  - Warning Modals: Optional warnings for chunking voice consistency and general generation quality.
  - Light/Dark Mode: Toggle between themes with preference saved locally.
  - Audio Player: Integrated waveform player (WaveSurfer.js) for generated audio with download option.
  - Loading Indicator: Shows status during generation.
- Flexible & Efficient Model Handling:
  - Downloads models automatically from Hugging Face Hub using `ChatterboxTTS.from_pretrained()`.
  - Easily specify model repository via `config.yaml`.
  - Optional `download_model.py` script available to pre-download specific model components to a local directory (this is separate from the main HF cache used at runtime).
- Performance & Configuration:
  - GPU Acceleration: Automatically uses NVIDIA CUDA, Apple MPS, or AMD ROCm if available; falls back to CPU.
  - All configuration via `config.yaml`.
  - Uses standard Python virtual environments.
- Docker Support:
  - Containerized deployment via Docker and Docker Compose.
  - NVIDIA GPU acceleration with Container Toolkit integration.
  - Persistent volumes for models (HF cache), custom voices, outputs, logs, and config.
  - One-command setup and deployment (`docker compose up -d`).
- Operating System: Windows 10/11 (64-bit) or Linux (Debian/Ubuntu recommended).
- Python: Version 3.10 or later (Download).
- Git: For cloning the repository (Download).
- Internet: For downloading dependencies and models from Hugging Face Hub.
- Disk Space: 10GB+ recommended (for dependencies and model cache).
- (Optional but HIGHLY Recommended for Performance):
- NVIDIA GPU (CUDA 12.1): CUDA-compatible (Maxwell architecture or newer, RTX 20/30/40 series). Check NVIDIA CUDA GPUs.
- NVIDIA GPU (CUDA 12.8): RTX 5090 or other Blackwell-based GPUs, driver version 570+.
- NVIDIA Drivers: Latest version for your GPU/OS (Download).
- AMD GPU: ROCm-compatible (e.g., RX 6000/7000 series). Check AMD ROCm GPUs.
- AMD Drivers: Latest ROCm-compatible drivers for your GPU/OS (Linux only).
- Apple Silicon: M1, M2, M3, M4, or newer Apple Silicon chips with macOS 12.3+ for MPS acceleration.
- (Linux Only):
  - `libsndfile1`: Audio library needed by `soundfile`. Install via package manager (e.g., `sudo apt install libsndfile1`).
  - `ffmpeg`: For robust audio operations (optional but recommended). Install via package manager (e.g., `sudo apt install ffmpeg`).
| Hardware | Installation Option | Requirements File | Driver Requirement |
|---|---|---|---|
| CPU Only | --cpu | requirements.txt | None |
| NVIDIA RTX 20/30/40 | --nvidia | requirements-nvidia.txt | 525+ |
| NVIDIA RTX 5090 / Blackwell | --nvidia-cu128 | requirements-nvidia-cu128.txt | 570+ |
| AMD RX 6000/7000 (Linux) | --rocm | requirements-rocm.txt | ROCm 6.4+ |
| Apple Silicon (M1/M2/M3/M4) | Manual install | See Option 4 | macOS 12.3+ |
This project uses specific dependency files to ensure a smooth installation for your hardware. You can choose between the automated launcher (recommended for most users) or manual installation (for advanced users).
1. Clone the Repository
```bash
git clone https://github.com/devnen/Chatterbox-TTS-Server.git
cd Chatterbox-TTS-Server
```
The automated launcher handles virtual environment creation, hardware detection, dependency installation, and server startup - all in one step.
Windows:
```bash
# Double-click start.bat or run from command prompt:
start.bat
```
Linux:
```bash
# Make the launcher executable and run it
chmod +x start.sh
./start.sh
```
- The launcher checks your Python installation (3.10+ required)
- Creates a virtual environment automatically
- Detects your GPU hardware (NVIDIA, AMD, or CPU-only)
- Shows an installation menu with recommended option pre-selected:
```
==============================================================
 Hardware Detection
==============================================================
 NVIDIA GPU: Detected (NVIDIA GeForce RTX 4090)
 AMD GPU:    Not detected
==============================================================
 Select Installation Type
==============================================================
 [1] CPU Only                         No GPU acceleration - works on any system
 [2] NVIDIA GPU (CUDA 12.1) [DEFAULT] Standard for RTX 20/30/40 series
 [3] NVIDIA GPU (CUDA 12.8)           For RTX 5090 / Blackwell GPUs only
 [4] AMD GPU (ROCm 6.4)               For AMD GPUs on Linux

 Enter choice [2]:
```
- Press Enter to accept the recommended default, or type a number to select a different option
- Dependencies are installed automatically (this may take several minutes on first run)
- The server starts and displays the access URLs
| Option | Description |
|---|---|
| `--reinstall` or `-r` | Remove existing installation and reinstall fresh (shows menu) |
| `--upgrade` or `-u` | Upgrade to latest version (keeps current hardware selection) |
| `--cpu` | Install CPU-only version (skip menu) |
| `--nvidia` | Install NVIDIA CUDA 12.1 version (skip menu) |
| `--nvidia-cu128` | Install NVIDIA CUDA 12.8 version for RTX 5090/Blackwell (skip menu) |
| `--rocm` | Install AMD ROCm version (skip menu) |
| `--verbose` or `-v` | Show detailed installation output |
| `--help` or `-h` | Show help message |
Examples:
```bash
# Skip menu and install NVIDIA CUDA 12.1 directly
python start.py --nvidia

# Reinstall with fresh dependencies
python start.py --reinstall

# Upgrade to latest version (keeps your hardware selection)
python start.py --upgrade

# Install with verbose output for troubleshooting
python start.py --reinstall --nvidia --verbose
```
After the first installation, simply run the launcher again to start the server:
```bash
# Windows
start.bat

# Linux/macOS
./start.sh
```
The launcher detects the existing installation and starts the server directly without reinstalling.
For users who prefer manual control over the installation process.
2. Create a Python Virtual Environment
Using a virtual environment is crucial to avoid conflicts with other projects.
- Windows (PowerShell):
  ```powershell
  python -m venv venv
  .\venv\Scripts\activate
  ```
- Linux (Bash):
  ```bash
  python3 -m venv venv
  source venv/bin/activate
  ```

Your command prompt should now start with `(venv)`.
3. Choose Your Installation Path
Pick one of the following commands based on your hardware. This single command will install all necessary dependencies with compatible versions.
This is the most straightforward option and works on any machine without a compatible GPU.
```bash
# Make sure your (venv) is active
pip install --upgrade pip
pip install -r requirements.txt
```
How This Works
The `requirements.txt` file is specially crafted for CPU users. It tells `pip` to use PyTorch's CPU-specific package repository and pins compatible versions of `torch` and `torchvision`. This prevents `pip` from installing mismatched versions, which is a common source of errors.

For users with NVIDIA GPUs. This provides the best performance for RTX 20/30/40 series.
Prerequisite: Ensure you have the latest NVIDIA drivers installed.
```bash
# Make sure your (venv) is active
pip install --upgrade pip
pip install -r requirements-nvidia.txt
```
After installation, verify that PyTorch can see your GPU:
python -c "import torch; print(f'PyTorch version: {torch.__version__}'); print(f'CUDA available: {torch.cuda.is_available()}'); print(f'Device name: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else None}')"If CUDA available: shows True, your setup is correct!
π‘ How This Works
The `requirements-nvidia.txt` file instructs `pip` to use PyTorch's official CUDA 12.1 package repository. It pins specific, compatible versions of `torch`, `torchvision`, and `torchaudio` that are built with CUDA support. This guarantees that the versions required by `chatterbox-tts` are met with the correct GPU-enabled libraries, preventing conflicts.Note: Only use this if you have an RTX 5090 or other Blackwell-based GPU. For RTX 3000/4000 series, use Option 2 above.
For users with the latest NVIDIA RTX 5090 or other Blackwell architecture GPUs that require CUDA 12.8 and sm_120 support.
Prerequisites:
- NVIDIA RTX 5090 or Blackwell-based GPU
- CUDA 12.8+ drivers (driver version 570+)
Using Docker (Recommended for RTX 5090):
```bash
# Build and start with CUDA 12.8 support
docker compose -f docker-compose-cu128.yml up -d

# Access the web UI at http://localhost:8004
```
Manual Installation:
```bash
# Make sure your (venv) is active
pip install --upgrade pip
pip install -r requirements-nvidia-cu128.txt
pip install --no-deps git+https://github.com/devnen/chatterbox-v2.git@master
```
The `--no-deps` flag is required to prevent PyTorch from being downgraded to a version that doesn't support Blackwell GPUs.
After installation, verify that PyTorch supports sm_120:
python -c "import torch; print(f'PyTorch: {torch.__version__}'); print(f'CUDA: {torch.cuda.is_available()}'); print(f'GPU: {torch.cuda.get_device_name(0)}'); print(f'Architectures: {torch.cuda.get_arch_list()}')"You should see sm_120 in the architectures list!
π‘ Why CUDA 12.8?
The RTX 5090 uses NVIDIA's new Blackwell architecture with compute capability sm_120. PyTorch 2.8.0 with CUDA 12.8 is the first stable release that includes support for this architecture. Earlier versions (including CUDA 12.1) will fail with the error: CUDA error: no kernel image is available for execution on the device.
See README_CUDA128.md for detailed setup instructions and troubleshooting.
For users with modern, ROCm-compatible AMD GPUs.
Prerequisite: Ensure you have the latest ROCm drivers installed on a Linux system.
```bash
# Make sure your (venv) is active
pip install --upgrade pip
pip install -r requirements-rocm.txt
```
After installation, verify that PyTorch can see your GPU:
python -c "import torch; print(f'PyTorch version: {torch.__version__}'); print(f'ROCm available: {torch.cuda.is_available()}'); print(f'Device name: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else None}')"If ROCm available: shows True, your setup is correct!
π‘ How This Works
The `requirements-rocm.txt` file works just like the NVIDIA one, but it points `pip` to PyTorch's official ROCm 6.4.1 package repository. This ensures that the correct GPU-enabled libraries for AMD hardware are installed, providing a stable and performant environment.For users with Apple Silicon Macs (M1, M2, M3, M4, etc.).
Prerequisite: Ensure you have macOS 12.3 or later for MPS support.
Step 1: Install PyTorch with MPS support first
```bash
# Make sure your (venv) is active
pip install --upgrade pip
pip install torch torchvision torchaudio
```
Step 2: Configure the server to use MPS

Update your `config.yaml` to use MPS instead of CUDA:
```yaml
tts_engine:
  device: mps  # Changed from 'cuda' to 'mps'
```
Step 3: Install remaining dependencies
```bash
# Install chatterbox-tts without its dependencies to avoid conflicts
pip install --no-deps git+https://github.com/devnen/chatterbox-v2.git@master

# Install core server dependencies
pip install fastapi 'uvicorn[standard]' librosa safetensors soundfile pydub audiotsm praat-parselmouth python-multipart requests aiofiles PyYAML watchdog unidecode inflect tqdm

# Install missing chatterbox dependencies
pip install conformer==0.3.2 diffusers==0.29.0 resemble-perth==1.0.1 transformers==4.46.3

# Install s3tokenizer without its problematic dependencies
pip install --no-deps s3tokenizer

# Install a compatible version of ONNX and audio codec
pip install onnx==1.16.0 descript-audio-codec
```
After installation, verify that PyTorch can see your GPU:
python -c "import torch; print(f'PyTorch version: {torch.__version__}'); print(f'MPS available: {torch.backends.mps.is_available()}'); print(f'Device will use: {\"mps\" if torch.backends.mps.is_available() else \"cpu\"}')"If MPS available: shows True, your setup is correct!
π‘ Why This Process Is Different
Apple Silicon requires a specific installation sequence due to dependency conflicts between the pinned PyTorch versions in chatterbox-tts and the latest PyTorch versions that support MPS. By installing PyTorch first with MPS support, then carefully installing dependencies while avoiding version conflicts, we ensure MPS acceleration works properly. The server's automatic device detection will use MPS when configured and available.Want to test Chatterbox TTS Server immediately without any installation?
- Full Web UI with all controls and features
- Voice cloning with uploaded audio files
- Predefined voices included
- Large text processing with chunking (perfect for audiobooks)
- Free GPU acceleration (T4 GPU)
- No installation or setup required
- Works on any device with a web browser
- Click the badge above to open the notebook in Google Colab
- Select GPU runtime: Runtime → Change runtime type → T4 GPU → Save
- Run Cell 1: Click the play button to install dependencies (~1-5 minutes)
- Run Cell 2: Start the server and access the Web UI via the provided links
- Wait for "Server ready! Click below" message: Locate the "localhost:8004" link and click. This starts the Web UI in your browser
- Generate speech: Use the web interface to create high-quality TTS audio
- First run: Takes a few minutes to download models (one-time only)
- Session limits: Colab free tier has usage limits; sessions may timeout after inactivity
- For production: Use the local installation or Docker deployment methods below
Prefer local installation? Continue reading below for full setup instructions.
The server relies exclusively on config.yaml for runtime configuration.
- `config.yaml`: Located in the project root. This file stores all server settings, model paths, generation defaults, and UI state. It is created automatically on the first run (using defaults from `config.py`) if it doesn't exist. This is the main file to edit for persistent configuration changes.
- UI Configuration: The "Server Configuration" and "Generation Parameters" sections in the Web UI allow direct editing and saving of values into `config.yaml`.
Key Configuration Areas (in config.yaml or UI):
- `server`: `host`, `port`, logging settings.
- `model`: `repo_id` (e.g., "ResembleAI/chatterbox").
- `tts_engine`: `device` ('auto', 'cuda', 'mps', 'cpu'), `predefined_voices_path`, `reference_audio_path`, `default_voice_id`.
- `paths`: `model_cache` (for `download_model.py`), `output`.
- `generation_defaults`: Default UI values for `temperature`, `exaggeration`, `cfg_weight`, `seed`, `speed_factor`, `language`.
- `audio_output`: `format`, `sample_rate`, `max_reference_duration_sec`.
- `ui_state`: Stores the last used text, voice mode, file selections, etc., for UI persistence.
- `ui`: `title`, `show_language_select`, `max_predefined_voices_in_dropdown`.
- `debug`: `save_intermediate_audio`.

Remember: Changes made to `server`, `model`, `tts_engine`, or `paths` sections in `config.yaml` (or via the UI's Server Configuration section) require a server restart to take effect. Changes to `generation_defaults` or `ui_state` are applied dynamically or on the next page load.
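Because everything lives in one YAML file, the documented keys above can also be inspected or tweaked with standard tooling. The snippet below is a minimal sketch using PyYAML (already a server dependency); it is not the server's own `YamlConfigManager`, and it assumes you run it from the project root with a `config.yaml` already present.

```python
import yaml

# Read the current configuration.
with open("config.yaml", "r", encoding="utf-8") as f:
    config = yaml.safe_load(f)

print("Server port:", config["server"]["port"])
print("Model repo:", config["model"]["repo_id"])

# Example tweak: change the default seed offered by the UI/API.
config["generation_defaults"]["seed"] = 42

with open("config.yaml", "w", encoding="utf-8") as f:
    yaml.safe_dump(config, f, sort_keys=False)
# server/model/tts_engine/paths changes still require a server restart.
```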
Important Note on Model Downloads (First Run): The very first time you start the server, it needs to download the chatterbox-tts model files from Hugging Face Hub. This is an automatic, one-time process (per model version, or until your Hugging Face cache is cleared).
- Please be patient: This download can take several minutes, depending on your internet speed and the size of the model files (typically a few gigabytes).
- Monitor your terminal: You'll see progress indicators or logs related to the download. The server will only become fully operational and accessible after these essential model files are successfully downloaded and loaded.
- Subsequent starts will be much faster as the server will use the already downloaded models from your local Hugging Face cache.
You can optionally use the python download_model.py script to pre-download specific model components to the ./model_cache directory defined in config.yaml. However, please note that the runtime engine (engine.py) primarily loads the model from the main Hugging Face Hub cache directly, not this specific local model_cache directory.
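For reference, the runtime loading path mirrors the upstream chatterbox-tts quickstart. The snippet below is a hedged sketch based on that library's documented usage rather than this server's `engine.py`, and exact argument names may differ between versions; the first call triggers the same Hugging Face Hub download and cache behaviour described above.

```python
import torchaudio
from chatterbox.tts import ChatterboxTTS  # upstream engine wrapped by this server

# Downloads model files into the standard Hugging Face cache on first use.
model = ChatterboxTTS.from_pretrained(device="cuda")  # or "mps" / "cpu"

wav = model.generate("Hello from Chatterbox!")  # returns an audio tensor
torchaudio.save("hello.wav", wav, model.sr)
```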
The easiest way to run the server is using the automated launcher:
Windows:
```bash
start.bat
```
Linux / macOS:
```bash
./start.sh
```
The launcher automatically:
- Activates the virtual environment
- Verifies the installation is complete
- Starts the server
- Waits for the server to be ready (including model download on first run)
- Displays the access URLs when ready
If you prefer to start the server manually:
Steps to Run:
- Activate the virtual environment (if not activated):
  - Linux/macOS: `source venv/bin/activate`
  - Windows: `.\venv\Scripts\activate`
- Run the server:
python server.py
- Access the UI: After the server starts (and completes any initial model downloads), it should automatically attempt to open the Web UI in your default browser. If it doesn't, manually navigate to `http://localhost:PORT` (e.g., `http://localhost:8004` if your configured port is 8004).
- Access API Docs: Open `http://localhost:PORT/docs` for interactive API documentation.
- Stop the server: Press `CTRL+C` in the terminal where the server is running.
## Updating to the Latest Version

Follow these steps to update your local installation to the latest version from GitHub. This guide provides multiple methods: using the automated launcher, the recommended `git stash` workflow, and a manual backup alternative. All methods preserve your local `config.yaml`.

**First, Navigate to Your Project Directory**

Before starting, open your terminal and go to the project folder.

```bash
cd Chatterbox-TTS-Server
```

The launcher provides simple upgrade functionality that handles everything automatically.
Upgrade (keeps your hardware selection):
```bash
# First, pull the latest code
git pull origin main

# Then upgrade dependencies using the launcher
# Windows
python start.py --upgrade

# Linux/macOS
python3 start.py --upgrade
```
Full Reinstall (choose new hardware option):
```bash
git pull origin main

# Windows
python start.py --reinstall

# Linux/macOS
python3 start.py --reinstall
```
The `--upgrade` flag preserves your current hardware selection (CPU, NVIDIA, etc.) and reinstalls dependencies.
The --reinstall flag removes the existing installation completely and shows the hardware selection menu again.
Changing Hardware Configuration:
To switch to a different hardware configuration (e.g., from CPU to NVIDIA, or from CUDA 12.1 to CUDA 12.8):
```bash
# Shows menu to select new hardware
python start.py --reinstall

# Or specify directly
python start.py --reinstall --nvidia
python start.py --reinstall --nvidia-cu128
python start.py --reinstall --cpu
python start.py --reinstall --rocm
```
If you installed manually without using the launcher, this is the standard and safest way to update using Git. It automatically handles your local changes (like to `config.yaml`) without needing to manually copy files.
First, activate your virtual environment:
```bash
# On Windows (PowerShell):
.\venv\Scripts\activate

# On Linux (Bash):
source venv/bin/activate
```

- Step 1: Stash Your Local Changes
  This command safely stores your modifications on a temporary "shelf."
  ```bash
  git stash
  ```
- Step 2: Pull the Latest Version
  Now that your local changes are safely stored, you can download the latest code from GitHub.
  ```bash
  git pull origin main
  ```
- Step 3: Re-apply Your Changes
  This command takes your changes from the shelf and applies them back to the updated code.
  ```bash
  git stash pop
  ```

Your `config.yaml` will now have your settings, and the rest of the project files will be up-to-date. You can now proceed to the "Final Steps" section below.
This method involves manually backing up and restoring your configuration file.
First, activate your virtual environment:
```bash
# On Windows (PowerShell):
.\venv\Scripts\activate

# On Linux (Bash):
source venv/bin/activate
```

- Step 1: Backup Your Configuration
  Important: Create a backup of your `config.yaml` to preserve your custom settings.
  ```bash
  # Create a backup of your current configuration
  cp config.yaml config.yaml.backup
  ```
- Step 2: Update the Repository
  Choose one of the following commands based on your needs:
  - Standard Update (recommended):
    ```bash
    git pull origin main
    ```
    If you encounter merge conflicts with `config.yaml`, you may need to resolve them manually.
  - Force Update (if you have conflicts or want to ensure a clean update):
    ```bash
    # Fetch latest changes and reset to match remote exactly
    git fetch origin
    git reset --hard origin/main
    ```
- Step 3: Restore Your Configuration
  ```bash
  # Restore your backed-up configuration
  cp config.yaml.backup config.yaml
  ```

Now, proceed to the "Final Steps" section.
After you have updated the code using Method 2 or 3, complete these final steps.
1. Check for New Configuration Options

Recommended: Compare your restored `config.yaml` with the new default config to see if there are new options you might want to adopt. The server will add new keys with default values, but you may want to review them.
2. Update Dependencies

Important: After pulling new code, always update the dependencies to ensure you have the correct versions. Choose the command that matches your hardware:
- For CPU-Only Systems:
pip install -r requirements.txt
- For NVIDIA GPU Systems (CUDA 12.1):
pip install -r requirements-nvidia.txt
- For NVIDIA GPU Systems (CUDA 12.8 / Blackwell):
pip install -r requirements-nvidia-cu128.txt
pip install --no-deps git+https://github.com/devnen/chatterbox-v2.git@master
- For AMD GPU Systems:
pip install -r requirements-rocm.txt
3. Restart the Server
If the server was running, stop it (CTRL+C) and restart it to apply all the updates.
```bash
python server.py
```
Note: Your custom settings in `config.yaml` are preserved with this method. The server will automatically add any new configuration options with default values if needed. You can safely delete `config.yaml.backup` once you've verified everything works correctly.

Docker Users: If using Docker and you have a local `config.yaml` mounted as a volume, the same backup/restore process applies before running:
```bash
docker compose down
docker compose pull  # if using pre-built images
docker compose up -d --build
```
For RTX 5090 / Blackwell GPUs: Use the CUDA 12.8 configuration:
```bash
docker compose -f docker-compose-cu128.yml down
docker compose -f docker-compose-cu128.yml pull
docker compose -f docker-compose-cu128.yml up -d --build
```
The most intuitive way to use the server:
- Engine Selector: Use the dropdown at the top to switch between Original Chatterbox and ChatterboxβTurbo. The backend auto-loads the selected engine.
- Text Input: Enter your plain text script. For audiobooks: Simply paste the entire book text - the chunking system will automatically handle long texts and create seamless audio output.
- Voice Mode: Choose:
  - Predefined Voices: Select a curated voice from the `./voices` directory.
  - Voice Cloning: Select an uploaded reference file from `./reference_audio`.
- Presets: Load examples from `ui/presets.yaml`. New presets demonstrate Turbo's paralinguistic tags.
- Reference/Predefined Audio Management: Import new files and refresh lists.
- Generation Parameters: Adjust Temperature, Exaggeration, CFG Weight, Speed Factor, Seed. Save defaults to `config.yaml`.
- Chunking Controls: Toggle "Split text into chunks" and adjust "Chunk Size" for long texts.
- Server Configuration: View/edit parts of `config.yaml` (requires server restart for some changes).
- Audio Player: Play generated audio with waveform visualization.
When the engine selector is set to Chatterbox-Turbo, you can include paralinguistic tags inline:
```
Hi there [chuckle] - thanks for calling back. One moment… [cough] sorry about that. Let's get this fixed.
```
Turbo supports native tags like `[laugh]`, `[cough]`, and `[chuckle]` for more realistic, expressive speech. These tags are ignored when using Original Chatterbox.
The primary endpoint for TTS generation is /tts, which offers detailed control over the synthesis process.
- `/tts` (POST): Main endpoint for speech generation.
  - Request Body (`CustomTTSRequest`):
    - `text` (string, required): Plain text to synthesize.
    - `voice_mode` (string, "predefined" or "clone", default "predefined"): Specifies voice source.
    - `predefined_voice_id` (string, optional): Filename of predefined voice (if `voice_mode` is "predefined").
    - `reference_audio_filename` (string, optional): Filename of reference audio (if `voice_mode` is "clone").
    - `output_format` (string, "wav" or "opus", default "wav").
    - `split_text` (boolean, default True): Whether to chunk long text.
    - `chunk_size` (integer, default 120): Target characters per chunk.
    - `temperature`, `exaggeration`, `cfg_weight`, `seed`, `speed_factor`, `language`: Generation parameters overriding defaults.
  - Response: Streaming audio (`audio/wav` or `audio/opus`).
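As an illustration, a minimal Python client for `/tts` might look like the following. The host/port, reference filename, and parameter values are assumptions for a local setup; the field names follow the request body described above.

```python
import requests

payload = {
    "text": "Hi there [chuckle] - thanks for calling back.",  # [chuckle] is honored by Chatterbox-Turbo only
    "voice_mode": "clone",                       # or "predefined"
    "reference_audio_filename": "my_voice.wav",  # hypothetical file in ./reference_audio
    "output_format": "wav",
    "split_text": True,
    "chunk_size": 120,
    "seed": 42,                                  # fixed seed for consistent output
}

# Assumes the server is running locally on the default port 8004.
resp = requests.post("http://localhost:8004/tts", json=payload, timeout=600)
resp.raise_for_status()

with open("output.wav", "wb") as f:
    f.write(resp.content)
```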
- `/v1/audio/speech` (POST): OpenAI-compatible endpoint.
  - `input`: Text.
  - `voice`: 'S1', 'S2', 'dialogue', 'predefined_voice_filename.wav', or 'reference_filename.wav'.
  - `response_format`: 'opus' or 'wav'.
  - `speed`: Playback speed factor (0.5-2.0).
  - `seed`: (Optional) Integer seed, -1 for random.
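A comparable sketch for the OpenAI-compatible route (again, the host/port and voice filename are assumptions for your setup):

```python
import requests

resp = requests.post(
    "http://localhost:8004/v1/audio/speech",
    json={
        "input": "Hello from the OpenAI-compatible endpoint.",
        "voice": "my_voice.wav",   # predefined or reference filename, per the parameters above
        "response_format": "wav",
        "speed": 1.0,
        "seed": -1,                # -1 for a random seed
    },
    timeout=600,
)
resp.raise_for_status()
with open("speech.wav", "wb") as f:
    f.write(resp.content)
```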
- Helper Endpoints (mostly for UI):
  - `GET /api/ui/initial-data`: Fetches all initial configuration, file lists, and presets for the UI.
  - `POST /save_settings`: Saves partial updates to `config.yaml`.
  - `POST /reset_settings`: Resets `config.yaml` to defaults.
  - `GET /get_reference_files`: Lists files in `reference_audio/`.
  - `GET /get_predefined_voices`: Lists formatted voices from `voices/`.
  - `POST /upload_reference`: Uploads reference audio files.
  - `POST /upload_predefined_voice`: Uploads predefined voice files.
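The UI helper routes also make a convenient status probe from scripts. The snippet below is a hedged example assuming the default local port; the exact JSON structure of the responses is defined by the server and may change.

```python
import requests

base = "http://localhost:8004"

# Snapshot of configuration, file lists, and presets used by the Web UI;
# a successful response doubles as a health check.
initial = requests.get(f"{base}/api/ui/initial-data", timeout=30)
print("Server reachable:", initial.ok)

# List the predefined voices the server currently knows about.
voices = requests.get(f"{base}/get_predefined_voices", timeout=30)
print(voices.json())
```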
Run Chatterbox TTS Server easily using Docker. The recommended method uses Docker Compose, which is pre-configured for different GPU types.
- Docker installed.
- Docker Compose installed (usually included with Docker Desktop).
- (For GPU acceleration)
- NVIDIA: Up-to-date drivers and the NVIDIA Container Toolkit installed.
  - AMD: Up-to-date ROCm drivers installed on a Linux host. User must be in the `video` and `render` groups.
This method uses the provided docker-compose.yml files to manage the container, volumes, and configuration easily.
```bash
git clone https://github.com/devnen/Chatterbox-TTS-Server.git
cd Chatterbox-TTS-Server
```
The default docker-compose.yml is configured for NVIDIA GPUs.
```bash
docker compose up -d --build
```
Prerequisites: Ensure you have ROCm drivers installed on your host system and your user is in the required groups:
```bash
# Add your user to required groups (one-time setup)
sudo usermod -a -G video,render $USER

# Log out and back in for changes to take effect
```
Start the container:
```bash
docker compose -f docker-compose-rocm.yml up -d --build
```
A dedicated compose file is now provided for CPU-only users to avoid GPU driver errors.
```bash
docker compose -f docker-compose-cpu.yml up -d --build
```
Note: The first time you run this, Docker will build the image and download model files, which can take some time. Subsequent starts will be much faster.
Open your web browser to http://localhost:PORT (e.g., http://localhost:8004 or the host port you configured).
Check GPU access (NVIDIA):
```bash
# Check if container can see NVIDIA GPU
docker compose exec chatterbox-tts-server nvidia-smi

# Verify PyTorch can access the GPU
docker compose exec chatterbox-tts-server python3 -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}'); print(f'GPU count: {torch.cuda.device_count()}')"
```
Check GPU access (AMD):
```bash
# Check if container can see AMD GPU
docker compose -f docker-compose-rocm.yml exec chatterbox-tts-server rocm-smi

# Verify PyTorch can access the GPU
docker compose -f docker-compose-rocm.yml exec chatterbox-tts-server python3 -c "import torch; print(f'ROCm available: {torch.cuda.is_available()}'); print(f'Device name: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else \"No GPU detected\"}')"
```
View logs:
```bash
docker compose logs -f                             # For NVIDIA
docker compose -f docker-compose-rocm.yml logs -f  # For AMD
docker compose -f docker-compose-cpu.yml logs -f   # For CPU
```
Stop the containers:
```bash
docker compose down                             # For NVIDIA
docker compose -f docker-compose-rocm.yml down  # For AMD
docker compose -f docker-compose-cpu.yml down   # For CPU
```
Restart the server:
```bash
docker compose restart chatterbox-tts-server                             # For NVIDIA
docker compose -f docker-compose-rocm.yml restart chatterbox-tts-server  # For AMD
docker compose -f docker-compose-cpu.yml restart chatterbox-tts-server   # For CPU
```
If your AMD GPU is not officially supported by ROCm but is similar to a supported architecture, you can override the detected architecture:
```bash
# For RX 5000/6000 series (gfx10xx) - override to gfx1030
HSA_OVERRIDE_GFX_VERSION=10.3.0 docker compose -f docker-compose-rocm.yml up -d

# For RX 7000 series (gfx11xx) - override to gfx1100
HSA_OVERRIDE_GFX_VERSION=11.0.0 docker compose -f docker-compose-rocm.yml up -d

# For Vega cards - override to gfx906
HSA_OVERRIDE_GFX_VERSION=9.0.6 docker compose -f docker-compose-rocm.yml up -d
```
Check your GPU architecture:
```bash
# Method 1: Using rocminfo (if ROCm installed on host)
rocminfo | grep "Name:"

# Method 2: Using lspci
lspci | grep VGA
```
Common GPU Architecture Mappings:
- RX 7900 XTX/XT, RX 7800 XT, RX 7700 XT: gfx1100 → Use `HSA_OVERRIDE_GFX_VERSION=11.0.0`
- RX 6900 XT, RX 6800 XT, RX 6700 XT, RX 6600 XT: gfx1030-1032 → Use `HSA_OVERRIDE_GFX_VERSION=10.3.0`
- RX 5700 XT, RX 5600 XT: gfx1010 → Use `HSA_OVERRIDE_GFX_VERSION=10.3.0`
- Vega 64, Vega 56: gfx900-906 → Use `HSA_OVERRIDE_GFX_VERSION=9.0.6`
- Supported GPUs: AMD Instinct data center GPUs and select Radeon GPUs. Check the ROCm compatibility list.
- Operating System: ROCm is currently supported only on Linux systems.
- Performance: AMD GPUs with ROCm provide excellent performance for ML workloads, with support for mixed-precision training.
- PyTorch Version: Uses PyTorch 2.6.0 with ROCm 6.4.1 for optimal compatibility and performance.
-
"Python not found" error:
- Ensure Python 3.10+ is installed and added to PATH
- Windows: Reinstall Python and check "Add Python to PATH" during installation
  - Linux: Install with `sudo apt install python3 python3-venv python3-pip`
-
"venv module not found" (Linux):
sudo apt install python3-venv
-
Installation hangs or fails:
- Run with verbose mode for details:
python start.py --reinstall --verbose - Check internet connection
- Ensure sufficient disk space (10GB+ recommended)
- Run with verbose mode for details:
-
Permission errors removing venv (Windows):
- Close all terminals and editors that might have files open in the venv folder
- Try running as Administrator
- Manually delete the venv folder:
rmdir /s /q venv
-
Wrong hardware detected:
  - The launcher detects NVIDIA GPUs via `nvidia-smi` and AMD GPUs via `rocm-smi`
  - If detection fails, use direct installation flags: `--cpu`, `--nvidia`, `--nvidia-cu128`, `--rocm`
-
Checking installation type:
```bash
# The installation type is stored in venv/.install_type
cat venv/.install_type    # Linux/macOS
type venv\.install_type   # Windows
```
- MPS Not Available: Ensure you have macOS 12.3+ and an Apple Silicon Mac. Verify with `python -c "import torch; print(torch.backends.mps.is_available())"`
- Installation Conflicts: If you encounter version conflicts, follow the exact Apple Silicon installation sequence in Option 4, installing PyTorch first before other dependencies.
- ONNX Build Errors: Use the specific ONNX version `pip install onnx==1.16.0` as shown in the installation steps.
- Model Loading Errors: Ensure `config.yaml` has `device: mps` in the `tts_engine` section.
- CUDA Not Available / Slow: Check NVIDIA drivers (`nvidia-smi`), ensure correct CUDA-enabled PyTorch is installed (see Installation options).
- "No kernel image available" error:
  - For RTX 5090/Blackwell: Use `--nvidia-cu128` or `requirements-nvidia-cu128.txt` instead of the standard NVIDIA installation
  - For older GPUs (RTX 20/30/40): Use `--nvidia` or `requirements-nvidia.txt`
- VRAM Out of Memory (OOM):
- Ensure your GPU meets minimum requirements for Chatterbox.
- Close other GPU-intensive applications.
  - If processing very long text even with chunking, try reducing `chunk_size` (e.g., 100-150).
- ROCm not working on Windows:
- ROCm only supports Linux - use CPU mode on Windows with AMD GPUs
- The launcher will warn you if you select ROCm on Windows
- Import Errors (e.g., `chatterbox-tts`, `librosa`): Ensure the virtual environment is active and dependencies installed successfully. Try reinstalling: `python start.py --reinstall`
- `libsndfile` Error (Linux): Run `sudo apt install libsndfile1`.
- Model Download Fails: Check internet connection. `ChatterboxTTS.from_pretrained()` will attempt to download from Hugging Face Hub. Ensure `model.repo_id` in `config.yaml` is correct.
- Voice Cloning/Predefined Voice Issues:
  - Ensure files exist in the correct directories (`./reference_audio`, `./voices`).
  - Check server logs for errors related to file loading or processing.
- Permission Errors (Saving Files/Config): Check write permissions for `./config.yaml`, `./logs`, `./outputs`, `./reference_audio`, `./voices`, and the Hugging Face cache directory if using Docker volumes.
- UI Issues / Settings Not Saving: Clear browser cache/local storage. Check browser developer console (F12) for JavaScript errors. Ensure `config.yaml` is writable by the server process.
- Port Conflict (`Address already in use`): Another process is using the port. Stop it or change `server.port` in `config.yaml` (requires server restart).
  - Find process using port: `netstat -ano | findstr :8004` (Windows) or `lsof -i :8004` (Linux)
- Generation Cancel Button: This is a "UI Cancel" - it stops the frontend from waiting but doesn't instantly halt ongoing backend model inference. Clicking Generate again cancels the previous UI wait.
Set the CUDA_VISIBLE_DEVICES environment variable before running python server.py (or before running the launcher) to specify which GPU(s) PyTorch should see. The server uses the first visible one (effectively cuda:0 from PyTorch's perspective).
- Example (Use only physical GPU 1):
  - Linux/macOS: `CUDA_VISIBLE_DEVICES="1" python server.py`
  - Windows CMD: `set CUDA_VISIBLE_DEVICES=1 && python server.py`
  - Windows PowerShell: `$env:CUDA_VISIBLE_DEVICES="1"; python server.py`
- Example (Use physical GPUs 6 and 7 - server uses GPU 6):
  - Linux/macOS: `CUDA_VISIBLE_DEVICES="6,7" python server.py`
  - Windows CMD: `set CUDA_VISIBLE_DEVICES=6,7 && python server.py`
  - Windows PowerShell: `$env:CUDA_VISIBLE_DEVICES="6,7"; python server.py`
Note: CUDA_VISIBLE_DEVICES selects GPUs; it does not fix OOM errors if the chosen GPU lacks sufficient memory.
Check Python version:
```bash
python --version
```
Check PyTorch and CUDA:
```bash
python -c "import torch; print(f'PyTorch: {torch.__version__}'); print(f'CUDA: {torch.cuda.is_available()}')"
```
Check PyTorch architectures (for Blackwell support):
```bash
python -c "import torch; print(torch.cuda.get_arch_list())"
```
Test server manually:
```bash
# Activate venv first, then:
python server.py
```
- Main config: The server uses `config.yaml` for settings. The docker-compose files mount your local `config.yaml` to `/app/config.yaml` inside the container.
- First run: If `config.yaml` doesn't exist locally, the application will create a default one with sensible defaults.
- Editing config: You can edit the local `config.yaml` directly. Changes to server/model/path settings require a container restart: `docker compose restart chatterbox-tts-server`
- UI settings: Changes to generation defaults and UI state are often saved automatically by the application.
Persistent data is stored on your host machine via volume mounts:
- `./config.yaml:/app/config.yaml` - Main application configuration
- `./voices:/app/voices` - Predefined voice audio files
- `./reference_audio:/app/reference_audio` - Your uploaded reference audio files for cloning
- `./outputs:/app/outputs` - Generated audio files saved from UI/API
- `./logs:/app/logs` - Server log files
- `hf_cache:/app/hf_cache` - Named volume for Hugging Face model cache (persists downloads)
Managing volumes:
```bash
# Remove all data (including downloaded models)
docker compose down -v

# Remove only application data (keep model cache)
docker compose down
sudo rm -rf voices/ reference_audio/ outputs/ logs/ config.yaml

# View volume usage
docker system df
```
Contributions are welcome! Please feel free to open an issue to report bugs or suggest features, or submit a Pull Request for improvements.
This project is licensed under the MIT License.
You can find it here: https://opensource.org/licenses/MIT
- Core Model: This project utilizes the Chatterbox TTS model by Resemble AI.
- UI Inspiration: Special thanks to Lex-au whose Orpheus-FastAPI project served as inspiration for the web interface design.
- Similar Project: This server shares architectural similarities with our Dia-TTS-Server and Kitten-TTS-Server projects, which use different TTS engines.
- Containerization Technologies: Docker and NVIDIA Container Toolkit.
- Core Libraries:

