LocalSoundsAPI

The ultimate portable, offline all-in-one audio studio
Text-to-Speech · Transcription · Subtitles · Music Generation · Sound Effects · Video Production · AI Chatbot

LocalSoundsAPI gives you both a full-featured browser-based web interface and a complete local REST API — use it interactively or call it from scripts, other apps, or automation tools.

Everything runs locally from one folder — no installation, no internet needed after setup.

Included Engines (all fully local & offline)

XTTS v2 – Top-tier multilingual voice cloning with speaker embeddings
Fish Speech – Extremely fast and expressive cloned voices
Kokoro 82M – Lightning-fast English TTS with 20 premium built-in voices
Stable Audio Open 1.0 – Text-to-music and sound effects (CLAP-scored variants)
ACE-Step 3.5B – Advanced multi-line prompt music generation (style + lyrics)
Whisper – On-demand transcription & quality verification for every generated chunk
Local LLM Chatbot – Built-in llama.cpp assistant for writing prompts, scripts, lyrics, stories, and full projects
OpenRouter / LM Studio support – Optional chatbot backends: OpenRouter (cloud, needs internet) or LM Studio (external local server)

Key Features

Professional post-processing on every engine
De-reverb, de-essing, loudness normalization (-23 LUFS), intelligent silence trimming, peak limiting, and optional Whisper verification with automatic retries.
Full project system
Save jobs with progress tracking, automatic recovery (##recover##), and persistent job.json files.
Powerful built-in Chatbot
Helps you write perfect prompts, lyrics, stories, or entire scripts. Responses can be sent directly to any TTS or music engine with one click.
Per-model device selection
Every model (XTTS, Fish, Kokoro, Stable Audio, ACE-Step, Whisper, local LLM) can be loaded on CPU or any available GPU independently — perfect for mixing heavy and light models.
Run multiple instances
Use (portable) LocalSoundsAPI-Multi.bat to launch several copies on different ports — great for parallel generation or different model setups.
Video production tool
Turn any audio + transcription into a subtitled video (horizontal/vertical, solid color, transparent, or image/video background).
Settings presets
Save and load all your favorite parameters instantly.
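
The "Whisper verification with automatic retries" feature can be pictured as a generate, transcribe, compare loop. The sketch below is only an illustration of that idea and is not taken from the project's code: `synthesize` and `transcribe` are hypothetical callables you would supply, and the 0.9 threshold and 3-attempt budget are assumed values.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional, Tuple


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return " ".join(kept.split())


def similarity(expected: str, heard: str) -> float:
    """Return a 0..1 similarity ratio between the two normalized strings."""
    return SequenceMatcher(None, normalize(expected), normalize(heard)).ratio()


def generate_verified(
    text: str,
    synthesize: Callable[[str], bytes],   # hypothetical: text -> audio bytes
    transcribe: Callable[[bytes], str],   # hypothetical: audio bytes -> text
    threshold: float = 0.9,               # assumed acceptance ratio
    max_attempts: int = 3,                # assumed retry budget
) -> Tuple[Optional[bytes], float, int]:
    """Retry synthesis until the transcript is close enough to the input text.

    Returns (best_audio, best_score, attempts_used). If no attempt reaches
    the threshold, the best-scoring attempt is returned.
    """
    best_audio, best_score = None, -1.0
    for attempt in range(1, max_attempts + 1):
        audio = synthesize(text)
        score = similarity(text, transcribe(audio))
        if score > best_score:
            best_audio, best_score = audio, score
        if score >= threshold:
            return best_audio, best_score, attempt
    return best_audio, best_score, max_attempts
```

Keeping the best-scoring attempt means a chunk is still produced when every retry falls short; whether the app itself behaves this way is not stated here.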

Quick Start – Fully Portable (No Installation)

1. Download the repository code
   Go to the main repo → Code → Download ZIP.
   Extract it to any folder you like (e.g., Desktop, Documents, or a USB drive). This is your main project folder.
2. Download the portable binaries from Releases
   Go to Releases and download:
   - portable-python-env-v1.7z
   - bin.zip
3. Extract the binaries correctly
   - Extract portable-python-env-v1.7z directly into your main project folder → it creates the python/ subfolder.
   - Extract bin.zip into the existing bin/ folder (inside your main project folder) → it populates bin/ffmpeg/, bin/rubberband/, and bin/espeak-ng/.
4. Launch the app
   - Single instance (recommended for most users):
     Double-click (portable) LocalSoundsAPI-Single.bat
     → It always starts on port 5006 and opens http://127.0.0.1:5006 in your browser.
   - Multiple instances (for running several generations in parallel):
     Double-click (portable) LocalSoundsAPI-Multi.bat
     → It will ask you:
     • How many instances do you want?
     • Starting from which port? (e.g., 5006, 5007, 5008...)
     Each instance gets its own port and browser tab.

First run only: The app auto-downloads each model the first time it is needed (~8–12 GB in total across all models). Each model is downloaded only once, and downloading can take 10–40 minutes. Just let it finish.

That's it – completely offline and portable after the first run!
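
If you script against one or more running instances, it can help to confirm that each port is actually accepting connections first. The following is a minimal sketch using only the standard library. It assumes the multi-instance launcher assigns consecutive ports from the starting port, as the example above suggests. The function names are illustrative, and it checks only TCP reachability, not any specific API endpoint.

```python
import socket
from typing import Dict, List


def instance_urls(start_port: int = 5006, count: int = 1, host: str = "127.0.0.1") -> List[str]:
    """Build the base URL of each instance, assuming consecutive ports."""
    return [f"http://{host}:{start_port + i}" for i in range(count)]


def port_is_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_instances(start_port: int = 5006, count: int = 1, host: str = "127.0.0.1") -> Dict[str, bool]:
    """Map each instance URL to whether its port is currently reachable."""
    urls = instance_urls(start_port, count, host)
    return {url: port_is_open(host, start_port + i) for i, url in enumerate(urls)}
```

For example, `check_instances(5006, 3)` returns a dictionary keyed by `http://127.0.0.1:5006` through `http://127.0.0.1:5008`, with `True` for every port that is listening.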

Important Folders

models/ – TTS/music models, either auto-downloaded or placed here manually
voices/ – Your reference voice samples for cloning
projects_output/ – All saved jobs and final outputs
brain/ – Chatbot history, archives, and system prompts
settings/ – Your saved parameter presets
bin/ – Bundled ffmpeg, rubberband, eSpeak NG
python/ – Complete portable Python environment
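
A common setup mistake is extracting the archives one level too deep or into the wrong folder. As a rough sanity check, the sketch below reports which of the folders created by the extraction step are missing. It covers only the paths the Quick Start names (python/, bin/ffmpeg/, bin/rubberband/, bin/espeak-ng/). The function name is illustrative and not part of the project.

```python
from pathlib import Path
from typing import List

# Folders the Quick Start says the two archives should create.
EXPECTED_AFTER_EXTRACTION = [
    "python",
    "bin/ffmpeg",
    "bin/rubberband",
    "bin/espeak-ng",
]


def missing_paths(project_root: str) -> List[str]:
    """Return the expected folders that do not exist under project_root."""
    base = Path(project_root)
    return [rel for rel in EXPECTED_AFTER_EXTRACTION if not (base / rel).is_dir()]
```

Calling `missing_paths(".")` from the main project folder returns an empty list when both archives were extracted where the Quick Start expects them.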

Project Structure

project-root/
├── ACE-Step/                  # Bundled ACE-Step repo (music generation)
├── bin/                       # Portable tools
│   ├── ffmpeg/
│   ├── rubberband/
│   └── espeak-ng/
├── brain/                     # Chatbot memory
│   ├── context_history/       # Current + archived chats
│   └── system_prompt.json
├── fish-speech/               # Bundled Fish Speech repo
├── models/                    # All models (auto-downloaded or placed here)
│   ├── XTTS-v2/
│   ├── fish-speech-1.5/
│   ├── kokoro-82m/
│   ├── stable-audio-open-1.0/
│   ├── ace_step/
│   └── clap-htsat-unfused/
├── projects_output/           # Saved jobs and final outputs
├── voices/                    # Your reference voice samples
├── settings/                  # Saved parameter presets
├── static/                    # Web UI (CSS, JS, icons)
├── templates/                 # HTML pages
├── routes/                    # All Flask endpoints
├── python/                    # Portable Python environment (from the 7z)
├── (portable) LocalSoundsAPI-Single.bat
├── (portable) LocalSoundsAPI-Multi.bat
├── main.py
├── config.py
└── requirements.txt

Why This Feels So Smooth

Completely self-contained – The bundled portable Python environment is isolated from your system Python. No pip installs, no conda environments, no dependency conflicts, no PATH headaches. Just extract and run.
Truly offline – After the initial model downloads (which happen only once per model), everything works 100% without internet.
No admin rights needed – Perfect for work/school computers or USB stick setups.
Instant multi-GPU support – Load heavy models on your best GPU and lighter ones (Whisper, Kokoro, Fish) on another or on CPU — all from the same interface.

Tips for the Best Experience

First run? Let the app auto-download the models you need (XTTS, Fish, Kokoro, Stable Audio, ACE-Step, CLAP, Whisper). It only happens once per model.
Low VRAM? Use the per-model device selectors — keep big models on your strongest GPU and run Whisper/Kokoro on CPU or a smaller card.
Want to generate faster? Launch multiple instances with (portable) LocalSoundsAPI-Multi.bat — one for TTS, one for music, one for the chatbot, etc.
Chatbot for content creation – Stuck on a prompt or lyric? Ask the built-in assistant — then click the little icons under its reply to send the text straight to XTTS, Fish, Kokoro, Stable Audio, or ACE-Step.
Save everything you like – Use the “Save Path” field to create permanent projects in projects_output/. Temporary generations disappear when you close the app (unless saved).

Enjoy a clean, powerful, completely local creative workflow — no cloud, no subscriptions, no compromises! 🎧✨

rookiemann/LocalSoundsAPI