text-to-audio

Here are 70 public repositories matching this topic...

open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

text-to-speech audit speech-synthesis audio-synthesis music-generation voice-conversion vocoder emilia text-to-audio fastspeech2 vits audio-generation singing-voice-conversion vall-e audioldm naturalspeech2 maskgct

Updated May 27, 2025
Python

denizsafak / abogen

Generate audiobooks from EPUBs, PDFs and text with synchronized captions.

Updated Dec 22, 2025
Python

hkchengrex / MMAudio

[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

audio computer-vision deep-learning audio-synthesis video-to-audio text-to-audio

Updated Nov 30, 2025
Python

Tencent-Hunyuan / HunyuanVideo-Foley

HunyuanVideo-Foley: Multimodal Diffusion with Representation Alignment for High-Fidelity Foley Audio Generation.

tta video-to-audio text-to-audio text-to-video foley-sound-synthesis foley-art aigc-audio text-video-to-audio

Updated Sep 28, 2025
Python

declare-lab / tango

A family of diffusion models for text-to-audio generation.

language-models diffusion diffusion-models text-to-audio audio-generation large-language-models

Updated Jul 29, 2025
Python

gitmylo / audio-webui

A web UI for various audio-related neural networks

music text-to-speech ai generative-audio aio artificial-intelligence tts bark rvc all-in-one generative-music voice-cloning text-to-audio audioldm audiocraft bark-gui rvc-gui

Updated May 19, 2025
Python

ictnlp / StreamSpeech

StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.

Updated Jun 29, 2025
Python

FunAudioLLM / ThinkSound

[NeurIPS 2025] PyTorch implementation of [ThinkSound], a unified framework for generating audio from any modality, guided by Chain-of-Thought (CoT) reasoning.

tta video-to-audio text-to-audio foley-sound-synthesis aigc-audio text-video-to-audio

Updated Nov 25, 2025
Python

declare-lab / TangoFlux

TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching

tta text-to-audio generative-ai text-to-audio-ai flow-matching

Updated Jul 29, 2025
Jupyter Notebook

Text-to-Audio / Make-An-Audio

PyTorch Implementation of Make-An-Audio (ICML'23) with a Text-to-Audio Generative Model

latent-space video-to-audio diffusion-models text-to-audio latent-diffusion

Updated May 22, 2024
Python

ivcylc / OpenMusic

OpenMusic: SOTA Text-to-music (TTM) Generation

ai music-generation mdt dit ai-music diffusion-models text-to-audio music-ai ai-music-generator music-ai-architectures hifi-gan text-to-music vall-e text-to-audio-ai audioldm diffusion-transformer ai-music-generation text-to-music-transformer

Updated Jun 26, 2025
Python

lucidrains / nuwa-pytorch

Implementation of NÜWA, a state-of-the-art attention network for text-to-video synthesis, in PyTorch

deep-learning transformers artificial-intelligence attention-mechanism text-to-audio text-to-video

Updated Jan 17, 2023
Python

YingqingHe / Awesome-LLMs-meet-Multimodal-Generation

🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).

text-to-speech multimodality text-to-image text-to-audio text-to-video text-to-music multimodal-models aigc large-language-models llm text-to-3d multimodal-generation mllm text-to-sound large-vision-language-models multimodal-large-language-models lvlm

Updated Apr 4, 2025
HTML

AMAAI-Lab / mustango

Mustango: Toward Controllable Text-to-Music Generation

diffusion-models text-to-audio text-to-music large-language-models

Updated Jun 2, 2025
Python

haidog-yaqub / EzAudio

High-quality Text-to-Audio Generation with Efficient Diffusion Transformer

diffusion-models text-to-audio generative-ai

Updated Dec 17, 2025
Python

TencentARC / AudioStory

AudioStory: Generating Long-Form Narrative Audio with Large Language Models

video-to-audio diffusion-models text-to-audio audio-generation multimodal-large-language-models video-dubbing

Updated Sep 21, 2025
Jupyter Notebook

happylittlecat2333 / Auffusion

Official code and models for the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation"

diffusion diffusion-models text-to-audio audio-generation large-language-models

Updated Mar 25, 2024
Jupyter Notebook

ilaria-manco / word2wave

Word2Wave: a framework for generating short audio samples from a text prompt using WaveGAN and COALA.

music-generation ai-music text-to-audio audio-generation

Updated Dec 13, 2021
Python

bnsantoso / sub-to-audio

Subtitle to audio: generate audio from any subtitle file using Coqui-ai TTS and synchronize the audio timing with the subtitle timestamps.

python text-to-speech tts audio-processing subtitle-conversion text-to-audio subtitle-to-speech subtitle-to-voice subtitle-to-audio

Updated Dec 14, 2023
Python

sony / soundctm

PyTorch implementation of SoundCTM

pytorch diffusion-models text-to-audio audio-generation

Updated Mar 31, 2025
Python
