Here are 66 public repositories matching this topic...
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
Updated May 27, 2025 · Python

Generate audiobooks from EPUBs, PDFs and text with synchronized captions.
Updated Sep 18, 2025 · Python

[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
Updated Sep 24, 2025 · Python

HunyuanVideo-Foley: Multimodal Diffusion with Representation Alignment for High-Fidelity Foley Audio Generation.
Updated Sep 28, 2025 · Python

A web UI for various audio-related neural networks
Updated May 19, 2025 · Python

A family of diffusion models for text-to-audio generation.
Updated Jul 29, 2025 · Python

StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
Updated Jun 29, 2025 · Python

[NeurIPS 2025] PyTorch implementation of ThinkSound, a unified framework for generating audio from any modality, guided by Chain-of-Thought (CoT) reasoning.
Updated Sep 19, 2025 · Python

TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching
Updated Jul 29, 2025 · Jupyter Notebook

PyTorch Implementation of Make-An-Audio (ICML'23) with a Text-to-Audio Generative Model
Updated May 22, 2024 · Python

OpenMusic: SOTA Text-to-music (TTM) Generation
Updated Jun 26, 2025 · Python

Implementation of NÜWA, a state-of-the-art attention network for text-to-video synthesis, in PyTorch
Updated Jan 17, 2023 · Python

🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D and audio).

Mustango: Toward Controllable Text-to-Music Generation
Updated Jun 2, 2025 · Python

High-quality Text-to-Audio Generation with Efficient Diffusion Transformer
Updated Jul 3, 2025 · Python

AudioStory: Generating Long-Form Narrative Audio with Large Language Models
Updated Sep 21, 2025 · Jupyter Notebook

Official codes and models of the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation"
Updated Mar 25, 2024 · Jupyter Notebook

Word2Wave: a framework for generating short audio samples from a text prompt using WaveGAN and COALA.
Updated Dec 13, 2021 · Python

Subtitle to audio: generate audio from any subtitle file using Coqui-ai TTS and synchronize the audio to the subtitle timing.
Updated Dec 14, 2023 · Python

PyTorch implementation of SoundCTM
Updated Mar 31, 2025 · Python