Starred repositories
A collection of tutorials on state-of-the-art computer vision models and techniques. Explore everything from foundational architectures like ResNet to cutting-edge models like YOLO11, RT-DETR, SAM …
Self-hostable alternative to Google Timeline (Google Location History)
🔍 AI search engine - self-host with local or cloud LLMs
High performance self-hosted photo and video management solution.
Foundational model for human-like, expressive TTS
[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects.
An open-source text-to-speech system built by inverting Whisper.
Enhanced ChatGPT Clone: Features Agents, MCP, DeepSeek, Anthropic, AWS, OpenAI, Responses API, Azure, Groq, o1, GPT-5, Mistral, OpenRouter, Vertex AI, Gemini, Artifacts, AI model switching, message…
Instant voice cloning by MIT and MyShell. Audio foundation model.
[EMNLP 2024 🔥] Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
llama.cpp with the BakLLaVA model describes what it sees
A state-of-the-art-level open visual language model | multimodal pretrained model
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
⏩ Ship faster with Continuous AI. Build and run custom agents across your IDE, terminal, and CI
AI Agent that handles engineering tasks end-to-end: integrates with developers’ tools, plans, executes, and iterates until it achieves a successful result.
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
Easily train or fine-tune SOTA computer vision models with one open-source training library. The home of YOLO-NAS.
Toy Gaussian Splatting visualization in Unity
Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/
Search images with a text or image query, using OpenAI's pretrained CLIP model.
Foundational Models for State-of-the-Art Speech and Text Translation