coder543

Starred repositories

A collection of tutorials on state-of-the-art computer vision models and techniques. Explore everything from foundational architectures like ResNet to cutting-edge models like YOLO11, RT-DETR, SAM …

Jupyter Notebook 8,570 1,346 Updated Oct 3, 2025

Self-hostable alternative to Google Timeline (Google Location History)

Ruby 6,967 209 Updated Oct 15, 2025

🔍 AI search engine - self-host with local or cloud LLMs

TypeScript 3,466 323 Updated Sep 27, 2024

High performance self-hosted photo and video management solution.

TypeScript 81,270 4,272 Updated Oct 15, 2025

Foundational model for human-like, expressive TTS

Python 4,188 694 Updated Jul 30, 2024

Cast Mac windows to visionOS

Swift 876 43 Updated Oct 8, 2025

[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation

Python 7,787 593 Updated Jul 17, 2024

This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects.

Python 1,375 67 Updated Aug 4, 2025

A fast, local neural text to speech system

C++ 10,140 838 Updated Aug 26, 2025

An Open Source text-to-speech system built by inverting Whisper.

Jupyter Notebook 4,502 261 Updated Jun 8, 2025

Enhanced ChatGPT Clone: Features Agents, MCP, DeepSeek, Anthropic, AWS, OpenAI, Responses API, Azure, Groq, o1, GPT-5, Mistral, OpenRouter, Vertex AI, Gemini, Artifacts, AI model switching, message…

TypeScript 30,777 5,923 Updated Oct 15, 2025

Instant voice cloning by MIT and MyShell. Audio foundation model.

Python 34,639 3,823 Updated Apr 19, 2025

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Python 3,370 243 Updated Dec 3, 2024

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Python 5,997 625 Updated Aug 10, 2024

llama.cpp with the BakLLaVA model describes what it sees

Python 382 41 Updated Nov 8, 2023

A programming framework for agentic AI

Python 50,778 7,756 Updated Oct 8, 2025

A state-of-the-art-level open visual language model | Multimodal pretrained model

Python 6,675 442 Updated May 29, 2024

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

Python 4,970 391 Updated Jul 10, 2024

⏩ Ship faster with Continuous AI. Build and run custom agents across your IDE, terminal, and CI

TypeScript 29,336 3,633 Updated Oct 15, 2025

AI Agent that handles engineering tasks end-to-end: integrates with developers’ tools, plans, executes, and iterates until it achieves a successful result.

Rust 3,337 287 Updated Sep 23, 2025

Python 3,379 146 Updated Feb 25, 2024

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.

Python 8,770 570 Updated May 3, 2024

Easily train or fine-tune SOTA computer vision models with one open source training library. The home of YOLO-NAS.

Jupyter Notebook 4,927 566 Updated Sep 17, 2024

Toy Gaussian Splatting visualization in Unity

C# 2,821 365 Updated Aug 5, 2025

Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"

Python 18,936 2,663 Updated Oct 30, 2024

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

Python 18,169 1,920 Updated Oct 15, 2025

An open source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/

Python 7,929 790 Updated Feb 11, 2024

Search images with a text or image query, using OpenAI's pretrained CLIP model.

Python 257 24 Updated Jan 15, 2022

Foundational Models for State-of-the-Art Speech and Text Translation

Jupyter Notebook 11,677 1,160 Updated Nov 14, 2024