Stars
📚 "Building Agents from Scratch" (《从零开始构建智能体》): a from-scratch tutorial on agent principles and practice
"RAG-Anything: All-in-One RAG Framework"
[CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …
ZeroSearch: Incentivize the Search Capability of LLMs without Searching
An open-source RAG-based tool for chatting with your documents.
Dive into LLMs (《动手学大模型》): a series of hands-on programming tutorials
Get your documents ready for gen AI
[ICDAR 2023] SelfDocSeg: A self-supervised vision-based approach towards Document Segmentation (Oral)
Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models.
[ICLR 2025] Pyramidal Flow Matching for Efficient Video Generative Modeling
[CVPR 2025] StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
A PyTorch implementation of U-Net for aerial imagery semantic segmentation.
Techniques for deep learning with satellite & aerial imagery
YOLOv5 license plate detection and recognition for Chinese license plates; supports 12 types of Chinese plates, including double-row plates
🤖 GPT Vision, Open Source Vision components for GPTs, generative AI, and LLM projects. Not only UI Components.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
AI Native Data App Development framework with AWEL (Agentic Workflow Expression Language) and Agents
Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
Vcstool is a command line tool designed to make working with multiple repositories easier
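Hands-on machine learning case studies covering machine learning, deep learning, and related areas. Each case is roughly 100 lines of code.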
LaTeX-based Template for Doctoral Thesis of Beijing University of Technology (BJUT)
A simple MATLAB GUI for annotating rotated grasping bounding boxes
Antipodal Robotic Grasping using GR-ConvNet. IROS 2020.