The open source Meme Search Engine and Finder. Free and built to self-host locally with Python, Ruby, and Docker.
Updated Dec 3, 2025 · HTML
Multimodal RAG to search and interact locally with technical documents of any kind
[AAAI 2026] Release of the code, datasets, and model for our work TongUI: Internet-Scale Trajectories from Multimodal Web Tutorials for Generalized GUI Agents
A small VLM that sees everything
Repository for the paper "'What are you referring to?' Evaluating the Ability of Multi-Modal Dialogue Models to Process Clarificational Exchanges", accepted at SIGDIAL '23.
This is semi-completed evaluation software based on the question-paper pattern of a particular university. It is going to be integrated with ML to detect and correct bias and to help evaluators.