A paper list covering large multi-modality models (perception, generation, unification), parameter-efficient finetuning, vision-language pretraining, and conventional image-text matching, intended for preliminary insight.
Updated Sep 25, 2025
Ladder Loss for Coherent Visual-Semantic Embedding [AAAI 2020]
Preserving Semantic Neighborhoods for Robust Cross-modal Retrieval [ECCV 2020]
Text-to-image search engine for fashion runway photos using CLIP and FAISS.
Visual semantic search system for searching across products: a text query retrieves matching product images.
Search or tag images