Python Embeddings

Open-source Python projects categorized as Embeddings

Top 23 Python Embedding Projects

  1. mem0

    Universal memory layer for AI Agents

    Project mention: Write an Agent | news.ycombinator.com | 2025-11-06
  3. h2ogpt

    Private chat with local GPT with document, images, video, etc. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://gpt-docs.h2o.ai/

  4. txtai

    💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows

    Project mention: The AI-Native GraphDB + GraphRAG + Graph Memory Landscape & Market Catalog | dev.to | 2025-10-26

    GitHub: https://github.com/neuml/txtai

  5. FlagEmbedding

    Retrieval and Retrieval-augmented LLMs

    Project mention: BGE-Reasoner: An open-source framework for reasoning-intensive retrieval | news.ycombinator.com | 2025-08-27
  6. pytorch-metric-learning

    The easiest way to use deep metric learning in your application. Modular, flexible, and extensible. Written in PyTorch.

  7. AutoRAG

    AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation

  8. lightly

    A python library for self-supervised learning on images.

  10. hub

    A library for transfer learning by reusing parts of TensorFlow models. (by tensorflow)

  11. towhee

    Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.

  12. prompttools

    Open-source tools for prompt testing and experimentation, with support for both LLMs (e.g. OpenAI, LLaMA) and vector databases (e.g. Chroma, Weaviate, LanceDB).

  13. datachain

    Analytics, Versioning and ETL for multimodal data: video, audio, PDFs, images

  14. fastembed

    Fast, Accurate, Lightweight Python library to make State of the Art Embedding

  15. ailia-models

    The collection of pre-trained, state-of-the-art AI models for ailia SDK

  16. instructor-embedding

    [ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings

    Project mention: One Embedder, Any Task: Instruction-Finetuned Text Embeddings | news.ycombinator.com | 2025-10-08
  17. model2vec

    Fast State-of-the-Art Static Embeddings

    Project mention: EmbeddingGemma: The Best-in-Class Open Model for On-Device Embedding | news.ycombinator.com | 2025-09-04

    Can anyone test it through model2vec?

    https://github.com/MinishLab/model2vec

  18. GPTDiscord

    A robust, all-in-one GPT interface for Discord. ChatGPT-style conversations, image generation, AI-moderation, custom indexes/knowledgebase, youtube summarizer, and more!

  19. magnitude

    A fast, efficient universal vector embedding utility package. (by plasticityai)

  20. eda_nlp

    Data augmentation for NLP, presented at EMNLP 2019

  21. ModernBERT

    Bringing BERT into modernity via both architecture changes and scaling

  22. hazm

    Persian NLP Toolkit

  23. contextualized-topic-models

    A python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021 (Bianchi et al.).

  24. SeaGOAT

    local-first semantic code search engine

  25. lightly-train

    All-in-one training for vision models (YOLO, ViTs, RT-DETR, DINOv3): pretraining, fine-tuning, distillation.

    Project mention: Show HN: Distill DINOv3 into your own model | news.ycombinator.com | 2025-08-15
NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python Embeddings related posts

  • Introducing Nano Banana Pro: Complete Developer Tutorial

    2 projects | dev.to | 21 Nov 2025
  • How to Build a RAG Solution with Llama Index, ChromaDB, and Ollama

    4 projects | dev.to | 4 Nov 2025
  • One Embedder, Any Task: Instruction-Finetuned Text Embeddings

    1 project | news.ycombinator.com | 8 Oct 2025
  • Translating Cython to Mojo, a first attempt

    5 projects | news.ycombinator.com | 6 Oct 2025
  • Token Counting Meets Amazon Bedrock

    3 projects | dev.to | 16 Sep 2025
  • BGE-Reasoner: An open-source framework for reasoning-intensive retrieval

    2 projects | news.ycombinator.com | 27 Aug 2025
  • Show HN: Distill DINOv3 into your own model

    1 project | news.ycombinator.com | 15 Aug 2025

Index

What are some of the best open-source Embedding projects in Python? This list will help you:

# Project Stars
1 mem0 44,522
2 h2ogpt 11,976
3 txtai 11,949
4 FlagEmbedding 11,024
5 pytorch-metric-learning 6,282
6 AutoRAG 4,485
7 lightly 3,650
8 hub 3,523
9 towhee 3,426
10 prompttools 2,956
11 datachain 2,716
12 fastembed 2,573
13 ailia-models 2,296
14 instructor-embedding 2,021
15 model2vec 1,957
16 GPTDiscord 1,853
17 magnitude 1,652
18 eda_nlp 1,649
19 ModernBERT 1,594
20 hazm 1,332
21 contextualized-topic-models 1,254
22 SeaGOAT 1,238
23 lightly-train 1,178

