Overview

This overview covers text-based embedding models. LangChain does not currently support multimodal embeddings.
Embedding models transform raw text—such as a sentence, paragraph, or tweet—into a fixed-length vector of numbers that captures its semantic meaning. These vectors allow machines to compare and search text based on meaning rather than exact words. In practice, this means that texts with similar ideas are placed close together in the vector space. For example, instead of matching only the phrase “machine learning”, embeddings can surface documents that discuss related concepts even when different wording is used.

How it works

  1. Vectorization — The model encodes each input string as a high-dimensional vector.
  2. Similarity scoring — Vectors are compared using mathematical metrics to measure how closely related the underlying texts are (see the end-to-end sketch below).
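Putting the two steps together, here is a minimal end-to-end sketch. It uses the DeterministicFakeEmbedding class from langchain_core (also shown later on this page) so that it runs without API keys; with a fake embedder the scores are arbitrary, but the flow is identical with any real provider such as OpenAIEmbeddings.

import numpy as np
from langchain_core.embeddings import DeterministicFakeEmbedding

# Step 1: vectorization: encode the query and documents as vectors
embeddings = DeterministicFakeEmbedding(size=256)
query_vec = np.array(embeddings.embed_query("What is machine learning?"))
doc_vecs = [
    np.array(v)
    for v in embeddings.embed_documents([
        "Machine learning builds models from data.",
        "The weather is sunny today.",
    ])
]

# Step 2: similarity scoring: rank documents by cosine similarity
for i, doc_vec in enumerate(doc_vecs):
    score = np.dot(query_vec, doc_vec) / (
        np.linalg.norm(query_vec) * np.linalg.norm(doc_vec)
    )
    print(f"Document {i}: cosine similarity = {score:.3f}")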

Similarity metrics

Several metrics are commonly used to compare embeddings:
  • Cosine similarity — measures the angle between two vectors.
  • Euclidean distance — measures the straight-line distance between points.
  • Dot product — measures how much one vector projects onto another.
Here’s an example of computing cosine similarity between two vectors:
import numpy as np

def cosine_similarity(vec1, vec2):
    # Cosine similarity is the dot product normalized by the vector magnitudes
    dot = np.dot(vec1, vec2)
    return dot / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

similarity = cosine_similarity(query_embedding, document_embedding)
print("Cosine Similarity:", similarity)
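The other two metrics are just as direct to compute with NumPy. This is a small sketch using the same placeholder query_embedding and document_embedding vectors as above:

import numpy as np

def euclidean_distance(vec1, vec2):
    # Straight-line distance between the two points; smaller means more similar
    return np.linalg.norm(np.array(vec1) - np.array(vec2))

def dot_product(vec1, vec2):
    # Unnormalized projection of one vector onto the other; larger means more similar
    return np.dot(vec1, vec2)

print("Euclidean Distance:", euclidean_distance(query_embedding, document_embedding))
print("Dot Product:", dot_product(query_embedding, document_embedding))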

Interface

LangChain provides a standard interface for text embedding models (e.g., OpenAI, Cohere, Hugging Face) via the Embeddings interface. Two main methods are available:
  • embed_documents(texts: List[str]) → List[List[float]]: Embeds a list of documents.
  • embed_query(text: str) → List[float]: Embeds a single query.
The interface allows queries and documents to be embedded with different strategies, though most providers handle them the same way in practice.
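To illustrate the shape of what each method returns, here is a minimal sketch using the deterministic fake embedder from langchain_core; any provider's embeddings class exposes the same two methods.

from langchain_core.embeddings import DeterministicFakeEmbedding

embeddings = DeterministicFakeEmbedding(size=4096)

# embed_documents returns one vector per input document
doc_vectors = embeddings.embed_documents(["Document one", "Document two"])
print(len(doc_vectors), len(doc_vectors[0]))  # 2 4096

# embed_query returns a single vector
query_vector = embeddings.embed_query("What is in document one?")
print(len(query_vector))  # 4096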

Top integrations

Provider      | Package
AzureOpenAI   | langchain-openai
Ollama        | langchain-ollama
Fake          | langchain-core
OpenAI        | langchain-openai
Google Gemini | langchain-google-genai
Together      | langchain-together
Fireworks     | langchain-fireworks
MistralAI     | langchain-mistralai
Cohere        | langchain-cohere
AI/ML API     | langchain-aimlapi
Nomic         | langchain-nomic
Databricks    | databricks-langchain
IBM           | langchain-ibm
NVIDIA        | langchain-nvidia-ai-endpoints

Install and use

pip install -qU langchain-openai 
import getpass
import os

if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
embeddings.embed_query("Hello, world!")
pip install -qU "langchain[azure]" 
import getpass
import os

if not os.environ.get("AZURE_OPENAI_API_KEY"):
    os.environ["AZURE_OPENAI_API_KEY"] = getpass.getpass("Enter API key for Azure: ")

from langchain_openai import AzureOpenAIEmbeddings

embeddings = AzureOpenAIEmbeddings(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    azure_deployment=os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"],
    openai_api_version=os.environ["AZURE_OPENAI_API_VERSION"],
)

embeddings.embed_query("Hello, world!")
pip install -qU langchain-google-genai 
import getpass
import os

if not os.environ.get("GOOGLE_API_KEY"):
    os.environ["GOOGLE_API_KEY"] = getpass.getpass("Enter API key for Google Gemini: ")

from langchain_google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(model="models/gemini-embedding-001")
embeddings.embed_query("Hello, world!")
pip install -qU langchain-google-vertexai 
from langchain_google_vertexai import VertexAIEmbeddings

embeddings = VertexAIEmbeddings(model="text-embedding-005")
embeddings.embed_query("Hello, world!")
pip install -qU langchain-aws 
from langchain_aws import BedrockEmbeddings

embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v2:0")
embeddings.embed_query("Hello, world!")
pip install -qU langchain-huggingface 
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
embeddings.embed_query("Hello, world!")
pip install -qU langchain-ollama 
from langchain_ollama import OllamaEmbeddings

embeddings = OllamaEmbeddings(model="llama3")
embeddings.embed_query("Hello, world!")
pip install -qU langchain-cohere 
import getpass
import os

if not os.environ.get("COHERE_API_KEY"):
    os.environ["COHERE_API_KEY"] = getpass.getpass("Enter API key for Cohere: ")

from langchain_cohere import CohereEmbeddings

embeddings = CohereEmbeddings(model="embed-english-v3.0")
embeddings.embed_query("Hello, world!")
pip install -qU langchain-mistralai 
import getpass
import os

if not os.environ.get("MISTRALAI_API_KEY"):
    os.environ["MISTRALAI_API_KEY"] = getpass.getpass("Enter API key for MistralAI: ")

from langchain_mistralai import MistralAIEmbeddings

embeddings = MistralAIEmbeddings(model="mistral-embed")
embeddings.embed_query("Hello, world!")
pip install -qU langchain-nomic 
import getpass
import os

if not os.environ.get("NOMIC_API_KEY"):
    os.environ["NOMIC_API_KEY"] = getpass.getpass("Enter API key for Nomic: ")

from langchain_nomic import NomicEmbeddings

embeddings = NomicEmbeddings(model="nomic-embed-text-v1.5")
embeddings.embed_query("Hello, world!")
pip install -qU langchain-nvidia-ai-endpoints 
import getpass
import os

if not os.environ.get("NVIDIA_API_KEY"):
    os.environ["NVIDIA_API_KEY"] = getpass.getpass("Enter API key for NVIDIA: ")

from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings

embeddings = NVIDIAEmbeddings(model="NV-Embed-QA")
embeddings.embed_query("Hello, world!")
pip install -qU langchain-voyageai 
import getpass
import os

if not os.environ.get("VOYAGE_API_KEY"):
    os.environ["VOYAGE_API_KEY"] = getpass.getpass("Enter API key for Voyage AI: ")

from langchain_voyageai import VoyageAIEmbeddings

embeddings = VoyageAIEmbeddings(model="voyage-3")
embeddings.embed_query("Hello, world!")
pip install -qU langchain-ibm 
import getpass
import os

if not os.environ.get("WATSONX_APIKEY"):
    os.environ["WATSONX_APIKEY"] = getpass.getpass("Enter API key for IBM watsonx: ")

from langchain_ibm import WatsonxEmbeddings

embeddings = WatsonxEmbeddings(
    model_id="ibm/slate-125m-english-rtrvr",
    url="https://us-south.ml.cloud.ibm.com",
    project_id="<WATSONX PROJECT_ID>",
)

embeddings.embed_query("Hello, world!")
pip install -qU langchain-core 
from langchain_core.embeddings import DeterministicFakeEmbedding

embeddings = DeterministicFakeEmbedding(size=4096)
embeddings.embed_query("Hello, world!")
pip install -qU "langchain[langchain-xai]" 
import getpass import os  if not os.environ.get("XAI_API_KEY"):  os.environ["XAI_API_KEY"] = getpass.getpass("Enter API key for xAI: ")  from langchain.chat_models import init_chat_model  model = init_chat_model("grok-2", model_provider="xai")  embeddings.embed_query("Hello, world!") 
pip install -qU "langchain[langchain-perplexity]" 
import getpass import os  if not os.environ.get("PPLX_API_KEY"):  os.environ["PPLX_API_KEY"] = getpass.getpass("Enter API key for Perplexity: ")  from langchain.chat_models import init_chat_model  model = init_chat_model("llama-3.1-sonar-small-128k-online", model_provider="perplexity")  embeddings.embed_query("Hello, world!") 
pip install -qU "langchain[langchain-deepseek]" 
import getpass import os  if not os.environ.get("DEEPSEEK_API_KEY"):  os.environ["DEEPSEEK_API_KEY"] = getpass.getpass("Enter API key for DeepSeek: ")  from langchain.chat_models import init_chat_model  model = init_chat_model("deepseek-chat", model_provider="deepseek")  embeddings.embed_query("Hello, world!") 

Caching

Embeddings can be stored or temporarily cached to avoid needing to recompute them. Caching embeddings can be done using a CacheBackedEmbeddings. This wrapper stores embeddings in a key-value store, where the text is hashed and the hash is used as the key in the cache. The main supported way to initialize a CacheBackedEmbeddings is from_bytes_store. It takes the following parameters:
  • underlying_embedder: The embedder to use for embedding.
  • document_embedding_cache: Any ByteStore for caching document embeddings.
  • batch_size: (optional, defaults to None) The number of documents to embed between store updates.
  • namespace: (optional, defaults to "") The namespace to use for the document cache. Helps avoid collisions (e.g., set it to the embedding model name).
  • query_embedding_cache: (optional, defaults to None) A ByteStore for caching query embeddings, or True to reuse the same store as document_embedding_cache.
import time

from langchain.embeddings import CacheBackedEmbeddings
from langchain.storage import LocalFileStore

# Create your underlying embeddings model
underlying_embeddings = ...  # e.g., OpenAIEmbeddings(), HuggingFaceEmbeddings(), etc.

# LocalFileStore persists embeddings to the local filesystem.
# This isn't for production use, but it is useful for local experimentation.
store = LocalFileStore("./cache/")

cached_embedder = CacheBackedEmbeddings.from_bytes_store(
    underlying_embeddings,
    store,
    namespace=underlying_embeddings.model,
    query_embedding_cache=True,  # cache query embeddings in the same store as documents
)

# Example: caching a query embedding
tic = time.time()
print(cached_embedder.embed_query("Hello, world!"))
print(f"First call took: {time.time() - tic:.2f} seconds")

# Subsequent calls use the cache
tic = time.time()
print(cached_embedder.embed_query("Hello, world!"))
print(f"Second call took: {time.time() - tic:.2f} seconds")
In production, you would typically use a more robust persistent store, such as a database or cloud storage. See the stores integrations page for available options.
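The cached embedder is a drop-in replacement wherever a regular embeddings object is accepted. As a short sketch (reusing the cached_embedder from above together with the in-memory vector store from langchain_core), indexing documents populates the document cache, so rebuilding the index later is fast:

from langchain_core.vectorstores import InMemoryVectorStore

# The cached embedder is used transparently for document embeddings
vector_store = InMemoryVectorStore(embedding=cached_embedder)
vector_store.add_texts([
    "Machine learning builds models from data.",
    "Embeddings map text to vectors.",
])

# The query embedding is cached too (query_embedding_cache=True above)
results = vector_store.similarity_search("What is machine learning?", k=1)
print(results[0].page_content)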

All integrations

Aleph Alpha

Anyscale

Ascend

AI/ML API

AwaDB

AzureOpenAI

Baichuan Text Embeddings

Baidu Qianfan

Bedrock

BGE on Hugging Face

Bookend AI

Clarifai

Cloudflare Workers AI

Clova Embeddings

Cohere

DashScope

Databricks

DeepInfra

EDEN AI

Elasticsearch

Embaas

Fake Embeddings

FastEmbed by Qdrant

Fireworks

Google Gemini

Google Vertex AI

GPT4All

Gradient

GreenNode

Hugging Face

IBM watsonx.ai

Infinity

Instruct Embeddings

IPEX-LLM CPU

IPEX-LLM GPU

Intel Extension for Transformers

Jina

John Snow Labs

LASER

Lindorm

Llama.cpp

LLMRails

LocalAI

MiniMax

MistralAI

Model2Vec

ModelScope

MosaicML

Naver

Nebius

Netmind

NLP Cloud

Nomic

NVIDIA NIMs

Oracle Cloud Infrastructure

Ollama

OpenClip

OpenAI

OpenVINO

Optimum Intel

Oracle AI Vector Search

OVHcloud

Pinecone Embeddings

PredictionGuard

PremAI

SageMaker

SambaNovaCloud

SambaStudio

Self Hosted

Sentence Transformers

Solar

SpaCy

SparkLLM

TensorFlow Hub

Text Embeddings Inference

TextEmbed

Titan Takeoff

Together AI

Upstage

Volc Engine

Voyage AI

Xinference

YandexGPT

ZhipuAI

