
langchain-chroma

LangChain integration for Chroma vector database.

Classes

Chroma

Chroma(
    collection_name: str = _LANGCHAIN_DEFAULT_COLLECTION_NAME,
    embedding_function: Embeddings | None = None,
    persist_directory: str | None = None,
    host: str | None = None,
    port: int | None = None,
    headers: dict[str, str] | None = None,
    chroma_cloud_api_key: str | None = None,
    tenant: str | None = None,
    database: str | None = None,
    client_settings: Settings | None = None,
    collection_metadata: dict | None = None,
    collection_configuration: CreateCollectionConfiguration | None = None,
    client: ClientAPI | None = None,
    relevance_score_fn: Callable[[float], float] | None = None,
    create_collection_if_not_exists: bool | None = True,
    *,
    ssl: bool = False
)

Bases: VectorStore

Chroma vector store integration.

Setup

Install the chromadb and langchain-chroma packages:

pip install -qU chromadb langchain-chroma 

Key init args (indexing params):
    collection_name: str
        Name of the collection.
    embedding_function: Embeddings
        Embedding function to use.

Key init args (client params):
    client: ClientAPI | None
        Chroma client to use.
    client_settings: chromadb.config.Settings | None
        Chroma client settings.
    persist_directory: str | None
        Directory to persist the collection.
    host: str | None
        Hostname of a deployed Chroma server.
    port: int | None
        Connection port for a deployed Chroma server. Default is 8000.
    ssl: bool
        Whether to establish an SSL connection with a deployed Chroma server. Default is False.
    headers: dict[str, str] | None
        HTTP headers to send to a deployed Chroma server.
    chroma_cloud_api_key: str | None
        Chroma Cloud API key.
    tenant: str | None
        Tenant ID. Required for Chroma Cloud connections. Default is 'default_tenant' for local Chroma servers.
    database: str | None
        Database name. Required for Chroma Cloud connections. Default is 'default_database'.

Instantiate
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

vector_store = Chroma(
    collection_name="foo",
    embedding_function=OpenAIEmbeddings(),
    # other params...
)
Add Documents
from langchain_core.documents import Document

document_1 = Document(page_content="foo", metadata={"baz": "bar"})
document_2 = Document(page_content="thud", metadata={"bar": "baz"})
document_3 = Document(page_content="i will be deleted :(")

documents = [document_1, document_2, document_3]
ids = ["1", "2", "3"]
vector_store.add_documents(documents=documents, ids=ids)
Update Documents
updated_document = Document(
    page_content="qux",
    metadata={"bar": "baz"},
)

vector_store.update_documents(ids=["1"], documents=[updated_document])
Delete Documents
vector_store.delete(ids=["3"]) 
Search with filter

results = vector_store.similarity_search(
    query="thud", k=1, filter={"baz": "bar"}
)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")

* foo [{'baz': 'bar'}]

Search with score

results = vector_store.similarity_search_with_score(query="qux", k=1)
for doc, score in results:
    print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")

* [SIM=0.000000] qux [{'bar': 'baz', 'baz': 'bar'}] 

Async
# add documents
# await vector_store.aadd_documents(documents=documents, ids=ids)

# delete documents
# await vector_store.adelete(ids=["3"])

# search
# results = await vector_store.asimilarity_search(query="thud", k=1)

# search with score
results = await vector_store.asimilarity_search_with_score(query="qux", k=1)
for doc, score in results:
    print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")

* [SIM=0.335463] foo [{'baz': 'bar'}] 
Use as Retriever
retriever = vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 1, "fetch_k": 2, "lambda_mult": 0.5},
)
retriever.invoke("thud")

[Document(metadata={"baz": "bar"}, page_content="thud")] 

Initialize with a Chroma client.

PARAMETER DESCRIPTION

collection_name

Name of the collection to create.

TYPE: str DEFAULT: _LANGCHAIN_DEFAULT_COLLECTION_NAME

embedding_function

Embedding class object. Used to embed texts.

TYPE: Embeddings | None DEFAULT: None

persist_directory

Directory to persist the collection.

TYPE: str | None DEFAULT: None

host

Hostname of a deployed Chroma server.

TYPE: str | None DEFAULT: None

port

Connection port for a deployed Chroma server. Default is 8000.

TYPE: int | None DEFAULT: None

ssl

Whether to establish an SSL connection with a deployed Chroma server. Default is False.

TYPE: bool DEFAULT: False

headers

HTTP headers to send to a deployed Chroma server.

TYPE: dict[str, str] | None DEFAULT: None

chroma_cloud_api_key

Chroma Cloud API key.

TYPE: str | None DEFAULT: None

tenant

Tenant ID. Required for Chroma Cloud connections. Default is 'default_tenant' for local Chroma servers.

TYPE: str | None DEFAULT: None

database

Database name. Required for Chroma Cloud connections. Default is 'default_database'.

TYPE: str | None DEFAULT: None

client_settings

Chroma client settings.

TYPE: Settings | None DEFAULT: None

collection_metadata

Collection configurations.

TYPE: dict | None DEFAULT: None

collection_configuration

Index configuration for the collection.

TYPE: CreateCollectionConfiguration | None DEFAULT: None

client

Chroma client. Documentation: https://docs.trychroma.com/reference/python/client

TYPE: ClientAPI | None DEFAULT: None

relevance_score_fn

Function to calculate a relevance score from a distance. Used only in similarity_search_with_relevance_scores.

TYPE: Callable[[float], float] | None DEFAULT: None

create_collection_if_not_exists

Whether to create collection if it doesn't exist. Defaults to True.

TYPE: bool | None DEFAULT: True
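
To make the relevance_score_fn parameter above more concrete, here is a minimal sketch of a callable with the expected Callable[[float], float] shape. It assumes a cosine-style distance where 0 means identical vectors; the name cosine_distance_to_relevance is hypothetical and is not part of langchain-chroma.

```python
def cosine_distance_to_relevance(distance: float) -> float:
    """Map a distance (0 = identical) to a relevance score clamped to [0, 1].

    Hypothetical helper: assumes a cosine-style distance, not a library default.
    """
    score = 1.0 - distance
    # Clamp so that large distances never produce a negative score.
    return max(0.0, min(1.0, score))


# A callable with this shape could be passed as relevance_score_fn=...
print(cosine_distance_to_relevance(0.25))
```

Whether this mapping is appropriate depends on the distance metric the collection is configured with.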

METHOD DESCRIPTION
aget_by_ids

Async get documents by their IDs.

adelete

Async delete by vector ID or other criteria.

aadd_texts

Async run more texts through the embeddings and add to the vectorstore.

add_documents

Add or update documents in the vectorstore.

aadd_documents

Async run more documents through the embeddings and add to the vectorstore.

search

Return docs most similar to query using a specified search type.

asearch

Async return docs most similar to query using a specified search type.

asimilarity_search_with_score

Async run similarity search with distance.

similarity_search_with_relevance_scores

Return docs and relevance scores in the range [0, 1].

asimilarity_search_with_relevance_scores

Async return docs and relevance scores in the range [0, 1].

asimilarity_search

Async return docs most similar to query.

asimilarity_search_by_vector

Async return docs most similar to embedding vector.

amax_marginal_relevance_search

Async return docs selected using the maximal marginal relevance.

amax_marginal_relevance_search_by_vector

Async return docs selected using the maximal marginal relevance.

afrom_documents

Async return VectorStore initialized from documents and embeddings.

afrom_texts

Async return VectorStore initialized from texts and embeddings.

as_retriever

Return VectorStoreRetriever initialized from this VectorStore.

encode_image

Get base64 string from image URI.

fork

Fork this vector store.

add_images

Run more images through the embeddings and add to the vectorstore.

add_texts

Run more texts through the embeddings and add to the vectorstore.

similarity_search

Run similarity search with Chroma.

similarity_search_by_vector

Return docs most similar to embedding vector.

similarity_search_by_vector_with_relevance_scores

Return docs most similar to embedding vector and similarity score.

similarity_search_with_score

Run similarity search with Chroma with distance.

similarity_search_with_vectors

Run similarity search with Chroma with vectors.

similarity_search_by_image

Search for similar images based on the given image URI.

similarity_search_by_image_with_relevance_score

Search for similar images based on the given image URI.

max_marginal_relevance_search_by_vector

Return docs selected using the maximal marginal relevance.

max_marginal_relevance_search

Return docs selected using the maximal marginal relevance.

delete_collection

Delete the collection.

reset_collection

Resets the collection.

get

Gets the collection.

get_by_ids

Get documents by their IDs.

update_document

Update a document in the collection.

update_documents

Update documents in the collection.

from_texts

Create a Chroma vectorstore from raw documents.

from_documents

Create a Chroma vectorstore from a list of documents.

delete

Delete by vector IDs.

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py
def __init__(
    self,
    collection_name: str = _LANGCHAIN_DEFAULT_COLLECTION_NAME,
    embedding_function: Embeddings | None = None,
    persist_directory: str | None = None,
    host: str | None = None,
    port: int | None = None,
    headers: dict[str, str] | None = None,
    chroma_cloud_api_key: str | None = None,
    tenant: str | None = None,
    database: str | None = None,
    client_settings: chromadb.config.Settings | None = None,
    collection_metadata: dict | None = None,
    collection_configuration: CreateCollectionConfiguration | None = None,
    client: chromadb.ClientAPI | None = None,
    relevance_score_fn: Callable[[float], float] | None = None,
    create_collection_if_not_exists: bool | None = True,  # noqa: FBT001, FBT002
    *,
    ssl: bool = False,
) -> None:
    """Initialize with a Chroma client.

    Args:
        collection_name: Name of the collection to create.
        embedding_function: Embedding class object. Used to embed texts.
        persist_directory: Directory to persist the collection.
        host: Hostname of a deployed Chroma server.
        port: Connection port for a deployed Chroma server. Default is 8000.
        ssl: Whether to establish an SSL connection with a deployed Chroma server.
            Default is False.
        headers: HTTP headers to send to a deployed Chroma server.
        chroma_cloud_api_key: Chroma Cloud API key.
        tenant: Tenant ID. Required for Chroma Cloud connections.
            Default is 'default_tenant' for local Chroma servers.
        database: Database name. Required for Chroma Cloud connections.
            Default is 'default_database'.
        client_settings: Chroma client settings
        collection_metadata: Collection configurations.
        collection_configuration: Index configuration for the collection.

        client: Chroma client. Documentation:
            https://docs.trychroma.com/reference/python/client
        relevance_score_fn: Function to calculate relevance score from distance.
            Used only in `similarity_search_with_relevance_scores`
        create_collection_if_not_exists: Whether to create collection
            if it doesn't exist. Defaults to `True`.
    """
    _tenant = tenant or chromadb.DEFAULT_TENANT
    _database = database or chromadb.DEFAULT_DATABASE
    _settings = client_settings or Settings()

    client_args = {
        "persist_directory": persist_directory,
        "host": host,
        "chroma_cloud_api_key": chroma_cloud_api_key,
    }

    if sum(arg is not None for arg in client_args.values()) > 1:
        provided = [
            name for name, value in client_args.items() if value is not None
        ]
        msg = (
            f"Only one of 'persist_directory', 'host' and 'chroma_cloud_api_key' "
            f"is allowed, but got {','.join(provided)}"
        )
        raise ValueError(msg)

    if client is not None:
        self._client = client

    # PersistentClient
    elif persist_directory is not None:
        self._client = chromadb.PersistentClient(
            path=persist_directory,
            settings=_settings,
            tenant=_tenant,
            database=_database,
        )

    # HttpClient
    elif host is not None:
        _port = port or 8000
        self._client = chromadb.HttpClient(
            host=host,
            port=_port,
            ssl=ssl,
            headers=headers,
            settings=_settings,
            tenant=_tenant,
            database=_database,
        )

    # CloudClient
    elif chroma_cloud_api_key is not None:
        if not tenant or not database:
            msg = (
                "Must provide tenant and database values to connect to Chroma Cloud"
            )
            raise ValueError(msg)
        self._client = chromadb.CloudClient(
            tenant=tenant,
            database=database,
            api_key=chroma_cloud_api_key,
            settings=_settings,
        )

    else:
        self._client = chromadb.Client(settings=_settings)

    self._embedding_function = embedding_function
    self._chroma_collection: chromadb.Collection | None = None
    self._collection_name = collection_name
    self._collection_metadata = collection_metadata
    self._collection_configuration = collection_configuration
    if create_collection_if_not_exists:
        self.__ensure_collection()
    else:
        self._chroma_collection = self._client.get_collection(name=collection_name)
    self.override_relevance_score_fn = relevance_score_fn

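The constructor's choice of client can be hard to follow in the source, so the sketch below restates only the branching as a pure function. It mirrors the order of checks in __init__ but builds no real client; select_client_kind and the returned labels are hypothetical names used for illustration.

```python
from typing import Optional


def select_client_kind(
    persist_directory: Optional[str] = None,
    host: Optional[str] = None,
    chroma_cloud_api_key: Optional[str] = None,
    client: Optional[object] = None,
    tenant: Optional[str] = None,
    database: Optional[str] = None,
) -> str:
    """Return a label for the client the constructor logic would pick."""
    client_args = {
        "persist_directory": persist_directory,
        "host": host,
        "chroma_cloud_api_key": chroma_cloud_api_key,
    }
    provided = [name for name, value in client_args.items() if value is not None]
    # The three connection options are mutually exclusive.
    if len(provided) > 1:
        raise ValueError(f"Only one option is allowed, but got {','.join(provided)}")
    # An explicitly passed client wins over every connection option.
    if client is not None:
        return "provided client"
    if persist_directory is not None:
        return "PersistentClient"
    if host is not None:
        return "HttpClient"
    if chroma_cloud_api_key is not None:
        # Cloud connections need both tenant and database.
        if not tenant or not database:
            raise ValueError("Must provide tenant and database for Chroma Cloud")
        return "CloudClient"
    return "Client"


print(select_client_kind(host="localhost"))
```

Note that the exclusivity check runs before the client check, so passing two connection options raises even when a client is also supplied.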
Attributes

embeddings property
embeddings: Embeddings | None 

Access the query embedding object.

Functions

aget_by_ids async
aget_by_ids(ids: Sequence[str]) -> list[Document] 

Async get documents by their IDs.

The returned documents are expected to have the ID field set to the ID of the document in the vector store.

Fewer documents may be returned than requested if some IDs are not found or if there are duplicated IDs.

Users should not assume that the order of the returned documents matches the order of the input IDs. Instead, users should rely on the ID field of the returned documents.

This method should NOT raise exceptions if no documents are found for some IDs.

PARAMETER DESCRIPTION
ids

List of ids to retrieve.

TYPE: Sequence[str]

RETURNS DESCRIPTION
list[Document]

List of Documents.

Added in version 0.2.11

Source code in .venv/lib/python3.13/site-packages/langchain_core/vectorstores/base.py
async def aget_by_ids(self, ids: Sequence[str], /) -> list[Document]:
    """Async get documents by their IDs.

    The returned documents are expected to have the ID field set to the ID of the
    document in the vector store.

    Fewer documents may be returned than requested if some IDs are not found or
    if there are duplicated IDs.

    Users should not assume that the order of the returned documents matches
    the order of the input IDs. Instead, users should rely on the ID field of the
    returned documents.

    This method should **NOT** raise exceptions if no documents are found for
    some IDs.

    Args:
        ids: List of ids to retrieve.

    Returns:
        List of Documents.

    !!! version-added "Added in version 0.2.11"
    """
    return await run_in_executor(None, self.get_by_ids, ids)
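Because the returned order is not guaranteed and missing IDs are silently skipped, callers often need to realign results with the IDs they asked for. The sketch below shows one way to do that; Doc is a stand-in for the real Document class and align_to_requested is a hypothetical helper, so the example runs without LangChain installed.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Doc:
    """Stand-in for langchain_core.documents.Document (illustration only)."""

    id: Optional[str]
    page_content: str


def align_to_requested(ids: Sequence[str], docs: Sequence[Doc]) -> list:
    """Return docs in the order of `ids`, with None for IDs that were not found."""
    by_id = {doc.id: doc for doc in docs}
    return [by_id.get(id_) for id_ in ids]


# Pretend the store returned results out of order and without ID "2".
returned = [Doc(id="3", page_content="c"), Doc(id="1", page_content="a")]
aligned = align_to_requested(["1", "2", "3"], returned)
print([doc.page_content if doc else None for doc in aligned])
```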
adelete async
adelete(  ids: list[str] | None = None, **kwargs: Any ) -> bool | None 

Async delete by vector ID or other criteria.

PARAMETER DESCRIPTION
ids

List of ids to delete. If None, delete all. Default is None.

TYPE: list[str] | None DEFAULT: None

**kwargs

Other keyword arguments that subclasses might use.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
bool | None

True if deletion is successful, False otherwise, None if not implemented.

Source code in .venv/lib/python3.13/site-packages/langchain_core/vectorstores/base.py
async def adelete(self, ids: list[str] | None = None, **kwargs: Any) -> bool | None:
    """Async delete by vector ID or other criteria.

    Args:
        ids: List of ids to delete. If `None`, delete all. Default is None.
        **kwargs: Other keyword arguments that subclasses might use.

    Returns:
        True if deletion is successful, False otherwise, None if not implemented.
    """
    return await run_in_executor(None, self.delete, ids, **kwargs)
aadd_texts async
aadd_texts(  texts: Iterable[str],  metadatas: list[dict] | None = None,  *,  ids: list[str] | None = None,  **kwargs: Any ) -> list[str] 

Async run more texts through the embeddings and add to the vectorstore.

PARAMETER DESCRIPTION
texts

Iterable of strings to add to the vectorstore.

TYPE: Iterable[str]

metadatas

Optional list of metadatas associated with the texts. Default is None.

TYPE: list[dict] | None DEFAULT: None

ids

Optional list of IDs associated with the texts.

TYPE: list[str] | None DEFAULT: None

**kwargs

vectorstore specific parameters.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[str]

List of ids from adding the texts into the vectorstore.

RAISES DESCRIPTION
ValueError

If the number of metadatas does not match the number of texts.

ValueError

If the number of ids does not match the number of texts.

Source code in .venv/lib/python3.13/site-packages/langchain_core/vectorstores/base.py
async def aadd_texts(
    self,
    texts: Iterable[str],
    metadatas: list[dict] | None = None,
    *,
    ids: list[str] | None = None,
    **kwargs: Any,
) -> list[str]:
    """Async run more texts through the embeddings and add to the vectorstore.

    Args:
        texts: Iterable of strings to add to the vectorstore.
        metadatas: Optional list of metadatas associated with the texts.
            Default is None.
        ids: Optional list
        **kwargs: vectorstore specific parameters.

    Returns:
        List of ids from adding the texts into the vectorstore.

    Raises:
        ValueError: If the number of metadatas does not match the number of texts.
        ValueError: If the number of ids does not match the number of texts.
    """
    if ids is not None:
        # For backward compatibility
        kwargs["ids"] = ids
    if type(self).aadd_documents != VectorStore.aadd_documents:
        # This condition is triggered if the subclass has provided
        # an implementation of the upsert method.
        # The existing add_texts
        texts_: Sequence[str] = (
            texts if isinstance(texts, (list, tuple)) else list(texts)
        )
        if metadatas and len(metadatas) != len(texts_):
            msg = (
                "The number of metadatas must match the number of texts."
                f"Got {len(metadatas)} metadatas and {len(texts_)} texts."
            )
            raise ValueError(msg)
        metadatas_ = iter(metadatas) if metadatas else cycle([{}])
        ids_: Iterator[str | None] = iter(ids) if ids else cycle([None])

        docs = [
            Document(id=id_, page_content=text, metadata=metadata_)
            for text, metadata_, id_ in zip(texts, metadatas_, ids_, strict=False)
        ]
        return await self.aadd_documents(docs, **kwargs)
    return await run_in_executor(None, self.add_texts, texts, metadatas, **kwargs)
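The pairing step in the source above, where missing metadatas and ids are filled in with cycle, can be isolated as a small standalone sketch. pair_texts is a hypothetical name, and tuples stand in for Document objects; like the code path shown, it only validates the metadata count.

```python
from itertools import cycle
from typing import Iterable, Optional


def pair_texts(
    texts: Iterable[str],
    metadatas: Optional[list] = None,
    ids: Optional[list] = None,
) -> list:
    """Pair each text with its metadata and ID, filling gaps with {} and None."""
    texts_ = list(texts)
    if metadatas and len(metadatas) != len(texts_):
        raise ValueError(
            f"Got {len(metadatas)} metadatas and {len(texts_)} texts."
        )
    # cycle() supplies an endless default when no list was given.
    metadatas_ = iter(metadatas) if metadatas else cycle([{}])
    ids_ = iter(ids) if ids else cycle([None])
    return [(id_, text, meta) for text, meta, id_ in zip(texts_, metadatas_, ids_)]


print(pair_texts(["foo", "thud"], ids=["1", "2"]))
```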
add_documents
add_documents(  documents: list[Document], **kwargs: Any ) -> list[str] 

Add or update documents in the vectorstore.

PARAMETER DESCRIPTION
documents

Documents to add to the vectorstore.

TYPE: list[Document]

**kwargs

Additional keyword arguments. If kwargs contains ids and documents contain ids, the ids in the kwargs will receive precedence.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[str]

List of IDs of the added texts.

Source code in .venv/lib/python3.13/site-packages/langchain_core/vectorstores/base.py
def add_documents(self, documents: list[Document], **kwargs: Any) -> list[str]:
    """Add or update documents in the vectorstore.

    Args:
        documents: Documents to add to the vectorstore.
        **kwargs: Additional keyword arguments.
            if kwargs contains ids and documents contain ids,
            the ids in the kwargs will receive precedence.

    Returns:
        List of IDs of the added texts.
    """
    if type(self).add_texts != VectorStore.add_texts:
        if "ids" not in kwargs:
            ids = [doc.id for doc in documents]

            # If there's at least one valid ID, we'll assume that IDs
            # should be used.
            if any(ids):
                kwargs["ids"] = ids

        texts = [doc.page_content for doc in documents]
        metadatas = [doc.metadata for doc in documents]
        return self.add_texts(texts, metadatas, **kwargs)
    msg = (
        f"`add_documents` and `add_texts` has not been implemented "
        f"for {self.__class__.__name__} "
    )
    raise NotImplementedError(msg)
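The precedence rule for IDs described above can be summarized in a few lines. This is a sketch of the decision only; resolve_ids is a hypothetical helper that takes the document IDs as a plain list instead of Document objects.

```python
from typing import Any, Optional


def resolve_ids(doc_ids: list, **kwargs: Any) -> Optional[list]:
    """Return the IDs that would be forwarded to add_texts, or None if there are none."""
    if "ids" in kwargs:
        # Explicit ids in kwargs take precedence over IDs stored on the documents.
        return kwargs["ids"]
    if any(doc_ids):
        # At least one document carries an ID, so the document IDs are used.
        return doc_ids
    return None


print(resolve_ids(["a", None], ids=["1", "2"]))
print(resolve_ids(["a", None]))
print(resolve_ids([None, None]))
```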
aadd_documents async
aadd_documents(  documents: list[Document], **kwargs: Any ) -> list[str] 

Async run more documents through the embeddings and add to the vectorstore.

PARAMETER DESCRIPTION
documents

Documents to add to the vectorstore.

TYPE: list[Document]

**kwargs

Additional keyword arguments.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[str]

List of IDs of the added texts.

Source code in .venv/lib/python3.13/site-packages/langchain_core/vectorstores/base.py
async def aadd_documents(
    self, documents: list[Document], **kwargs: Any
) -> list[str]:
    """Async run more documents through the embeddings and add to the vectorstore.

    Args:
        documents: Documents to add to the vectorstore.
        **kwargs: Additional keyword arguments.

    Returns:
        List of IDs of the added texts.
    """
    # If the async method has been overridden, we'll use that.
    if type(self).aadd_texts != VectorStore.aadd_texts:
        if "ids" not in kwargs:
            ids = [doc.id for doc in documents]

            # If there's at least one valid ID, we'll assume that IDs
            # should be used.
            if any(ids):
                kwargs["ids"] = ids

        texts = [doc.page_content for doc in documents]
        metadatas = [doc.metadata for doc in documents]
        return await self.aadd_texts(texts, metadatas, **kwargs)

    return await run_in_executor(None, self.add_documents, documents, **kwargs)
search
search(  query: str, search_type: str, **kwargs: Any ) -> list[Document] 

Return docs most similar to query using a specified search type.

PARAMETER DESCRIPTION
query

Input text

TYPE: str

search_type

Type of search to perform. Can be "similarity", "mmr", or "similarity_score_threshold".

TYPE: str

**kwargs

Arguments to pass to the search method.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[Document]

List of Documents most similar to the query.

RAISES DESCRIPTION
ValueError

If search_type is not one of "similarity", "mmr", or "similarity_score_threshold".

Source code in .venv/lib/python3.13/site-packages/langchain_core/vectorstores/base.py
def search(self, query: str, search_type: str, **kwargs: Any) -> list[Document]:
    """Return docs most similar to query using a specified search type.

    Args:
        query: Input text
        search_type: Type of search to perform. Can be "similarity",
            "mmr", or "similarity_score_threshold".
        **kwargs: Arguments to pass to the search method.

    Returns:
        List of Documents most similar to the query.

    Raises:
        ValueError: If search_type is not one of "similarity",
            "mmr", or "similarity_score_threshold".
    """
    if search_type == "similarity":
        return self.similarity_search(query, **kwargs)
    if search_type == "similarity_score_threshold":
        docs_and_similarities = self.similarity_search_with_relevance_scores(
            query, **kwargs
        )
        return [doc for doc, _ in docs_and_similarities]
    if search_type == "mmr":
        return self.max_marginal_relevance_search(query, **kwargs)
    msg = (
        f"search_type of {search_type} not allowed. Expected "
        "search_type to be 'similarity', 'similarity_score_threshold'"
        " or 'mmr'."
    )
    raise ValueError(msg)
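The dispatch on search_type can be illustrated without a real vector store. In the sketch below, run_search and the three lambda handlers are hypothetical stand-ins for the underlying search methods; only the three accepted search_type strings come from the documentation above.

```python
from typing import Callable


def run_search(
    query: str, search_type: str, handlers: dict
) -> list:
    """Dispatch to the handler registered for search_type, or raise ValueError."""
    allowed = ("similarity", "similarity_score_threshold", "mmr")
    if search_type not in allowed:
        raise ValueError(
            f"search_type of {search_type} not allowed. Expected one of {allowed}."
        )
    handler: Callable[[str], list] = handlers[search_type]
    return handler(query)


# Hypothetical stand-ins for the three underlying search methods.
handlers = {
    "similarity": lambda q: [f"similar to {q}"],
    "similarity_score_threshold": lambda q: [f"above threshold for {q}"],
    "mmr": lambda q: [f"diverse results for {q}"],
}
print(run_search("thud", "mmr", handlers))
```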
asearch async
asearch(  query: str, search_type: str, **kwargs: Any ) -> list[Document] 

Async return docs most similar to query using a specified search type.

PARAMETER DESCRIPTION
query

Input text.

TYPE: str

search_type

Type of search to perform. Can be "similarity", "mmr", or "similarity_score_threshold".

TYPE: str

**kwargs

Arguments to pass to the search method.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[Document]

List of Documents most similar to the query.

RAISES DESCRIPTION
ValueError

If search_type is not one of "similarity", "mmr", or "similarity_score_threshold".

Source code in .venv/lib/python3.13/site-packages/langchain_core/vectorstores/base.py
async def asearch(
    self, query: str, search_type: str, **kwargs: Any
) -> list[Document]:
    """Async return docs most similar to query using a specified search type.

    Args:
        query: Input text.
        search_type: Type of search to perform. Can be "similarity",
            "mmr", or "similarity_score_threshold".
        **kwargs: Arguments to pass to the search method.

    Returns:
        List of Documents most similar to the query.

    Raises:
        ValueError: If search_type is not one of "similarity",
            "mmr", or "similarity_score_threshold".
    """
    if search_type == "similarity":
        return await self.asimilarity_search(query, **kwargs)
    if search_type == "similarity_score_threshold":
        docs_and_similarities = await self.asimilarity_search_with_relevance_scores(
            query, **kwargs
        )
        return [doc for doc, _ in docs_and_similarities]
    if search_type == "mmr":
        return await self.amax_marginal_relevance_search(query, **kwargs)
    msg = (
        f"search_type of {search_type} not allowed. Expected "
        "search_type to be 'similarity', 'similarity_score_threshold' or 'mmr'."
    )
    raise ValueError(msg)
asimilarity_search_with_score async
asimilarity_search_with_score(  *args: Any, **kwargs: Any ) -> list[tuple[Document, float]] 

Async run similarity search with distance.

PARAMETER DESCRIPTION
*args

Arguments to pass to the search method.

TYPE: Any DEFAULT: ()

**kwargs

Arguments to pass to the search method.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[tuple[Document, float]]

List of Tuples of (doc, similarity_score).

Source code in .venv/lib/python3.13/site-packages/langchain_core/vectorstores/base.py
async def asimilarity_search_with_score(
    self, *args: Any, **kwargs: Any
) -> list[tuple[Document, float]]:
    """Async run similarity search with distance.

    Args:
        *args: Arguments to pass to the search method.
        **kwargs: Arguments to pass to the search method.

    Returns:
        List of Tuples of (doc, similarity_score).
    """
    # This is a temporary workaround to make the similarity search
    # asynchronous. The proper solution is to make the similarity search
    # asynchronous in the vector store implementations.
    return await run_in_executor(
        None, self.similarity_search_with_score, *args, **kwargs
    )
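Several async methods on this page wrap their synchronous counterpart in an executor, as the source comment above explains. The sketch below shows the same pattern with the standard library's loop.run_in_executor rather than LangChain's own run_in_executor helper; slow_search and aslow_search are hypothetical stand-ins.

```python
import asyncio
import time
from functools import partial


def slow_search(query: str, k: int = 4) -> list:
    """Blocking stand-in for a synchronous similarity search."""
    time.sleep(0.01)
    return [f"{query}-{i}" for i in range(k)]


async def aslow_search(query: str, k: int = 4) -> list:
    """Run the blocking search in the default thread pool so the event loop stays free."""
    loop = asyncio.get_running_loop()
    # run_in_executor only forwards positional arguments, hence functools.partial.
    return await loop.run_in_executor(None, partial(slow_search, query, k=k))


print(asyncio.run(aslow_search("qux", k=2)))
```

The work still runs on a thread, so this gives concurrency with other coroutines but does not make the search itself any faster.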
similarity_search_with_relevance_scores
similarity_search_with_relevance_scores(  query: str, k: int = 4, **kwargs: Any ) -> list[tuple[Document, float]] 

Return docs and relevance scores in the range [0, 1].

0 is dissimilar, 1 is most similar.

PARAMETER DESCRIPTION
query

Input text.

TYPE: str

k

Number of Documents to return. Defaults to 4.

TYPE: int DEFAULT: 4

**kwargs

kwargs to be passed to similarity search. Should include: score_threshold: Optional, a floating point value between 0 and 1 to filter the resulting set of retrieved docs.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[tuple[Document, float]]

List of Tuples of (doc, similarity_score).

Source code in .venv/lib/python3.13/site-packages/langchain_core/vectorstores/base.py
def similarity_search_with_relevance_scores(
    self,
    query: str,
    k: int = 4,
    **kwargs: Any,
) -> list[tuple[Document, float]]:
    """Return docs and relevance scores in the range [0, 1].

    0 is dissimilar, 1 is most similar.

    Args:
        query: Input text.
        k: Number of Documents to return. Defaults to 4.
        **kwargs: kwargs to be passed to similarity search. Should include:
            score_threshold: Optional, a floating point value between 0 to 1 to
                filter the resulting set of retrieved docs.

    Returns:
        List of Tuples of (doc, similarity_score).
    """
    score_threshold = kwargs.pop("score_threshold", None)

    docs_and_similarities = self._similarity_search_with_relevance_scores(
        query, k=k, **kwargs
    )
    if any(
        similarity < 0.0 or similarity > 1.0
        for _, similarity in docs_and_similarities
    ):
        warnings.warn(
            "Relevance scores must be between"
            f" 0 and 1, got {docs_and_similarities}",
            stacklevel=2,
        )

    if score_threshold is not None:
        docs_and_similarities = [
            (doc, similarity)
            for doc, similarity in docs_and_similarities
            if similarity >= score_threshold
        ]
        if len(docs_and_similarities) == 0:
            logger.warning(
                "No relevant docs were retrieved using the "
                "relevance score threshold %s",
                score_threshold,
            )
    return docs_and_similarities
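The effect of score_threshold is easiest to see on a small list of scored results. The following sketch reproduces only the filtering step; filter_by_threshold is a hypothetical helper, the scores are made up for illustration, and strings stand in for Document objects.

```python
from typing import Optional


def filter_by_threshold(
    scored: list, score_threshold: Optional[float] = None
) -> list:
    """Keep (doc, score) pairs whose relevance score meets the threshold."""
    if score_threshold is None:
        # No threshold means every result is kept.
        return list(scored)
    return [(doc, score) for doc, score in scored if score >= score_threshold]


scored = [("foo", 0.91), ("thud", 0.42), ("qux", 0.10)]
print(filter_by_threshold(scored, score_threshold=0.4))
```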
asimilarity_search_with_relevance_scores async
asimilarity_search_with_relevance_scores(  query: str, k: int = 4, **kwargs: Any ) -> list[tuple[Document, float]] 

Async return docs and relevance scores in the range [0, 1].

0 is dissimilar, 1 is most similar.

PARAMETER DESCRIPTION
query

Input text.

TYPE: str

k

Number of Documents to return. Defaults to 4.

TYPE: int DEFAULT: 4

**kwargs

kwargs to be passed to similarity search. Should include: score_threshold: Optional, a floating point value between 0 and 1 to filter the resulting set of retrieved docs.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[tuple[Document, float]]

List of Tuples of (doc, similarity_score).

Source code in .venv/lib/python3.13/site-packages/langchain_core/vectorstores/base.py
async def asimilarity_search_with_relevance_scores(
    self,
    query: str,
    k: int = 4,
    **kwargs: Any,
) -> list[tuple[Document, float]]:
    """Async return docs and relevance scores in the range [0, 1].

    0 is dissimilar, 1 is most similar.

    Args:
        query: Input text.
        k: Number of Documents to return. Defaults to 4.
        **kwargs: kwargs to be passed to similarity search. Should include:
            score_threshold: Optional, a floating point value between 0 to 1 to
                filter the resulting set of retrieved docs

    Returns:
        List of Tuples of (doc, similarity_score)
    """
    score_threshold = kwargs.pop("score_threshold", None)

    docs_and_similarities = await self._asimilarity_search_with_relevance_scores(
        query, k=k, **kwargs
    )
    if any(
        similarity < 0.0 or similarity > 1.0
        for _, similarity in docs_and_similarities
    ):
        warnings.warn(
            "Relevance scores must be between"
            f" 0 and 1, got {docs_and_similarities}",
            stacklevel=2,
        )

    if score_threshold is not None:
        docs_and_similarities = [
            (doc, similarity)
            for doc, similarity in docs_and_similarities
            if similarity >= score_threshold
        ]
        if len(docs_and_similarities) == 0:
            logger.warning(
                "No relevant docs were retrieved using the "
                "relevance score threshold %s",
                score_threshold,
            )
    return docs_and_similarities
asimilarity_search async
asimilarity_search(  query: str, k: int = 4, **kwargs: Any ) -> list[Document] 

Async return docs most similar to query.

PARAMETER DESCRIPTION
query

Input text.

TYPE: str

k

Number of Documents to return. Defaults to 4.

TYPE: int DEFAULT: 4

**kwargs

Arguments to pass to the search method.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[Document]

List of Documents most similar to the query.

Source code in .venv/lib/python3.13/site-packages/langchain_core/vectorstores/base.py
async def asimilarity_search(
    self, query: str, k: int = 4, **kwargs: Any
) -> list[Document]:
    """Async return docs most similar to query.

    Args:
        query: Input text.
        k: Number of Documents to return. Defaults to 4.
        **kwargs: Arguments to pass to the search method.

    Returns:
        List of Documents most similar to the query.
    """
    # This is a temporary workaround to make the similarity search
    # asynchronous. The proper solution is to make the similarity search
    # asynchronous in the vector store implementations.
    return await run_in_executor(None, self.similarity_search, query, k=k, **kwargs)
asimilarity_search_by_vector async
asimilarity_search_by_vector(  embedding: list[float], k: int = 4, **kwargs: Any ) -> list[Document] 

Async return docs most similar to embedding vector.

PARAMETER DESCRIPTION
embedding

Embedding to look up documents similar to.

TYPE: list[float]

k

Number of Documents to return. Defaults to 4.

TYPE: int DEFAULT: 4

**kwargs

Arguments to pass to the search method.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[Document]

List of Documents most similar to the query vector.

Source code in .venv/lib/python3.13/site-packages/langchain_core/vectorstores/base.py
async def asimilarity_search_by_vector(
    self, embedding: list[float], k: int = 4, **kwargs: Any
) -> list[Document]:
    """Async return docs most similar to embedding vector.

    Args:
        embedding: Embedding to look up documents similar to.
        k: Number of Documents to return. Defaults to 4.
        **kwargs: Arguments to pass to the search method.

    Returns:
        List of Documents most similar to the query vector.
    """
    # This is a temporary workaround to make the similarity search
    # asynchronous. The proper solution is to make the similarity search
    # asynchronous in the vector store implementations.
    return await run_in_executor(
        None, self.similarity_search_by_vector, embedding, k=k, **kwargs
    )
amax_marginal_relevance_search async
amax_marginal_relevance_search(  query: str,  k: int = 4,  fetch_k: int = 20,  lambda_mult: float = 0.5,  **kwargs: Any ) -> list[Document] 

Async return docs selected using the maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

PARAMETER DESCRIPTION
query

Text to look up documents similar to.

TYPE: str

k

Number of Documents to return. Defaults to 4.

TYPE: int DEFAULT: 4

fetch_k

Number of Documents to fetch to pass to MMR algorithm. Default is 20.

TYPE: int DEFAULT: 20

lambda_mult

Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.

TYPE: float DEFAULT: 0.5

**kwargs

Arguments to pass to the search method.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[Document]

List of Documents selected by maximal marginal relevance.

Source code in .venv/lib/python3.13/site-packages/langchain_core/vectorstores/base.py
async def amax_marginal_relevance_search(  self,  query: str,  k: int = 4,  fetch_k: int = 20,  lambda_mult: float = 0.5,  **kwargs: Any, ) -> list[Document]:  """Async return docs selected using the maximal marginal relevance.   Maximal marginal relevance optimizes for similarity to query AND diversity  among selected documents.   Args:  query: Text to look up documents similar to.  k: Number of Documents to return. Defaults to 4.  fetch_k: Number of Documents to fetch to pass to MMR algorithm.  Default is 20.  lambda_mult: Number between 0 and 1 that determines the degree  of diversity among the results with 0 corresponding  to maximum diversity and 1 to minimum diversity.  Defaults to 0.5.  **kwargs: Arguments to pass to the search method.   Returns:  List of Documents selected by maximal marginal relevance.  """  # This is a temporary workaround to make the similarity search  # asynchronous. The proper solution is to make the similarity search  # asynchronous in the vector store implementations.  return await run_in_executor(  None,  self.max_marginal_relevance_search,  query,  k=k,  fetch_k=fetch_k,  lambda_mult=lambda_mult,  **kwargs,  ) 
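To make the roles of `k`, `fetch_k`, and `lambda_mult` concrete, here is a minimal pure-Python sketch of greedy maximal marginal relevance selection. It is an illustration under stated assumptions, not the library's code: `cosine` and `mmr_select` are hypothetical helpers, cosine similarity is assumed as the similarity measure, and `candidates` plays the role of the `fetch_k` documents fetched before re-ranking.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(
    query: list[float],
    candidates: list[list[float]],
    k: int = 4,
    lambda_mult: float = 0.5,
) -> list[int]:
    """Greedily pick up to k candidate indices, trading relevance against redundancy."""
    selected: list[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def score(i: int) -> float:
            relevance = cosine(query, candidates[i])
            # Redundancy is the highest similarity to anything already selected.
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
# Index 1 is a near-duplicate of index 0; index 2 points in a different direction.
pool = [[1.0, 0.0], [1.0, 0.05], [0.6, 0.8]]
print(mmr_select(query, pool, k=2, lambda_mult=1.0))   # relevance only
print(mmr_select(query, pool, k=2, lambda_mult=0.25))  # favors diversity
```

With `lambda_mult=1.0` the near-duplicate is chosen second because only relevance counts; with `lambda_mult=0.25` the redundancy penalty pushes the selection toward the dissimilar candidate.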
amax_marginal_relevance_search_by_vector async
amax_marginal_relevance_search_by_vector(  embedding: list[float],  k: int = 4,  fetch_k: int = 20,  lambda_mult: float = 0.5,  **kwargs: Any ) -> list[Document] 

Async return docs selected using the maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

PARAMETER DESCRIPTION
embedding

Embedding to look up documents similar to.

TYPE: list[float]

k

Number of Documents to return. Defaults to 4.

TYPE: int DEFAULT: 4

fetch_k

Number of Documents to fetch to pass to MMR algorithm. Default is 20.

TYPE: int DEFAULT: 20

lambda_mult

Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.

TYPE: float DEFAULT: 0.5

**kwargs

Arguments to pass to the search method.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[Document]

List of Documents selected by maximal marginal relevance.

Source code in .venv/lib/python3.13/site-packages/langchain_core/vectorstores/base.py
async def amax_marginal_relevance_search_by_vector(  self,  embedding: list[float],  k: int = 4,  fetch_k: int = 20,  lambda_mult: float = 0.5,  **kwargs: Any, ) -> list[Document]:  """Async return docs selected using the maximal marginal relevance.   Maximal marginal relevance optimizes for similarity to query AND diversity  among selected documents.   Args:  embedding: Embedding to look up documents similar to.  k: Number of Documents to return. Defaults to 4.  fetch_k: Number of Documents to fetch to pass to MMR algorithm.  Default is 20.  lambda_mult: Number between 0 and 1 that determines the degree  of diversity among the results with 0 corresponding  to maximum diversity and 1 to minimum diversity.  Defaults to 0.5.  **kwargs: Arguments to pass to the search method.   Returns:  List of Documents selected by maximal marginal relevance.  """  return await run_in_executor(  None,  self.max_marginal_relevance_search_by_vector,  embedding,  k=k,  fetch_k=fetch_k,  lambda_mult=lambda_mult,  **kwargs,  ) 
afrom_documents async classmethod
afrom_documents(  documents: list[Document],  embedding: Embeddings,  **kwargs: Any ) -> Self 

Async return VectorStore initialized from documents and embeddings.

PARAMETER DESCRIPTION
documents

List of Documents to add to the vectorstore.

TYPE: list[Document]

embedding

Embedding function to use.

TYPE: Embeddings

**kwargs

Additional keyword arguments.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
Self

VectorStore initialized from documents and embeddings.

Source code in .venv/lib/python3.13/site-packages/langchain_core/vectorstores/base.py
@classmethod async def afrom_documents(  cls,  documents: list[Document],  embedding: Embeddings,  **kwargs: Any, ) -> Self:  """Async return VectorStore initialized from documents and embeddings.   Args:  documents: List of Documents to add to the vectorstore.  embedding: Embedding function to use.  **kwargs: Additional keyword arguments.   Returns:  VectorStore initialized from documents and embeddings.  """  texts = [d.page_content for d in documents]  metadatas = [d.metadata for d in documents]   if "ids" not in kwargs:  ids = [doc.id for doc in documents]   # If there's at least one valid ID, we'll assume that IDs  # should be used.  if any(ids):  kwargs["ids"] = ids   return await cls.afrom_texts(texts, embedding, metadatas=metadatas, **kwargs) 
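The ID handling in the source above can be isolated into a small sketch. `Doc` is a hypothetical stand-in for a LangChain Document and `prepare_from_documents` is an illustrative helper, not part of the library; it mirrors only the logic that decides whether document IDs are forwarded as `ids`.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""

    page_content: str
    metadata: dict = field(default_factory=dict)
    id: Optional[str] = None


def prepare_from_documents(documents: list[Doc], **kwargs: Any):
    """Split documents into texts and metadatas, forwarding IDs only when useful."""
    texts = [d.page_content for d in documents]
    metadatas = [d.metadata for d in documents]
    if "ids" not in kwargs:
        ids = [d.id for d in documents]
        # If at least one document carries an ID, pass the whole list through.
        if any(ids):
            kwargs["ids"] = ids
    return texts, metadatas, kwargs


docs = [Doc("alpha", id="1"), Doc("beta")]
print(prepare_from_documents(docs))
```

Note that an explicit `ids` keyword argument takes precedence: document IDs are consulted only when the caller did not pass `ids`.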
afrom_texts async classmethod
afrom_texts(  texts: list[str],  embedding: Embeddings,  metadatas: list[dict] | None = None,  *,  ids: list[str] | None = None,  **kwargs: Any ) -> Self 

Async return VectorStore initialized from texts and embeddings.

PARAMETER DESCRIPTION
texts

Texts to add to the vectorstore.

TYPE: list[str]

embedding

Embedding function to use.

TYPE: Embeddings

metadatas

Optional list of metadatas associated with the texts. Default is None.

TYPE: list[dict] | None DEFAULT: None

ids

Optional list of IDs associated with the texts.

TYPE: list[str] | None DEFAULT: None

**kwargs

Additional keyword arguments.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
Self

VectorStore initialized from texts and embeddings.

Source code in .venv/lib/python3.13/site-packages/langchain_core/vectorstores/base.py
@classmethod async def afrom_texts(  cls,  texts: list[str],  embedding: Embeddings,  metadatas: list[dict] | None = None,  *,  ids: list[str] | None = None,  **kwargs: Any, ) -> Self:  """Async return VectorStore initialized from texts and embeddings.   Args:  texts: Texts to add to the vectorstore.  embedding: Embedding function to use.  metadatas: Optional list of metadatas associated with the texts.  Default is None.  ids: Optional list of IDs associated with the texts.  **kwargs: Additional keyword arguments.   Returns:  VectorStore initialized from texts and embeddings.  """  if ids is not None:  kwargs["ids"] = ids  return await run_in_executor(  None, cls.from_texts, texts, embedding, metadatas, **kwargs  ) 
as_retriever
as_retriever(**kwargs: Any) -> VectorStoreRetriever 

Return VectorStoreRetriever initialized from this VectorStore.

PARAMETER DESCRIPTION
**kwargs

Keyword arguments to pass to the search function. Can include:
search_type: Defines the type of search that the Retriever should perform. Can be "similarity" (default), "mmr", or "similarity_score_threshold".
search_kwargs: Keyword arguments to pass to the search function. Can include things like:
  k: Number of documents to return (Default: 4)
  score_threshold: Minimum relevance threshold for similarity_score_threshold
  fetch_k: Number of documents to pass to the MMR algorithm (Default: 20)
  lambda_mult: Diversity of results returned by MMR; 1 for minimum diversity and 0 for maximum. (Default: 0.5)
  filter: Filter by document metadata

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
VectorStoreRetriever

Retriever class for VectorStore.

Examples:

# Retrieve more documents with higher diversity
# Useful if your dataset has many similar documents
docsearch.as_retriever(
    search_type="mmr", search_kwargs={"k": 6, "lambda_mult": 0.25}
)

# Fetch more documents for the MMR algorithm to consider
# But only return the top 5
docsearch.as_retriever(search_type="mmr", search_kwargs={"k": 5, "fetch_k": 50})

# Only retrieve documents that have a relevance score
# Above a certain threshold
docsearch.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"score_threshold": 0.8},
)

# Only get the single most similar document from the dataset
docsearch.as_retriever(search_kwargs={"k": 1})

# Use a filter to only retrieve documents from a specific paper
docsearch.as_retriever(
    search_kwargs={"filter": {"paper_title": "GPT-4 Technical Report"}}
)

Source code in .venv/lib/python3.13/site-packages/langchain_core/vectorstores/base.py
def as_retriever(self, **kwargs: Any) -> VectorStoreRetriever:  """Return VectorStoreRetriever initialized from this VectorStore.   Args:  **kwargs: Keyword arguments to pass to the search function.  Can include:  search_type: Defines the type of search that the Retriever should  perform. Can be "similarity" (default), "mmr", or  "similarity_score_threshold".  search_kwargs: Keyword arguments to pass to the search function. Can  include things like:  k: Amount of documents to return (Default: 4)  score_threshold: Minimum relevance threshold  for similarity_score_threshold  fetch_k: Amount of documents to pass to MMR algorithm  (Default: 20)  lambda_mult: Diversity of results returned by MMR;  1 for minimum diversity and 0 for maximum. (Default: 0.5)  filter: Filter by document metadata   Returns:  Retriever class for VectorStore.   Examples:  ```python  # Retrieve more documents with higher diversity  # Useful if your dataset has many similar documents  docsearch.as_retriever(  search_type="mmr", search_kwargs={"k": 6, "lambda_mult": 0.25}  )   # Fetch more documents for the MMR algorithm to consider  # But only return the top 5  docsearch.as_retriever(search_type="mmr", search_kwargs={"k": 5, "fetch_k": 50})   # Only retrieve documents that have a relevance score  # Above a certain threshold  docsearch.as_retriever(  search_type="similarity_score_threshold",  search_kwargs={"score_threshold": 0.8},  )   # Only get the single most similar document from the dataset  docsearch.as_retriever(search_kwargs={"k": 1})   # Use a filter to only retrieve documents from a specific paper  docsearch.as_retriever(  search_kwargs={"filter": {"paper_title": "GPT-4 Technical Report"}}  )  ```  """  tags = kwargs.pop("tags", None) or [*self._get_retriever_tags()]  return VectorStoreRetriever(vectorstore=self, tags=tags, **kwargs) 
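The `score_threshold` option is described above as a minimum relevance threshold. A minimal sketch of that filtering step follows; `apply_score_threshold` is a hypothetical helper, and treating the threshold as inclusive is an assumption drawn from the word "minimum", not something the source shown here confirms.

```python
def apply_score_threshold(
    docs_and_scores: list[tuple[str, float]], score_threshold: float
) -> list[tuple[str, float]]:
    """Keep only (document, relevance) pairs at or above the threshold (assumed inclusive)."""
    return [
        (doc, score) for doc, score in docs_and_scores if score >= score_threshold
    ]


hits = [("doc-a", 0.93), ("doc-b", 0.80), ("doc-c", 0.42)]
print(apply_score_threshold(hits, 0.8))
```

A threshold like this only makes sense for relevance scores where higher is better; raw distances, where lower is better, would need to be converted first.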
__ensure_collection
__ensure_collection() -> None 

Ensure that the collection exists or create it.

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py
def __ensure_collection(self) -> None:  """Ensure that the collection exists or create it."""  self._chroma_collection = self._client.get_or_create_collection(  name=self._collection_name,  embedding_function=None,  metadata=self._collection_metadata,  configuration=self._collection_configuration,  ) 
__query_collection
__query_collection(  query_texts: list[str] | None = None,  query_embeddings: list[list[float]] | None = None,  n_results: int = 4,  where: dict[str, str] | None = None,  where_document: dict[str, str] | None = None,  **kwargs: Any ) -> list[Document] | QueryResult 

Query the chroma collection.

PARAMETER DESCRIPTION
query_texts

List of query texts.

TYPE: list[str] | None DEFAULT: None

query_embeddings

List of query embeddings.

TYPE: list[list[float]] | None DEFAULT: None

n_results

Number of results to return. Defaults to 4.

TYPE: int DEFAULT: 4

where

dict used to filter results by metadata. E.g. {"color" : "red"}.

TYPE: dict[str, str] | None DEFAULT: None

where_document

dict used to filter by the document contents. E.g. {"$contains": "hello"}.

TYPE: dict[str, str] | None DEFAULT: None

kwargs

Additional keyword arguments to pass to Chroma collection query.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[Document] | QueryResult

List of n_results nearest neighbor embeddings for provided query_embeddings or query_texts.

See more: https://docs.trychroma.com/reference/py-collection#query

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py
@xor_args(("query_texts", "query_embeddings")) def __query_collection(  self,  query_texts: list[str] | None = None,  query_embeddings: list[list[float]] | None = None,  n_results: int = 4,  where: dict[str, str] | None = None,  where_document: dict[str, str] | None = None,  **kwargs: Any, ) -> list[Document] | chromadb.QueryResult:  """Query the chroma collection.   Args:  query_texts: List of query texts.  query_embeddings: List of query embeddings.  n_results: Number of results to return. Defaults to 4.  where: dict used to filter results by metadata.  E.g. {"color" : "red"}.  where_document: dict used to filter by the document contents.  E.g. {"$contains": "hello"}.  kwargs: Additional keyword arguments to pass to Chroma collection query.   Returns:  List of `n_results` nearest neighbor embeddings for provided  query_embeddings or query_texts.   See more: https://docs.trychroma.com/reference/py-collection#query  """  return self._collection.query(  query_texts=query_texts,  query_embeddings=query_embeddings, # type: ignore[arg-type]  n_results=n_results,  where=where, # type: ignore[arg-type]  where_document=where_document, # type: ignore[arg-type]  **kwargs,  ) 
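The `@xor_args(("query_texts", "query_embeddings"))` decorator in the source above signals that exactly one of the two arguments should be supplied. A simplified sketch of such a check is shown below; `require_exactly_one` is a hypothetical decorator written for illustration, it inspects keyword arguments only, and it is not the library's `xor_args`.

```python
import functools


def require_exactly_one(*names: str):
    """Decorator factory: exactly one of the named keyword arguments must be non-None."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            provided = [name for name in names if kwargs.get(name) is not None]
            if len(provided) != 1:
                raise ValueError(
                    f"Exactly one of {names} must be provided, got {provided}"
                )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@require_exactly_one("query_texts", "query_embeddings")
def query(query_texts=None, query_embeddings=None, n_results=4):
    """Hypothetical query function that reports which input mode was used."""
    mode = "texts" if query_texts is not None else "embeddings"
    return mode, n_results


print(query(query_texts=["hello"]))
```

Calling `query()` with neither argument, or with both, raises `ValueError` before the wrapped function runs.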
encode_image staticmethod
encode_image(uri: str) -> str 

Get base64 string from image URI.

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py
@staticmethod def encode_image(uri: str) -> str:  """Get base64 string from image URI."""  with Path(uri).open("rb") as image_file:  return base64.b64encode(image_file.read()).decode("utf-8") 
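Since the source opens the URI with `Path(uri).open("rb")`, the argument is read as a local file path. The round trip can be checked with a temporary file; the function body below copies the source above, while the sample bytes and file name are made up for the demonstration.

```python
import base64
import tempfile
from pathlib import Path


def encode_image(uri: str) -> str:
    """Read a local file and return its contents as a base64 string."""
    with Path(uri).open("rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


# Placeholder bytes standing in for real image data.
raw = b"\x89PNG\r\n\x1a\n" + b"fake image payload"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.png"
    path.write_bytes(raw)
    encoded = encode_image(str(path))

# Decoding the string recovers the original bytes.
decoded = base64.b64decode(encoded)
print(decoded == raw)
```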
fork
fork(new_name: str) -> Chroma 

Fork this vector store.

PARAMETER DESCRIPTION
new_name

New name for the forked store.

TYPE: str

RETURNS DESCRIPTION
Chroma

A new Chroma store forked from this vector store.

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py
def fork(self, new_name: str) -> Chroma:  """Fork this vector store.   Args:  new_name: New name for the forked store.   Returns:  A new Chroma store forked from this vector store.   """  forked_collection = self._collection.fork(new_name=new_name)  return Chroma(  client=self._client,  embedding_function=self._embedding_function,  collection_name=forked_collection.name,  ) 
add_images
add_images(  uris: list[str],  metadatas: list[dict] | None = None,  ids: list[str] | None = None, ) -> list[str] 

Run more images through the embeddings and add to the vectorstore.

PARAMETER DESCRIPTION
uris

File paths to the images.

TYPE: list[str]

metadatas

Optional list of metadatas. When querying, you can filter on this metadata.

TYPE: list[dict] | None DEFAULT: None

ids

Optional list of IDs. (Items without IDs will be assigned UUIDs)

TYPE: list[str] | None DEFAULT: None

RETURNS DESCRIPTION
list[str]

List of IDs of the added images.

RAISES DESCRIPTION
ValueError

When metadata is incorrect.

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py
def add_images(  self,  uris: list[str],  metadatas: list[dict] | None = None,  ids: list[str] | None = None, ) -> list[str]:  """Run more images through the embeddings and add to the vectorstore.   Args:  uris: File path to the image.  metadatas: Optional list of metadatas.  When querying, you can filter on this metadata.  ids: Optional list of IDs. (Items without IDs will be assigned UUIDs)   Returns:  List of IDs of the added images.   Raises:  ValueError: When metadata is incorrect.  """  # Map from uris to b64 encoded strings  b64_texts = [self.encode_image(uri=uri) for uri in uris]  # Populate IDs  if ids is None:  ids = [str(uuid.uuid4()) for _ in uris]  else:  ids = [id_ if id_ is not None else str(uuid.uuid4()) for id_ in ids]  embeddings = None  # Set embeddings  if self._embedding_function is not None and hasattr(  self._embedding_function,  "embed_image",  ):  embeddings = self._embedding_function.embed_image(uris=uris)  if metadatas:  # fill metadatas with empty dicts if somebody  # did not specify metadata for all images  length_diff = len(uris) - len(metadatas)  if length_diff:  metadatas = metadatas + [{}] * length_diff  empty_ids = []  non_empty_ids = []  for idx, m in enumerate(metadatas):  if m:  non_empty_ids.append(idx)  else:  empty_ids.append(idx)  if non_empty_ids:  metadatas = [metadatas[idx] for idx in non_empty_ids]  images_with_metadatas = [b64_texts[idx] for idx in non_empty_ids]  embeddings_with_metadatas = (  [embeddings[idx] for idx in non_empty_ids] if embeddings else None  )  ids_with_metadata = [ids[idx] for idx in non_empty_ids]  try:  self._collection.upsert(  metadatas=metadatas, # type: ignore[arg-type]  embeddings=embeddings_with_metadatas, # type: ignore[arg-type]  documents=images_with_metadatas,  ids=ids_with_metadata,  )  except ValueError as e:  if "Expected metadata value to be" in str(e):  msg = (  "Try filtering complex metadata using "  "langchain_community.vectorstores.utils.filter_complex_metadata."  
)  raise ValueError(e.args[0] + "\n\n" + msg) from e  raise e  if empty_ids:  images_without_metadatas = [b64_texts[j] for j in empty_ids]  embeddings_without_metadatas = (  [embeddings[j] for j in empty_ids] if embeddings else None  )  ids_without_metadatas = [ids[j] for j in empty_ids]  self._collection.upsert(  embeddings=embeddings_without_metadatas,  documents=images_without_metadatas,  ids=ids_without_metadatas,  )  else:  self._collection.upsert(  embeddings=embeddings,  documents=b64_texts,  ids=ids,  )  return ids 
add_texts
add_texts(  texts: Iterable[str],  metadatas: list[dict] | None = None,  ids: list[str] | None = None,  **kwargs: Any ) -> list[str] 

Run more texts through the embeddings and add to the vectorstore.

PARAMETER DESCRIPTION
texts

Texts to add to the vectorstore.

TYPE: Iterable[str]

metadatas

Optional list of metadatas. When querying, you can filter on this metadata.

TYPE: list[dict] | None DEFAULT: None

ids

Optional list of IDs. (Items without IDs will be assigned UUIDs)

TYPE: list[str] | None DEFAULT: None

kwargs

Additional keyword arguments.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[str]

List of IDs of the added texts.

RAISES DESCRIPTION
ValueError

When metadata is incorrect.

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py
def add_texts(  self,  texts: Iterable[str],  metadatas: list[dict] | None = None,  ids: list[str] | None = None,  **kwargs: Any, ) -> list[str]:  """Run more texts through the embeddings and add to the vectorstore.   Args:  texts: Texts to add to the vectorstore.  metadatas: Optional list of metadatas.  When querying, you can filter on this metadata.  ids: Optional list of IDs. (Items without IDs will be assigned UUIDs)  kwargs: Additional keyword arguments.   Returns:  List of IDs of the added texts.   Raises:  ValueError: When metadata is incorrect.  """  if ids is None:  ids = [str(uuid.uuid4()) for _ in texts]  else:  ids = [id_ if id_ is not None else str(uuid.uuid4()) for id_ in ids]   embeddings = None  texts = list(texts)  if self._embedding_function is not None:  embeddings = self._embedding_function.embed_documents(texts)  if metadatas:  # fill metadatas with empty dicts if somebody  # did not specify metadata for all texts  length_diff = len(texts) - len(metadatas)  if length_diff:  metadatas = metadatas + [{}] * length_diff  empty_ids = []  non_empty_ids = []  for idx, m in enumerate(metadatas):  if m:  non_empty_ids.append(idx)  else:  empty_ids.append(idx)  if non_empty_ids:  metadatas = [metadatas[idx] for idx in non_empty_ids]  texts_with_metadatas = [texts[idx] for idx in non_empty_ids]  embeddings_with_metadatas = (  [embeddings[idx] for idx in non_empty_ids]  if embeddings is not None and len(embeddings) > 0  else None  )  ids_with_metadata = [ids[idx] for idx in non_empty_ids]  try:  self._collection.upsert(  metadatas=metadatas, # type: ignore[arg-type]  embeddings=embeddings_with_metadatas, # type: ignore[arg-type]  documents=texts_with_metadatas,  ids=ids_with_metadata,  )  except ValueError as e:  if "Expected metadata value to be" in str(e):  msg = (  "Try filtering complex metadata from the document using "  "langchain_community.vectorstores.utils.filter_complex_metadata."  
)  raise ValueError(e.args[0] + "\n\n" + msg) from e  raise e  if empty_ids:  texts_without_metadatas = [texts[j] for j in empty_ids]  embeddings_without_metadatas = (  [embeddings[j] for j in empty_ids] if embeddings else None  )  ids_without_metadatas = [ids[j] for j in empty_ids]  self._collection.upsert(  embeddings=embeddings_without_metadatas, # type: ignore[arg-type]  documents=texts_without_metadatas,  ids=ids_without_metadatas,  )  else:  self._collection.upsert(  embeddings=embeddings, # type: ignore[arg-type]  documents=texts,  ids=ids,  )  return ids 
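The metadata handling in the source above pads a short `metadatas` list with empty dicts and then upserts entries with and without metadata separately. The partitioning step can be sketched in isolation; `split_by_metadata` is a hypothetical helper that mirrors only the index bookkeeping, not the upsert calls.

```python
def split_by_metadata(
    texts: list[str], metadatas: list[dict]
) -> tuple[list[dict], list[int], list[int]]:
    """Pad metadatas to len(texts), then return indices with and without metadata."""
    length_diff = len(texts) - len(metadatas)
    if length_diff > 0:
        # Fill in empty dicts for texts that were given no metadata.
        metadatas = metadatas + [{}] * length_diff
    non_empty_ids = [idx for idx, m in enumerate(metadatas) if m]
    empty_ids = [idx for idx, m in enumerate(metadatas) if not m]
    return metadatas, non_empty_ids, empty_ids


padded, with_meta, without_meta = split_by_metadata(
    ["a", "b", "c"], [{"source": "x"}, {}]
)
print(padded, with_meta, without_meta)
```

Each index list can then be used to select the matching texts, embeddings, and IDs, so that empty dicts are never sent as metadata.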
similarity_search
similarity_search(  query: str,  k: int = DEFAULT_K,  filter: dict[str, str] | None = None,  **kwargs: Any ) -> list[Document] 

Run similarity search with Chroma.

PARAMETER DESCRIPTION
query

Query text to search for.

TYPE: str

k

Number of results to return. Defaults to 4.

TYPE: int DEFAULT: DEFAULT_K

filter

Filter by metadata.

TYPE: dict[str, str] | None DEFAULT: None

kwargs

Additional keyword arguments to pass to Chroma collection query.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[Document]

List of documents most similar to the query text.

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py
def similarity_search(  self,  query: str,  k: int = DEFAULT_K,  filter: dict[str, str] | None = None, # noqa: A002  **kwargs: Any, ) -> list[Document]:  """Run similarity search with Chroma.   Args:  query: Query text to search for.  k: Number of results to return. Defaults to 4.  filter: Filter by metadata.  kwargs: Additional keyword arguments to pass to Chroma collection query.   Returns:  List of documents most similar to the query text.  """  docs_and_scores = self.similarity_search_with_score(  query,  k,  filter=filter,  **kwargs,  )  return [doc for doc, _ in docs_and_scores] 
similarity_search_by_vector
similarity_search_by_vector(  embedding: list[float],  k: int = DEFAULT_K,  filter: dict[str, str] | None = None,  where_document: dict[str, str] | None = None,  **kwargs: Any ) -> list[Document] 

Return docs most similar to embedding vector.

PARAMETER DESCRIPTION
embedding

Embedding to look up documents similar to.

TYPE: list[float]

k

Number of Documents to return. Defaults to 4.

TYPE: int DEFAULT: DEFAULT_K

filter

Filter by metadata.

TYPE: dict[str, str] | None DEFAULT: None

where_document

dict used to filter by the document contents. E.g. {"$contains": "hello"}.

TYPE: dict[str, str] | None DEFAULT: None

kwargs

Additional keyword arguments to pass to Chroma collection query.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[Document]

List of Documents most similar to the query vector.

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py
def similarity_search_by_vector(  self,  embedding: list[float],  k: int = DEFAULT_K,  filter: dict[str, str] | None = None, # noqa: A002  where_document: dict[str, str] | None = None,  **kwargs: Any, ) -> list[Document]:  """Return docs most similar to embedding vector.   Args:  embedding: Embedding to look up documents similar to.  k: Number of Documents to return. Defaults to 4.  filter: Filter by metadata.  where_document: dict used to filter by the document contents.  E.g. {"$contains": "hello"}.  kwargs: Additional keyword arguments to pass to Chroma collection query.   Returns:  List of Documents most similar to the query vector.  """  results = self.__query_collection(  query_embeddings=[embedding],  n_results=k,  where=filter,  where_document=where_document,  **kwargs,  )  return _results_to_docs(results) 
similarity_search_by_vector_with_relevance_scores
similarity_search_by_vector_with_relevance_scores(  embedding: list[float],  k: int = DEFAULT_K,  filter: dict[str, str] | None = None,  where_document: dict[str, str] | None = None,  **kwargs: Any ) -> list[tuple[Document, float]] 

Return docs most similar to embedding vector and similarity score.

PARAMETER DESCRIPTION
embedding

Embedding to look up documents similar to.

TYPE: list[float]

k

Number of Documents to return. Defaults to 4.

TYPE: int DEFAULT: DEFAULT_K

filter

Filter by metadata.

TYPE: dict[str, str] | None DEFAULT: None

where_document

dict used to filter by the document contents. E.g. {"$contains": "hello"}.

TYPE: dict[str, str] | None DEFAULT: None

kwargs

Additional keyword arguments to pass to Chroma collection query.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[tuple[Document, float]]

List of documents most similar to the query vector, each paired with a float score. A lower score represents more similarity.

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py
def similarity_search_by_vector_with_relevance_scores(  self,  embedding: list[float],  k: int = DEFAULT_K,  filter: dict[str, str] | None = None, # noqa: A002  where_document: dict[str, str] | None = None,  **kwargs: Any, ) -> list[tuple[Document, float]]:  """Return docs most similar to embedding vector and similarity score.   Args:  embedding (List[float]): Embedding to look up documents similar to.  k: Number of Documents to return. Defaults to 4.  filter: Filter by metadata.  where_document: dict used to filter by the documents.  E.g. {"$contains": "hello"}.  kwargs: Additional keyword arguments to pass to Chroma collection query.   Returns:  List of documents most similar to the query text and relevance score  in float for each. Lower score represents more similarity.  """  results = self.__query_collection(  query_embeddings=[embedding],  n_results=k,  where=filter,  where_document=where_document,  **kwargs,  )  return _results_to_docs_and_scores(results) 
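Because a lower score means more similarity here, callers that want "higher is better" relevance values need a conversion of the kind the constructor's `relevance_score_fn: Callable[[float], float]` parameter allows. The sketch below is an assumption-laden illustration: `cosine_relevance` assumes the collection uses cosine distance and maps it with `1.0 - distance`, and `to_relevance` is a hypothetical helper, not a library function.

```python
from typing import Callable


def cosine_relevance(distance: float) -> float:
    """Assumed mapping for cosine distance: smaller distance gives higher relevance."""
    return 1.0 - distance


def to_relevance(
    docs_and_distances: list[tuple[str, float]],
    relevance_score_fn: Callable[[float], float],
) -> list[tuple[str, float]]:
    """Convert distances to relevance scores and sort best-first."""
    scored = [(doc, relevance_score_fn(d)) for doc, d in docs_and_distances]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


hits = [("doc-a", 0.5), ("doc-b", 0.25)]
ranked = to_relevance(hits, cosine_relevance)
print(ranked)
```

Other distance metrics would need a different mapping, so the conversion function should match how the collection was configured.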
similarity_search_with_score
similarity_search_with_score(  query: str,  k: int = DEFAULT_K,  filter: dict[str, str] | None = None,  where_document: dict[str, str] | None = None,  **kwargs: Any ) -> list[tuple[Document, float]] 

Run similarity search with Chroma and return the distance for each result.

PARAMETER DESCRIPTION
query

Query text to search for.

TYPE: str

k

Number of results to return. Defaults to 4.

TYPE: int DEFAULT: DEFAULT_K

filter

Filter by metadata.

TYPE: dict[str, str] | None DEFAULT: None

where_document

dict used to filter by document contents. E.g. {"$contains": "hello"}.

TYPE: dict[str, str] | None DEFAULT: None

kwargs

Additional keyword arguments to pass to Chroma collection query.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[tuple[Document, float]]

List of documents most similar to the query text, each paired with its distance as a float. A lower score represents more similarity.

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py
def similarity_search_with_score(  self,  query: str,  k: int = DEFAULT_K,  filter: dict[str, str] | None = None, # noqa: A002  where_document: dict[str, str] | None = None,  **kwargs: Any, ) -> list[tuple[Document, float]]:  """Run similarity search with Chroma with distance.   Args:  query: Query text to search for.  k: Number of results to return. Defaults to 4.  filter: Filter by metadata.  where_document: dict used to filter by document contents.  E.g. {"$contains": "hello"}.  kwargs: Additional keyword arguments to pass to Chroma collection query.   Returns:  List of documents most similar to the query text and  distance in float for each. Lower score represents more similarity.  """  if self._embedding_function is None:  results = self.__query_collection(  query_texts=[query],  n_results=k,  where=filter,  where_document=where_document,  **kwargs,  )  else:  query_embedding = self._embedding_function.embed_query(query)  results = self.__query_collection(  query_embeddings=[query_embedding],  n_results=k,  where=filter,  where_document=where_document,  **kwargs,  )   return _results_to_docs_and_scores(results) 
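The final step above turns the raw query result into (document, distance) pairs. A sketch of that conversion is given below under an explicit assumption about the result shape: each field is a list with one inner list per query, so index `[0]` selects the results of the single query issued here. `results_to_docs_and_scores` is a hypothetical re-creation that uses plain dicts in place of Document objects; it is not the library's private helper.

```python
def results_to_docs_and_scores(results: dict) -> list[tuple[dict, float]]:
    """Pair each returned document with its distance for the first (only) query."""
    return [
        ({"id": id_, "page_content": text, "metadata": meta or {}}, distance)
        for text, meta, id_, distance in zip(
            results["documents"][0],
            results["metadatas"][0],
            results["ids"][0],
            results["distances"][0],
        )
        if text is not None
    ]


# Made-up result for one query with two hits; the second hit has no metadata.
sample = {
    "ids": [["1", "2"]],
    "documents": [["alpha", "beta"]],
    "metadatas": [[{"source": "x"}, None]],
    "distances": [[0.1, 0.4]],
}
pairs = results_to_docs_and_scores(sample)
print(pairs)
```

The pairs keep the order in which results were returned, which in this sample is nearest first, consistent with lower distances meaning more similarity.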
similarity_search_with_vectors
similarity_search_with_vectors(  query: str,  k: int = DEFAULT_K,  filter: dict[str, str] | None = None,  where_document: dict[str, str] | None = None,  **kwargs: Any ) -> list[tuple[Document, ndarray]] 

Run similarity search with Chroma and return the embedding vector for each result.

PARAMETER DESCRIPTION
query

Query text to search for.

TYPE: str

k

Number of results to return. Defaults to 4.

TYPE: int DEFAULT: DEFAULT_K

filter

Filter by metadata.

TYPE: dict[str, str] | None DEFAULT: None

where_document

dict used to filter by the document contents. E.g. {"$contains": "hello"}.

TYPE: dict[str, str] | None DEFAULT: None

kwargs

Additional keyword arguments to pass to Chroma collection query.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[tuple[Document, ndarray]]

List of documents most similar to the query text and embedding vectors for each.

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py
def similarity_search_with_vectors(  self,  query: str,  k: int = DEFAULT_K,  filter: dict[str, str] | None = None, # noqa: A002  where_document: dict[str, str] | None = None,  **kwargs: Any, ) -> list[tuple[Document, np.ndarray]]:  """Run similarity search with Chroma with vectors.   Args:  query: Query text to search for.  k: Number of results to return. Defaults to 4.  filter: Filter by metadata.  where_document: dict used to filter by the document contents.  E.g. {"$contains": "hello"}.  kwargs: Additional keyword arguments to pass to Chroma collection query.   Returns:  List of documents most similar to the query text and  embedding vectors for each.  """  include = ["documents", "metadatas", "embeddings"]  if self._embedding_function is None:  results = self.__query_collection(  query_texts=[query],  n_results=k,  where=filter,  where_document=where_document,  include=include,  **kwargs,  )  else:  query_embedding = self._embedding_function.embed_query(query)  results = self.__query_collection(  query_embeddings=[query_embedding],  n_results=k,  where=filter,  where_document=where_document,  include=include,  **kwargs,  )   return _results_to_docs_and_vectors(results) 
similarity_search_by_image
similarity_search_by_image(  uri: str,  k: int = DEFAULT_K,  filter: dict[str, str] | None = None,  **kwargs: Any ) -> list[Document] 

Search for similar images based on the given image URI.

PARAMETER DESCRIPTION
uri

URI of the image to search for.

TYPE: str

k

Number of results to return.

TYPE: int DEFAULT: DEFAULT_K

filter

Filter by metadata.

TYPE: dict[str, str] | None DEFAULT: None

**kwargs

Additional arguments to pass to function.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[Document]

List of images most similar to the provided image. Each element in the list is a LangChain Document object. The page content is the base64-encoded image; the metadata is the default or as defined by the user.

RAISES DESCRIPTION
ValueError

If the embedding function does not support image embeddings.

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py
def similarity_search_by_image(  self,  uri: str,  k: int = DEFAULT_K,  filter: dict[str, str] | None = None, # noqa: A002  **kwargs: Any, ) -> list[Document]:  """Search for similar images based on the given image URI.   Args:  uri: URI of the image to search for.  k: Number of results to return.  filter: Filter by metadata.  **kwargs: Additional arguments to pass to function.    Returns:  List of Images most similar to the provided image. Each element in list is a  LangChain Document Object. The page content is b64 encoded image, metadata  is default or as defined by user.   Raises:  ValueError: If the embedding function does not support image embeddings.  """  if self._embedding_function is not None and hasattr(  self._embedding_function, "embed_image"  ):  # Obtain image embedding  # Assuming embed_image returns a single embedding  image_embedding = self._embedding_function.embed_image(uris=[uri])   # Perform similarity search based on the obtained embedding  return self.similarity_search_by_vector(  embedding=image_embedding,  k=k,  filter=filter,  **kwargs,  )  msg = "The embedding function must support image embedding."  raise ValueError(msg) 
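The guard at the top of the source above checks that the embedding function exposes `embed_image` and raises `ValueError` otherwise. The check can be sketched with two stand-in classes; `TextOnlyEmbeddings`, `ImageEmbeddings`, and `embed_query_image` are hypothetical, and the assumption that `embed_image` returns one vector per URI is made for this illustration only (the source comment assumes a single embedding).

```python
class TextOnlyEmbeddings:
    """Hypothetical embedder without image support."""

    def embed_query(self, text: str) -> list[float]:
        return [float(len(text))]


class ImageEmbeddings(TextOnlyEmbeddings):
    """Hypothetical embedder that also implements embed_image."""

    def embed_image(self, uris: list[str]) -> list[list[float]]:
        # Toy embedding: one single-value vector per URI.
        return [[float(len(uri))] for uri in uris]


def embed_query_image(embedding_function, uri: str) -> list[list[float]]:
    """Embed one image URI, or raise if the embedder cannot handle images."""
    if embedding_function is not None and hasattr(embedding_function, "embed_image"):
        return embedding_function.embed_image(uris=[uri])
    raise ValueError("The embedding function must support image embedding.")


print(embed_query_image(ImageEmbeddings(), "cat.png"))
```

Passing `TextOnlyEmbeddings()` or `None` raises `ValueError`, which matches the RAISES entry documented for this method.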
similarity_search_by_image_with_relevance_score
similarity_search_by_image_with_relevance_score(  uri: str,  k: int = DEFAULT_K,  filter: dict[str, str] | None = None,  **kwargs: Any ) -> list[tuple[Document, float]] 

Search for similar images based on the given image URI and return each result with its similarity score.

PARAMETER DESCRIPTION
uri

URI of the image to search for.

TYPE: str

k

Number of results to return.

TYPE: int DEFAULT: DEFAULT_K

filter

Filter by metadata.

TYPE: dict[str, str] | None DEFAULT: None

**kwargs

Additional arguments to pass to function.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[tuple[Document, float]]

List of tuples containing documents similar to the query image and their similarity scores. The first element of each tuple is a LangChain Document object whose page content is the base64-encoded image; metadata is the default or as defined by the user.

RAISES DESCRIPTION
ValueError

If the embedding function does not support image embeddings.

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py
def similarity_search_by_image_with_relevance_score(  self,  uri: str,  k: int = DEFAULT_K,  filter: dict[str, str] | None = None, # noqa: A002  **kwargs: Any, ) -> list[tuple[Document, float]]:  """Search for similar images based on the given image URI.   Args:  uri: URI of the image to search for.  k: Number of results to return.  filter: Filter by metadata.  **kwargs: Additional arguments to pass to function.   Returns:  List of tuples containing documents similar to the query image and their  similarity scores. 0th element in each tuple is a LangChain Document Object.  The page content is b64 encoded img, metadata is default or defined by user.   Raises:  ValueError: If the embedding function does not support image embeddings.  """  if self._embedding_function is not None and hasattr(  self._embedding_function, "embed_image"  ):  # Obtain image embedding  # Assuming embed_image returns a single embedding  image_embedding = self._embedding_function.embed_image(uris=[uri])   # Perform similarity search based on the obtained embedding  return self.similarity_search_by_vector_with_relevance_scores(  embedding=image_embedding,  k=k,  filter=filter,  **kwargs,  )  msg = "The embedding function must support image embedding."  raise ValueError(msg) 
max_marginal_relevance_search_by_vector
max_marginal_relevance_search_by_vector(  embedding: list[float],  k: int = DEFAULT_K,  fetch_k: int = 20,  lambda_mult: float = 0.5,  filter: dict[str, str] | None = None,  where_document: dict[str, str] | None = None,  **kwargs: Any ) -> list[Document] 

Return docs selected using the maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

PARAMETER DESCRIPTION
embedding

Embedding to look up documents similar to.

TYPE: list[float]

k

Number of Documents to return. Defaults to 4.

TYPE: int DEFAULT: DEFAULT_K

fetch_k

Number of Documents to fetch to pass to MMR algorithm. Defaults to 20.

TYPE: int DEFAULT: 20

lambda_mult

Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.

TYPE: float DEFAULT: 0.5

filter

Filter by metadata.

TYPE: dict[str, str] | None DEFAULT: None

where_document

dict used to filter by the document contents. E.g. {"$contains": "hello"}.

TYPE: dict[str, str] | None DEFAULT: None

kwargs

Additional keyword arguments to pass to Chroma collection query.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[Document]

List of Documents selected by maximal marginal relevance.

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py
def max_marginal_relevance_search_by_vector(  self,  embedding: list[float],  k: int = DEFAULT_K,  fetch_k: int = 20,  lambda_mult: float = 0.5,  filter: dict[str, str] | None = None, # noqa: A002  where_document: dict[str, str] | None = None,  **kwargs: Any, ) -> list[Document]:  """Return docs selected using the maximal marginal relevance.   Maximal marginal relevance optimizes for similarity to query AND diversity  among selected documents.   Args:  embedding: Embedding to look up documents similar to.  k: Number of Documents to return. Defaults to 4.  fetch_k: Number of Documents to fetch to pass to MMR algorithm. Defaults to  20.  lambda_mult: Number between 0 and 1 that determines the degree  of diversity among the results with 0 corresponding  to maximum diversity and 1 to minimum diversity.  Defaults to 0.5.  filter: Filter by metadata.  where_document: dict used to filter by the document contents.  E.g. {"$contains": "hello"}.  kwargs: Additional keyword arguments to pass to Chroma collection query.   Returns:  List of Documents selected by maximal marginal relevance.  """  results = self.__query_collection(  query_embeddings=[embedding],  n_results=fetch_k,  where=filter,  where_document=where_document,  include=["metadatas", "documents", "distances", "embeddings"],  **kwargs,  )  mmr_selected = maximal_marginal_relevance(  np.array(embedding, dtype=np.float32),  results["embeddings"][0],  k=k,  lambda_mult=lambda_mult,  )   candidates = _results_to_docs(results)   return [r for i, r in enumerate(candidates) if i in mmr_selected] 
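To make the effect of `lambda_mult` concrete, here is a pure-Python sketch of a greedy maximal marginal relevance selection using cosine similarity. It is an illustration under stated assumptions, not the `maximal_marginal_relevance` helper called above: `cosine` and `mmr_indices` are hypothetical names, and the vectors are toy data.

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def mmr_indices(
    query: list[float],
    candidates: list[list[float]],
    k: int = 4,
    lambda_mult: float = 0.5,
) -> list[int]:
    """Greedily pick up to k candidate indices, balancing relevance and diversity."""
    relevance = [cosine(query, c) for c in candidates]
    selected: list[int] = []
    remaining = list(range(len(candidates)))

    def score(i: int) -> float:
        # Redundancy is the highest similarity to anything already chosen.
        redundancy = max(cosine(candidates[i], candidates[j]) for j in selected)
        return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        if not selected:
            # The first pick is simply the candidate closest to the query.
            best = max(remaining, key=lambda i: relevance[i])
        else:
            best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
candidates = [[0.9, 0.1], [0.9, 0.12], [0.7, -0.7]]

diverse = mmr_indices(query, candidates, k=2, lambda_mult=0.5)
relevant_only = mmr_indices(query, candidates, k=2, lambda_mult=1.0)
```

With `lambda_mult=0.5` the near-duplicate second vector is skipped in favour of the third; with `lambda_mult=1.0` the selection reduces to plain similarity ranking. Note that the method above returns the selected documents in their original candidate order rather than in selection order.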
max_marginal_relevance_search
max_marginal_relevance_search(  query: str,  k: int = DEFAULT_K,  fetch_k: int = 20,  lambda_mult: float = 0.5,  filter: dict[str, str] | None = None,  where_document: dict[str, str] | None = None,  **kwargs: Any ) -> list[Document] 

Return docs selected using the maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

PARAMETER DESCRIPTION
query

Text to look up documents similar to.

TYPE: str

k

Number of Documents to return. Defaults to 4.

TYPE: int DEFAULT: DEFAULT_K

fetch_k

Number of Documents to fetch to pass to MMR algorithm.

TYPE: int DEFAULT: 20

lambda_mult

Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.

TYPE: float DEFAULT: 0.5

filter

Filter by metadata.

TYPE: dict[str, str] | None DEFAULT: None

where_document

dict used to filter by the document contents. E.g. {"$contains": "hello"}.

TYPE: dict[str, str] | None DEFAULT: None

kwargs

Additional keyword arguments to pass to Chroma collection query.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[Document]

List of Documents selected by maximal marginal relevance.

RAISES DESCRIPTION
ValueError

If the embedding function is not provided.

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py
def max_marginal_relevance_search(  self,  query: str,  k: int = DEFAULT_K,  fetch_k: int = 20,  lambda_mult: float = 0.5,  filter: dict[str, str] | None = None, # noqa: A002  where_document: dict[str, str] | None = None,  **kwargs: Any, ) -> list[Document]:  """Return docs selected using the maximal marginal relevance.   Maximal marginal relevance optimizes for similarity to query AND diversity  among selected documents.   Args:  query: Text to look up documents similar to.  k: Number of Documents to return. Defaults to 4.  fetch_k: Number of Documents to fetch to pass to MMR algorithm.  lambda_mult: Number between 0 and 1 that determines the degree  of diversity among the results with 0 corresponding  to maximum diversity and 1 to minimum diversity.  Defaults to 0.5.  filter: Filter by metadata.  where_document: dict used to filter by the document contents.  E.g. {"$contains": "hello"}.  kwargs: Additional keyword arguments to pass to Chroma collection query.   Returns:  List of Documents selected by maximal marginal relevance.   Raises:  ValueError: If the embedding function is not provided.  """  if self._embedding_function is None:  msg = "For MMR search, you must specify an embedding function on creation."  raise ValueError(  msg,  )   embedding = self._embedding_function.embed_query(query)  return self.max_marginal_relevance_search_by_vector(  embedding,  k,  fetch_k,  lambda_mult=lambda_mult,  filter=filter,  where_document=where_document,  ) 
delete_collection
delete_collection() -> None 

Delete the collection.

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py
def delete_collection(self) -> None:
    """Delete the collection."""
    self._client.delete_collection(self._collection.name)
    self._chroma_collection = None
reset_collection
reset_collection() -> None 

Resets the collection.

Resets the collection by deleting the collection and recreating an empty one.

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py
def reset_collection(self) -> None:
    """Resets the collection.

    Resets the collection by deleting the collection and recreating an empty one.
    """
    self.delete_collection()
    self.__ensure_collection()
get
get(  ids: str | list[str] | None = None,  where: Where | None = None,  limit: int | None = None,  offset: int | None = None,  where_document: WhereDocument | None = None,  include: list[str] | None = None, ) -> dict[str, Any] 

Gets items from the collection.

PARAMETER DESCRIPTION
ids

The ids of the embeddings to get. Optional.

TYPE: str | list[str] | None DEFAULT: None

where

A Where type dict used to filter results by. E.g. {"$and": [{"color": "red"}, {"price": 4.20}]} Optional.

TYPE: Where | None DEFAULT: None

limit

The number of documents to return. Optional.

TYPE: int | None DEFAULT: None

offset

The offset to start returning results from. Useful for paging results with limit. Optional.

TYPE: int | None DEFAULT: None

where_document

A WhereDocument type dict used to filter by the documents. E.g. {"$contains": "hello"}. Optional.

TYPE: WhereDocument | None DEFAULT: None

include

A list of what to include in the results. Can contain "embeddings", "metadatas", "documents". Ids are always included. Defaults to ["metadatas", "documents"]. Optional.

TYPE: list[str] | None DEFAULT: None

RETURNS DESCRIPTION
dict[str, Any]

A dict with the keys "ids", "embeddings", "metadatas", "documents".

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py
def get(
    self,
    ids: str | list[str] | None = None,
    where: Where | None = None,
    limit: int | None = None,
    offset: int | None = None,
    where_document: WhereDocument | None = None,
    include: list[str] | None = None,
) -> dict[str, Any]:
    """Gets the collection.

    Args:
        ids: The ids of the embeddings to get. Optional.
        where: A Where type dict used to filter results by.
            E.g. `{"$and": [{"color": "red"}, {"price": 4.20}]}` Optional.
        limit: The number of documents to return. Optional.
        offset: The offset to start returning results from.
            Useful for paging results with limit. Optional.
        where_document: A WhereDocument type dict used to filter by the documents.
            E.g. `{"$contains": "hello"}`. Optional.
        include: A list of what to include in the results.
            Can contain `"embeddings"`, `"metadatas"`, `"documents"`.
            Ids are always included.
            Defaults to `["metadatas", "documents"]`. Optional.

    Returns:
        A dict with the keys `"ids"`, `"embeddings"`, `"metadatas"`, `"documents"`.
    """
    kwargs = {
        "ids": ids,
        "where": where,
        "limit": limit,
        "offset": offset,
        "where_document": where_document,
    }

    if include is not None:
        kwargs["include"] = include

    return self._collection.get(**kwargs)  # type: ignore[arg-type, return-value]
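The `limit` and `offset` parameters lend themselves to paging. The sketch below shows one way to walk a collection page by page; `iter_pages` is a hypothetical helper, and `fake_get` is a list-backed stand-in that only imitates the `limit`/`offset` behaviour and the `"ids"` key of the returned dict. In real use, a bound method such as `vector_store.get` would be passed instead.

```python
from __future__ import annotations

from collections.abc import Callable, Iterator
from typing import Any


def iter_pages(
    fetch: Callable[..., dict[str, Any]],
    page_size: int,
) -> Iterator[dict[str, Any]]:
    """Yield successive pages until a page comes back with no ids."""
    offset = 0
    while True:
        page = fetch(limit=page_size, offset=offset)
        if not page["ids"]:
            return
        yield page
        offset += page_size


# Hypothetical stand-in for `vector_store.get`, backed by a plain list.
_IDS = ["a", "b", "c", "d", "e"]


def fake_get(limit: int | None = None, offset: int | None = None) -> dict[str, Any]:
    start = offset or 0
    stop = None if limit is None else start + limit
    ids = _IDS[start:stop]
    return {"ids": ids, "documents": [f"text-{i}" for i in ids]}


pages = [page["ids"] for page in iter_pages(fake_get, page_size=2)]
```

Stopping on an empty `"ids"` list costs one extra request at the end but avoids having to know the collection size up front.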
get_by_ids
get_by_ids(ids: Sequence[str]) -> list[Document] 

Get documents by their IDs.

The returned documents are expected to have the ID field set to the ID of the document in the vector store.

Fewer documents may be returned than requested if some IDs are not found or if there are duplicated IDs.

Users should not assume that the order of the returned documents matches the order of the input IDs. Instead, users should rely on the ID field of the returned documents.

This method should NOT raise exceptions if no documents are found for some IDs.

PARAMETER DESCRIPTION
ids

List of ids to retrieve.

TYPE: Sequence[str]

RETURNS DESCRIPTION
list[Document]

List of Documents.

Added in 0.2.1

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py
def get_by_ids(self, ids: Sequence[str], /) -> list[Document]:
    """Get documents by their IDs.

    The returned documents are expected to have the ID field set to the ID of the
    document in the vector store.

    Fewer documents may be returned than requested if some IDs are not found or
    if there are duplicated IDs.

    Users should not assume that the order of the returned documents matches
    the order of the input IDs. Instead, users should rely on the ID field of the
    returned documents.

    This method should **NOT** raise exceptions if no documents are found for
    some IDs.

    Args:
        ids: List of ids to retrieve.

    Returns:
        List of Documents.

    !!! version-added "Added in 0.2.1"
    """
    results = self.get(ids=list(ids))
    return [
        Document(page_content=doc, metadata=meta, id=doc_id)
        for doc, meta, doc_id in zip(
            results["documents"],
            results["metadatas"],
            results["ids"],
            strict=False,
        )
    ]
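Because the order of the returned documents is not guaranteed, callers who need the request order can rebuild it from the `id` field. The following is a minimal sketch; `Doc` is a hypothetical stand-in for a LangChain Document and `order_like_request` is an illustrative helper, not part of the library.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""

    page_content: str
    id: str | None = None
    metadata: dict = field(default_factory=dict)


def order_like_request(ids: list[str], docs: list[Doc]) -> list[Doc]:
    """Return found documents in the order of `ids`, skipping misses and repeats."""
    by_id = {doc.id: doc for doc in docs}
    seen: set[str] = set()
    ordered: list[Doc] = []
    for doc_id in ids:
        if doc_id in by_id and doc_id not in seen:
            ordered.append(by_id[doc_id])
            seen.add(doc_id)
    return ordered


# Pretend the store returned two of the requested documents, out of order.
returned = [Doc("beta", id="2"), Doc("alpha", id="1")]
requested = ["1", "missing", "2", "1"]

ordered = order_like_request(requested, returned)
contents = [doc.page_content for doc in ordered]
```

Missing and duplicated IDs are dropped silently here, matching the documented expectation that fewer documents than requested may come back.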
update_document
update_document(  document_id: str, document: Document ) -> None 

Update a document in the collection.

PARAMETER DESCRIPTION
document_id

ID of the document to update.

TYPE: str

document

Document to update.

TYPE: Document

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py
def update_document(self, document_id: str, document: Document) -> None:
    """Update a document in the collection.

    Args:
        document_id: ID of the document to update.
        document: Document to update.
    """
    return self.update_documents([document_id], [document])
update_documents
update_documents(  ids: list[str], documents: list[Document] ) -> None 

Update documents in the collection.

PARAMETER DESCRIPTION
ids

List of IDs of the documents to update.

TYPE: list[str]

documents

List of documents to update.

TYPE: list[Document]

RAISES DESCRIPTION
ValueError

If the embedding function is not provided.

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py
def update_documents(self, ids: list[str], documents: list[Document]) -> None:  """Update a document in the collection.   Args:  ids: List of ids of the document to update.  documents: List of documents to update.   Raises:  ValueError: If the embedding function is not provided.  """  text = [document.page_content for document in documents]  metadata = [document.metadata for document in documents]  if self._embedding_function is None:  msg = "For update, you must specify an embedding function on creation."  raise ValueError(  msg,  )  embeddings = self._embedding_function.embed_documents(text)   if hasattr(  self._client,  "get_max_batch_size",  ) or hasattr( # for Chroma 0.5.1 and above  self._client,  "max_batch_size",  ): # for Chroma 0.4.10 and above  from chromadb.utils.batch_utils import create_batches   for batch in create_batches(  api=self._client,  ids=ids,  metadatas=metadata, # type: ignore[arg-type]  documents=text,  embeddings=embeddings, # type: ignore[arg-type]  ):  self._collection.update(  ids=batch[0],  embeddings=batch[1],  documents=batch[3],  metadatas=batch[2],  )  else:  self._collection.update(  ids=ids,  embeddings=embeddings, # type: ignore[arg-type]  documents=text,  metadatas=metadata, # type: ignore[arg-type]  ) 
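The batched branch above indexes each batch as `batch[0]` (ids), `batch[1]` (embeddings), `batch[2]` (metadatas), and `batch[3]` (documents). The sketch below reproduces that tuple layout with plain list slicing so the indexing is easier to follow. It is not chromadb's `create_batches`: `split_into_batches` and the explicit `max_batch_size` argument are assumptions made for illustration.

```python
from __future__ import annotations


def split_into_batches(
    ids: list[str],
    embeddings: list[list[float]],
    metadatas: list[dict],
    documents: list[str],
    max_batch_size: int,
) -> list[tuple[list[str], list[list[float]], list[dict], list[str]]]:
    """Slice four parallel lists into (ids, embeddings, metadatas, documents) tuples."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    batches = []
    for start in range(0, len(ids), max_batch_size):
        end = start + max_batch_size
        batches.append(
            (
                ids[start:end],
                embeddings[start:end],
                metadatas[start:end],
                documents[start:end],
            )
        )
    return batches


batches = split_into_batches(
    ids=["1", "2", "3"],
    embeddings=[[0.1], [0.2], [0.3]],
    metadatas=[{"n": 1}, {"n": 2}, {"n": 3}],
    documents=["one", "two", "three"],
    max_batch_size=2,
)
```

Each tuple keeps the four lists aligned, so an update call per batch sees matching ids, embeddings, metadatas, and documents.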
from_texts classmethod
from_texts(  texts: list[str],  embedding: Embeddings | None = None,  metadatas: list[dict] | None = None,  ids: list[str] | None = None,  collection_name: str = _LANGCHAIN_DEFAULT_COLLECTION_NAME,  persist_directory: str | None = None,  host: str | None = None,  port: int | None = None,  headers: dict[str, str] | None = None,  chroma_cloud_api_key: str | None = None,  tenant: str | None = None,  database: str | None = None,  client_settings: Settings | None = None,  client: ClientAPI | None = None,  collection_metadata: dict | None = None,  collection_configuration: (  CreateCollectionConfiguration | None  ) = None,  *,  ssl: bool = False,  **kwargs: Any ) -> Chroma 

Create a Chroma vectorstore from raw texts.

If a persist_directory is specified, the collection will be persisted there. Otherwise, the data will be ephemeral in-memory.

PARAMETER DESCRIPTION
texts

List of texts to add to the collection.

TYPE: list[str]

collection_name

Name of the collection to create.

TYPE: str DEFAULT: _LANGCHAIN_DEFAULT_COLLECTION_NAME

persist_directory

Directory to persist the collection.

TYPE: str | None DEFAULT: None

host

Hostname of a deployed Chroma server.

TYPE: str | None DEFAULT: None

port

Connection port for a deployed Chroma server. Default is 8000.

TYPE: int | None DEFAULT: None

ssl

Whether to establish an SSL connection with a deployed Chroma server. Default is False.

TYPE: bool DEFAULT: False

headers

HTTP headers to send to a deployed Chroma server.

TYPE: dict[str, str] | None DEFAULT: None

chroma_cloud_api_key

Chroma Cloud API key.

TYPE: str | None DEFAULT: None

tenant

Tenant ID. Required for Chroma Cloud connections. Default is 'default_tenant' for local Chroma servers.

TYPE: str | None DEFAULT: None

database

Database name. Required for Chroma Cloud connections. Default is 'default_database'.

TYPE: str | None DEFAULT: None

embedding

Embedding function.

TYPE: Embeddings | None DEFAULT: None

metadatas

List of metadatas.

TYPE: list[dict] | None DEFAULT: None

ids

List of document IDs.

TYPE: list[str] | None DEFAULT: None

client_settings

Chroma client settings.

TYPE: Settings | None DEFAULT: None

client

Chroma client. Documentation: https://docs.trychroma.com/reference/python/client

TYPE: ClientAPI | None DEFAULT: None

collection_metadata

Collection configurations.

TYPE: dict | None DEFAULT: None

collection_configuration

Index configuration for the collection.

TYPE: CreateCollectionConfiguration | None DEFAULT: None

kwargs

Additional keyword arguments to initialize a Chroma client.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
Chroma

Chroma vectorstore.

TYPE: Chroma

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py
@classmethod def from_texts(  cls: type[Chroma],  texts: list[str],  embedding: Embeddings | None = None,  metadatas: list[dict] | None = None,  ids: list[str] | None = None,  collection_name: str = _LANGCHAIN_DEFAULT_COLLECTION_NAME,  persist_directory: str | None = None,  host: str | None = None,  port: int | None = None,  headers: dict[str, str] | None = None,  chroma_cloud_api_key: str | None = None,  tenant: str | None = None,  database: str | None = None,  client_settings: chromadb.config.Settings | None = None,  client: chromadb.ClientAPI | None = None,  collection_metadata: dict | None = None,  collection_configuration: CreateCollectionConfiguration | None = None,  *,  ssl: bool = False,  **kwargs: Any, ) -> Chroma:  """Create a Chroma vectorstore from a raw documents.   If a persist_directory is specified, the collection will be persisted there.  Otherwise, the data will be ephemeral in-memory.   Args:  texts: List of texts to add to the collection.  collection_name: Name of the collection to create.  persist_directory: Directory to persist the collection.  host: Hostname of a deployed Chroma server.  port: Connection port for a deployed Chroma server.  Default is 8000.  ssl: Whether to establish an SSL connection with a deployed Chroma server.  Default is False.  headers: HTTP headers to send to a deployed Chroma server.  chroma_cloud_api_key: Chroma Cloud API key.  tenant: Tenant ID. Required for Chroma Cloud connections.  Default is 'default_tenant' for local Chroma servers.  database: Database name. Required for Chroma Cloud connections.  Default is 'default_database'.  embedding: Embedding function.  metadatas: List of metadatas.  ids: List of document IDs.  client_settings: Chroma client settings.  client: Chroma client. Documentation:  https://docs.trychroma.com/reference/python/client  collection_metadata: Collection configurations.  collection_configuration: Index configuration for the collection.   
kwargs: Additional keyword arguments to initialize a Chroma client.   Returns:  Chroma: Chroma vectorstore.  """  chroma_collection = cls(  collection_name=collection_name,  embedding_function=embedding,  persist_directory=persist_directory,  host=host,  port=port,  ssl=ssl,  headers=headers,  chroma_cloud_api_key=chroma_cloud_api_key,  tenant=tenant,  database=database,  client_settings=client_settings,  client=client,  collection_metadata=collection_metadata,  collection_configuration=collection_configuration,  **kwargs,  )  if ids is None:  ids = [str(uuid.uuid4()) for _ in texts]  else:  ids = [id_ if id_ is not None else str(uuid.uuid4()) for id_ in ids]  if hasattr(  chroma_collection._client,  "get_max_batch_size",  ) or hasattr( # for Chroma 0.5.1 and above  chroma_collection._client,  "max_batch_size",  ): # for Chroma 0.4.10 and above  from chromadb.utils.batch_utils import create_batches   for batch in create_batches(  api=chroma_collection._client,  ids=ids,  metadatas=metadatas, # type: ignore[arg-type]  documents=texts,  ):  chroma_collection.add_texts(  texts=batch[3] if batch[3] else [],  metadatas=batch[2] if batch[2] else None, # type: ignore[arg-type]  ids=batch[0],  )  else:  chroma_collection.add_texts(texts=texts, metadatas=metadatas, ids=ids)  return chroma_collection 
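The ID handling in the body above can be read on its own: when `ids` is omitted every text gets a fresh UUID, and when `ids` is given only its `None` entries are replaced. The helper name `fill_missing_ids` below is hypothetical; the two branches follow the source shown above.

```python
from __future__ import annotations

import uuid


def fill_missing_ids(texts: list[str], ids: list[str | None] | None = None) -> list[str]:
    """Generate UUID strings for absent ids, keeping any that were supplied."""
    if ids is None:
        # No ids at all: one random UUID per text.
        return [str(uuid.uuid4()) for _ in texts]
    # Some ids supplied: only the None entries are replaced.
    return [id_ if id_ is not None else str(uuid.uuid4()) for id_ in ids]


filled = fill_missing_ids(["a", "b", "c"], ids=["doc-1", None, "doc-3"])
generated = fill_missing_ids(["a", "b"])
```

Supplying stable IDs is what makes later calls such as `update_documents` or `delete` practical, since generated UUIDs are only known after insertion.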
from_documents classmethod
from_documents(  documents: list[Document],  embedding: Embeddings | None = None,  ids: list[str] | None = None,  collection_name: str = _LANGCHAIN_DEFAULT_COLLECTION_NAME,  persist_directory: str | None = None,  host: str | None = None,  port: int | None = None,  headers: dict[str, str] | None = None,  chroma_cloud_api_key: str | None = None,  tenant: str | None = None,  database: str | None = None,  client_settings: Settings | None = None,  client: ClientAPI | None = None,  collection_metadata: dict | None = None,  collection_configuration: (  CreateCollectionConfiguration | None  ) = None,  *,  ssl: bool = False,  **kwargs: Any ) -> Chroma 

Create a Chroma vectorstore from a list of documents.

If a persist_directory is specified, the collection will be persisted there. Otherwise, the data will be ephemeral in-memory.

PARAMETER DESCRIPTION
collection_name

Name of the collection to create.

TYPE: str DEFAULT: _LANGCHAIN_DEFAULT_COLLECTION_NAME

persist_directory

Directory to persist the collection.

TYPE: str | None DEFAULT: None

host

Hostname of a deployed Chroma server.

TYPE: str | None DEFAULT: None

port

Connection port for a deployed Chroma server. Default is 8000.

TYPE: int | None DEFAULT: None

ssl

Whether to establish an SSL connection with a deployed Chroma server. Default is False.

TYPE: bool DEFAULT: False

headers

HTTP headers to send to a deployed Chroma server.

TYPE: dict[str, str] | None DEFAULT: None

chroma_cloud_api_key

Chroma Cloud API key.

TYPE: str | None DEFAULT: None

tenant

Tenant ID. Required for Chroma Cloud connections. Default is 'default_tenant' for local Chroma servers.

TYPE: str | None DEFAULT: None

database

Database name. Required for Chroma Cloud connections. Default is 'default_database'.

TYPE: str | None DEFAULT: None

ids

List of document IDs.

TYPE: list[str] | None DEFAULT: None

documents

List of documents to add to the vectorstore.

TYPE: list[Document]

embedding

Embedding function.

TYPE: Embeddings | None DEFAULT: None

client_settings

Chroma client settings.

TYPE: Settings | None DEFAULT: None

client

Chroma client. Documentation: https://docs.trychroma.com/reference/python/client

TYPE: ClientAPI | None DEFAULT: None

collection_metadata

Collection configurations.

TYPE: dict | None DEFAULT: None

collection_configuration

Index configuration for the collection.

TYPE: CreateCollectionConfiguration | None DEFAULT: None

kwargs

Additional keyword arguments to initialize a Chroma client.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
Chroma

Chroma vectorstore.

TYPE: Chroma

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py
@classmethod def from_documents(  cls: type[Chroma],  documents: list[Document],  embedding: Embeddings | None = None,  ids: list[str] | None = None,  collection_name: str = _LANGCHAIN_DEFAULT_COLLECTION_NAME,  persist_directory: str | None = None,  host: str | None = None,  port: int | None = None,  headers: dict[str, str] | None = None,  chroma_cloud_api_key: str | None = None,  tenant: str | None = None,  database: str | None = None,  client_settings: chromadb.config.Settings | None = None,  client: chromadb.ClientAPI | None = None, # Add this line  collection_metadata: dict | None = None,  collection_configuration: CreateCollectionConfiguration | None = None,  *,  ssl: bool = False,  **kwargs: Any, ) -> Chroma:  """Create a Chroma vectorstore from a list of documents.   If a persist_directory is specified, the collection will be persisted there.  Otherwise, the data will be ephemeral in-memory.   Args:  collection_name: Name of the collection to create.  persist_directory: Directory to persist the collection.  host: Hostname of a deployed Chroma server.  port: Connection port for a deployed Chroma server. Default is 8000.  ssl: Whether to establish an SSL connection with a deployed Chroma server.  Default is False.  headers: HTTP headers to send to a deployed Chroma server.  chroma_cloud_api_key: Chroma Cloud API key.  tenant: Tenant ID. Required for Chroma Cloud connections.  Default is 'default_tenant' for local Chroma servers.  database: Database name. Required for Chroma Cloud connections.  Default is 'default_database'.  ids : List of document IDs.  documents: List of documents to add to the vectorstore.  embedding: Embedding function.  client_settings: Chroma client settings.  client: Chroma client. Documentation:  https://docs.trychroma.com/reference/python/client  collection_metadata: Collection configurations.  collection_configuration: Index configuration for the collection.   kwargs: Additional keyword arguments to initialize a Chroma client.   
Returns:  Chroma: Chroma vectorstore.  """  texts = [doc.page_content for doc in documents]  metadatas = [doc.metadata for doc in documents]  if ids is None:  ids = [doc.id if doc.id else str(uuid.uuid4()) for doc in documents]  return cls.from_texts(  texts=texts,  embedding=embedding,  metadatas=metadatas,  ids=ids,  collection_name=collection_name,  persist_directory=persist_directory,  host=host,  port=port,  ssl=ssl,  headers=headers,  chroma_cloud_api_key=chroma_cloud_api_key,  tenant=tenant,  database=database,  client_settings=client_settings,  client=client,  collection_metadata=collection_metadata,  collection_configuration=collection_configuration,  **kwargs,  ) 
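One consequence of the body above is that an explicit `ids` argument takes precedence: each document's own `id` is consulted only when `ids` is `None`. The sketch below isolates that step; `Doc` is a hypothetical stand-in for a LangChain Document and `unpack_documents` is an illustrative name.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""

    page_content: str
    metadata: dict = field(default_factory=dict)
    id: str | None = None


def unpack_documents(
    documents: list[Doc], ids: list[str] | None = None
) -> tuple[list[str], list[dict], list[str]]:
    """Split documents into texts, metadatas, and ids as the classmethod does."""
    texts = [doc.page_content for doc in documents]
    metadatas = [doc.metadata for doc in documents]
    if ids is None:
        # Fall back to each document's id, then to a random UUID.
        ids = [doc.id if doc.id else str(uuid.uuid4()) for doc in documents]
    return texts, metadatas, ids


docs = [Doc("alpha", {"lang": "en"}, id="a-1"), Doc("beta")]

texts, metadatas, own_ids = unpack_documents(docs)
_, _, explicit_ids = unpack_documents(docs, ids=["x", "y"])
```

In the second call the value `"a-1"` stored on the first document is ignored because `ids` was passed explicitly.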
delete
delete(ids: list[str] | None = None, **kwargs: Any) -> None 

Delete by vector IDs.

PARAMETER DESCRIPTION
ids

List of ids to delete.

TYPE: list[str] | None DEFAULT: None

kwargs

Additional keyword arguments.

TYPE: Any DEFAULT: {}

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py
def delete(self, ids: list[str] | None = None, **kwargs: Any) -> None:
    """Delete by vector IDs.

    Args:
        ids: List of ids to delete.
        kwargs: Additional keyword arguments.
    """
    self._collection.delete(ids=ids, **kwargs)