`langchain-chroma`¶

LangChain integration for the Chroma vector database.

Classes¶

Chroma ¶

Chroma(
    collection_name: str = _LANGCHAIN_DEFAULT_COLLECTION_NAME,
    embedding_function: Embeddings | None = None,
    persist_directory: str | None = None,
    host: str | None = None,
    port: int | None = None,
    headers: dict[str, str] | None = None,
    chroma_cloud_api_key: str | None = None,
    tenant: str | None = None,
    database: str | None = None,
    client_settings: Settings | None = None,
    collection_metadata: dict | None = None,
    collection_configuration: (
        CreateCollectionConfiguration | None
    ) = None,
    client: ClientAPI | None = None,
    relevance_score_fn: (
        Callable[[float], float] | None
    ) = None,
    create_collection_if_not_exists: bool | None = True,
    *,
    ssl: bool = False
)

Bases: VectorStore

Chroma vector store integration.

Setup

Install the `chromadb` and `langchain-chroma` packages:

pip install -qU chromadb langchain-chroma

Key init args (indexing params):

collection_name: str
    Name of the collection.
embedding_function: Embeddings
    Embedding function to use.

Key init args (client params):

client: ClientAPI | None
    Chroma client to use.
client_settings: chromadb.config.Settings | None
    Chroma client settings.
persist_directory: str | None
    Directory to persist the collection.
host: str | None
    Hostname of a deployed Chroma server.
port: int | None
    Connection port for a deployed Chroma server. Default is 8000.
ssl: bool
    Whether to establish an SSL connection with a deployed Chroma server. Default is False.
headers: dict[str, str] | None
    HTTP headers to send to a deployed Chroma server.
chroma_cloud_api_key: str | None
    Chroma Cloud API key.
tenant: str | None
    Tenant ID. Required for Chroma Cloud connections. Default is 'default_tenant' for local Chroma servers.
database: str | None
    Database name. Required for Chroma Cloud connections. Default is 'default_database'.

Instantiate

from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

vector_store = Chroma(
    collection_name="foo",
    embedding_function=OpenAIEmbeddings(),
    # other params...
)

Add Documents

from langchain_core.documents import Document

document_1 = Document(page_content="foo", metadata={"baz": "bar"})
document_2 = Document(page_content="thud", metadata={"bar": "baz"})
document_3 = Document(page_content="i will be deleted :(")

documents = [document_1, document_2, document_3]
ids = ["1", "2", "3"]
vector_store.add_documents(documents=documents, ids=ids)

Update Documents

updated_document = Document(
    page_content="qux",
    metadata={"bar": "baz"},
)

vector_store.update_documents(ids=["1"], documents=[updated_document])

Delete Documents

vector_store.delete(ids=["3"])

Search

results = vector_store.similarity_search(query="thud", k=1)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")

* thud [{'bar': 'baz'}]

Search with filter

results = vector_store.similarity_search(
    query="thud", k=1, filter={"baz": "bar"}
)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")

* foo [{'baz': 'bar'}]

Search with score

results = vector_store.similarity_search_with_score(query="qux", k=1)
for doc, score in results:
    print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")

* [SIM=0.000000] qux [{'bar': 'baz', 'baz': 'bar'}]

Async

# add documents
# await vector_store.aadd_documents(documents=documents, ids=ids)

# delete documents
# await vector_store.adelete(ids=["3"])

# search
# results = await vector_store.asimilarity_search(query="thud", k=1)

# search with score
results = await vector_store.asimilarity_search_with_score(query="qux", k=1)
for doc, score in results:
    print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")

* [SIM=0.335463] foo [{'baz': 'bar'}]

Use as Retriever

retriever = vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 1, "fetch_k": 2, "lambda_mult": 0.5},
)
retriever.invoke("thud")

[Document(metadata={"bar": "baz"}, page_content="thud")]

Initialize with a Chroma client.

PARAMETER	DESCRIPTION
`collection_name` ¶	Name of the collection to create. TYPE: `str` DEFAULT: `_LANGCHAIN_DEFAULT_COLLECTION_NAME`
`embedding_function` ¶	Embedding class object. Used to embed texts. TYPE: `Embeddings \| None` DEFAULT: `None`
`persist_directory` ¶	Directory to persist the collection. TYPE: `str \| None` DEFAULT: `None`
`host` ¶	Hostname of a deployed Chroma server. TYPE: `str \| None` DEFAULT: `None`
`port` ¶	Connection port for a deployed Chroma server. When `None`, port 8000 is used. TYPE: `int \| None` DEFAULT: `None`
`ssl` ¶	Whether to establish an SSL connection with a deployed Chroma server. Default is False. TYPE: `bool` DEFAULT: `False`
`headers` ¶	HTTP headers to send to a deployed Chroma server. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`chroma_cloud_api_key` ¶	Chroma Cloud API key. TYPE: `str \| None` DEFAULT: `None`
`tenant` ¶	Tenant ID. Required for Chroma Cloud connections. Default is 'default_tenant' for local Chroma servers. TYPE: `str \| None` DEFAULT: `None`
`database` ¶	Database name. Required for Chroma Cloud connections. Default is 'default_database'. TYPE: `str \| None` DEFAULT: `None`
`client_settings` ¶	Chroma client settings TYPE: `Settings \| None` DEFAULT: `None`
`collection_metadata` ¶	Collection configurations. TYPE: `dict \| None` DEFAULT: `None`
`collection_configuration` ¶	Index configuration for the collection. TYPE: `CreateCollectionConfiguration \| None` DEFAULT: `None`
`client` ¶	Chroma client. Documentation: https://docs.trychroma.com/reference/python/client TYPE: `ClientAPI \| None` DEFAULT: `None`
`relevance_score_fn` ¶	Function to calculate relevance score from distance. Used only in `similarity_search_with_relevance_scores` TYPE: `Callable[[float], float] \| None` DEFAULT: `None`
`create_collection_if_not_exists` ¶	Whether to create collection if it doesn't exist. Defaults to `True`. TYPE: `bool \| None` DEFAULT: `True`
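The connection parameters above are mutually exclusive: at most one of `persist_directory`, `host`, and `chroma_cloud_api_key` may be set, and an explicit `client` takes priority over all of them. The following is a minimal sketch of that selection order, not the library's code. `pick_client_kind` is a hypothetical helper that returns a label instead of building a real client.

```python
from typing import Optional


def pick_client_kind(
    client: Optional[object] = None,
    persist_directory: Optional[str] = None,
    host: Optional[str] = None,
    chroma_cloud_api_key: Optional[str] = None,
    tenant: Optional[str] = None,
    database: Optional[str] = None,
) -> str:
    """Return a label for the kind of client the given arguments select."""
    exclusive = {
        "persist_directory": persist_directory,
        "host": host,
        "chroma_cloud_api_key": chroma_cloud_api_key,
    }
    provided = [name for name, value in exclusive.items() if value is not None]
    # The exclusivity check runs first, even when a client is passed.
    if len(provided) > 1:
        raise ValueError(f"Only one option is allowed, got {','.join(provided)}")
    if client is not None:
        return "provided client"
    if persist_directory is not None:
        return "PersistentClient"
    if host is not None:
        return "HttpClient"
    if chroma_cloud_api_key is not None:
        # Chroma Cloud needs an explicit tenant and database.
        if not tenant or not database:
            raise ValueError("tenant and database are required for Chroma Cloud")
        return "CloudClient"
    return "Client"


print(pick_client_kind(persist_directory="./chroma_db"))
print(pick_client_kind(host="localhost"))
```

Passing both `persist_directory` and `host` to this sketch raises `ValueError`, mirroring the documented restriction.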

METHOD	DESCRIPTION
`aget_by_ids`	Async get documents by their IDs.
`adelete`	Async delete by vector ID or other criteria.
`aadd_texts`	Async run more texts through the embeddings and add to the vectorstore.
`add_documents`	Add or update documents in the vectorstore.
`aadd_documents`	Async run more documents through the embeddings and add to the vectorstore.
`search`	Return docs most similar to query using a specified search type.
`asearch`	Async return docs most similar to query using a specified search type.
`asimilarity_search_with_score`	Async run similarity search with distance.
`similarity_search_with_relevance_scores`	Return docs and relevance scores in the range [0, 1].
`asimilarity_search_with_relevance_scores`	Async return docs and relevance scores in the range [0, 1].
`asimilarity_search`	Async return docs most similar to query.
`asimilarity_search_by_vector`	Async return docs most similar to embedding vector.
`amax_marginal_relevance_search`	Async return docs selected using the maximal marginal relevance.
`amax_marginal_relevance_search_by_vector`	Async return docs selected using the maximal marginal relevance.
`afrom_documents`	Async return VectorStore initialized from documents and embeddings.
`afrom_texts`	Async return VectorStore initialized from texts and embeddings.
`as_retriever`	Return VectorStoreRetriever initialized from this VectorStore.
`encode_image`	Get base64 string from image URI.
`fork`	Fork this vector store.
`add_images`	Run more images through the embeddings and add to the vectorstore.
`add_texts`	Run more texts through the embeddings and add to the vectorstore.
`similarity_search`	Run similarity search with Chroma.
`similarity_search_by_vector`	Return docs most similar to embedding vector.
`similarity_search_by_vector_with_relevance_scores`	Return docs most similar to embedding vector and similarity score.
`similarity_search_with_score`	Run similarity search with Chroma with distance.
`similarity_search_with_vectors`	Run similarity search with Chroma with vectors.
`similarity_search_by_image`	Search for similar images based on the given image URI.
`similarity_search_by_image_with_relevance_score`	Search for similar images based on the given image URI.
`max_marginal_relevance_search_by_vector`	Return docs selected using the maximal marginal relevance.
`max_marginal_relevance_search`	Return docs selected using the maximal marginal relevance.
`delete_collection`	Delete the collection.
`reset_collection`	Resets the collection.
`get`	Gets the collection.
`get_by_ids`	Get documents by their IDs.
`update_document`	Update a document in the collection.
`update_documents`	Update multiple documents in the collection.
`from_texts`	Create a Chroma vectorstore from raw documents.
`from_documents`	Create a Chroma vectorstore from a list of documents.
`delete`	Delete by vector IDs.

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py

def __init__(
    self,
    collection_name: str = _LANGCHAIN_DEFAULT_COLLECTION_NAME,
    embedding_function: Embeddings | None = None,
    persist_directory: str | None = None,
    host: str | None = None,
    port: int | None = None,
    headers: dict[str, str] | None = None,
    chroma_cloud_api_key: str | None = None,
    tenant: str | None = None,
    database: str | None = None,
    client_settings: chromadb.config.Settings | None = None,
    collection_metadata: dict | None = None,
    collection_configuration: CreateCollectionConfiguration | None = None,
    client: chromadb.ClientAPI | None = None,
    relevance_score_fn: Callable[[float], float] | None = None,
    create_collection_if_not_exists: bool | None = True,  # noqa: FBT001, FBT002
    *,
    ssl: bool = False,
) -> None:
    """Initialize with a Chroma client.

    Args:
        collection_name: Name of the collection to create.
        embedding_function: Embedding class object. Used to embed texts.
        persist_directory: Directory to persist the collection.
        host: Hostname of a deployed Chroma server.
        port: Connection port for a deployed Chroma server. Default is 8000.
        ssl: Whether to establish an SSL connection with a deployed Chroma server.
            Default is False.
        headers: HTTP headers to send to a deployed Chroma server.
        chroma_cloud_api_key: Chroma Cloud API key.
        tenant: Tenant ID. Required for Chroma Cloud connections.
            Default is 'default_tenant' for local Chroma servers.
        database: Database name. Required for Chroma Cloud connections.
            Default is 'default_database'.
        client_settings: Chroma client settings
        collection_metadata: Collection configurations.
        collection_configuration: Index configuration for the collection.

        client: Chroma client. Documentation:
            https://docs.trychroma.com/reference/python/client
        relevance_score_fn: Function to calculate relevance score from distance.
            Used only in `similarity_search_with_relevance_scores`
        create_collection_if_not_exists: Whether to create collection
            if it doesn't exist. Defaults to `True`.
    """
    _tenant = tenant or chromadb.DEFAULT_TENANT
    _database = database or chromadb.DEFAULT_DATABASE
    _settings = client_settings or Settings()

    client_args = {
        "persist_directory": persist_directory,
        "host": host,
        "chroma_cloud_api_key": chroma_cloud_api_key,
    }

    if sum(arg is not None for arg in client_args.values()) > 1:
        provided = [
            name for name, value in client_args.items() if value is not None
        ]
        msg = (
            f"Only one of 'persist_directory', 'host' and 'chroma_cloud_api_key' "
            f"is allowed, but got {','.join(provided)}"
        )
        raise ValueError(msg)

    if client is not None:
        self._client = client

    # PersistentClient
    elif persist_directory is not None:
        self._client = chromadb.PersistentClient(
            path=persist_directory,
            settings=_settings,
            tenant=_tenant,
            database=_database,
        )

    # HttpClient
    elif host is not None:
        _port = port or 8000
        self._client = chromadb.HttpClient(
            host=host,
            port=_port,
            ssl=ssl,
            headers=headers,
            settings=_settings,
            tenant=_tenant,
            database=_database,
        )

    # CloudClient
    elif chroma_cloud_api_key is not None:
        if not tenant or not database:
            msg = (
                "Must provide tenant and database values to connect to Chroma Cloud"
            )
            raise ValueError(msg)
        self._client = chromadb.CloudClient(
            tenant=tenant,
            database=database,
            api_key=chroma_cloud_api_key,
            settings=_settings,
        )

    else:
        self._client = chromadb.Client(settings=_settings)

    self._embedding_function = embedding_function
    self._chroma_collection: chromadb.Collection | None = None
    self._collection_name = collection_name
    self._collection_metadata = collection_metadata
    self._collection_configuration = collection_configuration
    if create_collection_if_not_exists:
        self.__ensure_collection()
    else:
        self._chroma_collection = self._client.get_collection(name=collection_name)
    self.override_relevance_score_fn = relevance_score_fn

Attributes¶

embeddings `property` ¶

embeddings: Embeddings | None

Access the query embedding object.

Functions¶

aget_by_ids `async` ¶

aget_by_ids(ids: Sequence[str]) -> list[Document]

Async get documents by their IDs.

The returned documents are expected to have the ID field set to the ID of the document in the vector store.

Fewer documents may be returned than requested if some IDs are not found or if there are duplicated IDs.

Users should not assume that the order of the returned documents matches the order of the input IDs. Instead, users should rely on the ID field of the returned documents.

This method should NOT raise exceptions if no documents are found for some IDs.

PARAMETER	DESCRIPTION
`ids` ¶	List of ids to retrieve. TYPE: `Sequence[str]`

RETURNS	DESCRIPTION
`list[Document]`	List of Documents.

Added in version 0.2.11
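Because neither the order nor the completeness of the result is guaranteed, callers may want to key the returned documents by ID before using them. A minimal sketch of that pattern follows. `Doc` is a hypothetical stand-in for a LangChain `Document`, and the `returned` list is made-up data in place of a real store call.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document with an ID."""

    id: str
    page_content: str


def index_by_id(docs: list[Doc]) -> dict[str, Doc]:
    """Build an ID -> document lookup from whatever the store returned."""
    return {doc.id: doc for doc in docs}


requested = ["1", "2", "3"]
# Pretend result: out of order, and ID "3" was not found.
returned = [Doc(id="2", page_content="thud"), Doc(id="1", page_content="foo")]

found = index_by_id(returned)
for doc_id in requested:
    doc = found.get(doc_id)
    print(doc_id, doc.page_content if doc else "<not found>")
```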

Source code in .venv/lib/python3.13/site-packages/langchain_core/vectorstores/base.py

async def aget_by_ids(self, ids: Sequence[str], /) -> list[Document]:
    """Async get documents by their IDs.

    The returned documents are expected to have the ID field set to the ID of the
    document in the vector store.

    Fewer documents may be returned than requested if some IDs are not found or
    if there are duplicated IDs.

    Users should not assume that the order of the returned documents matches
    the order of the input IDs. Instead, users should rely on the ID field of the
    returned documents.

    This method should **NOT** raise exceptions if no documents are found for
    some IDs.

    Args:
        ids: List of ids to retrieve.

    Returns:
        List of Documents.

    !!! version-added "Added in version 0.2.11"
    """
    return await run_in_executor(None, self.get_by_ids, ids)

adelete `async` ¶

adelete(
    ids: list[str] | None = None, **kwargs: Any
) -> bool | None

Async delete by vector ID or other criteria.

PARAMETER	DESCRIPTION
`ids` ¶	List of ids to delete. If `None`, delete all. Default is None. TYPE: `list[str] \| None` DEFAULT: `None`
`**kwargs` ¶	Other keyword arguments that subclasses might use. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`bool \| None`	True if deletion is successful, False otherwise, None if not implemented.

Source code in .venv/lib/python3.13/site-packages/langchain_core/vectorstores/base.py

async def adelete(self, ids: list[str] | None = None, **kwargs: Any) -> bool | None:
    """Async delete by vector ID or other criteria.

    Args:
        ids: List of ids to delete. If `None`, delete all. Default is None.
        **kwargs: Other keyword arguments that subclasses might use.

    Returns:
        True if deletion is successful, False otherwise, None if not implemented.
    """
    return await run_in_executor(None, self.delete, ids, **kwargs)

aadd_texts `async` ¶

aadd_texts(
    texts: Iterable[str],
    metadatas: list[dict] | None = None,
    *,
    ids: list[str] | None = None,
    **kwargs: Any
) -> list[str]

Async run more texts through the embeddings and add to the vectorstore.

PARAMETER	DESCRIPTION
`texts` ¶	Iterable of strings to add to the vectorstore. TYPE: `Iterable[str]`
`metadatas` ¶	Optional list of metadatas associated with the texts. Default is None. TYPE: `list[dict] \| None` DEFAULT: `None`
`ids` ¶	Optional list of IDs to associate with the texts. TYPE: `list[str] \| None` DEFAULT: `None`
`**kwargs` ¶	vectorstore specific parameters. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[str]`	List of ids from adding the texts into the vectorstore.

RAISES	DESCRIPTION
`ValueError`	If the number of metadatas does not match the number of texts.
`ValueError`	If the number of ids does not match the number of texts.
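To illustrate how texts, metadatas, and IDs line up, here is a minimal sketch of the pairing step. It is not the library's implementation: `pair_texts` is a hypothetical helper, plain dicts stand in for `Document` objects, and only the metadata length check is modeled.

```python
from itertools import cycle
from typing import Iterable, Optional


def pair_texts(
    texts: Iterable[str],
    metadatas: Optional[list[dict]] = None,
    ids: Optional[list[str]] = None,
) -> list[dict]:
    """Pair each text with its metadata and ID, filling in defaults."""
    texts_ = list(texts)
    if metadatas and len(metadatas) != len(texts_):
        raise ValueError(
            f"Got {len(metadatas)} metadatas and {len(texts_)} texts."
        )
    # Missing metadatas become empty dicts; missing IDs become None.
    metadatas_ = iter(metadatas) if metadatas else cycle([{}])
    ids_ = iter(ids) if ids else cycle([None])
    return [
        {"id": id_, "page_content": text, "metadata": dict(metadata)}
        for text, metadata, id_ in zip(texts_, metadatas_, ids_)
    ]


print(pair_texts(["foo", "thud"], ids=["1", "2"]))
```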

Source code in .venv/lib/python3.13/site-packages/langchain_core/vectorstores/base.py

async def aadd_texts(
    self,
    texts: Iterable[str],
    metadatas: list[dict] | None = None,
    *,
    ids: list[str] | None = None,
    **kwargs: Any,
) -> list[str]:
    """Async run more texts through the embeddings and add to the vectorstore.

    Args:
        texts: Iterable of strings to add to the vectorstore.
        metadatas: Optional list of metadatas associated with the texts.
            Default is None.
        ids: Optional list
        **kwargs: vectorstore specific parameters.

    Returns:
        List of ids from adding the texts into the vectorstore.

    Raises:
        ValueError: If the number of metadatas does not match the number of texts.
        ValueError: If the number of ids does not match the number of texts.
    """
    if ids is not None:
        # For backward compatibility
        kwargs["ids"] = ids
    if type(self).aadd_documents != VectorStore.aadd_documents:
        # This condition is triggered if the subclass has provided
        # an implementation of the upsert method.
        # The existing add_texts
        texts_: Sequence[str] = (
            texts if isinstance(texts, (list, tuple)) else list(texts)
        )
        if metadatas and len(metadatas) != len(texts_):
            msg = (
                "The number of metadatas must match the number of texts."
                f"Got {len(metadatas)} metadatas and {len(texts_)} texts."
            )
            raise ValueError(msg)
        metadatas_ = iter(metadatas) if metadatas else cycle([{}])
        ids_: Iterator[str | None] = iter(ids) if ids else cycle([None])

        docs = [
            Document(id=id_, page_content=text, metadata=metadata_)
            for text, metadata_, id_ in zip(texts, metadatas_, ids_, strict=False)
        ]
        return await self.aadd_documents(docs, **kwargs)
    return await run_in_executor(None, self.add_texts, texts, metadatas, **kwargs)

add_documents ¶

add_documents(
    documents: list[Document], **kwargs: Any
) -> list[str]

Add or update documents in the vectorstore.

PARAMETER	DESCRIPTION
`documents` ¶	Documents to add to the vectorstore. TYPE: `list[Document]`
`**kwargs` ¶	Additional keyword arguments. If `kwargs` contains `ids` and the documents also carry IDs, the IDs in `kwargs` take precedence. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[str]`	List of IDs of the added texts.
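The ID precedence rule can be sketched as a small standalone function. This is an illustration under assumptions, not the library's code: `resolve_ids` is a hypothetical name, and `doc_ids` stands for the `id` attribute of each document.

```python
from typing import Any, Optional


def resolve_ids(doc_ids: list[Optional[str]], **kwargs: Any) -> Optional[list]:
    """Decide which IDs to use when adding documents."""
    # Explicit ids passed as a keyword argument always win.
    if "ids" in kwargs:
        return kwargs["ids"]
    # Otherwise use the documents' own IDs if at least one is set.
    if any(doc_ids):
        return doc_ids
    # No IDs anywhere: leave ID generation to the store.
    return None


print(resolve_ids(["a", None], ids=["x", "y"]))
print(resolve_ids(["a", None]))
print(resolve_ids([None, None]))
```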

Source code in .venv/lib/python3.13/site-packages/langchain_core/vectorstores/base.py

def add_documents(self, documents: list[Document], **kwargs: Any) -> list[str]:
    """Add or update documents in the vectorstore.

    Args:
        documents: Documents to add to the vectorstore.
        **kwargs: Additional keyword arguments.
            if kwargs contains ids and documents contain ids,
            the ids in the kwargs will receive precedence.

    Returns:
        List of IDs of the added texts.
    """
    if type(self).add_texts != VectorStore.add_texts:
        if "ids" not in kwargs:
            ids = [doc.id for doc in documents]

            # If there's at least one valid ID, we'll assume that IDs
            # should be used.
            if any(ids):
                kwargs["ids"] = ids

        texts = [doc.page_content for doc in documents]
        metadatas = [doc.metadata for doc in documents]
        return self.add_texts(texts, metadatas, **kwargs)
    msg = (
        f"`add_documents` and `add_texts` has not been implemented "
        f"for {self.__class__.__name__} "
    )
    raise NotImplementedError(msg)

aadd_documents `async` ¶

aadd_documents(
    documents: list[Document], **kwargs: Any
) -> list[str]

Async run more documents through the embeddings and add to the vectorstore.

PARAMETER	DESCRIPTION
`documents` ¶	Documents to add to the vectorstore. TYPE: `list[Document]`
`**kwargs` ¶	Additional keyword arguments. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[str]`	List of IDs of the added texts.

Source code in .venv/lib/python3.13/site-packages/langchain_core/vectorstores/base.py

async def aadd_documents(
    self, documents: list[Document], **kwargs: Any
) -> list[str]:
    """Async run more documents through the embeddings and add to the vectorstore.

    Args:
        documents: Documents to add to the vectorstore.
        **kwargs: Additional keyword arguments.

    Returns:
        List of IDs of the added texts.
    """
    # If the async method has been overridden, we'll use that.
    if type(self).aadd_texts != VectorStore.aadd_texts:
        if "ids" not in kwargs:
            ids = [doc.id for doc in documents]

            # If there's at least one valid ID, we'll assume that IDs
            # should be used.
            if any(ids):
                kwargs["ids"] = ids

        texts = [doc.page_content for doc in documents]
        metadatas = [doc.metadata for doc in documents]
        return await self.aadd_texts(texts, metadatas, **kwargs)

    return await run_in_executor(None, self.add_documents, documents, **kwargs)

search ¶

search(
    query: str, search_type: str, **kwargs: Any
) -> list[Document]

Return docs most similar to query using a specified search type.

PARAMETER	DESCRIPTION
`query` ¶	Input text. TYPE: `str`
`search_type` ¶	Type of search to perform. Can be "similarity", "mmr", or "similarity_score_threshold". TYPE: `str`
`**kwargs` ¶	Arguments to pass to the search method. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[Document]`	List of Documents most similar to the query.

RAISES	DESCRIPTION
`ValueError`	If search_type is not one of "similarity", "mmr", or "similarity_score_threshold".
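The three accepted `search_type` values each route to a different search method. A minimal sketch of that dispatch, using a hypothetical `ToyStore` whose methods only report which path ran (no embeddings or Chroma client involved):

```python
from typing import Any


class ToyStore:
    """Hypothetical store; each method only reports which path ran."""

    def similarity_search(self, query: str, **kwargs: Any) -> list[str]:
        return [f"similarity:{query}"]

    def similarity_search_with_relevance_scores(
        self, query: str, **kwargs: Any
    ) -> list[tuple[str, float]]:
        return [(f"threshold:{query}", 0.9)]

    def max_marginal_relevance_search(self, query: str, **kwargs: Any) -> list[str]:
        return [f"mmr:{query}"]

    def search(self, query: str, search_type: str, **kwargs: Any) -> list[str]:
        if search_type == "similarity":
            return self.similarity_search(query, **kwargs)
        if search_type == "similarity_score_threshold":
            scored = self.similarity_search_with_relevance_scores(query, **kwargs)
            # Scores are dropped; only the documents are returned.
            return [doc for doc, _ in scored]
        if search_type == "mmr":
            return self.max_marginal_relevance_search(query, **kwargs)
        raise ValueError(f"search_type of {search_type} not allowed.")


store = ToyStore()
print(store.search("thud", "mmr"))
```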

Source code in .venv/lib/python3.13/site-packages/langchain_core/vectorstores/base.py

def search(self, query: str, search_type: str, **kwargs: Any) -> list[Document]:
    """Return docs most similar to query using a specified search type.

    Args:
        query: Input text
        search_type: Type of search to perform. Can be "similarity",
            "mmr", or "similarity_score_threshold".
        **kwargs: Arguments to pass to the search method.

    Returns:
        List of Documents most similar to the query.

    Raises:
        ValueError: If search_type is not one of "similarity",
            "mmr", or "similarity_score_threshold".
    """
    if search_type == "similarity":
        return self.similarity_search(query, **kwargs)
    if search_type == "similarity_score_threshold":
        docs_and_similarities = self.similarity_search_with_relevance_scores(
            query, **kwargs
        )
        return [doc for doc, _ in docs_and_similarities]
    if search_type == "mmr":
        return self.max_marginal_relevance_search(query, **kwargs)
    msg = (
        f"search_type of {search_type} not allowed. Expected "
        "search_type to be 'similarity', 'similarity_score_threshold'"
        " or 'mmr'."
    )
    raise ValueError(msg)

asearch `async` ¶

asearch(
    query: str, search_type: str, **kwargs: Any
) -> list[Document]

Async return docs most similar to query using a specified search type.

PARAMETER	DESCRIPTION
`query` ¶	Input text. TYPE: `str`
`search_type` ¶	Type of search to perform. Can be "similarity", "mmr", or "similarity_score_threshold". TYPE: `str`
`**kwargs` ¶	Arguments to pass to the search method. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[Document]`	List of Documents most similar to the query.

RAISES	DESCRIPTION
`ValueError`	If search_type is not one of "similarity", "mmr", or "similarity_score_threshold".

Source code in .venv/lib/python3.13/site-packages/langchain_core/vectorstores/base.py

async def asearch(
    self, query: str, search_type: str, **kwargs: Any
) -> list[Document]:
    """Async return docs most similar to query using a specified search type.

    Args:
        query: Input text.
        search_type: Type of search to perform. Can be "similarity",
            "mmr", or "similarity_score_threshold".
        **kwargs: Arguments to pass to the search method.

    Returns:
        List of Documents most similar to the query.

    Raises:
        ValueError: If search_type is not one of "similarity",
            "mmr", or "similarity_score_threshold".
    """
    if search_type == "similarity":
        return await self.asimilarity_search(query, **kwargs)
    if search_type == "similarity_score_threshold":
        docs_and_similarities = await self.asimilarity_search_with_relevance_scores(
            query, **kwargs
        )
        return [doc for doc, _ in docs_and_similarities]
    if search_type == "mmr":
        return await self.amax_marginal_relevance_search(query, **kwargs)
    msg = (
        f"search_type of {search_type} not allowed. Expected "
        "search_type to be 'similarity', 'similarity_score_threshold' or 'mmr'."
    )
    raise ValueError(msg)

asimilarity_search_with_score `async` ¶

asimilarity_search_with_score(
    *args: Any, **kwargs: Any
) -> list[tuple[Document, float]]

Async run similarity search with distance.

PARAMETER	DESCRIPTION
`*args` ¶	Arguments to pass to the search method. TYPE: `Any` DEFAULT: `()`
`**kwargs` ¶	Arguments to pass to the search method. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[tuple[Document, float]]`	List of Tuples of (doc, similarity_score).

Source code in .venv/lib/python3.13/site-packages/langchain_core/vectorstores/base.py

async def asimilarity_search_with_score(
    self, *args: Any, **kwargs: Any
) -> list[tuple[Document, float]]:
    """Async run similarity search with distance.

    Args:
        *args: Arguments to pass to the search method.
        **kwargs: Arguments to pass to the search method.

    Returns:
        List of Tuples of (doc, similarity_score).
    """
    # This is a temporary workaround to make the similarity search
    # asynchronous. The proper solution is to make the similarity search
    # asynchronous in the vector store implementations.
    return await run_in_executor(
        None, self.similarity_search_with_score, *args, **kwargs
    )
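Several async methods on this page wrap their synchronous counterpart in an executor rather than performing native async I/O. The sketch below shows the same idea using only the standard library's `loop.run_in_executor`, which is an analogue of LangChain's own `run_in_executor` helper and not the helper itself. `slow_similarity_search` is a hypothetical blocking function standing in for a synchronous store call.

```python
import asyncio
import functools
import time


def slow_similarity_search(query: str, k: int = 4) -> list[str]:
    """Hypothetical blocking search standing in for a synchronous store call."""
    time.sleep(0.05)
    return [f"{query}-{i}" for i in range(k)]


async def async_similarity_search(query: str, k: int = 4) -> list[str]:
    loop = asyncio.get_running_loop()
    # loop.run_in_executor takes positional args only, so bind kwargs first.
    call = functools.partial(slow_similarity_search, query, k=k)
    # None selects the loop's default thread pool executor.
    return await loop.run_in_executor(None, call)


results = asyncio.run(async_similarity_search("thud", k=2))
print(results)
```

The blocking call still occupies a worker thread, so this pattern keeps the event loop responsive but does not make the underlying search any faster.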

similarity_search_with_relevance_scores ¶

similarity_search_with_relevance_scores(
    query: str, k: int = 4, **kwargs: Any
) -> list[tuple[Document, float]]

Return docs and relevance scores in the range [0, 1].

0 is dissimilar, 1 is most similar.

PARAMETER	DESCRIPTION
`query` ¶	Input text. TYPE: `str`
`k` ¶	Number of Documents to return. Defaults to 4. TYPE: `int` DEFAULT: `4`
`**kwargs` ¶	kwargs to be passed to similarity search. May include `score_threshold`: an optional floating point value between 0 and 1 used to filter the resulting set of retrieved docs. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[tuple[Document, float]]`	List of Tuples of (doc, similarity_score).

Source code in .venv/lib/python3.13/site-packages/langchain_core/vectorstores/base.py

def similarity_search_with_relevance_scores(
    self,
    query: str,
    k: int = 4,
    **kwargs: Any,
) -> list[tuple[Document, float]]:
    """Return docs and relevance scores in the range [0, 1].

    0 is dissimilar, 1 is most similar.

    Args:
        query: Input text.
        k: Number of Documents to return. Defaults to 4.
        **kwargs: kwargs to be passed to similarity search. Should include:
            score_threshold: Optional, a floating point value between 0 to 1 to
                filter the resulting set of retrieved docs.

    Returns:
        List of Tuples of (doc, similarity_score).
    """
    score_threshold = kwargs.pop("score_threshold", None)

    docs_and_similarities = self._similarity_search_with_relevance_scores(
        query, k=k, **kwargs
    )
    if any(
        similarity < 0.0 or similarity > 1.0
        for _, similarity in docs_and_similarities
    ):
        warnings.warn(
            "Relevance scores must be between"
            f" 0 and 1, got {docs_and_similarities}",
            stacklevel=2,
        )

    if score_threshold is not None:
        docs_and_similarities = [
            (doc, similarity)
            for doc, similarity in docs_and_similarities
            if similarity >= score_threshold
        ]
        if len(docs_and_similarities) == 0:
            logger.warning(
                "No relevant docs were retrieved using the "
                "relevance score threshold %s",
                score_threshold,
            )
    return docs_and_similarities
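The `score_threshold` filtering can be tried in isolation. A minimal sketch follows, assuming plain strings in place of `Document` objects and made-up scores. `filter_by_threshold` is a hypothetical helper name.

```python
from typing import Optional


def filter_by_threshold(
    docs_and_scores: list[tuple[str, float]],
    score_threshold: Optional[float] = None,
) -> list[tuple[str, float]]:
    """Keep only results whose relevance score meets the threshold."""
    if score_threshold is None:
        # No threshold given: return everything unchanged.
        return list(docs_and_scores)
    return [
        (doc, score) for doc, score in docs_and_scores if score >= score_threshold
    ]


scored = [("foo", 0.91), ("thud", 0.55), ("qux", 0.12)]
print(filter_by_threshold(scored, score_threshold=0.5))
```

A threshold that is too strict yields an empty list rather than an error, which is why the library logs a warning in that case.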

asimilarity_search_with_relevance_scores `async` ¶

asimilarity_search_with_relevance_scores(
    query: str, k: int = 4, **kwargs: Any
) -> list[tuple[Document, float]]

Async return docs and relevance scores in the range [0, 1].

0 is dissimilar, 1 is most similar.

PARAMETER	DESCRIPTION
`query` ¶	Input text. TYPE: `str`
`k` ¶	Number of Documents to return. Defaults to 4. TYPE: `int` DEFAULT: `4`
`**kwargs` ¶	kwargs to be passed to similarity search. May include `score_threshold`: an optional floating point value between 0 and 1 used to filter the resulting set of retrieved docs. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[tuple[Document, float]]`	List of Tuples of (doc, similarity_score)
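Relevance scores are derived from raw distances by a conversion function, and the constructor's `relevance_score_fn` parameter accepts a custom `Callable[[float], float]` for this purpose. As a rough illustration, assuming the collection returns cosine distances in the range [0, 2] (an assumption, since the actual range depends on the collection's distance metric), a custom function might look like the sketch below. The name `clamped_cosine_relevance` is hypothetical.

```python
def clamped_cosine_relevance(distance: float) -> float:
    """Map a cosine distance (assumed range [0, 2]) to a score in [0, 1]."""
    score = 1.0 - distance / 2.0
    # Clamp so out-of-range distances never leave the [0, 1] interval.
    return max(0.0, min(1.0, score))


print(clamped_cosine_relevance(0.0))  # identical direction
print(clamped_cosine_relevance(2.0))  # opposite direction
```

Such a callable could then be supplied as `relevance_score_fn=clamped_cosine_relevance` when constructing the vector store.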

Source code in .venv/lib/python3.13/site-packages/langchain_core/vectorstores/base.py

async def asimilarity_search_with_relevance_scores(
    self,
    query: str,
    k: int = 4,
    **kwargs: Any,
) -> list[tuple[Document, float]]:
    """Async return docs and relevance scores in the range [0, 1].

    0 is dissimilar, 1 is most similar.

    Args:
        query: Input text.
        k: Number of Documents to return. Defaults to 4.
        **kwargs: kwargs to be passed to similarity search. Should include:
            score_threshold: Optional, a floating point value between 0 to 1 to
                filter the resulting set of retrieved docs

    Returns:
        List of Tuples of (doc, similarity_score)
    """
    score_threshold = kwargs.pop("score_threshold", None)

    docs_and_similarities = await self._asimilarity_search_with_relevance_scores(
        query, k=k, **kwargs
    )
    if any(
        similarity < 0.0 or similarity > 1.0
        for _, similarity in docs_and_similarities
    ):
        warnings.warn(
            "Relevance scores must be between"
            f" 0 and 1, got {docs_and_similarities}",
            stacklevel=2,
        )

    if score_threshold is not None:
        docs_and_similarities = [
            (doc, similarity)
            for doc, similarity in docs_and_similarities
            if similarity >= score_threshold
        ]
        if len(docs_and_similarities) == 0:
            logger.warning(
                "No relevant docs were retrieved using the "
                "relevance score threshold %s",
                score_threshold,
            )
    return docs_and_similarities

asimilarity_search `async` ¶

asimilarity_search(
    query: str, k: int = 4, **kwargs: Any
) -> list[Document]

Async return docs most similar to query.

PARAMETER	DESCRIPTION
`query` ¶	Input text. TYPE: `str`
`k` ¶	Number of Documents to return. Defaults to 4. TYPE: `int` DEFAULT: `4`
`**kwargs` ¶	Arguments to pass to the search method. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[Document]`	List of Documents most similar to the query.

Source code in .venv/lib/python3.13/site-packages/langchain_core/vectorstores/base.py

async def asimilarity_search(
    self, query: str, k: int = 4, **kwargs: Any
) -> list[Document]:
    """Async return docs most similar to query.

    Args:
        query: Input text.
        k: Number of Documents to return. Defaults to 4.
        **kwargs: Arguments to pass to the search method.

    Returns:
        List of Documents most similar to the query.
    """
    # This is a temporary workaround to make the similarity search
    # asynchronous. The proper solution is to make the similarity search
    # asynchronous in the vector store implementations.
    return await run_in_executor(None, self.similarity_search, query, k=k, **kwargs)

asimilarity_search_by_vector `async` ¶

asimilarity_search_by_vector(
    embedding: list[float], k: int = 4, **kwargs: Any
) -> list[Document]

Async return docs most similar to embedding vector.

PARAMETER	DESCRIPTION
`embedding` ¶	Embedding to look up documents similar to. TYPE: `list[float]`
`k` ¶	Number of Documents to return. Defaults to 4. TYPE: `int` DEFAULT: `4`
`**kwargs` ¶	Arguments to pass to the search method. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[Document]`	List of Documents most similar to the query vector.

Source code in .venv/lib/python3.13/site-packages/langchain_core/vectorstores/base.py

async def asimilarity_search_by_vector(
    self, embedding: list[float], k: int = 4, **kwargs: Any
) -> list[Document]:
    """Async return docs most similar to embedding vector.

    Args:
        embedding: Embedding to look up documents similar to.
        k: Number of Documents to return. Defaults to 4.
        **kwargs: Arguments to pass to the search method.

    Returns:
        List of Documents most similar to the query vector.
    """
    # This is a temporary workaround to make the similarity search
    # asynchronous. The proper solution is to make the similarity search
    # asynchronous in the vector store implementations.
    return await run_in_executor(
        None, self.similarity_search_by_vector, embedding, k=k, **kwargs
    )

amax_marginal_relevance_search `async` ¶

amax_marginal_relevance_search(
    query: str,
    k: int = 4,
    fetch_k: int = 20,
    lambda_mult: float = 0.5,
    **kwargs: Any
) -> list[Document]

Async return docs selected using the maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

PARAMETER	DESCRIPTION
`query` ¶	Text to look up documents similar to. TYPE: `str`
`k` ¶	Number of Documents to return. Defaults to 4. TYPE: `int` DEFAULT: `4`
`fetch_k` ¶	Number of Documents to fetch to pass to MMR algorithm. Default is 20. TYPE: `int` DEFAULT: `20`
`lambda_mult` ¶	Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5. TYPE: `float` DEFAULT: `0.5`
`**kwargs` ¶	Arguments to pass to the search method. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[Document]`	List of Documents selected by maximal marginal relevance.

Source code in .venv/lib/python3.13/site-packages/langchain_core/vectorstores/base.py

async def amax_marginal_relevance_search(  self,  query: str,  k: int = 4,  fetch_k: int = 20,  lambda_mult: float = 0.5,  **kwargs: Any, ) -> list[Document]:  """Async return docs selected using the maximal marginal relevance.   Maximal marginal relevance optimizes for similarity to query AND diversity  among selected documents.   Args:  query: Text to look up documents similar to.  k: Number of Documents to return. Defaults to 4.  fetch_k: Number of Documents to fetch to pass to MMR algorithm.  Default is 20.  lambda_mult: Number between 0 and 1 that determines the degree  of diversity among the results with 0 corresponding  to maximum diversity and 1 to minimum diversity.  Defaults to 0.5.  **kwargs: Arguments to pass to the search method.   Returns:  List of Documents selected by maximal marginal relevance.  """  # This is a temporary workaround to make the similarity search  # asynchronous. The proper solution is to make the similarity search  # asynchronous in the vector store implementations.  return await run_in_executor(  None,  self.max_marginal_relevance_search,  query,  k=k,  fetch_k=fetch_k,  lambda_mult=lambda_mult,  **kwargs,  ) 
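To make the roles of `k`, `fetch_k`, and `lambda_mult` concrete, here is a minimal pure-Python sketch of the usual greedy formulation of maximal marginal relevance. It is an illustration under stated assumptions, not the library's own helper: `cosine` and `mmr_select` are hypothetical names, and `candidates` stands in for the `fetch_k` nearest neighbours that were fetched before re-ranking.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(
    query_vec: list[float],
    candidates: list[list[float]],
    k: int = 4,
    lambda_mult: float = 0.5,
) -> list[int]:
    """Greedily pick up to k candidate indices, trading relevance for diversity."""
    query_sims = [cosine(query_vec, c) for c in candidates]
    selected: list[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best_idx, best_score = remaining[0], -math.inf
        for i in remaining:
            # Redundancy: similarity to the closest already-selected candidate.
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * query_sims[i] - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


query = [1.0, 0.0]
candidates = [[1.0, 0.1], [1.0, 0.12], [0.5, -0.5]]  # first two are near-duplicates

relevant_only = mmr_select(query, candidates, k=2, lambda_mult=1.0)
balanced = mmr_select(query, candidates, k=2, lambda_mult=0.5)
```

With `lambda_mult=1.0` the selection reduces to plain similarity ranking, while lower values penalise candidates that resemble ones already chosen.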

amax_marginal_relevance_search_by_vector `async` ¶

amax_marginal_relevance_search_by_vector(  embedding: list[float],  k: int = 4,  fetch_k: int = 20,  lambda_mult: float = 0.5,  **kwargs: Any ) -> list[Document]

Async return docs selected using the maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

PARAMETER	DESCRIPTION
`embedding` ¶	Embedding to look up documents similar to. TYPE: `list[float]`
`k` ¶	Number of Documents to return. Defaults to 4. TYPE: `int` DEFAULT: `4`
`fetch_k` ¶	Number of Documents to fetch to pass to MMR algorithm. Default is 20. TYPE: `int` DEFAULT: `20`
`lambda_mult` ¶	Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5. TYPE: `float` DEFAULT: `0.5`
`**kwargs` ¶	Arguments to pass to the search method. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[Document]`	List of Documents selected by maximal marginal relevance.

Source code in .venv/lib/python3.13/site-packages/langchain_core/vectorstores/base.py

async def amax_marginal_relevance_search_by_vector(  self,  embedding: list[float],  k: int = 4,  fetch_k: int = 20,  lambda_mult: float = 0.5,  **kwargs: Any, ) -> list[Document]:  """Async return docs selected using the maximal marginal relevance.   Maximal marginal relevance optimizes for similarity to query AND diversity  among selected documents.   Args:  embedding: Embedding to look up documents similar to.  k: Number of Documents to return. Defaults to 4.  fetch_k: Number of Documents to fetch to pass to MMR algorithm.  Default is 20.  lambda_mult: Number between 0 and 1 that determines the degree  of diversity among the results with 0 corresponding  to maximum diversity and 1 to minimum diversity.  Defaults to 0.5.  **kwargs: Arguments to pass to the search method.   Returns:  List of Documents selected by maximal marginal relevance.  """  return await run_in_executor(  None,  self.max_marginal_relevance_search_by_vector,  embedding,  k=k,  fetch_k=fetch_k,  lambda_mult=lambda_mult,  **kwargs,  ) 

afrom_documents `async` `classmethod` ¶

afrom_documents(  documents: list[Document],  embedding: Embeddings,  **kwargs: Any ) -> Self

Async return VectorStore initialized from documents and embeddings.

PARAMETER	DESCRIPTION
`documents` ¶	List of Documents to add to the vectorstore. TYPE: `list[Document]`
`embedding` ¶	Embedding function to use. TYPE: `Embeddings`
`**kwargs` ¶	Additional keyword arguments. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`Self`	VectorStore initialized from documents and embeddings.

Source code in .venv/lib/python3.13/site-packages/langchain_core/vectorstores/base.py

@classmethod async def afrom_documents(  cls,  documents: list[Document],  embedding: Embeddings,  **kwargs: Any, ) -> Self:  """Async return VectorStore initialized from documents and embeddings.   Args:  documents: List of Documents to add to the vectorstore.  embedding: Embedding function to use.  **kwargs: Additional keyword arguments.   Returns:  VectorStore initialized from documents and embeddings.  """  texts = [d.page_content for d in documents]  metadatas = [d.metadata for d in documents]   if "ids" not in kwargs:  ids = [doc.id for doc in documents]   # If there's at least one valid ID, we'll assume that IDs  # should be used.  if any(ids):  kwargs["ids"] = ids   return await cls.afrom_texts(texts, embedding, metadatas=metadatas, **kwargs) 

afrom_texts `async` `classmethod` ¶

afrom_texts(  texts: list[str],  embedding: Embeddings,  metadatas: list[dict] | None = None,  *,  ids: list[str] | None = None,  **kwargs: Any ) -> Self

Async return VectorStore initialized from texts and embeddings.

PARAMETER	DESCRIPTION
`texts` ¶	Texts to add to the vectorstore. TYPE: `list[str]`
`embedding` ¶	Embedding function to use. TYPE: `Embeddings`
`metadatas` ¶	Optional list of metadatas associated with the texts. Default is None. TYPE: `list[dict] \| None` DEFAULT: `None`
`ids` ¶	Optional list of IDs associated with the texts. TYPE: `list[str] \| None` DEFAULT: `None`
`**kwargs` ¶	Additional keyword arguments. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`Self`	VectorStore initialized from texts and embeddings.

Source code in .venv/lib/python3.13/site-packages/langchain_core/vectorstores/base.py

@classmethod async def afrom_texts(  cls,  texts: list[str],  embedding: Embeddings,  metadatas: list[dict] | None = None,  *,  ids: list[str] | None = None,  **kwargs: Any, ) -> Self:  """Async return VectorStore initialized from texts and embeddings.   Args:  texts: Texts to add to the vectorstore.  embedding: Embedding function to use.  metadatas: Optional list of metadatas associated with the texts.  Default is None.  ids: Optional list of IDs associated with the texts.  **kwargs: Additional keyword arguments.   Returns:  VectorStore initialized from texts and embeddings.  """  if ids is not None:  kwargs["ids"] = ids  return await run_in_executor(  None, cls.from_texts, texts, embedding, metadatas, **kwargs  ) 

as_retriever ¶

as_retriever(**kwargs: Any) -> VectorStoreRetriever

Return VectorStoreRetriever initialized from this VectorStore.

PARAMETER	DESCRIPTION
`**kwargs` ¶	Keyword arguments to pass to the search function. TYPE: `Any` DEFAULT: `{}`

The keyword arguments can include:

search_type: Defines the type of search that the Retriever should perform. Can be "similarity" (default), "mmr", or "similarity_score_threshold".
search_kwargs: Keyword arguments to pass to the search function. Can include things like:
    k: Number of documents to return (Default: 4)
    score_threshold: Minimum relevance threshold for similarity_score_threshold
    fetch_k: Number of documents to pass to the MMR algorithm (Default: 20)
    lambda_mult: Diversity of results returned by MMR; 1 for minimum diversity and 0 for maximum. (Default: 0.5)
    filter: Filter by document metadata

RETURNS	DESCRIPTION
`VectorStoreRetriever`	Retriever class for VectorStore.

Examples:

# Retrieve more documents with higher diversity
# Useful if your dataset has many similar documents
docsearch.as_retriever(
    search_type="mmr", search_kwargs={"k": 6, "lambda_mult": 0.25}
)

# Fetch more documents for the MMR algorithm to consider
# But only return the top 5
docsearch.as_retriever(search_type="mmr", search_kwargs={"k": 5, "fetch_k": 50})

# Only retrieve documents that have a relevance score
# Above a certain threshold
docsearch.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"score_threshold": 0.8},
)

# Only get the single most similar document from the dataset
docsearch.as_retriever(search_kwargs={"k": 1})

# Use a filter to only retrieve documents from a specific paper
docsearch.as_retriever(
    search_kwargs={"filter": {"paper_title": "GPT-4 Technical Report"}}
)

Source code in .venv/lib/python3.13/site-packages/langchain_core/vectorstores/base.py

def as_retriever(self, **kwargs: Any) -> VectorStoreRetriever:  """Return VectorStoreRetriever initialized from this VectorStore.   Args:  **kwargs: Keyword arguments to pass to the search function.  Can include:  search_type: Defines the type of search that the Retriever should  perform. Can be "similarity" (default), "mmr", or  "similarity_score_threshold".  search_kwargs: Keyword arguments to pass to the search function. Can  include things like:  k: Amount of documents to return (Default: 4)  score_threshold: Minimum relevance threshold  for similarity_score_threshold  fetch_k: Amount of documents to pass to MMR algorithm  (Default: 20)  lambda_mult: Diversity of results returned by MMR;  1 for minimum diversity and 0 for maximum. (Default: 0.5)  filter: Filter by document metadata   Returns:  Retriever class for VectorStore.   Examples:  ```python  # Retrieve more documents with higher diversity  # Useful if your dataset has many similar documents  docsearch.as_retriever(  search_type="mmr", search_kwargs={"k": 6, "lambda_mult": 0.25}  )   # Fetch more documents for the MMR algorithm to consider  # But only return the top 5  docsearch.as_retriever(search_type="mmr", search_kwargs={"k": 5, "fetch_k": 50})   # Only retrieve documents that have a relevance score  # Above a certain threshold  docsearch.as_retriever(  search_type="similarity_score_threshold",  search_kwargs={"score_threshold": 0.8},  )   # Only get the single most similar document from the dataset  docsearch.as_retriever(search_kwargs={"k": 1})   # Use a filter to only retrieve documents from a specific paper  docsearch.as_retriever(  search_kwargs={"filter": {"paper_title": "GPT-4 Technical Report"}}  )  ```  """  tags = kwargs.pop("tags", None) or [*self._get_retriever_tags()]  return VectorStoreRetriever(vectorstore=self, tags=tags, **kwargs) 

__ensure_collection ¶

__ensure_collection() -> None

Ensure that the collection exists or create it.

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py

def __ensure_collection(self) -> None:
    """Ensure that the collection exists or create it."""
    self._chroma_collection = self._client.get_or_create_collection(
        name=self._collection_name,
        embedding_function=None,
        metadata=self._collection_metadata,
        configuration=self._collection_configuration,
    )

__query_collection ¶

__query_collection(  query_texts: list[str] | None = None,  query_embeddings: list[list[float]] | None = None,  n_results: int = 4,  where: dict[str, str] | None = None,  where_document: dict[str, str] | None = None,  **kwargs: Any ) -> list[Document] | QueryResult

Query the chroma collection.

PARAMETER	DESCRIPTION
`query_texts` ¶	List of query texts. TYPE: `list[str] \| None` DEFAULT: `None`
`query_embeddings` ¶	List of query embeddings. TYPE: `list[list[float]] \| None` DEFAULT: `None`
`n_results` ¶	Number of results to return. Defaults to 4. TYPE: `int` DEFAULT: `4`
`where` ¶	dict used to filter results by metadata. E.g. {"color" : "red"}. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`where_document` ¶	dict used to filter by the document contents. E.g. {"$contains": "hello"}. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`kwargs` ¶	Additional keyword arguments to pass to Chroma collection query. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[Document] \| QueryResult`	List of `n_results` nearest neighbor embeddings for provided query_embeddings or query_texts.

See more: https://docs.trychroma.com/reference/py-collection#query

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py

@xor_args(("query_texts", "query_embeddings")) def __query_collection(  self,  query_texts: list[str] | None = None,  query_embeddings: list[list[float]] | None = None,  n_results: int = 4,  where: dict[str, str] | None = None,  where_document: dict[str, str] | None = None,  **kwargs: Any, ) -> list[Document] | chromadb.QueryResult:  """Query the chroma collection.   Args:  query_texts: List of query texts.  query_embeddings: List of query embeddings.  n_results: Number of results to return. Defaults to 4.  where: dict used to filter results by metadata.  E.g. {"color" : "red"}.  where_document: dict used to filter by the document contents.  E.g. {"$contains": "hello"}.  kwargs: Additional keyword arguments to pass to Chroma collection query.   Returns:  List of `n_results` nearest neighbor embeddings for provided  query_embeddings or query_texts.   See more: https://docs.trychroma.com/reference/py-collection#query  """  return self._collection.query(  query_texts=query_texts,  query_embeddings=query_embeddings, # type: ignore[arg-type]  n_results=n_results,  where=where, # type: ignore[arg-type]  where_document=where_document, # type: ignore[arg-type]  **kwargs,  ) 

encode_image `staticmethod` ¶

encode_image(uri: str) -> str

Get base64 string from image URI.

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py

@staticmethod
def encode_image(uri: str) -> str:
    """Get base64 string from image URI."""
    with Path(uri).open("rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")
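Since the source above opens the `uri` with `Path(uri).open("rb")`, the argument is effectively a local file path. The sketch below mirrors that encoding step with the standard library and checks that it round-trips; `encode_file` is a hypothetical helper and the bytes written are placeholder data, not a valid image.

```python
import base64
import tempfile
from pathlib import Path


def encode_file(path: str) -> str:
    """Read a local file and return its contents as a base64 string."""
    with Path(path).open("rb") as fh:
        return base64.b64encode(fh.read()).decode("utf-8")


raw = b"\x89PNG\r\n\x1a\n placeholder image bytes"
with tempfile.TemporaryDirectory() as tmp:
    image_path = Path(tmp) / "sample.png"
    image_path.write_bytes(raw)
    encoded = encode_file(str(image_path))

# Decoding recovers the original bytes exactly.
decoded = base64.b64decode(encoded)
```

The base64 string is what ends up stored as the document content for images, so consumers need to decode it before displaying or processing the image.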

fork ¶

fork(new_name: str) -> Chroma

Fork this vector store.

PARAMETER	DESCRIPTION
`new_name` ¶	New name for the forked store. TYPE: `str`

RETURNS	DESCRIPTION
`Chroma`	A new Chroma store forked from this vector store.

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py

def fork(self, new_name: str) -> Chroma:
    """Fork this vector store.

    Args:
        new_name: New name for the forked store.

    Returns:
        A new Chroma store forked from this vector store.

    """
    forked_collection = self._collection.fork(new_name=new_name)
    return Chroma(
        client=self._client,
        embedding_function=self._embedding_function,
        collection_name=forked_collection.name,
    )

add_images ¶

add_images(  uris: list[str],  metadatas: list[dict] | None = None,  ids: list[str] | None = None, ) -> list[str]

Run more images through the embeddings and add to the vectorstore.

PARAMETER	DESCRIPTION
`uris` ¶	File paths to the images. TYPE: `list[str]`
`metadatas` ¶	Optional list of metadatas. When querying, you can filter on this metadata. TYPE: `list[dict] \| None` DEFAULT: `None`
`ids` ¶	Optional list of IDs. (Items without IDs will be assigned UUIDs) TYPE: `list[str] \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`list[str]`	List of IDs of the added images.

RAISES	DESCRIPTION
`ValueError`	When metadata is incorrect.

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py

def add_images(  self,  uris: list[str],  metadatas: list[dict] | None = None,  ids: list[str] | None = None, ) -> list[str]:  """Run more images through the embeddings and add to the vectorstore.   Args:  uris: File path to the image.  metadatas: Optional list of metadatas.  When querying, you can filter on this metadata.  ids: Optional list of IDs. (Items without IDs will be assigned UUIDs)   Returns:  List of IDs of the added images.   Raises:  ValueError: When metadata is incorrect.  """  # Map from uris to b64 encoded strings  b64_texts = [self.encode_image(uri=uri) for uri in uris]  # Populate IDs  if ids is None:  ids = [str(uuid.uuid4()) for _ in uris]  else:  ids = [id_ if id_ is not None else str(uuid.uuid4()) for id_ in ids]  embeddings = None  # Set embeddings  if self._embedding_function is not None and hasattr(  self._embedding_function,  "embed_image",  ):  embeddings = self._embedding_function.embed_image(uris=uris)  if metadatas:  # fill metadatas with empty dicts if somebody  # did not specify metadata for all images  length_diff = len(uris) - len(metadatas)  if length_diff:  metadatas = metadatas + [{}] * length_diff  empty_ids = []  non_empty_ids = []  for idx, m in enumerate(metadatas):  if m:  non_empty_ids.append(idx)  else:  empty_ids.append(idx)  if non_empty_ids:  metadatas = [metadatas[idx] for idx in non_empty_ids]  images_with_metadatas = [b64_texts[idx] for idx in non_empty_ids]  embeddings_with_metadatas = (  [embeddings[idx] for idx in non_empty_ids] if embeddings else None  )  ids_with_metadata = [ids[idx] for idx in non_empty_ids]  try:  self._collection.upsert(  metadatas=metadatas, # type: ignore[arg-type]  embeddings=embeddings_with_metadatas, # type: ignore[arg-type]  documents=images_with_metadatas,  ids=ids_with_metadata,  )  except ValueError as e:  if "Expected metadata value to be" in str(e):  msg = (  "Try filtering complex metadata using "  "langchain_community.vectorstores.utils.filter_complex_metadata."  
)  raise ValueError(e.args[0] + "\n\n" + msg) from e  raise e  if empty_ids:  images_without_metadatas = [b64_texts[j] for j in empty_ids]  embeddings_without_metadatas = (  [embeddings[j] for j in empty_ids] if embeddings else None  )  ids_without_metadatas = [ids[j] for j in empty_ids]  self._collection.upsert(  embeddings=embeddings_without_metadatas,  documents=images_without_metadatas,  ids=ids_without_metadatas,  )  else:  self._collection.upsert(  embeddings=embeddings,  documents=b64_texts,  ids=ids,  )  return ids 

add_texts ¶

add_texts(  texts: Iterable[str],  metadatas: list[dict] | None = None,  ids: list[str] | None = None,  **kwargs: Any ) -> list[str]

Run more texts through the embeddings and add to the vectorstore.

PARAMETER	DESCRIPTION
`texts` ¶	Texts to add to the vectorstore. TYPE: `Iterable[str]`
`metadatas` ¶	Optional list of metadatas. When querying, you can filter on this metadata. TYPE: `list[dict] \| None` DEFAULT: `None`
`ids` ¶	Optional list of IDs. (Items without IDs will be assigned UUIDs) TYPE: `list[str] \| None` DEFAULT: `None`
`kwargs` ¶	Additional keyword arguments. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[str]`	List of IDs of the added texts.

RAISES	DESCRIPTION
`ValueError`	When metadata is incorrect.

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py

def add_texts(  self,  texts: Iterable[str],  metadatas: list[dict] | None = None,  ids: list[str] | None = None,  **kwargs: Any, ) -> list[str]:  """Run more texts through the embeddings and add to the vectorstore.   Args:  texts: Texts to add to the vectorstore.  metadatas: Optional list of metadatas.  When querying, you can filter on this metadata.  ids: Optional list of IDs. (Items without IDs will be assigned UUIDs)  kwargs: Additional keyword arguments.   Returns:  List of IDs of the added texts.   Raises:  ValueError: When metadata is incorrect.  """  if ids is None:  ids = [str(uuid.uuid4()) for _ in texts]  else:  ids = [id_ if id_ is not None else str(uuid.uuid4()) for id_ in ids]   embeddings = None  texts = list(texts)  if self._embedding_function is not None:  embeddings = self._embedding_function.embed_documents(texts)  if metadatas:  # fill metadatas with empty dicts if somebody  # did not specify metadata for all texts  length_diff = len(texts) - len(metadatas)  if length_diff:  metadatas = metadatas + [{}] * length_diff  empty_ids = []  non_empty_ids = []  for idx, m in enumerate(metadatas):  if m:  non_empty_ids.append(idx)  else:  empty_ids.append(idx)  if non_empty_ids:  metadatas = [metadatas[idx] for idx in non_empty_ids]  texts_with_metadatas = [texts[idx] for idx in non_empty_ids]  embeddings_with_metadatas = (  [embeddings[idx] for idx in non_empty_ids]  if embeddings is not None and len(embeddings) > 0  else None  )  ids_with_metadata = [ids[idx] for idx in non_empty_ids]  try:  self._collection.upsert(  metadatas=metadatas, # type: ignore[arg-type]  embeddings=embeddings_with_metadatas, # type: ignore[arg-type]  documents=texts_with_metadatas,  ids=ids_with_metadata,  )  except ValueError as e:  if "Expected metadata value to be" in str(e):  msg = (  "Try filtering complex metadata from the document using "  "langchain_community.vectorstores.utils.filter_complex_metadata."  
)  raise ValueError(e.args[0] + "\n\n" + msg) from e  raise e  if empty_ids:  texts_without_metadatas = [texts[j] for j in empty_ids]  embeddings_without_metadatas = (  [embeddings[j] for j in empty_ids] if embeddings else None  )  ids_without_metadatas = [ids[j] for j in empty_ids]  self._collection.upsert(  embeddings=embeddings_without_metadatas, # type: ignore[arg-type]  documents=texts_without_metadatas,  ids=ids_without_metadatas,  )  else:  self._collection.upsert(  embeddings=embeddings, # type: ignore[arg-type]  documents=texts,  ids=ids,  )  return ids 

similarity_search ¶

similarity_search(  query: str,  k: int = DEFAULT_K,  filter: dict[str, str] | None = None,  **kwargs: Any ) -> list[Document]

Run similarity search with Chroma.

PARAMETER	DESCRIPTION
`query` ¶	Query text to search for. TYPE: `str`
`k` ¶	Number of results to return. Defaults to 4. TYPE: `int` DEFAULT: `DEFAULT_K`
`filter` ¶	Filter by metadata. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`kwargs` ¶	Additional keyword arguments to pass to Chroma collection query. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[Document]`	List of documents most similar to the query text.

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py

def similarity_search(
    self,
    query: str,
    k: int = DEFAULT_K,
    filter: dict[str, str] | None = None,  # noqa: A002
    **kwargs: Any,
) -> list[Document]:
    """Run similarity search with Chroma.

    Args:
        query: Query text to search for.
        k: Number of results to return. Defaults to 4.
        filter: Filter by metadata.
        kwargs: Additional keyword arguments to pass to Chroma collection query.

    Returns:
        List of documents most similar to the query text.
    """
    docs_and_scores = self.similarity_search_with_score(
        query,
        k,
        filter=filter,
        **kwargs,
    )
    return [doc for doc, _ in docs_and_scores]

similarity_search_by_vector ¶

similarity_search_by_vector(  embedding: list[float],  k: int = DEFAULT_K,  filter: dict[str, str] | None = None,  where_document: dict[str, str] | None = None,  **kwargs: Any ) -> list[Document]

Return docs most similar to embedding vector.

PARAMETER	DESCRIPTION
`embedding` ¶	Embedding to look up documents similar to. TYPE: `list[float]`
`k` ¶	Number of Documents to return. Defaults to 4. TYPE: `int` DEFAULT: `DEFAULT_K`
`filter` ¶	Filter by metadata. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`where_document` ¶	dict used to filter by the document contents. E.g. {"$contains": "hello"}. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`kwargs` ¶	Additional keyword arguments to pass to Chroma collection query. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[Document]`	List of Documents most similar to the query vector.

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py

def similarity_search_by_vector(  self,  embedding: list[float],  k: int = DEFAULT_K,  filter: dict[str, str] | None = None, # noqa: A002  where_document: dict[str, str] | None = None,  **kwargs: Any, ) -> list[Document]:  """Return docs most similar to embedding vector.   Args:  embedding: Embedding to look up documents similar to.  k: Number of Documents to return. Defaults to 4.  filter: Filter by metadata.  where_document: dict used to filter by the document contents.  E.g. {"$contains": "hello"}.  kwargs: Additional keyword arguments to pass to Chroma collection query.   Returns:  List of Documents most similar to the query vector.  """  results = self.__query_collection(  query_embeddings=[embedding],  n_results=k,  where=filter,  where_document=where_document,  **kwargs,  )  return _results_to_docs(results) 

similarity_search_by_vector_with_relevance_scores ¶

similarity_search_by_vector_with_relevance_scores(  embedding: list[float],  k: int = DEFAULT_K,  filter: dict[str, str] | None = None,  where_document: dict[str, str] | None = None,  **kwargs: Any ) -> list[tuple[Document, float]]

Return docs most similar to embedding vector and similarity score.

PARAMETER	DESCRIPTION
`embedding` ¶	Embedding to look up documents similar to. TYPE: `list[float]`
`k` ¶	Number of Documents to return. Defaults to 4. TYPE: `int` DEFAULT: `DEFAULT_K`
`filter` ¶	Filter by metadata. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`where_document` ¶	dict used to filter by the document contents. E.g. {"$contains": "hello"}. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`kwargs` ¶	Additional keyword arguments to pass to Chroma collection query. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[tuple[Document, float]]`	List of documents most similar to the query embedding, each paired with its relevance score as a float. Lower score represents more similarity.

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py

def similarity_search_by_vector_with_relevance_scores(  self,  embedding: list[float],  k: int = DEFAULT_K,  filter: dict[str, str] | None = None, # noqa: A002  where_document: dict[str, str] | None = None,  **kwargs: Any, ) -> list[tuple[Document, float]]:  """Return docs most similar to embedding vector and similarity score.   Args:  embedding (List[float]): Embedding to look up documents similar to.  k: Number of Documents to return. Defaults to 4.  filter: Filter by metadata.  where_document: dict used to filter by the documents.  E.g. {"$contains": "hello"}.  kwargs: Additional keyword arguments to pass to Chroma collection query.   Returns:  List of documents most similar to the query text and relevance score  in float for each. Lower score represents more similarity.  """  results = self.__query_collection(  query_embeddings=[embedding],  n_results=k,  where=filter,  where_document=where_document,  **kwargs,  )  return _results_to_docs_and_scores(results) 

similarity_search_with_score ¶

similarity_search_with_score(  query: str,  k: int = DEFAULT_K,  filter: dict[str, str] | None = None,  where_document: dict[str, str] | None = None,  **kwargs: Any ) -> list[tuple[Document, float]]

Run similarity search with Chroma and return the distance for each result.

PARAMETER	DESCRIPTION
`query` ¶	Query text to search for. TYPE: `str`
`k` ¶	Number of results to return. Defaults to 4. TYPE: `int` DEFAULT: `DEFAULT_K`
`filter` ¶	Filter by metadata. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`where_document` ¶	dict used to filter by document contents. E.g. {"$contains": "hello"}. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`kwargs` ¶	Additional keyword arguments to pass to Chroma collection query. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[tuple[Document, float]]`	List of documents most similar to the query text and distance in float for each. Lower score represents more similarity.

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py

def similarity_search_with_score(  self,  query: str,  k: int = DEFAULT_K,  filter: dict[str, str] | None = None, # noqa: A002  where_document: dict[str, str] | None = None,  **kwargs: Any, ) -> list[tuple[Document, float]]:  """Run similarity search with Chroma with distance.   Args:  query: Query text to search for.  k: Number of results to return. Defaults to 4.  filter: Filter by metadata.  where_document: dict used to filter by document contents.  E.g. {"$contains": "hello"}.  kwargs: Additional keyword arguments to pass to Chroma collection query.   Returns:  List of documents most similar to the query text and  distance in float for each. Lower score represents more similarity.  """  if self._embedding_function is None:  results = self.__query_collection(  query_texts=[query],  n_results=k,  where=filter,  where_document=where_document,  **kwargs,  )  else:  query_embedding = self._embedding_function.embed_query(query)  results = self.__query_collection(  query_embeddings=[query_embedding],  n_results=k,  where=filter,  where_document=where_document,  **kwargs,  )   return _results_to_docs_and_scores(results) 
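The returned scores are distances, so smaller values mean closer matches. The sketch below shows how documents and distances line up, assuming the nested list-per-query layout of a Chroma query result with `documents`, `metadatas`, and `distances` keys. The dictionary is hand-written sample data, and `pair_docs_with_distances` is a hypothetical helper, not the library's `_results_to_docs_and_scores`.

```python
from typing import Any


def pair_docs_with_distances(
    results: dict[str, Any],
) -> list[tuple[str, dict, float]]:
    """Zip the first query's documents, metadatas, and distances together."""
    documents = results["documents"][0]
    metadatas = results["metadatas"][0]
    distances = results["distances"][0]
    return [
        (doc, meta or {}, dist)  # treat missing metadata as an empty dict
        for doc, meta, dist in zip(documents, metadatas, distances)
    ]


# Sample data for a single query; the outer list has one entry per query.
fake_results = {
    "documents": [["cats purr", "dogs bark"]],
    "metadatas": [[{"topic": "cats"}, None]],
    "distances": [[0.12, 0.87]],
}

pairs = pair_docs_with_distances(fake_results)
closest = min(pairs, key=lambda item: item[2])  # lowest distance is the best match
```

When thresholding on these values, remember the direction: keep results whose distance is below a cutoff, not above it.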

similarity_search_with_vectors ¶

similarity_search_with_vectors(  query: str,  k: int = DEFAULT_K,  filter: dict[str, str] | None = None,  where_document: dict[str, str] | None = None,  **kwargs: Any ) -> list[tuple[Document, ndarray]]

Run similarity search with Chroma and return the embedding vector for each result.

PARAMETER	DESCRIPTION
`query` ¶	Query text to search for. TYPE: `str`
`k` ¶	Number of results to return. Defaults to 4. TYPE: `int` DEFAULT: `DEFAULT_K`
`filter` ¶	Filter by metadata. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`where_document` ¶	dict used to filter by the document contents. E.g. {"$contains": "hello"}. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`kwargs` ¶	Additional keyword arguments to pass to Chroma collection query. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[tuple[Document, ndarray]]`	List of documents most similar to the query text and embedding vectors for each.

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py

def similarity_search_with_vectors(  self,  query: str,  k: int = DEFAULT_K,  filter: dict[str, str] | None = None, # noqa: A002  where_document: dict[str, str] | None = None,  **kwargs: Any, ) -> list[tuple[Document, np.ndarray]]:  """Run similarity search with Chroma with vectors.   Args:  query: Query text to search for.  k: Number of results to return. Defaults to 4.  filter: Filter by metadata.  where_document: dict used to filter by the document contents.  E.g. {"$contains": "hello"}.  kwargs: Additional keyword arguments to pass to Chroma collection query.   Returns:  List of documents most similar to the query text and  embedding vectors for each.  """  include = ["documents", "metadatas", "embeddings"]  if self._embedding_function is None:  results = self.__query_collection(  query_texts=[query],  n_results=k,  where=filter,  where_document=where_document,  include=include,  **kwargs,  )  else:  query_embedding = self._embedding_function.embed_query(query)  results = self.__query_collection(  query_embeddings=[query_embedding],  n_results=k,  where=filter,  where_document=where_document,  include=include,  **kwargs,  )   return _results_to_docs_and_vectors(results) 

similarity_search_by_image ¶

similarity_search_by_image(  uri: str,  k: int = DEFAULT_K,  filter: dict[str, str] | None = None,  **kwargs: Any ) -> list[Document]

Search for similar images based on the given image URI.

PARAMETER	DESCRIPTION
`uri` ¶	URI of the image to search for. TYPE: `str`
`k` ¶	Number of results to return. TYPE: `int` DEFAULT: `DEFAULT_K`
`filter` ¶	Filter by metadata. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`**kwargs` ¶	Additional arguments to pass to function. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[Document]`	List of images most similar to the provided image. Each element is a LangChain Document whose page content is the base64-encoded image and whose metadata is the default or as defined by the user.

RAISES	DESCRIPTION
`ValueError`	If the embedding function does not support image embeddings.

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py

def similarity_search_by_image(  self,  uri: str,  k: int = DEFAULT_K,  filter: dict[str, str] | None = None, # noqa: A002  **kwargs: Any, ) -> list[Document]:  """Search for similar images based on the given image URI.   Args:  uri: URI of the image to search for.  k: Number of results to return.  filter: Filter by metadata.  **kwargs: Additional arguments to pass to function.    Returns:  List of Images most similar to the provided image. Each element in list is a  LangChain Document Object. The page content is b64 encoded image, metadata  is default or as defined by user.   Raises:  ValueError: If the embedding function does not support image embeddings.  """  if self._embedding_function is not None and hasattr(  self._embedding_function, "embed_image"  ):  # Obtain image embedding  # Assuming embed_image returns a single embedding  image_embedding = self._embedding_function.embed_image(uris=[uri])   # Perform similarity search based on the obtained embedding  return self.similarity_search_by_vector(  embedding=image_embedding,  k=k,  filter=filter,  **kwargs,  )  msg = "The embedding function must support image embedding."  raise ValueError(msg) 
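
The `ValueError` documented for this method comes from a capability check: the embedding function must be set and must expose an `embed_image` attribute. The following is a minimal sketch of that check in isolation. The classes `TextOnlyEmbeddings` and `ImageCapableEmbeddings` and the helper `supports_image_search` are hypothetical stand-ins for illustration and are not part of `langchain-chroma`.

```python
from __future__ import annotations


class TextOnlyEmbeddings:
    """Hypothetical stand-in: an embedding class that handles text only."""

    def embed_query(self, text: str) -> list[float]:
        return [float(len(text))]


class ImageCapableEmbeddings(TextOnlyEmbeddings):
    """Hypothetical stand-in: additionally exposes embed_image."""

    def embed_image(self, uris: list[str]) -> list[list[float]]:
        # One toy vector per URI; a real model would load and encode the image.
        return [[float(len(uri))] for uri in uris]


def supports_image_search(embedding_function: object | None) -> bool:
    """Same condition as the guard in the listing: set, and has embed_image."""
    return embedding_function is not None and hasattr(
        embedding_function, "embed_image"
    )
```

Running a check like this before calling `similarity_search_by_image` lets calling code choose a text-based fallback instead of handling the exception.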

similarity_search_by_image_with_relevance_score ¶

similarity_search_by_image_with_relevance_score(  uri: str,  k: int = DEFAULT_K,  filter: dict[str, str] | None = None,  **kwargs: Any ) -> list[tuple[Document, float]]

Search for similar images based on the given image URI.

PARAMETER	DESCRIPTION
`uri` ¶	URI of the image to search for. TYPE: `str`
`k` ¶	Number of results to return. TYPE: `int` DEFAULT: `DEFAULT_K`
`filter` ¶	Filter by metadata. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`**kwargs` ¶	Additional arguments to pass to function. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[tuple[Document, float]]`	List of tuples containing documents similar to the query image and their similarity scores. The first element of each tuple is a LangChain `Document` whose page content is the base64-encoded image; metadata is the default or as defined by the user.

RAISES	DESCRIPTION
`ValueError`	If the embedding function does not support image embeddings.

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py

def similarity_search_by_image_with_relevance_score(  self,  uri: str,  k: int = DEFAULT_K,  filter: dict[str, str] | None = None, # noqa: A002  **kwargs: Any, ) -> list[tuple[Document, float]]:  """Search for similar images based on the given image URI.   Args:  uri: URI of the image to search for.  k: Number of results to return.  filter: Filter by metadata.  **kwargs: Additional arguments to pass to function.   Returns:  List of tuples containing documents similar to the query image and their  similarity scores. 0th element in each tuple is a LangChain Document Object.  The page content is b64 encoded img, metadata is default or defined by user.   Raises:  ValueError: If the embedding function does not support image embeddings.  """  if self._embedding_function is not None and hasattr(  self._embedding_function, "embed_image"  ):  # Obtain image embedding  # Assuming embed_image returns a single embedding  image_embedding = self._embedding_function.embed_image(uris=[uri])   # Perform similarity search based on the obtained embedding  return self.similarity_search_by_vector_with_relevance_scores(  embedding=image_embedding,  k=k,  filter=filter,  **kwargs,  )  msg = "The embedding function must support image embedding."  raise ValueError(msg) 

max_marginal_relevance_search_by_vector ¶

max_marginal_relevance_search_by_vector(  embedding: list[float],  k: int = DEFAULT_K,  fetch_k: int = 20,  lambda_mult: float = 0.5,  filter: dict[str, str] | None = None,  where_document: dict[str, str] | None = None,  **kwargs: Any ) -> list[Document]

Return docs selected using the maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

PARAMETER	DESCRIPTION
`embedding` ¶	Embedding to look up documents similar to. TYPE: `list[float]`
`k` ¶	Number of Documents to return. Defaults to 4. TYPE: `int` DEFAULT: `DEFAULT_K`
`fetch_k` ¶	Number of Documents to fetch to pass to MMR algorithm. Defaults to 20. TYPE: `int` DEFAULT: `20`
`lambda_mult` ¶	Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5. TYPE: `float` DEFAULT: `0.5`
`filter` ¶	Filter by metadata. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`where_document` ¶	dict used to filter by the document contents. E.g. {"$contains": "hello"}. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`kwargs` ¶	Additional keyword arguments to pass to Chroma collection query. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[Document]`	List of Documents selected by maximal marginal relevance.

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py

def max_marginal_relevance_search_by_vector(  self,  embedding: list[float],  k: int = DEFAULT_K,  fetch_k: int = 20,  lambda_mult: float = 0.5,  filter: dict[str, str] | None = None, # noqa: A002  where_document: dict[str, str] | None = None,  **kwargs: Any, ) -> list[Document]:  """Return docs selected using the maximal marginal relevance.   Maximal marginal relevance optimizes for similarity to query AND diversity  among selected documents.   Args:  embedding: Embedding to look up documents similar to.  k: Number of Documents to return. Defaults to 4.  fetch_k: Number of Documents to fetch to pass to MMR algorithm. Defaults to  20.  lambda_mult: Number between 0 and 1 that determines the degree  of diversity among the results with 0 corresponding  to maximum diversity and 1 to minimum diversity.  Defaults to 0.5.  filter: Filter by metadata.  where_document: dict used to filter by the document contents.  E.g. {"$contains": "hello"}.  kwargs: Additional keyword arguments to pass to Chroma collection query.   Returns:  List of Documents selected by maximal marginal relevance.  """  results = self.__query_collection(  query_embeddings=[embedding],  n_results=fetch_k,  where=filter,  where_document=where_document,  include=["metadatas", "documents", "distances", "embeddings"],  **kwargs,  )  mmr_selected = maximal_marginal_relevance(  np.array(embedding, dtype=np.float32),  results["embeddings"][0],  k=k,  lambda_mult=lambda_mult,  )   candidates = _results_to_docs(results)   return [r for i, r in enumerate(candidates) if i in mmr_selected] 
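
To make the effect of `lambda_mult` concrete, here is a minimal pure-Python sketch of greedy maximal marginal relevance over cosine similarity. It is an illustration of the general technique, not the library's `maximal_marginal_relevance` implementation; the names `cosine` and `mmr_select` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(query, candidates, k=4, lambda_mult=0.5):
    """Return indices of up to k candidates, chosen greedily by MMR."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in remaining:
            relevance = cosine(query, candidates[i])
            # Redundancy: highest similarity to anything already selected.
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


query = [1.0, 0.0]
candidates = [[1.0, 0.0], [0.99, 0.14], [0.0, 1.0]]
relevance_only = mmr_select(query, candidates, k=2, lambda_mult=1.0)  # [0, 1]
diverse = mmr_select(query, candidates, k=2, lambda_mult=0.1)  # [0, 2]
```

With `lambda_mult=1.0` the near-duplicate of the top hit is kept; with a low value it is skipped in favor of a dissimilar candidate. Note that the method in the listing returns the chosen documents in their original fetch order, not in selection order.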

max_marginal_relevance_search ¶

max_marginal_relevance_search(  query: str,  k: int = DEFAULT_K,  fetch_k: int = 20,  lambda_mult: float = 0.5,  filter: dict[str, str] | None = None,  where_document: dict[str, str] | None = None,  **kwargs: Any ) -> list[Document]

Return docs selected using the maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

PARAMETER	DESCRIPTION
`query` ¶	Text to look up documents similar to. TYPE: `str`
`k` ¶	Number of Documents to return. Defaults to 4. TYPE: `int` DEFAULT: `DEFAULT_K`
`fetch_k` ¶	Number of Documents to fetch to pass to MMR algorithm. TYPE: `int` DEFAULT: `20`
`lambda_mult` ¶	Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5. TYPE: `float` DEFAULT: `0.5`
`filter` ¶	Filter by metadata. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`where_document` ¶	dict used to filter by the document contents. E.g. {"$contains": "hello"}. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`kwargs` ¶	Additional keyword arguments to pass to Chroma collection query. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[Document]`	List of Documents selected by maximal marginal relevance.

RAISES	DESCRIPTION
`ValueError`	If the embedding function is not provided.

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py

def max_marginal_relevance_search(  self,  query: str,  k: int = DEFAULT_K,  fetch_k: int = 20,  lambda_mult: float = 0.5,  filter: dict[str, str] | None = None, # noqa: A002  where_document: dict[str, str] | None = None,  **kwargs: Any, ) -> list[Document]:  """Return docs selected using the maximal marginal relevance.   Maximal marginal relevance optimizes for similarity to query AND diversity  among selected documents.   Args:  query: Text to look up documents similar to.  k: Number of Documents to return. Defaults to 4.  fetch_k: Number of Documents to fetch to pass to MMR algorithm.  lambda_mult: Number between 0 and 1 that determines the degree  of diversity among the results with 0 corresponding  to maximum diversity and 1 to minimum diversity.  Defaults to 0.5.  filter: Filter by metadata.  where_document: dict used to filter by the document contents.  E.g. {"$contains": "hello"}.  kwargs: Additional keyword arguments to pass to Chroma collection query.   Returns:  List of Documents selected by maximal marginal relevance.   Raises:  ValueError: If the embedding function is not provided.  """  if self._embedding_function is None:  msg = "For MMR search, you must specify an embedding function on creation."  raise ValueError(  msg,  )   embedding = self._embedding_function.embed_query(query)  return self.max_marginal_relevance_search_by_vector(  embedding,  k,  fetch_k,  lambda_mult=lambda_mult,  filter=filter,  where_document=where_document,  ) 

delete_collection ¶

delete_collection() -> None

Delete the collection.

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py

def delete_collection(self) -> None:  """Delete the collection."""  self._client.delete_collection(self._collection.name)  self._chroma_collection = None 

reset_collection ¶

reset_collection() -> None

Resets the collection.

Resets the collection by deleting the collection and recreating an empty one.

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py

def reset_collection(self) -> None:  """Resets the collection.   Resets the collection by deleting the collection and recreating an empty one.  """  self.delete_collection()  self.__ensure_collection() 

get ¶

get(  ids: str | list[str] | None = None,  where: Where | None = None,  limit: int | None = None,  offset: int | None = None,  where_document: WhereDocument | None = None,  include: list[str] | None = None, ) -> dict[str, Any]

Gets items from the collection, optionally filtered by ID, metadata, or document content.

PARAMETER	DESCRIPTION
`ids` ¶	The ids of the embeddings to get. Optional. TYPE: `str \| list[str] \| None` DEFAULT: `None`
`where` ¶	A Where type dict used to filter results by. E.g. `{"$and": [{"color": "red"}, {"price": 4.20}]}` Optional. TYPE: `Where \| None` DEFAULT: `None`
`limit` ¶	The number of documents to return. Optional. TYPE: `int \| None` DEFAULT: `None`
`offset` ¶	The offset to start returning results from. Useful for paging results with limit. Optional. TYPE: `int \| None` DEFAULT: `None`
`where_document` ¶	A WhereDocument type dict used to filter by the documents. E.g. `{"$contains": "hello"}`. Optional. TYPE: `WhereDocument \| None` DEFAULT: `None`
`include` ¶	A list of what to include in the results. Can contain `"embeddings"`, `"metadatas"`, `"documents"`. Ids are always included. Defaults to `["metadatas", "documents"]`. Optional. TYPE: `list[str] \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`dict[str, Any]`	A dict with the keys `"ids"`, `"embeddings"`, `"metadatas"`, `"documents"`.

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py

def get(  self,  ids: str | list[str] | None = None,  where: Where | None = None,  limit: int | None = None,  offset: int | None = None,  where_document: WhereDocument | None = None,  include: list[str] | None = None, ) -> dict[str, Any]:  """Gets the collection.   Args:  ids: The ids of the embeddings to get. Optional.  where: A Where type dict used to filter results by.  E.g. `{"$and": [{"color": "red"}, {"price": 4.20}]}` Optional.  limit: The number of documents to return. Optional.  offset: The offset to start returning results from.  Useful for paging results with limit. Optional.  where_document: A WhereDocument type dict used to filter by the documents.  E.g. `{"$contains": "hello"}`. Optional.  include: A list of what to include in the results.  Can contain `"embeddings"`, `"metadatas"`, `"documents"`.  Ids are always included.  Defaults to `["metadatas", "documents"]`. Optional.   Returns:  A dict with the keys `"ids"`, `"embeddings"`, `"metadatas"`, `"documents"`.  """  kwargs = {  "ids": ids,  "where": where,  "limit": limit,  "offset": offset,  "where_document": where_document,  }   if include is not None:  kwargs["include"] = include   return self._collection.get(**kwargs) # type: ignore[arg-type, return-value] 

get_by_ids ¶

get_by_ids(ids: Sequence[str]) -> list[Document]

Get documents by their IDs.

The returned documents are expected to have the ID field set to the ID of the document in the vector store.

Fewer documents may be returned than requested if some IDs are not found or if there are duplicated IDs.

Users should not assume that the order of the returned documents matches the order of the input IDs. Instead, users should rely on the ID field of the returned documents.

This method should NOT raise exceptions if no documents are found for some IDs.

PARAMETER	DESCRIPTION
`ids` ¶	List of ids to retrieve. TYPE: `Sequence[str]`

RETURNS	DESCRIPTION
`list[Document]`	List of Documents.

Added in 0.2.1

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py

def get_by_ids(self, ids: Sequence[str], /) -> list[Document]:  """Get documents by their IDs.   The returned documents are expected to have the ID field set to the ID of the  document in the vector store.   Fewer documents may be returned than requested if some IDs are not found or  if there are duplicated IDs.   Users should not assume that the order of the returned documents matches  the order of the input IDs. Instead, users should rely on the ID field of the  returned documents.   This method should **NOT** raise exceptions if no documents are found for  some IDs.   Args:  ids: List of ids to retrieve.   Returns:  List of Documents.   !!! version-added "Added in 0.2.1"  """  results = self.get(ids=list(ids))  return [  Document(page_content=doc, metadata=meta, id=doc_id)  for doc, meta, doc_id in zip(  results["documents"],  results["metadatas"],  results["ids"],  strict=False,  )  ] 
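
Because the result order is not guaranteed and missing IDs are silently skipped, callers that need one slot per requested ID can realign the results by their `id` field. A minimal sketch, using a hypothetical `Doc` dataclass in place of a LangChain `Document` and a hypothetical helper `align_to_request`:

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a Document that carries an id."""

    id: str
    page_content: str


def align_to_request(requested_ids, documents):
    """Return one entry per requested ID, with None where nothing was found."""
    by_id = {doc.id: doc for doc in documents}
    return [by_id.get(doc_id) for doc_id in requested_ids]


# Results arrive in arbitrary order and "missing" is absent.
returned = [Doc("b", "beta"), Doc("a", "alpha")]
aligned = align_to_request(["a", "missing", "b"], returned)
```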

update_document ¶

update_document(  document_id: str, document: Document ) -> None

Update a document in the collection.

PARAMETER	DESCRIPTION
`document_id` ¶	ID of the document to update. TYPE: `str`
`document` ¶	Document to update. TYPE: `Document`

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py

def update_document(self, document_id: str, document: Document) -> None:  """Update a document in the collection.   Args:  document_id: ID of the document to update.  document: Document to update.  """  return self.update_documents([document_id], [document]) 

update_documents ¶

update_documents(  ids: list[str], documents: list[Document] ) -> None

Update documents in the collection.

PARAMETER	DESCRIPTION
`ids` ¶	List of IDs of the documents to update. TYPE: `list[str]`
`documents` ¶	List of documents to update. TYPE: `list[Document]`

RAISES	DESCRIPTION
`ValueError`	If the embedding function is not provided.

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py

def update_documents(self, ids: list[str], documents: list[Document]) -> None:  """Update a document in the collection.   Args:  ids: List of ids of the document to update.  documents: List of documents to update.   Raises:  ValueError: If the embedding function is not provided.  """  text = [document.page_content for document in documents]  metadata = [document.metadata for document in documents]  if self._embedding_function is None:  msg = "For update, you must specify an embedding function on creation."  raise ValueError(  msg,  )  embeddings = self._embedding_function.embed_documents(text)   if hasattr(  self._client,  "get_max_batch_size",  ) or hasattr( # for Chroma 0.5.1 and above  self._client,  "max_batch_size",  ): # for Chroma 0.4.10 and above  from chromadb.utils.batch_utils import create_batches   for batch in create_batches(  api=self._client,  ids=ids,  metadatas=metadata, # type: ignore[arg-type]  documents=text,  embeddings=embeddings, # type: ignore[arg-type]  ):  self._collection.update(  ids=batch[0],  embeddings=batch[1],  documents=batch[3],  metadatas=batch[2],  )  else:  self._collection.update(  ids=ids,  embeddings=embeddings, # type: ignore[arg-type]  documents=text,  metadatas=metadata, # type: ignore[arg-type]  ) 
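
The listing splits large updates into batches so that no single call exceeds the client's maximum batch size. The sketch below shows the general idea with plain slicing. It is not chromadb's `create_batches`; the helper name `chunk_parallel` is hypothetical, and the tuple order (ids, embeddings, metadatas, documents) is chosen only to match how the listing indexes each batch.

```python
def chunk_parallel(ids, embeddings, metadatas, documents, max_batch_size):
    """Yield aligned slices of four parallel lists, each at most max_batch_size long."""
    if max_batch_size <= 0:
        raise ValueError("max_batch_size must be positive")
    for start in range(0, len(ids), max_batch_size):
        end = start + max_batch_size
        yield (
            ids[start:end],
            embeddings[start:end],
            metadatas[start:end],
            documents[start:end],
        )


batches = list(
    chunk_parallel(
        ids=["a", "b", "c"],
        embeddings=[[0.1], [0.2], [0.3]],
        metadatas=[{"n": 1}, {"n": 2}, {"n": 3}],
        documents=["alpha", "beta", "gamma"],
        max_batch_size=2,
    )
)
```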

from_texts `classmethod` ¶

from_texts(  texts: list[str],  embedding: Embeddings | None = None,  metadatas: list[dict] | None = None,  ids: list[str] | None = None,  collection_name: str = _LANGCHAIN_DEFAULT_COLLECTION_NAME,  persist_directory: str | None = None,  host: str | None = None,  port: int | None = None,  headers: dict[str, str] | None = None,  chroma_cloud_api_key: str | None = None,  tenant: str | None = None,  database: str | None = None,  client_settings: Settings | None = None,  client: ClientAPI | None = None,  collection_metadata: dict | None = None,  collection_configuration: (  CreateCollectionConfiguration | None  ) = None,  *,  ssl: bool = False,  **kwargs: Any ) -> Chroma

Create a Chroma vectorstore from a list of raw texts.

If a persist_directory is specified, the collection will be persisted there. Otherwise, the data will be ephemeral in-memory.

PARAMETER	DESCRIPTION
`texts` ¶	List of texts to add to the collection. TYPE: `list[str]`
`collection_name` ¶	Name of the collection to create. TYPE: `str` DEFAULT: `_LANGCHAIN_DEFAULT_COLLECTION_NAME`
`persist_directory` ¶	Directory to persist the collection. TYPE: `str \| None` DEFAULT: `None`
`host` ¶	Hostname of a deployed Chroma server. TYPE: `str \| None` DEFAULT: `None`
`port` ¶	Connection port for a deployed Chroma server. Default is 8000. TYPE: `int \| None` DEFAULT: `None`
`ssl` ¶	Whether to establish an SSL connection with a deployed Chroma server. Default is False. TYPE: `bool` DEFAULT: `False`
`headers` ¶	HTTP headers to send to a deployed Chroma server. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`chroma_cloud_api_key` ¶	Chroma Cloud API key. TYPE: `str \| None` DEFAULT: `None`
`tenant` ¶	Tenant ID. Required for Chroma Cloud connections. Default is 'default_tenant' for local Chroma servers. TYPE: `str \| None` DEFAULT: `None`
`database` ¶	Database name. Required for Chroma Cloud connections. Default is 'default_database'. TYPE: `str \| None` DEFAULT: `None`
`embedding` ¶	Embedding function. TYPE: `Embeddings \| None` DEFAULT: `None`
`metadatas` ¶	List of metadatas. TYPE: `list[dict] \| None` DEFAULT: `None`
`ids` ¶	List of document IDs. TYPE: `list[str] \| None` DEFAULT: `None`
`client_settings` ¶	Chroma client settings. TYPE: `Settings \| None` DEFAULT: `None`
`client` ¶	Chroma client. Documentation: https://docs.trychroma.com/reference/python/client TYPE: `ClientAPI \| None` DEFAULT: `None`
`collection_metadata` ¶	Collection configurations. TYPE: `dict \| None` DEFAULT: `None`
`collection_configuration` ¶	Index configuration for the collection. TYPE: `CreateCollectionConfiguration \| None` DEFAULT: `None`
`kwargs` ¶	Additional keyword arguments to initialize a Chroma client. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`Chroma`	Chroma vectorstore. TYPE: `Chroma`

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py

@classmethod def from_texts(  cls: type[Chroma],  texts: list[str],  embedding: Embeddings | None = None,  metadatas: list[dict] | None = None,  ids: list[str] | None = None,  collection_name: str = _LANGCHAIN_DEFAULT_COLLECTION_NAME,  persist_directory: str | None = None,  host: str | None = None,  port: int | None = None,  headers: dict[str, str] | None = None,  chroma_cloud_api_key: str | None = None,  tenant: str | None = None,  database: str | None = None,  client_settings: chromadb.config.Settings | None = None,  client: chromadb.ClientAPI | None = None,  collection_metadata: dict | None = None,  collection_configuration: CreateCollectionConfiguration | None = None,  *,  ssl: bool = False,  **kwargs: Any, ) -> Chroma:  """Create a Chroma vectorstore from a raw documents.   If a persist_directory is specified, the collection will be persisted there.  Otherwise, the data will be ephemeral in-memory.   Args:  texts: List of texts to add to the collection.  collection_name: Name of the collection to create.  persist_directory: Directory to persist the collection.  host: Hostname of a deployed Chroma server.  port: Connection port for a deployed Chroma server.  Default is 8000.  ssl: Whether to establish an SSL connection with a deployed Chroma server.  Default is False.  headers: HTTP headers to send to a deployed Chroma server.  chroma_cloud_api_key: Chroma Cloud API key.  tenant: Tenant ID. Required for Chroma Cloud connections.  Default is 'default_tenant' for local Chroma servers.  database: Database name. Required for Chroma Cloud connections.  Default is 'default_database'.  embedding: Embedding function.  metadatas: List of metadatas.  ids: List of document IDs.  client_settings: Chroma client settings.  client: Chroma client. Documentation:  https://docs.trychroma.com/reference/python/client  collection_metadata: Collection configurations.  collection_configuration: Index configuration for the collection.   
kwargs: Additional keyword arguments to initialize a Chroma client.   Returns:  Chroma: Chroma vectorstore.  """  chroma_collection = cls(  collection_name=collection_name,  embedding_function=embedding,  persist_directory=persist_directory,  host=host,  port=port,  ssl=ssl,  headers=headers,  chroma_cloud_api_key=chroma_cloud_api_key,  tenant=tenant,  database=database,  client_settings=client_settings,  client=client,  collection_metadata=collection_metadata,  collection_configuration=collection_configuration,  **kwargs,  )  if ids is None:  ids = [str(uuid.uuid4()) for _ in texts]  else:  ids = [id_ if id_ is not None else str(uuid.uuid4()) for id_ in ids]  if hasattr(  chroma_collection._client,  "get_max_batch_size",  ) or hasattr( # for Chroma 0.5.1 and above  chroma_collection._client,  "max_batch_size",  ): # for Chroma 0.4.10 and above  from chromadb.utils.batch_utils import create_batches   for batch in create_batches(  api=chroma_collection._client,  ids=ids,  metadatas=metadatas, # type: ignore[arg-type]  documents=texts,  ):  chroma_collection.add_texts(  texts=batch[3] if batch[3] else [],  metadatas=batch[2] if batch[2] else None, # type: ignore[arg-type]  ids=batch[0],  )  else:  chroma_collection.add_texts(texts=texts, metadatas=metadatas, ids=ids)  return chroma_collection 
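
As the listing shows, `ids` may be omitted entirely or may contain `None` placeholders; in both cases a UUID4 string is generated for each missing entry. A minimal sketch of that rule in isolation, with the hypothetical helper name `resolve_ids`:

```python
import uuid


def resolve_ids(texts, ids=None):
    """Return one ID per text, generating a UUID4 string for each missing one."""
    if ids is None:
        return [str(uuid.uuid4()) for _ in texts]
    # Keep caller-supplied IDs; fill only the None placeholders.
    return [id_ if id_ is not None else str(uuid.uuid4()) for id_ in ids]


generated = resolve_ids(["alpha", "beta"])
mixed = resolve_ids(["alpha", "beta"], ids=["doc-1", None])
```

Supplying stable IDs is what makes later calls such as `update_documents` or `delete` addressable; generated UUIDs are only known after the fact.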

from_documents `classmethod` ¶

from_documents(  documents: list[Document],  embedding: Embeddings | None = None,  ids: list[str] | None = None,  collection_name: str = _LANGCHAIN_DEFAULT_COLLECTION_NAME,  persist_directory: str | None = None,  host: str | None = None,  port: int | None = None,  headers: dict[str, str] | None = None,  chroma_cloud_api_key: str | None = None,  tenant: str | None = None,  database: str | None = None,  client_settings: Settings | None = None,  client: ClientAPI | None = None,  collection_metadata: dict | None = None,  collection_configuration: (  CreateCollectionConfiguration | None  ) = None,  *,  ssl: bool = False,  **kwargs: Any ) -> Chroma

Create a Chroma vectorstore from a list of documents.

If a persist_directory is specified, the collection will be persisted there. Otherwise, the data will be ephemeral in-memory.

PARAMETER	DESCRIPTION
`collection_name` ¶	Name of the collection to create. TYPE: `str` DEFAULT: `_LANGCHAIN_DEFAULT_COLLECTION_NAME`
`persist_directory` ¶	Directory to persist the collection. TYPE: `str \| None` DEFAULT: `None`
`host` ¶	Hostname of a deployed Chroma server. TYPE: `str \| None` DEFAULT: `None`
`port` ¶	Connection port for a deployed Chroma server. Default is 8000. TYPE: `int \| None` DEFAULT: `None`
`ssl` ¶	Whether to establish an SSL connection with a deployed Chroma server. Default is False. TYPE: `bool` DEFAULT: `False`
`headers` ¶	HTTP headers to send to a deployed Chroma server. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`chroma_cloud_api_key` ¶	Chroma Cloud API key. TYPE: `str \| None` DEFAULT: `None`
`tenant` ¶	Tenant ID. Required for Chroma Cloud connections. Default is 'default_tenant' for local Chroma servers. TYPE: `str \| None` DEFAULT: `None`
`database` ¶	Database name. Required for Chroma Cloud connections. Default is 'default_database'. TYPE: `str \| None` DEFAULT: `None`
`ids` ¶	List of document IDs. TYPE: `list[str] \| None` DEFAULT: `None`
`documents` ¶	List of documents to add to the vectorstore. TYPE: `list[Document]`
`embedding` ¶	Embedding function. TYPE: `Embeddings \| None` DEFAULT: `None`
`client_settings` ¶	Chroma client settings. TYPE: `Settings \| None` DEFAULT: `None`
`client` ¶	Chroma client. Documentation: https://docs.trychroma.com/reference/python/client TYPE: `ClientAPI \| None` DEFAULT: `None`
`collection_metadata` ¶	Collection configurations. TYPE: `dict \| None` DEFAULT: `None`
`collection_configuration` ¶	Index configuration for the collection. TYPE: `CreateCollectionConfiguration \| None` DEFAULT: `None`
`kwargs` ¶	Additional keyword arguments to initialize a Chroma client. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`Chroma`	Chroma vectorstore. TYPE: `Chroma`

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py

@classmethod def from_documents(  cls: type[Chroma],  documents: list[Document],  embedding: Embeddings | None = None,  ids: list[str] | None = None,  collection_name: str = _LANGCHAIN_DEFAULT_COLLECTION_NAME,  persist_directory: str | None = None,  host: str | None = None,  port: int | None = None,  headers: dict[str, str] | None = None,  chroma_cloud_api_key: str | None = None,  tenant: str | None = None,  database: str | None = None,  client_settings: chromadb.config.Settings | None = None,  client: chromadb.ClientAPI | None = None, # Add this line  collection_metadata: dict | None = None,  collection_configuration: CreateCollectionConfiguration | None = None,  *,  ssl: bool = False,  **kwargs: Any, ) -> Chroma:  """Create a Chroma vectorstore from a list of documents.   If a persist_directory is specified, the collection will be persisted there.  Otherwise, the data will be ephemeral in-memory.   Args:  collection_name: Name of the collection to create.  persist_directory: Directory to persist the collection.  host: Hostname of a deployed Chroma server.  port: Connection port for a deployed Chroma server. Default is 8000.  ssl: Whether to establish an SSL connection with a deployed Chroma server.  Default is False.  headers: HTTP headers to send to a deployed Chroma server.  chroma_cloud_api_key: Chroma Cloud API key.  tenant: Tenant ID. Required for Chroma Cloud connections.  Default is 'default_tenant' for local Chroma servers.  database: Database name. Required for Chroma Cloud connections.  Default is 'default_database'.  ids : List of document IDs.  documents: List of documents to add to the vectorstore.  embedding: Embedding function.  client_settings: Chroma client settings.  client: Chroma client. Documentation:  https://docs.trychroma.com/reference/python/client  collection_metadata: Collection configurations.  collection_configuration: Index configuration for the collection.   kwargs: Additional keyword arguments to initialize a Chroma client.   
Returns:  Chroma: Chroma vectorstore.  """  texts = [doc.page_content for doc in documents]  metadatas = [doc.metadata for doc in documents]  if ids is None:  ids = [doc.id if doc.id else str(uuid.uuid4()) for doc in documents]  return cls.from_texts(  texts=texts,  embedding=embedding,  metadatas=metadatas,  ids=ids,  collection_name=collection_name,  persist_directory=persist_directory,  host=host,  port=port,  ssl=ssl,  headers=headers,  chroma_cloud_api_key=chroma_cloud_api_key,  tenant=tenant,  database=database,  client_settings=client_settings,  client=client,  collection_metadata=collection_metadata,  collection_configuration=collection_configuration,  **kwargs,  ) 
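
`from_documents` is a thin wrapper: it splits each document into text and metadata, derives IDs, and delegates to `from_texts`. The sketch below mirrors that decomposition with a hypothetical `Doc` dataclass and a hypothetical helper `split_documents`. One detail taken from the listing: when `ids` is not passed, any falsy `doc.id` (either `None` or an empty string) is replaced by a generated UUID4.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""

    page_content: str
    metadata: dict = field(default_factory=dict)
    id: Optional[str] = None


def split_documents(documents, ids=None):
    """Return parallel (texts, metadatas, ids) lists for a list of documents."""
    texts = [doc.page_content for doc in documents]
    metadatas = [doc.metadata for doc in documents]
    if ids is None:
        # Prefer the document's own id; fall back to a fresh UUID4 string.
        ids = [doc.id if doc.id else str(uuid.uuid4()) for doc in documents]
    return texts, metadatas, ids


docs = [Doc("alpha", {"lang": "en"}, id="a"), Doc("beta")]
texts, metadatas, ids = split_documents(docs)
```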

delete ¶

delete(ids: list[str] | None = None, **kwargs: Any) -> None

Delete by vector IDs.

PARAMETER	DESCRIPTION
`ids` ¶	List of ids to delete. TYPE: `list[str] \| None` DEFAULT: `None`
`kwargs` ¶	Additional keyword arguments. TYPE: `Any` DEFAULT: `{}`

Source code in .venv/lib/python3.13/site-packages/langchain_chroma/vectorstores.py

def delete(self, ids: list[str] | None = None, **kwargs: Any) -> None:  """Delete by vector IDs.   Args:  ids: List of ids to delete.  kwargs: Additional keyword arguments.  """  self._collection.delete(ids=ids, **kwargs) 
