This notebook shows you how to leverage this integrated vector database to store documents in collections, create indices, and perform vector search queries using approximate nearest neighbor algorithms such as COS (cosine distance), L2 (Euclidean distance), and IP (inner product) to locate documents close to the query vectors.

Azure Cosmos DB is the database that powers OpenAI's ChatGPT service. It offers single-digit millisecond response times, automatic and instant scalability, along with guaranteed speed at any scale.

Azure Cosmos DB for MongoDB vCore (learn.microsoft.com/en-us/azure/cosmos-db/mongodb/vcore/) provides developers with a fully managed MongoDB-compatible database service for building modern applications with a familiar architecture. You can apply your MongoDB experience and continue to use your favorite MongoDB drivers, SDKs, and tools by pointing your application to the API for MongoDB vCore account's connection string. Sign up for lifetime free access to get started today.
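For intuition, here is a small standalone sketch (plain Python, not part of the notebook's Cosmos DB workflow) of what the three metrics named above compute when comparing two embedding vectors; in practice the service evaluates them server-side during vector search.

import math

# Standalone illustration: compare two toy "embedding" vectors with the three
# metrics mentioned above (COS, L2, IP).
a = [1.0, 2.0, 3.0]
b = [2.0, 3.0, 4.0]

inner_product = sum(x * y for x, y in zip(a, b))  # IP
euclidean_distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))  # L2
cosine_similarity = inner_product / (
    math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
)  # COS

print(cosine_similarity, euclidean_distance, inner_product)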
%pip install -qU pymongo langchain-openai langchain-community 
Note: you may need to restart the kernel to use updated packages. 
import os

CONNECTION_STRING = "YOUR_CONNECTION_STRING"
INDEX_NAME = "izzy-test-index"
NAMESPACE = "izzy_test_db.izzy_test_collection"
DB_NAME, COLLECTION_NAME = NAMESPACE.split(".")
We want to use AzureOpenAIEmbeddings, so we need to set up our Azure OpenAI API key alongside the other required environment variables.
# Set up the OpenAI Environment Variables
os.environ["AZURE_OPENAI_API_KEY"] = "YOUR_AZURE_OPENAI_API_KEY"
os.environ["AZURE_OPENAI_ENDPOINT"] = "YOUR_AZURE_OPENAI_ENDPOINT"
os.environ["AZURE_OPENAI_API_VERSION"] = "2023-05-15"
os.environ["OPENAI_EMBEDDINGS_MODEL_NAME"] = "text-embedding-ada-002"  # the model name
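Optionally, you can verify these credentials before loading any documents by embedding a short test string. This is a hedged sketch, not part of the original notebook; the OPENAI_EMBEDDINGS_DEPLOYMENT variable and the "smart-agent-embedding-ada" default mirror the deployment name used later in this notebook and should be replaced with your own deployment.

# Optional sanity check (illustrative): embed a short test string to confirm
# the Azure OpenAI settings above are valid.
from langchain_openai import AzureOpenAIEmbeddings

test_embeddings = AzureOpenAIEmbeddings(
    model=os.environ["OPENAI_EMBEDDINGS_MODEL_NAME"],
    azure_deployment=os.getenv("OPENAI_EMBEDDINGS_DEPLOYMENT", "smart-agent-embedding-ada"),
    openai_api_version=os.environ["AZURE_OPENAI_API_VERSION"],
    chunk_size=1,
)
print(len(test_embeddings.embed_query("hello world")))  # text-embedding-ada-002 returns 1536 dimensions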
Now, we need to load the documents into the collection, create the index, and then run our queries against the index to retrieve matches. Please refer to the documentation if you have questions about specific parameters.
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores.azure_cosmos_db import (
    AzureCosmosDBVectorSearch,
    CosmosDBSimilarityType,
    CosmosDBVectorSearchType,
)
from langchain_openai import AzureOpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter

SOURCE_FILE_NAME = "../../how_to/state_of_the_union.txt"

loader = TextLoader(SOURCE_FILE_NAME)
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

# OpenAI Settings
model_deployment = os.getenv(
    "OPENAI_EMBEDDINGS_DEPLOYMENT", "smart-agent-embedding-ada"
)
model_name = os.getenv("OPENAI_EMBEDDINGS_MODEL_NAME", "text-embedding-ada-002")

openai_embeddings: AzureOpenAIEmbeddings = AzureOpenAIEmbeddings(
    model=model_name, chunk_size=1
)
docs[0] 
Document(metadata={'source': '../../how_to/state_of_the_union.txt'}, page_content='Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans. \n\nLast year COVID-19 kept us apart. This year we are finally together again. \n\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \n\nWith a duty to one another to the American people to the Constitution. \n\nAnd with an unwavering resolve that freedom will always triumph over tyranny. \n\nSix days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. \n\nHe thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. \n\nHe met the Ukrainian people. \n\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world.') 
from pymongo import MongoClient

# INDEX_NAME = "izzy-test-index-2"
# NAMESPACE = "izzy_test_db.izzy_test_collection"
# DB_NAME, COLLECTION_NAME = NAMESPACE.split(".")

client: MongoClient = MongoClient(CONNECTION_STRING)
collection = client[DB_NAME][COLLECTION_NAME]

model_deployment = os.getenv(
    "OPENAI_EMBEDDINGS_DEPLOYMENT", "smart-agent-embedding-ada"
)
model_name = os.getenv("OPENAI_EMBEDDINGS_MODEL_NAME", "text-embedding-ada-002")

vectorstore = AzureCosmosDBVectorSearch.from_documents(
    docs,
    openai_embeddings,
    collection=collection,
    index_name=INDEX_NAME,
)

# Read more about these variables in detail here: https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/vcore/vector-search
num_lists = 100
dimensions = 1536
similarity_algorithm = CosmosDBSimilarityType.COS
kind = CosmosDBVectorSearchType.VECTOR_IVF
m = 16
ef_construction = 64
ef_search = 40
score_threshold = 0.1

vectorstore.create_index(
    num_lists, dimensions, similarity_algorithm, kind, m, ef_construction
)

"""
# DiskANN vectorstore
maxDegree = 40
dimensions = 1536
similarity_algorithm = CosmosDBSimilarityType.COS
kind = CosmosDBVectorSearchType.VECTOR_DISKANN
lBuild = 20

vectorstore.create_index(
    dimensions=dimensions,
    similarity=similarity_algorithm,
    kind=kind,
    max_degree=maxDegree,
    l_build=lBuild,
)

# -----------------------------------------------------------

# HNSW vectorstore
dimensions = 1536
similarity_algorithm = CosmosDBSimilarityType.COS
kind = CosmosDBVectorSearchType.VECTOR_HNSW
m = 16
ef_construction = 64

vectorstore.create_index(
    dimensions=dimensions,
    similarity=similarity_algorithm,
    kind=kind,
    m=m,
    ef_construction=ef_construction,
)
"""
'\n# DiskANN vectorstore\nmaxDegree = 40\ndimensions = 1536\nsimilarity_algorithm = CosmosDBSimilarityType.COS\nkind = CosmosDBVectorSearchType.VECTOR_DISKANN\nlBuild = 20\n\nvectorstore.create_index(\n dimensions=dimensions,\n similarity=similarity_algorithm,\n kind=kind ,\n max_degree=maxDegree,\n l_build=lBuild,\n )\n\n# -----------------------------------------------------------\n\n# HNSW vectorstore\ndimensions = 1536\nsimilarity_algorithm = CosmosDBSimilarityType.COS\nkind = CosmosDBVectorSearchType.VECTOR_HNSW\nm = 16\nef_construction = 64\n\nvectorstore.create_index(\n dimensions=dimensions,\n similarity=similarity_algorithm,\n kind=kind ,\n m=m,\n ef_construction=ef_construction,\n )\n' 
# perform a similarity search between the embedding of the query and the embeddings of the documents
query = "What did the president say about Ketanji Brown Jackson"
docs = vectorstore.similarity_search(query)
print(docs[0].page_content) 
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.  Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.  One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.  And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence. 
Once the documents have been loaded and the index has been created, you can instantiate the vector store directly and run queries against the index.
vectorstore = AzureCosmosDBVectorSearch.from_connection_string(
    CONNECTION_STRING, NAMESPACE, openai_embeddings, index_name=INDEX_NAME
)

# perform a similarity search between a query and the ingested documents
query = "What did the president say about Ketanji Brown Jackson"
docs = vectorstore.similarity_search(query)

print(docs[0].page_content)
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.  Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.  One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.  And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence. 
vectorstore = AzureCosmosDBVectorSearch(
    collection, openai_embeddings, index_name=INDEX_NAME
)

# perform a similarity search between a query and the ingested documents
query = "What did the president say about Ketanji Brown Jackson"
docs = vectorstore.similarity_search(query)

print(docs[0].page_content)
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.  Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.  One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.  And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence. 
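The vector store also supports the standard LangChain retriever interface, which is convenient when wiring it into a chain. A brief sketch follows; the value of k is illustrative.

# Expose the vector store through the generic LangChain retriever interface.
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
retrieved_docs = retriever.invoke(
    "What did the president say about Ketanji Brown Jackson"
)
print(len(retrieved_docs))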

Filtered vector search (Preview)

Azure Cosmos DB for MongoDB supports pre-filtering with $lt, $lte, $eq, $neq, $gte, $gt, $in, $nin, and $regex. To use this feature, enable "filtering vector search" in the "Preview Features" tab of your Azure Subscription. Learn more about preview features here.
# create a filter index
vectorstore.create_filter_index(
    property_to_filter="metadata.source", index_name="filter_index"
)
{'raw': {'defaultShard': {'numIndexesBefore': 3,  'numIndexesAfter': 4,  'createdCollectionAutomatically': False,  'ok': 1}},  'ok': 1} 
query = "What did the president say about Ketanji Brown Jackson" docs = vectorstore.similarity_search(  query, pre_filter={"metadata.source": {"$ne": "filter content"}} ) 
len(docs) 
4 
docs = vectorstore.similarity_search(
    query,
    pre_filter={"metadata.source": {"$ne": "../../how_to/state_of_the_union.txt"}},
)
len(docs) 
0 
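The other comparison operators listed above can be used in the same way. For example, here is a hypothetical pre-filter with $in that restricts the search to an allow-list of sources, reusing the query and filter index from the cells above.

# Hypothetical example: only return matches whose metadata.source is in the allow-list.
docs = vectorstore.similarity_search(
    query,
    pre_filter={"metadata.source": {"$in": ["../../how_to/state_of_the_union.txt"]}},
)
len(docs)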
