Get Started¶
This page demonstrates how to combine Graph Traversal and Vector Search using `langchain-graph-retriever` with `langchain`.
Pre-requisites¶
We assume you already have a working `langchain` installation, including an LLM and embedding model, as well as a supported vector store. In that case, you only need to install `langchain-graph-retriever`:
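For example, using pip (adjust as needed for your package manager or environment):

```bash
pip install langchain-graph-retriever
```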
Preparing Data¶
Loading data works the same regardless of which vector store you use. The main thing to consider is which structured information you want to include in the metadata to support traversal.
For this guide, we use a JSON file with information about animals. Several example entries are shown below. The actual file has one entry per line, making it easy to load into `Document`s.
{ "id": "alpaca", "text": "alpacas are domesticated mammals valued for their soft wool and friendly demeanor.", "metadata": { "type": "mammal", "number_of_legs": 4, "keywords": ["wool", "domesticated", "friendly"], "origin": "south america" } } { "id": "caribou", "text": "caribou, also known as reindeer, are migratory mammals found in arctic regions.", "metadata": { "type": "mammal", "number_of_legs": 4, "keywords": ["migratory", "arctic", "herbivore", "tundra"], "diet": "herbivorous" } } { "id": "cassowary", "text": "cassowaries are flightless birds known for their colorful necks and powerful legs.", "metadata": { "type": "bird", "number_of_legs": 2, "keywords": ["flightless", "colorful", "powerful"], "habitat": "rainforest" } }
```python
from graph_rag_example_helpers.datasets.animals import fetch_documents

animals = fetch_documents()
```
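The helper above fetches the example dataset used throughout this guide. If you prefer to load your own JSONL file directly, a minimal sketch looks like this (the `animals.jsonl` filename is an assumption):

```python
import json

from langchain_core.documents import Document

animals = []
with open("animals.jsonl") as f:  # hypothetical path to the one-entry-per-line file
    for line in f:
        entry = json.loads(line)
        animals.append(
            Document(
                id=entry["id"],
                page_content=entry["text"],
                metadata=entry["metadata"],
            )
        )
```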
Populating the Vector Store¶
The following examples show how to populate several supported vector stores (Apache Cassandra, OpenSearch, and Chroma) with the animal data.
Apache Cassandra:

```python
from langchain_community.vectorstores.cassandra import Cassandra
from langchain_openai import OpenAIEmbeddings
from langchain_graph_retriever.transformers import ShreddingTransformer

shredder = ShreddingTransformer()  # (1)!
vector_store = Cassandra.from_documents(
    documents=list(shredder.transform_documents(animals)),
    embedding=OpenAIEmbeddings(),
    table_name="animals",
)
```
- Since Cassandra doesn't index items inside lists for querying, metadata containing lists must be shredded so it can be queried. By default, the `ShreddingTransformer` shreds all keys; it can be configured to shred only the metadata keys used as edge targets.
OpenSearch:

```python
from langchain_community.vectorstores import OpenSearchVectorSearch
from langchain_openai import OpenAIEmbeddings

vector_store = OpenSearchVectorSearch.from_documents(
    opensearch_url=OPEN_SEARCH_URL,
    index_name="animals",
    embedding=OpenAIEmbeddings(),
    engine="faiss",
    documents=animals,
    bulk_size=500,  # (1)!
)
```
- There is currently a bug in the `OpenSearchVectorSearch` implementation that requires this extra parameter.
Chroma:

```python
from langchain_chroma.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_graph_retriever.transformers import ShreddingTransformer

shredder = ShreddingTransformer()  # (1)!
vector_store = Chroma.from_documents(
    documents=list(shredder.transform_documents(animals)),
    embedding=OpenAIEmbeddings(),
    collection_name="animals",
)
```
- Since Chroma doesn't index items inside lists for querying, metadata containing lists must be shredded so it can be queried. By default, the `ShreddingTransformer` shreds all keys; it can be configured to shred only the metadata keys used as edge targets (a configuration sketch follows below).
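If you only traverse on a few list-valued keys, restricting the shredder to those keys keeps the stored metadata smaller. The `keys` parameter name below is an assumption; check the `ShreddingTransformer` API reference for the exact option:

```python
from langchain_graph_retriever.transformers import ShreddingTransformer

# Shred only the list-valued metadata key used as an edge target
# (the `keys` parameter name is an assumption; consult the API reference).
shredder = ShreddingTransformer(keys={"keywords"})
```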
Simple Traversal¶
For our first retrieval and graph traversal, we start with the single animal that best matches the query, and then traverse to other animals with the same `habitat` and/or `origin`.
Apache Cassandra:

```python
from graph_retriever.strategies import Eager
from langchain_graph_retriever import GraphRetriever
from langchain_graph_retriever.adapters.cassandra import CassandraAdapter

simple = GraphRetriever(
    store=CassandraAdapter(vector_store, shredder, {"keywords"}),
    edges=[("habitat", "habitat"), ("origin", "origin"), ("keywords", "keywords")],
    strategy=Eager(k=10, start_k=1, max_depth=2),
)
```
Chroma:

```python
from graph_retriever.strategies import Eager
from langchain_graph_retriever import GraphRetriever
from langchain_graph_retriever.adapters.chroma import ChromaAdapter

simple = GraphRetriever(
    store=ChromaAdapter(vector_store, shredder, {"keywords"}),
    edges=[("habitat", "habitat"), ("origin", "origin"), ("keywords", "keywords")],
    strategy=Eager(k=10, start_k=1, max_depth=2),
)
```
Shredding
The above code is exactly the same for all stores; however, adapters for shredded stores (Chroma and Apache Cassandra) require extra configuration to specify which metadata fields need to be rewritten when issuing queries.
The above creates a graph-traversing retriever that starts with the nearest animal (`start_k=1`), retrieves 10 documents (`k=10`), and limits the search to documents at most 2 steps away from the first animal (`max_depth=2`).
The edges define how metadata values can be used for traversal. In this case, every animal is connected to other animals with the same habitat and/or the same origin, as well as to animals sharing any of the same keywords.
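Conceptually, an edge such as `("habitat", "habitat")` connects two documents whenever they share a value for that metadata key; for a list-valued key such as `keywords`, any overlapping element is enough. The following sketch illustrates that matching rule only; it is not the library's implementation:

```python
def would_connect(doc_a_meta: dict, doc_b_meta: dict, edge: tuple[str, str]) -> bool:
    """Simplified illustration: do two documents share a value for this edge?"""
    source_key, target_key = edge
    a_values = doc_a_meta.get(source_key)
    b_values = doc_b_meta.get(target_key)
    if a_values is None or b_values is None:
        return False
    # Treat scalar values as single-element collections so lists and
    # scalars can be compared uniformly.
    a_set = set(a_values) if isinstance(a_values, list) else {a_values}
    b_set = set(b_values) if isinstance(b_values, list) else {b_values}
    return bool(a_set & b_set)
```

Based on the example entries above, alpaca and cassowary share no habitat, origin, or keywords, so no edge connects them, while any document that also lists "arctic" in its keywords would connect to caribou.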
```python
simple_results = simple.invoke("what mammals could be found near a capybara")

for doc in simple_results:
    print(f"{doc.id}: {doc.page_content}")
```
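Because `simple` is a regular LangChain retriever, it can also be dropped into a retrieval chain. A minimal LCEL sketch, assuming an OpenAI chat model is available (the model name and prompt are illustrative, not part of this guide's setup):

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the provided context.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)


def format_docs(docs):
    # Join the retrieved animal descriptions into a single context string.
    return "\n\n".join(f"{doc.id}: {doc.page_content}" for doc in docs)


chain = (
    {"context": simple | format_docs, "question": RunnablePassthrough()}
    | prompt
    | ChatOpenAI(model="gpt-4o-mini")  # illustrative model choice
    | StrOutputParser()
)

print(chain.invoke("what mammals could be found near a capybara"))
```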
Visualizing¶
`langchain-graph-retriever` includes code for converting the document graph into a `networkx` graph, for rendering and other analysis.
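As a rough illustration of what such a document graph contains, the retrieved documents can also be assembled into a `networkx` graph by hand, linking documents that share edge values (a conceptual sketch, not the library's helper; rendering requires matplotlib):

```python
import networkx as nx

# Build a small graph by hand from the traversal results: one node per
# document, one edge whenever two documents share a habitat or origin.
graph = nx.Graph()
for doc in simple_results:
    graph.add_node(doc.id, **doc.metadata)

for i, a in enumerate(simple_results):
    for b in simple_results[i + 1:]:
        for key in ("habitat", "origin"):
            if key in a.metadata and a.metadata.get(key) == b.metadata.get(key):
                graph.add_edge(a.id, b.id, key=key)

nx.draw(graph, with_labels=True)
```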