Get Started¶
This page demonstrates how to combine Graph Traversal and Vector Search using `langchain-graph-retriever` with `langchain`.
Pre-requisites¶
We assume you already have a working `langchain` installation, including an LLM and embedding model, as well as a supported vector store. In that case, you only need to install `langchain-graph-retriever`:
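For example, using pip (adjust as needed for your package manager or environment):

```bash
pip install langchain-graph-retriever
```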
Preparing Data¶
Loading data works the same regardless of which vector store you use. The main thing to consider is which structured information you want to include in the metadata to support traversal.
For this guide, we use a JSON file with information about animals. Several example entries are shown below. The actual file has one entry per line, making it easy to load into `Document`s.
{ "id": "alpaca", "text": "alpacas are domesticated mammals valued for their soft wool and friendly demeanor.", "metadata": { "type": "mammal", "number_of_legs": 4, "keywords": ["wool", "domesticated", "friendly"], "origin": "south america" } } { "id": "caribou", "text": "caribou, also known as reindeer, are migratory mammals found in arctic regions.", "metadata": { "type": "mammal", "number_of_legs": 4, "keywords": ["migratory", "arctic", "herbivore", "tundra"], "diet": "herbivorous" } } { "id": "cassowary", "text": "cassowaries are flightless birds known for their colorful necks and powerful legs.", "metadata": { "type": "bird", "number_of_legs": 2, "keywords": ["flightless", "colorful", "powerful"], "habitat": "rainforest" } }
```python
from graph_rag_example_helpers.datasets.animals import fetch_documents

animals = fetch_documents()
```
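The helper above fetches the example dataset used throughout this guide. If you prefer to load your own JSONL file directly, a minimal sketch looks like this (the `animals.jsonl` filename is an assumption):

```python
import json

from langchain_core.documents import Document

animals = []
with open("animals.jsonl") as f:  # hypothetical path to the one-entry-per-line file
    for line in f:
        entry = json.loads(line)
        animals.append(
            Document(
                id=entry["id"],
                page_content=entry["text"],
                metadata=entry["metadata"],
            )
        )
```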
Populating the Vector Store¶
The following examples show how to populate several supported vector stores (Apache Cassandra, OpenSearch, and Chroma) with the animal data.
Apache Cassandra:

```python
from langchain_community.vectorstores.cassandra import Cassandra
from langchain_openai import OpenAIEmbeddings
from langchain_graph_retriever.transformers import ShreddingTransformer

shredder = ShreddingTransformer()  # (1)!
vector_store = Cassandra.from_documents(
    documents=list(shredder.transform_documents(animals)),
    embedding=OpenAIEmbeddings(),
    table_name="animals",
)
```
- Since Cassandra doesn't index items inside lists for querying, metadata containing lists must be shredded so it can be queried. By default, the `ShreddingTransformer` shreds all keys; it can be configured to shred only the metadata keys used as edge targets.
OpenSearch:

```python
from langchain_community.vectorstores import OpenSearchVectorSearch
from langchain_openai import OpenAIEmbeddings

vector_store = OpenSearchVectorSearch.from_documents(
    opensearch_url=OPEN_SEARCH_URL,
    index_name="animals",
    embedding=OpenAIEmbeddings(),
    engine="faiss",
    documents=animals,
    bulk_size=500,  # (1)!
)
```
- There is currently a bug in the `OpenSearchVectorSearch` implementation that requires this extra parameter.
Chroma:

```python
from langchain_chroma.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_graph_retriever.transformers import ShreddingTransformer

shredder = ShreddingTransformer()  # (1)!
vector_store = Chroma.from_documents(
    documents=list(shredder.transform_documents(animals)),
    embedding=OpenAIEmbeddings(),
    collection_name="animals",
)
```
- Since Chroma doesn't index items inside lists for querying, metadata containing lists must be shredded so it can be queried. By default, the `ShreddingTransformer` shreds all keys; it can be configured to shred only the metadata keys used as edge targets (a configuration sketch follows below).
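If you only traverse on a few list-valued keys, restricting the shredder to those keys keeps the stored metadata smaller. The `keys` parameter name below is an assumption; check the `ShreddingTransformer` API reference for the exact option:

```python
from langchain_graph_retriever.transformers import ShreddingTransformer

# Shred only the list-valued metadata key used as an edge target
# (the `keys` parameter name is an assumption; consult the API reference).
shredder = ShreddingTransformer(keys={"keywords"})
```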
Simple Traversal¶
For our first retrieval and graph traversal, we start with the single animal that best matches the query, and then traverse to other animals with the same `habitat` and/or `origin`.
Apache Cassandra:

```python
from graph_retriever.strategies import Eager
from langchain_graph_retriever import GraphRetriever
from langchain_graph_retriever.adapters.cassandra import CassandraAdapter

simple = GraphRetriever(
    store=CassandraAdapter(vector_store, shredder, {"keywords"}),
    edges=[("habitat", "habitat"), ("origin", "origin"), ("keywords", "keywords")],
    strategy=Eager(k=10, start_k=1, max_depth=2),
)
```
Chroma:

```python
from graph_retriever.strategies import Eager
from langchain_graph_retriever import GraphRetriever
from langchain_graph_retriever.adapters.chroma import ChromaAdapter

simple = GraphRetriever(
    store=ChromaAdapter(vector_store, shredder, {"keywords"}),
    edges=[("habitat", "habitat"), ("origin", "origin"), ("keywords", "keywords")],
    strategy=Eager(k=10, start_k=1, max_depth=2),
)
```
Shredding
The above code is exactly the same for all stores; however, adapters for shredded stores (Chroma and Apache Cassandra) require extra configuration to specify which metadata fields need to be rewritten when issuing queries.
The above creates a graph-traversing retriever that starts with the nearest animal (`start_k=1`), retrieves 10 documents (`k=10`), and limits the search to documents at most 2 steps away from the first animal (`max_depth=2`).
The edges define how metadata values can be used for traversal. In this case, every animal is connected to other animals with the same habitat and/or the same origin, as well as to animals sharing any of the same keywords.
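Conceptually, an edge such as `("habitat", "habitat")` connects two documents whenever they share a value for that metadata key; for a list-valued key such as `keywords`, any overlapping element is enough. The following sketch illustrates that matching rule only; it is not the library's implementation:

```python
def would_connect(doc_a_meta: dict, doc_b_meta: dict, edge: tuple[str, str]) -> bool:
    """Simplified illustration: do two documents share a value for this edge?"""
    source_key, target_key = edge
    a_values = doc_a_meta.get(source_key)
    b_values = doc_b_meta.get(target_key)
    if a_values is None or b_values is None:
        return False
    # Treat scalar values as single-element collections so lists and
    # scalars can be compared uniformly.
    a_set = set(a_values) if isinstance(a_values, list) else {a_values}
    b_set = set(b_values) if isinstance(b_values, list) else {b_values}
    return bool(a_set & b_set)
```

Based on the example entries above, alpaca and cassowary share no habitat, origin, or keywords, so no edge connects them, while any document that also lists "arctic" in its keywords would connect to caribou.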
```python
simple_results = simple.invoke("what mammals could be found near a capybara")

for doc in simple_results:
    print(f"{doc.id}: {doc.page_content}")
```
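Because `simple` is a regular LangChain retriever, it can also be dropped into a retrieval chain. A minimal LCEL sketch, assuming an OpenAI chat model is available (the model name and prompt are illustrative, not part of this guide's setup):

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the provided context.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)


def format_docs(docs):
    # Join the retrieved animal descriptions into a single context string.
    return "\n\n".join(f"{doc.id}: {doc.page_content}" for doc in docs)


chain = (
    {"context": simple | format_docs, "question": RunnablePassthrough()}
    | prompt
    | ChatOpenAI(model="gpt-4o-mini")  # illustrative model choice
    | StrOutputParser()
)

print(chain.invoke("what mammals could be found near a capybara"))
```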
Visualizing¶
`langchain-graph-retriever` includes code for converting the document graph into a `networkx` graph, for rendering and other analysis.
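As a rough illustration of what such a document graph contains, the retrieved documents can also be assembled into a `networkx` graph by hand, linking documents that share edge values (a conceptual sketch, not the library's helper; rendering requires matplotlib):

```python
import networkx as nx

# Build a small graph by hand from the traversal results: one node per
# document, one edge whenever two documents share a habitat or origin.
graph = nx.Graph()
for doc in simple_results:
    graph.add_node(doc.id, **doc.metadata)

for i, a in enumerate(simple_results):
    for b in simple_results[i + 1:]:
        for key in ("habitat", "origin"):
            if key in a.metadata and a.metadata.get(key) == b.metadata.get(key):
                graph.add_edge(a.id, b.id, key=key)

nx.draw(graph, with_labels=True)
```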