Recently, I have been exploring Retrieval-Augmented Generation (RAG) and building a chatbot for fun. I became interested in RAG because I use AI every day, especially generative AI such as large language models (LLMs). LLMs are very helpful and often provide impressive responses.
However, LLM-generated responses have a challenge: they need to be grounded in factual information. When you use an LLM to generate a response, it relies only on the data on which it was trained.
For example, when you ask, "What food should I eat when I do X?", the result will likely be an impressive, grammatically correct, and logical response, but because it isn't based on factual or contextual data, it may be inaccurate.
In contrast, you can use data or knowledge that provides relevant, factual context to generate a contextualized, relevant, and accurate response.
That data or knowledge can be any document with relevant information. For example, you could use data about the food you eat every day to help answer the prompt "What food should I eat when I do X?" so that the response is grounded in relevant information.
To ensure that an LLM provides accurate and domain-specific responses, you can use Retrieval-Augmented Generation (RAG).
RAG is a technique for optimizing the output of a large language model by retrieving information that is relevant to the user's prompt.
RAG has three steps:
- Retrieve data based on the user prompt.
- Augment the prompt with knowledge data.
- Use a language model to generate a response.
By retrieving context from specified data, you ensure that the LLM uses relevant information when responding, rather than relying solely on its training data.
A standard RAG app usually has two main parts:
- Indexing: a process that takes data from a source, organizes it, and stores it in an index.
- Retrieval and generation: the RAG workflow itself, where a user’s query is received, the system looks up the most relevant information from the index, and then sends that context to the model to generate a response.
In this exploration, I will build a simple chatbot that can generate answers based on a provided document using the following tools:
- LangChain
- Bedrock Embeddings
- Amazon Titan embedding model
- Amazon Nova as the chat model
- Chroma as the vector DB
So, we will start by indexing. Let's go!
Setup
First, we will set up the tools that we will use. Install these packages:
pip install -qU "langchain[aws]" langchain-aws langchain-chroma PyMuPDF langchain-text-splitters langchain-community langgraph
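Since the embedding and chat models run on Amazon Bedrock, you also need AWS credentials and a region configured (for example with aws configure or environment variables). As an optional sanity check, something like the sketch below can confirm that your credentials can reach Bedrock; it assumes us-east-1 is your Bedrock region, so adjust as needed.

```python
# Optional sanity check: list a few Bedrock foundation models.
# Assumes AWS credentials are already configured and that us-east-1
# is the region you use for Bedrock (adjust if needed).
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")
models = bedrock.list_foundation_models()["modelSummaries"]
print([m["modelId"] for m in models[:5]])
```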
Indexing
After we set up the project, the next step is indexing. Indexing data has these parts:
- Load: Start by loading data.
- Split: Break big documents into smaller pieces.
- Store: Save those in a storage system that supports indexing and searching. This is often done with a Vector Store combined with an embeddings model.
Write the following code.
```python
embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v2:0")

vector_store = Chroma(
    collection_name=COLLECTION,
    embedding_function=embeddings,
    persist_directory=DB_DIR,
)

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)


def index_documents():
    print("📂 Loading and indexing documents...")
    loader = PyMuPDFLoader(PDF_PATH, mode="page")
    docs = loader.load()
    chunks = splitter.split_documents(docs)
    vector_store.add_documents(chunks)
    vector_store.persist()
    print(f"✅ Indexed {len(chunks)} chunks into {DB_DIR}")
```
First, we set up embeddings using Amazon’s Titan model. Embeddings are a way to turn text into numbers so the computer can understand and compare meaning.
Then, we create a vector database with Chroma that can store and search those embeddings. We also set up a text splitter so large documents can be cut into smaller, manageable chunks of around 1,000 characters each, with a bit of overlap to keep context intact.
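If you are curious what these two building blocks actually produce, here is a small optional illustration that reuses the embeddings and splitter objects defined above:

```python
# An embedding is just a list of floats that represents the meaning of the text.
vector = embeddings.embed_query("What food should I eat when I do X?")
print(len(vector), vector[:3])

# The splitter cuts long text into overlapping chunks of at most 1,000 characters.
sample_chunks = splitter.split_text("lorem ipsum dolor sit amet " * 200)
print(len(sample_chunks), len(sample_chunks[0]))
```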
The function index_documents() creates a loader that reads a PDF file (located at PDF_PATH). The mode="page" argument means it treats each page of the PDF as a separate unit.
Then, it splits the documents into smaller parts (chunking) and saves those chunks into the vector database.
At the end, you’ll have your document neatly indexed, so later, when a user asks a question, the system can quickly find the right pieces of text to feed into the AI model.
Retrieval and Generation
Now, it’s time to build the core application. The goal is to make a simple app that takes a user’s question, finds the most relevant documents, sends both the question and those documents to a model, and then gives back an answer.
For generation, we will use Amazon Nova as the chat model. Write the following code.
llm = init_chat_model("us.amazon.nova-lite-v1:0", model_provider="bedrock_converse")
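Before wiring up the pipeline, you can optionally send a single message to make sure the model is reachable:

```python
# Optional: a one-off call to verify that the Nova chat model responds.
print(llm.invoke("Say hello in one short sentence.").content)
```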
Then, write the RAG pipeline. We will start by defining the state of the application. Write this code.
```python
from langchain_core.documents import Document
from typing_extensions import List, TypedDict


class State(TypedDict):
    question: str
    context: List[Document]
    answer: str
```
We will keep track of the question, the context (docs we retrieve), and the answer.
Next, write this function.
```python
def retrieve(state: State):
    retrieved_docs = vector_store.similarity_search(state["question"], k=4)
    return {"context": retrieved_docs}
```
When a user asks a question, this part looks inside the vector database and pulls out the top 4 most relevant documents. Those docs will become the context for the model to use.
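If you want to see what the retriever returns before plugging it into the graph, you can call the same search directly (this assumes you have already run the indexing step):

```python
# Optional: inspect the raw retrieval results outside the graph.
docs = vector_store.similarity_search("What food should I eat when I do X?", k=4)
for doc in docs:
    print(doc.metadata.get("page"), doc.page_content[:80])
```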
After that, write the system prompt and create the generate function.
```python
SYSTEM_PROMPT = "Your system prompt"


def generate(state: State):
    if not state["context"]:
        return {"answer": "Sorry, I can't answer that."}

    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    user_payload = f"""Context:
{docs_content}

Question: {state['question']}

Answer:"""

    messages = [
        SystemMessage(content=SYSTEM_PROMPT),
        HumanMessage(content=user_payload),
    ]
    response = llm.invoke(messages)
    return {"answer": response.content}
```
Lastly, we compile our application into a single graph object. In this case, we are just connecting the retrieval and generation steps into a single sequence.
```python
graph_builder = StateGraph(State)
graph_builder.add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()
```
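At this point the graph can already be invoked on its own: you pass in the initial state with the question, and you get back the final state with the context and answer filled in. For example:

```python
# Invoke the compiled graph; the returned state contains the retrieved
# context and the generated answer.
result = graph.invoke({"question": "What food should I eat when I do X?"})
print(result["answer"])
```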
This is the complete code.
```python
import argparse
import os
import textwrap

import langchain
from typing_extensions import List, TypedDict
from langchain_aws import BedrockEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.documents import Document
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_chroma import Chroma
from langchain.chat_models import init_chat_model
from langchain.schema import SystemMessage, HumanMessage
from langgraph.graph import START, StateGraph

# ===================================================================
# Global Config
# ===================================================================
langchain.verbose = False
langchain.debug = False
langchain.llm_cache = None

DB_DIR = "./chroma_langchain_db"
COLLECTION = "example_collection"
PDF_PATH = "docs/ilovepdf_merged.pdf"

embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v2:0")
vector_store = Chroma(
    collection_name=COLLECTION,
    embedding_function=embeddings,
    persist_directory=DB_DIR,
)
llm = init_chat_model("us.amazon.nova-lite-v1:0", model_provider="bedrock_converse")
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)

# ===================================================================
# Indexing Function
# ===================================================================
def index_documents():
    print("📂 Loading and indexing documents...")
    loader = PyMuPDFLoader(PDF_PATH, mode="page")
    docs = loader.load()
    chunks = splitter.split_documents(docs)
    vector_store.add_documents(chunks)
    vector_store.persist()
    print(f"✅ Indexed {len(chunks)} chunks into {DB_DIR}")

# ===================================================================
# RAG Pipeline
# ===================================================================
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str


def retrieve(state: State):
    retrieved_docs = vector_store.similarity_search(state["question"], k=4)
    return {"context": retrieved_docs}


SYSTEM_PROMPT = "Your system prompt"


def generate(state: State):
    if not state["context"]:
        return {"answer": "Sorry, I can't answer that."}

    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    user_payload = f"""Context:
{docs_content}

Question: {state['question']}

Answer:"""

    messages = [
        SystemMessage(content=SYSTEM_PROMPT),
        HumanMessage(content=user_payload),
    ]
    response = llm.invoke(messages)
    return {"answer": response.content}


graph_builder = StateGraph(State)
graph_builder.add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()


def query_pipeline(question: str):
    response = graph.invoke({"question": question})
    print(f"Question: {question}\nAnswer: {response['answer']}")

# ===================================================================
# CLI Entrypoint
# ===================================================================
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--index", action="store_true", help="Index documents into vector DB")
    parser.add_argument("--query", type=str, help="Run a query against the DB")
    args = parser.parse_args()

    if args.index:
        index_documents()
    elif args.query:
        query_pipeline(args.query)
    else:
        parser.print_help()
```
You can test it by giving a prompt and asking questions related to the information in the documents.
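For example, assuming you save the complete script as rag_app.py (any filename works), you can build the index with python rag_app.py --index and then ask a question with python rag_app.py --query "What food should I eat when I do X?". The same functions can also be called from another Python file or a REPL:

```python
# Minimal usage sketch, assuming the script above was saved as rag_app.py
# (a filename chosen here just for illustration).
from rag_app import index_documents, query_pipeline

index_documents()  # run once to build the Chroma index from the PDF
query_pipeline("What food should I eat when I do X?")  # ask about the document
```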
If you encounter an error, make sure you have access to Amazon Bedrock and to the Amazon Titan and Nova models.
Thanks for reading.
Want to connect? Arsy Opraza, Curriculum Developer at Dicoding. https://www.linkedin.com/in/arasopraza/ https://github.com/arasopraza https://twitter.com/arsyopraza