Alain Airom

🍥 Hands-on Experience with LightRAG

I tested LightRAG with my local Ollama

Image from LightRAG Repository

Introduction

What is LightRAG

LightRAG is an open-source framework designed to build sophisticated and modular Retrieval-Augmented Generation (RAG) pipelines quickly and efficiently.

Image from LightRAG Repository

I had been meaning to set aside some time to try this solution for a while. Why? Because when I find repositories on the net, I get curious to figure out whether I can make them work with tests I set up from scratch.

Test Phase

Running LightRAG is quite straightforward; just do the following:

pip install "lightrag-hku[api]" cp env.example .env # ---> I'll come back to this later! very rich, lots of parametes lightrag-server 
LightRAG log file: /Users/alainairom/Devs/lightrag-ollama/lightrag.log WARNING:root:>> Forcing workers=1 in uvicorn mode(Ignoring workers=2) ╔══════════════════════════════════════════════════════════════╗ ║ LightRAG Server vv1.4.9.4/0244 ║ ║ Fast, Lightweight RAG Server Implementation ║ ╚══════════════════════════════════════════════════════════════╝ 📡 Server Configuration: ├─ Host: 0.0.0.0 ├─ Port: 9621 ├─ Workers: 1 ├─ Timeout: 300 ├─ CORS Origins: * ├─ SSL Enabled: False ├─ Ollama Emulating Model: lightrag:latest ├─ Log Level: INFO ├─ Verbose Debug: False ├─ API Key: Not Set └─ JWT Auth: Disabled 📂 Directory Configuration: ├─ Working Directory: /Users/alainairom/Devs/lightrag-ollama/rag_storage └─ Input Directory: /Users/alainairom/Devs/lightrag-ollama/inputs 🤖 LLM Configuration: ├─ Binding: ollama ├─ Host: http://localhost:11434 ├─ Model: granite4:latest ├─ Max Async for LLM: 4 ├─ Summary Context Size: 12000 ├─ LLM Cache Enabled: True └─ LLM Cache for Extraction Enabled: True 📊 Embedding Configuration: ├─ Binding: ollama ├─ Host: http://localhost:11434 ├─ Model: granite-embedding:latest └─ Dimensions: 1024 ⚙️ RAG Configuration: ├─ Summary Language: English ├─ Entity Types: ['Person', 'Creature', 'Organization', 'Location', 'Event', 'Concept', 'Method', 'Content', 'Data', 'Artifact', 'NaturalObject'] ├─ Max Parallel Insert: 2 ├─ Chunk Size: 1200 ├─ Chunk Overlap Size: 100 ├─ Cosine Threshold: 0.2 ├─ Top-K: 40 └─ Force LLM Summary on Merge: 8 💾 Storage Configuration: ├─ KV Storage: JsonKVStorage ├─ Vector Storage: NanoVectorDBStorage ├─ Graph Storage: NetworkXStorage ├─ Document Status Storage: JsonDocStatusStorage └─ Workspace: - ✨ Server starting up... 🌐 Server Access Information: ├─ WebUI (local): http://localhost:9621 ├─ Remote Access: http://<your-ip-address>:9621 ├─ API Documentation (local): http://localhost:9621/docs └─ Alternative Documentation (local): http://localhost:9621/redoc 📝 Note: Since the server is running on 0.0.0.0: - Use 'localhost' or '127.0.0.1' for local access - Use your machine's IP address for remote access - To find your IP address: • Windows: Run 'ipconfig' in terminal • Linux/Mac: Run 'ifconfig' or 'ip addr' in terminal INFO: Ollama LLM Options: {'num_ctx': 32768} INFO: Ollama Embedding Options: {'num_ctx': 8192} INFO: Reranking is disabled INFO: [_] Created new empty graph file: /Users/alainairom/Devs/lightrag-ollama/rag_storage/graph_chunk_entity_relation.graphml Starting Uvicorn server in single-process mode on 0.0.0.0:9621 INFO: Started server process [86847] INFO: Waiting for application startup. INFO: [_] Process 86847 KV load full_docs with 2 records INFO: [_] Process 86847 KV load text_chunks with 0 records INFO: [_] Process 86847 KV load full_entities with 0 records INFO: [_] Process 86847 KV load full_relations with 0 records INFO: [_] Process 86847 KV load entity_chunks with 0 records INFO: [_] Process 86847 KV load relation_chunks with 0 records INFO: [_] Process 86847 KV load llm_response_cache with 0 records INFO: [_] Process 86847 doc status load doc_status with 2 records Server is ready to accept connections! 🚀 

Running the server command gives you access to a nice React-based UI at http://0.0.0.0:9621/webui/.
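The server also exposes a REST API, with interactive documentation at /docs as shown in the startup log. Below is a quick way to smoke-test it from Python; note that the endpoint path and payload field names are assumptions on my side and should be verified against the /docs page of your installed version.

import requests

# Hypothetical smoke test against the running LightRAG server.
# The "/query" path and the JSON field names are assumptions; check
# http://localhost:9621/docs for the exact schema of your version.
resp = requests.post(
    "http://localhost:9621/query",
    json={"query": "What are the top themes in this story?", "mode": "hybrid"},
    timeout=300,
)
resp.raise_for_status()
print(resp.json())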

The UI

The UI is quite nice; however, I didn’t really use it, because I was focused on coding!

Once the environment is ready, you can try running the provided sample (an excerpt from the README):

import os
import asyncio
from lightrag import LightRAG, QueryParam
from lightrag.llm.openai import gpt_4o_mini_complete, gpt_4o_complete, openai_embed
from lightrag.kg.shared_storage import initialize_pipeline_status
from lightrag.utils import setup_logger

setup_logger("lightrag", level="INFO")

WORKING_DIR = "./rag_storage"
if not os.path.exists(WORKING_DIR):
    os.mkdir(WORKING_DIR)


async def initialize_rag():
    rag = LightRAG(
        working_dir=WORKING_DIR,
        embedding_func=openai_embed,
        llm_model_func=gpt_4o_mini_complete,
    )
    # IMPORTANT: Both initialization calls are required!
    await rag.initialize_storages()      # Initialize storage backends
    await initialize_pipeline_status()   # Initialize processing pipeline
    return rag


async def main():
    rag = None  # ensure 'rag' exists for the finally block even if initialization fails
    try:
        # Initialize RAG instance
        rag = await initialize_rag()
        await rag.ainsert("Your text")

        # Perform hybrid search
        mode = "hybrid"
        print(
            await rag.aquery(
                "What are the top themes in this story?",
                param=QueryParam(mode=mode)
            )
        )
    except Exception as e:
        print(f"An error occurred: {e}")
    finally:
        if rag:
            await rag.finalize_storages()


if __name__ == "__main__":
    asyncio.run(main())
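The README sample above targets OpenAI. Since I wanted to stay fully local, the same skeleton can be pointed at Ollama with the helpers shipped in lightrag.llm.ollama. This is only a sketch based on the project's Ollama example; the exact parameter names (llm_model_name, host, num_ctx, the embedding dimension) should be double-checked against the version you install.

import asyncio
from lightrag import LightRAG, QueryParam
from lightrag.llm.ollama import ollama_model_complete, ollama_embed
from lightrag.utils import EmbeddingFunc
from lightrag.kg.shared_storage import initialize_pipeline_status


async def initialize_rag():
    rag = LightRAG(
        working_dir="./rag_storage",
        llm_model_func=ollama_model_complete,
        llm_model_name="granite4:latest",
        llm_model_kwargs={"host": "http://localhost:11434", "options": {"num_ctx": 32768}},
        embedding_func=EmbeddingFunc(
            # Set this to the dimension your embedding model actually returns
            # (the .env shown later uses EMBEDDING_DIM=1024 for granite-embedding).
            embedding_dim=1024,
            max_token_size=8192,
            func=lambda texts: ollama_embed(
                texts, embed_model="granite-embedding:latest", host="http://localhost:11434"
            ),
        ),
    )
    await rag.initialize_storages()      # storage backends
    await initialize_pipeline_status()   # processing pipeline
    return rag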

Among the provided examples there is also this one using Gemini, which I picked because it runs easily ☺️:

# pip install -q -U google-genai to use gemini as a client
import os
import numpy as np
from google import genai
from google.genai import types
from dotenv import load_dotenv
from lightrag.utils import EmbeddingFunc
from lightrag import LightRAG, QueryParam
from sentence_transformers import SentenceTransformer
from lightrag.kg.shared_storage import initialize_pipeline_status
import asyncio
import nest_asyncio

# Apply nest_asyncio to solve event loop issues
nest_asyncio.apply()

load_dotenv()
gemini_api_key = os.getenv("GEMINI_API_KEY")

WORKING_DIR = "./dickens"

if os.path.exists(WORKING_DIR):
    import shutil
    shutil.rmtree(WORKING_DIR)

os.mkdir(WORKING_DIR)


async def llm_model_func(
    prompt, system_prompt=None, history_messages=[], keyword_extraction=False, **kwargs
) -> str:
    # 1. Initialize the GenAI Client with your Gemini API Key
    client = genai.Client(api_key=gemini_api_key)

    # 2. Combine prompts: system prompt, history, and user prompt
    if history_messages is None:
        history_messages = []

    combined_prompt = ""
    if system_prompt:
        combined_prompt += f"{system_prompt}\n"

    for msg in history_messages:
        # Each msg is expected to be a dict: {"role": "...", "content": "..."}
        combined_prompt += f"{msg['role']}: {msg['content']}\n"

    # Finally, add the new user prompt
    combined_prompt += f"user: {prompt}"

    # 3. Call the Gemini model
    response = client.models.generate_content(
        model="gemini-1.5-flash",
        contents=[combined_prompt],
        config=types.GenerateContentConfig(max_output_tokens=500, temperature=0.1),
    )

    # 4. Return the response text
    return response.text


async def embedding_func(texts: list[str]) -> np.ndarray:
    model = SentenceTransformer("all-MiniLM-L6-v2")
    embeddings = model.encode(texts, convert_to_numpy=True)
    return embeddings


async def initialize_rag():
    rag = LightRAG(
        working_dir=WORKING_DIR,
        llm_model_func=llm_model_func,
        embedding_func=EmbeddingFunc(
            embedding_dim=384,
            max_token_size=8192,
            func=embedding_func,
        ),
    )
    await rag.initialize_storages()
    await initialize_pipeline_status()
    return rag


def main():
    # Initialize RAG instance
    rag = asyncio.run(initialize_rag())

    file_path = "story.txt"
    with open(file_path, "r") as file:
        text = file.read()

    rag.insert(text)

    response = rag.query(
        query="What is the main theme of the story?",
        param=QueryParam(mode="hybrid", top_k=5, response_type="single line"),
    )
    print(response)


if __name__ == "__main__":
    main()
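One detail worth flagging in this sample: embedding_dim=384 is not arbitrary, it matches what all-MiniLM-L6-v2 actually produces. If you swap the embedding model, a quick check like the one below avoids a dimension mismatch later.

from sentence_transformers import SentenceTransformer

# all-MiniLM-L6-v2 reports 384, which is why the example sets embedding_dim=384.
model = SentenceTransformer("all-MiniLM-L6-v2")
print(model.get_sentence_embedding_dimension())  # expected: 384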

My Tests

For my tests, as usual, I used Ollama to gain better control over the pipeline and to directly access the models available on my system.

ollama list
NAME                        ID              SIZE      MODIFIED
bge-m3:latest               790764642607    1.2 GB    12 hours ago
granite3.3:latest           fd429f23b909    4.9 GB    44 hours ago
llama3.2-vision:latest      6f2f9757ae97    7.8 GB    2 days ago
granite3.2-vision:latest    3be41a661804    2.4 GB    2 days ago
mxbai-embed-large:latest    468836162de7    669 MB    3 days ago
all-minilm:latest           1b226e2802db    45 MB     3 days ago
embeddinggemma:latest       85462619ee72    621 MB    3 days ago
granite-embedding:latest    eb4c533ba6f7    62 MB     7 days ago
qwen3-vl:235b-cloud         7fc468f95411    -         10 days ago
granite4:micro-h            ba791654cc27    1.9 GB    3 weeks ago
granite4:latest             4235724a127c    2.1 GB    3 weeks ago
granite-embedding:278m      1a37926bf842    562 MB    3 weeks ago
nomic-embed-text:latest     0a109f422b47    274 MB    2 months ago
tinyllama:latest            2644915ede35    637 MB    3 months ago
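Before wiring LightRAG to these models, it does not hurt to verify that the ones referenced in the configuration are actually pulled. A small helper using the official ollama Python package could do that automatically; the shape of list()'s response has changed between package versions, hence the crude string match below.

import ollama

REQUIRED = ["granite4:latest", "granite-embedding:latest"]

# ollama.list() returns the locally available models; its response shape varies
# across versions of the package, so fall back to a plain string match here.
local_models = str(ollama.list())
for name in REQUIRED:
    if name not in local_models:
        print(f"Pulling {name} ...")
        ollama.pull(name)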

The “.env” file should be adapted to one’s own set of tools, so here is what I set up for Ollama (it is really rich) ⬇️

### This is sample file of .env ########################### ### Server Configuration ########################### HOST=0.0.0.0 PORT=9621 WEBUI_TITLE='My Graph KB' WEBUI_DESCRIPTION="Simple and Fast Graph Based RAG System" # WORKERS=2 ### gunicorn worker timeout(as default LLM request timeout if LLM_TIMEOUT is not set) # TIMEOUT=150 # CORS_ORIGINS=http://localhost:3000,http://localhost:8080 ### Optional SSL Configuration # SSL=true # SSL_CERTFILE=/path/to/cert.pem # SSL_KEYFILE=/path/to/key.pem ### Directory Configuration (defaults to current working directory) ### Default value is ./inputs and ./rag_storage # INPUT_DIR=<absolute_path_for_doc_input_dir> # WORKING_DIR=<absolute_path_for_working_dir> ### Tiktoken cache directory (Store cached files in this folder for offline deployment) # TIKTOKEN_CACHE_DIR=/app/data/tiktoken ### Ollama Emulating Model and Tag # OLLAMA_EMULATING_MODEL_NAME=lightrag OLLAMA_EMULATING_MODEL_TAG=latest ### Max nodes return from graph retrieval in webui # MAX_GRAPH_NODES=1000 ### Logging level # LOG_LEVEL=INFO # VERBOSE=False # LOG_MAX_BYTES=10485760 # LOG_BACKUP_COUNT=5 ### Logfile location (defaults to current working directory) # LOG_DIR=/path/to/log/directory ##################################### ### Login and API-Key Configuration ##################################### # AUTH_ACCOUNTS='admin:admin123,user1:pass456' # TOKEN_SECRET=Your-Key-For-LightRAG-API-Server # TOKEN_EXPIRE_HOURS=48 # GUEST_TOKEN_EXPIRE_HOURS=24 # JWT_ALGORITHM=HS256 ### API-Key to access LightRAG Server API # LIGHTRAG_API_KEY=your-secure-api-key-here # WHITELIST_PATHS=/health,/api/* ###################################################################################### ### Query Configuration ### ### How to control the context length sent to LLM: ### MAX_ENTITY_TOKENS + MAX_RELATION_TOKENS < MAX_TOTAL_TOKENS ### Chunk_Tokens = MAX_TOTAL_TOKENS - Actual_Entity_Tokens - Actual_Relation_Tokens ###################################################################################### # LLM response cache for query (Not valid for streaming response) ENABLE_LLM_CACHE=true # COSINE_THRESHOLD=0.2 ### Number of entities or relations retrieved from KG # TOP_K=40 ### Maximum number or chunks for naive vector search # CHUNK_TOP_K=20 ### control the actual entities send to LLM # MAX_ENTITY_TOKENS=6000 ### control the actual relations send to LLM # MAX_RELATION_TOKENS=8000 ### control the maximum tokens send to LLM (include entities, relations and chunks) # MAX_TOTAL_TOKENS=30000 ### chunk selection strategies ### VECTOR: Pick KG chunks by vector similarity, delivered chunks to the LLM aligning more closely with naive retrieval ### WEIGHT: Pick KG chunks by entity and chunk weight, delivered more solely KG related chunks to the LLM ### If reranking is enabled, the impact of chunk selection strategies will be diminished. 
# KG_CHUNK_PICK_METHOD=VECTOR ######################################################### ### Reranking configuration ### RERANK_BINDING type: null, cohere, jina, aliyun ### For rerank model deployed by vLLM use cohere binding ######################################################### RERANK_BINDING=null ### Enable rerank by default in query params when RERANK_BINDING is not null # RERANK_BY_DEFAULT=True ### rerank score chunk filter(set to 0.0 to keep all chunks, 0.6 or above if LLM is not strong enough) # MIN_RERANK_SCORE=0.0 ### For local deployment with vLLM # RERANK_MODEL=BAAI/bge-reranker-v2-m3 # RERANK_BINDING_HOST=http://localhost:8000/v1/rerank # RERANK_BINDING_API_KEY=your_rerank_api_key_here ### Default value for Cohere AI # RERANK_MODEL=rerank-v3.5 # RERANK_BINDING_HOST=https://api.cohere.com/v2/rerank # RERANK_BINDING_API_KEY=your_rerank_api_key_here ### Default value for Jina AI # RERANK_MODEL=jina-reranker-v2-base-multilingual # RERANK_BINDING_HOST=https://api.jina.ai/v1/rerank # RERANK_BINDING_API_KEY=your_rerank_api_key_here ### Default value for Aliyun # RERANK_MODEL=gte-rerank-v2 # RERANK_BINDING_HOST=https://dashscope.aliyuncs.com/api/v1/services/rerank/text-rerank/text-rerank # RERANK_BINDING_API_KEY=your_rerank_api_key_here ######################################## ### Document processing configuration ######################################## ENABLE_LLM_CACHE_FOR_EXTRACT=true ### Document processing output language: English, Chinese, French, German ... SUMMARY_LANGUAGE=English ### Entity types that the LLM will attempt to recognize # ENTITY_TYPES='["Person", "Creature", "Organization", "Location", "Event", "Concept", "Method", "Content", "Data", "Artifact", "NaturalObject"]' ### Chunk size for document splitting, 500~1500 is recommended # CHUNK_SIZE=1200 # CHUNK_OVERLAP_SIZE=100 ### Number of summary segments or tokens to trigger LLM summary on entity/relation merge (at least 3 is recommended) # FORCE_LLM_SUMMARY_ON_MERGE=8 ### Max description token size to trigger LLM summary # SUMMARY_MAX_TOKENS = 1200 ### Recommended LLM summary output length in tokens # SUMMARY_LENGTH_RECOMMENDED_=600 ### Maximum context size sent to LLM for description summary # SUMMARY_CONTEXT_SIZE=12000 ### control the maximum chunk_ids stored in vector and graph db # MAX_SOURCE_IDS_PER_ENTITY=300 # MAX_SOURCE_IDS_PER_RELATION=300 ### control chunk_ids limitation method: FIFO, KEEP ### FIFO: First in first out ### KEEP: Keep oldest (less merge action and faster) # SOURCE_IDS_LIMIT_METHOD=FIFO # Maximum number of file paths stored in entity/relation file_path field (For displayed only, does not affect query performance) # MAX_FILE_PATHS=100 ### maximum number of related chunks per source entity or relation ### The chunk picker uses this value to determine the total number of chunks selected from KG(knowledge graph) ### Higher values increase re-ranking time # RELATED_CHUNK_NUMBER=5 ############################### ### Concurrency Configuration ############################### ### Max concurrency requests of LLM (for both query and document processing) MAX_ASYNC=4 ### Number of parallel processing documents(between 2~10, MAX_ASYNC/3 is recommended) MAX_PARALLEL_INSERT=2 ### Max concurrency requests for Embedding # EMBEDDING_FUNC_MAX_ASYNC=8 ### Num of chunks send to Embedding in single request # EMBEDDING_BATCH_NUM=10 ########################################################### ### LLM Configuration ### LLM_BINDING type: openai, ollama, lollms, azure_openai, aws_bedrock 
########################################################### ### LLM request timeout setting for all llm (0 means no timeout for Ollma) # LLM_TIMEOUT=180 LLM_BINDING=ollama LLM_MODEL=granite4:latest LLM_BINDING_HOST=http://localhost:11434 #LLM_BINDING_API_KEY=your_api_key ### Optional for Azure # AZURE_OPENAI_API_VERSION=2024-08-01-preview # AZURE_OPENAI_DEPLOYMENT=gpt-4o ### Openrouter example # LLM_MODEL=google/gemini-2.5-flash # LLM_BINDING_HOST=https://openrouter.ai/api/v1 # LLM_BINDING_API_KEY=your_api_key # LLM_BINDING=openai ### OpenAI Compatible API Specific Parameters ### Increased temperature values may mitigate infinite inference loops in certain LLM, such as Qwen3-30B. # OPENAI_LLM_TEMPERATURE=0.9 ### Set the max_tokens to mitigate endless output of some LLM (less than LLM_TIMEOUT * llm_output_tokens/second, i.e. 9000 = 180s * 50 tokens/s) ### Typically, max_tokens does not include prompt content, though some models, such as Gemini Models, are exceptions ### For vLLM/SGLang deployed models, or most of OpenAI compatible API provider # OPENAI_LLM_MAX_TOKENS=9000 ### For OpenAI o1-mini or newer modles #OPENAI_LLM_MAX_COMPLETION_TOKENS=9000 #### OpenAI's new API utilizes max_completion_tokens instead of max_tokens # OPENAI_LLM_MAX_COMPLETION_TOKENS=9000 ### use the following command to see all support options for OpenAI, azure_openai or OpenRouter ### lightrag-server --llm-binding openai --help ### OpenAI Specific Parameters # OPENAI_LLM_REASONING_EFFORT=minimal ### OpenRouter Specific Parameters # OPENAI_LLM_EXTRA_BODY='{"reasoning": {"enabled": false}}' ### Qwen3 Specific Parameters deploy by vLLM # OPENAI_LLM_EXTRA_BODY='{"chat_template_kwargs": {"enable_thinking": false}}' ### use the following command to see all support options for Ollama LLM ### If LightRAG deployed in Docker uses host.docker.internal instead of localhost in LLM_BINDING_HOST ### lightrag-server --llm-binding ollama --help ### Ollama Server Specific Parameters ### OLLAMA_LLM_NUM_CTX must be provided, and should at least larger than MAX_TOTAL_TOKENS + 2000 OLLAMA_LLM_NUM_CTX=32768 ### Set the max_output_tokens to mitigate endless output of some LLM (less than LLM_TIMEOUT * llm_output_tokens/second, i.e. 
9000 = 180s * 50 tokens/s) # OLLAMA_LLM_NUM_PREDICT=9000 ### Stop sequences for Ollama LLM # OLLAMA_LLM_STOP='["</s>", "<|EOT|>"]' ### Bedrock Specific Parameters # BEDROCK_LLM_TEMPERATURE=1.0 #################################################################################### ### Embedding Configuration (Should not be changed after the first file processed) ### EMBEDDING_BINDING: ollama, openai, azure_openai, jina, lollms, aws_bedrock #################################################################################### # EMBEDDING_TIMEOUT=30 EMBEDDING_BINDING=ollama EMBEDDING_MODEL=granite-embedding:latest EMBEDDING_DIM=1024 EMBEDDING_BINDING_API_KEY=your_api_key # If LightRAG deployed in Docker uses host.docker.internal instead of localhost EMBEDDING_BINDING_HOST=http://localhost:11434 ### OpenAI compatible (VoyageAI embedding openai compatible) # EMBEDDING_BINDING=openai # EMBEDDING_MODEL=text-embedding-3-large # EMBEDDING_DIM=3072 # EMBEDDING_BINDING_HOST=https://api.openai.com/v1 # EMBEDDING_BINDING_API_KEY=your_api_key ### Optional for Azure # AZURE_EMBEDDING_DEPLOYMENT=text-embedding-3-large # AZURE_EMBEDDING_API_VERSION=2023-05-15 # AZURE_EMBEDDING_ENDPOINT=your_endpoint # AZURE_EMBEDDING_API_KEY=your_api_key ### Jina AI Embedding # EMBEDDING_BINDING=jina # EMBEDDING_BINDING_HOST=https://api.jina.ai/v1/embeddings # EMBEDDING_MODEL=jina-embeddings-v4 # EMBEDDING_DIM=2048 # EMBEDDING_BINDING_API_KEY=your_api_key ### Optional for Ollama embedding OLLAMA_EMBEDDING_NUM_CTX=8192 ### use the following command to see all support options for Ollama embedding ### lightrag-server --embedding-binding ollama --help #################################################################### ### WORKSPACE sets workspace name for all storage types ### for the purpose of isolating data from LightRAG instances. 
### Valid workspace name constraints: a-z, A-Z, 0-9, and _ #################################################################### # WORKSPACE=space1 ############################ ### Data storage selection ############################ ### Default storage (Recommended for small scale deployment) # LIGHTRAG_KV_STORAGE=JsonKVStorage # LIGHTRAG_DOC_STATUS_STORAGE=JsonDocStatusStorage # LIGHTRAG_GRAPH_STORAGE=NetworkXStorage # LIGHTRAG_VECTOR_STORAGE=NanoVectorDBStorage ### Redis Storage (Recommended for production deployment) # LIGHTRAG_KV_STORAGE=RedisKVStorage # LIGHTRAG_DOC_STATUS_STORAGE=RedisDocStatusStorage ### Vector Storage (Recommended for production deployment) # LIGHTRAG_VECTOR_STORAGE=MilvusVectorDBStorage # LIGHTRAG_VECTOR_STORAGE=QdrantVectorDBStorage # LIGHTRAG_VECTOR_STORAGE=FaissVectorDBStorage ### Graph Storage (Recommended for production deployment) # LIGHTRAG_GRAPH_STORAGE=Neo4JStorage # LIGHTRAG_GRAPH_STORAGE=MemgraphStorage ### PostgreSQL # LIGHTRAG_KV_STORAGE=PGKVStorage # LIGHTRAG_DOC_STATUS_STORAGE=PGDocStatusStorage # LIGHTRAG_GRAPH_STORAGE=PGGraphStorage # LIGHTRAG_VECTOR_STORAGE=PGVectorStorage ### MongoDB (Vector storage only available on Atlas Cloud) # LIGHTRAG_KV_STORAGE=MongoKVStorage # LIGHTRAG_DOC_STATUS_STORAGE=MongoDocStatusStorage # LIGHTRAG_GRAPH_STORAGE=MongoGraphStorage # LIGHTRAG_VECTOR_STORAGE=MongoVectorDBStorage ### PostgreSQL Configuration POSTGRES_HOST=localhost POSTGRES_PORT=5432 POSTGRES_USER=your_username POSTGRES_PASSWORD='your_password' POSTGRES_DATABASE=your_database POSTGRES_MAX_CONNECTIONS=12 # POSTGRES_WORKSPACE=forced_workspace_name ### PostgreSQL Vector Storage Configuration ### Vector storage type: HNSW, IVFFlat POSTGRES_VECTOR_INDEX_TYPE=HNSW POSTGRES_HNSW_M=16 POSTGRES_HNSW_EF=200 POSTGRES_IVFFLAT_LISTS=100 ### PostgreSQL Connection Retry Configuration (Network Robustness) ### Number of retry attempts (1-10, default: 3) ### Initial retry backoff in seconds (0.1-5.0, default: 0.5) ### Maximum retry backoff in seconds (backoff-60.0, default: 5.0) ### Connection pool close timeout in seconds (1.0-30.0, default: 5.0) # POSTGRES_CONNECTION_RETRIES=3 # POSTGRES_CONNECTION_RETRY_BACKOFF=0.5 # POSTGRES_CONNECTION_RETRY_BACKOFF_MAX=5.0 # POSTGRES_POOL_CLOSE_TIMEOUT=5.0 ### PostgreSQL SSL Configuration (Optional) # POSTGRES_SSL_MODE=require # POSTGRES_SSL_CERT=/path/to/client-cert.pem # POSTGRES_SSL_KEY=/path/to/client-key.pem # POSTGRES_SSL_ROOT_CERT=/path/to/ca-cert.pem # POSTGRES_SSL_CRL=/path/to/crl.pem ### PostgreSQL Server Settings (for Supabase Supavisor) # Use this to pass extra options to the PostgreSQL connection string. 
# For Supabase, you might need to set it like this: # POSTGRES_SERVER_SETTINGS="options=reference%3D[project-ref]" # Default is 100 set to 0 to disable # POSTGRES_STATEMENT_CACHE_SIZE=100 ### Neo4j Configuration NEO4J_URI=neo4j+s://xxxxxxxx.databases.neo4j.io NEO4J_USERNAME=neo4j NEO4J_PASSWORD='your_password' NEO4J_DATABASE=neo4j NEO4J_MAX_CONNECTION_POOL_SIZE=100 NEO4J_CONNECTION_TIMEOUT=30 NEO4J_CONNECTION_ACQUISITION_TIMEOUT=30 NEO4J_MAX_TRANSACTION_RETRY_TIME=30 NEO4J_MAX_CONNECTION_LIFETIME=300 NEO4J_LIVENESS_CHECK_TIMEOUT=30 NEO4J_KEEP_ALIVE=true # NEO4J_WORKSPACE=forced_workspace_name ### MongoDB Configuration MONGO_URI=mongodb://root:root@localhost:27017/ #MONGO_URI=mongodb+srv://xxxx MONGO_DATABASE=LightRAG # MONGODB_WORKSPACE=forced_workspace_name ### Milvus Configuration MILVUS_URI=http://localhost:19530 MILVUS_DB_NAME=lightrag # MILVUS_USER=root # MILVUS_PASSWORD=your_password # MILVUS_TOKEN=your_token # MILVUS_WORKSPACE=forced_workspace_name ### Qdrant QDRANT_URL=http://localhost:6333 # QDRANT_API_KEY=your-api-key # QDRANT_WORKSPACE=forced_workspace_name ### Redis REDIS_URI=redis://localhost:6379 REDIS_SOCKET_TIMEOUT=30 REDIS_CONNECT_TIMEOUT=10 REDIS_MAX_CONNECTIONS=100 REDIS_RETRY_ATTEMPTS=3 # REDIS_WORKSPACE=forced_workspace_name ### Memgraph Configuration MEMGRAPH_URI=bolt://localhost:7687 MEMGRAPH_USERNAME= MEMGRAPH_PASSWORD= MEMGRAPH_DATABASE=memgraph # MEMGRAPH_WORKSPACE=forced_workspace_name 
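Boiled down, these are the Ollama-related lines from that file that matter for the setup above:

LLM_BINDING=ollama
LLM_MODEL=granite4:latest
LLM_BINDING_HOST=http://localhost:11434
OLLAMA_LLM_NUM_CTX=32768

EMBEDDING_BINDING=ollama
EMBEDDING_MODEL=granite-embedding:latest
EMBEDDING_DIM=1024
EMBEDDING_BINDING_HOST=http://localhost:11434
OLLAMA_EMBEDDING_NUM_CTX=8192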

Based on the Gemini sample, I wrote the code below with some hard-coded text.

# preparing the environment
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install "lightrag-hku[api]"
pip install ollama
import os import asyncio from functools import partial from datetime import datetime from lightrag import LightRAG, QueryParam try: from ollama import AsyncClient except ImportError: print("Warning: The 'ollama' Python package is required. Please run: pip install ollama") class AsyncClient: def __init__(self, host): pass async def chat(self, **kwargs): raise NotImplementedError("ollama package not installed.") from lightrag.llm.ollama import ollama_embed from lightrag.utils import setup_logger, EmbeddingFunc from lightrag.kg.shared_storage import initialize_pipeline_status OLLAMA_BASE_URL = "http://localhost:11434" LLM_MODEL = "granite4:latest" EMBEDDING_MODEL = "granite-embedding:latest" WORKING_DIR = "./rag_storage_ollama" EMBEDDING_DIMENSION = 384 OUTPUT_DIR = "./output" setup_logger("lightrag", level="INFO") if not os.path.exists(WORKING_DIR): os.mkdir(WORKING_DIR) if not os.path.exists(OUTPUT_DIR): os.mkdir(OUTPUT_DIR) async def custom_ollama_llm_complete(prompt: str, system_prompt: str = None, **kwargs): """ A custom function that handles the Ollama client initialization and model/base_url parameters that are injected via functools.partial, while robustly filtering out unwanted internal keywords. """ model = kwargs.pop('model') base_url = kwargs.pop('base_url') client = AsyncClient(host=base_url) messages = [] if system_prompt: messages.append({"role": "system", "content": system_prompt}) messages.append({"role": "user", "content": prompt}) keys_to_filter = { 'host', 'hashing_kv', 'llm_model_name', 'history_messages', 'keyword_extraction', 'enable_cot', 'is_system_prompt_only', 'prompt_config' } cleaned_kwargs = {k: v for k, v in kwargs.items() if k not in keys_to_filter} try: response = await client.chat( model=model, messages=messages, **cleaned_kwargs ) return response['message']['content'] except Exception as e: raise e async def initialize_rag(): """Initializes the LightRAG instance using standard Ollama configuration.""" configured_ollama_complete = partial( custom_ollama_llm_complete, model=LLM_MODEL, base_url=OLLAMA_BASE_URL, ) configured_ollama_embed = partial( ollama_embed, embed_model=EMBEDDING_MODEL, base_url=OLLAMA_BASE_URL ) wrapped_embedding_func = EmbeddingFunc( embedding_dim=EMBEDDING_DIMENSION, func=configured_ollama_embed, ) rag = LightRAG( working_dir=WORKING_DIR, llm_model_func=configured_ollama_complete, embedding_func=wrapped_embedding_func, ) await rag.initialize_storages() await initialize_pipeline_status() return rag async def main(): rag = None query = "How does RAG solve the problem of LLM hallucination and what are its main use cases?" try: print("Checking if required Ollama models are pulled...") # the knowledge source  sample_text = """ The concept of Retrieval-Augmented Generation (RAG) is a critical development in the field of large language models (LLMs). It addresses the 'hallucination' problem by grounding LLM responses in external, verified knowledge sources. Instead of relying solely on the LLM's static training data, RAG first retrieves relevant documents from a knowledge base (often a vector store) and then feeds these documents, alongside the user's query, to the LLM for generation. This two-step process significantly improves the accuracy, relevance, and transparency of the generated output. Popular applications include enterprise search, customer support, and domain-specific QA systems. """ print(f"--- 1. Initializing RAG with Ollama Models ---") rag = await initialize_rag() print(f"\n--- 2. 
Inserting Sample Text ({len(sample_text.split())} words) ---") await rag.ainsert(sample_text) print("Insertion complete. Data is ready for retrieval.") mode = "hybrid" print(f"\n--- 3. Querying the RAG System (Mode: {mode}) ---") print(f"Query: '{query}'") rag_result = await rag.aquery( query, param=QueryParam(mode=mode) ) response_text = None if hasattr(rag_result, 'get_response_text'): response_text = rag_result.get_response_text() elif isinstance(rag_result, str): response_text = rag_result print("\n" + "="*50) print("FINAL RAG RESPONSE") print("="*50) output_content = "" # Prepare string for file output  if response_text and not str(response_text).strip().startswith("Error:"): print(response_text) output_content += f"# RAG Query Result\n\n" output_content += f"## Query\n\n> {query}\n\n" output_content += f"## LLM/Cache Response\n\n{response_text}\n\n" print("\n" + "="*50) print("\n--- Context Retrieved (Sources) ---") output_content += f"## Retrieved Context (Sources)\n\n" if not isinstance(rag_result, str) and rag_result.retriever_output and rag_result.retriever_output.docs: for i, doc in enumerate(rag_result.retriever_output.docs): source_text = doc.text print(f"Source {i+1}: {source_text[:100]}...") output_content += f"### Source {i+1}\n\n" output_content += f"``` {% endraw %} text\n{source_text}\n {% raw %} ```\n" else: print("No context documents were retrieved (or result was a cache hit string).") output_content += "No context documents were retrieved (or result was a cache hit string).\n" else: error_message = "LLM failed to generate a response (Check Ollama logs for details)." print(error_message) output_content += f"# RAG Query Result\n\n## Error\n\n{error_message}\n\n" if response_text: print(f"\nError String from LightRAG: {response_text}") output_content += f"**Error Detail:** {response_text}\n" timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") filename = f"rag_query_output_{timestamp}.md" output_filepath = os.path.join(OUTPUT_DIR, filename) with open(output_filepath, 'w', encoding='utf-8') as f: f.write(output_content) print(f"\n--- Output Written to File ---") print(f"Successfully wrote output to: {output_filepath}") except Exception as e: if "'str' object has no attribute 'retriever_output'" in str(e): print("\n--- ERROR BYPASS: Detected Cache Hit String Result ---") print("The response was successfully retrieved from the cache and written to the output file.") else: # For all other (real) exceptions, print the detailed error block  print("\n" + "="*50) print("AN ERROR OCCURRED DURING RAG PROCESS") print("="*50) print(f"Error: {e}") print(f"Please ensure Ollama is running and accessible at {OLLAMA_BASE_URL}, and the models '{LLM_MODEL}' and '{EMBEDDING_MODEL}' are pulled locally.") print(f"To pull: 'ollama pull {LLM_MODEL}' and 'ollama pull {EMBEDDING_MODEL}'") print("="*50 + "\n") finally: if rag: print("\n--- Finalizing storages ---") await rag.finalize_storages() if __name__ == "__main__": asyncio.run(main()) 
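The script queries in hybrid mode only. LightRAG’s QueryParam supports other retrieval modes besides hybrid (naive, local, global and mix appear in the project’s documentation), so comparing them on the same question is a small extension like the sketch below; treat the mode list as something to re-check against your installed version.

async def compare_modes(rag):
    # Ask the same question in several retrieval modes and print the answers
    # side by side. Assumes `rag` is the initialized LightRAG instance above.
    question = "How does RAG solve the problem of LLM hallucination?"
    for mode in ("naive", "local", "global", "hybrid"):
        answer = await rag.aquery(question, param=QueryParam(mode=mode))
        print(f"--- {mode} ---\n{answer}\n")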

Parameters in the “.env” file are provided to personalize the “input” folder, but I handled the input directly in the code.

Here are the outputs I got, both on the console and in Markdown format (the file output is something I always implement):

 Checking if required Ollama models are pulled... --- 1. Initializing RAG with Ollama Models --- INFO: [_] Loaded graph from ./rag_storage_ollama/graph_chunk_entity_relation.graphml with 6 nodes, 3 edges INFO:nano-vectordb:Load (6, 384) data INFO:nano-vectordb:Init {'embedding_dim': 384, 'metric': 'cosine', 'storage_file': './rag_storage_ollama/vdb_entities.json'} 6 data INFO:nano-vectordb:Load (3, 384) data INFO:nano-vectordb:Init {'embedding_dim': 384, 'metric': 'cosine', 'storage_file': './rag_storage_ollama/vdb_relationships.json'} 3 data INFO:nano-vectordb:Load (1, 384) data INFO:nano-vectordb:Init {'embedding_dim': 384, 'metric': 'cosine', 'storage_file': './rag_storage_ollama/vdb_chunks.json'} 1 data INFO: [_] Process 6534 KV load full_docs with 1 records INFO: [_] Process 6534 KV load text_chunks with 1 records INFO: [_] Process 6534 KV load full_entities with 1 records INFO: [_] Process 6534 KV load full_relations with 1 records INFO: [_] Process 6534 KV load entity_chunks with 5 records INFO: [_] Process 6534 KV load relation_chunks with 3 records INFO: [_] Process 6534 KV load llm_response_cache with 4 records INFO: [_] Process 6534 doc status load doc_status with 1 records --- 2. Inserting Sample Text (94 words) --- WARNING: Ignoring document ID (already exists): doc-0cf311441f5fb26ee1b1b59ca382b793 (unknown_source) WARNING: No new unique documents were found. INFO: No documents to process Insertion complete. Data is ready for retrieval. --- 3. Querying the RAG System (Mode: hybrid) --- Query: 'How does RAG solve the problem of LLM hallucination and what are its main use cases?' INFO: Query nodes: information retrieval, document generation, fact verification, knowledge graphs, query answering, retrieval augmented generation, chatbots, conversational AI, domain-specific knowledge, real-time query processing (top_k:40, cosine:0.2) INFO: Embedding func: 8 new workers initialized (Timeouts: Func: 30s, Worker: 60s, Health Check: 75s) INFO: Local query: 6 entites, 3 relations INFO: Query edges: RAG, LLM hallucination, solving problems, main use cases (top_k:40, cosine:0.2) INFO: Global query: 5 entites, 3 relations INFO: Raw search results: 6 entities, 3 relations, 0 vector chunks INFO: After truncation: 6 entities, 3 relations INFO: Selecting 1 from 1 entity-related chunks by vector similarity INFO: Find no additional relations-related chunks from 3 relations INFO: Round-robin merged chunks: 1 -> 1 (deduplicated 0) WARNING: Rerank is enabled but no rerank model is configured. Please set up a rerank model or set enable_rerank=False in query parameters. INFO: Final context: 6 entities, 3 relations, 1 chunks INFO: Final chunks S+F/O: E6/1 INFO: == LLM cache == Query cache hit, using cached response as query result ================================================== FINAL RAG RESPONSE ================================================== LLM failed to generate a response (Check Ollama logs for details). Error String from LightRAG: **RAG Solves the Hallucination Problem** - **Definition**: The *hallucination problem* occurs when an LLM generates content that is factually incorrect or unsupported by available data. - **Solution via RAG**: Retrieval-Augmented Generation (RAG) mitigates this issue by first **retrieving relevant, verified documents** from a knowledge base—often using a **vector store** to efficiently index and search the data—and then feeding those retrieved documents alongside the user’s query into the LLM. 
- **Key Benefits**: This two‑step process ensures that the generated response is grounded in actual information rather than fabricated or speculative content, reducing instances of hallucination. **Main Use Cases** 1. **Enterprise Search & Information Retrieval** - Businesses leverage RAG to provide precise answers to internal queries across large datasets (e.g., product catalogs, policy documents). 2. **Customer Support Systems** - By retrieving relevant support articles and troubleshooting steps before generating a response, RAG reduces the likelihood of incorrect or unsupported advice. 3. **Domain‑Specific Question Answering (QA)** - In specialized fields such as healthcare, finance, or scientific research, RAG ensures that answers are backed by authoritative sources, enhancing trustworthiness. ### References - [1] Document Title One --- Finalizing storages --- INFO: Successfully finalized 12 storages 
# RAG Query Result ## Query  > How does RAG solve the problem of LLM hallucination and what are its main use cases? ## LLM/Cache Response **RAG Solves the Hallucination Problem**  - **Definition**: The *hallucination problem* occurs when an LLM generates content that is factually incorrect or unsupported by available data. - **Solution via RAG**: Retrieval-Augmented Generation (RAG) mitigates this issue by first **retrieving relevant, verified documents** from a knowledge base—often using a **vector store** to efficiently index and search the data—and then feeding those retrieved documents alongside the user’s query into the LLM. - **Key Benefits**: This two‑step process ensures that the generated response is grounded in actual information rather than fabricated or speculative content, reducing instances of hallucination. **Main Use Cases**  1. **Enterprise Search & Information Retrieval**  - Businesses leverage RAG to provide precise answers to internal queries across large datasets (e.g., product catalogs, policy documents). 2. **Customer Support Systems**  - By retrieving relevant support articles and troubleshooting steps before generating a response, RAG reduces the likelihood of incorrect or unsupported advice. 3. **Domain‑Specific Question Answering (QA)**  - In specialized fields such as healthcare, finance, or scientific research, RAG ensures that answers are backed by authoritative sources, enhancing trustworthiness. ### References  - [1] Document Title One ## Retrieved Context (Sources) No context documents were retrieved (or result was a cache hit string). 

The next step was to drop in some documents in Markdown format (other formats are accepted, but I didn’t test them) and use them as my own RAG corpus, again with Ollama and Granite, to test things in a less hard-coded way.

Again, the “.env” file lets you personalize the “input” folder, but I read the documents from a “./documents” directory directly in the code.

import os import asyncio from functools import partial from datetime import datetime from lightrag import LightRAG, QueryParam import glob try: from ollama import AsyncClient except ImportError: print("Warning: The 'ollama' Python package is required. Please run: pip install ollama") class AsyncClient: def __init__(self, host): pass async def chat(self, **kwargs): raise NotImplementedError("ollama package not installed.") from lightrag.llm.ollama import ollama_embed from lightrag.utils import setup_logger, EmbeddingFunc from lightrag.kg.shared_storage import initialize_pipeline_status OLLAMA_BASE_URL = "http://localhost:11434" LLM_MODEL = "granite4:latest" EMBEDDING_MODEL = "granite-embedding:latest" WORKING_DIR = "./rag_storage_ollama" EMBEDDING_DIMENSION = 384 DOCUMENTS_DIR = "./documents" # Directory to read source files from OUTPUT_DIR = "./output" # Directory to write RAG results to  setup_logger("lightrag", level="INFO") if not os.path.exists(WORKING_DIR): os.mkdir(WORKING_DIR) print(f"Created working directory: {WORKING_DIR}") if not os.path.exists(OUTPUT_DIR): os.mkdir(OUTPUT_DIR) print(f"Created output directory: {OUTPUT_DIR}") if not os.path.exists(DOCUMENTS_DIR): os.mkdir(DOCUMENTS_DIR) print(f"Created documents directory: {DOCUMENTS_DIR}") async def custom_ollama_llm_complete(prompt: str, system_prompt: str = None, **kwargs): """ A custom function that handles the Ollama client initialization and model/base_url parameters that are injected via functools.partial, while robustly filtering out unwanted internal keywords. """ model = kwargs.pop('model') base_url = kwargs.pop('base_url') client = AsyncClient(host=base_url) messages = [] if system_prompt: messages.append({"role": "system", "content": system_prompt}) messages.append({"role": "user", "content": prompt}) keys_to_filter = { 'host', 'hashing_kv', 'llm_model_name', 'history_messages', 'keyword_extraction', 'enable_cot', 'is_system_prompt_only', 'prompt_config' } cleaned_kwargs = {k: v for k, v in kwargs.items() if k not in keys_to_filter} try: response = await client.chat( model=model, messages=messages, **cleaned_kwargs ) return response['message']['content'] except Exception as e: raise e async def initialize_rag(): """Initializes the LightRAG instance using standard Ollama configuration.""" configured_ollama_complete = partial( custom_ollama_llm_complete, model=LLM_MODEL, base_url=OLLAMA_BASE_URL, ) configured_ollama_embed = partial( ollama_embed, embed_model=EMBEDDING_MODEL, base_url=OLLAMA_BASE_URL ) wrapped_embedding_func = EmbeddingFunc( embedding_dim=EMBEDDING_DIMENSION, func=configured_ollama_embed, ) rag = LightRAG( working_dir=WORKING_DIR, llm_model_func=configured_ollama_complete, embedding_func=wrapped_embedding_func, ) await rag.initialize_storages() await initialize_pipeline_status() return rag async def load_and_insert_documents(rag: LightRAG): """ Reads files from the DOCUMENTS_DIR and inserts their content into the RAG system. Fixed to use a more compatible method for document insertion. """ file_paths = glob.glob(os.path.join(DOCUMENTS_DIR, '*.[mM][dD]')) + \ glob.glob(os.path.join(DOCUMENTS_DIR, '*.[tT][xX][tT]')) if not file_paths: print("\n--- WARNING: No documents found in './documents' directory. ---") print("Please add some Markdown (.md) or Text (.txt) files to populate the knowledge base.") return False print(f"\n--- 2. 
Inserting Documents ({len(file_paths)} file(s) found) ---") insertion_succeeded = 0 for file_path in file_paths: filename = os.path.basename(file_path) try: with open(file_path, 'r', encoding='utf-8') as f: content = f.read() await rag.ainsert(content, doc_meta={'doc_id': filename}) print(f" > Successfully inserted: {filename} ({len(content.split())} words)") insertion_succeeded += 1 except TypeError as te: if "'doc_id'" in str(te) or "'doc_meta'" in str(te): print(f" > FAILED (Type Error): {filename}. Attempting insertion without metadata to check compatibility.") try: await rag.ainsert(content) print(f" > Successfully inserted (no metadata): {filename}") insertion_succeeded += 1 except Exception as e: print(f" > FAILED (General Error): {filename} - {e}") else: print(f" > FAILED to read or insert {filename} (Type Error): {te}") except Exception as e: print(f" > FAILED to read or insert {filename} (General Error): {e}") if insertion_succeeded == 0: print("Insertion complete, but no documents were successfully inserted. Please check LightRAG documentation for the correct argument name for source IDs.") return False print("Insertion complete. Data is ready for retrieval.") return True async def main(): rag = None query = "Describe Quantum-Safe cryptography?" try: print("Checking if required Ollama models are pulled...") print(f"--- 1. Initializing RAG with Ollama Models ---") rag = await initialize_rag() documents_inserted = await load_and_insert_documents(rag) if not documents_inserted: return mode = "hybrid" print(f"\n--- 3. Querying the RAG System (Mode: {mode}) ---") print(f"Query: '{query}'") rag_result = await rag.aquery( query, param=QueryParam(mode=mode) ) response_text = None if hasattr(rag_result, 'get_response_text'): response_text = rag_result.get_response_text() elif isinstance(rag_result, str): response_text = rag_result print("\n" + "="*50) print("FINAL RAG RESPONSE") print("="*50) output_content = "" # Prepare string for file output  if response_text and not str(response_text).strip().startswith("Error:"): print(response_text) output_content += f"# RAG Query Result\n\n" output_content += f"## Query\n\n> {query}\n\n" output_content += f"## LLM/Cache Response\n\n{response_text}\n\n" print("\n" + "="*50) print("\n--- Context Retrieved (Sources) ---") output_content += f"## Retrieved Context (Sources)\n\n" if not isinstance(rag_result, str) and rag_result.retriever_output and rag_result.retriever_output.docs: unique_sources = set() for i, doc in enumerate(rag_result.retriever_output.docs): source_text = doc.text source_id = doc.doc_id if hasattr(doc, 'doc_id') and doc.doc_id else ( doc.doc_meta.get('doc_id') if hasattr(doc, 'doc_meta') and isinstance(doc.doc_meta, dict) else 'Unknown Source' ) unique_sources.add(source_id) print(f"Source {i+1} (File: {source_id}): {source_text[:100]}...") output_content += f"### Source {i+1} (File: `{source_id}`)\n\n" output_content += f"``` {% endraw %} text\n{source_text}\n {% raw %} ```\n" print(f"\nAnswer Grounded in: {', '.join(sorted(list(unique_sources)))}") else: print("No context documents were retrieved (or result was a cache hit string).") output_content += "No context documents were retrieved (or result was a cache hit string).\n" else: error_message = "LLM failed to generate a response (Check Ollama logs for details)." 
print(error_message) output_content += f"# RAG Query Result\n\n## Error\n\n{error_message}\n\n" if response_text: print(f"\nError String from LightRAG: {response_text}") output_content += f"**Error Detail:** {response_text}\n" timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") filename = f"rag_query_output_{timestamp}.md" output_filepath = os.path.join(OUTPUT_DIR, filename) with open(output_filepath, 'w', encoding='utf-8') as f: f.write(output_content) print(f"\n--- Output Written to File ---") print(f"Successfully wrote output to: {output_filepath}") except Exception as e: if "'str' object has no attribute 'retriever_output'" in str(e): print("\n--- ERROR BYPASS: Detected Cache Hit String Result ---") print("The response was successfully retrieved from the cache and written to the output file.") else: print("\n" + "="*50) print("AN ERROR OCCURRED DURING RAG PROCESS") print("="*50) print(f"Error: {e}") print(f"Please ensure Ollama is running and accessible at {OLLAMA_BASE_URL}, and the models '{LLM_MODEL}' and '{EMBEDDING_MODEL}' are pulled locally.") print(f"To pull: 'ollama pull {LLM_MODEL}' and 'ollama pull {EMBEDDING_MODEL}'") print("="*50 + "\n") finally: if rag: print("\n--- Finalizing storages ---") await rag.finalize_storages() if __name__ == "__main__": asyncio.run(main()) 

And here is the output:

# RAG Query Result ## Query  > Describe Quantum-Safe cryptography? ## LLM/Cache Response ### What is Quantum-Safe Cryptography? Quantum-safe cryptography, also known as post‑quantum cryptography (PQC), refers to cryptographic algorithms and protocols designed to remain secure even against attacks by a sufficiently powerful quantum computer. Traditional public-key cryptosystems like RSA, ECC, Diffie‑Hellman, and elliptic curve variants are vulnerable to Shor’s algorithm, which could efficiently factor large integers and compute discrete logarithms—tasks that form the basis of these cryptographic schemes. **Key Characteristics**  - **Resistance to Quantum Attacks**: Quantum-safe algorithms rely on mathematical problems that remain hard for both classical and quantum computers. They include:  - **Lattice‑Based Cryptography**: Uses high‑dimensional lattice structures (e.g., LWE, Ring-LWE). The hardness of solving shortest vector problems in lattices is believed to be resilient against quantum attacks.  - **Hash-Based Signatures**: Leverages cryptographic hash functions for digital signatures (e.g., SPHINCS+, XMSS).  - **Code‑Based Cryptography**: Based on error-correcting code theory, such as McEliece cryptosystems.  - **Multivariate Polynomial Cryptography**: Involves solving systems of multivariate quadratic equations.  - **Isogeny-Based Cryptography**: Relies on the hardness of computing isogenies between elliptic curves.  - **Adaptation to Quantum Threats**: PQC aims to provide long‑term security for data and communications that are expected to remain confidential or authentic even after quantum computers capable of breaking current public-key cryptography become available (often estimated within 10–20 years). ### Why Is Quantum-Safe Cryptography Needed?  1. **Threat from Shor’s Algorithm**: If a sufficiently large, fault‑tolerant quantum computer were built, it could execute Shor’s algorithm efficiently, rendering RSA and ECC insecure. 2. **Supply Chain Risks**: Many cryptographic keys are embedded in hardware modules used across industries (e.g., IoT devices). Once compromised, they can’t be reissued or upgraded without disrupting operations. 3. **Long‑Term Security Needs**: Applications requiring centuries of confidentiality (banking systems, national security archives) must consider quantum resilience. ### How Quantum-Safe Cryptography Works  1. **Key Generation**  - Generates a public/private key pair using a post‑quantum algorithm.  - May involve generating random seeds for lattice problems or cryptographic hash functions.  2. **Encryption/Decryption**  - **Symmetric Encryption**: Even with quantum-safe algorithms, symmetric encryption (e.g., AES) remains secure and often used alongside PQC for bulk data protection.  - **Public-Key Operations**: The selected algorithm (e.g., LWE-based encryption) encrypts messages using the public key, ensuring that only the holder of the corresponding private key can decrypt them.  3. **Digital Signatures**  - Uses hash‑based or lattice‑based signature schemes to verify authenticity and integrity.  - Signature generation is typically performed with a private key; verification uses the associated public key.  4. **Integration into Protocols**  - Adapts existing protocols (TLS, SSH) by replacing quantum‑vulnerable components (e.g., RSA for key exchange in TLS) with PQC alternatives while maintaining interoperability.  - Often requires careful handling of key sizes due to differences in security levels (e.g., 256 bits vs. 3072+ bits). 
### Current State and Standards  - **NIST Post‑Quantum Cryptography Standardization Process**: The National Institute of Standards and Technology (NIST) has been running a multi‑year standardization process, evaluating dozens of algorithms across categories like encryption, signatures, key encapsulation, and hash functions.  - *Selected Candidates*: In September 2022, NIST announced four post‑quantum cryptographic algorithms as “PQC finalists”:  - **CRYSTALS-Kyber** (Key Encapsulation Mechanism)  - **SIKE** (Supersingular Isogeny Key Encapsulation)  - **FrodoKEM** and **NTRU** (other KEM candidates)  - *Expected Timeline*: The process is expected to finalize a standard by around 2030.  - **Adoption in Protocols**:  - **TLS 1.3** already includes draft extensions for PQC key exchange.  - **Cloud Providers and Enterprises**: Initiatives like IBM’s Quantum System, Google’s Cirq framework, and Azure’s quantum‑ready services are testing PQC implementations on hardware.  - **Challenges**  - **Performance Overhead**: Some post‑quantum algorithms (especially lattice-based ones) introduce higher computational complexity.  - **Implementation Risks**: Vulnerabilities in software libraries or side‑channel attacks can undermine security even of a theoretically secure algorithm.  - **Key Management**: Transitioning to larger keys and new key formats requires robust management systems. ### Practical Use Cases  1. **Secure Messaging Platforms**  - Employ post‑quantum encryption for end‑to‑end message protection, ensuring that conversations remain confidential even if quantum computers become powerful enough to break current RSA or ECC.  2. **Financial Transactions**  - Use lattice-based signatures for transaction signing, providing long-term integrity without needing frequent key rotations.  3. **IoT Devices**  - Deploy lightweight hash‑based signature schemes to authenticate firmware updates and sensor data while preserving functionality on resource-constrained devices.  4. **Government Communications**  - Adopt post‑quantum encryption for classified communications where quantum resistance is a regulatory requirement. ### Future Outlook  - **Hybrid Approaches**: Many organizations are experimenting with hybrid cryptography—using PQC algorithms alongside existing ones—for backward compatibility until fully transitioned. - **Quantum Key Distribution (QKD)**: As the infrastructure for QKD grows, it may complement or eventually replace traditional key management in some scenarios. - **Continuous Research**: The field of quantum-safe cryptography is dynamic; ongoing research on new mathematical primitives and cryptanalytic attacks will shape future standards. ### Conclusion Quantum-safe cryptography represents a proactive strategy to safeguard digital assets against the eventual emergence of powerful quantum computers. By leveraging algorithms resistant to quantum computation, organizations can maintain confidentiality, integrity, and authenticity over extended periods, preparing for a post‑quantum era while ensuring compatibility with existing cryptographic protocols. As standardization bodies finalize PQC standards and hardware vendors integrate these solutions into their products, the transition will gradually become mainstream, reinforcing digital security across all sectors. ### References  - [1] Document Title One ## Retrieved Context (Sources) No context documents were retrieved (or result was a cache hit string). 

LightRAG provides out-of-the-box generation of Knowledge Graphs. 👏
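Since the graph is stored as a plain GraphML file in the working directory (graph_chunk_entity_relation.graphml, as visible in the logs above), it can also be inspected directly with networkx before reaching for any UI:

import networkx as nx

# Path taken from the working directory used by my Ollama script.
G = nx.read_graphml("./rag_storage_ollama/graph_chunk_entity_relation.graphml")
print(f"{G.number_of_nodes()} nodes, {G.number_of_edges()} edges")

# Node attributes hold what LightRAG extracted; print a few to see the shape.
for node, data in list(G.nodes(data=True))[:5]:
    print(node, {k: str(v)[:60] for k, v in data.items()})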

Bonus 💎

To display the graph files, I wrote a simple Streamlit app, provided hereafter (and which could surely be enhanced) 🙂‍↔️

… which you won’t need if you rely on LightRAG’s integrated UI functionality… 🤷‍♂️


pip install networkx matplotlib
pip install streamlit
...
streamlit run streamlit_visualize_graph.py
import streamlit as st import networkx as nx import matplotlib.pyplot as plt import io import itertools from io import StringIO def visualize_graph(G: nx.Graph, layout_name: str, layout_params: dict): """ Generates a Matplotlib plot of the NetworkX graph with custom styling. """ node_labels = nx.get_node_attributes(G, 'label') if not node_labels: node_labels = nx.get_node_attributes(G, 'text') edge_labels = nx.get_edge_attributes(G, 'text') node_types = nx.get_node_attributes(G, 'type') type_color_map = { 'Entity': '#1f78b4', # Blue  'Chunk': '#b2df8a', # Light Green  'Relation': '#33a02c', # Dark Green  'Unknown': '#a6cee3' # Light Blue  } node_colors = [type_color_map.get(node_types.get(node, 'Unknown'), type_color_map['Unknown']) for node in G.nodes()] if layout_name == 'Spring Layout': pos = nx.spring_layout(G, **layout_params) elif layout_name == 'Circular Layout': pos = nx.circular_layout(G) elif layout_name == 'Spectral Layout': pos = nx.spectral_layout(G) elif layout_name == 'Kamada-Kawai Layout': pos = nx.kamada_kawai_layout(G) else: pos = nx.spring_layout(G, **layout_params) fig, ax = plt.subplots(figsize=(16, 10)) nx.draw_networkx_nodes( G, pos, node_size=2500, node_color=node_colors, alpha=0.9 ) # Draw edges  nx.draw_networkx_edges( G, pos, ax=ax, edge_color='gray', style='dashed', arrowstyle='->', arrowsize=25 ) nx.draw_networkx_labels( G, pos, labels=node_labels, font_size=11, font_color='black', font_weight='bold', ) nx.draw_networkx_edge_labels( G, pos, edge_labels=edge_labels, font_color='red', font_size=9, bbox={"boxstyle": "round,pad=0.4", "fc": "white", "alpha": 0.7, "ec": "none"} ) ax.set_title(f"Visualized Graph: {G.number_of_nodes()} Nodes, {G.number_of_edges()} Edges", fontsize=16) plt.axis('off') plt.tight_layout() st.pyplot(fig) def app(): st.set_page_config(layout="wide", page_title="GraphML Viewer") st.title("GraphML Visualization App") st.markdown("A tool to visualize GraphML (e.g., LightRAG) outputs using NetworkX and Streamlit.") st.sidebar.header("Data Upload & Layout Controls") uploaded_file = st.sidebar.file_uploader( "Upload your .graphml file", type=["graphml"] ) graph_data = None if uploaded_file is not None: try: graph_data = uploaded_file.read().decode("utf-8") st.sidebar.success("File uploaded successfully! Graph loading...") except Exception as e: st.sidebar.error(f"Error reading file: {e}") graph_data = None else: st.info("Please upload a GraphML file in the sidebar to visualize your knowledge graph.") st.sidebar.subheader("Layout Algorithm") layout_name = st.sidebar.selectbox( "Choose Graph Layout:", ('Spring Layout', 'Kamada-Kawai Layout', 'Circular Layout', 'Spectral Layout') ) layout_params = {} if layout_name == 'Spring Layout': st.sidebar.caption("Fine-tune the Spring Layout forces:") k_val = st.sidebar.slider("k (Node Spacing)", 0.01, 1.0, 0.15, 0.01) iters = st.sidebar.slider("Iterations", 10, 100, 50, 10) layout_params = {'k': k_val, 'iterations': iters} if graph_data: try: G = nx.read_graphml(StringIO(graph_data)) st.header("Knowledge Graph Visualization") st.write(f"Graph loaded: {G.number_of_nodes()} Nodes, {G.number_of_edges()} Edges") visualize_graph(G, layout_name, layout_params) except Exception as e: st.error(f"An error occurred while processing the graph: {e}") st.code(f"Error details: {e}") st.warning("Please check if the GraphML file is correctly formatted and contains valid data.") if __name__ == '__main__': app() 

That’s a wrap 🧑‍💻

Conclusion

In conclusion, LightRAG provides a sophisticated and modular foundation for constructing advanced Retrieval-Augmented Generation systems. It moves beyond conventional vector-only retrieval by emphasizing the integration of Knowledge Graphs (KG), which structure raw text into traceable entities and relationships. This hybrid approach — combining semantic vector search with relationship-based KG querying — ensures the Large Language Model is grounded in the most contextually rich and verifiable information possible.

The project benefits from robust, high-quality documentation that meticulously outlines LightRAG’s integration with a wide variety of tools and services. While this detailed catalog is an invaluable resource, my immediate attention was limited to successfully configuring the environment with Ollama (in a basic way though), allowing me to bypass the broader ecosystem documentation for the time being.

>>> Thanks for reading 🔟 <<<

Links

LightRAG repository: https://github.com/HKUDS/LightRAG