Challenges in Retrieval-Augmented Generation Systems

Some challenges in building LLM-powered applications (including RAG systems) for large companies:

1. Hallucinations are very damaging to the brand. It only takes one for people to lose faith in the tool completely. Contrary to popular belief, RAG doesn't fix hallucinations.
2. Chunking a knowledge base is not straightforward. Poor chunking leads to poor context retrieval, which leads to bad answers from the model powering a RAG system.
3. As information changes, you also need to update your chunks and embeddings. Depending on the complexity of the information, this can become a nightmare (a mitigation sketch follows this post).
4. Models are black boxes. We can only modify their inputs (prompts), but it's hard to determine cause and effect when troubleshooting (e.g., why does "Produce concise answers" work better than "Reply in short sentences"?).
5. Prompts are brittle. Every new version of a model can cause your previous prompts to stop working, and you won't know why or how to fix them (see #4 above).
6. It is not yet clear how to reliably evaluate production systems.
7. Costs and latency are still significant issues. The best models out there cost a lot of money and are slow; cheap, fast models have very limited applicability.
8. There are not enough qualified people to deal with these issues. I cannot stress this problem enough.

A single project may hit several of these problems at once. Depending on your requirements, some of them may be showstoppers (a robot acting on hallucinated movement instructions) or mere nuances (a support agent hallucinating an incorrect product description). There's still a lot of work to do before these systems mature to the point where they are viable for most use cases.
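Point 3 is at least partly a plumbing problem. Below is a minimal sketch of incremental re-embedding, assuming a generic `embed()` callable that stands in for whatever embedding API your stack exposes: fingerprint every chunk and re-embed only what actually changed.

```python
import hashlib

def chunk_fingerprint(chunk_text: str) -> str:
    """Stable fingerprint for a chunk; identical text means a reusable embedding."""
    return hashlib.sha256(chunk_text.encode("utf-8")).hexdigest()

def sync_index(chunks: list[str], index: dict[str, list[float]], embed) -> dict[str, list[float]]:
    """Re-embed only chunks whose content changed since the last sync.

    `index` maps fingerprint -> embedding from the previous run; `embed` is
    whatever embedding call your stack provides (an assumption here).
    """
    fresh = {}
    for text in chunks:
        fp = chunk_fingerprint(text)
        # Reuse the stored vector when the text is byte-identical.
        fresh[fp] = index[fp] if fp in index else embed(text)
    return fresh  # fingerprints of deleted or edited chunks drop out
```

This keeps the embedding bill proportional to the edit rate rather than the corpus size, though it does nothing for the harder problem of deciding chunk boundaries in the first place.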
-
Many companies have started experimenting with simple RAG systems, probably as their first use case, to test the effectiveness of generative AI in extracting knowledge from unstructured data like PDFs, text files, and PowerPoint files. If you've used basic RAG architectures with tools like LlamaIndex or LangChain, you might have already encountered three key problems:

1. Inadequate Evaluation Metrics: Existing metrics fail to catch subtle errors like unsupported claims or hallucinations, making it hard to accurately assess and improve system performance.
2. Difficulty Handling Complex Questions: Standard RAG methods often struggle to find and combine information from multiple sources effectively, leading to slower responses and less relevant results.
3. Struggling to Understand Context and Connections: Basic RAG approaches often miss the deeper relationships between pieces of information, resulting in incomplete or inaccurate answers that don't fully meet user needs.

In this post I will introduce three useful papers that address these gaps:

1. RAGChecker: introduces a framework for evaluating RAG systems with a focus on fine-grained, claim-level metrics. It proposes a comprehensive set of metrics: claim-level precision, recall, and F1 score to measure the correctness and completeness of responses; claim recall and context precision to evaluate the effectiveness of the retriever; and faithfulness, noise sensitivity, hallucination rate, self-knowledge reliance, and context utilization to diagnose the generator's performance. Consider using these metrics to help identify errors, improve accuracy, and reduce hallucinations in generated outputs (a toy illustration of the claim-level metrics follows this post).
2. EfficientRAG: uses a labeler and filter mechanism to identify and retain only the most relevant parts of retrieved information, reducing the need for repeated large language model calls. This iterative approach refines search queries efficiently, lowering latency and cost while maintaining high accuracy on complex, multi-hop questions.
3. GraphRAG: by leveraging structured data from knowledge graphs, GraphRAG methods enhance the retrieval process, capturing complex relationships and dependencies between entities that traditional text-based retrieval often misses. This enables more precise, context-aware generation, which is particularly valuable in domains that require a deep understanding of interconnected data, such as scientific research, legal documentation, and complex question answering. In tasks such as query-focused summarization, for example, GraphRAG shows substantial gains by leveraging graph structure to capture both local and global relationships within documents.

It's encouraging to see how quickly gaps are identified and improvements are made in the GenAI world.
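To make the claim-level idea concrete, here is a toy illustration of claim-level precision, recall, and F1. This is not RAGChecker's implementation: the paper extracts claims and checks entailment with an LLM-based checker, while this sketch assumes an injected `entails` callable and uses exact match in the example.

```python
def claim_metrics(response_claims, gold_claims, entails):
    """Toy claim-level precision/recall/F1 in the spirit of RAGChecker.

    `entails(claim, claims)` should return True when `claim` is supported by
    any claim in `claims`; in practice this would be an LLM-based judge.
    """
    correct = [c for c in response_claims if entails(c, gold_claims)]
    covered = [c for c in gold_claims if entails(c, response_claims)]
    precision = len(correct) / len(response_claims) if response_claims else 0.0
    recall = len(covered) / len(gold_claims) if gold_claims else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Example with a trivial exact-match "entailment" judge:
judge = lambda claim, pool: claim in pool
print(claim_metrics(
    ["Paris is in France", "Paris has 5M people"],   # response claims
    ["Paris is in France", "Paris is the capital"],  # ground-truth claims
    judge,
))  # {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```

The unsupported response claim drags precision down (a hallucination signal), while the missed gold claim drags recall down (an incompleteness signal); a response-level score would blur the two.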
-
RAG is the future, but that doesn't mean we should forget tried-and-tested techniques. Expert systems and knowledge infrastructures wrestled with today's RAG challenges for decades. Let's see why a hybrid approach could open up new opportunities 📚💡.

RAG's challenges aren't new ⚠️:

1️⃣ Data Ingestion
▪️ Splitting documents into smaller chunks can lead to a loss of context, hurting the system's performance.
▪️ Data structure and format significantly impact tokenization and the quality of generated output.

2️⃣ Querying
▪️ User behavior often deviates from even the most meticulous system designs.
▪️ Imagine users inputting unstructured keywords instead of a clear question, or using pronouns like "it" or "that" without clear antecedents.

3️⃣ Data Context Challenges
▪️ LLMs' limited context windows force document splitting, often disrupting inherent context and relationships.
▪️ Training on datasets of predominantly short web pages, like Common Crawl, creates a mismatch when models are applied to lengthy, real-world documents.
▪️ Poor segmentation or unusual structures can skew tokenization, leading to more generation errors.

4️⃣ Retrieval Metric Issues
▪️ Traditional binary relevance metrics aren't well suited to the graded similarity scores that embedding models produce.
▪️ Embedding models trained on general-purpose datasets often underperform on specialized content.
▪️ The concept of "similarity" is subjective and can differ between users and embedding models.

These challenges call for a more flexible approach to RAG system design. Here are some key considerations:

🔍 Hybrid Indexing
▪️ Combining keyword-based search with embedding-based retrieval can leverage the strengths of both (see the sketch after this post).
▪️ Pierre successfully implemented a hybrid strategy that led to 90% of relevant resources appearing in the top ten search results.

📈 Context-Aware Processing
▪️ Techniques like title hierarchies and graph-based representations preserve contextual understanding while improving search accuracy and relevance.

⚙️ Domain Adaptation and Fine-Tuning
▪️ Adapting pre-trained models to specific domains and fine-tuning them on relevant data can significantly improve performance on specialized tasks.

📊 Dynamic Context Window Management
▪️ Adjusting context windows based on document structure and content can help capture relevant information that would otherwise be cut off.

📊 Repurposing Classic Evaluation Metrics
▪️ Jo Kristian Bergum demonstrated the effectiveness of repurposing classic metrics like precision at k and recall to evaluate search system performance.

RAG is the future, but that doesn't mean we should forget tried-and-tested techniques that have been honed over decades 🔄. Combining approaches lets you build a system that leverages the strengths of both for superior results 🎯📈.
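To make the hybrid-indexing and classic-metrics points concrete, here is a minimal sketch that combines a keyword ranking and a vector ranking with Reciprocal Rank Fusion, plus a plain precision-at-k check. The document ids and rankings are made-up placeholders for whatever a BM25 index and a vector store would return.

```python
from collections import defaultdict

def rrf_fuse(keyword_ranking, vector_ranking, k=60):
    """Reciprocal Rank Fusion: one simple way to merge a keyword (e.g. BM25)
    ranking with an embedding-based ranking. Each input is a list of doc ids,
    best first; k=60 is the conventional damping constant."""
    scores = defaultdict(float)
    for ranking in (keyword_ranking, vector_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def precision_at_k(retrieved, relevant, k=10):
    """Classic precision@k: fraction of the top-k results that are relevant."""
    top = retrieved[:k]
    return sum(1 for doc_id in top if doc_id in relevant) / max(len(top), 1)

# Hypothetical rankings from a BM25 index and a vector store:
fused = rrf_fuse(["d3", "d1", "d7"], ["d1", "d9", "d3"])
print(fused)                                   # ['d1', 'd3', 'd9', 'd7']
print(precision_at_k(fused, {"d1", "d3"}, 3))  # 0.666...
```

RRF is attractive here because it needs no score normalization across the two retrievers; only ranks matter, so a BM25 score and a cosine similarity never have to live on the same scale.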
-
The Harsh Reality of Building a Production-Ready RAG Pipeline

Building an AI chatbot with a RAG pipeline sounds simple: just watch a few YouTube tutorials, throw in an off-the-shelf LLM API, and boom, you have your own AI assistant. But anyone who has ventured beyond the tutorials knows that a real-world, production-level RAG pipeline is a completely different beast.

I'm almost a month into my journey at LLE, where I've been working on an in-house RAG pipeline built on foundational models, not just for efficiency but also to prevent data breaches and ensure enterprise-grade robustness. And let me tell you, the challenges are far from what the simplified tutorials portray.

A Few Hard-Hitting Lessons I've Learned:

✅ Chunking is not just splitting text
You can use pymupdf to extract chunks, but it fails when you need adaptive chunking, especially for scientific documents where preserving tables, equations, and formatting is critical. This is where vision transformer models come into play: they perform an Optical Character Recognition (OCR) task that converts scientific documents into a markup language, keeping structure intact.

✅ Query Refinement is Everything
A chatbot is only as good as the data it retrieves. Rewriting follow-up queries effectively is key to ensuring the LLM understands intent correctly (see the sketch after this post). Precision in query structuring directly impacts retrieval efficiency and model response quality.

✅ Optimizing Retrieval = Speed + Relevance
It's not just about retrieving data faster; it's about retrieving the right data. Reducing the number of chunks improves retrieval efficiency, but that's not enough: multi-tiered storage strategies ensure queries target the right system for fast, relevant responses.

These are just a few of the many challenges that separate a toy RAG implementation from a real-world, scalable, and secure pipeline. The deeper I dive, the clearer it becomes: production-ready AI isn't just about making things work; it's about making them work at scale, securely, and efficiently.

Would love to hear from others working in this space: what are some of the biggest roadblocks you've faced while building a RAG pipeline? 🚀
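For the query-refinement point, here is one common pattern, sketched rather than prescribed: use the model itself to rewrite a context-dependent follow-up into a standalone query before retrieval. The `llm` callable and the prompt wording are illustrative assumptions, not a tested recipe.

```python
REWRITE_PROMPT = """Given the conversation so far, rewrite the user's latest
message as a fully self-contained search query. Resolve pronouns like "it"
or "that" and keep all constraints explicit.

Conversation:
{history}

Latest message: {message}
Self-contained query:"""

def rewrite_followup(history: list[str], message: str, llm) -> str:
    """Turn a context-dependent follow-up into a standalone retrieval query.

    `llm` is whatever completion callable your stack exposes (an assumption);
    retrieval then runs on the rewritten query instead of the raw message.
    """
    prompt = REWRITE_PROMPT.format(history="\n".join(history), message=message)
    return llm(prompt).strip()

# e.g. history = ["User: How do I chunk PDFs?", "Bot: Use layout-aware splitting ..."]
# rewrite_followup(history, "Does it work for tables?", llm)
# -> "Does layout-aware PDF chunking preserve tables?"
```

The payoff is that the retriever only ever sees self-contained queries, so embedding similarity is computed against the user's actual intent rather than a dangling pronoun.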