The biggest lesson I learned after building my own LLM-powered product... 👉 LLMs hallucinate, and you are responsible for what happens next. Let me tell you exactly how.

If you're building anything customer-facing with GenAI, it's not enough to get good outputs most of the time. These are 4 things I implemented early on (a rough sketch of the first two follows this post):

1. Structured output checks 🧱 I used regex and simple schema validation to catch when the LLM went off-script, especially for JSON outputs or bullet lists that needed to feed into the UI.

2. Fallback logic 🔁 If the model failed validation or returned something unusable, I defaulted to templated messages or prompts with tighter constraints. Even a basic retry with a more constrained prompt goes a long way.

3. Guardrails 🛡️ I didn't build full-on moderation pipelines, but I did include intent checks and topic restrictions to avoid unsupported questions or off-topic use cases. That kept the product focused and safer.

4. Input sanitization 🧼 User inputs were cleaned and constrained before going into prompts. You'd be surprised how much hallucination you can reduce just by being deliberate about what context you inject.

It's not just about designing good prompts and letting the LLM do the rest (especially when the stakes are high). It's about building systems that expect failure and recover gracefully.

If you're curious about the tool I built, it's called Applio.ai

Image by Amazon (AWS Blogs)

#AIEngineering #AI #GenAI #DataScience
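As a minimal sketch of what layers 1 and 2 above might look like in practice: a regex-plus-schema check on a JSON payload, one tighter retry, then a templated fallback. The `call_llm` stub, key names, and prompts are hypothetical placeholders, not the author's actual implementation.

```python
import json
import re

REQUIRED_KEYS = {"title", "bullets"}  # example keys for a UI-bound payload


def call_llm(prompt: str) -> str:
    # Placeholder: swap in your actual model client.
    raise NotImplementedError("wire up your LLM provider here")


def validate_output(raw: str):
    """Structured output check: pull out the JSON and verify the required keys."""
    match = re.search(r"\{.*\}", raw, re.DOTALL)  # tolerate prose around the JSON
    if not match:
        return None
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    return data if REQUIRED_KEYS.issubset(data) else None


def generate_summary(user_input: str) -> dict:
    """Fallback logic: one more-constrained retry, then a safe templated default."""
    base = f"Summarize the text below as JSON with keys 'title' and 'bullets':\n{user_input}"
    retry = base + "\nReturn ONLY valid JSON. No explanations, no markdown."
    for prompt in (base, retry):
        result = validate_output(call_llm(prompt))
        if result is not None:
            return result
    return {"title": "Summary unavailable", "bullets": []}  # templated fallback
```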
LLM Error Handling and Validation Techniques
Explore top LinkedIn content from expert professionals.
Summary
LLM error handling and validation techniques refer to methods used to catch, correct, and prevent mistakes made by large language models (LLMs), especially when these systems generate outputs for customer-facing or data-driven applications. These approaches make LLMs more reliable by checking outputs, grounding responses in verified data, and creating ways for the model to learn from its errors.
- Set clear rules: Define exactly what valid outputs look like and use structured checks or validation to catch mistakes early.
- Ground in real data: Connect your LLM to trusted sources and use retrieval-augmented generation to keep responses accurate and relevant.
- Create feedback loops: Give models context about previous errors so they can adapt and improve outputs instead of repeating the same mistakes.
-
Tired of your LLM just repeating the same mistakes when retries fail? Simple retry strategies often just multiply costs without improving reliability when models fail in consistent ways.

You've built validation for structured LLM outputs, but when validation fails and you retry the exact same prompt, you're essentially asking the model to guess differently. Without feedback about what went wrong, you're wasting compute and adding latency while hoping for random success. A smarter approach feeds errors back to the model, creating a self-correcting loop.

Effective AI Engineering #13: Error Reinsertion for Smarter LLM Retries 👇

The Problem ❌

Many developers implement basic retry mechanisms that blindly repeat the same prompt after a failure: [Code example - see attached image]

Why this approach falls short:
- Wasteful Compute: Repeatedly sending the same prompt when validation fails just multiplies costs without improving the chances of success.
- Same Mistakes: LLMs tend to be consistent: if they misunderstand your requirements the first time, they'll likely make the same errors on retry.
- Longer Latency: Users wait through multiple failed attempts with no adaptation strategy.
- No Learning Loop: The model never receives feedback about what went wrong, missing the opportunity to improve.

The Solution: Error Reinsertion for Adaptive Retries ✅

A better approach is to reinsert error information into subsequent retry attempts, giving the model context to improve its response: [Code example - see attached image; a sketch of the pattern follows this post]

Why this approach works better:
- Adaptive Learning: The model receives feedback about specific validation failures, allowing it to correct its mistakes.
- Higher Success Rate: By feeding error context back to the model, retry attempts become increasingly likely to succeed.
- Resource Efficiency: Instead of hoping for random variation, each retry has a higher probability of success, reducing the overall attempt count.
- Improved User Experience: Faster resolution of errors means less waiting for valid responses.

The Takeaway

Stop treating LLM retries as mere repetition and implement error reinsertion to create a feedback loop. By telling the model exactly what went wrong, you create a self-correcting system that improves with each attempt. This approach makes your AI applications more reliable while reducing unnecessary compute and latency.
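The attached code images are not reproduced here. As a stand-in, here is a minimal sketch of the error-reinsertion pattern the post describes, with `call_llm` and `validate` as hypothetical placeholders rather than the author's code.

```python
import json


def call_llm(messages: list) -> str:
    # Placeholder: send the message list to your chat model and return its text.
    raise NotImplementedError("wire up your model client here")


def validate(raw: str):
    """Return (parsed, error). A real validator might use Pydantic or JSON Schema."""
    try:
        return json.loads(raw), None
    except json.JSONDecodeError as exc:
        return None, f"output was not valid JSON: {exc}"


def generate_with_feedback(prompt: str, max_attempts: int = 3) -> dict:
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_attempts):
        raw = call_llm(messages)
        parsed, error = validate(raw)
        if parsed is not None:
            return parsed
        # Error reinsertion: show the model its own failed output and the reason,
        # so the next attempt corrects the mistake instead of repeating it.
        messages.append({"role": "assistant", "content": raw})
        messages.append({
            "role": "user",
            "content": f"That response failed validation ({error}). "
                       "Return a corrected response as valid JSON only.",
        })
    raise ValueError("LLM output still failed validation after retries")
```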
-
LLMs are great for data processing, but using new techniques doesn't mean you get to abandon old best practices. The precision and accuracy of LLMs still need to be monitored and maintained, just like with any other AI model.

Tips for maintaining accuracy and precision with LLMs:

• Define within your team EXACTLY what the desired output looks like. Any area of ambiguity should be resolved with a concrete answer. Even if the business "doesn't care," you should define a behavior. Letting the LLM make these decisions for you leads to high-variance, low-precision models that are difficult to monitor.

• Understand that the most gorgeously written, seemingly clear and concise prompts can still produce trash. LLMs are not people and don't follow directions like people do. You have to test your prompts over and over and over, no matter how good they look.

• Make small prompt changes and carefully monitor each change. Changes should be version tracked and vetted by other developers.

• A small change in one part of the prompt can cause seemingly unrelated regressions (again, LLMs are not people). Regression tests are essential for EVERY change. Organize a list of test case inputs, including those that demonstrate previously fixed bugs, and test your prompt against them.

• Test cases should include "controls" where the prompt has historically performed well. Any change to the control output should be studied, and any incorrect change is a test failure.

• Regression tests should have a single documented bug and clearly defined success/failure metrics: "If the output contains A, then pass. If the output contains B, then fail." This makes it easy to quickly mark regression tests as pass/fail (ideally, automating this process, as in the sketch after this post). If a different failure/bug is noted, it should still be fixed, but separately, and pulled out into a separate test.

Any other tips for working with LLMs and data processing?
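One way the "contains A = pass, contains B = fail" regression style described above could be automated is a small pytest harness. The `run_prompt` stub and the example cases are purely illustrative assumptions, not taken from the post.

```python
import pytest


def run_prompt(user_input: str) -> str:
    # Placeholder: render the current (version-tracked) prompt and call the model.
    raise NotImplementedError("wire up your prompt + model client here")


# Each case documents one behavior: (input, must_contain, must_not_contain).
# Include inputs that reproduce previously fixed bugs, plus "control" inputs
# where the prompt has historically performed well.
CASES = [
    ("order #123 arrived broken", "refund", "discount code"),
    ("what are your opening hours?", "9am", "i don't know"),
]


@pytest.mark.parametrize("user_input,must_contain,must_not_contain", CASES)
def test_prompt_regression(user_input, must_contain, must_not_contain):
    output = run_prompt(user_input).lower()
    assert must_contain in output          # "if the output contains A, then pass"
    assert must_not_contain not in output  # "if the output contains B, then fail"
```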
-
How can we further increase the accuracy of LLM-powered question answering systems? Ontologies to the rescue! That is the conclusion of the latest research coming from the data.world AI Lab with Dean Allemang.

Building on our previous Knowledge Graph LLM Accuracy benchmark research, our intuition is that accuracy can be further increased by 1) leveraging the ontology of the knowledge graph to check for errors in the generated queries and 2) using the LLM to repair incorrect queries.

We ask ourselves the following two research questions:
1️⃣ To what extent can accuracy increase by leveraging the ontology of a knowledge graph to detect errors in a SPARQL query and an LLM to repair those errors?
2️⃣ What types of errors are most commonly present in SPARQL queries generated by an LLM?

🧪 Our hypothesis: An ontology can increase the accuracy of an LLM-powered question answering system that answers a natural language question over a knowledge graph.

📏 Our approach consists of:
- Ontology-Based Query Check (OBQC): deterministically checks whether the query is valid by applying rules based on the semantics of the ontology. The rules check the body of the query (i.e., the WHERE clause) and the head of the query (i.e., the SELECT clause). If a check does not pass, it returns an explanation.
- LLM Repair: repairs the SPARQL query generated by the LLM. It takes as input the incorrect query and the explanation and sends a zero-shot prompt to the LLM. The result is a new query, which can then be passed back to the OBQC. (A rough sketch of this loop follows this post.)

🏅 Results, using our chat-with-the-data benchmark and GPT-4:
- Our OBQC and LLM Repair approach increased accuracy to 72.55%. If the repairs were not successful after three iterations, an unknown result was returned, which occurred 8% of the time. Thus the final error rate is 19.44%. "I don't know" is a valid answer, which reduces the error rate.
- Low-complexity questions on low-complexity schemas achieve an error rate of 10.46%, which is now arguably at levels deemed acceptable by users.
- All questions on high-complexity schemas saw substantially increased accuracy.
- 70% of the repairs were done by rules checking the body of the query. The majority were rules related to the domain of a property.

Putting this all together with our previous work, LLM question answering accuracy that leverages Knowledge Graphs and Ontologies is over 4x the SQL accuracy! These results support the main conclusion of our research: investment in metadata, semantics, ontologies, and Knowledge Graphs is a precondition for achieving higher accuracy in LLM-powered question answering systems.

Link to paper in comments. We are honored that we get to work with strategic customers to push the barrier of the data catalog and knowledge graph industry, and the data.world product. We are proud that our research results are a core part of the data.world AI Context Engine. Thanks for all the valuable feedback we have received from colleagues across industry and academia.
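Purely as an illustration of the check-then-repair loop described above (not the paper's implementation), the control flow might look like the following sketch. `generate_sparql`, `obqc_check`, and `repair_query` are hypothetical placeholders.

```python
def generate_sparql(question: str) -> str:
    # Placeholder: LLM turns a natural-language question into a SPARQL query.
    raise NotImplementedError


def obqc_check(query: str, ontology):
    """Ontology-based query check: return an explanation string if a rule fails,
    or None if the query passes. Placeholder for rules that validate the WHERE
    and SELECT clauses against the ontology (e.g. property domain checks)."""
    raise NotImplementedError


def repair_query(query: str, explanation: str) -> str:
    # Placeholder: zero-shot LLM prompt with the incorrect query plus the explanation.
    raise NotImplementedError


def answer_question(question: str, ontology, max_repairs: int = 3) -> str:
    query = generate_sparql(question)
    for _ in range(max_repairs):
        explanation = obqc_check(query, ontology)
        if explanation is None:
            return query          # query passed every ontology rule; execute it
        query = repair_query(query, explanation)
    return "I don't know"         # an unknown result beats a wrong answer
```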
-
Are your LLM apps still hallucinating? Zep used to as well, a lot. Here's how we worked to solve Zep's hallucinations.

We've spent a lot of cycles diving into why LLMs hallucinate and experimenting with the most effective techniques to prevent it. Some might sound familiar, but it's the combined approach that really moves the needle.

First, why do hallucinations happen? A few core reasons:
🔍 LLMs rely on statistical patterns, not true understanding.
🎲 Responses are based on probabilities, not verified facts.
🤔 No innate ability to differentiate truth from plausible fiction.
📚 Training datasets often include biases, outdated info, or errors.

Put simply: LLMs predict the next likely word; they don't actually "understand" or verify what's accurate. When prompted beyond their knowledge, they creatively fill gaps with plausible (but incorrect) info. ⚠️ Funny if you're casually chatting, problematic if you're building enterprise apps.

So, how do you reduce hallucinations effectively? The #1 technique: grounding the LLM in data.
- Use Retrieval-Augmented Generation (RAG) to anchor responses in verified data (a minimal sketch follows this post).
- Use long-term memory systems like Zep to ensure the model is always grounded in personalization data: user context, preferences, traits, etc.
- Fine-tune models on domain-specific datasets to improve response consistency and style, although fine-tuning alone typically doesn't add substantial new factual knowledge.
- Use explicit, clear prompting; avoid ambiguity or unnecessary complexity.
- Encourage models to self-verify conclusions when accuracy is essential.
- Structure complex tasks with chain-of-thought (CoT) prompting to improve outputs, or force "none"/unknown responses when necessary.
- Strategically tweak model parameters (e.g., temperature, top-p) to limit overly creative outputs.
- Add post-processing verification for mission-critical outputs, for example, matching to known business states.

One technique alone rarely solves hallucinations. For maximum ROI, we've found combining RAG with a robust long-term memory solution (like ours at Zep) is the sweet spot. Systems that ground responses in factual, evolving knowledge significantly outperform.

Did I miss any good techniques? What are you doing in your apps?
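A minimal sketch of the grounding pattern described above: retrieve trusted context, inject it into the prompt, keep temperature low, and allow an explicit "unknown". The `retrieve_documents` and `call_llm` stubs and the prompt wording are assumptions for illustration only.

```python
def retrieve_documents(question: str, k: int = 3) -> list:
    # Placeholder: vector or keyword search over your trusted corpus / memory store.
    raise NotImplementedError


def call_llm(prompt: str, temperature: float) -> str:
    # Placeholder for your model client.
    raise NotImplementedError


def grounded_answer(question: str) -> str:
    context = "\n\n".join(retrieve_documents(question))
    prompt = (
        "Answer using ONLY the context below. If the context does not contain "
        "the answer, reply exactly 'unknown'.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt, temperature=0.1)  # low temperature curbs creative drift
```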
-
LLM hallucinations present a major roadblock to GenAI adoption (here's how to manage them).

Hallucinations occur when LLMs return a response that is incorrect, inappropriate, or just way off. LLMs are designed to always respond, even when they don't have the correct answer. When they can't find the right answer, they'll just make something up. This is different from past AI and computer systems we've dealt with, and it is something new for businesses to accept and manage as they look to deploy LLM-powered services and products.

We are early in the risk management process for LLMs, but some tactics are starting to emerge:

1 -- Guardrails: Implementing filters for inputs and outputs to catch inappropriate or sensitive content is a common practice to mitigate risks associated with LLM outputs (a rough sketch follows this post).

2 -- Context Grounding: Retrieval-Augmented Generation (RAG) is a popular method that involves searching a corpus of relevant data to provide context, thereby reducing the likelihood of hallucinations. (See my RAG explainer video in comments.)

3 -- Fine-Tuning: Training LLMs on specific datasets can help align their outputs with desired outcomes, although this process can be resource-intensive.

4 -- Incorporating a Knowledge Graph: Using structured data to inform LLMs can improve their ability to reason about relationships and facts, reducing the chance of hallucinations.

That said, none of these measures are foolproof. This is one of the challenges of working with LLMs: reframing our expectations of AI systems to always anticipate some level of hallucination. The appropriate framing here is that we need to manage the risk effectively by implementing tactics like the ones mentioned above. In addition, longer testing cycles and robust monitoring mechanisms for when these LLMs are in production can help spot and address issues as they arise.

Just as human intelligence is prone to mistakes, LLMs will hallucinate. However, by putting good tactics in place, we can minimize this risk as much as possible.
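A rough sketch of tactic #1 (guardrails on inputs and outputs) is below. Production systems typically use a moderation API or classifier rather than keyword lists; the topic lists and the `generate_reply` stub here are illustrative assumptions only.

```python
BLOCKED_INPUT_TOPICS = ("medical advice", "legal advice")     # illustrative
SENSITIVE_OUTPUT_TERMS = ("ssn", "credit card number")        # illustrative
REFUSAL = "Sorry, I can't help with that topic."


def generate_reply(user_message: str) -> str:
    # Placeholder: the RAG-grounded LLM call goes here.
    raise NotImplementedError


def guarded_reply(user_message: str) -> str:
    if any(topic in user_message.lower() for topic in BLOCKED_INPUT_TOPICS):
        return REFUSAL                                   # input-side guardrail
    reply = generate_reply(user_message)
    if any(term in reply.lower() for term in SENSITIVE_OUTPUT_TERMS):
        return REFUSAL                                   # output-side guardrail
    return reply
```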
-
LLMs are just stateless functions. If you want something dependable, it's all about how you engineer the wrapper.

In this issue, I share some lessons and patterns I've found (or learned from others) that help LLM-based agents go from flaky to functional:
--- Why the "magic" is in the loop, not the model
--- How to think about tool use, error handling, and context windows
--- The value of owning your control flow
--- Why smaller, focused agents usually win

I also included one of our open-sourced projects: GenAI Agents Infrastructure, a lightweight setup we've been using internally to run LLM agents. You'll find the GitHub link inside; give it a try and let me know how it goes!

___________

Welcome to the Learn AI Together newsletter, 90% buzzword-free and focused on learning materials & news in AI. Let's grow together! Alex Wang