Inexpensive token generation and agentic workflows for LLMs open up new possibilities for training LLMs on synthetic data. Pretraining an LLM on its own directly generated responses to prompts doesn't help. But if an agentic workflow implemented with the LLM results in higher-quality output than the LLM can generate directly, then training on that output becomes potentially useful. Just as humans can learn from their own thinking, perhaps LLMs can, too. Imagine a math student learning to write mathematical proofs. By solving a few problems — even without external input — they can reflect on what works and learn to generate better proofs.

LLM training involves (i) pretraining (learning from unlabeled text data to predict the next word) followed by (ii) instruction fine-tuning (learning to follow instructions) and (iii) RLHF/DPO to align the model to human values. Step (i) requires orders of magnitude more data than the others. For example, Llama 3 was pretrained on over 15 trillion tokens. LLM developers are still hungry for more data. Where can we get more text to train on?

Many developers train smaller models on the output of larger models, so a smaller model learns to mimic a larger model's behavior on a particular task. But an LLM can't learn much by training on data it generated directly. Indeed, training a model repeatedly on the output of an earlier version of itself can result in model collapse. But an LLM wrapped in an agentic workflow can produce higher-quality output than it can generate directly. This output might be useful as pretraining data. Efforts like these have precedents:

- When using reinforcement learning to play a game like chess, a model might learn a function that evaluates board positions. If we apply game tree search along with a low-accuracy evaluation function, the model can come up with more accurate evaluations. Then we can train that evaluation function to mimic these more accurate values.
- During alignment, Anthropic's constitutional AI uses RLAIF (RL from AI Feedback) to judge LLM output quality, substituting feedback generated by an AI model for human feedback.

A significant barrier to using agentic workflows to produce LLM training data is the cost of generating tokens. Say we want to generate 1 trillion tokens to extend a pre-existing dataset. At current retail prices, 1 trillion tokens from GPT-4-turbo ($30 per million output tokens), Claude 3 Opus ($75), Gemini 1.5 Pro ($21), and Llama-3-70B on Groq ($0.79) would cost, respectively, $30M, $75M, $21M, and $790K. Of course, an agentic workflow would require generating more than one token per final output token. But budgets for training cutting-edge LLMs easily surpass $100M, so spending a few million dollars more for data to boost performance is feasible. That's why agentic workflows might open up new opportunities for high-quality synthetic data generation. [Original text: https://lnkd.in/gFF2AsZ9 ]
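To make the cost arithmetic above concrete, here is a minimal Python sketch that reproduces those figures from the quoted per-million-token retail prices. The `agentic_overhead` multiplier is a hypothetical knob (not from the original post) for the extra intermediate tokens an agentic workflow would burn per final output token.

```python
# Back-of-the-envelope cost of generating synthetic pretraining tokens.
# Prices ($ per million output tokens) are the retail figures quoted above;
# agentic_overhead is a hypothetical multiplier for intermediate tokens
# (drafts, reflections, tool calls) generated per token that is kept.

PRICES_PER_MILLION = {
    "GPT-4-turbo": 30.00,
    "Claude 3 Opus": 75.00,
    "Gemini 1.5 Pro": 21.00,
    "Llama-3-70B on Groq": 0.79,
}

def generation_cost(total_tokens: float, price_per_million: float,
                    agentic_overhead: float = 1.0) -> float:
    """Cost in dollars to produce `total_tokens` final tokens."""
    return total_tokens * agentic_overhead * price_per_million / 1_000_000

if __name__ == "__main__":
    one_trillion = 1e12
    for model, price in PRICES_PER_MILLION.items():
        direct = generation_cost(one_trillion, price)
        with_workflow = generation_cost(one_trillion, price, agentic_overhead=5)
        print(f"{model}: ${direct:,.0f} direct, ${with_workflow:,.0f} at 5x overhead")
```

Running this reproduces the $30M / $75M / $21M / $790K figures in the post; the 5x overhead column simply illustrates how quickly intermediate generation inflates the budget.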
Developing Training for New Technologies
Explore top LinkedIn content from expert professionals.
-
Are you tired of LLMs providing you generic answers with little or sometimes no contextual alignment, and thus poor domain adaptation? University of California, Berkeley researchers have pioneered a novel training method, RAFT (Retrieval Augmented Fine-Tuning), which bolsters a language model's ability to respond to domain-specific queries using an "open-book" technique. This prompts us to ponder: do language models truly comprehend and infer from provided documents, or do they simply memorize and echo information?

The Challenge of Domain-Specific Question Answering - Tailoring large language models (LLMs) to respond to queries in specialized areas, such as biomedical research or API documentation, is an expanding yet demanding endeavor. Conventional techniques encompass:
👉 Retrieval-augmented generation (RAG): supplying pertinent documents to the model during inference.
👉 Supervised fine-tuning on domain-specific data.
📍 Nonetheless, RAG in isolation doesn't fully leverage the potential for in-domain learning, while standard fine-tuning doesn't coach the model to effectively utilize retrieved documents.

💫 The Fusion of the Best Approaches - RAFT overcomes these shortcomings by fine-tuning the model to respond to queries using a blend of relevant and irrelevant documents. Its key attributes include:
👩🏽‍💻 Training on a mix of question-document pairs, some with the "oracle" document that holds the answer and some with only "distractor" documents.
👩🏽‍💻 Generating responses in a chain-of-thought style that cites the pertinent sections of the reference documents.

✨ Remarkable Outcomes Across Diverse Domains - The researchers tested RAFT on multiple question-answering datasets, covering Wikipedia articles, biomedical papers, and API documentation. In all these specialized domains, RAFT consistently surpassed both standard fine-tuning and RAG benchmarks. 🌟 Significantly, RAFT achieved substantial improvements of up to 35% on the HotpotQA Wikipedia dataset and 76% on the Torch Hub API documentation dataset compared to the base RAG model. These outcomes validate RAFT's capacity to genuinely comprehend and infer from domain-specific documents.

⚡️ Way Forward Towards Efficient Domain Adaptation - RAFT represents a thrilling progression towards more proficient and effective customization of language models to specialized domains. By learning to selectively read and cite pertinent information from domain-specific documents, RAFT lays the groundwork for compact, dedicated models that can compete with much larger generic language models on niche question-answering tasks. As the need for deploying LLMs in domain-specific applications continues to surge, methods like RAFT are likely to be vital for facilitating practical, cost-efficient solutions!

Kudos to Tianjun Zhang, Shishir Patil, Naman Jain, Sheng Shen, Matei Zaharia, Ion Stoica, and Joseph E. Gonzalez for this amazing work! #llm #ai #aiadoption #genai
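As a concrete illustration of the RAFT recipe described above, here is a minimal Python sketch of how training examples might be assembled: some examples pair the question with the oracle document plus distractors, others with distractors only, and the target is a chain-of-thought answer that cites the reference text. The field names, prompt wording, and the oracle-dropping probability are illustrative assumptions, not the authors' exact implementation.

```python
import random

def build_raft_example(question: str, oracle_doc: str, distractor_docs: list[str],
                       cot_answer: str, p_drop_oracle: float = 0.2) -> dict:
    """Assemble one RAFT-style fine-tuning example.

    With probability (1 - p_drop_oracle) the oracle document is included
    alongside distractors; otherwise only distractors are shown, which
    pushes the model to ignore irrelevant context and rely on what it has
    internalized. (p_drop_oracle is an illustrative knob.)
    """
    docs = list(distractor_docs)
    if random.random() > p_drop_oracle:
        docs.append(oracle_doc)
    random.shuffle(docs)

    context = "\n\n".join(f"[Doc {i + 1}] {d}" for i, d in enumerate(docs))
    prompt = (f"{context}\n\nQuestion: {question}\n"
              "Answer with reasoning, citing the relevant document:")
    # Target is a chain-of-thought answer that quotes the pertinent passage.
    return {"prompt": prompt, "completion": cot_answer}

example = build_raft_example(
    question="Which API call lists available Torch Hub models?",
    oracle_doc="torch.hub.list(github) returns all models in a repository.",
    distractor_docs=["torch.save serializes tensors to disk.",
                     "DataLoader batches samples from a Dataset."],
    cot_answer="The docs state that torch.hub.list(github) returns all models "
               "in a repository, so torch.hub.list is the relevant call.",
)
```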
-
Researchers from Oxford University just achieved a 14% performance boost in mathematical reasoning by making LLMs work together like specialists in a company. In their new MALT (Multi-Agent LLM Training) paper, they introduced a novel approach where three specialized LLMs - a generator, verifier, and refinement model - collaborate to solve complex problems, similar to how a programmer, tester, and supervisor work together.

The breakthrough lies in their training method:
(1) Tree-based exploration - generating thousands of reasoning trajectories by having the models interact
(2) Credit attribution - identifying which model is responsible for successes or failures
(3) Specialized training - using both correct and incorrect examples to train each model for its specific role

Using this approach on 8B-parameter models, MALT achieved relative improvements of 14% on the MATH dataset, 9% on CommonsenseQA, and 7% on GSM8K. This represents a significant step toward more efficient and capable AI systems, showing that well-coordinated smaller models can match the performance of much larger ones.

Paper: https://lnkd.in/g6ag9rP4
— Join thousands of world-class researchers and engineers from Google, Stanford, OpenAI, and Meta staying ahead on AI http://aitidbits.ai
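The generator/verifier/refiner division of labor described above can be sketched as a simple inference-time pipeline. The code below is a minimal illustration under assumed interfaces (the three callables are hypothetical stand-ins for the fine-tuned 8B models); it is not the paper's training procedure, which additionally performs tree-based exploration and credit attribution.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for a specialized model: maps a prompt string to a
# response string (e.g., via an API call or a local model).
LLM = Callable[[str], str]

@dataclass
class MALTStylePipeline:
    generator: LLM   # proposes a candidate solution with reasoning
    verifier: LLM    # critiques the candidate and flags errors
    refiner: LLM     # produces a corrected final answer

    def solve(self, problem: str) -> str:
        draft = self.generator(f"Solve step by step:\n{problem}")
        critique = self.verifier(
            f"Problem:\n{problem}\n\nCandidate solution:\n{draft}\n\n"
            "List any mistakes in the reasoning, or say 'correct'."
        )
        final = self.refiner(
            f"Problem:\n{problem}\n\nDraft:\n{draft}\n\nCritique:\n{critique}\n\n"
            "Write the corrected final solution."
        )
        return final
```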
-
For years, fine-tuning LLMs has required large amounts of data and human oversight. Small improvements can disrupt existing systems, requiring humans to go through and flag errors in order to fit the model to pre-existing workflows. This might work for smaller use cases, but it is clearly unsustainable at scale. However, recent research suggests that everything may be about to change.

I have been particularly excited about two papers from Anthropic and Massachusetts Institute of Technology, which propose new methods that enable LLMs to reflect on their own outputs and refine performance without waiting for humans. Instead of passively waiting for correction, these models create an internal feedback loop, learning from their own reasoning in a way that could match, or even exceed, traditional supervised training in certain tasks.

If these approaches mature, they could fundamentally reshape enterprise AI adoption. From chatbots that continually adjust their tone to better serve customers to research assistants that independently refine complex analyses, the potential applications are vast. In today's AI Atlas, I explore how these breakthroughs work, where they could make the most immediate impact, and what limitations we still need to overcome.
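To make the "internal feedback loop" idea concrete, here is a minimal self-refinement sketch under assumed interfaces: the model critiques its own draft and revises it, and the resulting (answer, critique, revision) traces could later serve as training signal. The function names, prompts, and stopping rule are illustrative assumptions, not the specific methods from the Anthropic or MIT papers.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical interface: prompt in, text out

def self_refine(model: LLM, task: str, max_rounds: int = 3) -> tuple[str, list[dict]]:
    """Iteratively critique and revise the model's own output.

    Returns the final answer plus the refinement trace, which could be
    filtered and reused as fine-tuning data.
    """
    answer = model(f"Task: {task}\nGive your best answer.")
    trace = []
    for _ in range(max_rounds):
        critique = model(f"Task: {task}\nAnswer: {answer}\n"
                         "Critique this answer. Reply 'LGTM' if no changes are needed.")
        if "LGTM" in critique:
            break
        revised = model(f"Task: {task}\nAnswer: {answer}\nCritique: {critique}\n"
                        "Rewrite the answer, addressing the critique.")
        trace.append({"answer": answer, "critique": critique, "revision": revised})
        answer = revised
    return answer, trace
```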
-
The Neglected Symbiosis: Why Military Technology and Tactics Must Evolve Together

The recent surge in defence spending across the UK and Europe has predominantly focused on acquiring cutting-edge technology - advanced weapons systems, sophisticated software, and next-generation platforms. Yet a critical oversight threatens to undermine this massive investment: the parallel development of Tactics, Techniques, and Procedures (TTPs) has been largely neglected. This disconnect creates a dangerous paradigm where technology, rather than operational need, begins to dictate the character of warfare. History has repeatedly shown that technology alone cannot win conflicts - it must be integrated within a coherent and adaptive operational framework.

➡️ Technology Without Tactical Evolution: A Recipe for Failure
When examining historical precedents, we see this pattern repeating. The French military's investment in the Maginot Line without adapting their mobile defence doctrine, the US military's initial struggles in Vietnam despite technological superiority, and more recently, the challenges faced in asymmetric conflicts despite overwhelming technological advantages - all demonstrate that hardware without corresponding tactical innovation leads to suboptimal outcomes.

➡️ The Symbiotic Relationship
Military effectiveness emerges from the symbiosis between technology and tactics. New capabilities demand new methods of employment, while tactical innovations often drive technological requirements. This relationship must be cultivated deliberately, not left to chance. Consider the revolution in drone warfare. The platforms themselves provide capabilities, but their transformative impact stems from how they're integrated into operations - from reconnaissance to targeting to swarming tactics. Without corresponding TTPs, these technological assets deliver only a fraction of their potential value.

➡️ The Way Forward
Defence ministries and military commands must institute formal mechanisms for parallel development:
⚡️ Involve operators in technology acquisition decisions from the outset
⚡️ Allocate specific funding for TTP development alongside procurement
⚡️ Create rapid experimentation units to explore new tactical applications
⚡️ Incorporate realistic technology integration challenges in training exercises
⚡️ Develop feedback loops between equipment developers and field units

The current imbalance in funding and attention between technology and tactics creates not just inefficiency but genuine strategic vulnerability. Our adversaries study these gaps and will exploit them. As defence spending continues to increase, we must ensure we're not just buying better tools but developing better ways to use them. The character of future warfare will be determined not by who has the most advanced technology, but by who most effectively integrates that technology into their operational art.

Richard Gwilliam Benjamin Moody Ches Clark MA (Hons)
-
One of the biggest barriers to deploying LLM-based agents in real workflows is their poor performance on long-horizon reasoning. Agents often generate coherent short responses but struggle when a task requires planning, tool use, or multi-step decision-making. The issue is not just accuracy at the end, but the inability to reason through the middle. Without knowing which intermediate steps helped or hurt, agents cannot learn to improve. This makes long-horizon reasoning one of the hardest and most unsolved problems for LLM generalization.

It is relatively easy for a model to retrieve a document, answer a factual question, or summarize a short email. It is much harder to solve a billing dispute that requires searching, interpreting policy rules, applying edge cases, and adjusting the recommendation based on prior steps. Today's agents can generate answers, but they often fail to reflect, backtrack, or reconsider earlier assumptions.

A new paper from Google DeepMind and Stanford addresses this gap with a method called SWiRL: Step-Wise Reinforcement Learning. Rather than training a model to get the final answer right, SWiRL trains the model to improve each step in a reasoning chain. It does this by generating synthetic multi-step problem-solving traces, scoring every individual step using a reward model (Gemini 1.5 Pro), and fine-tuning the base model to favor higher-quality intermediate steps.

This approach fundamentally changes the way we train reasoning agents. Instead of optimizing for final outcomes, the model is updated based on how good each reasoning step was in context. For example, if the model generates a search query or a math step that is useful, even if the final answer is wrong, that step is rewarded and reinforced. Over time, the agent learns not just to answer, but to reason more reliably. This is a major departure from standard RLHF, which only gives feedback at the end.

SWiRL improves performance by 9.2 percent on HotPotQA, 16.9 percent on GSM8K when trained on HotPotQA, and 11 to 15 percent on other multi-hop and math datasets like MuSiQue, BeerQA, and CofCA. It generalizes across domains, works without golden labels, and outperforms both supervised fine-tuning and single-step RL methods.

The implications are substantial: we can now train models to reason better by scoring and optimizing their intermediate steps. Better reward models, iterative reflection, tool-assisted reasoning, and trajectory-level training will lead to more robust performance in multi-step tasks. This is not about mere performance improvement. It shows how we can begin to train agents not to mimic outputs, but to improve the quality of their thought process. That's essential if we want to build agents that work through problems, adapt to new tasks, and operate autonomously in open-ended environments.
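The step-wise scoring idea can be illustrated with a short sketch: each intermediate step in a synthetic trajectory is scored by a reward model, and the good steps are kept as fine-tuning targets regardless of whether the final answer was right. The interfaces below (the `score_step` callable, the trajectory format, the 0.5 threshold) are illustrative assumptions, not the paper's exact setup, which uses Gemini 1.5 Pro as the reward model.

```python
from typing import Callable

# Hypothetical reward model: scores one reasoning step given the question
# and the steps taken so far, returning a value in [0, 1].
ScoreStep = Callable[[str, list[str], str], float]

def filter_stepwise(question: str, trajectory: list[str],
                    score_step: ScoreStep, threshold: float = 0.5) -> list[dict]:
    """Turn one multi-step trajectory into per-step fine-tuning examples.

    Each kept example pairs (question + prior steps) with the next step,
    so the model is trained to produce good intermediate steps even when
    the trajectory's final answer turned out to be wrong.
    """
    kept = []
    for i, step in enumerate(trajectory):
        context = trajectory[:i]
        if score_step(question, context, step) >= threshold:
            prompt = question + "\n" + "\n".join(context)
            kept.append({"prompt": prompt, "completion": step})
    return kept
```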
-
Fine-tuning for making expert, domain-specific models? Not so fast!

I often get asked whether companies should fine-tune LLMs to internalize the knowledge required for their particular use case or domain. The answer I give is: probably not. There is research suggesting that large language models struggle to acquire new factual knowledge through fine-tuning. Novel knowledge is learned more slowly than knowledge consistent with what the model already knows. This same research also showed that when knowledge is eventually learned from novel examples, there is a linear increase in the model's tendency to hallucinate. Ouch!

So what can you do? What should you do? RAG is one approach, but it comes with complexity and its own challenges: RAG pipelines are more complex, with larger storage costs, higher memory and compute requirements (due to the longer contexts demanded by the retrieved documents), and higher latency from the need to query an external index. In the long term, storing knowledge natively in the model's parameters may also provide generalization advantages, as the model can relate different pieces of knowledge in its parameters. This is particularly apparent for complex or indirect queries, where simple retrieval augmentation may fall short.

A very exciting recent paper from Meta introduced a new approach called Active Reading. This approach leverages synthetic data: LLMs generate a range of diverse training data based on a closed body of knowledge. By having the LLM read and restructure the data in many and varied ways, and training on that enlarged, restructured corpus, you can significantly improve the model's retention of the contained facts. Active Reading applies the same principles observed in human studying, allowing the model itself to propose multiple study strategies — e.g., paraphrasing, knowledge linking, active recall, etc. — and instantiate these different strategies on a document-by-document basis. This process results in a highly diverse and contextually grounded signal which can then be trained on.

The authors demonstrate huge gains vs. vanilla fine-tuning: +313% and +160% relative improvement on SimpleQA and FinanceBench respectively. They also trained a SOTA 8B model for factual QA, demonstrating the utility of the technique at pre-training scale (1T tokens). It should be noted that the Active Reading paper focuses on knowledge acquisition; traditional fine-tuning can still be useful for instilling style, format, reasoning patterns, or other behaviors.

Learning Facts at Scale with Active Reading: https://lnkd.in/e7FCAq-3
Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?: https://lnkd.in/e_REAVZB
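A minimal sketch of the Active Reading idea described above: instantiate several study strategies over each document with an LLM and collect the generated text into a fine-tuning corpus. The prompts, the fixed strategy list, and the model interface are illustrative assumptions rather than the paper's exact pipeline (the paper has the model propose its own strategies per document).

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical interface: prompt in, generated text out

# Illustrative study strategies inspired by the ones named above.
DEFAULT_STRATEGIES = [
    "Paraphrase the key facts in your own words.",
    "Write question-and-answer pairs that test recall of the facts.",
    "Link the facts in this document to related background knowledge.",
]

def active_reading_corpus(model: LLM, documents: list[str],
                          strategies: list[str] = DEFAULT_STRATEGIES) -> list[str]:
    """Expand a closed document set into a diverse synthetic training corpus."""
    corpus = []
    for doc in documents:
        for strategy in strategies:
            synthetic = model(f"Document:\n{doc}\n\nStudy task: {strategy}")
            corpus.append(synthetic)  # later: dedupe, filter, and fine-tune on this
    return corpus
```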
-
💡 New RAG research from Nvidia shows that fine-tuning a single LLM for both ranking and generation can significantly improve performance, even outperforming LLMs fine-tuned with 10× more ranking data.

👉 RAG pipelines typically utilize the top-k contexts from a retriever, which is generally a separate ranking model. In this paper, the authors propose an instruction fine-tuning framework, RankRAG, which instruction-tunes a single LLM for the dual purpose of context ranking and answer generation in RAG.
👉 It turns out that the instruction-tuned LLMs work surprisingly well when a small fraction of ranking data is added to the training blend, outperforming existing expert ranking models, including the same LLM exclusively fine-tuned on a large amount of ranking data.

📊 Empirical Results:
👉 Llama3-RankRAG outperforms strong baselines, including GPT-4 and ChatQA-1.5, on various knowledge-intensive benchmarks.
👉 Demonstrates excellent generalization to new domains, performing comparably to GPT-4 on biomedical RAG benchmarks without domain-specific fine-tuning.
👉 Emphasizes the mutual enhancement of context ranking and answer generation abilities within a single LLM, improving overall RAG performance.

Link: https://lnkd.in/esCqhxj2
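To illustrate the "one model ranks, then generates" inference pattern described above, here is a minimal sketch under assumed interfaces: the same LLM is first prompted to pick the most relevant retrieved passages, then prompted again to answer using only those. The prompt wording, parsing logic, and keep_k value are illustrative assumptions, not the RankRAG paper's exact setup.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical interface: prompt in, text out

def rank_then_generate(model: LLM, question: str,
                       retrieved: list[str], keep_k: int = 3) -> str:
    """Use one instruction-tuned LLM for both context ranking and answering."""
    numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(retrieved))

    # Step 1: the same LLM ranks the retrieved passages by relevance.
    ranking = model(
        f"Question: {question}\nPassages:\n{numbered}\n"
        f"List the indices of the {keep_k} most relevant passages, comma-separated:"
    )
    indices = [int(tok) for tok in ranking.replace(" ", "").split(",")
               if tok.isdigit()][:keep_k]
    top_context = "\n".join(retrieved[i] for i in indices if i < len(retrieved))

    # Step 2: the same LLM answers using only the passages it kept.
    return model(f"Context:\n{top_context}\n\nQuestion: {question}\nAnswer:")
```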
-
🌐 The Race for the U.S. Army's Next XR Headset: Rivet vs. Anduril + Meta

The U.S. Army is on the brink of a major technology shift, exploring cutting-edge XR (extended reality) headsets for soldiers in the field. Two key competitors have emerged: Rivet and the Anduril + Meta team.

🏁 Current Status
Rivet holds a ~$195M contract to develop prototypes under the Soldier Borne Mission Command (SBMC) program. The solution focuses on military-first hardware, integration with Palantir intelligence tools, and advanced optics. Anduril + Meta are developing a joint solution leveraging Meta's AR/VR hardware expertise and Anduril's defensive systems integration experience. They are also part of the prototype testing phase. The Army plans to test hundreds of units in realistic operational conditions before deciding on large-scale deployment.

⚖️ Factors That Will Decide the Winner
- Field performance: battery life, durability, visual quality, latency.
- Reliability & robustness: resistance to dust, humidity, and shocks.
- Data security & architecture: compliance with defense-grade standards.
- Interoperability: with drones, sensors, and command-and-control systems.
- Cost & production scale: total cost of ownership, supply chain, maintenance.
- Regulatory compliance: export restrictions, government approvals.

🔍 Key Strengths
- Rivet: hardware designed for military tasks from the ground up, strong optics, integrated intelligence tools.
- Anduril + Meta: advanced AI integration, large-scale production capabilities, proven defensive systems expertise.

💡 Conclusion
At this stage, there is no clear winner. The Army may select one solution, use both for different roles, or request further iterations. The final decision will depend on real-world testing, cost-efficiency, and operational reliability, not just brand recognition. This race is a clear example of how AI, XR, and defense innovation intersect, and how careful prototyping ensures soldiers receive reliable, safe, and effective technology.

❓ What do you think? Should defense procurement favor "military-first" hardware or leverage tech giants' AI and scalability?

🔖 Tags #XR #AR #VR #DefenseTech #AI #MilitaryInnovation #USArmy #Rivet #Anduril #Meta #ExtendedReality #Technology #DefenseInnovation