Large language models (LLMs) are typically optimized to answer people's questions. But there is a trend toward models also being optimized to fit into agentic workflows. This will give a huge boost to agentic performance!

Following ChatGPT's breakaway success at answering questions, a lot of LLM development focused on providing a good consumer experience. So LLMs were tuned to answer questions ("Why did Shakespeare write Macbeth?") or follow human-provided instructions ("Explain why Shakespeare wrote Macbeth"). A large fraction of the datasets for instruction tuning guide models to provide more helpful responses to human-written questions and instructions of the sort one might ask a consumer-facing LLM like those offered by the web interfaces of ChatGPT, Claude, or Gemini.

But agentic workloads call for different behaviors. Rather than directly generating responses for consumers, AI software may use a model as part of an iterative workflow to reflect on its own output, use tools, write plans, and collaborate in a multi-agent setting. Major model makers are increasingly optimizing models to be used in AI agents as well.

Take tool use (or function calling). If an LLM is asked about the current weather, it won't be able to derive the information needed from its training data. Instead, it might generate a request for an API call to get that information. Even before GPT-4 natively supported function calls, application developers were already using LLMs to generate function calls, but with more complex prompts (such as variations of ReAct prompts) that tell the LLM what functions are available and then have it generate a string that a separate software routine parses (perhaps with regular expressions) to decide whether it wants to call a function. Generating such calls became much more reliable after GPT-4, and then many other models, natively supported function calling. Today, LLMs can decide to call functions to search for information for retrieval augmented generation (RAG), execute code, send emails, place orders online, and much more. (A minimal sketch of native function calling appears below.)

Recently, Anthropic released a version of its model that is capable of computer use, using mouse-clicks and keystrokes to operate a computer (usually a virtual machine). I've enjoyed playing with the demo. While other teams have been prompting LLMs to use computers to build a new generation of RPA (robotic process automation) applications, native support for computer use by a major LLM provider is a great step forward. This will help many developers!

[Reached length limit; full text: https://lnkd.in/gHmiM3Tx ]
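Here is a minimal sketch of the native function calling the post describes, using the OpenAI Python SDK. The `get_current_weather` tool and the model name are illustrative assumptions, not part of the original post:

```python
import json
from openai import OpenAI

client = OpenAI()

# Describe a (hypothetical) weather tool the model may choose to call.
tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",  # hypothetical function, for illustration
        "description": "Get the current weather for a given city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City name"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[{"role": "user", "content": "What's the weather in Paris right now?"}],
    tools=tools,
)

message = response.choices[0].message
if message.tool_calls:
    # The model returns structured arguments instead of free text, so no
    # regex parsing (as in early ReAct-style prompting) is needed.
    call = message.tool_calls[0]
    args = json.loads(call.function.arguments)
    print(call.function.name, args)  # e.g., get_current_weather {'city': 'Paris'}
```

Your application then executes the real function and sends the result back to the model in a follow-up message, which is what makes the iterative agentic loop possible.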
Large Language Models Insights
Explore top LinkedIn content from expert professionals.
-
For the last couple of years, Large Language Models (LLMs) have dominated AI, driving advancements in text generation, search, and automation. But 2025 marks a shift: one that moves beyond token-based predictions to a deeper, more structured understanding of language. Meta's Large Concept Models (LCMs), launched in December 2024, redefine AI's ability to reason, generate, and interact by focusing on concepts rather than individual words.

Unlike LLMs, which rely on token-by-token generation, LCMs operate at a higher abstraction level, processing entire sentences and ideas as unified concepts. This shift enables AI to grasp deeper meaning, maintain coherence over longer contexts, and produce more structured outputs. Attached is a fantastic graphic created by Manthan Patel.

How LCMs Work:
🔹 Conceptual Processing – Instead of breaking sentences into discrete words, LCMs encode entire ideas, allowing for higher-level reasoning and contextual depth.
🔹 SONAR Embeddings – A breakthrough in representation learning, SONAR embeddings capture the essence of a sentence rather than just its words, making AI more context-aware and language-agnostic.
🔹 Diffusion Techniques – Borrowing from the success of generative diffusion models, LCMs stabilize text generation, reducing hallucinations and improving reliability.
🔹 Quantization Methods – By refining how AI processes variations in input, LCMs improve robustness and minimize errors from small perturbations in phrasing.
🔹 Multimodal Integration – Unlike traditional LLMs that primarily process text, LCMs seamlessly integrate text, speech, and other data types, enabling more intuitive, cross-lingual AI interactions.

Why LCMs Are a Paradigm Shift:
✔️ Deeper Understanding: LCMs go beyond word prediction to grasp the underlying intent and meaning behind a sentence.
✔️ More Structured Outputs: Instead of just generating fluent text, LCMs organize thoughts logically, making them more useful for technical documentation, legal analysis, and complex reports.
✔️ Improved Reasoning & Coherence: LLMs often lose track of long-range dependencies in text. LCMs, by processing entire ideas, maintain context better across long conversations and documents.
✔️ Cross-Domain Applications: From research and enterprise AI to multilingual customer interactions, LCMs unlock new possibilities where traditional LLMs struggle.

LCMs vs. LLMs: The Key Differences
🔹 LLMs predict text at the token level, often leading to word-by-word optimizations rather than holistic comprehension.
🔹 LCMs process entire concepts, allowing for abstract reasoning and structured thought representation.
🔹 LLMs may struggle with context loss in long texts, while LCMs excel in maintaining coherence across extended interactions.
🔹 LCMs are more resistant to adversarial input variations, making them more reliable in critical applications like legal tech, enterprise AI, and scientific research.
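To make the "concepts, not tokens" idea concrete, here is a minimal sketch of sentence-level embedding: each sentence becomes one fixed-size vector, which is the unit an LCM reasons over. Note that this uses the sentence-transformers library as an illustrative stand-in; Meta's actual LCM pipeline uses SONAR encoders, whose API differs:

```python
from sentence_transformers import SentenceTransformer
import numpy as np

# Stand-in encoder for illustration; not Meta's SONAR.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "The patient should take the medication twice daily.",
    "Administer the drug to the patient two times per day.",
    "The stock market fell sharply on Tuesday.",
]

# One vector per *sentence*, not per token: the whole idea is encoded
# as a single point in embedding space.
embeddings = model.encode(sentences, normalize_embeddings=True)

# Paraphrases land close together; unrelated concepts land far apart.
print(np.dot(embeddings[0], embeddings[1]))  # high similarity
print(np.dot(embeddings[0], embeddings[2]))  # low similarity
```

Generating in this sentence-embedding space, rather than token by token, is what the post means by operating "at a higher abstraction level."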
-
My next tutorial on pretraining an LLM from scratch is now out. It starts with a step-by-step walkthrough of understanding, calculating, and optimizing the loss. After training, we update the text generation function with temperature scaling and top-k sampling. And finally, we also load openly available pretrained weights into our scratch-built model architecture. Along with this pretraining tutorial, I also have bonus material on speeding up the LLM training. These apply not just to LLMs but also to other transformer-based models like vision transformers: 1. Instead of saving the causal mask, this creates the causal mask on the fly to reduce memory usage (here it has minimal effect, but it can add up in long-context size models like Llama 3.2 with 131k-input-tokens support) 2. Use tensor cores (only works for Ampere GPUs like A100 and newer) 3. Use the fused CUDA kernels for `AdamW` by setting 4. Pre-allocate and re-use GPU memory via the pinned memory setting in the data loader 5. Switch from 32-bit float to 16-bit brain float (bfloat16) precision 6. Replace from-scratch implementations of attention mechanisms, layer normalizations, and activation functions with PyTorch counterparts that have optimized CUDA kernels 7. Use FlashAttention for more efficient memory read and write operations 8. Compile the model 9. Optimize the vocabulary size 10. After saving memory with the steps above, increase the batch size Video tutorial: https://lnkd.in/gDRycWea PyTorch speed-ups: https://lnkd.in/gChvGCJH
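Several of these switches are one-liners in recent PyTorch. Here is a minimal, self-contained sketch showing items 2, 3, 4, 5, 8, and 10 from the list; the tiny placeholder model and random token data are assumptions for illustration, not the tutorial's actual training loop:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.set_float32_matmul_precision("high")          # (2) enable TF32 tensor cores

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Sequential(                        # placeholder "model"
    torch.nn.Embedding(50_257, 256),                # (9) vocab size is a tunable
    torch.nn.Linear(256, 50_257),
).to(device)
model = torch.compile(model)                        # (8) compile the model

optimizer = torch.optim.AdamW(
    model.parameters(), lr=3e-4,
    fused=(device.type == "cuda"),                  # (3) fused AdamW CUDA kernels
)

data = TensorDataset(torch.randint(0, 50_257, (1024, 128)))
loader = DataLoader(data, batch_size=32,            # (10) raise after saving memory
                    pin_memory=True)                # (4) pinned host memory

loss_fn = torch.nn.CrossEntropyLoss()
for (batch,) in loader:
    batch = batch.to(device, non_blocking=True)
    with torch.autocast(device_type=device.type, dtype=torch.bfloat16):  # (5) bf16
        logits = model(batch)
        loss = loss_fn(logits.flatten(0, 1), batch.flatten())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    break  # one step is enough for the sketch
```

Items 6 and 7 (optimized PyTorch layers and FlashAttention, e.g. via `torch.nn.functional.scaled_dot_product_attention`) kick in inside a real transformer block and are omitted here to keep the sketch short.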
-
Andrej Karpathy has released one of the most comprehensive guides on LLMs. In just 3.5 hours, he dives deep into the architecture, training, and applications of LLMs. Here's what makes this video a must-watch:

1. Evolution of Language Models: Karpathy traces the journey from simple statistical methods to advanced neural networks like Transformers. He explains how these models are trained on vast datasets, enabling them to generate human-like text and perform tasks like translation and code generation.
2. Inner Workings Unveiled: A significant part of the video breaks down complex concepts such as attention mechanisms, tokenization, and large-scale data in model training. Karpathy also addresses common challenges like model bias and ethical considerations, emphasizing the importance of fine-tuning models for specific applications.
3. Practical Applications: Karpathy highlights how LLMs are transforming various industries, including healthcare, finance, and entertainment. He provides examples of how these models improve services, enhance user experiences, and drive innovation.
4. Clear Explanations: Karpathy's ability to simplify complex topics makes this video accessible to both newcomers and seasoned professionals. His thorough analysis offers valuable insights into the future of artificial intelligence.

For those looking to deepen their understanding of LLMs, this video is an invaluable resource. Watch the full video to learn from one of the leading experts in the field: https://lnkd.in/dswuqDhm
-
💡 RAG and fine-tuning are often viewed as mutually exclusive choices, but a combined approach can benefit many applications! For instance, this paper introduces a fine-tuning method using a dataset that focuses on numerical key-value retrieval tasks. The results show that fine-tuning large language models on this dataset significantly improves their ability to find information and make decisions in longer contexts.

Details:
👉 The paper proposes a novel approach of fine-tuning LLMs on a synthetic dataset designed for numerical key-value retrieval tasks. This dataset aims to address the limitations of LLMs in handling long-context tasks effectively.
👉 Fine-tuning LLMs on the synthetic dataset, including models like GPT-3.5 Turbo and Mistral 7B, significantly enhances their information retrieval and reasoning capabilities in longer-context settings.

Results:
👉 Analysis shows a notable transfer of skills from synthetic to real task evaluations. For instance, GPT-3.5 Turbo demonstrates a 10.5% improvement on MDQA at position 10.
👉 Fine-tuned models maintain stable performance on general benchmarks like MMLU and HellaSwag, indicating minimal degradation in overall model capabilities.
👉 In contrast to fine-tuning on other baseline long-context augmentation data, which may induce hallucinations and performance drops (e.g., on TriviaQA), the synthetic dataset shows either no degradation or minimal performance impact.

A point to note is that the synthetic dataset used in this study does not include factual information, reducing the risk of hallucinations found in previous research. This makes it a safer option for improving LLMs' abilities in retrieval and reasoning. (A sketch of what such a synthetic example can look like follows this post.)

Link: https://lnkd.in/eyJ3B2SP
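Here is a minimal sketch of a synthetic numerical key-value retrieval example in the spirit of the paper. The exact format and sizes used in the paper may differ; the point is that the data contains only made-up numbers, so fine-tuning on it cannot teach the model false world knowledge:

```python
import json
import random

def make_example(num_keys: int = 50, rng: random.Random = random.Random(0)) -> dict:
    # Random 8-digit keys mapped to random 8-digit values: pure retrieval,
    # zero factual content.
    kv = {str(rng.randrange(10**7, 10**8)): str(rng.randrange(10**7, 10**8))
          for _ in range(num_keys)}
    target_key = rng.choice(list(kv))
    prompt = (
        "Below is a JSON dictionary of numerical keys and values.\n"
        f"{json.dumps(kv)}\n"
        f"What is the value associated with key {target_key}?"
    )
    return {"prompt": prompt, "answer": kv[target_key]}

example = make_example()
print(example["prompt"][:200], "...")
print("expected answer:", example["answer"])
```

Scaling `num_keys` up pushes the target deep into a long context, which is exactly the "needle retrieval" skill the fine-tuning is meant to strengthen.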
-
All new models come with a larger context window... but do you know what it is? Here's my quick guide:

The Definition
- Context window = the amount of text an AI model can process at once
- Larger windows allow the AI to handle more information simultaneously
- For instance, if the context window is 1,024 tokens, the model can utilize up to 1,024 tokens of prior text to understand and generate a response.

Why It Matters
- Enhanced Understanding: Larger context windows allow the model to retain more information from the ongoing conversation or document, leading to more coherent and contextual responses.
- Complex Tasks: With a bigger context, models can tackle more complex tasks like long-form document analysis, multi-turn conversations, or summarizing lengthy articles without losing track.
- Reduced Fragmentation: A larger context window reduces the need to break down input into smaller chunks, leading to more natural and uninterrupted interactions.

What to Expect
- More Insightful Outputs: As AI models continue to evolve, expect richer and more insightful outputs, especially in applications like content generation, chatbots, and customer support.
- Increased Productivity: Businesses leveraging these models can achieve higher productivity by allowing AI to handle more sophisticated tasks with less human intervention.

Alternatives to Large Context Windows (a chunking sketch follows this post):
1. Chunking: Breaks large text into smaller chunks, processing them independently.
   - Pros: Memory efficient, scalable.
   - Cons: Risk of losing context, complex to stitch results together.
2. RAG: Retrieves relevant information from external sources during generation.
   - Pros: Accesses vast knowledge, improves accuracy, works with smaller context windows.
   - Cons: Complex to set up, potential latency, depends on data quality.

Things to Be Careful With:
- Context Loss: Whether chunking or using RAG, losing the overall context is a risk. Ensuring that each chunk or retrieved piece of information is relevant and seamlessly integrated is crucial.
- Latency: Larger context windows and RAG systems can increase processing time, affecting real-time applications like chatbots or live interactions.
- Memory and Computational Overhead: Larger context windows demand more memory and computational power, which can be a limitation for some systems.
- Complexity of Implementation: Both alternatives, especially RAG, require a more complex setup, including retrieval systems and databases. This can increase the cost and time needed for development.
- Data Relevance: In RAG, the quality of the generated output is highly dependent on the relevance and accuracy of the retrieved data. Ensuring the retrieval system is well-tuned and the knowledge base is up-to-date is essential.

Choose the right approach based on your specific use case!
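Here is a minimal sketch of chunking with overlap, the first alternative above. Splitting on tokens rather than characters matches how context windows are actually measured; the chunk and overlap sizes are arbitrary illustrations:

```python
import tiktoken  # OpenAI's open-source tokenizer library

def chunk_text(text: str, max_tokens: int = 512, overlap: int = 64) -> list[str]:
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    chunks = []
    step = max_tokens - overlap
    for start in range(0, len(tokens), step):
        chunks.append(enc.decode(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break
    return chunks

# Each chunk now fits a small context window; the overlapping tokens reduce
# (but do not eliminate) the risk of cutting an idea in half at a boundary.
chunks = chunk_text("your long document text ... " * 500)
print(len(chunks), "chunks")
```

Stitching the per-chunk results back together (the "Cons" above) is where most of the engineering effort goes in practice.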
-
AI is no longer just about smarter models; it's about building entire ecosystems of intelligence. This year we're seeing a wave of new ideas that go beyond simple automation. We have autonomous agents that can reason and work together, as well as AI governance frameworks that ensure trust and accountability. These concepts are laying the groundwork for how AI will be developed, used, and integrated into our daily lives. This year is less about asking "what can AI do?" and more about "how do we shape AI responsibly, collaboratively, and at scale?" Here's a closer look at the most important trends:

🔹 Agentic AI & Multi-Agent Collaboration: AI agents now work together, coordinate tasks, and act with autonomy.
🔹 Protocols & Frameworks (A2A, MCP, LLMOps): standards for agent communication, universal context-sharing, and operations frameworks for managing large language models.
🔹 Generative & Research Agents: self-directed agents that create, code, and even conduct research, acting as AI scientists.
🔹 Memory & Tool-Using Agents: persistent memory provides long-term context, while tool-using models can call APIs and external functions on demand.
🔹 Advanced Orchestration: coordinating multiple agents, retrieval 2.0 pipelines, and autonomous coding agents that build software without human help.
🔹 Governance & Responsible AI: AI governance frameworks keep ethics, compliance, and explainability front and center as adoption increases.
🔹 Next-Gen AI Capabilities: goal-driven reasoning, multi-modal LLMs, emotional-context AI, and real-time adaptive systems that learn continuously.
🔹 Infrastructure & Ecosystems: AI-native clouds, simulation training, synthetic data ecosystems, and self-updating knowledge graphs.
🔹 AI in Action: applications range from robotics and swarm intelligence to personalized AI companions, negotiators, and compliance engines, making the possibilities endless.

This is the year when AI shifts from tools to ecosystems, forming a network of intelligent, autonomous, and adaptive systems. Wonder what's coming next. #GenAI
-
Two more Large Language Models were just released. And they change what we can do with them in a big way!

First, the unfortunate reality: most open-source models only support working with around 1,500 words at a time, which limits the number of actual applications you can solve using them. In technical terms, we call this the "context window" of a model and measure it in "tokens," where one token corresponds to roughly 0.75 words. The context window defines how many words the model can consider when generating a response. For example, you can't ask a model to process a 5,000-word PDF document if your context window isn't that large. You might have noticed ChatGPT tends to forget things after a few prompts. This happens because you've exceeded the number of words it can keep in context.

For reference, here is the context window of the three most popular models in the market right now:
• Claude 2: 100,000 tokens (~75,000 words)
• GPT-4: 8,000 tokens (~6,000 words)
• GPT-3.5: 4,000 tokens (~3,000 words)
(A version of GPT-4 supports 32,000 tokens, but OpenAI hasn't released it to everyone yet.)

A typical application most companies want to solve is using these models to process their knowledge base. Unfortunately, you can't do much if you can only use a few words at a time. That's one of the reasons people aren't embracing open-source alternatives to ChatGPT and Claude.

Fortunately, we now have something new in the open-source world: yesterday, the @abacusai team released two versions of Giraffe, an open-source model based on Llama. One version supports a 4k context (~3,000 words), and the other 16k (~12,000 words)! They are not only open-sourcing the models but also the evaluation datasets and performance experiments.

Here is the Git repository: https://lnkd.in/drrU2ZhP
You can use the model directly from HuggingFace. Here is the 4k (~3,000 words) version: https://lnkd.in/dKYJTjzu
And here is the 16k (~12,000 words) version: https://lnkd.in/dwtz7TU2
You can also read more about Giraffe at this link: https://lnkd.in/deEKJkbk

The team is working on a Llama 2 variant of Giraffe and plans to release the weights for that one as well. Can't wait!
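A minimal sketch of the token/word relationship described above, using tiktoken as an illustrative tokenizer (Llama-family models like Giraffe use a different tokenizer, so exact counts will vary):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "The context window defines how many words the model can consider."
tokens = enc.encode(text)

print(len(text.split()), "words ->", len(tokens), "tokens")

# Rule of thumb from the post: words ≈ tokens * 0.75, so a 16k-token
# context fits roughly 12,000 English words.
print(f"16k tokens ≈ {int(16_000 * 0.75):,} words")
```

Counting tokens before sending a request is the standard way to check whether a document will fit a given model's context window.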
-
LLM pro tip to reduce hallucinations and improve performance: instruct the language model to ask clarifying questions in your prompt. Add a directive like "If any part of the question/task is unclear or lacks sufficient context, ask clarifying questions before providing an answer" to your system prompt (a minimal sketch follows this post). This will:

(1) Reduce ambiguity - forcing the model to acknowledge knowledge gaps rather than filling them with hallucinations
(2) Improve accuracy - enabling the model to gather necessary details before committing to an answer
(3) Enhance interaction - creating a more natural, iterative conversation flow similar to human exchanges

This approach was validated in the 2023 CALM paper, which showed that selectively asking clarifying questions for ambiguous inputs increased question-answering accuracy without negatively affecting responses to unambiguous queries: https://lnkd.in/gnAhZ5zM
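A minimal sketch of wiring this directive into a system prompt, using the OpenAI Python SDK as an illustrative client; the same idea applies to any chat-style API:

```python
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You are a careful assistant. If any part of the question/task is "
    "unclear or lacks sufficient context, ask clarifying questions "
    "before providing an answer."
)

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        # Deliberately ambiguous request: a well-behaved model should ask
        # which report, what length, and for which audience.
        {"role": "user", "content": "Summarize the report."},
    ],
)
print(response.choices[0].message.content)
```

In a multi-turn application, you would append the model's clarifying question and the user's reply to `messages` and call the API again, which is the iterative flow point (3) describes.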
-
AI models are reasoning, creating, and evolving. The evidence is no longer theoretical; it's peer-reviewed, measurable, and, in some domains, superhuman. In the last 18 months, we've seen LLMs move far beyond next-token prediction. They're beginning to demonstrate real reasoning, hypothesis generation, long-horizon planning, and even scientific creativity. Here are six breakthroughs that redefine what these models can do:

Superhuman Clinical Reasoning (Nature Medicine, 2025): In a rigorous test across 12 specialties, GPT-4 scored 89% on the NEJM Knowledge+ medical reasoning exam, outperforming the average physician score of 74%. This wasn't just Q&A; it involved multi-hop reasoning, risk evaluation, and treatment planning. That's structured decision-making in high-stakes domains.

Creative Research Ideation (Zhou et al., 2024 – arXiv:2412.10849): Across 10 fields from physics to economics, GPT-4 and Claude generated research questions rated more creative than human-generated ones in 53% of cases. This wasn't trivia; domain experts blindly compared ideas from AI and researchers. In over half the cases, the AI won.

Falsifiable Hypotheses from Raw Data (Nemati et al., 2024): GPT-4o was fed raw experimental tables from biology and materials science and asked to propose novel hypotheses. 46% of them were judged publishable by experts, outperforming PhD students (29%) on the same task. That's not pattern matching; that's creative scientific reasoning from scratch.

Self-Evolving Agents (2024): LLM agents that reflect, revise memory, and re-prompt themselves improved their performance on coding benchmarks from 21% → 34% in just four self-corrective cycles, without retraining. This is meta-cognition in action: learning from failure, iterating, and adapting over time.

Long-Term Agent Memory (A-MEM, 2025): Agents equipped with dynamic long-term memory (inspired by Zettelkasten) achieved 2× higher success on complex web tasks, planning across multiple steps with context continuity.

Emergent Social Reasoning (AgentSociety, 2025): In a simulation of 1,000 LLM-driven agents, researchers observed emergent social behaviors: rumor spreading, collaborative planning, and even economic trade. No hardcoding. Just distributed reasoning, goal propagation, and learning-by-interaction.

These findings span healthcare, science, software engineering, and multi-agent simulations. They reveal systems that generate, reason, and coordinate, not just predict. So when some argue that "AI is only simulating thought," we should ask: are the tests capturing how real reasoning happens? The Tower of Hanoi isn't where science, medicine, or innovation happens. The real test is:

1. Can a model make a novel discovery?
2. Can it self-correct across steps?
3. Can it outperform domain experts in structured judgment?

And increasingly, the answer is: yes. Let's not confuse symbolic puzzles with intelligence. Reasoning is already here, and it's evolving.