Performance Optimization Techniques

Explore top LinkedIn content from expert professionals.

-

In the last three months alone, over ten papers outlining novel prompting techniques were published, boosting LLMs’ performance by a substantial margin. Two weeks ago, a groundbreaking paper from Microsoft demonstrated how a well-prompted GPT-4 outperforms Google’s Med-PaLM 2, a specialized medical model, solely through sophisticated prompting techniques.

Yet, while our X and LinkedIn feeds buzz with ‘secret prompting tips’, a definitive, research-backed guide aggregating these advanced prompting strategies is hard to come by. This gap prevents LLM developers and everyday users from harnessing these novel frameworks to enhance performance and achieve more accurate results. https://lnkd.in/g7_6eP6y

In this AI Tidbits Deep Dive, I outline six of the best recent prompting methods:

(1) EmotionPrompt - inspired by human psychology, this method uses emotional stimuli in prompts to gain performance enhancements

(2) Optimization by PROmpting (OPRO) - a DeepMind innovation that refines prompts automatically, surpassing human-crafted ones. This paper discovered the “Take a deep breath” instruction that improved LLMs’ performance by 9%.

(3) Chain-of-Verification (CoVe) - Meta's novel four-step prompting process that drastically reduces hallucinations and improves factual accuracy

(4) System 2 Attention (S2A) - also from Meta, a prompting method that filters out irrelevant details prior to querying the LLM

(5) Step-Back Prompting - encouraging LLMs to abstract queries for enhanced reasoning

(6) Rephrase and Respond (RaR) - UCLA's method that lets LLMs rephrase queries for better comprehension and response accuracy

Understanding the spectrum of available prompting strategies and how to apply them in your app can mean the difference between a production-ready app and a nascent project with untapped potential.

Full blog post: https://lnkd.in/g7_6eP6y
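To make one of these concrete, below is a minimal sketch of the four-step Chain-of-Verification flow in Python. The `call_llm` helper is a stand-in for whatever chat-completion client you use; this illustrates the prompting pattern described above, not Meta's reference implementation.

```python
# Minimal Chain-of-Verification (CoVe) sketch.
# `call_llm` is a placeholder for your own chat-completion client.

def call_llm(prompt: str) -> str:
    """Stub: replace with a real API call (OpenAI, Anthropic, a local model, ...)."""
    return f"<LLM response to: {prompt[:60]}...>"

def chain_of_verification(question: str) -> str:
    # 1. Draft a baseline answer.
    baseline = call_llm(f"Answer the question:\n{question}")

    # 2. Plan verification questions that probe the draft's factual claims.
    plan = call_llm(
        "List short fact-checking questions for this draft answer, one per line.\n"
        f"Question: {question}\nDraft: {baseline}"
    )
    checks = [q.strip() for q in plan.splitlines() if q.strip()]

    # 3. Answer each verification question independently (the draft is kept out of
    #    context so the model cannot simply repeat its own hallucination).
    verifications = [(q, call_llm(q)) for q in checks]

    # 4. Produce a revised answer conditioned on the verification results.
    evidence = "\n".join(f"Q: {q}\nA: {a}" for q, a in verifications)
    return call_llm(
        f"Question: {question}\nDraft: {baseline}\n"
        f"Verification results:\n{evidence}\n"
        "Rewrite the draft so it is consistent with the verification results."
    )

if __name__ == "__main__":
    print(chain_of_verification("Name three politicians born in New York City."))
```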
-
Stanford researchers just introduced a new way to optimize AI models using text-based feedback instead of traditional backpropagation!

Deep learning has long relied on numerical gradients to fine-tune neural networks. But optimizing generative AI systems has been much harder because they interact using natural language, not numbers.

𝗧𝗲𝘅𝘁𝗚𝗿𝗮𝗱 𝗶𝘀 𝘁𝗵𝗲 𝗳𝗶𝗿𝘀𝘁 𝗳𝗿𝗮𝗺𝗲𝘄𝗼𝗿𝗸 𝘁𝗼 𝗯𝗮𝗰𝗸𝗽𝗿𝗼𝗽𝗮𝗴𝗮𝘁𝗲 𝗹𝗮𝗻𝗴𝘂𝗮𝗴𝗲 𝗺𝗼𝗱𝗲𝗹 𝗳𝗲𝗲𝗱𝗯𝗮𝗰𝗸, 𝗲𝗻𝗮𝗯𝗹𝗶𝗻𝗴 𝗔𝗜 𝘁𝗼 𝗶𝘁𝗲𝗿𝗮𝘁𝗶𝘃𝗲𝗹𝘆 𝗿𝗲𝗳𝗶𝗻𝗲 𝗶𝘁𝘀 𝗼𝘂𝘁𝗽𝘂𝘁𝘀 𝗮𝗰𝗿𝗼𝘀𝘀 𝗱𝗶𝘃𝗲𝗿𝘀𝗲 𝘁𝗮𝘀𝗸𝘀. Some of the results:

1. Improved AI performance in PhD-level science Q&A, raising accuracy from 51.0% to 55.0% on GPQA and from 91.2% to 95.1% on MMLU physics.
2. Optimized medical treatment plans, outperforming human-designed radiotherapy plans by better balancing tumor targeting and organ protection.
3. Enhanced AI-driven drug discovery by iteratively refining molecular structures, generating high-affinity compounds faster than traditional methods.
4. Boosted complex AI agents like Chameleon, increasing multimodal reasoning accuracy by 7.7% through iterative feedback refinement.

𝗧𝗵𝗲 𝘂𝘀𝗲 𝗼𝗳 "𝘁𝗲𝘅𝘁𝘂𝗮𝗹 𝗴𝗿𝗮𝗱𝗶𝗲𝗻𝘁𝘀" 𝗶𝗻𝘀𝘁𝗲𝗮𝗱 𝗼𝗳 𝗻𝘂𝗺𝗲𝗿𝗶𝗰𝗮𝗹 𝗴𝗿𝗮𝗱𝗶𝗲𝗻𝘁𝘀 𝗶𝘀 𝗽𝗿𝗲𝘁𝘁𝘆 𝗱𝗮𝗿𝗻 𝗰𝗼𝗼𝗹. It treats LLM feedback as “textual gradients”, which are collected from every use of a variable in the system. By aggregating critiques from different contexts and iteratively updating variables (using a process analogous to numerical gradient descent), the method smooths out individual inconsistencies.

𝗜'𝗺 𝗰𝘂𝗿𝗶𝗼𝘂𝘀 𝗮𝗯𝗼𝘂𝘁 𝗵𝗼𝘄 𝗳𝗼𝗿𝗺𝗮𝗹𝗶𝘇𝗶𝗻𝗴 𝗺𝗲𝘁𝗵𝗼𝗱𝘀 𝘁𝗼 𝘃𝗮𝗹𝗶𝗱𝗮𝘁𝗲 𝗮𝗻𝗱 𝗰𝗼𝗻𝘀𝘁𝗿𝗮𝗶𝗻 𝘁𝗲𝘅𝘁𝘂𝗮𝗹 𝗴𝗿𝗮𝗱𝗶𝗲𝗻𝘁𝘀 (beyond the paper's formalization of the propagation and update process via its equations) could enhance robustness. Perhaps training secondary models to evaluate the quality and consistency of textual gradients, or an ensemble approach that generates multiple textual gradients using different LLMs or multiple prompts? Just throwing some ideas out there; this stuff is pretty cool.

Here's the awesome work: https://lnkd.in/gX8ABsdM

Congrats to Mert Yuksekgonul, Federico Bianchi, Joseph Boen, James Zou, and co!

I post my takes on the latest developments in health AI – 𝗰𝗼𝗻𝗻𝗲𝗰𝘁 𝘄𝗶𝘁𝗵 𝗺𝗲 𝘁𝗼 𝘀𝘁𝗮𝘆 𝘂𝗽𝗱𝗮𝘁𝗲𝗱! Also, check out my health AI blog here: https://lnkd.in/g3nrQFxW
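For intuition about the loop, here is a rough sketch in plain Python of a forward pass, a textual critique, and an update step. The function names and the `call_llm` stub are placeholders of my own, not the actual TextGrad API from the linked repo.

```python
# Illustrative "textual gradient" loop: one LLM call critiques a variable (here, a
# prompt), and a second call applies the critique, playing the role of a gradient step.
# `call_llm` is a stand-in for any chat-completion client; names are not the real TextGrad API.

def call_llm(prompt: str) -> str:
    return f"<LLM response to: {prompt[:60]}...>"   # stub client

def textual_gradient(variable: str, output: str, objective: str) -> str:
    """Ask a critic LLM how the variable should change (the 'gradient')."""
    return call_llm(
        f"Objective: {objective}\nCurrent variable:\n{variable}\n"
        f"Observed output:\n{output}\n"
        "Give concrete, specific criticism: how should the variable change?"
    )

def apply_gradient(variable: str, feedback: str) -> str:
    """Update step: rewrite the variable according to the critique."""
    return call_llm(
        f"Rewrite the following text so it addresses the feedback.\n"
        f"Text:\n{variable}\nFeedback:\n{feedback}"
    )

def optimize(variable: str, task_input: str, objective: str, steps: int = 3) -> str:
    for _ in range(steps):
        output = call_llm(f"{variable}\n\n{task_input}")          # forward pass
        feedback = textual_gradient(variable, output, objective)  # backward pass
        variable = apply_gradient(variable, feedback)             # update
    return variable

if __name__ == "__main__":
    print(optimize("You are a physics tutor. Answer concisely.",
                   "Why is the sky blue?",
                   "Answers should be correct and under 50 words."))
```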
-
LLM field notes: Where multiple models are stronger than the sum of their parts, an AI diaspora is emerging as a strategic strength...

Combining the strengths of different LLMs in a thoughtful, combined architecture can enable capabilities beyond what any individual model can achieve alone, and gives more flexibility today (when new models are arriving virtually every day) and in the long term. Let's dive in.

🌳 By combining multiple, specialized LLMs, the overall system is greater than the sum of its parts. More advanced functions can emerge from the combination and orchestration of customized models.

🌻 Mixing and matching different LLMs allows creating solutions tailored to specific goals. The optimal ensemble can be designed for each use case; ready access to multiple models makes it easier to adopt and adapt to new use cases more quickly.

🍄 With multiple redundant models, the system is not reliant on any one component. Failure of one LLM can be compensated for by others.

🌴 Different models have varying computational demands. A combined, diasporic system makes it easier to allocate resources strategically and find the right price/performance balance per use case.

🌵 As better models emerge, the diaspora can be updated by swapping out components without needing to retrain from scratch. This is going to be the new normal for the next few years as whole new models arrive.

🎋 Accelerated development - Building on existing LLMs as modular components speeds up the development process vs monolithic architectures.

🫛 Model diversity - Having an ecosystem of models creates more opportunities for innovation from many sources, not just a single provider.

🌟 Perhaps the biggest benefit is scale - of operation and capability. Each model can focus on its specific capability rather than trying to do everything. This plays to the models' strengths. Models don't get bogged down trying to perform tasks outside their specialty, which avoids inefficient use of compute resources. The workload can be divided across models based on their capabilities and capacity for parallel processing.

It takes a bit more effort to build this way (planning and executing on multiple models, orchestration, model management, evaluation, etc.), but that upfront cost will pay off time and again, for every incremental capability you are able to add quickly. Plan accordingly.

#genai #ai #aws #artificialintelligence
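A minimal sketch of the routing-plus-fallback idea, assuming placeholder model names and a generic `call_model` client; a production version would add the orchestration, evaluation, and cost tracking mentioned above.

```python
# Minimal sketch of a multi-model "diaspora": route by task type, fall back on failure.
# Model names and `call_model` are placeholders, not real endpoints.

from typing import Dict, List

def call_model(model: str, prompt: str) -> str:
    """Stub for a provider-specific client call."""
    return f"<{model} response to: {prompt[:40]}...>"

# Ordered preference list per task type: specialized primary first, fallbacks after.
ROUTES: Dict[str, List[str]] = {
    "code":      ["code-specialist-llm", "general-llm"],
    "summarize": ["small-cheap-llm", "general-llm"],
    "reasoning": ["frontier-llm", "general-llm"],
}

def route(task_type: str, prompt: str) -> str:
    for model in ROUTES.get(task_type, ["general-llm"]):
        try:
            return call_model(model, prompt)   # the specialist handles its niche
        except Exception:
            continue                           # redundancy: fall through to the next model
    raise RuntimeError("All models in the route failed")

if __name__ == "__main__":
    print(route("code", "Write a function that reverses a linked list."))
```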
-
Most people still think of LLMs as “just a model.” But if you’ve ever shipped one in production, you know it’s not that simple. Behind every performant LLM system there’s a stack of decisions about pretraining, fine-tuning, inference, evaluation, and application-specific tradeoffs.

This diagram captures it well: LLMs aren’t one-dimensional. They’re systems. And each dimension introduces new failure points or optimization levers. Let’s break it down:

🧠 Pre-Training
Start with modality.
→ Text-only models like LLaMA, UL2, and PaLM have predictable inductive biases.
→ Multimodal ones like GPT-4, Gemini, and LaVIN introduce more complex token fusion, grounding challenges, and cross-modal alignment issues.
Understanding the data diet matters just as much as parameter count.

🛠 Fine-Tuning
This is where most teams underestimate complexity:
→ PEFT strategies like LoRA and Prefix Tuning help with parameter efficiency, but can behave differently under distribution shift.
→ Alignment techniques (RLHF, DPO, RAFT) aren’t interchangeable. They encode different human preference priors.
→ Quantization and pruning decisions will directly impact latency, memory usage, and downstream behavior.

⚡️ Efficiency
Inference optimization is still underexplored. Techniques like dynamic prompt caching, paged attention, speculative decoding, and batch streaming make the difference between real-time and unusable. The infra layer is where GenAI products often break.

📏 Evaluation
One benchmark doesn’t cut it. You need a full matrix:
→ NLG (summarization, completion), NLU (classification, reasoning),
→ alignment tests (honesty, helpfulness, safety),
→ dataset quality, and
→ cost breakdowns across training + inference + memory.
Evaluation isn’t just a model task, it’s a systems-level concern.

🧾 Inference & Prompting
Multi-turn prompts, CoT, ToT, and ICL all behave differently under different sampling strategies and context lengths. Prompting isn’t trivial anymore. It’s an orchestration layer in itself.

Whether you’re building for legal, education, robotics, or finance, the “general-purpose” tag doesn’t hold. Every domain has its own retrieval, grounding, and reasoning constraints.

-------

Follow me (Aishwarya Srinivasan) for more AI insight and subscribe to my Substack to find more in-depth blogs and weekly updates in AI: https://lnkd.in/dpBNr6Jg
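As one small, concrete example from the Efficiency bucket: an application-level cache for repeated prompts. Real serving stacks push this further down into prefix/KV caching and paged attention, so treat this as a sketch of the idea only, with `call_llm` as a placeholder client.

```python
# Toy illustration of one inference-efficiency lever: cache responses for repeated
# prompts so identical requests skip the model call entirely.

import hashlib
from typing import Dict

_CACHE: Dict[str, str] = {}

def call_llm(prompt: str) -> str:
    return f"<LLM response to: {prompt[:40]}...>"   # stub client

def generate(prompt: str) -> str:
    # Normalize whitespace so trivially different requests still hit the cache.
    normalized = " ".join(prompt.split())
    key = hashlib.sha256(normalized.encode()).hexdigest()
    if key not in _CACHE:
        _CACHE[key] = call_llm(normalized)   # cache miss: pay for the full model call
    return _CACHE[key]

if __name__ == "__main__":
    generate("Summarize: LLMs are systems, not just models.")
    generate("Summarize:   LLMs are systems, not just models.")  # cache hit
    print(f"{len(_CACHE)} unique prompt(s) actually reached the model")
```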
-
LLMs are no longer just fancy autocomplete engines. We’re seeing a clear shift—from single-shot prompting to techniques that mimic 𝗮𝗴𝗲𝗻𝗰𝘆: reasoning, retrieving, taking action, and even coordinating across steps.

In this visual, I’ve laid out five core prompting strategies:

- 𝗥𝗔𝗚 – Brings in external knowledge, enhancing factual accuracy
- 𝗥𝗲𝗔𝗰𝘁 – Enables reasoning 𝗮𝗻𝗱 acting, the essence of agentic behavior
- 𝗗𝗦𝗣 – Adds directional hints through policy models
- 𝗧𝗼𝗧 (𝗧𝗿𝗲𝗲-𝗼𝗳-𝗧𝗵𝗼𝘂𝗴𝗵𝘁) – Simulates branching reasoning paths, like a mini debate inside the LLM
- 𝗖𝗼𝗧 (𝗖𝗵𝗮𝗶𝗻-𝗼𝗳-𝗧𝗵𝗼𝘂𝗴𝗵𝘁) – Breaks down complex thinking into step-by-step logic

While not all of these are fully agentic on their own, techniques like 𝗥𝗲𝗔𝗰𝘁 and 𝗧𝗼𝗧 are clear stepping stones to 𝗔𝗴𝗲𝗻𝘁𝗶𝗰 𝗔𝗜 𝘀𝘆𝘀𝘁𝗲𝗺𝘀 — where autonomous agents can 𝗿𝗲𝗮𝘀𝗼𝗻, 𝗽𝗹𝗮𝗻, 𝗮𝗻𝗱 𝗶𝗻𝘁𝗲𝗿𝗮𝗰𝘁 𝘄𝗶𝘁𝗵 𝗲𝗻𝘃𝗶𝗿𝗼𝗻𝗺𝗲𝗻𝘁𝘀.

The big picture? We’re slowly moving from "𝘱𝘳𝘰𝘮𝘱𝘵 𝘦𝘯𝘨𝘪𝘯𝘦𝘦𝘳𝘪𝘯𝘨" to "𝘤𝘰𝘨𝘯𝘪𝘵𝘪𝘷𝘦 𝘢𝘳𝘤𝘩𝘪𝘵𝘦𝘤𝘵𝘶𝘳𝘦 𝘥𝘦𝘴𝘪𝘨𝘯." And that’s where the real innovation lies.
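Of the five, ReAct is the most natural one to wire up as a loop. Here is a bare-bones sketch of the Thought/Action/Observation cycle, with `call_llm` and a `search` tool as placeholders and deliberately naive parsing.

```python
# Bare-bones ReAct loop: the model alternates Thought / Action / Observation until
# it emits a final answer. `call_llm` and the `search` tool are stubs.

def call_llm(prompt: str) -> str:
    return "Thought: I can answer directly.\nFinal Answer: 1947"   # stub client

def search(query: str) -> str:
    return f"<search results for: {query}>"                        # stub tool

TOOLS = {"search": search}

def react(question: str, max_steps: int = 5) -> str:
    transcript = (
        "Answer the question. Use the format:\n"
        "Thought: ...\nAction: search[<query>]\nObservation: ...\n"
        "Finish with 'Final Answer: ...'\n\n"
        f"Question: {question}\n"
    )
    for _ in range(max_steps):
        step = call_llm(transcript)          # model proposes a thought and an action
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        if "Action: search[" in step:        # naive action parsing
            query = step.split("Action: search[", 1)[1].split("]", 1)[0]
            transcript += f"Observation: {TOOLS['search'](query)}\n"
    return "No answer within the step budget."

if __name__ == "__main__":
    print(react("What year was the transistor invented?"))
```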
-
Prompt formatting can have a dramatic impact on LLM performance, but it varies substantially across models. Some pragmatic findings from a recent research paper:

💡 Prompt Format Significantly Affects LLM Performance. Different prompt formats (plain text, Markdown, YAML, JSON) can result in performance variations of up to 40%, depending on the task and model. For instance, GPT-3.5-turbo showed a dramatic performance shift between Markdown and JSON in code translation tasks, while GPT-4 exhibited greater stability. This indicates the importance of testing and optimizing prompts for specific tasks and models.

🛠️ Tailor Formats to Task and Model. Prompt formats like JSON, Markdown, YAML, and plain text yield different performance outcomes across tasks. For instance, GPT-3.5-turbo performed 40% better in JSON for code tasks, while GPT-4 preferred Markdown for reasoning tasks. Test multiple formats early in your process to identify which structure maximizes results for your specific task and model.

📋 Keep Instructions and Context Explicit. Include clear task instructions, persona descriptions, and examples in your prompts. For example, specifying roles (“You are a Python coder”) and output style (“Respond in JSON”) improves model understanding. Consistency in how you frame the task across different formats minimizes confusion and enhances reliability.

📊 Choose Format Based on Data Complexity. For simple tasks, plain text or Markdown often suffices. For structured outputs like programming or translations, formats such as JSON or YAML may perform better. Align the prompt format with the complexity of the expected response to leverage the model’s capabilities fully.

🔄 Iterate and Validate Performance. Run tests with variations in prompt structure to measure impact. Metrics like Coefficient of Mean Deviation (CMD) or Intersection-over-Union (IoU) can help quantify performance differences. Start with benchmarks like MMLU or HumanEval to validate consistency and accuracy before deploying at scale.

🚀 Leverage Larger Models for Stability. If working with sensitive tasks requiring consistent outputs, opt for larger models like GPT-4, which show better robustness to format changes. For instance, GPT-4 maintained higher performance consistency across benchmarks compared to GPT-3.5.

Link to paper in comments.
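A small harness in the spirit of the "test multiple formats early" advice: render the same task as plain text, Markdown, and JSON, then score each variant on a toy example set. `call_llm` and the examples are placeholders; swap in your own client and evaluation data.

```python
# Render one task in several prompt formats, run a few examples through each,
# and compare accuracy. `call_llm` is a stub; the examples are illustrative only.

import json

def call_llm(prompt: str) -> str:
    return "4"   # stub client

def as_plain(task, question):
    return f"{task}\nQuestion: {question}\nAnswer:"

def as_markdown(task, question):
    return f"## Task\n{task}\n\n## Question\n{question}\n\n## Answer\n"

def as_json(task, question):
    return json.dumps({"task": task, "question": question, "answer": ""})

FORMATS = {"plain": as_plain, "markdown": as_markdown, "json": as_json}

TASK = "You are a careful arithmetic assistant. Reply with the number only."
EXAMPLES = [("2 + 2 = ?", "4"), ("10 / 2 = ?", "5")]   # toy eval set

def score(render) -> float:
    hits = sum(call_llm(render(TASK, q)).strip() == gold for q, gold in EXAMPLES)
    return hits / len(EXAMPLES)

if __name__ == "__main__":
    for name, render in FORMATS.items():
        print(f"{name:8s} accuracy: {score(render):.0%}")
```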
-
When I started working in #OperationsResearch, my focus was clear: find the most efficient algorithms, build the best models, get the (near) optimal solution 🎯. It was all about elegance in the math, computational efficiency, and squeezing out every last improvement from a solver. And I loved it.

But then I realized something crucial. Stakeholders don’t care about those technicalities. In fact, they can be a barrier 🚧 to them using your optimization engine. They might not even use your solutions.

🔹 Not because the math is wrong.
🔹 Not because the algorithms are slow.
🔹 But because we need to focus on what truly matters to the business.

⚠️ #Optimization isn’t just about finding the best solution; it’s about solving the right problem. The best model in the world is useless if it doesn't move the needle on business outcomes.

This shift in mindset changed the way I approach OR projects. I started thinking beyond the solver and focusing on IMPACT:

✅ What business problem are we solving?
✅ How does this optimization help decision-makers?
✅ What KPI does this solution improve?

OR isn’t just math. It’s decision-making at scale.
-
Did Stanford just kill LLM fine-tuning?

This new paper from Stanford, called Agentic Context Engineering (ACE), proves something wild: you can make models smarter without changing a single weight.

Here's how it works:

Instead of retraining the model, ACE evolves the context itself. The model writes its own prompt, reflects on what worked and what didn't, then rewrites it. Over and over. It becomes a self-improving system.

Think of it like the model keeping a living notebook where every failure becomes a lesson and every success becomes a rule.

The results are impressive:
- 10.6% better than GPT-4-powered agents on AppWorld
- 8.6% improvement on financial reasoning tasks
- 86.9% lower cost and latency

No labeled data required. Just feedback loops.

Here's the counterintuitive part: everyone's chasing short, clean prompts. ACE does the opposite. It builds dense, evolving playbooks that compound over time. Turns out LLMs don't need simplicity. They need context density.

The question here is how to manage all this information and experience. This is where building a real-time memory layer for agents, like Zep AI (YC W24), can be a great solution and an active area of research going forward.

What are your thoughts? I have linked the paper in the next tweet!

____

If you found it insightful, reshare with your network. Find me → Akshay Pachaar ✔️ For more insights and tutorials on LLMs, AI Agents, and Machine Learning!
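Here is what the reflect-and-rewrite loop could look like in skeleton form. It is an illustration of the idea, not the paper's implementation, and `call_llm` is a placeholder client.

```python
# Sketch of the reflect-and-rewrite idea: keep a persistent "playbook" in context,
# and after each attempt ask the model to distill a lesson and merge it back in.

def call_llm(prompt: str) -> str:
    return f"<LLM response to: {prompt[:50]}...>"   # stub client

def attempt(task: str, playbook: str) -> str:
    return call_llm(f"Playbook of past lessons:\n{playbook}\n\nTask: {task}")

def reflect(task: str, answer: str, feedback: str) -> str:
    return call_llm(
        f"Task: {task}\nAnswer: {answer}\nEnvironment feedback: {feedback}\n"
        "State one concise, reusable lesson (a rule to add to the playbook)."
    )

def update_playbook(playbook: str, lesson: str) -> str:
    return call_llm(
        f"Current playbook:\n{playbook}\nNew lesson:\n{lesson}\n"
        "Merge the lesson in: deduplicate, keep rules specific, return the full playbook."
    )

def run(tasks_with_feedback, playbook: str = "") -> str:
    for task, feedback_fn in tasks_with_feedback:
        answer = attempt(task, playbook)
        lesson = reflect(task, answer, feedback_fn(answer))
        playbook = update_playbook(playbook, lesson)   # the context evolves; weights never change
    return playbook

if __name__ == "__main__":
    demo = [("Plan a weekend trip itinerary", lambda ans: "Itinerary ignored the budget constraint.")]
    print(run(demo))
```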
-
In my journey as a tester, I've come across many frustrating instances where projects were delayed due to performance bottlenecks. It's a common challenge - we design and code meticulously, only to find at the UAT stage that the application needs rearchitecting!

A few years ago I started shifting my Performance Testing to the left. It was a game-changer, transforming my testing approach and helping me overcome my challenges. The key learning I want to impart from my experience is the value in these practices:

1. Involve your developers right from the start.
2. Conduct unit tests on individual pieces of code for detailed validation.
3. Have your performance test engineers examine your APIs for performance from the get-go.
4. Focus on individual feature performance, it can help identify and fix issues early.
5. Start small with tests, gradually scaling up.

These practices, when integrated into your daily DevOps pipeline, not only improve quality but also enhance user satisfaction.

How do you manage performance bottlenecks in your projects? I'd love to hear about your experiences and strategies.

#PerformanceTesting #SoftwareTesting #QualityAssurance
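One way to make point 3 actionable is a small latency-budget check that runs in CI. The endpoint URL, sample count, and budget below are placeholders to adapt; this is a sketch, not a replacement for proper load testing.

```python
# Minimal shift-left performance check: hit one API endpoint a few times during CI
# and fail the build if p95 latency exceeds a budget. URL and budget are placeholders.

import statistics
import time
import urllib.request

ENDPOINT = "http://localhost:8080/health"   # placeholder endpoint
P95_BUDGET_MS = 250                         # placeholder latency budget
SAMPLES = 20

def measure_once() -> float:
    """Return wall-clock latency of one request, in milliseconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(ENDPOINT, timeout=5) as resp:
        resp.read()
    return (time.perf_counter() - start) * 1000.0

def p95(values) -> float:
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    return ordered[max(0, int(round(0.95 * len(ordered))) - 1)]

if __name__ == "__main__":
    latencies = [measure_once() for _ in range(SAMPLES)]
    print(f"median={statistics.median(latencies):.1f}ms  p95={p95(latencies):.1f}ms")
    assert p95(latencies) <= P95_BUDGET_MS, "p95 latency budget exceeded"
```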
-
I used to think hustle was the key to high performance. Then I learned the real secret: REST is the most powerful RGA.

Most sellers grind themselves into dust chasing performance. But I’ve coached 100s of top performers—and the highest earners don’t work more hours. They master their energy.

Here’s how I worked 40 hours a week (never work nights or weekends) and still outperformed 99% of reps:

Let’s flip the script on what it takes to be a top performer in sales. Everyone talks about RGAs—Revenue Generating Activities. But no one talks about the energy required to do RGAs well. If you want to prospect with intensity, sell with presence, and close big deals—
You need rest.

At a mastermind recently, someone called it the “Ultimate RGA”: Rest Generating Activities. Because without rest, RGAs fall apart. You’ll be foggy. Reactive. Distracted. You’ll confuse activity with impact.

Here’s how I train reps to recharge intentionally—so they can win without burnout:

1. Plan 4 Vacations a Year
I pre-block 4 weeks off annually. They’re non-negotiable. It doesn’t matter if it’s Hawaii or your local mountain trail—
The key is knowing you are unavailable. Not half-working. Not checking Slack. Fully present. Fully off.

2. Track and Protect Your Sleep
I use a WHOOP. You can use anything. But if you're not sleeping 7+ hours, consistently, you’re underperforming. You can’t bring intensity to your calls when you’re running on fumes. Sleep is a performance multiplier.

3. Calendar Block Your Breaks
My calendar is blocked 12–1 PM every day. Lunch with my wife. A walk. Or just quiet. Three hours of deep work → 1 hour of recovery → back for the final sprint. Burnout doesn’t happen from work. It happens from nonstop work.

4. Ruthless Time Boundaries
I stop work at 5 PM most days. No nights. No weekends. Ever. You don’t need 70 hours a week to crush quota. You need to stop saying yes to distractions and start owning your schedule. Parkinson’s Law is real: The less time you give yourself, the more efficient you become.

5. Say No to Busy Work
I use the 12 Week Year system. Everything I do ties back to a goal. Internal meetings? Minimized. Slack and email? Batched and time-boxed. If it doesn’t move pipeline or drive impact, I don’t touch it.

If you’re working 60+ hours and still missing quota... It’s not your work ethic that’s broken. It’s your calendar.

Stop measuring your week by hours worked. Start measuring it by energy invested in what matters.

You don’t need to grind harder. You need to recharge better.

Work less. Sell more. Live fully.