If an AI can control 1,000 robots to perform 1 million skills in 1 billion different simulations, then it may "just work" in our real world, which is simply another point in the vast space of possible realities. This is the fundamental principle behind why simulation works so effectively for robotics. Real-world teleoperation data scales linearly with human time (< 24 hrs/robot/day). Sim data scales exponentially with compute.

There are 3 big trends for simulators in the near future:

1. Massive parallelization on large clusters. Physics equations are "just" matrix math at their core. I hear GPUs are good at matrix math 🔥. One can run 100K copies of a simulation on a single GPU (see the sketch after this post). To put this number in perspective: 1 hour of wallclock compute gives a robot 10 years (!!) of training experience. That's how Neo was able to learn martial arts in the blink of an eye in the Matrix Dojo.

2. Generative graphics pipeline. Traditionally, simulators require a huge amount of manual effort from artists: 3D assets, textures, scene layouts, etc. But every component in the workflow can be automated: text-to-image, text-to-3D mesh, and LLMs that write Universal Scene Description (USD) files as a coding exercise. RoboCasa is one example of prior work (https://robocasa.ai/).

3. End-to-end neural nets that act as the simulator itself. This is still blue-sky research and quite far from replacing a graphics pipeline, but we are seeing some exciting signs of life from video generation models (Sora, Veo2, CogVideoX, Hunyuan) and action-driven world models (GameNGen, Oasis, Genie-2, etc.).

Genesis does great on (1) for certain tasks, shows good promise on (2), and could become a data generation tool for reaching (3). Its sim2real capabilities for locomotion are good, but there's still a long way to go for contact-rich, dexterous manipulation. It shows a bold vision and is on the right path to providing a virtual cradle for embodied AI. It is open-source and puts a streamlined user journey front and center.

I have had the privilege of knowing Zhou Xian and playing a small part in his project for the past year. Xian has been crunching code non-stop on Genesis with a very small group of core devs. He often replied to my messages at 3 am. Zhenjia Xu from our GEAR team helped with sim2real experiments in his spare time. Genesis is truly a grassroots effort with an intense focus on quality engineering.

Nothing gives me more joy than seeing the simulation ecosystem bloom. Robotics should be a moonshot initiative owned by all of humanity. Congratulations! https://lnkd.in/gF7MSDXK
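To make the "physics is matrix math" point concrete, here is a minimal numpy sketch of batched simulation stepping: a toy point-mass integrator, not the Genesis API. The env count, timestep, and names are illustrative assumptions.

```python
import numpy as np

# Toy illustration of trend (1): batched physics as matrix math. 100K
# independent point-mass "environments" advance in one vectorized update.
# Illustrative only; this is NOT the Genesis API.

N_ENVS = 100_000            # parallel simulation copies on one device
DT = 1.0 / 240.0            # a common physics timestep (240 Hz)
GRAVITY = np.array([0.0, 0.0, -9.81])

pos = np.zeros((N_ENVS, 3))               # all positions at once
vel = np.random.randn(N_ENVS, 3)          # random initial velocities

def step(pos, vel):
    # Semi-implicit Euler: one array-shaped update advances every env.
    vel = vel + GRAVITY * DT
    pos = pos + vel * DT
    return pos, vel

for _ in range(1_000):      # demo only; 1 sim-hour/env = 240 * 3600 steps
    pos, vel = step(pos, vel)

# Scale check: 100K envs stepped for 1 wallclock hour collect ~100K
# env-hours, i.e. roughly 11.4 years of experience, which is where the
# "10 years per hour" claim comes from.
```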
Advanced AI Training
Explore top LinkedIn content from expert professionals.
-
When I have a conversation about AI with a layperson, reactions range from apocalyptic fears to unrestrained enthusiasm. Similarly, with the topic of whether to use synthetic data in corporate settings, perspectives among leaders vary widely.

We're all cognizant that AI systems rely fundamentally on data. While most organizations possess vast data repositories, the challenge often lies in the quality rather than the quantity. A foundational data estate is a 21st-century competitive advantage, and synthetic data has emerged as an increasingly compelling solution for addressing data quality in that estate. However, it raises another question: can I trust synthetic data more or less than experiential data? Inconveniently, it depends on context.

High-quality data is accurate, complete, and relevant to the purpose for which it's being used. Synthetic data can be generated to meet these criteria, but it must be done carefully to avoid introducing biases or inaccuracies, both of which are likely to occur to some measure in experiential data. Bottom line: there is no inherent hierarchical advantage between experiential data (what we might call natural data) and synthetic data; there are simply different characteristics and applications. What proves most trustworthy depends entirely on the specific context and intended purpose. I believe both forms of data deliver optimal value when employed with clarity about desired outcomes. Models trained on high-quality data deliver more reliable judgments on high-impact topics like creditworthiness, healthcare treatments, and employment opportunities, thereby strengthening an organization's regulatory, reputational, and financial standing.

For instance, on a recent visit a customer was grappling with a relatively modest dataset. They wanted to discern meaningful patterns within their limited data, concerned that an underrepresented data attribute or pattern might be critical to their analysis. A reasonable way of revealing potential patterns is to augment their dataset synthetically (a minimal sketch follows this post). The augmented dataset would maintain statistical integrity (the synthetic data mimics the statistical properties and relationships of the original data), allowing any obscure patterns to emerge with clarity. We're finding this method particularly useful for preserving privacy, identifying rare diseases, and detecting sophisticated fraud.

As we continue to proliferate AI across sectors, senior leaders must know it's not all "upside." Proper oversight mechanisms to verify that synthetic data accurately represents real-world conditions without introducing new distortions are a must. However, when approached with "responsible innovation" in mind, synthetic data offers a powerful tool for enabling organizations to augment limited datasets, test for bias, and enhance privacy protections, making synthetic data a competitive differentiator.

#TrustworthyAI #ResponsibleInnovation #SyntheticData
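One simple way to do the augmentation described above is moment matching: fit the mean and covariance of the real data and sample synthetic rows from the same distribution. A minimal sketch, assuming a toy dataset; production pipelines use richer generators (copulas, GANs, diffusion models).

```python
import numpy as np

# Minimal sketch of synthetic augmentation with statistical integrity:
# fit mean and covariance of a modest real dataset, then sample synthetic
# rows from the same distribution. The "real" data here is fabricated
# purely for illustration.

rng = np.random.default_rng(0)
real = rng.normal(size=(200, 4))      # stand-in for a small real dataset

mu = real.mean(axis=0)                # properties the synthetic data preserves
cov = np.cov(real, rowvar=False)

synthetic = rng.multivariate_normal(mu, cov, size=10_000)
augmented = np.vstack([real, synthetic])

# Column relationships survive augmentation, so underrepresented but real
# patterns gain enough mass to surface in downstream analysis.
print(augmented.shape)                # (10200, 4)
```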
-
One of the hardest parts of fine-tuning models? Getting high-quality data without breaching compliance. This Synthetic Data Generator Pipeline is built to solve exactly that, and it is open-sourced for you to use! You can now generate task-specific, high-quality synthetic datasets without using a single piece of real data, and still fine-tune performant models.

Here's what makes it different (a minimal sketch of the core loop follows below):

→ LLM-driven config generation: Start with a simple prompt describing your task. The pipeline auto-generates YAMLs with structured I/O schemas, filters for diversity, and LLM-based evaluation criteria.

→ Streaming synthetic data generation: The system emits JSON-formatted examples (prompt, response, metadata) at scale. Each example includes row-level quality scores. You get transparency at both the data and job level.

→ SFT + RFT with evaluator feedback: We use models like DeepSeek R1 as judges. Low-quality clusters are automatically identified and regenerated. Each iteration teaches the model what "good" looks like.

→ Closed-loop optimization: The pipeline fine-tunes itself, adjusting decoding params, enriching prompt structures, or expanding label schemas based on what's missing.

→ Zero reliance on sensitive data: No PII. No customer data. This is purpose-built for enterprise, healthcare, finance, and anyone who's building responsibly.

And it works. 📊 On an internal benchmark:
- SFT with real, curated data: 79% accuracy
- RFT with synthetic-only data: 73% accuracy

That's huge, especially when your hands are tied on data access. If you're building copilots, vertical agents, or domain-specific models and want to skip the data wrangling phase, this is for you.

Built by Fireworks AI
🔗 Try it out: https://lnkd.in/dXXDdyuM
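A hedged sketch of the generate-judge-regenerate loop the post describes. `generate_example()` and `judge()` are placeholders for LLM calls (a generator model and a DeepSeek-R1-style evaluator); none of this is the actual Fireworks pipeline API.

```python
from typing import Callable

# Sketch of the core loop: generate an example, score it with a judge
# model, regenerate low-quality examples, keep row-level quality scores.

def synthesize(task_prompt: str,
               generate_example: Callable[[str], dict],
               judge: Callable[[dict], float],
               n: int = 100,
               min_score: float = 0.7,
               max_retries: int = 3) -> list:
    dataset = []
    for _ in range(n):
        example = generate_example(task_prompt)
        score = judge(example)
        # Low-quality examples are regenerated rather than kept, mirroring
        # the "low-quality clusters are regenerated" step.
        retries = 0
        while score < min_score and retries < max_retries:
            example = generate_example(task_prompt)
            score = judge(example)
            retries += 1
        if score >= min_score:
            example["quality_score"] = score   # row-level score, per the post
            dataset.append(example)
    return dataset
```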
-
Many of us are struggling to keep up with the evolution of Retrieval-Augmented Generation (RAG). This landscape is growing fast, and it's no longer just about pairing search with a language model. These 11 new types of RAG unlock advanced reasoning, factual accuracy, and collaboration across agents.

Here's how each new RAG type levels up your AI workflows (a minimal sketch of the shared core follows the list):

1. 🔸 InstructRAG: Enhances task planning by integrating instruction graphs with RAG, ideal for LLMs in structured workflows.
2. 🔸 MADAM-RAG: Uses multi-agent debates to resolve conflicting info in retrieved documents, improving answer reliability.
3. 🔸 CoRAG: Enables shared learning across multiple clients, perfect for low-resource environments and collaborative training.
4. 🔸 HM-RAG: Supports multimodal retrieval (text, graphs, web) using hierarchical agents, great for complex data sources.
5. 🔸 ReaRAG: Improves reasoning accuracy using knowledge-guided paths and fewer unnecessary model iterations.
6. 🔸 HeteRAG: Decouples knowledge chunks and uses adaptive prompts for more precise, efficient information retrieval.
7. 🔸 MCTS-RAG: Incorporates Monte Carlo Tree Search to enhance step-by-step reasoning in knowledge-heavy domains.
8. 🔸 CDF-RAG: Uses causal graphs and dynamic feedback loops for reasoning over cause and effect, perfect for research and policy.
9. 🔸 Typed-RAG: Answers open-ended questions better by classifying question types (comparison, debate, etc.) and applying type-specific logic.
10. 🔸 NodeRAG: Blends heterogeneous graph structures into RAG systems, ideal for multi-hop questions and structured data.
11. 🔸 HyperRAG: Tackles hallucinations using hypergraph models to validate relationships, especially helpful in medical and legal domains.

✅ These RAG variants push your AI system from basic Q&A to domain-specific intelligence.
💡 Save this guide and follow for more deep dives into advanced LLM architectures and real-world AI patterns.

#genai #aiagents #artificialIntelligence
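All eleven variants extend the same retrieve-then-generate core. A minimal sketch of that core follows; `embed()`, `search()`, and `llm()` are placeholders for an embedding model, a vector index, and a language model, not any specific library's API.

```python
# The shared RAG core that every variant above builds on.

def rag_answer(question, embed, search, llm, k=5):
    query_vec = embed(question)                      # encode the query
    docs = search(query_vec, top_k=k)                # retrieval step
    context = "\n\n".join(d["text"] for d in docs)
    prompt = ("Answer using only the context below.\n\n"
              f"Context:\n{context}\n\nQuestion: {question}")
    return llm(prompt)                               # generation step

# Where the variants differ: MADAM-RAG wraps llm() in a multi-agent debate,
# NodeRAG and HyperRAG swap search() for graph traversal, and MCTS-RAG
# wraps the whole loop in Monte Carlo Tree Search.
```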
-
Ilya Sutskever explains a lot of obscure concepts, but this one will drive AI capabilities from linear improvement to exponential. Most AI labs use agentic platforms to improve models faster than data alone. Here's how it works.

Simple agentic platforms provide access to prebuilt apps and existing curated data sources. In the self-improvement paradigm, new agents are added to build new apps and generate new data sources (see the sketch below):

1️⃣ During model training, agents are tasked with identifying training gaps.
2️⃣ They hand those gaps to a prescriptive agent that guesses what tools or datasets will help fill each gap.
3️⃣ App builder and synthetic data agents deliver the proposed training environment.
4️⃣ The training gap agent assesses the model to see if the training gap is narrowing based on the improvement plan. If it isn't, the cycle repeats.

The goal isn't to improve a single model, but to improve all agents to the point where each does its job effectively. The training environment (or playground) grows to host a massive app and dataset suite.

In phase 2, the goal shifts from improving the playground to improving the models' ability to self-improve. Simply put, the objective shifts from optimizing the playground to optimizing how models use the playground to improve.

In phase 3, models are optimized to pass on what they learn. Optimized teacher models deliver the biggest jumps in model capabilities, but are the least understood.

Near-term AI capabilities were overstated, but long-term AI capabilities are underestimated. Models teaching models, and models that self-improve, will accelerate skills, capabilities, and eventually expertise development.

#ArtificialIntelligence #GenAI
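A schematic sketch of the phase-1 loop described above. Every callable is a placeholder agent; this illustrates the cycle as the post describes it, not any lab's internal system.

```python
# Phase-1 self-improvement cycle: find gaps, plan, build, train, reassess.

def self_improvement_cycle(model, find_gaps, prescribe, build_env, train,
                           max_iters=10):
    for _ in range(max_iters):
        gaps = find_gaps(model)          # 1. agents identify training gaps
        if not gaps:
            break                        # nothing left to fix
        plan = prescribe(gaps)           # 2. propose tools/datasets per gap
        playground = build_env(plan)     # 3. app-builder + synthetic-data agents
        model = train(model, playground)
        # 4. next iteration re-assesses; if a gap isn't narrowing,
        #    the cycle repeats with a revised plan
    return model
```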
-
𝗠𝗶𝘅𝘁𝘂𝗿𝗲 𝗼𝗳 𝗘𝘅𝗽𝗲𝗿𝘁𝘀 (𝗠𝗼𝗘): 𝗦𝗰𝗮𝗹𝗶𝗻𝗴 𝗟𝗟𝗠𝘀 𝗘𝗳𝗳𝗶𝗰𝗶𝗲𝗻𝘁𝗹𝘆 𝘄𝗶𝘁𝗵 𝗦𝗽𝗮𝗿𝘀𝗲 𝗖𝗼𝗺𝗽𝘂𝘁𝗮𝘁𝗶𝗼𝗻

Large Language Models (LLMs) continue to grow in size, pushing the limits of AI capabilities but also introducing challenges in cost, memory, and inference speed. Mixture of Experts (MoE) offers an innovative approach by using sparse computation, activating only a subset of parameters per input (see the sketch after this post). Let's explore recent advances in MoE architectures and how models like DeepSeek-v2 and DeepSeek-v3 are optimizing efficiency.

🔹 𝗖𝗵𝗮𝗹𝗹𝗲𝗻𝗴𝗲𝘀 𝗶𝗻 𝗠𝗼𝗘: 𝗥𝗼𝘂𝘁𝗶𝗻𝗴 𝗕𝗼𝘁𝘁𝗹𝗲𝗻𝗲𝗰𝗸𝘀 & 𝗣𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲 𝗧𝗿𝗮𝗱𝗲-𝗼𝗳𝗳𝘀
While MoE improves efficiency, it also faces key challenges:
- 𝗧𝗼𝗸𝗲𝗻 𝗗𝗿𝗼𝗽𝗽𝗶𝗻𝗴 𝗶𝗻 𝗟𝗼𝗻𝗴 𝗦𝗲𝗾𝘂𝗲𝗻𝗰𝗲𝘀: OpenMoE struggles with routing stability, sometimes losing tokens in long sequences.
- Fixed Routing in Pretraining: Early routing patterns can be inefficient post-training.
- 𝗗𝗼𝗺𝗮𝗶𝗻 𝗦𝗵𝗶𝗳𝘁 𝗜𝘀𝘀𝘂𝗲𝘀: MoE models may struggle to generalize across different data distributions. A recommended solution is incorporating instruction-following data in pretraining to enhance routing adaptability.

🚀 𝗗𝗲𝗲𝗽𝗦𝗲𝗲𝗸 𝗠𝗼𝗘: Smarter Scaling for AI Models
The DeepSeek series addresses these issues with innovative optimizations:

🔸 𝗗𝗲𝗲𝗽𝗦𝗲𝗲𝗸-𝘃𝟮: 𝟮𝟯𝟲𝗕 𝗣𝗮𝗿𝗮𝗺𝗲𝘁𝗲𝗿𝘀, 𝟮𝟭𝗕 𝗔𝗰𝘁𝗶𝘃𝗲
1️⃣ Multi-Head Latent Attention (MLA): Cuts memory use by 93% with efficient KV cache storage.
2️⃣ Fine-Grained Expert Allocation: Balances shared and specialized experts across devices.
3️⃣ Device-Level Load Balancing Loss: Ensures even routing across devices, improving stability.

🔸 𝗗𝗲𝗲𝗽𝗦𝗲𝗲𝗸-𝘃𝟯: 𝗔 𝟲𝟳𝟭𝗕 𝗣𝗮𝗿𝗮𝗺𝗲𝘁𝗲𝗿 𝗠𝗼𝗱𝗲𝗹 𝘄𝗶𝘁𝗵 𝗡𝗲𝘄 𝗘𝗻𝗵𝗮𝗻𝗰𝗲𝗺𝗲𝗻𝘁𝘀
1️⃣ Multi-Token Prediction (MTP): Predicts multiple tokens at once for better efficiency.
2️⃣ Auxiliary-Loss-Free Load Balancing: Dynamically adjusts expert selection without added inefficiencies.
3️⃣ FP8 Mixed Precision Training: Reduces training costs significantly (~$5.6M for full training).
4️⃣ Extensive Post-Training: Includes context extension (128K tokens), SFT, RLHF, and knowledge distillation.

📊 𝗞𝗲𝘆 𝗧𝗮𝗸𝗲𝗮𝘄𝗮𝘆𝘀
✅ Trained with 2.78M H800 GPU hours
✅ Performance rivals top closed-source LLMs
✅ Practical, scalable MoE for real-world deployment

🔮 𝗧𝗵𝗲 𝗙𝘂𝘁𝘂𝗿𝗲 𝗼𝗳 𝗠𝗼𝗘: 𝗘𝗳𝗳𝗶𝗰𝗶𝗲𝗻𝘁 𝗔𝗜 𝗦𝗰𝗮𝗹𝗶𝗻𝗴
MoE is revolutionizing LLM training, making sparse computation viable at scale. While early MoE models had challenges, recent breakthroughs like MLA, MTP, and smarter load balancing are proving MoE's potential. DeepSeek-v3 shows that sparse models can match dense models, signaling a shift in AI scaling strategies.

What's your take on MoE architectures? Will they define the future of AI, or do dense models still have an edge? Let's discuss! 👇

credit: Cameron R. Wolfe, Ph.D.
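To see what "sparse computation" means mechanically, here is a minimal top-k MoE layer in numpy: only K of E experts run per token. Shapes and weights are tiny and illustrative; real routers (DeepSeek included) add load-balancing terms, shared experts, and capacity limits.

```python
import numpy as np

# Minimal top-k MoE layer: a router scores E experts per token, and only
# the top K experts actually compute. Illustrative toy, not DeepSeek code.

rng = np.random.default_rng(0)
E, K, D = 8, 2, 16                         # experts, active per token, width
W_router = rng.normal(size=(D, E))
W_experts = rng.normal(size=(E, D, D))     # one weight matrix per expert

def moe_layer(x):                          # x: (tokens, D)
    logits = x @ W_router                  # router scores, (tokens, E)
    topk = np.argsort(logits, axis=-1)[:, -K:]     # K experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        gates = np.exp(logits[t, topk[t]])
        gates /= gates.sum()               # softmax over selected experts only
        for g, e in zip(gates, topk[t]):
            out[t] += g * (x[t] @ W_experts[e])    # only K experts compute
    return out

print(moe_layer(rng.normal(size=(4, D))).shape)    # (4, 16)
```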
-
In the realm of building machine learning models, there are typically two primary data sources: organic data, stemming directly from customer activities, and synthetic data, generated artificially through a deliberate process. Each holds unique value and serves a distinct purpose. This blog post, written by the Data Scientists at Expedia Group, shares how their team leveraged synthetic search data to enable flight price forecasting.

-- [Business need] The primary objective is to develop a price forecasting model that can offer future flight pricing predictions to customers. For instance, it aims to inform customers whether flight prices are likely to rise or fall in the next 7 days, aiding them in making informed purchasing decisions.

-- [Challenges] However, organic customer search data falls short due to its sparsity, even for the most popular routes. For instance, it's rare to see daily searches for round-trip flights from SFO to LAX for every conceivable combination of departure and return dates in the upcoming three months. The limitations of this organic data are evident, making it challenging to construct a robust forecasting model.

-- [Solution] This is where synthetic search data comes into play. By systematically simulating search activities on the same route and under identical configurations, such as travel dates, on a regular basis, it provides a more comprehensive and reliable source of information (see the sketch after this post). Leveraging synthetic data is a potent tool for systematic exploration, but it requires a well-balanced approach to ensure that the benefits outweigh the associated costs. Striking this balance is essential for unlocking the full potential of synthetic data in data science models.

– – –

To better illustrate concepts in this and future tech blogs, I created the podcast "Snacks Weekly on Data Science" (https://lnkd.in/gKgaMvbh). It's now available on Spotify and Apple Podcasts. Please check it out; I appreciate your support!

#machinelearning #datascience #search #synthetic #data #forecasting
https://lnkd.in/gRjR5tTQ
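A sketch of the "systematic simulation" idea: enumerate the route/date grid you want priced instead of waiting for organic searches. This illustrates the concept only, not Expedia's actual pipeline; downstream, each simulated search would be sent to a pricing service.

```python
from datetime import date, timedelta
from itertools import product

# Enumerate every departure-date / trip-length combination for one route,
# producing the dense, regular coverage that organic searches lack.

def synthetic_search_grid(origin, dest, horizon_days=90, max_trip_len=14):
    today = date.today()
    for depart_offset, trip_len in product(range(1, horizon_days + 1),
                                           range(1, max_trip_len + 1)):
        depart = today + timedelta(days=depart_offset)
        yield {"origin": origin, "dest": dest,
               "depart": depart.isoformat(),
               "return": (depart + timedelta(days=trip_len)).isoformat()}

# 90 departure days x 14 trip lengths = 1,260 simulated searches per route
# per day; repeating daily yields the dense time series a forecaster needs.
searches = list(synthetic_search_grid("SFO", "LAX"))
print(len(searches))    # 1260
```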
-
My DPhil research is shaping up...

Fine-tuning large language models (LLMs) has revolutionized how we use AI, but let's face it: it's not perfect. Current methods demand too much: labeled data, computational resources, and time. Plus, they're stuck in static environments. The result? Models that are powerful but rigid, unable to adapt to real-world, dynamic tasks.

What if we could change that? My dissertation research proposes a groundbreaking method that integrates LLMs into simulation environments, combining self-training and reinforcement learning. Instead of relying on static datasets, these models learn dynamically, adapting to evolving scenarios. This approach reduces compute costs while improving metrics like perplexity and task success rates. It's not just fine-tuning; it's adaptive learning for AI that thinks on its feet.
-
#EDPB opinion on #AI models and the #GDPR (Opinion 28/2024)

#Anonymity of AI Models
The EDPB states that AI models trained on #personaldata cannot, in all cases, be considered anonymous. Anonymity factors:
* The likelihood of direct extraction of personal data from the model
* The likelihood of obtaining personal data from queries
* All means reasonably likely to be used by the controller or others
The key operational step here is to document your assessment of these factors and the approaches taken to limit the risks of personal data extraction.

#LegitimateInterest as Legal Basis
When assessing legitimate interest as a legal basis for AI model development and deployment, the focus remains on the existing three-step test. Further general considerations are outlined in the opinion, which highlights the role of data subjects' reasonable expectations and mitigating measures to limit the impact of the processing. A key operational step here is to review and possibly enhance the information provided to data subjects in the context of the processing.

Consequences of Unlawful Processing
The Opinion outlines the impact of unlawful processing during AI model development and shares three factors for assessing the impact:
* Whether development and deployment are separate purposes
* The controller's due diligence in assessing the model's lawfulness
* The risks posed by the deployment-phase processing

Areas of operational focus:
* Enhanced documentation requirements for AI model development and deployment
* Stringent legitimate interest assessments specific to AI contexts
* Emphasis on transparency and managing data subjects' expectations
* Thorough risk assessments, particularly for fundamental rights impacts
-
Researchers from Oxford University just achieved a 14% performance boost in mathematical reasoning by making LLMs work together like specialists in a company.

In their new MALT (Multi-Agent LLM Training) paper, they introduce a novel approach in which three specialized LLMs - a generator, a verifier, and a refinement model - collaborate to solve complex problems, similar to how a programmer, tester, and supervisor work together (see the sketch after this post).

The breakthrough lies in their training method:
(1) Tree-based exploration: generating thousands of reasoning trajectories by having the models interact
(2) Credit attribution: identifying which model is responsible for successes or failures
(3) Specialized training: using both correct and incorrect examples to train each model for its specific role

Using this approach on 8B-parameter models, MALT achieved relative improvements of 14% on the MATH dataset, 9% on CommonsenseQA, and 7% on GSM8K. This represents a significant step toward more efficient and capable AI systems, showing that well-coordinated smaller models can match the performance of much larger ones.

Paper: https://lnkd.in/g6ag9rP4

Join thousands of world-class researchers and engineers from Google, Stanford, OpenAI, and Meta staying ahead on AI: http://aitidbits.ai
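A schematic sketch of MALT's three-role division of labor at inference time: a generator proposes, a verifier critiques, a refiner revises. The three callables stand in for the fine-tuned 8B models; the paper's training machinery (tree-based exploration, credit attribution) is not shown here.

```python
# Generator -> verifier -> refiner loop, in the spirit of the MALT setup.

def malt_solve(problem, generate, verify, refine, max_rounds=3):
    answer = generate(problem)
    for _ in range(max_rounds):
        critique = verify(problem, answer)       # verifier checks the answer
        if critique.get("correct"):
            return answer                        # accepted: stop refining
        answer = refine(problem, answer, critique["feedback"])
    return answer
```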