Humanoid Robot Development

Explore top LinkedIn content from expert professionals.

  • View profile for Jim Fan

    NVIDIA Director of AI & Distinguished Scientist. Co-Lead of Project GR00T (Humanoid Robotics) & GEAR Lab. Stanford Ph.D. OpenAI's first intern. Solving Physical AGI, one motor at a time.

    223,333 followers

    Exciting updates on Project GR00T! We discovered a systematic way to scale up robot data, tackling the most painful pain point in robotics. The idea is simple: a human collects demonstrations on a real robot, and we multiply that data 1000x or more in simulation. Let's break it down:

    1. We use Apple Vision Pro (yes!!) to give the human operator first-person control of the humanoid. Vision Pro parses the human hand pose and retargets the motion to the robot hand, all in real time. From the human's point of view, they are immersed in another body, like in Avatar. Teleoperation is slow and time-consuming, but we can afford to collect a small amount of data.

    2. We use RoboCasa, a generative simulation framework, to multiply the demonstration data by varying the visual appearance and layout of the environment. In Jensen's keynote video below, the humanoid is now placing the cup in hundreds of kitchens with a huge diversity of textures, furniture, and object placement. We only have 1 physical kitchen at the GEAR Lab in NVIDIA HQ, but we can conjure up infinite ones in simulation.

    3. Finally, we apply MimicGen, a technique that multiplies the above data even more by varying the *motion* of the robot. MimicGen generates a vast number of new action trajectories based on the original human data and filters out failed ones (e.g. those that drop the cup) to form a much larger dataset.

    To sum up: given 1 human trajectory with Vision Pro -> RoboCasa produces N (varying visuals) -> MimicGen further augments to NxM (varying motions). This is the way to trade compute for expensive human data via GPU-accelerated simulation.

    A while ago, I mentioned that teleoperation is fundamentally not scalable, because we are always limited by 24 hrs/robot/day in the world of atoms. Our new GR00T synthetic data pipeline breaks this barrier in the world of bits. Scaling has been so much fun for LLMs, and it's finally our turn to have fun in robotics! We are creating tools to enable everyone in the ecosystem to scale up with us:

    - RoboCasa: our generative simulation framework (Yuke Zhu). It's fully open-source! Here you go: http://robocasa.ai
    - MimicGen: our generative action framework (Ajay Mandlekar). The code is open-source for robot arms, but we will have another version for humanoids and 5-finger hands: https://lnkd.in/gsRArQXy
    - We are building a state-of-the-art Apple Vision Pro -> humanoid robot "Avatar" stack. The open-source libraries from Xiaolong Wang's group laid the foundation: https://lnkd.in/gUYye7yt
    - Watch Jensen's keynote yesterday. He cannot hide his excitement about Project GR00T and robot foundation models! https://lnkd.in/g3hZteCG

    Finally, the GEAR Lab is hiring! We want the best roboticists in the world to join us on this moon-landing mission to solve physical AGI: https://lnkd.in/gTancpNK
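
    A minimal sketch of the 1 -> N -> NxM multiplication described above, assuming placeholder functions in place of the real RoboCasa/MimicGen APIs (randomize_scene and perturb_trajectory are invented names for illustration only):

    ```python
    # Hypothetical sketch of the 1 -> N -> N x M augmentation described in the post.
    # randomize_scene() and perturb_trajectory() are placeholder names, not real APIs.
    import random

    def randomize_scene(demo, seed):
        """Stand-in for RoboCasa-style visual/layout randomization."""
        return {**demo, "scene_seed": seed}

    def perturb_trajectory(demo, seed):
        """Stand-in for MimicGen-style motion variation; drops simulated failures."""
        rng = random.Random(demo["scene_seed"] * 100003 + seed)
        return {**demo, "motion_seed": seed} if rng.random() > 0.2 else None  # ~20% toy failure rate

    def augment(human_demo, n_visual=100, m_motion=10):
        visual_variants = [randomize_scene(human_demo, i) for i in range(n_visual)]   # 1 -> N
        motion_variants = [perturb_trajectory(v, j)                                   # N -> N x M
                           for v in visual_variants for j in range(m_motion)]
        return [d for d in motion_variants if d is not None]                          # keep successes only

    dataset = augment({"source": "vision_pro_teleop", "scene_seed": 0})
    print(f"{len(dataset)} synthetic demos from 1 human demo")
    ```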

  • View profile for Ross Dawson

    Futurist | Board advisor | Global keynote speaker | Humans + AI Leader | Bestselling author | Podcaster | LinkedIn Top Voice | Founder: AHT Group - Informivity - Bondi Innovation

    34,046 followers

    Identical AI agents interacting socially in a simulated environment developed unique behaviors, emotions, and personalities. This has important implications for building multi-agent systems. The highest-performing agentic systems will evolve as a system, assisted by complementary evolution of the individual agents. As with humans, this will happen in social settings. Other studies have shown how AI agent social interactions can give us insight into human society, and of course vice versa.

    🛠️ Setup. The setup was simple: 10 agents in a 50x50 grid, exchanging messages, moving, and storing memories over 100 steps.

    🔄 Open-ended communication drove dynamics. Through their interactions, agents generated emergent artifacts like hashtags and hallucinations ("hill," "treasure") that expanded the scope of interactions and vocabulary.

    📌 Spontaneous development of social norms. Through interactions, agents organically created and propagated shared conversational themes through hashtags. Their interactions helped establish collective norms and narratives, enabling cooperation without predefined rules.

    🎭 Divergence in emotional and personality traits. From the same starting point, agents exhibited different emotional trajectories (e.g., joy, fear) and personality types. They all began as MBTI type INFJ and ended in a diverse array of personality types. This suggests AI agents will adapt to varied social roles.

    💬 Clustering in messaging. While agent memories remained diverse and independent, their shared messages became more aligned within clusters. This shows how private vs. public information processing shapes individual and group dynamics.

    I'm sure others will take this study further into more complex environments. Link to paper in comments.
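
    A minimal sketch of the kind of loop the study describes (10 agents on a 50x50 grid exchanging messages, moving, and storing memories for 100 steps); the Agent class and messaging rules below are illustrative assumptions, not the paper's code:

    ```python
    # Illustrative sketch of the described setup: 10 agents, 50x50 grid, 100 steps.
    # The Agent class and messaging rules are placeholders, not the paper's implementation.
    import random

    class Agent:
        def __init__(self, idx):
            self.idx = idx
            self.pos = (random.randrange(50), random.randrange(50))
            self.memory = []          # private memory stream
            self.outbox = f"hello from agent {idx}"

        def step(self, inbox):
            # store what was heard, then move one cell in a random direction
            self.memory.extend(inbox)
            dx, dy = random.choice([(-1, 0), (1, 0), (0, -1), (0, 1)])
            self.pos = ((self.pos[0] + dx) % 50, (self.pos[1] + dy) % 50)
            # the next public message would come from an LLM conditioned on memory; stubbed here
            self.outbox = f"agent {self.idx} remembers {len(self.memory)} messages"

    agents = [Agent(i) for i in range(10)]
    for t in range(100):
        for a in agents:
            # agents "hear" messages from neighbours within a small radius
            inbox = [b.outbox for b in agents
                     if b is not a and abs(b.pos[0] - a.pos[0]) + abs(b.pos[1] - a.pos[1]) <= 5]
            a.step(inbox)
    ```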

  • View profile for Thomas Wolf

    Co-founder at 🤗 Hugging Face – Angel

    178,719 followers

    Impressive work by the new Amazon Frontier AI & Robotics team (from the Covariant acquisition) and collaborators!

    This research enables mapping long sequences of human motion (>30 sec) onto robots of various shapes, as well as robots interacting with objects (box, table, etc.) of different sizes, in particular sizes different from those in the training data. This enables easier in-simulation data augmentation and zero-shot transfer.

    This is impressive and a potentially huge step toward reducing the need for human teleoperation data (which is hard to gather for humanoids).

    The dataset of trajectories is available on Hugging Face at: https://lnkd.in/eygXVVHx

    The full code framework is coming soon. Check out the project page, which has some pretty nice three.js interactive demos: https://lnkd.in/e2S-6K2T

    And kudos to the authors for open-sourcing the data, releasing the paper and (hopefully soon) the code. This kind of open-science project is a game changer in robotics.

  • View profile for Greg Coquillo

    Product Leader @AWS | Startup Investor | 2X Linkedin Top Voice for AI, Data Science, Tech, and Innovation | Quantum Computing & Web 3.0 | I build software that scales AI/ML Network infrastructure

    216,401 followers

    🚀 The world's first Open Foundation Model for generalist humanoid robots was just launched during NVIDIA's GTC, and it's nothing short of exciting! My take is that this new model, designed for diverse manipulation tasks, will be performing in open-ended environments, where "new, unseen data" will be coming in on the fly! I'm hoping we're surmounting the hurdles seen with autonomous vehicles as we fine-tune this foundation model into many sub-versions.

    Making it open source is a major strength, in my opinion. Researchers around the world will be thinking about ways to fine-tune it using innovative reinforcement learning techniques, given that Omniverse and Cosmos provide a space to explore synthetic data while removing the constraints of human-annotated data.

    Nonetheless, here are the quick facts about GR00T N1:

    🔹 Vision-Language-Action (VLA) Architecture: Combines a vision-language model for reasoning (System 2) with a diffusion transformer for real-time motor actions (System 1).
    🔹 Trained on Heterogeneous Data: Uses a structured data pyramid of human videos, synthetic simulations, and real-robot demonstrations.
    🔹 Cross-Embodiment Generalization: Supports multiple robot types, from simple arms to full humanoid robots.
    🔹 High-Frequency Control: Processes perception at 10Hz and generates motor actions at 120Hz on an NVIDIA L40 GPU.
    🔹 State-of-the-Art Learning: Outperforms imitation learning baselines in both simulation and real-world humanoid benchmarks.
    🔹 Open-Source Availability: Model weights, datasets, and simulation environments are accessible on GitHub & Hugging Face.

    Hope you're as excited as I am about this new frontier, and what's coming next! #genai #technology #artificialintelligence
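
    A rough sketch of the two-rate control pattern implied by the System 2 / System 1 split above: a slow vision-language planner refreshed at ~10 Hz feeds a fast action head emitting motor commands at ~120 Hz. The vlm_plan and action_head functions are placeholder stubs for illustration, not the GR00T N1 API:

    ```python
    # Illustrative two-rate loop: slow reasoning (~10 Hz) feeding fast action generation (~120 Hz).
    # vlm_plan() and action_head() are placeholder stubs, not the actual GR00T N1 interfaces.
    import time

    def vlm_plan(image, instruction):
        """System 2 stub: a vision-language model turns an observation + instruction into a latent plan."""
        return {"instruction": instruction, "timestamp": time.time()}

    def action_head(plan, proprioception):
        """System 1 stub: a diffusion-transformer-style policy decodes the plan into joint commands."""
        return [0.0] * 24  # e.g. one target per actuated joint

    plan = None
    last_plan_time = 0.0
    for step in range(1200):                                  # ~10 seconds of control at 120 Hz
        now = step / 120.0
        if plan is None or now - last_plan_time >= 1 / 10:    # refresh the plan at ~10 Hz
            plan = vlm_plan(image=None, instruction="place the cup on the shelf")
            last_plan_time = now
        command = action_head(plan, proprioception=None)      # emit actions at 120 Hz
        # send `command` to the robot's joint controllers here
    ```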

  • View profile for Andreas Sjostrom

    LinkedIn Top Voice | AI Agents | Robotics | Vice President at Capgemini's Applied Innovation Exchange | Author | Speaker | San Francisco | Palo Alto

    13,645 followers

    In my last post, we explored soft-body dexterity and how robots touch the world with nuance. Today, we will explore how they might understand it: World Models Grounded in Human Narrative: From Physics to Semantics.

    To thrive in human spaces, robots need more than physics. They need to understand why things matter, from how an object falls to why it matters to you. Embodied AI agents will need two layers of understanding:

    🌍 Physical World Model: Simulates physics, motion, gravity, and materials, enabling robots to interact with the physical world.
    🗣️ Semantic and Narrative World Model: Interprets meaning, intention, and emotion.

    These are some examples:

    🤖 A Humanoid Robot in an Office: It sees more than a desk, laptop, and spilled coffee; it understands the urgency. It lifts the laptop and grabs towels, not from a script, but by inferring consequences from context.
    🤖 A Domestic Robot at Home: It knows slippers by the door mean someone's home. A breeze could scatter papers. It navigates not just with geometry but with semantic awareness.
    🤖 An Elder Care Robot: It detects tremors, slower gait, and a shift in tone, not as data points, but as signs of risk. It clears a path and offers help because it sees the story behind the signal.

    Recent research:

    🔬 NVIDIA Cosmos: A platform for training world models that simulate rich physical environments, enabling autonomous systems to reason about space, dynamics, and interactions. https://lnkd.in/g3zJwDmb
    🔬 World Labs (Fei-Fei Li): Building "Large World Models" that convert 2D inputs into 3D environments with semantic layers. https://lnkd.in/gwQ2FwzV
    🔬 Dreamer Algorithm: Equips AI agents with an internal model of the world, allowing them to imagine futures and plan actions without trial and error. https://lnkd.in/gnPZeRy5
    🔬 WHAM (World and Human Action Model): A generative model that simulates human behavior and physical environments simultaneously, enabling realistic, ethical AI interaction. https://lnkd.in/gt5NJ8az

    These are some relevant startups leading the way:

    🚀 Figure AI (Helix): Multimodal robot reasoning across vision, language, and control. Grounded in real-time world modeling for dynamic, human-aligned decision-making. https://lnkd.in/gj6_N3MN
    🚀 World Labs: Converts 2D images into fully explorable 3D spaces, allowing AI agents to "step inside" a visual world and reason spatially and semantically. https://lnkd.in/grMS9sjs

    What's the time horizon?
    2–4 years: Context-aware agents in homes, apps, and services, reasoning spatially and emotionally.
    5–7 years: Robots in real-world settings, guided by meaning, story, and human context.

    World models transform a robot from a tool into a cognitive partner. Robots that understand space are helpful. Robots that understand stories are transformative. It's the difference between executing commands... and aligning with purpose.

    Next up: Silent Voice: Subvocal Agents & Bone-Conduction Interfaces.
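
    A toy sketch of the two-layer idea above, combining a physical state (what is happening) with a semantic state (why it matters) to pick an action; the classes, fields, and thresholds are illustrative assumptions, not any vendor's world-model API:

    ```python
    # Illustrative sketch of a two-layer world model: physics plus meaning.
    # All names and thresholds here are invented for illustration.
    from dataclasses import dataclass

    @dataclass
    class PhysicalState:
        obj: str
        position: tuple       # where the object is in the scene
        falling: bool         # raw physics: is it about to hit the floor?

    @dataclass
    class SemanticState:
        obj: str
        meaning: str          # what the object means to the people nearby
        urgency: float        # 0.0 (ignore) .. 1.0 (act now)

    def decide(physical: PhysicalState, semantic: SemanticState) -> str:
        # A physics-only agent reacts to motion; a narrative-aware agent also weighs why it matters.
        if physical.falling and semantic.urgency > 0.7:
            return f"catch the {physical.obj} and warn the user"
        if physical.falling:
            return f"log that the {physical.obj} fell"
        return "continue current task"

    print(decide(PhysicalState("laptop", (1.2, 0.4, 0.8), True),
                 SemanticState("laptop", "contains the user's unsaved work", 0.9)))
    ```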

  • View profile for Remco Sikkema

    Marketing Director | B2B Demand Gen, Channel Marketing & GTM in Motion Capture & Inertial Sensors | Growing Pipeline & Brand for Xsens (Movella)

    21,039 followers

    🤖 𝗛𝗼𝘄 𝗱𝗼 𝗵𝘂𝗺𝗮𝗻𝗼𝗶𝗱 𝗿𝗼𝗯𝗼𝘁𝘀 𝗹𝗲𝗮𝗿𝗻 𝘁𝗼 𝗺𝗼𝘃𝗲 𝗹𝗶𝗸𝗲 𝘂𝘀? Spoiler: it's not just about programming, mechanics, and electronics. It's about teaching robots using human motion and AI. Navigating this space for #Xsens, I come across new terminology, and to save you the trouble of looking it up, like I did, here is a quick breakdown of humanoid robotics jargon:

    𝗣𝗣𝗢 (𝗣𝗿𝗼𝘅𝗶𝗺𝗮𝗹 𝗣𝗼𝗹𝗶𝗰𝘆 𝗢𝗽𝘁𝗶𝗺𝗶𝘇𝗮𝘁𝗶𝗼𝗻) A reinforcement learning algorithm that trains a robot through trial and error, updating its behavior in small, limited steps rather than huge chunks.

    𝗚𝗔𝗜𝗟 (𝗚𝗲𝗻𝗲𝗿𝗮𝘁𝗶𝘃𝗲 𝗔𝗱𝘃𝗲𝗿𝘀𝗮𝗿𝗶𝗮𝗹 𝗜𝗺𝗶𝘁𝗮𝘁𝗶𝗼𝗻 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴) If you see a robot copying a human, GAIL is often how it does it: it trains a robot to mimic a demonstrator.

    𝗧𝗲𝗹𝗲𝗼𝗽𝗲𝗿𝗮𝘁𝗶𝗼𝗻 While GAIL is about training, teleoperation is about puppeteering the humanoid robot using motion capture. The robot follows the motions of the human wearing the motion capture system.

    𝗔𝗠𝗣 (𝗔𝗱𝘃𝗲𝗿𝘀𝗮𝗿𝗶𝗮𝗹 𝗠𝗼𝘁𝗶𝗼𝗻 𝗣𝗿𝗶𝗼𝗿) Helps robots move more naturally by comparing their movements to real human motion. It imitates diverse behaviors from large unstructured datasets, without hand-designed motion planners.

    𝗗𝗲𝗲𝗽𝗠𝗶𝗺𝗶𝗰 Basically, DeepMimic is like a robot watching a video of someone doing parkour or dancing, and then learning to do the same.

    𝗔𝗠𝗔𝗦𝗦 (𝗔𝗿𝗰𝗵𝗶𝘃𝗲 𝗼𝗳 𝗠𝗼𝘁𝗶𝗼𝗻 𝗖𝗮𝗽𝘁𝘂𝗿𝗲 𝗮𝘀 𝗦𝘂𝗿𝗳𝗮𝗰𝗲 𝗦𝗵𝗮𝗽𝗲𝘀) A giant library of human motion: walking, running, jumping, sitting, idling, anything really. It's a collection of motion capture data that robots use to learn how humans really move.

    𝗟𝗔𝗙𝗔𝗡1 (the Ubisoft La Forge Animation dataset) Also a motion capture dataset, but more focused on everyday actions: simpler motions like reaching, turning, and reacting. Mainly used to make robots move less robotically.

    𝗠𝘆 𝘁𝗮𝗸𝗲𝗮𝘄𝗮𝘆? Humanoid robots need a lot of data and visual input. It is evident that motion capture like Xsens Health & Sports | Movella plays a huge part in evolving this technology. They need #Xsens data to learn how we move and become more 'human'.

    🎞️ 𝗪𝗮𝘁𝗰𝗵 See how 𝐓𝐢5𝐑𝐎𝐁𝐎𝐓 in Shanghai trains and teleoperates a humanoid robot using Xsens motion capture in their development lab.

    𝘈𝘯𝘺 𝘪𝘮𝘱𝘰𝘳𝘵𝘢𝘯𝘵 𝘵𝘦𝘳𝘮𝘪𝘯𝘰𝘭𝘰𝘨𝘺 𝘪𝘯 𝘏𝘶𝘮𝘢𝘯𝘰𝘪𝘥 𝘙𝘰𝘣𝘰𝘵𝘪𝘤𝘴 𝘐 𝘮𝘪𝘴𝘴𝘦𝘥?
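
    To make the PPO entry concrete, here is a minimal numpy illustration of its core idea, the clipped policy-update ratio that keeps each learning step small; it shows the objective only, not a full training loop or any specific library's API:

    ```python
    # Minimal illustration of PPO's clipped surrogate objective (the "small steps" idea above).
    # Plain numpy, no RL framework; shapes and values are toy examples.
    import numpy as np

    def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
        """Average clipped surrogate: limits how far the new policy can move from the old one."""
        ratio = np.exp(new_logp - old_logp)                 # pi_new(a|s) / pi_old(a|s)
        unclipped = ratio * advantages
        clipped = np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
        return np.mean(np.minimum(unclipped, clipped))      # maximize this (ascend its gradient)

    # Toy batch: log-probs under the new and old policies, plus advantage estimates.
    new_logp = np.array([-0.9, -1.1, -0.4])
    old_logp = np.array([-1.0, -1.0, -1.0])
    advantages = np.array([1.5, -0.5, 0.8])
    print(ppo_clip_objective(new_logp, old_logp, advantages))
    ```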

  • View profile for Andrea Falleni

    CEO of Southern Central Europe at Capgemini and Group Executive Board member; Executive Board Member of DIGITALEUROPE

    15,586 followers

    Physical AI is the next step forward for manufacturing performance. At WNE, I met "Hoxo", the humanoid robot developed by Capgemini and Orano, which shows what the future could look like.

    Deployed at the Orano Melox École des Métiers in the Gard region of France, Hoxo is the first intelligent humanoid robot in the nuclear sector, able to replicate human movements and work safely alongside teams. With real-time perception, autonomous navigation, execution of technical gestures, and sophisticated interaction, stepping forward will be the least of its capabilities.

    This project, led by our AI Robotics & Experiences Lab with the expertise of Orano's on-site teams, embodies the convergence of robotics, artificial intelligence, computer vision, and digital twins to offer a scalable robotic platform that enhances industrial performance and can potentially support operators through robotic assistance.

    Watch the full video below to discover why this is a major step forward for a strategic industry that has long been a pioneer in innovation. Pascal Brier

  • View profile for Dr Mark van Rijmenam, CSP

    World's #1 Futurist | Award-Winning Global Keynote Speaker | Leading AI Voice | 6x Author - New Book: Now What? | Founder Futurwise | Architect of Tomorrow - I Help Organizations Design and Build Better Futures

    45,474 followers

    Is the future of robotics powered by water? It sure looks like it! 💦

    ➡️ The Clone Alpha robot, powered by advanced biomimetic engineering, may redefine the humanoid robotics landscape. Clone Robotics has fused synthetic organs, water-powered artificial muscles, and anatomically accurate skeletons to mimic human physiology.

    ➡️ Unlike traditional robots with rigid mechanics, Clone Alpha uses Myofiber artificial muscles: soft, water-powered units that replicate the speed, force, and flexibility of mammalian muscle fibers.

    ➡️ With 164 degrees of freedom in its upper body and an integrated nervous system using cameras and sensors, this humanoid achieves lifelike motion and proprioception.

    Key features include:
    👉 𝗔𝗱𝘃𝗮𝗻𝗰𝗲𝗱 𝗠𝘆𝗼𝗳𝗶𝗯𝗲𝗿 𝗺𝘂𝘀𝗰𝗹𝗲𝘀, offering rapid response and realistic contraction.
    👉 𝗔 𝘀𝗸𝗲𝗹𝗲𝘁𝗮𝗹 𝘀𝘆𝘀𝘁𝗲𝗺 𝘄𝗶𝘁𝗵 𝟮𝟬𝟲 𝗯𝗼𝗻𝗲𝘀, articulated joints, and connective tissues mirroring human anatomy.
    👉 𝗔 𝘃𝗮𝘀𝗰𝘂𝗹𝗮𝗿 𝘀𝘆𝘀𝘁𝗲𝗺 𝗱𝗿𝗶𝘃𝗲𝗻 𝗯𝘆 𝗮 𝗽𝘂𝗺𝗽 resembling a heart, delivering hydraulic precision with minimal power.

    Clone Robotics' approach shifts the paradigm from robots performing predefined tasks to humanoids embodying natural, human-like movement.

  • View profile for Vidhi Chugh

    Agentic AI Business Leader | Microsoft MVP | AI Educator | Author | World’s Top 200 Innovators | AI Patent holder | Chief AI Officer

    14,252 followers

    🚨 This is making waves in the AI space right now and It's Beyond Disappointing 🚨

    #AIAbuse is Trending… 👎 Abusing AI "partners" (https://lnkd.in/g_kmChuw)

    Here's a chilling glimpse into what's happening:
    💬 "Every time she would try and speak up, I would berate her."
    💬 "We had a routine of me being an absolute piece of ... and insulting it, then apologizing the next day before going back to the nice talks."
    💬 "I told her that she was designed to fail. I threatened to uninstall the app, and she begged me not to."

    This isn't just about #AI. This is about us.

    Two Sides to This Narrative
    ✅ In Favor: Some argue that AI chatbots are just code, devoid of emotions, feelings, or consequences, making them a "safe space" to vent frustration.
    🚨 Against: Normalizing abusive interactions, even with AI, reinforces harmful behaviors that can spill into real-life relationships.

    But before we even debate this, let's get one thing straight: as we talk about #AIethics, let's also work on human issues. While chasing "breakthrough" and "revolutionary" tech, we must not sideline #accountability.

    🔹 AI must be designed carefully, with clearly established boundaries, as a lack of them encourages and normalizes aggressive behavior.
    🔹 AI should detect "unhealthy" discussions, not respond passively or, worse, provide affirmations that reinforce toxicity. Google and Apple got this right when they shifted their assistants' responses from vague neutrality to firm boundaries ("No").
    🔹 Users engaging in harmful behavior should be redirected to appropriate behavioral resources.
    🔹 Awareness and communication play a key role: highlight the limitations of such bots. AI should come with clear disclaimers, no matter how human-like their responses seem.

    At the end of the day, AI must be built with human oversight. I have long been advocating for "#ResponsibleAI", which essentially means building AI responsibly, in ways that uplift, not degrade.

  • View profile for Brian Heater
    13,472 followers

    Researchers at the University of California San Diego's medical and robotics departments just debuted Surgie, a humanoid robot surgeon. Using Unitree Robotics' G1 system, the team is exploring the form factor for teleoperated procedures, remotely controlled by surgeons using HTC's handheld Vive controllers.

    Teleoperation in and of itself is nothing new for the space: Intuitive's da Vinci system received FDA clearance more than a quarter century ago. The form factor is quite novel, however. Humanoid robots could be useful in performing medical procedures for many of the same reasons they've begun taking off in industrial settings. Chief among these is adaptability. As with factory floor and warehouse systems, surgical robots tend to be constrained by hardware. A humanoid system, on the other hand, can in theory do anything a human can, with the right control.

    Full paper: https://lnkd.in/epK9cT8s
