Robotics data is expensive and slow to collect. A lot of video is available online, but it isn't readily usable for robotics because it lacks action labels. AMPLIFY solves this problem by learning Actionless Motion Priors that unlock better sample efficiency, generalization, and scaling for robot learning.

Our key insight is to factor the problem into two stages:
- The "what": predict the visual dynamics required to accomplish a task.
- The "how": map predicted motions to low-level actions.

This decoupling enables remarkable generalization: our policy can perform tasks for which we have NO action data, only videos. We outperform SOTA BC baselines on this setting by 27x 🤯

AMPLIFY is composed of three stages:
1. Motion Tokenization: We track dense keypoint grids through videos and compress their trajectories into discrete motion tokens.
2. Forward Dynamics: Given an image and a task description (e.g., "open the box"), we autoregressively predict a sequence of motion tokens representing how keypoints should move over the next second or so. This model can train on ANY text-labeled video data: robot demonstrations, human videos, YouTube videos.
3. Inverse Dynamics: We decode predicted motion tokens into robot actions. This module learns the robot-specific mapping from desired motions to actions, and it can train on ANY robot interaction data, not just expert demonstrations (think off-task data, play data, or even random actions).

So, does it actually work?

Few-shot learning: Given just 2 action-annotated demos per task, AMPLIFY nearly doubles SOTA few-shot performance on LIBERO. This is possible because our Actionless Motion Priors provide a strong inductive bias that dramatically reduces the amount of robot data needed to train a policy.

Cross-embodiment learning: We train the forward dynamics model on both human and robot videos, while the inverse model sees only robot actions. Result: a 1.4× average improvement on real-world tasks. Our system successfully transfers motion information from human demonstrations to robot execution.

And now my favorite result: AMPLIFY enables zero-shot task generalization. We train on LIBERO-90 tasks and evaluate on tasks where we've seen no actions, only pixels. While our best baseline achieves ~2% success, AMPLIFY reaches a 60% average success rate, outperforming SOTA behavior cloning baselines by 27x.

This is a new way to train VLAs for robotics that doesn't have to start with large-scale teleoperation. Instead of collecting millions of robot demonstrations, we just need to teach robots to read the language of motion. Then every video becomes training data.

Led by Jeremy Collins & Loránd Cheng in collaboration with Kunal Aneja, Albert Wilcox, and Benjamin Joffe at the College of Computing at Georgia Tech.

Check out our paper and project page for more details:
📄 Paper: https://lnkd.in/eZif-mB7
🌐 Website: https://lnkd.in/ezXhzWGQ
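To make the "what"/"how" factorization concrete, here is a minimal inference-time sketch in Python. The class and method names (`MotionTokenizer`, `forward_dynamics.generate`, etc.), the token horizon, and the tensor shapes are all illustrative assumptions, not the paper's actual implementation; they only show how the three stages compose.

```python
# Minimal sketch of AMPLIFY's two-stage factorization at inference time.
# All component names, signatures, and the horizon are assumptions made
# for illustration; see the paper for the real architecture.
import torch

class AmplifyPolicy:
    def __init__(self, tokenizer, forward_dynamics, inverse_dynamics):
        self.tokenizer = tokenizer                 # discrete codebook over keypoint-track chunks
        self.forward_dynamics = forward_dynamics   # trainable on any text-labeled video
        self.inverse_dynamics = inverse_dynamics   # trainable on any robot interaction data

    @torch.no_grad()
    def act(self, image, task_text, horizon=16):
        # "What": autoregressively predict discrete motion tokens describing
        # how a dense grid of keypoints should move over the next ~1 second.
        motion_tokens = self.forward_dynamics.generate(
            image=image, text=task_text, max_new_tokens=horizon
        )
        # Decode tokens back into continuous keypoint trajectories.
        keypoint_tracks = self.tokenizer.decode(motion_tokens)
        # "How": map the desired motion to low-level robot actions.
        return self.inverse_dynamics(image, keypoint_tracks)
```

The point of the split is visible in the code: only the last line needs action-labeled robot data, so everything above it can be trained on action-free video.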
Zero-Shot Classification Techniques for Robotic Motion Control
Summary
Zero-shot classification techniques for robotic motion control allow robots to perform new tasks or handle unfamiliar objects without needing extensive retraining, by learning from examples that don’t have direct action labels. These methods use video data and prompt guidance so robots can generalize their skills across different environments and challenges, reducing the need for large, specialized datasets.
- Use diverse data: Train robots with videos and demonstrations, even if they lack specific action details, so the robot can learn basic motion patterns that apply to many tasks.
- Apply prompt guidance: Guide robots in real time by providing new goals or hazards to avoid using simple prompts, enabling safe and flexible responses to changing situations.
- Enable cross-environment generalization: Equip robotic systems to adapt to both human and robot actions, allowing them to perform in new environments with minimal extra training.
What if a robot hand could grasp over 500 unseen objects, using just a single camera view?

[⚡Join 2400+ Robotics enthusiasts - https://lnkd.in/dYxB9iCh]

A paper by Hui Zhang, Zijian WU, Linyi Huang, Sammy Christen, and Jie Song from ETH Zürich and The Hong Kong University of Science and Technology introduces a zero-shot dexterous grasping system that generalises from simulation to real-world objects.

"RobustDexGrasp: Robust Dexterous Grasping of General Objects from Single-view Perception"
• Achieves 94.6% success on 512 real-world objects, trained on only 35 simulated objects
• Uses a hand-centric representation based on dynamic distance vectors between finger joints and object surfaces
• Employs a mixed curriculum learning strategy: imitation learning from a privileged teacher policy, followed by reinforcement learning under disturbances
• Demonstrates robustness to observation noise, actuator inaccuracies, and external forces
• Enables zero-shot grasping in cluttered environments and task-driven manipulation guided by vision-language models

This approach enhances the adaptability of robotic hands, allowing reliable grasping without extensive prior knowledge of object properties. It opens avenues for deploying dexterous robots in unstructured environments with minimal training data.

If robots can grasp novel objects with such reliability, what complex manipulation tasks should we tackle next?

Paper: https://lnkd.in/eb3itwhF
Project Page: https://lnkd.in/efbaBz4H

#DexterousManipulation #ReinforcementLearning #ZeroShotLearning #RoboticsResearch #ICRA2025
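To illustrate the hand-centric representation the post describes, here is a minimal sketch of computing distance vectors from finger joints to a perceived object surface. The function name, shapes, and the plain nearest-neighbor query are assumptions for illustration; the paper defines the actual observation.

```python
# Sketch of a hand-centric observation: for each finger joint, the unit
# direction and distance to the closest point on the object's (single-view)
# point cloud. Shapes and the raw nearest-neighbor query are assumptions.
import numpy as np

def distance_vector_features(joint_positions: np.ndarray,
                             object_points: np.ndarray) -> np.ndarray:
    """joint_positions: (J, 3) world-frame finger joint positions.
    object_points: (N, 3) points sampled from the perceived object surface.
    Returns (J, 4): unit direction to the closest surface point + distance.
    """
    J = len(joint_positions)
    # Offsets from each joint to each surface point: (J, N, 3).
    offsets = object_points[None, :, :] - joint_positions[:, None, :]
    dists = np.linalg.norm(offsets, axis=-1)             # (J, N)
    nearest = dists.argmin(axis=1)                       # (J,)
    min_d = dists[np.arange(J), nearest]                 # (J,)
    dirs = offsets[np.arange(J), nearest]                # (J, 3)
    dirs = dirs / np.maximum(min_d[:, None], 1e-8)       # normalize safely
    return np.concatenate([dirs, min_d[:, None]], axis=-1)
```

Because the features are relative to the hand rather than tied to object identity, a policy reading them can plausibly transfer across objects it never saw in training, which is the generalization the paper reports.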
📢 🧬 New paper drop: "Prompting Decision Transformers for Zero-Shot Reach-Avoid Policies" by stellar PhD student Kevin Li (Massachusetts Institute of Technology / Harvard Medical School): https://lnkd.in/eWbtEyVy

Imagine an agent that can reach any goal while avoiding danger, without retraining, even when the hazards change. That's the reach-avoid challenge. Think self-driving cars dodging new construction, or cell therapies steering clear of tumorigenic states.

Most RL methods hardwire the danger zones during training. Want to avoid something new? Retrain. Want to scale to new configurations? Retrain. But what if you could just tell the model what to avoid, on the fly?

Enter RADT: Reach-Avoid Decision Transformer. It learns from suboptimal data. It uses no rewards or costs. It encodes goals and avoid regions as prompt tokens. And it generalizes zero-shot to new goals and hazards. 🧵👇

What is different here? RADT does not see rewards. Instead, it learns from relabeled offline trajectories: each trajectory is framed as either a "good" or "bad" demonstration of avoiding specified regions. The prompt looks like this:
✅ Goal token
❌ One or more avoid tokens (can be of any shape/size)
🟢 Success/failure indicators

You can mix, match, or modify the prompt at inference time. RADT will adapt, zero-shot.

Benchmarks: FetchReach and MazeObstacle 🏗️
- RADT beats baselines (even retrained ones!) at avoiding hazards and hitting targets
- Handles more and larger avoid regions without ever seeing them in training
- Zero-shot generalization actually works

Real-world application: cell reprogramming 🧬 Start with a fibroblast, reach a cardiomyocyte, and avoid dangerous intermediate states (e.g., tumorigenic ones). RADT reduces time spent in harmful expression states; even when avoidance is impossible, it minimizes exposure.

Why it matters:
- Flexible deployment: same model, new avoid regions
- Reward-free: no need for hand-designed cost functions
- Works in both robotics and biology
- Helps in safety-critical settings where retraining is infeasible

Limitations: it can only handle "box-shaped" avoid regions for now. But the core idea, prompt-driven, reward-free, zero-shot control, is powerful and widely applicable.

RADT is part of a bigger vision: general-purpose agents that follow high-level instructions about where to go and what to avoid, safely and efficiently.

Read the paper: https://lnkd.in/eWbtEyVy

👏 Big kudos to Kevin Li for pushing the frontier on safe, compositional policy learning!
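To make the prompt structure concrete, here is a minimal sketch of assembling a reach-avoid prompt for a decision-transformer-style policy. The dataclass fields, the flat token layout, and the box encoding are assumptions based on the post's description (goal token, avoid tokens, outcome flag); the paper defines the actual encoding.

```python
# Sketch: assembling a reach-avoid prompt for a decision-transformer-style
# policy. The field names and flat concatenated layout are illustrative
# assumptions; only the goal/avoid/outcome structure comes from the post.
from dataclasses import dataclass
import numpy as np

@dataclass
class AvoidBox:
    lo: np.ndarray  # (D,) lower corner of a box-shaped avoid region
    hi: np.ndarray  # (D,) upper corner

def build_prompt(goal: np.ndarray, avoid: list[AvoidBox],
                 success: bool) -> np.ndarray:
    """Concatenate goal, avoid-region, and outcome tokens into one
    prompt vector, prepended to the trajectory at inference time."""
    tokens = [goal]                                      # goal token
    for box in avoid:                                    # one token per region
        tokens.append(np.concatenate([box.lo, box.hi]))
    tokens.append(np.array([1.0 if success else 0.0]))   # success/failure flag
    return np.concatenate(tokens)

# Swap in new goals or hazards at inference time, zero-shot:
prompt = build_prompt(goal=np.array([0.5, 0.2]),
                      avoid=[AvoidBox(np.array([0.0, 0.0]),
                                      np.array([0.2, 0.2]))],
                      success=True)
```

Because the hazards live in the prompt rather than in the training objective, changing what to avoid is a matter of editing this vector, not retraining the model.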