{"id":4553,"date":"2026-05-28T14:46:01","date_gmt":"2026-05-28T14:46:01","guid":{"rendered":"https:\/\/salarydistribution.com\/machine-learning\/2026\/05\/28\/nvidia-research-advances-robotics-from-simulation-to-the-real-world\/"},"modified":"2026-05-28T14:46:01","modified_gmt":"2026-05-28T14:46:01","slug":"nvidia-research-advances-robotics-from-simulation-to-the-real-world","status":"publish","type":"post","link":"https:\/\/salarydistribution.com\/machine-learning\/2026\/05\/28\/nvidia-research-advances-robotics-from-simulation-to-the-real-world\/","title":{"rendered":"NVIDIA Research Advances Robotics From Simulation to the Real World"},"content":{"rendered":"<div>\n<p><span>Robotics is entering a new phase: moving from controlled demos and scripted automation toward generalizable, reliable embodied autonomy in the real world.\u00a0<\/span><\/p>\n<p><span>At the <\/span><a target=\"_blank\" href=\"https:\/\/www.nvidia.com\/en-us\/events\/icra\/\" rel=\"noopener\"><span>International Conference on Robotics and Automation (ICRA)<\/span><\/a><span>, eight of NVIDIA Research\u2019s 28 accepted papers show how simulation-to-real transfer is becoming a foundation for that shift, helping robots perceive, reason, plan and act across dynamic, unpredictable environments.<\/span><\/p>\n<p><span>Together, the papers span the full stack of challenges robot developers face: coordinating multiple arms in parallel, building policies that generalize across robot bodies, grasping novel objects in clutter, performing precise assembly and developing vision-language-action models that reason before they move.\u00a0<\/span><\/p>\n<p><span>The throughline is clear: sim-to-real is becoming a foundation for robots that can adapt, generalize, and operate with greater reliability outside the lab.<\/span><\/p>\n<h2><b>Coordinating Arms, Navigating Bodies, Grasping Objects<\/b><\/h2>\n<p><span>Picture a pharmaceutical lab run by robotic arms: picking up tubes, transferring liquids, mixing 
reagents \u2014 each step taking different amounts of time, all requiring careful coordination.\u00a0<\/span><\/p>\n<p><span>Traditional robot scheduling software handles those steps sequentially, one arm at a time.\u00a0<\/span><\/p>\n<p><b>ScheduleStream<\/b><span> changes that by running computations on GPUs, letting multiple arms plan movements and operate in parallel. The result \u2014 a 3x speedup across multi-arm planning scenarios, on hardware like the <\/span><a target=\"_blank\" href=\"https:\/\/www.nvidia.com\/en-us\/autonomous-machines\/embedded-systems\/\" rel=\"noopener\"><span>NVIDIA Jetson<\/span><\/a><span> edge AI platform. Code for the framework is available on <\/span><a target=\"_blank\" href=\"https:\/\/github.com\/NVlabs\/ScheduleStream\" rel=\"noopener\"><span>GitHub<\/span><\/a><span>. <\/span><\/p>\n<p>\u00a0<\/p>\n<p><span>A robot that learns to navigate through a space \u2014 avoiding obstacles and finding its destination \u2014 usually learns to do it in one body. Put the same navigation software into a differently shaped robot and it often falls apart, because its parts all move differently.\u00a0<\/span><\/p>\n<p><span>The <\/span><b>COMPASS<\/b><span> policy framework solves this by first building the baseline navigation functionality using imitation learning and then using residual <\/span><a target=\"_blank\" href=\"https:\/\/www.nvidia.com\/en-us\/glossary\/reinforcement-learning\/\" rel=\"noopener\"><span>reinforcement learning<\/span><\/a><span> in <\/span><a target=\"_blank\" href=\"https:\/\/developer.nvidia.com\/isaac\/lab\" rel=\"noopener\"><span>NVIDIA Isaac Lab<\/span><\/a><span> to build specialists for diverse robot embodiments. Crucially, no real-world robot data is involved at any stage: everything is trained in Isaac Lab simulation.\u00a0<\/span><\/p>\n<p><span>Compared with an imitation learning baseline, COMPASS achieved a 4.5x improvement in average success rate. 
It also seamlessly transfers to real-world environments, demonstrating around 80% success across 20 real-world navigation trials on autonomous mobile robots and humanoids.\u00a0<\/span><\/p>\n<p><span>COMPASS is <\/span><a target=\"_blank\" href=\"https:\/\/github.com\/NVlabs\/COMPASS\/tree\/main\/.claude\/skills\" rel=\"noopener\"><span>agent-friendly<\/span><\/a><span>, with dedicated skills \u2014 and developers can connect the pipeline with <\/span><a target=\"_blank\" href=\"https:\/\/developer.nvidia.com\/omniverse\/nurec\" rel=\"noopener\"><span>NVIDIA Omniverse NuRec<\/span><\/a><span> to post-train and validate robots in a <\/span><a target=\"_blank\" href=\"https:\/\/www.nvidia.com\/en-us\/glossary\/digital-twin\/\" rel=\"noopener\"><span>digital twin<\/span><\/a><span> of a novel environment before deployment.\u00a0<\/span><\/p>\n<p><span>Most grasping systems identify the object, predict a grasp, plan a path, then execute. But the last few centimeters are where small errors matter most.<\/span><\/p>\n<p><b>Grasp-MPC <\/b><span>adaptively computes robotic grasps, continuously correcting the robot\u2019s motion as it closes in on the object, rather than carrying out a fixed plan \u2014 the way a person grabs something by feeling rather than calculating every joint angle in advance.<\/span><\/p>\n<p><span>To build the policy, the researchers generated 2 million simulated trajectories across 8,000 objects using annotations from the <\/span><a target=\"_blank\" href=\"https:\/\/github.com\/NVlabs\/GraspGen\" rel=\"noopener\"><span>GraspGen<\/span><\/a><span> dataset and motion planning data from <\/span><a target=\"_blank\" href=\"https:\/\/github.com\/nvlabs\/curobo\" rel=\"noopener\"><span>cuRobo<\/span><\/a><span>, a CUDA-accelerated library for robot motion generation.\u00a0<\/span><\/p>\n<p><span>After training on both successful and failed trajectories, Grasp-MPC learned to grasp novel objects in cluttered tabletops and shelves \u2014 achieving around 
75% overall success on real robots, compared with a baseline of 41%.<\/span><\/p>\n<p>\u00a0<\/p>\n<p><b>Deformable Cluster Manipulation<\/b><span> introduces a framework that tackles a parallel challenge: enabling systems to grasp not just one object, but a whole bundle of flexible, tangled material at once.\u00a0<\/span><\/p>\n<p><span>The framework was motivated by a real-world task: clearing a mass of tree branches that have grown over a power line, where there\u2019s no single clean object to grab. The system uses its entire arm, not just the gripper: wrapping it around the branch cluster and sweeping it aside, the way someone might gather an armful of cables or push a tangle of brush out of the way.\u00a0<\/span><\/p>\n<p><span>The researchers built a tree generator using biological growth equations to create synthetic trees of many different shapes and sizes \u2014 then trained the system across thousands of them in <\/span><a target=\"_blank\" href=\"https:\/\/developer.nvidia.com\/isaac\" rel=\"noopener\"><span>NVIDIA Isaac <\/span><\/a><span>open simulation frameworks.\u00a0<\/span><\/p>\n<p><span>The policy deploys to real branches zero shot. 
Beyond power lines, the researchers see potential in cable management, agricultural inspection and anywhere robots need to handle a tangle rather than a single graspable item.<\/span><\/p>\n<figure id=\"attachment_93448\" aria-describedby=\"caption-attachment-93448\" class=\"wp-caption aligncenter\"><img decoding=\"async\" loading=\"lazy\" class=\"size-large wp-image-93448\" src=\"https:\/\/blogs.nvidia.com\/wp-content\/uploads\/2026\/05\/Cluster-Manipulation-1680x622.jpg\" alt=\"\" width=\"1200\" height=\"444\"><figcaption id=\"caption-attachment-93448\" class=\"wp-caption-text\">Clearing tree branches in zero-shot sim-to-real deployment.<\/figcaption><\/figure>\n<h2><b>Assembling With Precision<\/b><\/h2>\n<p><span>Precise assembly \u2014 threading a nut onto a bolt, inserting a gear onto a gearshaft, pressing a peg into a hole \u2014 is notoriously hard to get right with simulation alone.\u00a0<\/span><\/p>\n<p><span>The real world is complex. Real surfaces aren\u2019t perfectly smooth. Sensors don\u2019t behave as specified. Tiny discrepancies that a simulator ignores can stop a robot in its tracks.<\/span><\/p>\n<p><span>The <\/span><b>SPARR<\/b><span> method addresses this by splitting the job in two. A policy trained in Isaac Lab learns the general strategy for the assembly task in simulation. 
Then, on the actual hardware, a second layer learns to correct for whatever the simulator got wrong \u2014 using the robot\u2019s own camera and without any human demonstrations or guidance.\u00a0<\/span><\/p>\n<p><span>SPARR improves success rates by 38% and reduces cycle time by around 30% compared with zero-shot sim-to-real baselines.\u00a0<\/span><\/p>\n<p><span>On National Institute of Standards and Technology (NIST) assembly tasks not seen during training, success improves by nearly 75% \u2014 approaching the results of methods that require a human in the loop.<\/span><\/p>\n<p><span>The <\/span><b>Refinery<\/b><span> framework takes on the next layer of difficulty in assembly: tasks with multiple sequential steps, where how step one is finished determines whether step two is even possible. It\u2019s like assembling furniture \u2014 leave a panel at the wrong angle, and the next fastener won\u2019t go in.\u00a0<\/span><\/p>\n<p><span>By understanding how success varies across initial conditions and training across hundreds of simulated assembly scenarios, Refinery learns how to complete each step and leave each component in a position that sets up the next. It achieves 91% simulation success and a nearly 11% mean improvement over baselines with comparable real-world results \u2014 and its policies can be chained to handle long, multi-part sequences.<\/span><\/p>\n<h2><b>Action Models That Keep Their Word<\/b><\/h2>\n<p><span>The <\/span><b>PEEK<\/b><span> pipeline helps robots see past the clutter. 
In a typical manipulation task, the robot\u2019s camera picks up everything in the scene \u2014 but most of it is irrelevant noise.\u00a0<\/span><\/p>\n<p><span>One task demonstrated on the PEEK project page is \u201cgive the banana to NVIDIA founder and CEO Jensen Huang\u201d: a photo of Huang sits on a table alongside a photo of Michael Jordan, a collection of unrelated objects and other distractors.\u00a0<\/span><\/p>\n<p><span>A human doing the task instantly focuses on the banana and the right photo; a standard robot policy has to process everything and often gets confused. PEEK solves this by having a vision-language model read the task instruction and focus the robot\u2019s view accordingly: showing a movement path and highlighting the objects that matter while fading out everything else.\u00a0<\/span><\/p>\n<p><span>The policy then acts on that annotated view rather than the raw scene. For a policy trained purely in simulation, adding PEEK produced a 41x real-world improvement in accuracy. For large VLA models and smaller policies, gains range from 2x to 3.5x. 
Because it works at the image level, PEEK integrates with any camera-based policy without modification.<\/span><\/p>\n<p>\u00a0<\/p>\n<p><b>Do What You Say<\/b><span> \u2014 a collaboration with researchers at Carnegie Mellon University, University of Utah and University of Sydney \u2014 addresses a specific failure mode that matters more as robots tackle longer, more complex tasks.\u00a0<\/span><\/p>\n<p><span>Give a robot an instruction like \u201cstore everything on this table inside the cabinet\u201d or \u201cprepare a Manhattan,\u201d and it has to break that down into individual steps and execute them in sequence.\u00a0<\/span><\/p>\n<p><span>The problem is that the AI model can correctly reason through what it needs to do \u2014 and then execute something different.\u00a0<\/span><\/p>\n<p><span>The method, called SEAL, fixes this at runtime without any retraining: the robot generates several candidate action sequences, thinks through where each one would actually lead and picks the sequence whose outcome matches what it said it would do. SEAL delivers up to 15% accuracy gains over prior work, with robustness against rephrased instructions, changed objects, scene clutter and shifted camera angles.<\/span><\/p>\n<p>\u00a0<\/p>\n<p><span>In addition to papers, NVIDIA is expanding <\/span><a target=\"_blank\" href=\"https:\/\/www.nvidia.com\/en-us\/research\/robotics\/\" rel=\"noopener\"><span>robotics research<\/span><\/a><span> infrastructure with large-scale open datasets. 
<\/span><span>The <\/span><a target=\"_blank\" href=\"https:\/\/huggingface.co\/collections\/nvidia\/physical-ai\" rel=\"noopener\"><span>NVIDIA Physical AI Dataset<\/span><\/a><span> is the world\u2019s largest open dataset for physical AI development, with more than 15 million downloads, while <\/span><a target=\"_blank\" href=\"https:\/\/huggingface.co\/datasets\/nvidia\/PhysicalAI-Robotics-GR00T-X-Embodiment-Sim\" rel=\"noopener\"><span>NVIDIA Isaac GR00T X Embodiment Sim<\/span><\/a><span> has become one of the most-downloaded robotics datasets. <\/span><span>\u00a0<\/span><\/p>\n<h2><b>Universities Accelerate Physical AI Research With NVIDIA Technologies<\/b><\/h2>\n<p><span>Robotics teams from universities such as Carnegie Mellon University (CMU), ETH Zurich, MIT and the University of Texas at Austin are tapping NVIDIA technologies to move physical AI research from simulation to real-world systems \u2014 with nearly 50 accepted papers referencing NVIDIA-accelerated simulation, robot learning and compute.<\/span><\/p>\n<p><span>Examples include a paper from CMU demonstrating a <\/span><a target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/2603.03740\" rel=\"noopener\"><span>robotic control framework<\/span><\/a><span> trained in NVIDIA Isaac Lab and MIT work on <\/span><a target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/2511.14565\" rel=\"noopener\"><span>large language model-guided reinforcement learning<\/span><\/a><span> powered by NVIDIA GPUs.<\/span><\/p>\n<p><i><span>Explore <\/span><\/i><a target=\"_blank\" href=\"https:\/\/research.nvidia.com\/\" rel=\"noopener\"><i><span>NVIDIA Research\u2019s physical AI work<\/span><\/i><\/a><i><span>. 
Developers can get started with <\/span><\/i><a target=\"_blank\" href=\"https:\/\/developer.nvidia.com\/isaac\" rel=\"noopener\"><i><span>Isaac Lab and Isaac Sim<\/span><\/i><\/a><i><span>.<\/span><\/i><\/p>\n<p><i><span>Stay up to date by subscribing to our <\/span><\/i><a target=\"_blank\" href=\"https:\/\/www.nvidia.com\/en-us\/industries\/robotics\/robotics-stay-informed\/\" rel=\"noopener\"><i><span>newsletter<\/span><\/i><\/a><i><span>, and following NVIDIA Robotics on<\/span><\/i><a target=\"_blank\" href=\"https:\/\/www.linkedin.com\/showcase\/nvidiarobotics\/\" rel=\"noopener\"><i><span> LinkedIn<\/span><\/i><\/a><i><span>, <\/span><\/i><a target=\"_blank\" href=\"https:\/\/www.instagram.com\/nvidiarobotics\/\" rel=\"noopener\"><i><span>Instagram<\/span><\/i><\/a><i><span>, <\/span><\/i><a target=\"_blank\" href=\"https:\/\/x.com\/NVIDIARobotics\" rel=\"noopener\"><i><span>X<\/span><\/i><\/a><i><span> and <\/span><\/i><a target=\"_blank\" href=\"https:\/\/www.facebook.com\/NVIDIARobotics\" rel=\"noopener\"><i><span>Facebook<\/span><\/i><\/a><i><span>.<\/span><\/i><\/p>\n<p><i><span>To start your robotics journey, enroll in our free NVIDIA <\/span><\/i><a target=\"_blank\" href=\"https:\/\/www.nvidia.com\/en-us\/learn\/learning-path\/robotics\/\" rel=\"noopener\"><i><span>Robotics Fundamentals courses<\/span><\/i><\/a><i><span> 
today.<\/span><\/i><\/p>\n<\/p><\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/blogs.nvidia.com\/blog\/icra-research-robotics-simulation-to-real-world\/<\/p>\n","protected":false},"author":0,"featured_media":4554,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/4553"}],"collection":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/comments?post=4553"}],"version-history":[{"count":0,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/posts\/4553\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media\/4554"}],"wp:attachment":[{"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/media?parent=4553"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/categories?post=4553"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/salarydistribution.com\/machine-learning\/wp-json\/wp\/v2\/tags?post=4553"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}