
    R2D2: Advancing Robot Mobility and Whole-Body Control with Novel Workflows and AI Foundation Models from NVIDIA Research

    Welcome to the first edition of the NVIDIA Robotics Research and Development Digest (R2D2). This technical blog series will give developers and researchers deeper insight and access to the latest physical AI and robotics research breakthroughs across various NVIDIA Research labs.

    Developing robust robots presents significant challenges, such as:

    • Data Scarcity: Generating diverse, real-world training data for AI models.
    • Adaptability: Ensuring solutions generalize across varied robot types and environments, and adapt to dynamic, unpredictable settings.
    • Integration: Effectively combining mobility, manipulation, control, and reasoning.

We address these challenges by combining cutting-edge research with engineering workflows, validated on NVIDIA AI and robotics platforms including NVIDIA Omniverse, Cosmos, Isaac Sim, and Isaac Lab. The resulting models, policies, and datasets serve as customizable references for the research and developer community to adapt to specific robotics needs. We look forward to sharing our discoveries and building the future of robotics together.

    In this edition of R2D2, you’ll learn about the following robot mobility and whole-body control workflows and models, and how they address key robot navigation, mobility, and control challenges:

• MobilityGen: A simulation-based workflow that uses Isaac Sim to rapidly generate large synthetic motion datasets for training models across different robot embodiments and environments, and for testing robot navigation in new environments, reducing the cost and time of real-world data collection.
    • COMPASS (Cross-embOdiment Mobility Policy via ResiduAl RL and Skill Synthesis): A workflow for developing cross-embodiment mobility policies, facilitating fine-tuning using Isaac Lab, and zero-shot sim-to-real deployment.
    • HOVER (Humanoid Versatile Controller): A workflow and a unified whole-body control generalist policy for diverse control modes in humanoid robots in Isaac Lab.
    • ReMEmbR (a Retrieval-augmented Memory for Embodied Robots): A workflow that enables robots to reason and take mobility action, using LLMs, VLMs, and RAG (Retrieval-Augmented Generation). 

    NVIDIA robot mobility workflows and AI models

Mobile robots, like humanoid robots, quadrupeds, and autonomous mobile robots (AMRs), are increasingly used in diverse environments, necessitating robust navigation systems that operate safely in both mapped and unknown settings while avoiding obstacles and reducing downtime. Current navigation software struggles with adaptability: algorithms differ significantly between robot types (e.g., AMRs vs. humanoid robots) and require extensive fine-tuning for environmental changes, which increases engineering complexity and hinders scalability.

    Video 1. NVIDIA robot mobility workflows and AI models

    NVIDIA Research addresses these challenges by developing AI-driven end-to-end foundation models, efficient data-generation pipelines, and training workflows that enable zero-shot deployment, allowing robots to navigate cluttered spaces without relying on costly sensors.

Figure 1. The mobility workflow includes three major steps: data generation in simulation, training and fine-tuning models on the generated data, and testing models before zero-shot deployment on the real robot.

    MobilityGen for Data Generation 

    MobilityGen is a workflow that uses NVIDIA Isaac Sim to easily generate synthetic motion data for mobile robots, including humanoid robots, quadrupeds, and wheeled robots. You can use this data to train and test robot mobility models as well as perception algorithms — solving the problem of data scarcity for training robots.

    MobilityGen helps add diversity to datasets by enabling users to: 

    • Add dynamic objects
    • Add robot action data
    • Combine human demonstrations
    • Augment data (e.g. lighting conditions)

    MobilityGen provides ground-truth data in the form of occupancy maps, pose information, velocity information, RGB, depth and segmentation images, and customizable action and rendered data. It supports data collection methods including keyboard or gamepad teleoperation, and automated random actions or customizable path planning. 
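
To make the recorded output concrete, here is a minimal sketch of how such a recording might be consumed for training. The directory layout, file names, and formats below are assumptions for illustration only; consult the MobilityGen documentation for the actual output structure.

```python
# Minimal sketch of consuming a MobilityGen-style recording.
# The directory layout and file names are assumptions for illustration;
# check the MobilityGen output on disk for the actual format.
from pathlib import Path

import numpy as np
from PIL import Image

RECORDING_DIR = Path("recordings/warehouse_h1_teleop")  # hypothetical path

def load_frame(frame_id: int) -> dict:
    """Load one synchronized sample: RGB, depth, segmentation, and robot state."""
    stem = f"{frame_id:06d}"
    rgb = np.asarray(Image.open(RECORDING_DIR / "rgb" / f"{stem}.png"))
    depth = np.load(RECORDING_DIR / "depth" / f"{stem}.npy")
    seg = np.asarray(Image.open(RECORDING_DIR / "segmentation" / f"{stem}.png"))
    state = np.load(RECORDING_DIR / "state" / f"{stem}.npz")  # pose, velocity, action
    return {
        "rgb": rgb,                     # H x W x 3 color image
        "depth": depth,                 # H x W depth map
        "segmentation": seg,            # H x W class-id image
        "pose": state["pose"],          # robot pose in the world frame
        "velocity": state["velocity"],  # linear/angular velocity
        "action": state["action"],      # command issued at this step
    }

# Assemble (observation, action) pairs, e.g., for imitation learning.
frames = [load_frame(i) for i in range(100)]
dataset = [(f["rgb"], f["action"]) for f in frames]
```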

    By tackling data scarcity, MobilityGen strengthens the perception and mobility foundations of an integrated robotics stack. Learn more about MobilityGen and how to generate a locomotion and navigation dataset for the Unitree H1 humanoid robot using teleoperation in this free self-paced Deep Learning Institute (DLI) course.

Figure 2. Synthetic data generation for robot mobility includes four main steps: building or importing an environment in simulation, importing your robot's model, moving the robot in simulation and recording trajectories, and rendering the data for training and testing.
Video 2. MobilityGen Used With Isaac Sim for Synthetic Data Generation

    COMPASS for cross-embodiment mobility policies 

COMPASS is a workflow for developing cross-embodiment mobility policies. It provides a generalizable, end-to-end approach and models that enable zero-shot simulation-to-real deployment across multiple robot embodiments, addressing the scaling problem roboticists face due to slow development and testing cycles.

    COMPASS integrates vision-based end-to-end imitation learning (IL) with X-Mobility, residual reinforcement learning (RL) in Isaac Lab, and policy distillation methods to scale across different robot platforms. While the IL-based X-Mobility policy is pre-trained on a specific embodiment from data generated using MobilityGen, the generalist policy from COMPASS can achieve a 5x higher success rate for different embodiments. This enables different robots to navigate efficiently in complex environments using the unified policy. It also gives users the flexibility and convenience for fine-tuning the policy for specific embodiments and environments.

Figure 3. The COMPASS workflow: imitation learning, residual RL, and cross-embodiment distillation

    The first stage of the workflow uses world modeling with IL-based methods to train a representation of mobility “common sense” for environmental states and actions. Some examples of such “common sense” are world dynamics understanding, obstacle detection and avoidance, path planning, and environmental awareness.

    The second stage uses residual RL to incrementally refine the IL policy from the first step into an embodiment-specific specialist. The third stage uses data from each specialist and merges them into a cross-embodiment model using policy distillation. In this way, the expertise of each specialist is baked into the final distilled policy, increasing adaptability across different platforms. 
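
These two ideas can be summarized in a short PyTorch sketch: a frozen base policy plus a learned residual forms the embodiment-specific specialist, and a supervised distillation step pulls a single embodiment-conditioned generalist toward each specialist. The module shapes and names below are illustrative assumptions, not the COMPASS implementation.

```python
# Conceptual sketch of residual RL refinement and policy distillation,
# in the spirit of COMPASS; module names and shapes are illustrative only.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, EMBODIMENT_DIM = 256, 2, 8  # assumed dimensions

base_policy = nn.Sequential(nn.Linear(OBS_DIM, 128), nn.ReLU(), nn.Linear(128, ACT_DIM))
base_policy.requires_grad_(False)  # frozen IL policy (stage 1, e.g., X-Mobility features -> action)

residual_policy = nn.Sequential(   # embodiment-specific correction trained with RL (stage 2)
    nn.Linear(OBS_DIM, 128), nn.ReLU(), nn.Linear(128, ACT_DIM)
)

def specialist_action(obs: torch.Tensor) -> torch.Tensor:
    """Specialist = frozen base action + learned residual correction."""
    return base_policy(obs) + residual_policy(obs)

# Stage 3: a single generalist, conditioned on an embodiment embedding,
# regresses the actions of every specialist (policy distillation).
generalist = nn.Sequential(
    nn.Linear(OBS_DIM + EMBODIMENT_DIM, 256), nn.ReLU(), nn.Linear(256, ACT_DIM)
)
optimizer = torch.optim.Adam(generalist.parameters(), lr=1e-4)

def distill_step(obs: torch.Tensor, embodiment: torch.Tensor) -> float:
    """One supervised update pulling the generalist toward the specialist."""
    with torch.no_grad():
        target = specialist_action(obs)
    pred = generalist(torch.cat([obs, embodiment], dim=-1))
    loss = nn.functional.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example usage with a hypothetical one-hot embodiment encoding.
obs = torch.randn(32, OBS_DIM)
embodiment = torch.zeros(32, EMBODIMENT_DIM)
embodiment[:, 0] = 1.0
print(distill_step(obs, embodiment))
```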

COMPASS achieves zero-shot multi-robot interaction, showcasing how robots function in different environments. It can also be connected to a loco-manipulation controller for tasks that combine locomotion and manipulation.

    Video 3. Humanoid robots using the COMPASS policy

    By tackling generalizability across embodiments, COMPASS strengthens the mobility foundation of an integrated robotics stack.

    HOVER for Humanoid Robot Whole-Body Control

    So far, we’ve learned about mobility policies to enable robots to move from one point to a goal position. This isn’t enough for robust motion — we additionally need to enable balance and full-body control for safe and smooth movement. HOVER aims to provide a reference workflow for this.  

Traditionally, humanoid robots need different modes of control to perform diverse tasks, such as velocity tracking for navigation and upper-body joint tracking for tabletop manipulation. HOVER is a workflow, trained in Isaac Lab, that consolidates all of these control modes into a unified policy for humanoid robots. Other controllers can also be used in place of HOVER with the other workflows described in this blog.

HOVER (Humanoid Versatile Controller) is a multi-mode policy distillation framework that unifies diverse control modes into a single policy, enabling seamless transitions between them. An oracle policy is first trained with RL to mimic human motion data, and a policy distillation process then transfers its skills to a generalist policy. By integrating the complexity of a humanoid robot's many moving parts into a unified neural whole-body controller, HOVER strengthens the control foundation of an integrated robotics stack.
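
A rough sketch of the unification idea is shown below: one network receives the robot's proprioception together with a command vector and a binary mask that selects which targets the active mode tracks. The dimensions, network layout, and mask slicing are assumptions for illustration and do not reflect HOVER's actual architecture.

```python
# Rough sketch of a mask-based multi-mode whole-body controller in the
# spirit of HOVER. All dimensions, slices, and layer sizes are assumptions.
import torch
import torch.nn as nn

NUM_JOINTS = 19               # assumed humanoid joint count
CMD_DIM = 3 * NUM_JOINTS      # illustrative command vector (e.g., joint/root targets)
PROPRIO_DIM = 2 * NUM_JOINTS  # e.g., joint positions and velocities

class UnifiedController(nn.Module):
    """Single policy that accepts any control mode via a command mask."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(PROPRIO_DIM + 2 * CMD_DIM, 512), nn.ELU(),
            nn.Linear(512, 256), nn.ELU(),
            nn.Linear(256, NUM_JOINTS),  # target joint positions
        )

    def forward(self, proprio, command, mask):
        # Zero out command entries the current mode does not track, and pass
        # the mask itself so the policy knows which mode is active.
        masked_cmd = command * mask
        return self.net(torch.cat([proprio, masked_cmd, mask], dim=-1))

policy = UnifiedController()

# Example: a joint-tracking mode that unmasks only the first block of the
# command vector (a hypothetical slice; the real mode layout will differ).
proprio = torch.zeros(1, PROPRIO_DIM)
command = torch.randn(1, CMD_DIM)
mask = torch.zeros(1, CMD_DIM)
mask[:, :NUM_JOINTS] = 1.0
joint_targets = policy(proprio, command, mask)
```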

The HOVER code also includes a working deployment example for Unitree H1 robots, enabling users with access to a robot to replicate the motion and stability showcased below.

Figure 4. The HOVER policy is trained in Isaac Lab, tested in simulation with MuJoCo (left), and deployed to a real robot (right).
Figure 5. The HOVER policy executes an arm motion while maintaining balance.

    ReMEmbR for robot reasoning

    The workflows we’ve explored so far address dataset creation, mobility policies, and whole-body control for humanoid robots. To achieve full autonomous mobility with conversational intelligence, we need to integrate robot reasoning and cognition. How can a robot remember what it has seen in an environment and act accordingly, based on user input? 

ReMEmbR is a workflow that combines LLMs, VLMs, and RAG (Retrieval-Augmented Generation) to enable robots to reason, answer questions, and take navigation actions over large areas using long-horizon memory. It works as a “memory” for embodied robots that helps with perception-based question-answering and semantic action-taking.

Figure 6. The ReMEmbR workflow: a memory-building phase takes video and a prompt and stores the resulting embeddings in a database; a querying phase takes a user's question and retrieves relevant information from the database to generate an answer in natural language.
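
The sketch below is a simplified, in-memory stand-in for the two phases in Figure 6. The embed_text, caption_frame, and ask_llm functions are placeholder stubs for a real embedding model, VLM, and LLM, and the Python list stands in for the vector database that the actual workflow uses.

```python
# Simplified sketch of the two ReMEmbR phases: building a memory of
# timestamped, pose-tagged captions, then retrieving relevant entries
# to ground an answer. All components here are placeholder stubs.
import numpy as np

def embed_text(text: str) -> np.ndarray:
    """Placeholder embedding: pseudo-random unit-norm vector derived from the text."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

def caption_frame(frame) -> str:
    """Placeholder for a VLM that describes what the robot currently sees."""
    return f"placeholder caption for frame {frame}"

def ask_llm(prompt: str) -> str:
    """Placeholder for an LLM call that answers using the retrieved context."""
    return "placeholder answer for:\n" + prompt

memory = []  # entries of (embedding, caption, timestamp, pose)

def memorize(frame, timestamp: float, pose) -> None:
    """Memory-building phase: caption a frame and store its embedding."""
    caption = caption_frame(frame)
    memory.append((embed_text(caption), caption, timestamp, pose))

def query(question: str, top_k: int = 5) -> str:
    """Querying phase: retrieve the most relevant memories, then answer."""
    q = embed_text(question)
    scored = sorted(memory, key=lambda m: -float(q @ m[0]))[:top_k]
    context = "\n".join(f"[t={t:.0f}s, pose={p}] {c}" for _, c, t, p in scored)
    return ask_llm(f"Context:\n{context}\n\nQuestion: {question}")

# Example usage: memorize a few frames, then ask about something seen earlier.
for i in range(10):
    memorize(frame=i, timestamp=float(i), pose=(i * 0.5, 0.0))
print(query("Where did you last see the loading dock?"))
```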

ReMEmbR can be used to provide input to the other workflows covered in this blog, bringing them together to help solve the complexities of robot mobility. We also released the NaVQA (Navigation Visual Question Answering) dataset for evaluation, which comprises spatial, temporal, and descriptive questions with various output types.

Figure 7. How the four workflows fit together: MobilityGen generates training data in Isaac Sim, ReMEmbR provides the robot's memory and long-horizon reasoning, the generated data is used to train the COMPASS policy in Isaac Lab, and a controller such as HOVER turns the trained policy's commands into smooth robot movement.
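
To make the hand-off between the pieces concrete, here is a toy runtime loop that mirrors Figure 7, with MobilityGen appearing only offline as the source of training data. Every function and value in it is a placeholder standing in for the trained artifacts; none of this is a real NVIDIA API.

```python
# Toy runtime loop wiring the workflows together. ReMEmbR supplies the goal,
# a COMPASS-style policy the velocity command, and a HOVER-style controller
# the joint targets. All functions below are placeholders.
import numpy as np

def remembr_goal(question: str) -> np.ndarray:
    """Placeholder for ReMEmbR: reason over memory and return a goal pose."""
    return np.array([3.0, 1.5, 0.0])  # x, y, yaw

def compass_policy(rgb: np.ndarray, pose: np.ndarray, goal: np.ndarray) -> np.ndarray:
    """Placeholder mobility policy: image + pose -> 2D velocity command."""
    direction = goal[:2] - pose[:2]
    return 0.5 * direction / (np.linalg.norm(direction) + 1e-6)

def hover_controller(proprio: np.ndarray, velocity_cmd: np.ndarray) -> np.ndarray:
    """Placeholder whole-body controller: command -> joint targets."""
    return np.zeros(19)  # 19 assumed joints

pose = np.zeros(3)
goal = remembr_goal("Go back to where you saw the forklift.")
while np.linalg.norm(goal[:2] - pose[:2]) > 0.1:
    rgb = np.zeros((224, 224, 3), dtype=np.uint8)  # stand-in camera frame
    vel = compass_policy(rgb, pose, goal)
    joint_targets = hover_controller(np.zeros(38), vel)
    pose[:2] += 0.1 * vel                          # integrate the toy dynamics
```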

    By bringing the power of LLMs and VLMs to tackle reasoning, ReMEmbR strengthens the reasoning and adaptability of an integrated AI-based robotics stack.

    Ecosystem Adoption

    Leading organizations in humanoid robotics, warehouse automation, and autonomous systems are adopting NVIDIA’s research workflows to accelerate development and achieve breakthroughs in scalability and adaptability. 

    • UCR (Under Control Robotics) integrated X-Mobility to guide its robot, Moby, seamlessly to its destination. This modular system proved adaptable to industrial tasks like data collection, material handling, and automating high-risk operations.
• Advantech, in collaboration with ADATA and Ubitus, adopted ReMEmbR to enable their robots to reason and act based on extended observations.

    Getting Started

    Ready to dive in? Explore these additional resources:

    This post is part of our NVIDIA Robotics Research and Development Digest (R2D2) to give developers deeper insight into the latest breakthroughs from NVIDIA Research across physical AI and robotics applications.

    Learn more about NVIDIA Research and stay up to date by subscribing to the newsletter and following NVIDIA Robotics on YouTube, Discord, and developer forums. To start your robotics journey, enroll in our free NVIDIA Robotics Fundamentals courses today.

    Acknowledgements 

    Thanks to Abrar Anwar, Joydeep Biswas, Yan Chang, Jim Fan, Pulkit Goyal, Lionel Gulich, Tairan He, Rushane Hua, Neel Jawale, Zhenyu Jiang, Jan Kautz, H. Hawkeye King, Chenran Li, Michael Lin, Toru Lin, Changliu Liu, Wei Liu, Zhengyi Luo, Billy Okal, Stephan Pleines, Soha Pouya, Guanya Shi, Shri Sundaram, Peter Varvak, Xiaolong Wang, John Welsh, Wenli Xiao, Zhenjia Xu, Huihua Zhao, and Yuke Zhu for their contributions to the research papers mentioned in this blog.
