Humanoid robots are designed to adapt to human workspaces, tackling repetitive or demanding tasks. However, building general-purpose humanoid robots for real-world tasks and unpredictable environments is challenging. Each task often requires a dedicated AI model, and training these models from scratch for every new task and environment is laborious: it demands vast task-specific data and heavy compute, and the resulting models still generalize poorly.
NVIDIA Isaac GR00T helps tackle these challenges and accelerates general-purpose humanoid robot development by providing you with open-source SimReady data, simulation frameworks such as NVIDIA Isaac Sim and Isaac Lab, synthetic data blueprints, and pretrained foundation models.
NVIDIA Isaac GR00T N1 features and benefits
NVIDIA Isaac GR00T N1 is the world’s first open foundation model for generalized humanoid robot reasoning and skills. This cross-embodiment model takes multimodal input, including language and images, to perform manipulation tasks in diverse environments.
GR00T N1 was trained on an expansive humanoid dataset, complemented by synthetic data generated using the components of the NVIDIA Isaac GR00T Blueprint and internet-scale video data. It is adaptable through post-training for specific embodiments, tasks, and environments. A subset of this data is now freely available to the developer community through the open-source NVIDIA physical AI dataset on Hugging Face.
GR00T N1 uses one model and set of weights to enable manipulation behaviors on humanoid robots, such as the Fourier GR-1 and 1X Neo. It demonstrates robust generalization across a range of tasks, including grasping and manipulating objects with one or both arms, as well as transferring items between arms.

It can also execute complex, multi-step tasks that require sustained contextual understanding and the integration of diverse skills. These capabilities make it well-suited for applications in material handling, packaging, and inspection.
Today, NVIDIA announced the availability of the GR00T N1 2B model, the first in a series of fully customizable models that we will pretrain and release.
GR00T N1 model architecture
GR00T N1 features a dual-system architecture inspired by human cognition, consisting of the following complementary components:
- Vision-Language Model (System 2): This methodical reasoning system is based on NVIDIA-Eagle with SmolLM-1.7B. It interprets the environment through vision and language instructions, enabling the robot to reason about its surroundings and plan the right actions.
- Diffusion Transformer (System 1): This action model translates the plan produced by System 2 into precise, continuous motor actions that control the robot's movements.
These systems are tightly coupled, enabling them to be optimized together during post-training.
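The dual-system idea can be sketched in PyTorch. This is an illustrative toy, not the real GR00T N1 architecture: the module names, dimensions, and the simplified denoising loop are all assumptions. A slow "System 2" fuses visual and language features into a plan embedding, and a fast "System 1" diffusion model denoises an action chunk conditioned on that plan; because both are differentiable, they can be optimized end to end.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of a dual-system policy: System 2 (VLM stand-in)
# produces a plan embedding; System 1 (diffusion-transformer stand-in)
# denoises an action chunk conditioned on it. All sizes are illustrative.
PLAN_DIM, ACTION_DIM, HORIZON = 64, 7, 16  # 7-DoF arm, 16-step action chunk

class System2(nn.Module):
    """Stand-in for the vision-language model: fuses image and text features."""
    def __init__(self):
        super().__init__()
        self.fuse = nn.Linear(2 * PLAN_DIM, PLAN_DIM)

    def forward(self, image_feat, text_feat):
        return self.fuse(torch.cat([image_feat, text_feat], dim=-1))

class System1(nn.Module):
    """Stand-in for the diffusion transformer: predicts the noise in an
    action chunk, conditioned on the plan and the diffusion step."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(HORIZON * ACTION_DIM + PLAN_DIM + 1, 256),
            nn.ReLU(),
            nn.Linear(256, HORIZON * ACTION_DIM),
        )

    def forward(self, noisy_actions, plan, t):
        x = torch.cat([noisy_actions.flatten(1), plan, t], dim=-1)
        return self.net(x).view(-1, HORIZON, ACTION_DIM)

def sample_actions(sys2, sys1, image_feat, text_feat, steps=10):
    """Greatly simplified reverse-diffusion loop: start from Gaussian
    noise and iteratively subtract the predicted noise."""
    plan = sys2(image_feat, text_feat)
    actions = torch.randn(plan.shape[0], HORIZON, ACTION_DIM)
    for i in reversed(range(steps)):
        t = torch.full((plan.shape[0], 1), i / steps)
        actions = actions - sys1(actions, plan, t) / steps
    return actions
```

Because System 1's input depends on System 2's output, a single loss on the predicted noise back-propagates through both systems, which is what allows them to be post-trained jointly.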

GR00T N1 data strategy for pretraining
Training a generalist model like GR00T N1 demands a robust data approach that leverages the complementary benefits of diverse data types. The GR00T N1 training data forms a pyramid, with data quantity decreasing and embodiment specificity increasing from base to peak.
- At the foundation, internet-scale web data and human videos provide a broad base of visual and linguistic information. These datasets capture human-object interactions, offering insights into natural motion patterns and task semantics.
- The middle layer incorporates synthetic data generated by the NVIDIA Omniverse platform.
- At the peak is real robot data collected through teleoperation on various platforms, offering precise insights into robotic capabilities.
Human-centered online videos provide valuable insights into human-object interactions but lack the motor control signals robots need. Simulation fills this gap with effectively unlimited data generated in real time through GPU acceleration, though it suffers from a simulation-to-reality gap.
Real robot data closes that gap but is costly and time-consuming to collect. Combining these diverse sources, together with techniques such as latent action training, which lets robots learn from large-scale, unlabeled human video without explicit action labels, yields a robust strategy that improves the performance and adaptability of GR00T N1.
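The latent-action idea can be illustrated with a toy example. The sketch below is an assumption-laden simplification, not GR00T N1's actual recipe: an inverse model infers a low-dimensional "action" z between consecutive frames, and a forward model predicts the next frame from the current frame and z, trained purely on reconstruction, with no action labels anywhere.

```python
import torch
import torch.nn as nn

# Toy illustration of latent action training: learn pseudo-actions z
# from unlabeled frame pairs via inverse + forward dynamics models.
# All shapes, sizes, and training details are illustrative.
FRAME_DIM, LATENT_DIM = 32, 4  # flattened frame features, latent action size

class LatentActionModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Inverse dynamics: (frame_t, frame_t+1) -> latent action z
        self.inverse = nn.Sequential(
            nn.Linear(2 * FRAME_DIM, 64), nn.ReLU(), nn.Linear(64, LATENT_DIM))
        # Forward dynamics: (frame_t, z) -> predicted frame_t+1
        self.forward_model = nn.Sequential(
            nn.Linear(FRAME_DIM + LATENT_DIM, 64), nn.ReLU(), nn.Linear(64, FRAME_DIM))

    def forward(self, frame_t, frame_t1):
        z = self.inverse(torch.cat([frame_t, frame_t1], dim=-1))
        pred = self.forward_model(torch.cat([frame_t, z], dim=-1))
        return pred, z

torch.manual_seed(0)
model = LatentActionModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
frames = torch.randn(256, FRAME_DIM)   # stand-in for encoded video frames
next_frames = frames.roll(-1, dims=0)  # consecutive-frame pairs
with torch.no_grad():
    initial_loss = nn.functional.mse_loss(
        model(frames, next_frames)[0], next_frames).item()
for _ in range(200):
    pred, _ = model(frames, next_frames)
    loss = nn.functional.mse_loss(pred, next_frames)
    opt.zero_grad()
    loss.backward()
    opt.step()
final_loss = loss.item()
```

Once trained, the inferred latents z can serve as pseudo-action labels for otherwise unlabeled video, which is what makes web-scale human video usable for policy pretraining.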
This approach was put into practice using the NVIDIA Isaac GR00T Blueprint. With it, over 750K synthetic trajectories were generated in just 11 hours, equivalent to 6.5K hours, or nine continuous months, of human demonstration data. Combining this synthetic data with real data yielded a 40% performance boost for GR00T N1 compared to using real data alone.
Hands-on with GR00T N1
You can get started with GR00T N1 using the following steps:
- Data preparation: Format your robot demonstration data, consisting of (video, state, action) triplets, into a GR00T dataset, which is compatible with the Hugging Face LeRobot format.
- Data validation: Use the validation script to ensure that your data adheres to the correct format.
- Post-training: Use PyTorch scripts to fine-tune the pretrained GR00T N1 model with your custom dataset.
- Inference: Connect the inference script to your robot controller to execute the actions on your target hardware or your simulation environment using the post-trained GR00T N1 model.
- Evaluation: Run the evaluation scripts to get the task-success rate of the model.
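The data-preparation and validation steps above can be sketched in Python. The field names, shapes, and checks below are hypothetical stand-ins for the actual GR00T/LeRobot schema and the validation script shipped in the Isaac-GR00T repo:

```python
import numpy as np

# Hypothetical sketch: pack one demonstration into aligned
# (video, state, action) arrays, then check alignment before
# post-training. Field names and shapes are illustrative only.

def make_episode(num_steps, state_dim=7, action_dim=7, h=64, w=64):
    """Build one demonstration episode as a dict of time-aligned arrays."""
    return {
        "video": np.zeros((num_steps, h, w, 3), dtype=np.uint8),      # camera frames
        "state": np.zeros((num_steps, state_dim), dtype=np.float32),  # joint positions
        "action": np.zeros((num_steps, action_dim), dtype=np.float32),  # commanded actions
    }

def validate_episode(ep):
    """Mimic the validation step: every modality must share the time axis
    and contain only finite values."""
    lengths = {k: v.shape[0] for k, v in ep.items()}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"misaligned modalities: {lengths}")
    if not np.isfinite(ep["state"]).all() or not np.isfinite(ep["action"]).all():
        raise ValueError("non-finite values in state or action")
    return True
```

A validator like this catches the most common post-training failure, off-by-one misalignment between camera frames and action logs, before any GPU time is spent on fine-tuning.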
Performance
The GR00T N1 models were evaluated using both simulated and real-world benchmarks to assess their performance in diverse robotic embodiments and manipulation tasks. Simulation experiments used three distinct benchmarks, while real-world tests focused on tabletop manipulation tasks with the GR-1 humanoid robot.
Simulation benchmarks
Three benchmarks were used for the simulation experiments: two open-source benchmarks from prior studies and a new suite that mirrors real-world tabletop manipulation tasks. Together they evaluate the models across different robot embodiments and diverse manipulation tasks.
| Model | RoboCasa | DexMG | GR-1 | Average |
| --- | --- | --- | --- | --- |
| BC Transformer | 26.3% | 53.9% | 16.1% | 26.4% |
| Diffusion Policy | 25.6% | 56.1% | 32.7% | 33.4% |
| NVIDIA Isaac GR00T N1 2B | 32.1% | 66.5% | 50.0% | 45.0% |
Real benchmarks
The models were assessed on a variety of manipulation tasks that require precise object handling, coordinated two-handed movements, and advanced spatial awareness, allowing for refined control in intricate interactions.
| Model | Pick-and-Place | Articulated | Industrial | Coordination | Average |
| --- | --- | --- | --- | --- | --- |
| Diffusion Policy (10% Data) | 3.0% | 14.3% | 6.7% | 27.5% | 10.2% |
| NVIDIA Isaac GR00T N1 2B (10% Data) | 35.0% | 62.0% | 31.0% | 50.0% | 42.6% |

| Model | Pick-and-Place | Articulated | Industrial | Coordination | Average |
| --- | --- | --- | --- | --- | --- |
| Diffusion Policy (Full Data) | 36.0% | 38.6% | 61.0% | 62.5% | 46.4% |
| NVIDIA Isaac GR00T N1 2B (Full Data) | 82.0% | 70.9% | 70.0% | 82.5% | 76.8% |
Compared to the Diffusion Policy baseline, the Isaac GR00T N1 model demonstrates smoother and more fluid motion, alongside a marked improvement in grasping accuracy, particularly when fine-tuned on smaller post-training datasets.
Results further highlight that GR00T N1 not only learns new tasks more efficiently but also follows language instructions with greater precision than baseline methods.
Get started today
You can access the following resources to start working with GR00T N1:
- The NVIDIA Isaac GR00T-N1-2B model is available on Hugging Face.
- Sample datasets and PyTorch scripts for fine-tuning are available from the /NVIDIA/Isaac-GR00T GitHub repo.
Use the following resources for post-training and inference:
- For post-training, the minimum configuration is a single NVIDIA RTX A6000 or NVIDIA GeForce RTX 4090 GPU. For more demanding workloads, suggested configurations include NVIDIA DGX Spark or NVIDIA DGX H100 systems.
- For inference, the GR00T N1 model can be deployed on either the NVIDIA RTX A6000 GPU or the NVIDIA Jetson AGX Orin supercomputer.
For more information about the model, see the GR00T N1: An Open Foundation Model for Generalist Humanoid Robots whitepaper.
This model, combined with NVIDIA Isaac GR00T synthetic motion and data generation pipelines, along with simulation frameworks such as Isaac Lab and Isaac Sim, enables you to create general-purpose humanoid robots.
For more detailed information about NVIDIA Isaac GR00T, watch the GTC Keynote from NVIDIA CEO Jensen Huang and GTC key sessions, including An Introduction to Building Humanoid Robots.
Stay up to date by subscribing to the newsletter and following NVIDIA Robotics on YouTube, Discord, and developer forums.