Yan Chang – NVIDIA Technical Blog
News and tutorials for developers, data scientists, and IT admins
Feed: http://www.open-lab.net/blog/feed/ (last updated 2025-04-03)

R2D2: Advancing Robot Mobility and Whole-Body Control with Novel Workflows and AI Foundation Models from NVIDIA Research
By Yan Chang | Published 2025-03-27 | Updated 2025-04-03
http://www.open-lab.net/blog/?p=98193

Welcome to the first edition of the NVIDIA Robotics Research and Development Digest (R2D2). This technical blog series will give developers and researchers deeper insight into, and access to, the latest physical AI and robotics research breakthroughs from across the NVIDIA Research labs. Developing robust robots presents significant challenges, which we address through…

Advancing Humanoid Robot Sight and Skill Development with NVIDIA Project GR00T
By Yan Chang | Published 2024-11-06 | Updated 2024-11-14
http://www.open-lab.net/blog/?p=91333

Humanoid robots present a multifaceted challenge at the intersection of mechatronics, control theory, and AI. The dynamics and control of humanoid robots are complex, requiring advanced tools, techniques, and algorithms to maintain balance during locomotion and manipulation tasks. Collecting robot data and integrating sensors also pose significant challenges, as humanoid robots require a fusion of…

Using Generative AI to Enable Robots to Reason and Act with ReMEmbR
By Yan Chang | Published 2024-09-23 | Updated 2024-11-07
http://www.open-lab.net/blog/?p=88932

Vision-language models (VLMs) combine the powerful language understanding of foundational LLMs with the vision capabilities of vision transformers (ViTs) by projecting text and images into the same embedding space. They can take unstructured multimodal data, reason over it, and return the output in a structured format. Building on a broad base of pretraining, they can be easily adapted for…
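As a rough illustration of the shared embedding idea (not the specific VLM used in ReMEmbR), the sketch below uses a CLIP-style model from Hugging Face Transformers, which pairs a ViT image encoder with a text encoder so that images and text prompts can be compared in the same embedding space. The checkpoint name, image path, and prompts are assumptions made for the example.

```python
# Minimal sketch: project text and images into a shared embedding space
# with a CLIP-style model (ViT image encoder + text encoder).
# The checkpoint, image file, and prompts are illustrative assumptions,
# not the model or data used in ReMEmbR.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("robot_camera_frame.jpg")  # hypothetical camera frame
prompts = ["a hallway with people", "an empty loading dock"]

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Both modalities now live in the same embedding space, so the image-text
# similarity scores (logits_per_image) rank how well each prompt describes
# the frame.
probs = outputs.logits_per_image.softmax(dim=-1)
for prompt, p in zip(prompts, probs[0].tolist()):
    print(f"{prompt}: {p:.2f}")
```

A full VLM of the kind described in the post goes further by feeding the fused image and text representations into an LLM decoder, which can reason over the multimodal input and return structured output.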
