Note: As of January 6, 2025, VILA is now part of the Cosmos Nemotron VLM family. NVIDIA is proud to announce the release of NVIDIA Cosmos Nemotron, a family of state-of-the-art vision language models (VLMs) designed to query and summarize images and videos from physical or virtual environments. Cosmos Nemotron builds upon NVIDIA’s groundbreaking visual understanding research including VILA…
]]>Note: As of January 6, 2025, VILA is now part of the new Cosmos Nemotron vision language models. Visual language models have recently evolved significantly. However, the existing technology typically supports only a single image. It cannot reason across multiple images, support in-context learning, or understand videos. It is also not optimized for inference speed. We developed VILA…
]]>