VLMs

Apr 24, 2025

Benchmarking Agentic LLM and VLM Reasoning for Gaming with NVIDIA NIM

Researchers from the University College London (UCL) Deciding, Acting, and Reasoning with Knowledge (DARK) Lab leverage NVIDIA NIM microservices in their new...

7 MIN READ

Mar 19, 2025

MONAI Integrates Advanced Agentic Architectures to Establish Multimodal Medical AI Ecosystem

The growing volume and complexity of medical data—and the pressing need for early disease diagnosis and improved healthcare efficiency—are driving...

7 MIN READ

Mar 10, 2025

Streamline LLM Deployment for Autonomous Vehicle Applications with NVIDIA DriveOS LLM SDK

Large language models (LLMs) have shown remarkable generalization capabilities in natural language processing (NLP). They are used in a wide range of...

7 MIN READ

Feb 26, 2025

Building a Simple VLM-Based Multimodal Information Retrieval System with NVIDIA NIM

In today’s data-driven world, the ability to retrieve accurate information from even modest amounts of data is vital for developers seeking streamlined,...

15 MIN READ

Feb 26, 2025

Vision Language Model Prompt Engineering Guide for Image and Video Understanding

Vision language models (VLMs) are evolving at breakneck speed. In 2020, the first VLMs revolutionized the generative AI landscape by bringing visual...

12 MIN READ

Feb 13, 2025

Upcoming Webinar: Unlocking Video Analytics With AI Agents

Master prompt engineering, fine-tuning, and customization to build video analytics AI agents.

1 MIN READ

Jan 16, 2025

NVIDIA JetPack 6.2 Brings Super Mode to NVIDIA Jetson Orin Nano and Jetson Orin NX Modules

The introduction of the NVIDIA Jetson Orin Nano Super Developer Kit sparked a new age of generative AI for small edge devices. The new Super Mode delivered an...

12 MIN READ

Jan 06, 2025

Build a Video Search and Summarization Agent with NVIDIA AI Blueprint

This post was originally published July 29, 2024, but has been extensively revised with NVIDIA AI Blueprint information. Traditional video analytics applications...

11 MIN READ

Dec 09, 2024

Just Released: NVIDIA VILA VLM

Now available in preview, NVIDIA VILA is an advanced multimodal VLM that provides visual understanding of multi-images and video.

1 MIN READ

Dec 03, 2024

Build an Agentic Video Workflow with Video Search and Summarization

Building a question-answering chatbot with large language models (LLMs) is now a common workflow for text-based interactions. What about creating an AI system...

11 MIN READ

Oct 31, 2024

Build Multimodal Visual AI Agents Powered by NVIDIA NIM

The exponential growth of visual data—ranging from images to PDFs to streaming videos—has made manual review and analysis virtually impossible....

11 MIN READ

Sep 25, 2024

Deploying Accelerated Llama 3.2 from the Edge to the Cloud

Expanding the open-source Meta Llama collection of models, the Llama 3.2 collection includes vision language models (VLMs), small language models (SLMs), and an...

6 MIN READ

Sep 23, 2024

Using Generative AI to Enable Robots to Reason and Act with ReMEmbR

Vision-language models (VLMs) combine the powerful language understanding of foundational LLMs with the vision capabilities of vision transformers (ViTs) by...

10 MIN READ

Jul 17, 2024

Develop Generative AI-Powered Visual AI Agents for the Edge

An exciting breakthrough in AI technology—Vision Language Models (VLMs)—offers a more dynamic and flexible method for video analysis. VLMs enable users to...

9 MIN READ

Jun 28, 2024

Introducing DoRA, a High-Performing Alternative to LoRA for Fine-Tuning

Full fine-tuning (FT) is commonly employed to tailor general pretrained models for specific downstream tasks. To reduce the training cost, parameter-efficient...

6 MIN READ

Jun 04, 2024

Power Cloud-Native Microservices at the Edge with NVIDIA JetPack 6.0, Now GA

NVIDIA JetPack SDK powers NVIDIA Jetson modules, offering a comprehensive solution for building end-to-end accelerated AI applications. JetPack 6 expands the...

12 MIN READ