Benchmark

Apr 24, 2025

Benchmarking Agentic LLM and VLM Reasoning for Gaming with NVIDIA NIM

This is the first post in the LLM Benchmarking series, which shows how to use GenAI-Perf to benchmark the Meta Llama 3 model when deployed with NVIDIA NIM.?...

7 MIN READ

Apr 15, 2025

NVIDIA Llama Nemotron Ultra Open Model Delivers Groundbreaking Reasoning Accuracy

AI is no longer just about generating text or images—it’s about deep reasoning, detailed problem-solving, and powerful adaptability for real-world...

7 MIN READ

Apr 02, 2025

LLM Benchmarking: Fundamental Concepts

The past few years have witnessed the rise in popularity of generative AI and large language models (LLMs), as part of a broad AI revolution. As LLM-based...

14 MIN READ

Mar 19, 2025

Shrink Genomics and Single-Cell Analysis Time to Minutes with NVIDIA Parabricks and NVIDIA AI Blueprints

NVIDIA Parabricks is a scalable genomics analysis software suite that solves omics challenges with accelerated computing and deep learning to unlock new...

8 MIN READ

Mar 18, 2025

Measure and Improve AI Workload Performance with NVIDIA DGX Cloud Benchmarking

As AI capabilities advance, understanding the impact of hardware and software infrastructure choices on workload performance is crucial for both technical...

7 MIN READ

Mar 18, 2025

NVIDIA NeMo Retriever Delivers Accurate Multimodal PDF Data Extraction 15x Faster

Enterprises are generating and storing more multimodal data than ever before, yet traditional retrieval systems remain largely text-focused. While they can...

11 MIN READ

Mar 18, 2025

NVIDIA Blackwell Delivers World-Record DeepSeek-R1 Inference Performance

NVIDIA announced world-record DeepSeek-R1 inference performance at NVIDIA GTC 2025. A single NVIDIA DGX system with eight NVIDIA Blackwell GPUs can achieve over...

14 MIN READ

Mar 11, 2025

Efficient ETL with Polars and Apache Spark on NVIDIA Grace CPU

The NVIDIA Grace CPU Superchip delivers outstanding performance and best-in-class energy efficiency for CPU workloads in the data center and in the cloud. The...

7 MIN READ

Feb 14, 2025

Optimizing Qwen2.5-Coder Throughput with NVIDIA TensorRT-LLM Lookahead Decoding

Large language models (LLMs) that specialize in coding have been steadily adopted into developer workflows. From pair programming to self-improving AI agents,...

7 MIN READ

Stylized image of JetPack connected to a monitor.

Jan 16, 2025

NVIDIA JetPack 6.2 Brings Super Mode to NVIDIA Jetson Orin Nano and Jetson Orin NX Modules

The introduction of the NVIDIA Jetson Orin Nano Super Developer Kit sparked a new age of generative AI for small edge devices. The new Super Mode delivered an...

12 MIN READ

Jan 16, 2025

Accelerating Time Series Forecasting with RAPIDS cuML

Time series forecasting is a powerful data science technique used to predict future values based on data points from the past Open source Python libraries like...

4 MIN READ

Picture of the NVIDIA H200 NVL GPU on a black background.

Dec 20, 2024

Taking Computational Fluid Dynamics to the Next Level with the NVIDIA H200 Tensor Core GPU

Computational fluid dynamics (CFD) is used in industry and academia to address a wide range of use cases, including external aerodynamics, internal flows, heat...

5 MIN READ

Dec 19, 2024

RAPIDS 24.12 Introduces cuDF on PyPI, CUDA Unified Memory for Polars, and Faster GNNs

RAPIDS 24.12 introduces cuDF packages to PyPI, speeds up groupby aggregations and reading files from AWS S3, enables larger-than-GPU memory queries in the...

8 MIN READ

Dec 17, 2024

Boost Llama 3.3 70B Inference Throughput 3x with NVIDIA TensorRT-LLM Speculative Decoding

Meta's Llama collection of open large language models (LLMs) continues to grow with the recent addition of Llama 3.3 70B, a text-only...

8 MIN READ

Nov 19, 2024

Llama 3.2 Full-Stack Optimizations Unlock High Performance on NVIDIA GPUs

Meta recently released its Llama 3.2 series of vision language models (VLMs), which come in 11B parameter and 90B parameter variants. These models are...

6 MIN READ

Nov 15, 2024

Streamlining AI Inference Performance and Deployment with NVIDIA TensorRT-LLM Chunked Prefill

In this blog post, we take a closer look at chunked prefill, a feature of NVIDIA TensorRT-LLM that increases GPU utilization and simplifies the deployment...

4 MIN READ