NVLink

Mar 27, 2025

A New Era in Data Center Networking with NVIDIA Silicon Photonics-based Network Switching

NVIDIA is breaking new ground by integrating silicon photonics directly with its NVIDIA Quantum and NVIDIA Spectrum switch ICs. At GTC 2025, we announced the...

5 MIN READ

Mar 25, 2025

Automating AI Factories with NVIDIA Mission Control

Advanced AI models such as DeepSeek-R1 are proving that enterprises can now build cutting-edge AI models specialized with their own data and expertise. These...

7 MIN READ

Mar 19, 2025

NVIDIA Blackwell Ultra for the Era of AI Reasoning

For years, advancements in AI have followed a clear trajectory through pretraining scaling: larger models, more data, and greater computational resources lead...

5 MIN READ

Feb 13, 2025

Simplify System Memory Management with the Latest NVIDIA GH200 NVL2 Enterprise RA

NVIDIA Enterprise Reference Architectures (Enterprise RAs) can reduce the time and cost of deploying AI infrastructure solutions. They provide a streamlined...

8 MIN READ

Jan 24, 2025

Optimize AI Inference Performance with NVIDIA Full-Stack Solutions

The explosion of AI-driven applications has placed unprecedented demands on both developers, who must balance delivering cutting-edge performance with managing...

9 MIN READ

Dec 11, 2024

Deploying NVIDIA H200 NVL at Scale with New Enterprise Reference Architecture

Last month at the Supercomputing 2024 conference, NVIDIA announced the availability of NVIDIA H200 NVL, the latest NVIDIA Hopper platform. Optimized for...

8 MIN READ

Nov 21, 2024

Best Practices for Multi-GPU Data Analysis Using RAPIDS with Dask

As we move towards a denser computing infrastructure, with more compute, more GPUs, accelerated networking, and so forth, multi-GPU training and analysis...

5 MIN READ

Nov 01, 2024

3x Faster AllReduce with NVSwitch and TensorRT-LLM MultiShot

Deploying generative AI workloads in production environments where user numbers can fluctuate from hundreds to hundreds of thousands – and where input...

5 MIN READ

Oct 28, 2024

NVIDIA GH200 Superchip Accelerates Inference by 2x in Multiturn Interactions with Llama Models

Deploying large language models (LLMs) in production environments often requires making hard trade-offs between enhancing user interactivity and increasing...

7 MIN READ

Oct 09, 2024

Boosting Llama 3.1 405B Throughput by Another 1.5x on NVIDIA H200 Tensor Core GPUs and NVLink Switch

The continued growth of LLM capability, fueled by increasing parameter counts and support for longer contexts, has led to their usage in a wide variety of...

8 MIN READ

Oct 08, 2024

Bringing AI-RAN to a Telco Near You

Inferencing for generative AI and AI agents will drive the need for AI compute infrastructure to be distributed from edge to central clouds. IDC predicts that...

14 MIN READ

Sep 26, 2024

Low Latency Inference Chapter 2: Blackwell is Coming. NVIDIA GH200 NVL32 with NVLink Switch Gives Signs of Big Leap in Time to First Token Performance

Many of the most exciting applications of large language models (LLMs), such as interactive speech bots, coding co-pilots, and search, need to begin responding...

8 MIN READ

Sep 24, 2024

NVIDIA GH200 Grace Hopper Superchip Delivers Outstanding Performance in MLPerf Inference v4.1

In the latest round of MLPerf Inference – a suite of standardized, peer-reviewed inference benchmarks – the NVIDIA platform delivered outstanding...

7 MIN READ

Sep 16, 2024

Memory Efficiency, Faster Initialization, and Cost Estimation with NVIDIA Collective Communications Library 2.22

For the past few months, the NVIDIA Collective Communications Library (NCCL) developers have been working hard on a set of new library features and bug fixes....

8 MIN READ

Sep 06, 2024

Enhancing Application Portability and Compatibility across New Platforms Using NVIDIA Magnum IO NVSHMEM 3.0

NVSHMEM is a parallel programming interface that provides efficient and scalable communication for NVIDIA GPU clusters. Part of NVIDIA Magnum IO and based on...

7 MIN READ

Sep 05, 2024

Low Latency Inference Chapter 1: Up to 1.9x Higher Llama 3.1 Performance with Medusa on NVIDIA HGX H200 with NVLink Switch

As large language models (LLMs) continue to grow in size and complexity, multi-GPU compute is a must-have to deliver the low latency and high throughput that...

5 MIN READ