Ampere – NVIDIA Technical Blog News and tutorials for developers, data scientists, and IT admins 2025-04-03T18:49:37Z http://www.open-lab.net/blog/feed/ Brian Shi <![CDATA[Boosting Q&A Accuracy with GraphRAG Using PyG and Graph Databases]]> http://www.open-lab.net/blog/?p=97900 2025-04-03T18:46:06Z 2025-03-26T21:41:08Z Large language models (LLMs) often struggle with accuracy when handling domain-specific questions, especially those requiring multi-hop reasoning or access to...]]>

Large language models (LLMs) often struggle with accuracy when handling domain-specific questions, especially those requiring multi-hop reasoning or access to proprietary data. While retrieval-augmented generation (RAG) can help, traditional vector search methods often fall short. In this tutorial, we show you how to implement GraphRAG in combination with fine-tuned GNN+LLM models to achieve…
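The multi-hop retrieval idea behind GraphRAG can be sketched in a few lines. This is an illustrative toy only, not the tutorial's PyG/graph-database code: `multi_hop_retrieve`, the `kg` adjacency dict, and the node names are all hypothetical stand-ins.

```python
# Toy sketch of GraphRAG retrieval: instead of fetching isolated text chunks
# by vector similarity, expand outward from seed entities so the context
# handed to the LLM covers multi-hop relations.

def multi_hop_retrieve(graph, seeds, hops=2):
    """Collect every node reachable from the seeds within `hops` edges."""
    frontier, visited = set(seeds), set(seeds)
    for _ in range(hops):
        frontier = {nbr for node in frontier for nbr in graph.get(node, [])} - visited
        visited |= frontier
    return visited

# Tiny knowledge graph: answering "Which GPU powers product X?" takes two hops.
kg = {
    "product_x": ["module_a"],
    "module_a": ["gpu_a100"],
    "gpu_a100": [],
}
context_nodes = multi_hop_retrieve(kg, ["product_x"], hops=2)
# One-hop retrieval would stop at "module_a"; two hops reach "gpu_a100".
```

A real pipeline would then serialize the retrieved subgraph into the LLM prompt, optionally scoring nodes with a fine-tuned GNN.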

Source

]]>
0
Elias Wolfberg <![CDATA[New AI-Powered 3D Printing Can Help Surgeons Rehearse Procedures]]> http://www.open-lab.net/blog/?p=89206 2025-01-07T20:27:16Z 2024-09-20T15:32:09Z Researchers at Washington State University (WSU) unveiled a new AI-guided 3D printing technique that can help physicians print intricate replicas of human...]]>

Researchers at Washington State University (WSU) unveiled a new AI-guided 3D printing technique that can help physicians print intricate replicas of human organs. Surgeons can then use these organ models to practice before performing the actual surgery, which gives doctors more tools to improve surgical results. The AI algorithm was trained on images and key attributes of human kidneys and…

Source

]]>
0
Babak Hejazi <![CDATA[Introducing Grouped GEMM APIs in cuBLAS and More Performance Updates]]> http://www.open-lab.net/blog/?p=83888 2024-07-16T17:19:07Z 2024-06-12T20:30:00Z The latest release of NVIDIA cuBLAS library, version 12.5, continues to deliver functionality and performance to deep learning (DL) and high-performance...]]>

The latest release of NVIDIA cuBLAS library, version 12.5, continues to deliver functionality and performance to deep learning (DL) and high-performance computing (HPC) workloads. This post provides an overview of the following updates on cuBLAS matrix multiplications (matmuls) since version 12.0, and a walkthrough: Grouped GEMM APIs can be viewed as a generalization of the batched…
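The semantic difference between batched and grouped GEMM can be sketched without the library: a batched GEMM runs many problems that all share one (M, N, K) shape, while a grouped GEMM lets each group carry its own shape in a single call. Plain-Python matmul stands in for the cuBLAS kernels here; `grouped_gemm` is a hypothetical illustration, not the cuBLAS API.

```python
def matmul(a, b):
    """Naive dense matmul over lists of lists (stand-in for one GEMM)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def grouped_gemm(problems):
    """Each problem carries its own operands, and hence its own shape."""
    return [matmul(a, b) for a, b in problems]

# Two groups with different shapes: (1x2)@(2x1) and (2x2)@(2x2).
out = grouped_gemm([
    ([[1, 2]], [[3], [4]]),
    ([[1, 0], [0, 1]], [[5, 6], [7, 8]]),
])
# out[0] == [[11]]; out[1] == [[5, 6], [7, 8]]
```

On the GPU the point of grouping is that all problems are launched together, amortizing launch overhead even when shapes differ.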

Source

]]>
0
Wessam Bahnassi <![CDATA[Advancing GPU-Driven Rendering with Work Graphs in Direct3D 12]]> http://www.open-lab.net/blog/?p=78794 2024-04-09T23:45:27Z 2024-03-11T17:00:00Z GPU-driven rendering has long been a major goal for many game applications. It enables better scalability for handling large virtual scenes and reduces cases...]]>

GPU-driven rendering has long been a major goal for many game applications. It enables better scalability for handling large virtual scenes and reduces cases where the CPU could bottleneck a game's performance. Short of running the game's logic on the GPU, I see the pinnacle of GPU-driven rendering as a scenario in which the CPU sends the GPU only the new frame's camera information…

Source

]]>
0
Brad Nemire <![CDATA[Oracle Cloud Infrastructure Sets Quantitative Financial HPC Calculations Record with NVIDIA GPUs]]> http://www.open-lab.net/blog/?p=75292 2023-12-14T19:27:26Z 2023-12-12T23:33:55Z NVIDIA A100 Tensor Core GPUs were featured in a stack that set several records in a recent STAC-A2 benchmark standard based on financial market risk...]]>

NVIDIA A100 Tensor Core GPUs were featured in a stack that set several records in a recent STAC-A2 benchmark standard based on financial market risk analysis.

Source

]]>
0
Gareth Sylvester-Bradley <![CDATA[Software-Defined Broadcast with NVIDIA Holoscan for Media]]> http://www.open-lab.net/blog/?p=70826 2024-03-13T19:05:53Z 2023-09-14T19:00:00Z The broadcast industry is undergoing a transformation in how content is created, managed, distributed, and consumed. This transformation includes a shift from...]]>

The broadcast industry is undergoing a transformation in how content is created, managed, distributed, and consumed. This transformation includes a shift from traditional linear workflows bound by fixed-function devices to flexible and hybrid, software-defined systems that enable the future of live streaming. Developers can now apply to join the early access program for NVIDIA Holoscan for…

Source

]]>
3
Hongxiao Bai <![CDATA[Structured Sparsity in the NVIDIA Ampere Architecture and Applications in Search Engines]]> http://www.open-lab.net/blog/?p=67288 2023-07-13T19:00:19Z 2023-07-03T16:00:00Z Deep learning is achieving significant success in various fields and areas, as it has revolutionized the way we analyze, understand, and manipulate data. There...]]>

Deep learning is achieving significant success in various fields and areas, as it has revolutionized the way we analyze, understand, and manipulate data. There are many success stories in computer vision, natural language processing (NLP), medical diagnosis and health care, autonomous vehicles, recommendation systems, and climate and weather modeling. In an era of ever-growing neural network…

Source

]]>
0
Soma Velayutham <![CDATA[NVIDIA AX800 Delivers High-Performance 5G vRAN and AI Services on One Common Cloud Infrastructure]]> http://www.open-lab.net/blog/?p=65075 2023-10-23T17:18:57Z 2023-05-29T03:00:00Z The pace of 5G investment and adoption is accelerating. According to the GSMA Mobile Economy 2023 report, nearly $1.4 trillion will be spent on 5G CapEx,...]]>

The pace of 5G investment and adoption is accelerating. According to the GSMA Mobile Economy 2023 report, nearly $1.4 trillion will be spent on 5G CapEx between 2023 and 2030, and radio access network (RAN) may account for over 60% of that spend. Increasingly, the CapEx spend is moving from the traditional approach with proprietary hardware to virtualized RAN (vRAN) and Open RAN architectures…

Source

]]>
0
Martin Marciniszyn Mehringer <![CDATA[Benchmarking Deep Neural Networks for Low-Latency Trading and Rapid Backtesting on NVIDIA GPUs]]> http://www.open-lab.net/blog/?p=60208 2023-07-05T19:25:20Z 2023-02-02T14:00:00Z Lowering response times to new market events is a driving force in algorithmic trading. Latency-sensitive trading firms keep up with the ever-increasing pace of...]]>

Lowering response times to new market events is a driving force in algorithmic trading. Latency-sensitive trading firms keep up with the ever-increasing pace of financial electronic markets by deploying low-level hardware devices like Field Programmable Gate Arrays (FPGAs) and Application Specific Integrated Circuits (ASICs) into their systems. However, as markets become increasingly…

Source

]]>
2
Maggie Zhang <![CDATA[Dividing NVIDIA A30 GPUs and Conquering Multiple Workloads]]> http://www.open-lab.net/blog/?p=50380 2023-04-04T16:58:51Z 2022-08-30T19:00:35Z Multi-Instance GPU (MIG) is an important feature of NVIDIA H100, A100, and A30 Tensor Core GPUs, as it can partition a GPU into multiple instances. Each...]]>

Multi-Instance GPU (MIG) is an important feature of NVIDIA H100, A100, and A30 Tensor Core GPUs, as it can partition a GPU into multiple instances. Each instance has its own compute cores, high-bandwidth memory, L2 cache, DRAM bandwidth, and media engines such as decoders. This enables multiple workloads or multiple users to run workloads simultaneously on one GPU to maximize the GPU…

Source

]]>
0
Gwena Cunha Sergio <![CDATA[Accelerating Quantized Networks with the NVIDIA QAT Toolkit for TensorFlow and NVIDIA TensorRT]]> http://www.open-lab.net/blog/?p=48838 2023-04-04T17:00:05Z 2022-06-16T17:28:18Z We're excited to announce the NVIDIA Quantization-Aware Training (QAT) Toolkit for TensorFlow 2 with the goal of accelerating the quantized networks with...]]>

Join the NVIDIA Triton and NVIDIA TensorRT community to stay current on the latest product updates, bug fixes, content, best practices, and more. We're excited to announce the NVIDIA Quantization-Aware Training (QAT) Toolkit for TensorFlow 2 with the goal of accelerating the quantized networks with NVIDIA TensorRT on NVIDIA GPUs. This toolkit provides you with an easy-to-use API to quantize…
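The core of quantization-aware training is "fake quantization": values are quantized to the int8 grid and immediately dequantized during training, so the network learns to tolerate the quantization error that inference will introduce. A minimal sketch, assuming a simple symmetric per-tensor scale; `fake_quantize` is an illustrative stand-in, not the toolkit's API (the QAT Toolkit inserts these ops into the TensorFlow graph for you).

```python
def fake_quantize(x, scale):
    """Quantize-dequantize one value: snap x to an int8 step, return float."""
    q = max(-128, min(127, round(x / scale)))  # quantize to the int8 range
    return q * scale                           # dequantize back to float

scale = 0.25
small = fake_quantize(0.3, scale)    # snaps to the nearest step: 0.25
large = fake_quantize(100.0, scale)  # clamps at the int8 max: 127 * 0.25 = 31.75
```

At deployment, TensorRT can then replace these quantize-dequantize pairs with true int8 kernels.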

Source

]]>
0
Ashraf Eassa <![CDATA[Fueling High-Performance Computing with Full-Stack Innovation]]> http://www.open-lab.net/blog/?p=48769 2023-07-05T19:27:52Z 2022-06-02T18:45:00Z High-performance computing (HPC) has become the essential instrument of scientific discovery. Whether it is discovering new, life-saving drugs, battling...]]>

High-performance computing (HPC) has become the essential instrument of scientific discovery. Whether it is discovering new, life-saving drugs, battling climate change, or creating accurate simulations of our world, these solutions demand an enormous and rapidly growing amount of processing power. They are increasingly out of reach of traditional computing approaches.

Source

]]>
1
Terry Yin <![CDATA[Training a State-of-the-Art ImageNet-1K Visual Transformer Model using NVIDIA DGX SuperPOD]]> http://www.open-lab.net/blog/?p=48136 2023-06-12T09:34:30Z 2022-05-25T16:00:00Z Recent work has demonstrated that large transformer models can achieve or advance the SOTA in computer vision tasks such as semantic segmentation and object...]]>

Recent work has demonstrated that large transformer models can achieve or advance the SOTA in computer vision tasks such as semantic segmentation and object detection. However, unlike convolutional network models, which can reach that level with standard public datasets, transformers typically require a proprietary dataset that is orders of magnitude larger. The recent project VOLO (Vision Outlooker) from SEA AI Lab…

Source

]]>
1
Maggie Zhang <![CDATA[Accelerating AI Inference Workloads with NVIDIA A30 GPU]]> http://www.open-lab.net/blog/?p=47944 2022-08-30T18:58:43Z 2022-05-11T22:43:14Z NVIDIA A30 GPU is built on the latest NVIDIA Ampere Architecture to accelerate diverse workloads like AI inference at scale, enterprise training, and HPC...]]>

NVIDIA A30 GPU is built on the latest NVIDIA Ampere Architecture to accelerate diverse workloads like AI inference at scale, enterprise training, and HPC applications for mainstream servers in data centers. The A30 PCIe card combines the third-generation Tensor Cores with large HBM2 memory (24 GB) and fast GPU memory bandwidth (933 GB/s) in a low-power envelope (maximum 165 W).

Source

]]>
1
Kalyan Meher Vadrevu <![CDATA[Register for the NVIDIA Metropolis Developer Webinars on Sept. 22]]> http://www.open-lab.net/blog/?p=37245 2023-08-18T19:34:34Z 2021-09-08T20:01:16Z Join NVIDIA experts and Metropolis partners on Sept. 22 for webinars exploring developer SDKs, GPUs, go-to-market opportunities, and more. All three sessions,...]]>

Join NVIDIA experts and Metropolis partners on Sept. 22 for webinars exploring developer SDKs, GPUs, go-to-market opportunities, and more. All three sessions, each with unique speakers and content, will be recorded and available for on-demand viewing later. Register Now >> Wednesday, September 22, 2021, 1 PM PDT Wednesday, September 22, 2021…

Source

]]>
0
Maggie Zhang <![CDATA[Deploying NVIDIA Triton at Scale with MIG and Kubernetes]]> http://www.open-lab.net/blog/?p=31573 2025-03-18T18:20:18Z 2021-08-26T03:00:00Z NVIDIA Triton Inference Server is an open-source AI model serving software that simplifies the deployment of trained AI models at scale in production. Clients...]]>

Join the NVIDIA Triton and NVIDIA TensorRT community to stay current on the latest product updates, bug fixes, content, best practices, and more. As of 3/18/25, NVIDIA Triton Inference Server is now NVIDIA Dynamo. NVIDIA Triton Inference Server is an open-source AI model serving software that simplifies the deployment of trained AI models at scale in production.

Source

]]>
0
Ram Cherukuri <![CDATA[Discovering New Features in CUDA 11.4]]> http://www.open-lab.net/blog/?p=35000 2024-08-28T17:47:39Z 2021-07-29T17:25:00Z NVIDIA announces the newest release of the CUDA development environment, CUDA 11.4. This release includes GPU-accelerated libraries, debugging and optimization...]]>

NVIDIA announces the newest release of the CUDA development environment, CUDA 11.4. This release includes GPU-accelerated libraries, debugging and optimization tools, programming language enhancements, and a runtime library to build and deploy your application on GPUs across the major CPU architectures: x86, Arm, and POWER. CUDA 11.4 is focused on enhancing the programming model and…

Source

]]>
0
Purnendu Mukherjee <![CDATA[Real-Time Natural Language Processing with BERT Using NVIDIA TensorRT (Updated)]]> http://www.open-lab.net/blog/?p=34688 2023-06-12T21:08:51Z 2021-07-20T13:00:00Z This post was originally published in August 2019 and has been updated for NVIDIA TensorRT 8.0. Large-scale language models (LSLMs) such as BERT, GPT-2, and...]]>

This post was originally published in August 2019 and has been updated for NVIDIA TensorRT 8.0. Join the NVIDIA Triton and NVIDIA TensorRT community to stay current on the latest product updates, bug fixes, content, best practices, and more. Large-scale language models (LSLMs) such as BERT, GPT-2, and XL-Net have brought exciting leaps in accuracy for many natural language processing…

Source

]]>
0
Greg Ruetsch <![CDATA[Using Tensor Cores in CUDA Fortran]]> http://www.open-lab.net/blog/?p=24627 2023-03-22T01:11:50Z 2021-04-15T21:00:20Z Tensor Cores, which are programmable matrix multiply and accumulate units, were first introduced in the V100 GPUs where they operated on half-precision (16-bit)...]]>

Tensor Cores, which are programmable matrix multiply and accumulate units, were first introduced in the V100 GPUs where they operated on half-precision (16-bit) multiplicands. Tensor Core functionality has been expanded in the following architectures, and in the Ampere A100 GPUs (compute capability 8.0) support for other data types was added, including double precision.

Source

]]>
1
Takuma Yamaguchi <![CDATA[Accelerating Matrix Multiplication with Block Sparse Format and NVIDIA Tensor Cores]]> http://www.open-lab.net/blog/?p=24706 2023-05-24T00:25:03Z 2021-03-19T16:24:28Z Sparse-matrix dense-matrix multiplication (SpMM) is a fundamental linear algebra operation and a building block for more complex algorithms such as finding the...]]>

Sparse-matrix dense-matrix multiplication (SpMM) is a fundamental linear algebra operation and a building block for more complex algorithms such as finding the solutions of linear systems, computing eigenvalues through the preconditioned conjugate gradient, and multiple right-hand sides Krylov subspace iterative solvers. SpMM is also an important kernel used in many domains such as fluid dynamics…
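The block-sparse idea can be sketched in plain Python: only the nonzero blocks of the sparse matrix are stored and multiplied, which is what lets Tensor Cores skip the zero blocks entirely. This is an illustrative toy in the spirit of the Blocked-ELL layout, not the cuSPARSE API; `block_spmm` and the sample data are hypothetical.

```python
def block_spmm(blocks, block_size, n_block_rows, dense):
    """Multiply a block-sparse matrix (dict of nonzero blocks, keyed by
    block coordinates) by a dense matrix, touching only stored blocks."""
    n_cols = len(dense[0])
    out = [[0.0] * n_cols for _ in range(n_block_rows * block_size)]
    for (bi, bj), blk in blocks.items():
        for i in range(block_size):
            row = bi * block_size + i
            for k in range(block_size):
                a = blk[i][k]
                if a:
                    col = bj * block_size + k
                    for j in range(n_cols):
                        out[row][j] += a * dense[col][j]
    return out

# 4x4 sparse matrix with only two nonzero 2x2 blocks, times a 4x1 dense matrix.
blocks = {(0, 0): [[1, 0], [0, 1]], (1, 1): [[2, 0], [0, 2]]}
dense = [[1.0], [2.0], [3.0], [4.0]]
result = block_spmm(blocks, block_size=2, n_block_rows=2, dense=dense)
# result == [[1.0], [2.0], [6.0], [8.0]]
```

The GPU version maps each stored block to a dense Tensor Core matmul, so the speedup over dense SpMM tracks the fraction of blocks that are zero.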

Source

]]>
21
Dusan Stosic <![CDATA[Accelerating AI Training with NVIDIA TF32 Tensor Cores]]> http://www.open-lab.net/blog/?p=23724 2022-08-21T23:41:01Z 2021-01-27T23:09:58Z NVIDIA Ampere GPU architecture introduced the third generation of Tensor Cores, with the new TensorFloat32 (TF32) mode for accelerating FP32 convolutions and...]]>

NVIDIA Ampere GPU architecture introduced the third generation of Tensor Cores, with the new TensorFloat32 (TF32) mode for accelerating FP32 convolutions and matrix multiplications. TF32 mode is the default option for AI training with 32-bit variables on Ampere GPU architecture. It brings Tensor Core acceleration to single-precision DL workloads, without needing any changes to model scripts.
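TF32 keeps FP32's 8-bit exponent (so the dynamic range is unchanged) but only 10 mantissa bits, like FP16. Its effect on a value can be emulated on the CPU by zeroing the 13 low mantissa bits of the float32 encoding; this truncation sketch is an approximation for illustration (the hardware rounds rather than truncates), and `to_tf32` is a hypothetical helper, not an NVIDIA API.

```python
import struct

def to_tf32(x):
    """Reduce an FP32 value to TF32 precision: same 8-bit exponent,
    mantissa cut from 23 bits to 10 by zeroing the 13 low bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & ~0x1FFF))[0]

# 1.5 survives exactly; a perturbation below TF32 resolution is dropped.
assert to_tf32(1.5) == 1.5
assert to_tf32(1.0 + 2**-20) == 1.0
```

This is why TF32 needs no model-script changes: inputs stay FP32 in memory, and only the multiply inside the Tensor Core sees the reduced mantissa, with accumulation still in FP32.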

Source

]]>
1
Erik Bohnhorst <![CDATA[Adding More Support in NVIDIA GPU Operator]]> http://www.open-lab.net/blog/?p=23095 2023-04-04T17:00:41Z 2021-01-26T23:12:47Z Editor's note: Interested in GPU Operator? Register for our upcoming webinar on January 20th, "How to Easily use GPUs with Kubernetes". Reliably provisioning...]]>

Editor's note: Interested in GPU Operator? Register for our upcoming webinar on January 20th, "How to Easily use GPUs with Kubernetes". Reliably provisioning servers with GPUs can quickly become complex as multiple components must be installed and managed to use GPUs with Kubernetes. The GPU Operator simplifies the initial deployment and management and is based on the Operator Framework.

Source

]]>
0
Davide Onofrio <![CDATA[Minimizing Deep Learning Inference Latency with NVIDIA Multi-Instance GPU]]> http://www.open-lab.net/blog/?p=22868 2022-08-21T23:40:50Z 2020-12-18T18:39:52Z Recently, NVIDIA unveiled the A100 GPU model, based on the NVIDIA Ampere architecture. Ampere introduced many features, including Multi-Instance GPU (MIG), that...]]>

Recently, NVIDIA unveiled the A100 GPU model, based on the NVIDIA Ampere architecture. Ampere introduced many features, including Multi-Instance GPU (MIG), that play a special role for deep learning-based (DL) applications. MIG makes it possible to use a single A100 GPU as if it were multiple smaller GPUs, maximizing utilization for DL workloads and providing dynamic scalability.

Source

]]>
1
Ram Cherukuri <![CDATA[Enhancing Memory Allocation with New NVIDIA CUDA 11.2 Features]]> http://www.open-lab.net/blog/?p=22770 2024-08-28T17:54:37Z 2020-12-16T16:00:00Z CUDA is the software development platform for building GPU-accelerated applications, providing all the components needed to develop applications targeting every...]]>

CUDA is the software development platform for building GPU-accelerated applications, providing all the components needed to develop applications targeting every NVIDIA GPU platform for general purpose compute acceleration. The latest CUDA release, CUDA 11.2, is focused on improving the user experience and application performance for CUDA developers. CUDA 11.2…

Source

]]>
0
William Tsu <![CDATA[Supercharging the World's Fastest AI Supercomputing Platform on NVIDIA HGX A100 80GB GPUs]]> http://www.open-lab.net/blog/?p=22799 2023-07-11T23:17:22Z 2020-12-10T17:07:14Z Exploding model sizes in deep learning and AI, complex simulations in high-performance computing (HPC), and massive datasets in data analytics all continue to...]]>

Exploding model sizes in deep learning and AI, complex simulations in high-performance computing (HPC), and massive datasets in data analytics all continue to demand faster and more advanced GPUs and platforms. At SC20, we announced the NVIDIA A100 80GB GPU, the latest addition to the NVIDIA Ampere family, to help developers, researchers, and scientists tackle their toughest challenges.

Source

]]>
0
Federico Busato <![CDATA[Exploiting NVIDIA Ampere Structured Sparsity with cuSPARSELt]]> http://www.open-lab.net/blog/?p=22602 2022-08-21T23:40:49Z 2020-12-08T19:34:58Z Deep neural networks achieve outstanding performance in a variety of fields, such as computer vision, speech recognition, and natural language processing. The...]]>

Deep neural networks achieve outstanding performance in a variety of fields, such as computer vision, speech recognition, and natural language processing. The computational power needed to process these neural networks is rapidly increasing, so efficient models and computation are crucial. Neural network pruning, removing unnecessary model parameters to yield a sparse network, is a useful way to…
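The specific pruning pattern that NVIDIA Ampere Sparse Tensor Cores accelerate is 2:4 fine-grained structured sparsity: in every contiguous group of four weights, exactly two are kept. A minimal sketch of magnitude-based 2:4 pruning; `prune_2_4` is an illustrative stand-in, not the cuSPARSELt API.

```python
def prune_2_4(weights):
    """Zero all but the 2 largest-magnitude values in each group of 4."""
    out = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        keep = sorted(range(len(group)), key=lambda j: abs(group[j]),
                      reverse=True)[:2]
        out.extend(v if j in keep else 0.0 for j, v in enumerate(group))
    return out

pruned = prune_2_4([0.9, -0.1, 0.4, 0.05, -0.7, 0.2, 0.0, 0.6])
# Each group of 4 retains exactly its two largest-magnitude entries:
# [0.9, 0.0, 0.4, 0.0, -0.7, 0.0, 0.0, 0.6]
```

Because the pattern guarantees 50% sparsity at a fixed granularity, the hardware can store the surviving values compactly plus a small index, and skip the zeros in the matmul.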

Source

]]>
10
Maggie Zhang <![CDATA[Getting the Most Out of the NVIDIA A100 GPU with Multi-Instance GPU]]> http://www.open-lab.net/blog/?p=21816 2023-07-27T19:58:45Z 2020-12-01T00:30:40Z With the third-generation Tensor Core technology, NVIDIA recently unveiled A100 Tensor Core GPU that delivers unprecedented acceleration at every scale for AI,...]]>

With the third-generation Tensor Core technology, NVIDIA recently unveiled the A100 Tensor Core GPU that delivers unprecedented acceleration at every scale for AI, data analytics, and high-performance computing. Along with the great performance increase over prior-generation GPUs comes another groundbreaking innovation, Multi-Instance GPU (MIG). With MIG, each A100 GPU can be partitioned into up to seven…

Source

]]>
11
Arts Yang <![CDATA[Getting Kubernetes Ready for the NVIDIA A100 GPU with Multi-Instance GPU]]> http://www.open-lab.net/blog/?p=22271 2023-07-27T19:59:33Z 2020-12-01T00:30:00Z Multi-Instance GPU (MIG) is a new feature of the latest generation of NVIDIA GPUs, such as A100. It enables users to maximize the utilization of a single GPU by...]]>

Source

]]>
4
Dave Salvator <![CDATA[Getting Immediate Speedups with NVIDIA A100 TF32]]> http://www.open-lab.net/blog/?p=22210 2023-04-04T17:01:17Z 2020-11-13T21:03:46Z The NVIDIA A100 brought the biggest single-generation performance gains ever in our company's history. These speedups are a product of architectural...]]>

The NVIDIA A100 brought the biggest single-generation performance gains ever in our company's history. These speedups are a product of architectural innovations that include Multi-Instance GPU (MIG), support for accelerated structural sparsity, and a new precision called TF32, which is the focus of this post. TF32 is a great precision to use for deep learning training, as it combines the range of…

Source

]]>
1
Nefi Alarcon <![CDATA[CINECA to Build World's Fastest AI Supercomputer]]> https://news.www.open-lab.net/?p=18524 2022-08-21T23:50:39Z 2020-10-15T18:36:36Z NVIDIA this week announced that the Italian inter-university consortium CINECA, one of the world's most important supercomputing centers, will...]]>

NVIDIA this week announced that the Italian inter-university consortium CINECA, one of the world's most important supercomputing centers, will use the company's accelerated computing platform to build the world's fastest AI supercomputer. The new "Leonardo" system, built with Atos, is expected to deliver 10 exaflops of FP16 AI performance to enable advanced AI and HPC converged application…

Source

]]>
0
Matthieu Tardy <![CDATA[Controlling Data Movement to Boost Performance on the NVIDIA Ampere Architecture]]> http://www.open-lab.net/blog/?p=20958 2024-05-23T13:14:04Z 2020-09-23T00:23:52Z The NVIDIA Ampere architecture provides new mechanisms to control data movement within the GPU and CUDA 11.1 puts those controls into your hands. These...]]>

The NVIDIA Ampere architecture provides new mechanisms to control data movement within the GPU and CUDA 11.1 puts those controls into your hands. These mechanisms include asynchronously copying data into shared memory and influencing the residency of data in the L2 cache. This post walks through how to use the asynchronous copy feature, and how to set up your algorithms to overlap…
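The overlap pattern the async-copy feature enables is double buffering: issue the copy of tile i+1 before computing on tile i, so (on the GPU) transfer and compute run concurrently. The sketch below shows only the ordering of that software pipeline in plain Python, not any actual concurrency or the CUDA `memcpy_async` API; `pipelined_sum` and the tile data are hypothetical.

```python
def pipelined_sum(tiles):
    """Double-buffered reduction: prefetch the next tile before
    computing on the current one (the CUDA analogue would overlap
    the async copy with the compute)."""
    buffers = [None, None]
    buffers[0] = list(tiles[0])  # "async copy" of the first tile
    total = 0
    for i in range(len(tiles)):
        if i + 1 < len(tiles):
            buffers[(i + 1) % 2] = list(tiles[i + 1])  # prefetch next tile
        total += sum(buffers[i % 2])  # compute on the current tile
    return total

# Three tiles flow through two buffers; the result is the plain sum.
assert pipelined_sum([[1, 2], [3, 4], [5, 6]]) == 21
```

In CUDA the prefetch line becomes an asynchronous copy into shared memory, with a pipeline wait before the compute step.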

Source

]]>
0
Nefi Alarcon <![CDATA[GTC HPC Presentations]]> https://news.www.open-lab.net/?p=18025 2022-08-21T23:40:12Z 2020-09-10T03:14:05Z This fall's GTC will run continuously for five days, October 5-9, across seven time zones. The conference will showcase the latest breakthroughs...]]>

This fall's GTC will run continuously for five days, October 5-9, across seven time zones. The conference will showcase the latest breakthroughs in HPC, and many other GPU technology interest areas. Attend live events in the time zone that works best for you, or browse an extensive catalog of on-demand content showcasing innovative uses of HPC technology. Here's a preview of…

Source

]]>
0
Vinh Nguyen <![CDATA[Accelerating TensorFlow on NVIDIA A100 GPUs]]> http://www.open-lab.net/blog/?p=18957 2023-06-12T21:15:05Z 2020-07-24T22:22:06Z The NVIDIA A100, based on the NVIDIA Ampere GPU architecture, offers a suite of exciting new features: third-generation Tensor Cores, Multi-Instance GPU (MIG)...]]>

The NVIDIA A100, based on the NVIDIA Ampere GPU architecture, offers a suite of exciting new features: third-generation Tensor Cores, Multi-Instance GPU (MIG) and third-generation NVLink. Ampere Tensor Cores introduce a novel math mode dedicated for AI training: the TensorFloat-32 (TF32). TF32 is designed to accelerate the processing of FP32 data types, commonly used in DL workloads.

Source

]]>
0
Vinh Nguyen <![CDATA[Improving Computer Vision with NVIDIA A100 GPUs]]> http://www.open-lab.net/blog/?p=18363 2023-04-04T17:01:27Z 2020-06-16T17:23:00Z During the 2020 NVIDIA GPU Technology Conference keynote address, NVIDIA founder and CEO Jensen Huang introduced the new NVIDIA A100 GPU based on the NVIDIA...]]>

During the 2020 NVIDIA GPU Technology Conference keynote address, NVIDIA founder and CEO Jensen Huang introduced the new NVIDIA A100 GPU based on the NVIDIA Ampere GPU architecture. In this post, we detail the exciting new features of the A100 that make NVIDIA GPUs an ever-better powerhouse for computer vision workloads. We also showcase two recent CV research projects from NVIDIA Research…

Source

]]>
0
Janusz Lisiecki <![CDATA[Loading Data Fast with DALI and the New Hardware JPEG Decoder in NVIDIA A100 GPUs]]> http://www.open-lab.net/blog/?p=18130 2022-08-21T23:40:13Z 2020-06-15T23:10:00Z Today, smartphones, the most popular device for taking pictures, can capture images as large as 4K UHD (3840×2160), more than 25 MB of raw data. Even...]]>

Today, smartphones, the most popular device for taking pictures, can capture images as large as 4K UHD (3840×2160), more than 25 MB of raw data. Even considering the embarrassingly low HD resolution (1280×720), a raw image requires more than 2.5 MB of storage. Storing as few as 100 UHD images would require almost 3 GB of free space. Clearly, if you store data this way…
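The storage figures above follow from a little arithmetic. A quick sanity check, assuming 3 bytes per RGB pixel and decimal megabytes (`raw_size_mb` is a hypothetical helper for illustration):

```python
def raw_size_mb(width, height, bytes_per_pixel=3):
    """Uncompressed size of an RGB image in decimal megabytes."""
    return width * height * bytes_per_pixel / 1e6

uhd = raw_size_mb(3840, 2160)  # ~24.9 MB per 4K UHD frame
hd = raw_size_mb(1280, 720)    # ~2.8 MB per HD frame
# 100 UHD frames: roughly 2.5 GB of free space before compression,
# which is why JPEG compression (and fast decode) matters.
```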

Source

]]>
0
Mahesh Khadatare <![CDATA[Leveraging the Hardware JPEG Decoder and NVIDIA nvJPEG Library on NVIDIA A100 GPUs]]> http://www.open-lab.net/blog/?p=18226 2023-04-04T17:01:36Z 2020-06-15T22:51:02Z According to surveys, people collectively capture about 1.2 trillion images per year with phones and digital cameras. The storage of such images,...]]>

According to surveys, people collectively capture about 1.2 trillion images per year with phones and digital cameras. The storage of such images, especially in high-resolution raw format, uses lots of memory. JPEG refers to the Joint Photographic Experts Group, which celebrated its 25th birthday in 2017. The JPEG standard specifies the codec, which defines how an image is compressed…

Source

]]>
0
Nefi Alarcon <![CDATA[NVIDIA's New Ampere Data Center GPU in Full Production]]> https://news.www.open-lab.net/?p=17003 2023-03-22T01:06:27Z 2020-05-14T13:01:00Z NVIDIA today announced that the first GPU based on the NVIDIA Ampere architecture, the NVIDIA A100, is in full production and shipping to customers worldwide. The...]]> NVIDIA today announced that the first GPU based on the NVIDIA Ampere architecture, the NVIDIA A100, is in full production and shipping to customers worldwide. The...

NVIDIA today announced that the first GPU based on the NVIDIA Ampere architecture, the NVIDIA A100, is in full production and shipping to customers worldwide. The A100 draws on design breakthroughs in the NVIDIA Ampere architecture, offering the company's largest leap in performance to date within its eight generations of GPUs, to unify AI training and inference and boost performance by up to…

Source

]]>
0
Mohammad Shoeybi <![CDATA[State-of-the-Art Language Modeling Using Megatron on the NVIDIA A100 GPU]]> http://www.open-lab.net/blog/?p=17320 2023-04-04T17:01:46Z 2020-05-14T13:00:46Z Recent work has demonstrated that larger language models dramatically advance the state of the art in natural language processing (NLP) applications such as...]]> Recent work has demonstrated that larger language models dramatically advance the state of the art in natural language processing (NLP) applications such as...

Recent work has demonstrated that larger language models dramatically advance the state of the art in natural language processing (NLP) applications such as question-answering, dialog systems, summarization, and article completion. However, during training, large models do not fit in the available memory of a single accelerator, requiring model parallelism to split the parameters across multiple…
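The memory pressure that motivates model parallelism is easy to estimate. Below is a back-of-the-envelope sketch, not Megatron's actual implementation: the 8.3-billion-parameter size matches Megatron's published GPT-2 configuration, while fp16 weights (2 bytes per parameter) and an even 8-way shard are simplifying assumptions, and the function names are hypothetical. Training needs several times more memory again for gradients and optimizer state:

```python
def params_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory for model weights alone, in GB (fp16 => 2 bytes/param)."""
    return num_params * bytes_per_param / 1e9

def per_gpu_shard_gb(num_params: float, num_gpus: int,
                     bytes_per_param: int = 2) -> float:
    """Weights per GPU under an idealized even model-parallel split."""
    return params_memory_gb(num_params, bytes_per_param) / num_gpus

total = 8.3e9  # Megatron's 8.3B-parameter GPT-2 configuration

print(f"fp16 weights: {params_memory_gb(total):.1f} GB")          # ~16.6 GB
print(f"8-way shard:  {per_gpu_shard_gb(total, 8):.2f} GB/GPU")   # ~2.08 GB/GPU
```

Even the weights alone of such a model overflow a 16 GB accelerator, which is why the parameters must be split across multiple GPUs before training can start at all.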

Source

]]>
1
William Tsu <![CDATA[Introducing NVIDIA HGX A100: The Most Powerful Accelerated Server Platform for AI and High Performance Computing]]> http://www.open-lab.net/blog/?p=17647 2023-04-04T17:01:54Z 2020-05-14T13:00:04Z The NVIDIA mission is to accelerate the work of the da Vincis and Einsteins of our time. Scientists, researchers, and engineers are focused on solving some of...]]> The NVIDIA mission is to accelerate the work of the da Vincis and Einsteins of our time. Scientists, researchers, and engineers are focused on solving some of...

The NVIDIA mission is to accelerate the work of the da Vincis and Einsteins of our time. Scientists, researchers, and engineers are focused on solving some of the world's most important scientific, industrial, and big data challenges using artificial intelligence (AI) and high performance computing (HPC). The NVIDIA HGX A100 with A100 Tensor Core GPUs delivers the next giant leap in our…

Source

]]>
0
Ronny Krashinsky <![CDATA[NVIDIA Ampere Architecture In-Depth]]> http://www.open-lab.net/blog/?p=17431 2023-05-24T00:05:26Z 2020-05-14T13:00:00Z Today, during the 2020 NVIDIA GTC keynote address, NVIDIA founder and CEO Jensen Huang introduced the new NVIDIA A100 GPU based on the new NVIDIA Ampere GPU...]]> Today, during the 2020 NVIDIA GTC keynote address, NVIDIA founder and CEO Jensen Huang introduced the new NVIDIA A100 GPU based on the new NVIDIA Ampere GPU...

Today, during the 2020 NVIDIA GTC keynote address, NVIDIA founder and CEO Jensen Huang introduced the new NVIDIA A100 GPU based on the new NVIDIA Ampere GPU architecture. This post gives you a look inside the new A100 GPU, and describes important new features of NVIDIA Ampere architecture GPUs. The diversity of compute-intensive applications running in modern cloud data centers has driven…

Source

]]>
0
Pramod Ramarao <![CDATA[CUDA 11 Features Revealed]]> http://www.open-lab.net/blog/?p=17442 2023-03-22T01:06:34Z 2020-05-14T13:00:00Z The new NVIDIA A100 GPU based on the NVIDIA Ampere GPU architecture delivers the greatest generational leap in accelerated computing. The A100 GPU has...]]> The new NVIDIA A100 GPU based on the NVIDIA Ampere GPU architecture delivers the greatest generational leap in accelerated computing. The A100 GPU has...

The new NVIDIA A100 GPU based on the NVIDIA Ampere GPU architecture delivers the greatest generational leap in accelerated computing. The A100 GPU has revolutionary hardware capabilities, and we're excited to announce CUDA 11 in conjunction with the A100. CUDA 11 enables you to leverage the new hardware capabilities to accelerate HPC, genomics, 5G, rendering, deep learning, data analytics…

Source

]]>
4
Jackson Marusarz <![CDATA[Unleashing the Power of NVIDIA Ampere Architecture with NVIDIA Nsight Developer Tools]]> http://www.open-lab.net/blog/?p=17447 2024-08-28T17:56:46Z 2020-05-14T13:00:00Z The NVIDIA Ampere GPU architecture has arrived! It's time to make sure that your applications are getting the most out of the powerful compute resources in...]]> The NVIDIA Ampere GPU architecture has arrived! It's time to make sure that your applications are getting the most out of the powerful compute resources in...

The NVIDIA Ampere GPU architecture has arrived! It's time to make sure that your applications are getting the most out of the powerful compute resources in this new architecture. With the release of CUDA 11, we are adding several features to the Nsight family of Developer Tools to help you do just that. These additions improve usability and productivity, and make it easier for you to find bugs…

Source

]]>
0