Paulius Micikevicius – NVIDIA Technical Blog News and tutorials for developers, data scientists, and IT admins 2023-07-27T20:01:41Z http://www.open-lab.net/blog/feed/ Paulius Micikevicius <![CDATA[Accelerating AI Training with NVIDIA TF32 Tensor Cores]]> http://www.open-lab.net/blog/?p=23724 2022-08-21T23:41:01Z 2021-01-27T23:09:58Z NVIDIA Ampere GPU architecture introduced the third generation of Tensor Cores, with the new TensorFloat32 (TF32) mode for accelerating FP32 convolutions and...]]>

The NVIDIA Ampere GPU architecture introduced the third generation of Tensor Cores, with the new TensorFloat-32 (TF32) mode for accelerating FP32 convolutions and matrix multiplications. TF32 mode is the default option for AI training with 32-bit variables on the Ampere GPU architecture. It brings Tensor Core acceleration to single-precision DL workloads without requiring any changes to model scripts.
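As a minimal sketch (not taken from the post, and assuming PyTorch 1.7 or newer on an Ampere-class GPU), the snippet below shows the standard PyTorch switches that control TF32 execution; the matrix sizes are arbitrary placeholders.

```python
# Sketch of PyTorch's TF32 controls on Ampere GPUs; TF32 is on by default for
# cuDNN convolutions, and the matmul flag governs FP32 matrix multiplications.
import torch

torch.backends.cuda.matmul.allow_tf32 = True   # run FP32 matmuls on TF32 Tensor Cores
torch.backends.cudnn.allow_tf32 = True         # run FP32 convolutions on TF32 Tensor Cores

a = torch.randn(1024, 1024, device="cuda")     # placeholder FP32 tensors
b = torch.randn(1024, 1024, device="cuda")
c = a @ b                                      # executes in TF32 mode with the flags above
```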

Source

]]>
Paulius Micikevicius <![CDATA[Tips for Optimizing GPU Performance Using Tensor Cores]]> http://www.open-lab.net/blog/?p=14687 2023-07-27T20:01:41Z 2019-06-10T13:00:06Z Our most popular question is "What can I do to get great GPU performance for deep learning?" We've recently published a detailed Deep Learning Performance...]]>

Our most popular question is “What can I do to get great GPU performance for deep learning?” We’ve recently published a detailed Deep Learning Performance Guide to help answer this question. The guide explains how GPUs process data and gives tips on how to design networks for better performance. We also take a close look at Tensor Core optimization to help improve performance. This post takes a…
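One commonly cited guideline from that guide is to choose layer dimensions that are multiples of 8 when training in FP16, so GEMMs map cleanly onto Tensor Cores. The sketch below illustrates this with placeholder sizes; it is an assumption-based example, not code from the post.

```python
# Illustrative only: batch size and feature counts are placeholders picked as
# multiples of 8 to keep the FP16 GEMM Tensor Core-friendly.
import torch

batch_size, in_features, out_features = 64, 1024, 4096  # all multiples of 8

layer = torch.nn.Linear(in_features, out_features).half().cuda()
x = torch.randn(batch_size, in_features, device="cuda", dtype=torch.float16)
y = layer(x)   # FP16 GEMM with Tensor Core-friendly dimensions
```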

Source

]]>
Paulius Micikevicius <![CDATA[Mixed-Precision Training of Deep Neural Networks]]> http://www.open-lab.net/blog/parallelforall/?p=8452 2022-08-21T23:38:30Z 2017-10-11T16:00:57Z Deep Neural Networks (DNNs) have led to breakthroughs in a number of areas, including image processing and understanding, language modeling, language...]]>

Deep Neural Networks (DNNs) have led to breakthroughs in a number of areas, including image processing and understanding, language modeling, language translation, speech processing, game playing, and many others. DNN complexity has been increasing to achieve these results, which in turn has increased the computational resources required to train these networks. Mixed-precision training lowers the…
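As a rough illustration (an assumption, not code from the 2017 post), the sketch below shows one common way to run mixed-precision training today using PyTorch's automatic mixed precision; the tiny model, optimizer, and random data are placeholders.

```python
# Sketch of mixed-precision training with PyTorch AMP: the forward pass runs
# under autocast, and loss scaling keeps small FP16 gradients from underflowing.
import torch
import torch.nn.functional as F

model = torch.nn.Linear(1024, 10).cuda()                  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)  # placeholder optimizer
scaler = torch.cuda.amp.GradScaler()

for _ in range(10):
    x = torch.randn(64, 1024, device="cuda")              # placeholder batch
    target = torch.randint(0, 10, (64,), device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():                        # forward pass in mixed precision
        loss = F.cross_entropy(model(x), target)
    scaler.scale(loss).backward()                          # backward on the scaled loss
    scaler.step(optimizer)                                 # unscale gradients, then step
    scaler.update()                                        # adjust loss scale for next iteration
```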

Source

]]>