This post is part of a series about optimizing end-to-end AI. The performance of AI models is heavily influenced by the precision of the computational resources being used. Lower precision can lead to faster processing speeds and reduced memory usage, while higher precision can contribute to more accurate results. Finding the right balance between precision and performance is crucial for…
Object detection remains a primary driver for applications such as autonomous driving and intelligent video analytics. Object detection applications require substantial training on vast datasets to achieve high levels of accuracy. NVIDIA GPUs excel at the parallel compute performance required to train the large networks used for object detection inference.
Our most popular question is "What can I do to get great GPU performance for deep learning?" We've recently published a detailed Deep Learning Performance Guide to help answer this question. The guide explains how GPUs process data and gives tips on how to design networks for better performance. We also take a close look at Tensor Core optimization to help improve performance. This post takes a…
Machine learning harnesses computing power to solve a variety of "hard" problems that seemed impossible to program using traditional languages and techniques. Machine learning avoids the need for a programmer to explicitly program the steps in solving a complex pattern-matching problem such as understanding speech or recognizing objects within an image. NVIDIA aims to bring machine learning to…
Neural networks with thousands of layers and millions of neurons demand high performance and faster training times, and their complexity and size continue to grow. Mixed-precision training using Tensor Cores on the Volta and Turing architectures enables higher performance while maintaining network accuracy for heavily compute- and memory-intensive Deep Neural Networks (DNNs).
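As a rough illustration of what mixed-precision training involves, here is a minimal sketch of one manual training step with FP16 compute weights, FP32 master weights, and static loss scaling. All names (`model`, the data tensors, the scale value) are illustrative, not taken from the posts in this series:

```python
import torch

# FP16 weights for fast Tensor Core math; FP32 "master" copies for stable updates.
model = torch.nn.Linear(1024, 1024).cuda().half()
master = [p.detach().clone().float().requires_grad_() for p in model.parameters()]
optimizer = torch.optim.SGD(master, lr=1e-3)
loss_scale = 1024.0  # static scale so small FP16 gradients do not flush to zero

x = torch.randn(64, 1024, device="cuda", dtype=torch.float16)
y = torch.randn(64, 1024, device="cuda", dtype=torch.float16)

out = model(x)
loss = torch.nn.functional.mse_loss(out.float(), y.float())
(loss * loss_scale).backward()            # gradients land on the FP16 weights

for p, m in zip(model.parameters(), master):
    m.grad = p.grad.float() / loss_scale  # unscale into the FP32 master grads
    p.grad = None

optimizer.step()                          # weight update happens in FP32
for p, m in zip(model.parameters(), master):
    p.data.copy_(m.data)                  # cast updated masters back to FP16
```

The loss scaling step is the key trick: multiplying the loss shifts small gradient values up into FP16's representable range, and dividing them back out before the FP32 update leaves the mathematics unchanged.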
Double-precision floating point (FP64) has been the de facto standard for scientific simulation for several decades. Most numerical methods used in engineering and scientific applications require the extra precision to compute correct answers, or even to reach an answer at all. However, FP64 also requires more computing resources and longer runtimes to deliver that increased precision.
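A toy example (not from the post) of the kind of rounding behavior that pushes these methods toward FP64: near 1e8, adjacent FP32 values are 8 apart, so adding 1 is simply lost, while FP64 retains it:

```python
import numpy as np

a32 = np.float32(1e8)
print(a32 + np.float32(1.0) - a32)  # 0.0: the added 1 rounds away in FP32
a64 = np.float64(1e8)
print(a64 + 1.0 - a64)              # 1.0: FP64 has enough digits to keep it
```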
Neural network models have quickly taken advantage of NVIDIA Tensor Cores for deep learning since their introduction in the Tesla V100 GPU last year. For example, new performance records for ResNet50 training were announced recently with Tensor Core-based solutions. (See the NVIDIA developer post on new performance milestones for additional details). NVIDIA's cuDNN library enables CUDA…
A defining feature of the new NVIDIA Volta GPU architecture is Tensor Cores, which give the NVIDIA V100 accelerator a peak throughput that is 12x the 32-bit floating point throughput of the previous-generation NVIDIA P100. Tensor Cores enable you to use mixed precision for higher throughput without sacrificing accuracy, providing a huge boost to convolutions and matrix operations.
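For a concrete sense of how this mixed-precision path is exercised in practice, here is a minimal sketch of an FP16 GEMM in PyTorch. On Volta-class GPUs, cuBLAS can route a matmul like this to Tensor Cores, which multiply FP16 operands and accumulate in FP32; the multiple-of-8 dimensions follow the common sizing guideline for Tensor Core eligibility:

```python
import torch

# FP16 operands on the GPU; dimensions are multiples of 8.
a = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
b = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)

c = a @ b        # FP16 multiply, FP32 accumulation inside the Tensor Core
print(c.dtype)   # torch.float16
```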
Deep Neural Networks (DNNs) have led to breakthroughs in a number of areas, including image processing and understanding, language modeling, language translation, speech processing, game playing, and many others. DNN complexity has been increasing to achieve these results, which in turn has increased the computational resources required to train these networks. Mixed-precision training lowers the…
Update, March 25, 2019: The latest Volta and Turing GPUs now incorporate Tensor Cores, which accelerate certain types of FP16 matrix math. This enables faster and easier mixed-precision computation within popular AI frameworks. Using Tensor Cores requires CUDA 9 or later. NVIDIA has also added automatic mixed precision capabilities to TensorFlow, PyTorch, and MXNet.
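To show what "automatic" mixed precision looks like in code, here is a minimal sketch using PyTorch's built-in `torch.cuda.amp` API. Note this API arrived in PyTorch releases after this update was written (at the time, NVIDIA's Apex library provided AMP for PyTorch), and the model, optimizer, and stand-in loader below are illustrative:

```python
import torch

model = torch.nn.Linear(512, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()          # dynamic loss scaling

# Illustrative stand-in for a real DataLoader.
loader = [(torch.randn(32, 512), torch.randint(0, 10, (32,))) for _ in range(4)]

for x, y in loader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():           # eligible ops run in FP16
        loss = torch.nn.functional.cross_entropy(model(x.cuda()), y.cuda())
    scaler.scale(loss).backward()             # scale to avoid FP16 gradient underflow
    scaler.step(optimizer)                    # unscales grads; skips step on inf/NaN
    scaler.update()                           # adjusts the scale factor over time
```

Compared with the manual master-weights approach sketched earlier, the framework handles the per-op precision choices, the loss scaling, and the overflow checks automatically.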