A defining feature of the new NVIDIA Volta GPU architecture is Tensor Cores, which give the NVIDIA V100 accelerator a peak throughput that is 12x the 32-bit floating point throughput of the previous-generation NVIDIA P100. Tensor Cores enable you to use mixed-precision for higher throughput without sacrificing accuracy. Tensor Cores provide a huge boost to convolutions and matrix operations.
]]>This week at GTC 2016, we announced the latest update to NVIDIA Deep Learning SDK, which now includes cuDNN 5. Version 5 offers new features, improved performance and support for the latest generation NVIDIA Tesla P100 GPU. New features in cuDNN 5 include: One of the new features we’ve added in cuDNN 5 is support for Recurrent Neural Networks (RNN). RNNs are a powerful tool used for…
]]>Often cited as the main reason that naïve C/C++ code cannot match FORTRAN performance, pointer aliasing is an important topic to understand when considering optimizations for your C/C++ code. In this tip I will describe what pointer aliasing is and a simple way to alter your code so that it does not harm your application performance. Two pointers alias if the memory to which they point…
]]>