John Tran – NVIDIA Technical Blog News and tutorials for developers, data scientists, and IT admins 2023-07-27T20:01:41Z http://www.open-lab.net/blog/feed/ John Tran <![CDATA[Tips for Optimizing GPU Performance Using Tensor Cores]]> http://www.open-lab.net/blog/?p=14687 2023-07-27T20:01:41Z 2019-06-10T13:00:06Z Our most popular question is "What can I do to get great GPU performance for deep learning?" We've recently published a detailed Deep Learning Performance...]]>

Our most popular question is “What can I do to get great GPU performance for deep learning?” We’ve recently published a detailed Deep Learning Performance Guide to help answer this question. The guide explains how GPUs process data and gives tips on how to design networks for better performance. We also take a close look at Tensor Core optimization to help improve performance. This post takes a…

]]>
15
John Tran <![CDATA[CUTLASS: Fast Linear Algebra in CUDA C++]]> http://www.open-lab.net/blog/parallelforall/?p=8708 2023-02-13T17:46:48Z 2017-12-06T04:03:29Z Update May 21, 2018: CUTLASS 1.0 is now available as Open Source software at the CUTLASS repository. CUTLASS 1.0 has changed substantially from our preview...]]>

Update May 21, 2018: CUTLASS 1.0 is now available as Open Source software at the CUTLASS repository. CUTLASS 1.0 has changed substantially from our preview release described in the blog post below. We have decomposed the structure of the GEMM computation into deeper, structured primitives for loading data, computing predicate masks, streaming data at each level of the GEMM hierarchy…

]]>
13