Babak Hejazi – NVIDIA Technical Blog

News and tutorials for developers, data scientists, and IT admins
Feed: http://www.open-lab.net/blog/feed/ (updated 2024-07-16T17:19:07Z)

Introducing Grouped GEMM APIs in cuBLAS and More Performance Updates
Babak Hejazi
http://www.open-lab.net/blog/?p=83888
Updated 2024-07-16T17:19:07Z, published 2024-06-12T20:30:00Z

The latest release of the NVIDIA cuBLAS library, version 12.5, continues to deliver functionality and performance to deep learning (DL) and high-performance computing (HPC) workloads. This post provides an overview of the following updates on cuBLAS matrix multiplications (matmuls) since version 12.0, and a walkthrough: Grouped GEMM APIs can be viewed as a generalization of the batched…

New cuBLAS 12.0 Features and Matrix Multiplication Performance on NVIDIA Hopper GPUs
Babak Hejazi
http://www.open-lab.net/blog/?p=60111
Updated 2023-02-23T18:21:10Z, published 2023-02-01T18:30:00Z

The NVIDIA H100 Tensor Core GPU, based on the NVIDIA Hopper architecture with the fourth generation of NVIDIA Tensor Cores, recently debuted, delivering unprecedented performance and sweeping AI benchmarks such as MLPerf training. A significant fraction of operations in AI and machine learning benchmarks are general matrix multiplications (GEMMs), which are also referred to as matmul…
