Ben Williams – NVIDIA Technical Blog News and tutorials for developers, data scientists, and IT admins 2025-03-07T22:58:27Z http://www.open-lab.net/blog/feed/
Ben Williams <![CDATA[Networking Reliability and Observability at Scale with NCCL 2.24]]> http://www.open-lab.net/blog/?p=96731 2025-03-07T22:58:27Z 2025-03-13T16:30:00Z The NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multinode (MGMN) communication primitives optimized for NVIDIA GPUs and networking....]]>

The NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multinode (MGMN) communication primitives optimized for NVIDIA GPUs and networking. NCCL is a central piece of software for multi-GPU deep learning training. It handles any kind of inter-GPU communication, be it over PCI, NVLink, or networking. It uses advanced topology detection, optimized communication graphs…

]]>
Ben Williams <![CDATA[New Scaling Algorithm and Initialization with NVIDIA Collective Communications Library 2.23]]> http://www.open-lab.net/blog/?p=95412 2025-02-06T19:33:51Z 2025-01-31T22:47:37Z The NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multinode communication primitives optimized for NVIDIA GPUs and networking. NCCL...]]>

The NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multinode communication primitives optimized for NVIDIA GPUs and networking. NCCL is a central piece of software for multi-GPU deep learning training. It handles any kind of inter-GPU communication, be it over PCI, NVLink, or networking. It uses advanced topology detection, optimized communication graphs…

]]>
Ben Williams <![CDATA[Memory Efficiency, Faster Initialization, and Cost Estimation with NVIDIA Collective Communications Library 2.22]]> http://www.open-lab.net/blog/?p=87077 2024-09-19T19:30:36Z 2024-09-17T00:31:08Z For the past few months, the NVIDIA Collective Communications Library (NCCL) developers have been working hard on a set of new library features and bug fixes....]]>

For the past few months, the NVIDIA Collective Communications Library (NCCL) developers have been working hard on a set of new library features and bug fixes. In this post, we discuss the details of the NCCL 2.22 release and the pain points addressed. NVIDIA Magnum IO NCCL is a library designed to optimize inter-GPU and multi-node communication, crucial for efficient parallel computing…

]]>