Collective communications are a performance-critical ingredient of modern distributed AI training workloads such as recommender systems and natural language processing. NVIDIA Collective Communication Library (NCCL), a Magnum IO Library, implements GPU-accelerated collective operations: NCCL is topology-aware and is optimized to achieve high bandwidth and low latency over PCIe��
]]>