For the past few months, the NVIDIA Collective Communications Library (NCCL) developers have been working hard on a set of new library features and bug fixes. In this post, we discuss the details of the NCCL 2.22 release and the pain points addressed. NVIDIA Magnum IO NCCL is a library designed to optimize inter-GPU and multi-node communication, crucial for efficient parallel computing…
NVSHMEM is a parallel programming interface that provides efficient and scalable communication for NVIDIA GPU clusters. Part of NVIDIA Magnum IO and based on OpenSHMEM, NVSHMEM creates a global address space for data that spans the memory of multiple GPUs and can be accessed with fine-grained GPU-initiated operations, CPU-initiated operations, and operations on CUDA streams.
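To make the symmetric-memory model concrete, here is a minimal sketch of a GPU-initiated put: each processing element (PE) allocates a symmetric integer and writes its own PE ID into its right neighbor's copy from inside a kernel. The ring pattern, buffer name, and launch configuration are illustrative assumptions, not from the post; the calls themselves (nvshmem_init, nvshmem_malloc, nvshmem_int_p, nvshmem_barrier_all) are standard NVSHMEM API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <nvshmem.h>

// Each PE writes its own ID into the symmetric buffer of its right neighbor.
// This is a one-sided, GPU-initiated put: no matching receive is needed.
__global__ void ring_put(int *dst) {
    int mype = nvshmem_my_pe();
    int npes = nvshmem_n_pes();
    int peer = (mype + 1) % npes;
    nvshmem_int_p(dst, mype, peer);  // put one int into peer's copy of dst
}

int main() {
    nvshmem_init();
    int mype = nvshmem_my_pe();

    // Symmetric allocation: every PE gets a buffer at the same symmetric address.
    int *dst = (int *)nvshmem_malloc(sizeof(int));

    ring_put<<<1, 1>>>(dst);
    cudaDeviceSynchronize();
    nvshmem_barrier_all();  // all puts are complete and visible after this point

    int received;
    cudaMemcpy(&received, dst, sizeof(int), cudaMemcpyDeviceToHost);
    printf("PE %d received %d from its left neighbor\n", mype, received);

    nvshmem_free(dst);
    nvshmem_finalize();
    return 0;
}
```

Launched across several GPUs (for example with NVSHMEM's bundled nvshmrun launcher), every PE prints the ID of its left neighbor; the barrier guarantees the one-sided puts have completed before the host reads the buffer.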
Computational energy efficiency has become a primary decision criterion for most supercomputing centers. Data centers, once built, are capped in terms of the amount of power they can use without expensive and time-consuming retrofits. Maximizing insight in the form of workload throughput then means maximizing workload per watt. NVIDIA products have, for several generations…
NVIDIA AI Enterprise is an end-to-end, secure, cloud-native suite of AI software. The recent release of NVIDIA AI Enterprise 3.0 introduces new features to help optimize the performance and efficiency of production AI. This post provides details about the new features listed below and how they work. New AI workflows in the 3.0 release of NVIDIA AI Enterprise help reduce the…
Today's leading-edge high performance computing (HPC) systems contain tens of thousands of GPUs. In NVIDIA systems, GPUs are connected on nodes through the NVLink scale-up interconnect, and across nodes through a scale-out network like InfiniBand. The software libraries that GPUs use to communicate, share work, and efficiently operate in parallel are collectively called NVIDIA Magnum IO…
You could make an argument that the history of civilization and technological advancement is the history of the search and discovery of materials. Ages are named not for leaders or civilizations but for the materials that defined them: Stone Age, Bronze Age, and so on. The current digital or information age could be renamed the Silicon or Semiconductor Age and retain the same meaning.
Collective communications are a performance-critical ingredient of modern distributed AI training workloads such as recommender systems and natural language processing. The NVIDIA Collective Communication Library (NCCL), a Magnum IO library, implements GPU-accelerated collective operations. NCCL is topology-aware and is optimized to achieve high bandwidth and low latency over PCIe…
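As a concrete illustration of what a collective operation looks like from application code, the sketch below sums a float buffer across every GPU visible to a single process with ncclAllReduce. The buffer size and the one-communicator-per-GPU layout are illustrative assumptions; ncclCommInitAll, ncclGroupStart/ncclGroupEnd, and ncclAllReduce are the standard NCCL API.

```c
#include <stdio.h>
#include <cuda_runtime.h>
#include <nccl.h>

// Sum a float buffer across every visible GPU in one process.
int main() {
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    if (ndev > 8) ndev = 8;  // keep the fixed-size arrays below simple

    ncclComm_t comms[8];
    cudaStream_t streams[8];
    float *sendbuf[8], *recvbuf[8];
    int devs[8];
    const size_t count = 1 << 20;  // illustrative element count

    for (int i = 0; i < ndev; ++i) {
        devs[i] = i;
        cudaSetDevice(i);
        cudaMalloc((void **)&sendbuf[i], count * sizeof(float));
        cudaMalloc((void **)&recvbuf[i], count * sizeof(float));
        cudaMemset(sendbuf[i], 1, count * sizeof(float));  // placeholder byte pattern
        cudaStreamCreate(&streams[i]);
    }

    // One communicator per GPU, created collectively for this process.
    ncclCommInitAll(comms, ndev, devs);

    // Group the per-GPU calls so NCCL launches them as a single collective.
    ncclGroupStart();
    for (int i = 0; i < ndev; ++i)
        ncclAllReduce(sendbuf[i], recvbuf[i], count, ncclFloat, ncclSum,
                      comms[i], streams[i]);
    ncclGroupEnd();

    for (int i = 0; i < ndev; ++i) {
        cudaSetDevice(i);
        cudaStreamSynchronize(streams[i]);
        ncclCommDestroy(comms[i]);
    }
    printf("all-reduce across %d GPUs complete\n", ndev);
    return 0;
}
```

The group calls matter here: without them, each per-GPU ncclAllReduce call could block while the others have not yet been issued; grouping lets NCCL launch the whole collective at once.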
With computation shifting from the CPU to faster GPUs for AI, ML, and HPC applications, IO into and out of the GPU can become the primary bottleneck to overall application performance. NVIDIA created Magnum IO GPUDirect Storage (GDS) to streamline data movement between storage and GPU memory and remove performance bottlenecks in the platform, like being forced to store and forward data…
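A minimal sketch of the GDS path through the cuFile API is shown below: the file is opened with O_DIRECT, registered with cuFile, and read straight into a cudaMalloc'd buffer with no intermediate host bounce buffer. The file path and transfer size are placeholders, and error checking is omitted for brevity; cuFileDriverOpen, cuFileHandleRegister, cuFileBufRegister, and cuFileRead are documented libcufile entry points.

```c
#define _GNU_SOURCE  /* for O_DIRECT on Linux */
#include <fcntl.h>
#include <unistd.h>
#include <cuda_runtime.h>
#include <cufile.h>

// Read a file directly into GPU memory via GPUDirect Storage (error checks omitted).
int main() {
    const size_t size = 1 << 20;  // illustrative transfer size

    cuFileDriverOpen();

    // O_DIRECT bypasses the page cache; the path below is a placeholder.
    int fd = open("/path/to/data.bin", O_RDONLY | O_DIRECT);

    CUfileDescr_t descr = {0};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t handle;
    cuFileHandleRegister(&handle, &descr);

    void *devPtr = NULL;
    cudaMalloc(&devPtr, size);
    cuFileBufRegister(devPtr, size, 0);

    // DMA straight from storage into the GPU buffer: no host staging copy.
    ssize_t n = cuFileRead(handle, devPtr, size, 0 /*file offset*/, 0 /*dev offset*/);

    cuFileBufDeregister(devPtr);
    cuFileHandleDeregister(handle);
    cudaFree(devPtr);
    close(fd);
    cuFileDriverClose();
    return n >= 0 ? 0 : 1;
}
```

On systems without a GDS-capable filesystem, cuFile falls back to a compatibility mode that routes the transfer through the host, so the same code path still functions, just without the direct DMA benefit.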
Supercomputers are significant investments. However, they are extremely valuable tools for researchers and scientists. To effectively and securely share the computational might of these data centers, NVIDIA introduced the Cloud-Native Supercomputing architecture. It combines bare metal performance, multitenancy, and performance isolation for supercomputing. Magnum IO, the I/O…
This is the fourth post in the Accelerating IO series. It addresses storage issues and shares recent results and directions with our partners. We cover the new GPUDirect Storage release, benefits, and implementation. Accelerated computing needs accelerated IO. Otherwise, computing resources get starved for data. Given that the fraction of all workflows for which data fits in memory is…
Magnum IO is the collection of IO technologies from NVIDIA and Mellanox that make up the IO subsystem of the modern data center and enable applications at scale. If you are trying to scale your application up to multiple GPUs, or out across multiple nodes, you are probably using some of the libraries in Magnum IO. NVIDIA is now publishing the Magnum IO Developer Environment 21.04…
This is the second post in the Accelerating IO series, which describes the architecture, components, and benefits of Magnum IO, the IO subsystem of the modern data center. The first post in this series introduced the Magnum IO architecture and positioned it in the broader context of CUDA, CUDA-X, and vertical application domains. Of the four major components of the architecture…
This is the first post in the Accelerating IO series, which describes the architecture, components, storage, and benefits of Magnum IO, the IO subsystem of the modern data center. Previously the boundary of the unit of computing, sheet metal no longer constrains the resources that can be applied to a single problem or the data set that can be housed. The new unit is the data center.