With computation shifting from the CPU to faster GPUs for AI, ML, and HPC applications, IO into and out of the GPU can become the primary bottleneck to overall application performance. NVIDIA created Magnum IO GPUDirect Storage (GDS) to streamline data movement between storage and GPU memory and remove performance bottlenecks in the platform, like being forced to store and forward data…
This is the fourth post in the Accelerating IO series. It addresses storage issues and shares recent results and directions with our partners. We cover the new GPUDirect Storage release, benefits, and implementation. Accelerated computing needs accelerated IO. Otherwise, computing resources get starved for data. Given that the fraction of all workflows for which data fits in memory is…
This is the third post in the Accelerating IO series, which has the goal of describing the architecture, components, and benefits of Magnum IO, the IO subsystem of the modern data center. The first post in this series introduced the Magnum IO architecture; positioned it in the broader context of CUDA, CUDA-X, and vertical application domains; and listed the four major components of the…
This is the second post in the Accelerating IO series, which describes the architecture, components, and benefits of Magnum IO, the IO subsystem of the modern data center. The first post in this series introduced the Magnum IO architecture and positioned it in the broader context of CUDA, CUDA-X, and vertical application domains. Of the four major components of the architecture…
This is the first post in the Accelerating IO series, which describes the architecture, components, storage, and benefits of Magnum IO, the IO subsystem of the modern data center. Previously the boundary of the unit of computing, sheet metal no longer constrains the resources that can be applied to a single problem or the data set that can be housed. The new unit is the data center.
When you double the number of processors used to solve a given problem, you expect the solution time to be cut in half. However, most programmers know from experience that applications tend to reach a point of diminishing returns when increasing the number of processors being used to solve a fixed-size problem. How efficiently an application can use more processors is called parallel…
As AI and HPC datasets continue to increase in size, the time spent loading data for a given application begins to strain overall application performance. When considering end-to-end application performance, fast GPUs are increasingly starved by slow I/O. I/O, the process of loading data from storage to GPUs for processing, has historically been controlled by the CPU.