CJ Newburn – NVIDIA Technical Blog

CJ Newburn – NVIDIA Technical Blog News and tutorials for developers, data scientists, and IT admins 2023-07-11T23:17:10Z http://www.open-lab.net/blog/feed/ CJ Newburn <![CDATA[Accelerating IO in the Modern Data Center: Magnum IO Storage Partnerships]]> http://www.open-lab.net/blog/?p=39968 2023-03-22T01:16:55Z 2021-11-09T09:30:00Z

With computation shifting from the CPU to faster GPUs for AI, ML and HPC applications, IO into and out of the GPU can become the primary bottleneck to the...]]>

With computation shifting from the CPU to faster GPUs for AI, ML and HPC applications, IO into and out of the GPU can become the primary bottleneck to the overall application performance. NVIDIA created Magnum IO GPUDirect Storage (GDS) to streamline data movement between storage and GPU memory and remove performance bottlenecks in the platform, like being forced to store and forward data…

]]> 1 CJ Newburn <![CDATA[Accelerating IO in the Modern Data Center: Magnum IO Storage]]> http://www.open-lab.net/blog/?p=35783 2022-08-21T23:52:25Z 2021-08-23T18:02:00Z

This is the fourth post in the Accelerating IO series. It addresses storage issues and shares recent results and directions with our partners. We cover the new...]]>

This is the fourth post in the Accelerating IO series. It addresses storage issues and shares recent results and directions with our partners. We cover the new GPUDirect Storage release, benefits, and implementation. Accelerated computing needs accelerated IO. Otherwise, computing resources get starved for data. Given that the fraction of all workflows for which data fits in memory is…

]]> 1 CJ Newburn <![CDATA[Accelerating IO in the Modern Data Center: Computing and IO Management]]> http://www.open-lab.net/blog/?p=23756 2023-07-11T23:17:10Z 2021-02-06T01:29:32Z

This is the third post in the Accelerating IO series, which has the goal of describing the architecture, components, and benefits of Magnum IO, the IO subsystem...]]>

This is the third post in the Accelerating IO series, which has the goal of describing the architecture, components, and benefits of Magnum IO, the IO subsystem of the modern data center. The first post in this series introduced the Magnum IO architecture; positioned it in the broader context of CUDA, CUDA-X, and vertical application domains; and listed the four major components of the…

]]> 0 CJ Newburn <![CDATA[Accelerating IO in the Modern Data Center: Network IO]]> http://www.open-lab.net/blog/?p=21733 2022-08-21T23:40:44Z 2020-10-20T19:13:11Z

This is the second post in the Accelerating IO series, which describes the architecture, components, and benefits of Magnum IO, the IO subsystem of the modern...]]>

This is the second post in the Accelerating IO series, which describes the architecture, components, and benefits of Magnum IO, the IO subsystem of the modern data center. The first post in this series introduced the Magnum IO architecture and positioned it in the broader context of CUDA, CUDA-X, and vertical application domains. Of the four major components of the architecture…

]]> 1 CJ Newburn <![CDATA[Accelerating IO in the Modern Data Center: Magnum IO Architecture]]> http://www.open-lab.net/blog/?p=21121 2023-03-22T01:09:09Z 2020-10-05T13:00:00Z

This is the first post in the Accelerating IO series, which describes the architecture, components, storage, and benefits of Magnum IO, the IO subsystem of the...]]>

This is the first post in the Accelerating IO series, which describes the architecture, components, storage, and benefits of Magnum IO, the IO subsystem of the modern data center. Previously the boundary of the unit of computing, sheet metal no longer constrains the resources that can be applied to a single problem or the data set that can be housed. The new unit is the data center.

]]> 3 CJ Newburn <![CDATA[Scaling Scientific Computing with NVSHMEM]]> http://www.open-lab.net/blog/?p=18979 2023-02-13T17:45:04Z 2020-08-25T17:23:15Z

Figure 1. In the NVSHMEM memory model, each process (PE) has private memory, as well as symmetric memory that forms a partition of the partitioned global...]]>

When you double the number of processors used to solve a given problem, you expect the solution time to be cut in half. However, most programmers know from experience that applications tend to reach a point of diminishing returns when increasing the number of processors being used to solve a fixed-size problem. How efficiently an application can use more processors is called parallel…

]]> 1 CJ Newburn <![CDATA[GPUDirect Storage: A Direct Path Between Storage and GPU Memory]]> http://www.open-lab.net/blog/?p=15376 2022-08-21T23:39:34Z 2019-08-06T13:00:31Z

As AI and HPC datasets continue to increase in size, the time spent loading data for a given application begins to place a strain on the total application��s...]]>

As AI and HPC datasets continue to increase in size, the time spent loading data for a given application begins to place a strain on the total application’s performance. When considering end-to-end application performance, fast GPUs are increasingly starved by slow I/O. I/O, the process of loading data from storage to GPUs for processing, has historically been controlled by the CPU.

]]> 7 ��˳��97caoporen��