Akhil Langer – NVIDIA Technical Blog
News and tutorials for developers, data scientists, and IT admins

Enhancing Application Portability and Compatibility across New Platforms Using NVIDIA Magnum IO NVSHMEM 3.0 (2024-09-06)
http://www.open-lab.net/blog/?p=88550

NVSHMEM is a parallel programming interface that provides efficient and scalable communication for NVIDIA GPU clusters. Part of NVIDIA Magnum IO and based on OpenSHMEM, NVSHMEM creates a global address space for data that spans the memory of multiple GPUs and can be accessed with fine-grained GPU-initiated operations, CPU-initiated operations, and operations on CUDA streams.
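As a minimal sketch of that model (the ring pattern, buffer name, and launch configuration are illustrative, not from the post): each PE allocates a symmetric buffer, and a GPU kernel performs a one-sided, GPU-initiated put of its rank into the next PE's partition of the global address space.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>
#include <nvshmem.h>
#include <nvshmemx.h>

/* GPU-initiated communication: each PE writes its rank into the
 * symmetric buffer of the next PE in a ring. */
__global__ void ring_put(int *sym) {
    int mype = nvshmem_my_pe();
    int npes = nvshmem_n_pes();
    nvshmem_int_p(sym, mype, (mype + 1) % npes);  /* one-sided put */
}

int main(void) {
    nvshmem_init();

    /* Symmetric allocation: the same size on every PE, remotely accessible */
    int *sym = (int *)nvshmem_malloc(sizeof(int));

    ring_put<<<1, 1>>>(sym);
    nvshmemx_barrier_all_on_stream(0);  /* order the puts before reading */
    cudaStreamSynchronize(0);

    int got;
    cudaMemcpy(&got, sym, sizeof(int), cudaMemcpyDeviceToHost);
    printf("PE %d received rank %d from its neighbor\n", nvshmem_my_pe(), got);

    nvshmem_free(sym);
    nvshmem_finalize();
    return 0;
}
```

Build with `nvcc` against the NVSHMEM headers and library, and launch one process per GPU (e.g. with NVSHMEM's launcher or an MPI launcher); it requires a working multi-GPU NVSHMEM installation to run.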

Source

Accelerating NVSHMEM 2.0 Team-Based Collectives Using NCCL (2021-01-22)
http://www.open-lab.net/blog/?p=22803

NVSHMEM 2.0 introduces a new API for performing collective operations based on the Team Management feature of the OpenSHMEM 1.5 specification. A team is a subset of the processing elements (PEs) in an OpenSHMEM job; the concept is analogous to communicators in MPI. The new Teams API replaces the active-set-based API for collective operations in the OpenSHMEM specification.
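A sketch of the Teams usage pattern (the stride-2 team and one-element reduction are invented for illustration; names follow the OpenSHMEM 1.5-style teams interface): split `NVSHMEM_TEAM_WORLD` into a sub-team, then pass the team as the first argument of a collective.

```cuda
#include <cuda_runtime.h>
#include <nvshmem.h>

int main(void) {
    nvshmem_init();
    int npes = nvshmem_n_pes();

    /* Split NVSHMEM_TEAM_WORLD into a team of the even-ranked PEs:
     * start = 0, stride = 2, size = ceil(npes / 2). */
    nvshmem_team_t even_team;
    nvshmem_team_split_strided(NVSHMEM_TEAM_WORLD, 0, 2, (npes + 1) / 2,
                               NULL, 0, &even_team);

    int *src = (int *)nvshmem_malloc(sizeof(int));
    int *dst = (int *)nvshmem_malloc(sizeof(int));
    cudaMemset(src, 0, sizeof(int));

    /* PEs outside the split receive an invalid team handle and skip it */
    if (even_team != NVSHMEM_TEAM_INVALID) {
        /* Team-based collective: sum-reduce one int across the team only */
        nvshmem_int_sum_reduce(even_team, dst, src, 1);
        nvshmem_team_destroy(even_team);
    }

    nvshmem_free(src);
    nvshmem_free(dst);
    nvshmem_finalize();
    return 0;
}
```

The team-based form removes the active-set triple (start, stride, size) and the user-managed `pSync`/`pWrk` arrays from every collective call site; the team object carries that state instead.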

Source

Scaling Scientific Computing with NVSHMEM (2020-08-25)
http://www.open-lab.net/blog/?p=18979

Figure 1. In the NVSHMEM memory model, each process (PE) has private memory, as well as symmetric memory that forms a partition of the partitioned global address space.

When you double the number of processors used to solve a given problem, you expect the solution time to be cut in half. However, most programmers know from experience that applications tend to reach a point of diminishing returns as more processors are used to solve a fixed-size problem. How efficiently an application can use more processors is called parallel efficiency.

Source
