NVSHMEM 2.0 is introducing a new API for performing collective operations based on the Team Management feature of the OpenSHMEM 1.5 specification. A team is a subset of processing elements (PEs) in an OpenSHMEM job. The concept is analogous to communicators in MPI. The new Teams API is a replacement for the active-set-based API for collective operations in the OpenSHMEM specification that was…
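For context, here is a minimal host-side sketch of what a team-based collective might look like. It is not the post's actual example; the names nvshmem_team_split_strided, NVSHMEMX_TEAM_NODE, NVSHMEM_TEAM_INVALID, and nvshmem_float_sum_reduce are assumed from the NVSHMEM 2.0 / OpenSHMEM 1.5 team API, so check the NVSHMEM documentation for exact signatures:

```c
/* Minimal sketch (assumed API names): split NVSHMEM_TEAM_WORLD into a team
 * of even-numbered PEs and run a team-based sum reduction over it. */
#include <nvshmem.h>
#include <nvshmemx.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    nvshmem_init();
    int mype = nvshmem_my_pe();
    int npes = nvshmem_n_pes();

    /* One GPU per PE on each node, a common pattern in NVSHMEM examples. */
    cudaSetDevice(nvshmem_team_my_pe(NVSHMEMX_TEAM_NODE));

    /* Symmetric buffers live in GPU memory. */
    float *src  = (float *)nvshmem_malloc(sizeof(float));
    float *dest = (float *)nvshmem_malloc(sizeof(float));
    float val = (float)mype;
    cudaMemcpy(src, &val, sizeof(float), cudaMemcpyHostToDevice);

    /* start = 0, stride = 2: the new team contains PEs 0, 2, 4, ... */
    nvshmem_team_t even_team;
    nvshmem_team_split_strided(NVSHMEM_TEAM_WORLD, 0, 2, (npes + 1) / 2,
                               NULL, 0, &even_team);

    if (even_team != NVSHMEM_TEAM_INVALID) {
        /* Only members of the new team take part in this collective. */
        nvshmem_float_sum_reduce(even_team, dest, src, 1);
        float sum;
        cudaMemcpy(&sum, dest, sizeof(float), cudaMemcpyDeviceToHost);
        printf("PE %d (team PE %d): sum over even PEs = %g\n",
               mype, nvshmem_team_my_pe(even_team), sum);
    }

    nvshmem_free(src);
    nvshmem_free(dest);
    nvshmem_finalize();
    return 0;
}
```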
NVIDIA® GPU Boost is a feature available on NVIDIA® GeForce® and Tesla® GPUs that boosts application performance by increasing GPU core and memory clock rates when sufficient power and thermal headroom are available (see the earlier Parallel Forall post about GPU Boost by Mark Harris). In the case of Tesla GPUs, GPU Boost is customized for compute-intensive workloads running on clusters.
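On Tesla GPUs this is typically controlled through user-selectable application clocks. As a rough sketch only, the snippet below raises the application clocks via NVML; the clock values (3004 MHz memory, 875 MHz graphics) are placeholders for one particular GPU, and the call usually requires administrative privileges:

```c
/* Minimal sketch (placeholder clock values): request higher application
 * clocks with NVML so compute-bound kernels run at a boosted, fixed rate. */
#include <nvml.h>
#include <stdio.h>

int main(void) {
    nvmlDevice_t dev;
    if (nvmlInit() != NVML_SUCCESS) return 1;
    if (nvmlDeviceGetHandleByIndex(0, &dev) != NVML_SUCCESS) return 1;

    /* Query supported clocks for your GPU first; these values are examples. */
    nvmlReturn_t r = nvmlDeviceSetApplicationsClocks(dev, 3004, 875);
    if (r != NVML_SUCCESS)
        fprintf(stderr, "Failed to set application clocks: %s\n",
                nvmlErrorString(r));

    nvmlShutdown();
    return 0;
}
```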
When I profile MPI+CUDA applications, I sometimes see performance issues that occur only for certain MPI ranks. To fix these, it's necessary to identify the MPI rank where the performance issue occurs. Before CUDA 6.5 this was hard to do because the CUDA profiler only showed the PIDs of the processes and left the developer to figure out the mapping from PIDs to MPI ranks. Although the mapping can be done…
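One stopgap is to print the mapping yourself. A minimal sketch: each rank reports its hostname and PID at startup, so profiler output (which identifies processes by PID) can be matched back to ranks by hand.

```c
/* Minimal sketch: print each rank's hostname and PID so profiler output
 * can be mapped back to MPI ranks manually. */
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);

    int rank, namelen;
    char host[MPI_MAX_PROCESSOR_NAME];
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(host, &namelen);

    printf("MPI rank %d: host %s, PID %d\n", rank, host, (int)getpid());

    /* ... rest of the application ... */

    MPI_Finalize();
    return 0;
}
```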
Computational Fluid Dynamics (CFD) is a valuable tool for studying the behavior of fluids. Today, many areas of engineering use CFD. For example, the automotive industry uses CFD to study airflow around cars and to optimize car body shapes to reduce drag and improve fuel efficiency. To get accurate results in fluid simulation, it is necessary to capture complex phenomena such as turbulence…
The last time you used the timeline feature in the NVIDIA Visual Profiler, Nsight VSE, or the new Nsight Systems to analyze a complex application, you might have wished to see a bit more than just CUDA API calls and GPU kernels. In this post I will show you how you can use the NVIDIA Tools Extension (NVTX) to annotate the timeline with useful information. I will demonstrate how to add time…
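As a small taste of what that annotation looks like, here is a minimal sketch using the NVTX C API (nvtxRangePushA / nvtxRangePop); solver_iteration is a hypothetical placeholder for real work, and the program links against libnvToolsExt:

```c
/* Minimal sketch: wrap one phase of the application in a named NVTX range
 * so it shows up as a labeled interval on the profiler timeline. */
#include <stdio.h>
#include <nvToolsExt.h>

/* Hypothetical placeholder for the real work done in one timestep. */
static void solver_iteration(void) { /* ... kernels, copies, MPI calls ... */ }

void run_timestep(int step) {
    char name[64];
    snprintf(name, sizeof(name), "timestep %d", step);

    nvtxRangePushA(name);   /* open a named range on the current thread */
    solver_iteration();     /* everything in here nests under the range */
    nvtxRangePop();         /* close the range */
}

int main(void) {
    for (int step = 0; step < 3; ++step)
        run_timestep(step);
    return 0;
}
```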
My last post introduced CUDA-aware MPI, along with an overview of MPI itself and a description of the functionality and benefits of CUDA-aware MPI. In this post I will demonstrate the performance of CUDA-aware MPI through both synthetic and realistic benchmarks. Since you now know why CUDA-aware MPI is more efficient from a theoretical perspective, let's take a look at the results of MPI bandwidth and…
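A synthetic bandwidth test of this kind is typically a ping-pong between two ranks. The sketch below passes device pointers straight to MPI_Send and MPI_Recv, which assumes a CUDA-aware MPI implementation; the message size and iteration count are illustrative:

```c
/* Minimal sketch of a bandwidth-style ping-pong between ranks 0 and 1
 * using GPU buffers directly (requires a CUDA-aware MPI library). */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const size_t nbytes = 4 << 20;   /* 4 MiB message, illustrative */
    char *d_buf;
    cudaSetDevice(0);                /* assumes one visible GPU per rank */
    cudaMalloc((void **)&d_buf, nbytes);

    const int iters = 100, peer = 1 - rank;
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; ++i) {
        if (rank == 0) {
            MPI_Send(d_buf, (int)nbytes, MPI_CHAR, peer, 0, MPI_COMM_WORLD);
            MPI_Recv(d_buf, (int)nbytes, MPI_CHAR, peer, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(d_buf, (int)nbytes, MPI_CHAR, peer, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(d_buf, (int)nbytes, MPI_CHAR, peer, 0, MPI_COMM_WORLD);
        }
    }
    double dt = MPI_Wtime() - t0;

    if (rank == 0)
        printf("Bandwidth: %.2f GB/s\n",
               2.0 * iters * nbytes / dt / 1e9);  /* 2x: both directions */

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}
```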
MPI, the Message Passing Interface, is a standard API for communicating data via messages between distributed processes that is commonly used in HPC to build applications that can scale to multi-node computer clusters. As such, MPI is fully compatible with CUDA, which is designed for parallel computing on a single computer or node. There are many reasons for wanting to combine the two parallel…
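A common first step when combining them is to give each MPI rank its own GPU and launch kernels from that rank. The sketch below uses a simple rank-modulo-device mapping, which is an assumption about how ranks are placed on nodes rather than a universal rule; production codes often use the MPI library's local-rank information instead.

```cuda
/* Minimal sketch: each MPI rank picks a GPU and launches its own kernel. */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void scale(float *x, int n, float a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);

    int rank, ndev;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    cudaGetDeviceCount(&ndev);
    cudaSetDevice(rank % ndev);      /* simple rank-to-GPU mapping */

    const int n = 1 << 20;
    float *d_x;
    cudaMalloc((void **)&d_x, n * sizeof(float));
    cudaMemset(d_x, 0, n * sizeof(float));
    scale<<<(n + 255) / 256, 256>>>(d_x, n, 2.0f);
    cudaDeviceSynchronize();

    printf("Rank %d ran its kernel on device %d of %d\n",
           rank, rank % ndev, ndev);

    cudaFree(d_x);
    MPI_Finalize();
    return 0;
}
```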