When I profile MPI+CUDA applications, performance issues sometimes occur only on certain MPI ranks. To fix these, it's necessary to identify the MPI rank where the performance issue occurs. Before CUDA 6.5 this was hard to do, because the CUDA profiler showed only the PID of each process and left the developer to figure out the mapping from PIDs to MPI ranks. Although the mapping can be done…
]]>