NVIDIA Nsight Systems is a comprehensive tool for tracking application performance across CPU and GPU resources. It helps ensure that hardware is used efficiently, traces API calls, and gives insight into inter-node network communication, showing how low-level metrics add up to application performance and where that performance can be improved. Nsight Systems can scale to cluster-size…
Recently, a user came to us in the forums with a screenshot of a profiling result from running NVIDIA Nsight Systems on a PyTorch program. A single launch of an element-wise operation gave rise to questions about the overheads and latencies in CUDA code, and how they are visualized in the Nsight Systems GUI. This seemed like a question whose answer many people could use. First…
Gone are the days when it was expected that a programmer would “own” all the systems that they needed. Modern computational work frequently happens in shared systems, in the cloud, or otherwise on hardware not owned by the user or even their employer. This is good for developers. It can save time and money by allowing for testing and development on multiple architectures or OSs without…