• <xmp id="om0om">
  • Content Creation / Rendering

    Migrating from Range Profiler to GPU Trace in Nsight Graphics

    Image of a city street at night with neon signs.

    Starting in Nsight Graphics 2023.1, the GPU Trace Profiler is the best way to profile your graphics application at the frame level. The Frame Profiler activity and the Range Profiler tool window have been removed.

    Don’t worry! The key profiling information is still available, only in a different form. This post guides you through the steps in GPU Trace, for each familiar workflow in the Range Profiler.

    In this post, I answer the following questions:

    • As a new user of GPU Trace, what should my workflow look like?
    • Where can I find each piece of data, previously shown in the Range Profiler?

    Application launch

    Previously, to access the Range Profiler, you may have used either the Frame Profiler or Frame Debugger activity, as shown on the left of Figure 1.

    Now, when starting an application, select the GPU Trace Profiler option, as shown on the right of Figure 1. In the Metric Set dropdown list, you can select the metrics to appear in the timeline view. This list also includes the Advanced Mode option, which enables additional metrics to be displayed in tables and tooltips.

    Diagram shows the differences between launcher dialogs in consecutive versions of Nsight Graphics. On the left, Nsight Graphics 2022.7, showing the Frame Profiler’s launch settings. On the right, Nsight Graphics 2023.1, showing the GPU Trace launch settings.
    Figure 1. Initial Connect to process dialog box, per activity

    Data collection

    Here are the previous steps for viewing profiling data with the Range Profiler:

    1. Press F11 in the application, or choose Capture for Live Analysis in the UI.
    2. Wait for the application to enter a replay loop.
    3. Open the Range Profiler tool window.

    With GPU Trace, you can view profiling data with the following steps:

    1. Press F11 in the application or choose Generate GPU Trace Capture.
    2. After the data transfer progress reaches 100%, choose Open.

    The first order of business is navigating through perf markers and actions on a timeline (draw calls, dispatches, and so on). The two tools are similar: the markers are shown on a timeline and, upon selection of a perf marker, the displayed metrics are updated.

    The Range Profiler’s selector contains a graphical display of perf markers over time, with their nesting structure.
    Figure 2. Range Profiler’s Range Selector rows.
    GPU Trace’s Markers row contains a graphical display of perf markers over time, with their nesting structure. The timings of groups of actions such as ExecuteCommandLists, draws, and dispatches are also shown.
    Figure 3. GPU Trace’s Queue and Markers rows
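
The nesting structure that both tools draw can be derived from marker start and end times alone. The following is a minimal sketch of that idea, assuming properly nested markers; the `Marker` type, the marker names, and the timings are hypothetical and are not exported by Nsight Graphics in this form.

```python
from dataclasses import dataclass


@dataclass
class Marker:
    # Hypothetical record for one perf marker on the timeline.
    name: str
    start_us: float
    end_us: float


def nesting_depths(markers):
    """Return (depth, marker) pairs, assuming markers are properly nested."""
    # Sort by start time; for equal starts, the longer (outer) marker comes first.
    ordered = sorted(markers, key=lambda m: (m.start_us, -m.end_us))
    open_stack = []
    result = []
    for m in ordered:
        # Drop every open marker that has already ended when this one starts.
        while open_stack and open_stack[-1].end_us <= m.start_us:
            open_stack.pop()
        result.append((len(open_stack), m))
        open_stack.append(m)
    return result


# Illustrative markers only; the names and timings are made up.
markers = [
    Marker("Frame", 0.0, 1000.0),
    Marker("ShadowPass", 50.0, 300.0),
    Marker("CascadeDraws", 60.0, 200.0),
    Marker("Lighting", 300.0, 900.0),
]

rows = nesting_depths(markers)
for depth, m in rows:
    print("  " * depth + f"{m.name}: {m.end_us - m.start_us:.0f} us")
```

Each level of indentation in the printed output corresponds to one level of marker nesting on the timeline.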

    Here are some similarities and differences between the tools:

    • The Range Profiler could only show total values for the entire measured region.
      In contrast, GPU Trace can display time-series data for key metrics.
    • The most important metrics in the Range Profiler are also visible on the GPU Trace timeline. Figure 4 shows the corresponding elements.
    • Range-level metric values are visible in GPU Trace, in the Metrics tab on the right. The main difference is that GPU Trace accumulates sampled data, during which workloads may be running in parallel, whereas the Range Profiler isolated each measurement.
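
To illustrate the difference between a single total and time-series data, here is a small sketch that averages sampled values over a selected time range. It is only an illustration of the concept, not GPU Trace's actual aggregation; the `range_average` helper and the sample values are hypothetical.

```python
def range_average(samples, start_us, end_us):
    """Average the samples whose timestamps fall inside [start_us, end_us)."""
    inside = [value for t, value in samples if start_us <= t < end_us]
    if not inside:
        return None  # No samples were taken inside the selected range.
    return sum(inside) / len(inside)


# Hypothetical SM Throughput samples: (timestamp in microseconds, percent).
sm_throughput = [(0, 20.0), (100, 80.0), (200, 90.0), (300, 30.0), (400, 10.0)]

# One number for the whole measured region hides the busy stretch in the middle.
whole_frame = range_average(sm_throughput, 0, 500)

# Selecting a narrower marker range exposes it.
lighting_marker = range_average(sm_throughput, 100, 300)

print(whole_frame, lighting_marker)
```

Because the samples are taken while the frame runs normally, a value computed for one marker also includes any other workload that overlapped it in time. That is the main difference from the isolated measurements of the Range Profiler.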

    In GPU Trace, certain metrics are only available when the Advanced Mode metric set has been selected. For example, the Warp Stall reasons are highlighted in cyan in Figure 4.

    Diagram shows the correspondence between textual elements in the Range Profiler, and graphical rows in GPU Trace.
    Figure 4. Where to find the most important Range Profiler metrics in GPU Trace

    Block diagrams

    The two major visual depictions of performance metrics in the Range Profiler were the GPU block diagram and Memory block diagram.

    GPU block diagram

    While GPU Trace does not present a block diagram of the GPU, all stats shown within the block diagram can be found on the GPU Trace timeline, in some fashion.

    Diagram shows the utilization of each pipeline stage in the GPU.
    Figure 5. Range Profiler’s GPU pipeline

    Table 1 shows the corresponding elements for the NVIDIA Ampere or NVIDIA Ada architectures, in the Throughput Metrics or Advanced Mode metric sets.

    | Stage         | Diagram element | GPU Trace row    | GPU Trace metric          |
    |---------------|-----------------|------------------|---------------------------|
    | Geometry      | Prim Dist       | Unit Throughputs | PD Throughput             |
    | Geometry      | Vtx Attr Fetch  | Unit Throughputs | VAF Throughput            |
    | Geometry      | VPC             | Unit Throughputs | PES+VPC Throughput        |
    | Geometry      | Stream Out      | Unit Throughputs | PES+VPC Throughput        |
    | Rasterization | Rasterizer[1]   | Unit Throughputs | RASTER Throughput         |
    | Rasterization | ZROP SOL        | Unit Throughputs | ZROP Throughput           |
    | Rasterization | CROP SOL        | Unit Throughputs | CROP Throughput           |
    | Shading       | SM              | Unit Throughputs | SM Throughput             |
    | Shading       | SM Pie Chart    | SM Instruction   | SM Issue Active[2]        |
    | Shading       | SM Pie Chart    | SM Occupancy     | Warps per Shader Stage[2] |
    | Memory        | Texture         | Unit Throughputs | L1 Throughput[3]          |
    | Memory        | L2              | Unit Throughputs | L2 Throughput             |
    | Memory        | VRAM            | Unit Throughputs | VRAM Throughput           |
    Table 1. Correspondence of the GPU block diagram to timeline rows.
    1. Range Profiler displays no value for Raster Throughput.
    2. The Range Profiler’s pie chart shows instructions executed, per shader stage.
      GPU Trace can measure total instructions, but a per-shader stage decomposition is only available in the Occupancy chart.
    3. On modern GPUs, the L1TEX cache is a combined L1 Data Cache that contains a Load/Store Unit + Texture Unit. Despite the short name “L1”, it includes Texture as well.
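
If you keep migration notes or scripts, the correspondence in Table 1 can be held as a small lookup structure. The sketch below transcribes the table rows; the `find_in_gpu_trace` helper is a hypothetical convenience and is not part of Nsight Graphics.

```python
# Rows transcribed from Table 1:
# (stage, Range Profiler diagram element, GPU Trace row, GPU Trace metric).
TABLE_1 = [
    ("Geometry", "Prim Dist", "Unit Throughputs", "PD Throughput"),
    ("Geometry", "Vtx Attr Fetch", "Unit Throughputs", "VAF Throughput"),
    ("Geometry", "VPC", "Unit Throughputs", "PES+VPC Throughput"),
    ("Geometry", "Stream Out", "Unit Throughputs", "PES+VPC Throughput"),
    ("Rasterization", "Rasterizer", "Unit Throughputs", "RASTER Throughput"),
    ("Rasterization", "ZROP SOL", "Unit Throughputs", "ZROP Throughput"),
    ("Rasterization", "CROP SOL", "Unit Throughputs", "CROP Throughput"),
    ("Shading", "SM", "Unit Throughputs", "SM Throughput"),
    ("Shading", "SM Pie Chart", "SM Instruction", "SM Issue Active"),
    ("Shading", "SM Pie Chart", "SM Occupancy", "Warps per Shader Stage"),
    ("Memory", "Texture", "Unit Throughputs", "L1 Throughput"),
    ("Memory", "L2", "Unit Throughputs", "L2 Throughput"),
    ("Memory", "VRAM", "Unit Throughputs", "VRAM Throughput"),
]


def find_in_gpu_trace(element):
    """Return every (GPU Trace row, metric) pair for a block diagram element."""
    wanted = element.strip().lower()
    return [
        (row, metric)
        for _stage, diagram_element, row, metric in TABLE_1
        if diagram_element.lower() == wanted
    ]


vpc = find_in_gpu_trace("VPC")
pie = find_in_gpu_trace("sm pie chart")
print(vpc)
print(pie)
```

The SM pie chart returns two entries because its information is split across two GPU Trace rows.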

    Memory block diagram

    GPU Trace does not present a block diagram of the memory hierarchy. However, each element of the Range Profiler’s memory diagram has some corresponding timeline data in GPU Trace.

    Diagram shows the utilization of each layer in the GPU’s memory cache hierarchy.
    Figure 6. Range Profiler’s GPU memory

    Table 2 shows the corresponding elements for the NVIDIA Ampere or NVIDIA Ada architectures, in the Throughput Metrics or Advanced Mode metric sets.

    | Diagram element         | GPU Trace row  | GPU Trace metric                                                          |
    |-------------------------|----------------|---------------------------------------------------------------------------|
    | Shader → Texture        | L1 Throughputs | L1 LSU Data-Stage Throughput, L1 Texture Data-Stage Throughput            |
    | Texture → Shader        | L1 Throughputs | L1 LSU Writeback-Stage Throughput, L1 Texture Writeback-Stage Throughput  |
    | Texture Hit-Rate        | L1 Hit Rate    | L1 Hit Rate                                                               |
    | Input Assembler → L2[4] | L2 Bandwidth   | L2 Bandwidth from HUB[5]                                                  |
    | Texture → L2[4]         | L2 Bandwidth   | L2 Bandwidth from L1[6]                                                   |
    | StreamOut → L2[4]       | L2 Bandwidth   | L2 Bandwidth from PE[7]                                                   |
    | ROP → L2[4]             | L2 Bandwidth   | L2 Bandwidth from CROP + L2 Bandwidth from ZROP                           |
    | L2 Hit Rate             | L2 Hit Rates   | L2 Hit Rate                                                               |
    | L2 → VRAM               | VRAM Bandwidth | VRAM Write Bandwidth                                                      |
    | VRAM → L2               | VRAM Bandwidth | VRAM Read Bandwidth                                                       |
    | PCIe TX Bandwidth[8]    | PCIe Bandwidth | PCIe Write Bandwidth                                                      |
    | PCIe RX Bandwidth[8]    | PCIe Bandwidth | PCIe Read Bandwidth                                                       |
    Table 2. Correspondence of memory block diagram to timeline rows
    4. Range Profiler does not display any values for “memory requests to L2”.
    5. HUB traffic includes the Primitive Distributor, Copy Engines, and a few other units.
    6. “L1” is short for L1TEX, and includes both Load/Store and Texture bandwidth.
    7. Primitive Engine traffic may include internal operations, in addition to streamout.
    8. Range Profiler does not display PCIe bandwidth.
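
Some arrows in the memory diagram correspond to more than one GPU Trace metric. For example, ROP → L2 is the sum of the CROP and ZROP sources. The sketch below shows that arithmetic under stated assumptions: the readings are hypothetical values in arbitrary but consistent units, and the `diagram_arrow` helper is illustrative only.

```python
# Hypothetical readings keyed by the GPU Trace metric names from Table 2.
# The values are made up and use arbitrary but consistent units.
l2_bandwidth = {
    "L2 Bandwidth from HUB": 4.0,
    "L2 Bandwidth from L1": 120.0,
    "L2 Bandwidth from PE": 1.5,
    "L2 Bandwidth from CROP": 60.0,
    "L2 Bandwidth from ZROP": 25.0,
}

# Memory diagram arrows (written with "->") and the metrics that feed each one.
ARROWS = {
    "Input Assembler -> L2": ["L2 Bandwidth from HUB"],
    "Texture -> L2": ["L2 Bandwidth from L1"],
    "StreamOut -> L2": ["L2 Bandwidth from PE"],
    "ROP -> L2": ["L2 Bandwidth from CROP", "L2 Bandwidth from ZROP"],
}


def diagram_arrow(name, readings):
    """Sum the GPU Trace metrics that correspond to one memory diagram arrow."""
    return sum(readings[metric] for metric in ARROWS[name])


rop = diagram_arrow("ROP -> L2", l2_bandwidth)
texture = diagram_arrow("Texture -> L2", l2_bandwidth)
print(rop, texture)
```

As the table notes point out, HUB and Primitive Engine traffic include more than the Input Assembler and streamout. Treat those two results as upper bounds rather than exact equivalents.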

    What about the Shader Profiler?

    In Nsight Graphics 2023.1, the Shader Profiler continues to be available through the Frame Debugger activity.

    The Shader Profiler is an essential part of a holistic profiling workflow, providing HLSL and GLSL source-level performance stats. Using it with GPU Trace can provide a complete picture of why frame performance is low, and the specific reasons each shader is achieving less than optimal performance.

    For more information about how this works, see the following resources:

    Conclusion

    The Nsight Graphics GPU Trace Profiler activity provides the same or better levels of information as the Range Profiler. In most cases, metrics are displayed over time, rather than as a single number, revealing the real-time performance characteristics of concurrent GPU workloads.

    NVIDIA continues to develop and improve GPU Trace, helping you to extract maximum performance on each new powerful architecture and programming model. To get started, download the latest version of Nsight Graphics.

    If you have questions or comments, reach out through the NVIDIA Developer forums or email us at NsightGraphics@nvidia.com. Remember to file any bugs you find using the integrated Feedback button on the top right of the tool window. For videos on how to use the tools and best practices from our experts, subscribe to the NVIDIA Game Developer YouTube channel.

    Watch the GDC demo video to see how GPU Trace was used to optimize path tracing in Cyberpunk 2077: How Cyberpunk 2077 Achieved Photorealistic Graphics with NVIDIA’s Tools – YouTube.

    For more information about GPU Trace and its many applications, see the following resources:

    Here are additional resources across a wider array of profiling tools:

    Acknowledgments

    Thanks to the following NVIDIA colleagues, who have contributed to this post: Louis Bavoil, Robert Jensen, Axel Mamode, and Aurelio Reis.
