Swap chains are an integral part of how you get rendering data output to a screen. They usually consist of a group of output-ready buffers, each of which can be rendered to, one at a time, in rotation. While one buffer in the swap chain is being rendered to, another is generally being read from for display output. This post covers best practices when working with…
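As a rough illustration (not taken from the original post), a minimal flip-model swap-chain setup for D3D12 might look like the sketch below; the factory, command queue, and window handle are assumed to already exist, and error handling is omitted.

```cpp
#include <d3d12.h>
#include <dxgi1_4.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Minimal sketch: creating a flip-model swap chain for a D3D12 application.
ComPtr<IDXGISwapChain1> CreateSwapChain(IDXGIFactory2* factory,
                                        ID3D12CommandQueue* queue, HWND hwnd)
{
    DXGI_SWAP_CHAIN_DESC1 desc = {};
    desc.BufferCount = 3;                              // triple buffering
    desc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
    desc.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT;
    desc.SwapEffect = DXGI_SWAP_EFFECT_FLIP_DISCARD;   // flip presentation model
    desc.SampleDesc.Count = 1;
    // Width/Height of 0 take the size of the window's client area.

    ComPtr<IDXGISwapChain1> swapChain;
    // For D3D12, the command queue (not the device) is passed as the first argument.
    factory->CreateSwapChainForHwnd(queue, hwnd, &desc, nullptr, nullptr, &swapChain);
    return swapChain;
}
```

With two or three buffers in flight, rendering into one buffer can overlap with another buffer being scanned out for display.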
Intrinsics can be thought of as higher-level abstractions of specific hardware instructions. They offer direct access to low-level operations or hardware-specific features, enabling increased performance. In this way, operations can be performed across threads within a warp, also known as a wavefront. The following code is an example with…
By using descriptor types, you can bind resources to shaders and specify how those resources are accessed. This creates efficient communication between the CPU and GPU and enables shaders to access the necessary data during rendering.
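As a hedged illustration of the idea, one common D3D12 pattern is to create a shader-visible descriptor heap and write a shader resource view (SRV) descriptor into it; the device and texture below are assumed to already exist, and the format is a placeholder.

```cpp
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Minimal sketch: a shader-visible descriptor heap holding one SRV descriptor.
ComPtr<ID3D12DescriptorHeap> CreateSrvHeap(ID3D12Device* device,
                                           ID3D12Resource* texture)
{
    D3D12_DESCRIPTOR_HEAP_DESC heapDesc = {};
    heapDesc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV;
    heapDesc.NumDescriptors = 1;
    heapDesc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE;

    ComPtr<ID3D12DescriptorHeap> heap;
    device->CreateDescriptorHeap(&heapDesc, IID_PPV_ARGS(&heap));

    // Describe how the shader will view the resource.
    D3D12_SHADER_RESOURCE_VIEW_DESC srvDesc = {};
    srvDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
    srvDesc.ViewDimension = D3D12_SRV_DIMENSION_TEXTURE2D;
    srvDesc.Shader4ComponentMapping = D3D12_DEFAULT_SHADER_4_COMPONENT_MAPPING;
    srvDesc.Texture2D.MipLevels = 1;

    device->CreateShaderResourceView(texture, &srvDesc,
                                     heap->GetCPUDescriptorHandleForHeapStart());
    return heap;
}
```

At draw time, the heap is bound with SetDescriptorHeaps and its descriptors are referenced through the root signature, which is what lets the shader reach the resource.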
NVIDIA offers a large suite of tools for graphics debugging, including NVIDIA Nsight Systems for CPU debugging and Nsight Graphics for GPU debugging. Nsight Aftermath is useful for analyzing crash dumps. Thanks to Patrick Neill, Jeffrey Kiel, Justin Kim, Andrew Allan, and Louis Bavoil for their help with this post.
This post covers best practices when working with shaders on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips. Shaders play a critical role in graphics programming by enabling you to control various aspects of the rendering process. They run on the GPU and are responsible for manipulating vertices, pixels, and other data.
This post covers best practices when working with pipeline state objects on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips. Pipeline state objects (PSOs) define how input data is interpreted and rendered by the hardware when submitting work to the GPU. Proper management of PSOs is essential for optimal usage of system…
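For context, a graphics PSO in D3D12 is described up front and compiled once; a minimal sketch is shown below, where the device, root signature, and compiled shader bytecode are assumed to exist and formats and most state values are illustrative placeholders.

```cpp
#include <d3d12.h>

// Minimal sketch: describing and compiling a simple graphics PSO up front.
ID3D12PipelineState* CreateSimplePso(ID3D12Device* device,
                                     ID3D12RootSignature* rootSig,
                                     D3D12_SHADER_BYTECODE vs,
                                     D3D12_SHADER_BYTECODE ps)
{
    D3D12_GRAPHICS_PIPELINE_STATE_DESC desc = {};
    desc.pRootSignature = rootSig;
    desc.VS = vs;
    desc.PS = ps;
    desc.SampleMask = 0xFFFFFFFFu;
    desc.RasterizerState.FillMode = D3D12_FILL_MODE_SOLID;
    desc.RasterizerState.CullMode = D3D12_CULL_MODE_BACK;
    desc.BlendState.RenderTarget[0].SrcBlend = D3D12_BLEND_ONE;
    desc.BlendState.RenderTarget[0].DestBlend = D3D12_BLEND_ZERO;
    desc.BlendState.RenderTarget[0].BlendOp = D3D12_BLEND_OP_ADD;
    desc.BlendState.RenderTarget[0].SrcBlendAlpha = D3D12_BLEND_ONE;
    desc.BlendState.RenderTarget[0].DestBlendAlpha = D3D12_BLEND_ZERO;
    desc.BlendState.RenderTarget[0].BlendOpAlpha = D3D12_BLEND_OP_ADD;
    desc.BlendState.RenderTarget[0].RenderTargetWriteMask = D3D12_COLOR_WRITE_ENABLE_ALL;
    desc.PrimitiveTopologyType = D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE;
    desc.NumRenderTargets = 1;
    desc.RTVFormats[0] = DXGI_FORMAT_R8G8B8A8_UNORM;
    desc.SampleDesc.Count = 1;
    // Depth/stencil and input layout are left disabled/empty for brevity.

    ID3D12PipelineState* pso = nullptr;
    device->CreateGraphicsPipelineState(&desc, IID_PPV_ARGS(&pso));
    return pso;
}
```

Creating PSOs ahead of time and caching them avoids shader-compilation hitches at draw time.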
This post covers CPU best practices when working with NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips. To get the best performance from your NVIDIA GPU, pair it with efficient work delegation on the CPU. Frame-rate caps, stutter, and other subpar application performance events can often be traced back to a bottleneck on the CPU.
This post covers best practices for using sampler feedback on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips. Sampler feedback is a DirectX 12 Ultimate feature for capturing and recording texture sampling information and locations. Sampler feedback was designed to provide better support for streaming and texture-space shading.
This post covers best practices for Vulkan clearing and presenting on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips. With the recent Vulkan 1.3 release, it's timely to add some Vulkan-specific tips that are not necessarily explicitly covered by the other Advanced API Performance posts. In addition to introducing new Vulkan 1.3…
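As one Vulkan-specific example, a color attachment can be cleared through the render pass load operation (shown here with Vulkan 1.3 dynamic rendering) instead of a separate explicit clear command; the command buffer, image view, and extent below are assumed to exist.

```cpp
#include <vulkan/vulkan.h>

// Minimal sketch: clearing a color attachment via loadOp with dynamic rendering.
void BeginClearedRendering(VkCommandBuffer cmd, VkImageView colorView,
                           VkExtent2D extent)
{
    VkRenderingAttachmentInfo color = { VK_STRUCTURE_TYPE_RENDERING_ATTACHMENT_INFO };
    color.imageView = colorView;
    color.imageLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
    color.loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR;          // clear when the pass begins
    color.storeOp = VK_ATTACHMENT_STORE_OP_STORE;
    color.clearValue.color = { { 0.0f, 0.0f, 0.0f, 1.0f } };

    VkRenderingInfo info = { VK_STRUCTURE_TYPE_RENDERING_INFO };
    info.renderArea = { { 0, 0 }, extent };
    info.layerCount = 1;
    info.colorAttachmentCount = 1;
    info.pColorAttachments = &color;

    vkCmdBeginRendering(cmd, &info);
    // ... draw calls ...
    vkCmdEndRendering(cmd);
}
```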
This post covers best practices for using SetStablePowerState on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips. Most modern processors, including GPUs, change processor core and memory clock rates during application execution. These changes can vary performance, introducing errors in measurements and rendering comparisons…
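For reference, the API itself is a single call on the D3D12 device; a minimal usage sketch, intended for development and measurement only, looks like this (it requires Windows Developer Mode, and the device pointer is assumed to exist).

```cpp
// Minimal sketch: lock clocks to stable values around the workload being timed.
// Never leave this call in shipping code.
device->SetStablePowerState(TRUE);
// ... execute and time the rendering work being compared ...
device->SetStablePowerState(FALSE);
```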
This post covers best practices for variable rate shading on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips. Variable rate shading (VRS) is a graphics feature allowing applications to control the frequency of pixel shader invocations independently of the resolution of the render target. It is available in both D3D12 and Vulkan.
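As a small sketch of the D3D12 side, a coarser per-draw shading rate can be requested on a command list that supports ID3D12GraphicsCommandList5; the command list is assumed to exist and the 2x2 rate is just an example value.

```cpp
// Minimal sketch: set a 2x2 per-draw shading rate (one shading sample per
// 2x2 pixel block) on VRS-capable hardware.
D3D12_SHADING_RATE_COMBINER combiners[2] = {
    D3D12_SHADING_RATE_COMBINER_PASSTHROUGH,   // keep the per-draw rate
    D3D12_SHADING_RATE_COMBINER_PASSTHROUGH    // ignore any shading-rate image
};
cmdList->RSSetShadingRate(D3D12_SHADING_RATE_2X2, combiners);
```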
This post covers best practices for clears on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips. Surface clearing is a widely used accessory operation. Thanks to Michael Murphy, Maurice Harris, Dmitry Zhdan, and Patrick Neill for their advice and feedback.
This post covers best practices for mesh shaders on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips. Mesh shaders are a recent addition to the programmable pipeline and aim to overcome the bottlenecks of the fixed layout used by the classical geometry pipeline. This post covers best practices for both DirectX and Vulkan…
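For orientation, launching mesh-shader work replaces the classic draw call with a threadgroup-style dispatch; a minimal D3D12 sketch is below, where the command list, an already-bound mesh-shader PSO, and the meshletCount value are assumptions for illustration.

```cpp
#include <d3d12.h>

// Minimal sketch: launching mesh-shader threadgroups in D3D12.
void DrawMeshlets(ID3D12GraphicsCommandList6* cmdList, UINT meshletCount)
{
    cmdList->DispatchMesh(meshletCount, 1, 1);   // X, Y, Z threadgroup counts
}
```

In Vulkan, the corresponding entry point is vkCmdDrawMeshTasksEXT from the VK_EXT_mesh_shader extension.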
This post covers best practices for memory and resources on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips. Optimal memory management in DirectX 12 is critical to a performant application. The advice in this post should be followed for the best performance while avoiding stuttering.
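As a hedged illustration, the snippet below allocates a simple default-heap buffer as a committed resource; the device pointer is assumed to exist, and in practice placed resources sub-allocated from larger heaps are often preferable to many small allocations.

```cpp
#include <d3d12.h>

// Minimal sketch: a GPU-local buffer created as a committed resource.
ID3D12Resource* CreateDefaultBuffer(ID3D12Device* device, UINT64 sizeInBytes)
{
    D3D12_HEAP_PROPERTIES heapProps = {};
    heapProps.Type = D3D12_HEAP_TYPE_DEFAULT;            // GPU-local memory

    D3D12_RESOURCE_DESC desc = {};
    desc.Dimension = D3D12_RESOURCE_DIMENSION_BUFFER;
    desc.Width = sizeInBytes;
    desc.Height = 1;
    desc.DepthOrArraySize = 1;
    desc.MipLevels = 1;
    desc.SampleDesc.Count = 1;
    desc.Layout = D3D12_TEXTURE_LAYOUT_ROW_MAJOR;        // required for buffers

    ID3D12Resource* buffer = nullptr;
    device->CreateCommittedResource(&heapProps, D3D12_HEAP_FLAG_NONE, &desc,
                                    D3D12_RESOURCE_STATE_COMMON, nullptr,
                                    IID_PPV_ARGS(&buffer));
    return buffer;
}
```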
This post covers best practices for command buffers on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips. Command buffers are the main mechanism for sending commands from the CPU to be executed on the GPU. By following the best practices listed in this post, you can achieve performance gains on both the CPU and the GPU by…
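For context, the basic per-frame lifecycle of a D3D12 command list follows a record/close/execute/reset pattern; in the sketch below the allocator, command list, and queue are assumed to exist, and the allocator must not be reset while the GPU is still consuming work recorded from it.

```cpp
#include <d3d12.h>

// Minimal sketch: recording and submitting one frame's command list.
void RecordAndSubmitFrame(ID3D12CommandAllocator* allocator,
                          ID3D12GraphicsCommandList* cmdList,
                          ID3D12CommandQueue* queue)
{
    allocator->Reset();                       // reclaim memory from a prior frame
    cmdList->Reset(allocator, nullptr);       // begin recording

    // ... record state setup, barriers, and draw/dispatch calls here ...

    cmdList->Close();                         // finish recording
    ID3D12CommandList* lists[] = { cmdList };
    queue->ExecuteCommandLists(1, lists);     // submit to the GPU
}
```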
This post covers best practices for barriers on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips. For the best performance on our hardware, here's what you should and shouldn't do when you're using barriers with DX12 or Vulkan. This is updated from DX12 Do's And Don'ts.
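As one concrete example of a common recommendation, several transitions can be batched into a single ResourceBarrier call rather than issued one at a time; the resources and command list below are assumed to exist and to be in the listed "before" states.

```cpp
#include <d3d12.h>

// Minimal sketch: batching two transition barriers into one call.
void TransitionTargetsForSampling(ID3D12GraphicsCommandList* cmdList,
                                  ID3D12Resource* colorTarget,
                                  ID3D12Resource* depthTarget)
{
    D3D12_RESOURCE_BARRIER barriers[2] = {};

    barriers[0].Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
    barriers[0].Transition.pResource = colorTarget;
    barriers[0].Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
    barriers[0].Transition.StateBefore = D3D12_RESOURCE_STATE_RENDER_TARGET;
    barriers[0].Transition.StateAfter = D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE;

    barriers[1].Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
    barriers[1].Transition.pResource = depthTarget;
    barriers[1].Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
    barriers[1].Transition.StateBefore = D3D12_RESOURCE_STATE_DEPTH_WRITE;
    barriers[1].Transition.StateAfter = D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE;

    cmdList->ResourceBarrier(2, barriers);   // one batched call
}
```

Batching gives the driver a chance to combine the cache flushes and waits the transitions require.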
This post covers best practices for async copy on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips. Async copy runs on completely independent hardware, but you have to schedule it onto a separate queue. You can consider turning an async copy into an async compute as a performance strategy. NVIDIA has a dedicated async copy…
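As a minimal sketch of the scheduling side in D3D12, a dedicated copy queue is created with D3D12_COMMAND_LIST_TYPE_COPY so uploads can run alongside graphics work; the device pointer is assumed to exist and error handling is omitted.

```cpp
#include <d3d12.h>

// Minimal sketch: a dedicated copy queue for asynchronous uploads.
ID3D12CommandQueue* CreateCopyQueue(ID3D12Device* device)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type = D3D12_COMMAND_LIST_TYPE_COPY;   // targets the copy engine

    ID3D12CommandQueue* copyQueue = nullptr;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&copyQueue));
    // Copy command lists submitted here can overlap with the graphics queue;
    // a fence signals the consuming queue when the uploads are complete.
    return copyQueue;
}
```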
This post covers best practices for async compute and overlap on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips. The general principle behind async compute is to increase the overall unit throughput by reducing the number of unused warp slots and to facilitate the simultaneous use of nonconflicting datapaths.
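To illustrate the mechanics in D3D12 terms, the sketch below submits compute work on its own queue and uses a fence so that graphics work depending on the results waits on the GPU timeline; the device, graphics queue, and already-recorded command lists are assumptions for illustration.

```cpp
#include <d3d12.h>

// Minimal sketch: overlapping compute with graphics across two queues.
void SubmitOverlappedWork(ID3D12Device* device,
                          ID3D12CommandQueue* graphicsQueue,
                          ID3D12CommandList* computeList,
                          ID3D12CommandList* graphicsList)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;

    ID3D12CommandQueue* computeQueue = nullptr;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&computeQueue));

    ID3D12Fence* fence = nullptr;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));

    // Kick off compute work that does not conflict with the graphics workload.
    computeQueue->ExecuteCommandLists(1, &computeList);
    computeQueue->Signal(fence, 1);

    // Graphics work that consumes the compute results waits on the fence;
    // the wait happens on the GPU timeline, so the CPU is not blocked here.
    graphicsQueue->Wait(fence, 1);
    graphicsQueue->ExecuteCommandLists(1, &graphicsList);
}
```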