Swap chains are an integral part of how you get rendering data output to a screen. They usually consist of a group of output-ready buffers, each of which can be rendered to, one at a time, in rotation. While one buffer in the swap chain is being rendered to, another is generally being read from for display output. This post covers best practices when working with…
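As a rough illustration (not taken from the original post), a minimal flip-model swap-chain setup for D3D12 might look like the sketch below; the factory, command queue, and window handle are assumed to already exist, and error handling is omitted.

```cpp
#include <d3d12.h>
#include <dxgi1_4.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Minimal sketch: creating a flip-model swap chain for a D3D12 application.
ComPtr<IDXGISwapChain1> CreateSwapChain(IDXGIFactory2* factory,
                                        ID3D12CommandQueue* queue, HWND hwnd)
{
    DXGI_SWAP_CHAIN_DESC1 desc = {};
    desc.BufferCount = 3;                              // triple buffering
    desc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
    desc.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT;
    desc.SwapEffect = DXGI_SWAP_EFFECT_FLIP_DISCARD;   // flip presentation model
    desc.SampleDesc.Count = 1;
    // Width/Height of 0 take the size of the window's client area.

    ComPtr<IDXGISwapChain1> swapChain;
    // For D3D12, the command queue (not the device) is passed as the first argument.
    factory->CreateSwapChainForHwnd(queue, hwnd, &desc, nullptr, nullptr, &swapChain);
    return swapChain;
}
```

With two or three buffers in flight, rendering into one buffer can overlap with another buffer being scanned out for display.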
Intrinsics can be thought of as higher-level abstractions of specific hardware instructions. They offer direct access to low-level operations or hardware-specific features, enabling increased performance. In this way, operations can be performed across threads within a warp, also known as a wavefront. The following code is an example with…
By using descriptor types, you can bind resources to shaders and specify how those resources are accessed. This creates efficient communication between the CPU and GPU and enables shaders to access the necessary data during rendering.
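As a hedged illustration of the idea, one common D3D12 pattern is to create a shader-visible descriptor heap and write a shader resource view (SRV) descriptor into it; the device and texture below are assumed to already exist, and the format is a placeholder.

```cpp
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Minimal sketch: a shader-visible descriptor heap holding one SRV descriptor.
ComPtr<ID3D12DescriptorHeap> CreateSrvHeap(ID3D12Device* device,
                                           ID3D12Resource* texture)
{
    D3D12_DESCRIPTOR_HEAP_DESC heapDesc = {};
    heapDesc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV;
    heapDesc.NumDescriptors = 1;
    heapDesc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE;

    ComPtr<ID3D12DescriptorHeap> heap;
    device->CreateDescriptorHeap(&heapDesc, IID_PPV_ARGS(&heap));

    // Describe how the shader will view the resource.
    D3D12_SHADER_RESOURCE_VIEW_DESC srvDesc = {};
    srvDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
    srvDesc.ViewDimension = D3D12_SRV_DIMENSION_TEXTURE2D;
    srvDesc.Shader4ComponentMapping = D3D12_DEFAULT_SHADER_4_COMPONENT_MAPPING;
    srvDesc.Texture2D.MipLevels = 1;

    device->CreateShaderResourceView(texture, &srvDesc,
                                     heap->GetCPUDescriptorHandleForHeapStart());
    return heap;
}
```

At draw time, the heap is bound with SetDescriptorHeaps and its descriptors are referenced through the root signature, which is what lets the shader reach the resource.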
NVIDIA offers a large suite of tools for graphics debugging, including NVIDIA Nsight Systems for CPU debugging and Nsight Graphics for GPU debugging. Nsight Aftermath is useful for analyzing crash dumps. Thanks to Patrick Neill, Jeffrey Kiel, Justin Kim, Andrew Allan, and Louis Bavoil for their help with this post.
This post covers best practices when working with shaders on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips. Shaders play a critical role in graphics programming by enabling you to control various aspects of the rendering process. They run on the GPU and are responsible for manipulating vertices, pixels, and other data.
This post covers best practices when working with pipeline state objects on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips. Pipeline state objects (PSOs) define how input data is interpreted and rendered by the hardware when submitting work to the GPU. Proper management of PSOs is essential for optimal usage of system…
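For context, a graphics PSO in D3D12 is described up front and compiled once; a minimal sketch is shown below, where the device, root signature, and compiled shader bytecode are assumed to exist and formats and most state values are illustrative placeholders.

```cpp
#include <d3d12.h>

// Minimal sketch: describing and compiling a simple graphics PSO up front.
ID3D12PipelineState* CreateSimplePso(ID3D12Device* device,
                                     ID3D12RootSignature* rootSig,
                                     D3D12_SHADER_BYTECODE vs,
                                     D3D12_SHADER_BYTECODE ps)
{
    D3D12_GRAPHICS_PIPELINE_STATE_DESC desc = {};
    desc.pRootSignature = rootSig;
    desc.VS = vs;
    desc.PS = ps;
    desc.SampleMask = 0xFFFFFFFFu;
    desc.RasterizerState.FillMode = D3D12_FILL_MODE_SOLID;
    desc.RasterizerState.CullMode = D3D12_CULL_MODE_BACK;
    desc.BlendState.RenderTarget[0].SrcBlend = D3D12_BLEND_ONE;
    desc.BlendState.RenderTarget[0].DestBlend = D3D12_BLEND_ZERO;
    desc.BlendState.RenderTarget[0].BlendOp = D3D12_BLEND_OP_ADD;
    desc.BlendState.RenderTarget[0].SrcBlendAlpha = D3D12_BLEND_ONE;
    desc.BlendState.RenderTarget[0].DestBlendAlpha = D3D12_BLEND_ZERO;
    desc.BlendState.RenderTarget[0].BlendOpAlpha = D3D12_BLEND_OP_ADD;
    desc.BlendState.RenderTarget[0].RenderTargetWriteMask = D3D12_COLOR_WRITE_ENABLE_ALL;
    desc.PrimitiveTopologyType = D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE;
    desc.NumRenderTargets = 1;
    desc.RTVFormats[0] = DXGI_FORMAT_R8G8B8A8_UNORM;
    desc.SampleDesc.Count = 1;
    // Depth/stencil and input layout are left disabled/empty for brevity.

    ID3D12PipelineState* pso = nullptr;
    device->CreateGraphicsPipelineState(&desc, IID_PPV_ARGS(&pso));
    return pso;
}
```

Creating PSOs ahead of time and caching them avoids shader-compilation hitches at draw time.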
This post covers CPU best practices when working with NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips. To get the best performance from your NVIDIA GPU, pair it with efficient work delegation on the CPU. Frame-rate caps, stutter, and other subpar application performance events can often be traced back to a bottleneck on the CPU.
This post covers best practices for using sampler feedback on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips. Sampler feedback is a DirectX 12 Ultimate feature for capturing and recording texture sampling information and locations. Sampler feedback was designed to provide better support for streaming and texture-space shading.
This post covers best practices for Vulkan clearing and presenting on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips. With the recent Vulkan 1.3 release, it's timely to add some Vulkan-specific tips that are not necessarily explicitly covered by the other Advanced API Performance posts. In addition to introducing new Vulkan 1.3…
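As one Vulkan-specific example, a color attachment can be cleared through the render pass load operation (shown here with Vulkan 1.3 dynamic rendering) instead of a separate explicit clear command; the command buffer, image view, and extent below are assumed to exist.

```cpp
#include <vulkan/vulkan.h>

// Minimal sketch: clearing a color attachment via loadOp with dynamic rendering.
void BeginClearedRendering(VkCommandBuffer cmd, VkImageView colorView,
                           VkExtent2D extent)
{
    VkRenderingAttachmentInfo color = { VK_STRUCTURE_TYPE_RENDERING_ATTACHMENT_INFO };
    color.imageView = colorView;
    color.imageLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
    color.loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR;          // clear when the pass begins
    color.storeOp = VK_ATTACHMENT_STORE_OP_STORE;
    color.clearValue.color = { { 0.0f, 0.0f, 0.0f, 1.0f } };

    VkRenderingInfo info = { VK_STRUCTURE_TYPE_RENDERING_INFO };
    info.renderArea = { { 0, 0 }, extent };
    info.layerCount = 1;
    info.colorAttachmentCount = 1;
    info.pColorAttachments = &color;

    vkCmdBeginRendering(cmd, &info);
    // ... draw calls ...
    vkCmdEndRendering(cmd);
}
```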
This post covers best practices for using SetStablePowerState on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips. Most modern processors, including GPUs, change processor core and memory clock rates during application execution. These changes can vary performance, introducing errors in measurements and rendering comparisons…
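For reference, the API itself is a single call on the D3D12 device; a minimal usage sketch, intended for development and measurement only, looks like this (it requires Windows Developer Mode, and the device pointer is assumed to exist).

```cpp
// Minimal sketch: lock clocks to stable values around the workload being timed.
// Never leave this call in shipping code.
device->SetStablePowerState(TRUE);
// ... execute and time the rendering work being compared ...
device->SetStablePowerState(FALSE);
```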
This post covers best practices for variable rate shading on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips. Variable rate shading (VRS) is a graphics feature allowing applications to control the frequency of pixel shader invocations independently of the resolution of the render target. It is available in both D3D12 and Vulkan.
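As a small sketch of the D3D12 side, a coarser per-draw shading rate can be requested on a command list that supports ID3D12GraphicsCommandList5; the command list is assumed to exist and the 2x2 rate is just an example value.

```cpp
// Minimal sketch: set a 2x2 per-draw shading rate (one shading sample per
// 2x2 pixel block) on VRS-capable hardware.
D3D12_SHADING_RATE_COMBINER combiners[2] = {
    D3D12_SHADING_RATE_COMBINER_PASSTHROUGH,   // keep the per-draw rate
    D3D12_SHADING_RATE_COMBINER_PASSTHROUGH    // ignore any shading-rate image
};
cmdList->RSSetShadingRate(D3D12_SHADING_RATE_2X2, combiners);
```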
This post covers best practices for clears on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips. Surface clearing is a widely used accessory operation. Thanks to Michael Murphy, Maurice Harris, Dmitry Zhdan, and Patrick Neill for their advice and feedback.
This post covers best practices for mesh shaders on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips. Mesh shaders are a recent addition to the programmable pipeline and aim to overcome the bottlenecks of the fixed layout used by the classical geometry pipeline. This post covers best practices for both DirectX and Vulkan…
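For orientation, launching mesh-shader work replaces the classic draw call with a threadgroup-style dispatch; a minimal D3D12 sketch is below, where the command list, an already-bound mesh-shader PSO, and the meshletCount value are assumptions for illustration.

```cpp
#include <d3d12.h>

// Minimal sketch: launching mesh-shader threadgroups in D3D12.
void DrawMeshlets(ID3D12GraphicsCommandList6* cmdList, UINT meshletCount)
{
    cmdList->DispatchMesh(meshletCount, 1, 1);   // X, Y, Z threadgroup counts
}
```

In Vulkan, the corresponding entry point is vkCmdDrawMeshTasksEXT from the VK_EXT_mesh_shader extension.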
This post covers best practices for memory and resources on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips. Optimal memory management in DirectX 12 is critical to a performant application. The advice in this post should be followed for the best performance while avoiding stuttering.
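As a hedged illustration, the snippet below allocates a simple default-heap buffer as a committed resource; the device pointer is assumed to exist, and in practice placed resources sub-allocated from larger heaps are often preferable to many small allocations.

```cpp
#include <d3d12.h>

// Minimal sketch: a GPU-local buffer created as a committed resource.
ID3D12Resource* CreateDefaultBuffer(ID3D12Device* device, UINT64 sizeInBytes)
{
    D3D12_HEAP_PROPERTIES heapProps = {};
    heapProps.Type = D3D12_HEAP_TYPE_DEFAULT;            // GPU-local memory

    D3D12_RESOURCE_DESC desc = {};
    desc.Dimension = D3D12_RESOURCE_DIMENSION_BUFFER;
    desc.Width = sizeInBytes;
    desc.Height = 1;
    desc.DepthOrArraySize = 1;
    desc.MipLevels = 1;
    desc.SampleDesc.Count = 1;
    desc.Layout = D3D12_TEXTURE_LAYOUT_ROW_MAJOR;        // required for buffers

    ID3D12Resource* buffer = nullptr;
    device->CreateCommittedResource(&heapProps, D3D12_HEAP_FLAG_NONE, &desc,
                                    D3D12_RESOURCE_STATE_COMMON, nullptr,
                                    IID_PPV_ARGS(&buffer));
    return buffer;
}
```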
This post covers best practices for command buffers on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips. Command buffers are the main mechanism for sending commands from the CPU to be executed on the GPU. By following the best practices listed in this post, you can achieve performance gains on both the CPU and the GPU by…
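For context, the basic per-frame lifecycle of a D3D12 command list follows a record/close/execute/reset pattern; in the sketch below the allocator, command list, and queue are assumed to exist, and the allocator must not be reset while the GPU is still consuming work recorded from it.

```cpp
#include <d3d12.h>

// Minimal sketch: recording and submitting one frame's command list.
void RecordAndSubmitFrame(ID3D12CommandAllocator* allocator,
                          ID3D12GraphicsCommandList* cmdList,
                          ID3D12CommandQueue* queue)
{
    allocator->Reset();                       // reclaim memory from a prior frame
    cmdList->Reset(allocator, nullptr);       // begin recording

    // ... record state setup, barriers, and draw/dispatch calls here ...

    cmdList->Close();                         // finish recording
    ID3D12CommandList* lists[] = { cmdList };
    queue->ExecuteCommandLists(1, lists);     // submit to the GPU
}
```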
This post covers best practices for barriers on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips. For the best performance on our hardware, here's what you should and shouldn't do when you're using barriers with DX12 or Vulkan. This is updated from DX12 Do's And Don'ts.
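As one concrete example of a common recommendation, several transitions can be batched into a single ResourceBarrier call rather than issued one at a time; the resources and command list below are assumed to exist and to be in the listed "before" states.

```cpp
#include <d3d12.h>

// Minimal sketch: batching two transition barriers into one call.
void TransitionTargetsForSampling(ID3D12GraphicsCommandList* cmdList,
                                  ID3D12Resource* colorTarget,
                                  ID3D12Resource* depthTarget)
{
    D3D12_RESOURCE_BARRIER barriers[2] = {};

    barriers[0].Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
    barriers[0].Transition.pResource = colorTarget;
    barriers[0].Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
    barriers[0].Transition.StateBefore = D3D12_RESOURCE_STATE_RENDER_TARGET;
    barriers[0].Transition.StateAfter = D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE;

    barriers[1].Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
    barriers[1].Transition.pResource = depthTarget;
    barriers[1].Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
    barriers[1].Transition.StateBefore = D3D12_RESOURCE_STATE_DEPTH_WRITE;
    barriers[1].Transition.StateAfter = D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE;

    cmdList->ResourceBarrier(2, barriers);   // one batched call
}
```

Batching gives the driver a chance to combine the cache flushes and waits the transitions require.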
This post covers best practices for async copy on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips. Async copy runs on completely independent hardware, but you have to schedule it onto a separate queue. You can consider turning an async copy into an async compute as a performance strategy. NVIDIA has a dedicated async copy…
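As a minimal sketch of the scheduling side in D3D12, a dedicated copy queue is created with D3D12_COMMAND_LIST_TYPE_COPY so uploads can run alongside graphics work; the device pointer is assumed to exist and error handling is omitted.

```cpp
#include <d3d12.h>

// Minimal sketch: a dedicated copy queue for asynchronous uploads.
ID3D12CommandQueue* CreateCopyQueue(ID3D12Device* device)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type = D3D12_COMMAND_LIST_TYPE_COPY;   // targets the copy engine

    ID3D12CommandQueue* copyQueue = nullptr;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&copyQueue));
    // Copy command lists submitted here can overlap with the graphics queue;
    // a fence signals the consuming queue when the uploads are complete.
    return copyQueue;
}
```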
This post covers best practices for async compute and overlap on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips. The general principle behind async compute is to increase the overall unit throughput by reducing the number of unused warp slots and to facilitate the simultaneous use of nonconflicting datapaths.
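To illustrate the mechanics in D3D12 terms, the sketch below submits compute work on its own queue and uses a fence so that graphics work depending on the results waits on the GPU timeline; the device, graphics queue, and already-recorded command lists are assumptions for illustration.

```cpp
#include <d3d12.h>

// Minimal sketch: overlapping compute with graphics across two queues.
void SubmitOverlappedWork(ID3D12Device* device,
                          ID3D12CommandQueue* graphicsQueue,
                          ID3D12CommandList* computeList,
                          ID3D12CommandList* graphicsList)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;

    ID3D12CommandQueue* computeQueue = nullptr;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&computeQueue));

    ID3D12Fence* fence = nullptr;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));

    // Kick off compute work that does not conflict with the graphics workload.
    computeQueue->ExecuteCommandLists(1, &computeList);
    computeQueue->Signal(fence, 1);

    // Graphics work that consumes the compute results waits on the fence;
    // the wait happens on the GPU timeline, so the CPU is not blocked here.
    graphicsQueue->Wait(fence, 1);
    graphicsQueue->ExecuteCommandLists(1, &graphicsList);
}
```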