Many workloads can be sped up greatly by offloading compute-intensive parts onto GPUs. In CUDA terms, this is known as launching kernels. When those kernels are many and of short duration, launch overhead sometimes becomes a problem. One way of reducing that overhead is offered by CUDA Graphs. Graphs work because they combine arbitrary numbers of asynchronous CUDA API calls…
]]>