Maxim Milakov – NVIDIA Technical Blog: News and tutorials for developers, data scientists, and IT admins (http://www.open-lab.net/blog/feed/, updated 2023-03-14T19:00:03Z)

Neural Machine Translation Inference with TensorRT 4
By Maxim Milakov. Published 2018-07-18T19:00:00Z, updated 2023-03-14T19:00:03Z. http://www.open-lab.net/blog/?p=17146

Neural machine translation is used across a wide variety of consumer applications, including translating websites and road signs, generating subtitles in foreign languages, and more. TensorRT, NVIDIA’s programmable inference accelerator, helps optimize deep learning models and generate runtime engines for deploying inference applications to production environments. NVIDIA released TensorRT 4 with new features to accelerate…

GPU Pro Tip: Fast Dynamic Indexing of Private Arrays in CUDA
By Maxim Milakov. Published 2015-02-11T09:16:03Z, updated 2022-08-21T23:37:30Z. http://www.open-lab.net/blog/parallelforall/?p=4893

Sometimes you need to use small per-thread arrays in your GPU kernels. The performance of accessing elements in these arrays can vary depending on a number of factors. In this post I’ll cover several common scenarios, ranging from fast static indexing to more complex and challenging use cases. Before discussing dynamic indexing, let’s briefly look at static indexing. For small arrays where all…
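As a minimal sketch (not code from the post), the two kernels below contrast static and dynamic indexing of a small per-thread array. In `sum_static`, every index becomes a compile-time constant once the loops are fully unrolled, which typically lets the compiler keep the array in registers; `pick_dynamic` indexes the same array with a value read from memory at run time, which generally forces the array into local memory. The array size `kArraySize`, the kernel names, and the `run_demo` host helper are hypothetical, chosen only for illustration. Where the array actually ends up depends on the compiler; compiling with `nvcc -Xptxas -v` reports per-kernel stack frame size and spill counts, which is one way to check.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical per-thread array size, small enough to fit in registers.
constexpr int kArraySize = 8;

// Static indexing: after full unrolling every index is a compile-time
// constant, so the compiler can normally keep `vals` entirely in registers.
__global__ void sum_static(const int* in, int* out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    int vals[kArraySize];
#pragma unroll
    for (int i = 0; i < kArraySize; ++i)
        vals[i] = in[tid * kArraySize + i];

    int sum = 0;
#pragma unroll
    for (int i = 0; i < kArraySize; ++i)
        sum += vals[i];
    out[tid] = sum;
}

// Dynamic indexing: the index is only known at run time, so the compiler
// generally cannot map vals[idx] to a fixed register and uses local memory.
__global__ void pick_dynamic(const int* in, const unsigned* sel, int* out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    int vals[kArraySize];
#pragma unroll
    for (int i = 0; i < kArraySize; ++i)
        vals[i] = in[tid * kArraySize + i];

    unsigned idx = sel[tid] % kArraySize;  // run-time index, kept in range
    out[tid] = vals[idx];
}

// Hypothetical host helper: fills in[t][i] = t + i, lets thread t select
// element t % kArraySize, runs both kernels over n threads (n > 0) and
// copies the results back. Returns false if any CUDA call fails.
bool run_demo(int n, std::vector<int>& sums, std::vector<int>& picks)
{
    std::vector<int> h_in(n * kArraySize);
    std::vector<unsigned> h_sel(n);
    for (int t = 0; t < n; ++t) {
        for (int i = 0; i < kArraySize; ++i)
            h_in[t * kArraySize + i] = t + i;
        h_sel[t] = static_cast<unsigned>(t);
    }
    sums.assign(n, 0);
    picks.assign(n, 0);

    int *d_in = nullptr, *d_sum = nullptr, *d_pick = nullptr;
    unsigned* d_sel = nullptr;
    const size_t in_bytes = h_in.size() * sizeof(int);
    const size_t sel_bytes = n * sizeof(unsigned);
    const size_t out_bytes = n * sizeof(int);

    bool ok = cudaMalloc(&d_in, in_bytes) == cudaSuccess &&
              cudaMalloc(&d_sel, sel_bytes) == cudaSuccess &&
              cudaMalloc(&d_sum, out_bytes) == cudaSuccess &&
              cudaMalloc(&d_pick, out_bytes) == cudaSuccess &&
              cudaMemcpy(d_in, h_in.data(), in_bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(d_sel, h_sel.data(), sel_bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        const int threads = 128;
        const int blocks = (n + threads - 1) / threads;
        sum_static<<<blocks, threads>>>(d_in, d_sum, n);
        pick_dynamic<<<blocks, threads>>>(d_in, d_sel, d_pick, n);
        // cudaMemcpy waits for the kernels to finish before copying back.
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(sums.data(), d_sum, out_bytes, cudaMemcpyDeviceToHost) == cudaSuccess &&
             cudaMemcpy(picks.data(), d_pick, out_bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(d_in);
    cudaFree(d_sel);
    cudaFree(d_sum);
    cudaFree(d_pick);
    return ok;
}
```

For example, calling `run_demo(100, sums, picks)` from `main` should give `sums[10] == 108` (8*10 + 28) and `picks[10] == 12` (element 10 % 8 = 2 of thread 10's array). Both kernels compute from the same data; the difference that matters for performance is only where the compiler can place `vals`.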
