Andy Adinets – NVIDIA Technical Blog

News and tutorials for developers, data scientists, and IT admins. Feed: http://www.open-lab.net/blog/feed/ (updated 2022-08-21T23:51:42Z)

Sparse Forests with FIL
http://www.open-lab.net/blog/?p=31794 (published 2021-05-21T20:30:00Z, updated 2022-08-21T23:51:42Z)


This post was originally published on the RAPIDS AI blog. The RAPIDS Forest Inference Library, affectionately known as FIL, dramatically accelerates inference (prediction) for tree-based models, including gradient-boosted decision tree models (like those from XGBoost and LightGBM) and random forests. (For a deeper dive into the library overall, check out the original FIL blog.)

CUDA Pro Tip: Optimized Filtering with Warp-Aggregated Atomics
http://www.open-lab.net/blog/parallelforall/?p=3906 (published 2014-10-02T05:57:09Z, updated 2022-08-21T23:37:27Z)


Note: This post has been updated (November 2017) for CUDA 9 and the latest GPUs. The NVCC compiler now performs warp aggregation for atomics automatically in many cases, so you can get higher performance with no extra effort. In fact, the compiler-generated code is faster than manually written warp aggregation code. This post is mainly intended for those who want to learn how…
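To make the idea concrete, here is a minimal sketch of warp-aggregated filtering, assuming Cooperative Groups (`cooperative_groups.h`, available since CUDA 9). The threads that are active together form a coalesced group; one leader performs a single `atomicAdd` for the whole group, and each thread derives its own output slot from the leader's result plus its rank in the group. The names `atomicAggInc`, `filter_positive_k`, and `filter_positive`, and the "keep positive values" predicate, are illustrative choices for this example, not a library API.

```cuda
#include <vector>
#include <cuda_runtime.h>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

// Warp-aggregated increment: the currently active threads form a coalesced
// group, the group leader does ONE atomicAdd for the whole group, and each
// thread gets a unique slot = leader's base + its rank within the group.
__device__ int atomicAggInc(int *ctr) {
  cg::coalesced_group g = cg::coalesced_threads();
  int base = 0;
  if (g.thread_rank() == 0)
    base = atomicAdd(ctr, static_cast<int>(g.size()));
  return g.shfl(base, 0) + static_cast<int>(g.thread_rank());
}

// Filtering (stream compaction): copy the positive elements of src to dst.
// Only threads whose element passes the predicate reach atomicAggInc.
__global__ void filter_positive_k(int *dst, int *nres, const int *src, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n && src[i] > 0)
    dst[atomicAggInc(nres)] = src[i];
}

// Host helper (hypothetical): runs the kernel and returns the kept elements.
// Output order is unspecified because warps reach the atomic in any order.
std::vector<int> filter_positive(const std::vector<int> &in) {
  int n = static_cast<int>(in.size());
  if (n == 0) return {};
  int *d_src = nullptr, *d_dst = nullptr, *d_cnt = nullptr;
  cudaMalloc(&d_src, n * sizeof(int));
  cudaMalloc(&d_dst, n * sizeof(int));
  cudaMalloc(&d_cnt, sizeof(int));
  cudaMemcpy(d_src, in.data(), n * sizeof(int), cudaMemcpyHostToDevice);
  cudaMemset(d_cnt, 0, sizeof(int));

  const int threads = 256;
  filter_positive_k<<<(n + threads - 1) / threads, threads>>>(d_dst, d_cnt, d_src, n);

  // cudaMemcpy on the default stream waits for the kernel to finish.
  int count = 0;
  cudaMemcpy(&count, d_cnt, sizeof(int), cudaMemcpyDeviceToHost);
  std::vector<int> out(count);
  if (count > 0)
    cudaMemcpy(out.data(), d_dst, count * sizeof(int), cudaMemcpyDeviceToHost);

  cudaFree(d_src);
  cudaFree(d_dst);
  cudaFree(d_cnt);
  return out;
}
```

Given the note above about automatic warp aggregation, a plain `atomicAdd(nres, 1)` in the kernel may well perform as well or better on current compilers; the explicit version mainly shows what that optimization does. Compare results as multisets (for example, after sorting), since the output order is not deterministic.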

A CUDA Dynamic Parallelism Case Study: PANDA
http://www.open-lab.net/blog/parallelforall/?p=3278 (published 2014-06-12T16:15:12Z, updated 2022-08-21T23:37:05Z)


This post concludes an introductory series on CUDA dynamic parallelism with a case study on an online track reconstruction algorithm for the PANDA high-energy physics experiment. The PANDA work was carried out as part of the NVIDIA Application Lab at Jülich. PANDA (anti-Proton ANnihilation at DArmstadt) is a state-of-the-art hadron…

CUDA Dynamic Parallelism API and Principles
http://www.open-lab.net/blog/parallelforall/?p=3163 (published 2014-05-20T18:06:01Z, updated 2022-08-21T23:37:04Z)

This post is the second in a series on CUDA Dynamic Parallelism. In my first post, I introduced Dynamic Parallelism by using it to compute images of the...

Adaptive Parallel Computation with CUDA Dynamic Parallelism
http://www.open-lab.net/blog/parallelforall/?p=2523 (published 2014-05-06T20:48:13Z, updated 2022-08-21T23:37:02Z)


Early CUDA programs had to conform to a flat, bulk parallel programming model. Programs had to perform a sequence of kernel launches, and for best performance each kernel had to expose enough parallelism to use the GPU efficiently. For applications consisting of “parallel for” loops, the bulk parallel model is not too limiting, but some parallel patterns, such as nested parallelism, cannot be…
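As an illustration of the nested pattern that the flat model handles poorly, here is a minimal dynamic parallelism sketch: one parent thread per variable-length segment, where short segments are processed serially by their thread and long segments launch a child grid sized to their length. The CSR-style `offsets` layout, the `threshold` parameter, and the kernel and helper names are assumptions made for this example. Dynamic parallelism requires a GPU that supports it and relocatable device code, for example `nvcc -rdc=true file.cu -lcudadevrt`.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Child grid: doubles every element of one long segment in parallel.
__global__ void double_segment_k(int *data, int begin, int len) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < len) data[begin + i] *= 2;
}

// Parent grid: one thread per segment [offsets[s], offsets[s+1]).
// The amount of parallelism is chosen at run time, per segment.
__global__ void double_segments_k(int *data, const int *offsets, int nseg,
                                  int threshold) {
  int s = blockIdx.x * blockDim.x + threadIdx.x;
  if (s >= nseg) return;
  int begin = offsets[s];
  int len = offsets[s + 1] - begin;
  if (len > threshold) {
    // Device-side launch (needs -rdc=true and cudadevrt).
    const int threads = 128;
    double_segment_k<<<(len + threads - 1) / threads, threads>>>(data, begin, len);
  } else {
    // Short segment: not worth a launch, handle it in this thread.
    for (int i = 0; i < len; ++i) data[begin + i] *= 2;
  }
}

// Host helper (hypothetical): doubles all segments and returns the result.
std::vector<int> double_segments(std::vector<int> data,
                                 const std::vector<int> &offsets, int threshold) {
  int n = static_cast<int>(data.size());
  int nseg = static_cast<int>(offsets.size()) - 1;
  if (n == 0 || nseg <= 0) return data;
  int *d_data = nullptr, *d_off = nullptr;
  cudaMalloc(&d_data, n * sizeof(int));
  cudaMalloc(&d_off, offsets.size() * sizeof(int));
  cudaMemcpy(d_data, data.data(), n * sizeof(int), cudaMemcpyHostToDevice);
  cudaMemcpy(d_off, offsets.data(), offsets.size() * sizeof(int),
             cudaMemcpyHostToDevice);

  const int threads = 128;
  double_segments_k<<<(nseg + threads - 1) / threads, threads>>>(d_data, d_off,
                                                                 nseg, threshold);
  // A parent grid is not complete until all of its child grids complete,
  // so this host-side wait covers every child launch as well.
  cudaDeviceSynchronize();

  cudaMemcpy(data.data(), d_data, n * sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(d_data);
  cudaFree(d_off);
  return data;
}
```

The sketch deliberately avoids calling `cudaDeviceSynchronize()` from device code, which is deprecated in recent CUDA releases; synchronization happens on the host instead. The threshold trades launch overhead against parallelism, and a real application would tune it.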
