Sometimes you need to use small per-thread arrays in your GPU kernels. The performance of accessing elements in these arrays can vary depending on a number of factors. In this post I’ll cover several common scenarios ranging from fast static indexing to more complex and challenging use cases. Before discussing dynamic indexing let’s briefly look at static indexing. For small arrays where all…
]]>