In this post we introduce the “register cache”, an optimization technique that builds a virtual caching layer for the threads of a single warp. It is a software abstraction implemented on top of the NVIDIA GPU shuffle primitive, and it helps optimize kernels that use shared memory to cache thread inputs. When the kernel is transformed by applying this optimization, the data ends up being…
]]>The adoption of unmanned aerial systems (UAS) has been steadily growing over the last decade, and they have proven to be beneficial in a variety of fields, including agriculture, geographical mapping, aerial photography, and search and rescue. These systems, however, require a person in the loop for remote control, scene recognition, and data acquisition. This increases the cost of operation and…
]]>