This is the second part of a two-part series about NVIDIA tools that allow you to run large transformer models for accelerated inference. For an introduction to the FasterTransformer library (Part 1), see Accelerated Inference for Large Transformer Models Using NVIDIA Triton Inference Server. Join the NVIDIA Triton and NVIDIA TensorRT community to stay current on the latest product updates…
]]>Back in 2012, NVIDIAN Mark Harris wrote Six Ways to Saxpy, demonstrating how to perform the SAXPY operation on a GPU in multiple ways, using different languages and libraries. Since then, programming paradigms have evolved and so has the NVIDIA HPC SDK. In this post, I demonstrate five ways to implement a simple SAXPY computation using NVIDIA GPUs. Why is this interesting?
]]>If you’re building a unique AI/DL application, you are constantly looking for ways to train and deploy AI models from frameworks such as TensorFlow, PyTorch, and TensorRT quickly and effectively. Whether you deploy in the cloud, in data centers, or at the edge, NVIDIA Triton Inference Server enables developers to deploy trained models from any major framework such as TensorFlow, TensorRT…
]]>With the introduction of Intel Thunderbolt 3 in laptops, you can now connect a dedicated GPU through an external GPU (eGPU) enclosure for gaming, production, and data science. A Thunderbolt 3 eGPU setup consists of a few components. Most enclosures provide all of them, so all you need is a laptop with Thunderbolt 3. Because I value the portability of a thin and light laptop but want the raw…
]]>