NVIDIA announced the release of NVIDIA Dynamo today at GTC 2025. NVIDIA Dynamo is a high-throughput, low-latency open-source inference serving framework for deploying generative AI and reasoning models in large-scale distributed environments. The framework boosts the number of requests served by up to 30x when running the open-source DeepSeek-R1 models on NVIDIA Blackwell.
Editor’s Note: An updated version of this post, with additional tutorial content, is now available. See “How to Speed Up Deep Learning Using TensorRT”. NVIDIA TensorRT is a high-performance deep learning inference library for production environments. Power efficiency and speed of response are two key metrics for deployed deep learning applications, because they directly affect the user experience…
Over the last few years there has been a dramatic rise in the use of containers for deploying data center applications at scale. The reason for this is simple: containers encapsulate an application’s dependencies to provide reproducible and reliable execution of applications and services without the overhead of a full virtual machine. If you have ever spent a day provisioning a server with a…
[Update September 13, 2016: GPU Inference Engine is now TensorRT] Today at ICML 2016, NVIDIA announced its latest Deep Learning SDK updates, including DIGITS 4, cuDNN 5.1 (CUDA Deep Neural Network Library) and the new GPU Inference Engine. NVIDIA GPU Inference Engine (GIE) is a high-performance deep learning inference solution for production environments. Power efficiency and speed of response…