Join the NVIDIA Triton and NVIDIA TensorRT community to stay current on the latest product updates, bug fixes, content, best practices, and more. Deep learning is revolutionizing the way that industries are delivering products and services. These services include object detection, classification, and segmentation for computer vision, and text extraction, classification…
If there's one constant in AI and deep learning, it's never-ending optimization to wring every possible bit of performance out of a given platform. Many inference applications benefit from reduced precision, whether it's mixed precision for recurrent neural networks (RNNs) or INT8 for convolutional neural networks (CNNs), where applications can get 3x+ speedups. NVIDIA's Turing architecture…
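To make the reduced-precision idea concrete, here is a minimal sketch of building a TensorRT engine with FP16 enabled and INT8 noted as an option, using the TensorRT Python API. The ONNX file name and the calibrator are placeholders rather than anything from the original post, and the exact builder calls vary between TensorRT versions.

```python
# Hedged sketch: build a TensorRT engine with reduced-precision kernels enabled.
# "model.onnx" is a hypothetical input file; API details follow recent TensorRT
# Python releases and may differ in older versions.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:        # hypothetical trained model
    parser.parse(f.read())

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)      # allow mixed-precision (FP16) kernels
# For INT8, also supply calibration data, e.g.:
# config.set_flag(trt.BuilderFlag.INT8)
# config.int8_calibrator = my_calibrator   # an IInt8EntropyCalibrator2 implementation

serialized_engine = builder.build_serialized_network(network, config)
with open("model.plan", "wb") as f:
    f.write(serialized_engine)
```

The precision flags only permit lower-precision kernels; TensorRT still falls back to FP32 for layers where reduced precision is not supported or not faster.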
Inference is where AI goes to work. Identifying diseases. Answering questions. Recommending products and services. Inference is also diffuse: it will happen everywhere from the data center to the edge to IoT devices, across multiple use cases including image, speech, and recommender systems, to name a few. As a result, creating a benchmark to measure the performance of these diverse platforms…
Object detection remains the primary driver for applications such as autonomous driving and intelligent video analytics. Object detection applications require substantial training on vast datasets to achieve high levels of accuracy. NVIDIA GPUs excel at the parallel compute performance required to train these large networks for object detection inference.
Our most popular question is "What can I do to get great GPU performance for deep learning?" We've recently published a detailed Deep Learning Performance Guide to help answer this question. The guide explains how GPUs process data and gives tips on how to design networks for better performance. We also take a close look at Tensor Core optimization to help improve performance. This post takes a…
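The guide itself is framework-agnostic, but two of its recurring recommendations, sizing layers in multiples of 8 so the underlying GEMMs map cleanly onto Tensor Cores and running the math in mixed precision, can be sketched in a few lines. The PyTorch code and layer sizes below are illustrative assumptions, not examples taken from the guide.

```python
# Hedged sketch of Tensor Core-friendly sizing plus mixed-precision training.
# PyTorch, the layer dimensions, and the training loop are illustrative choices.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(1024, 4096),   # dimensions that are multiples of 8 map well to Tensor Cores
    nn.ReLU(),
    nn.Linear(4096, 1024),
).cuda()                     # requires a CUDA-capable GPU

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()        # scales the loss to avoid FP16 underflow

x = torch.randn(256, 1024, device="cuda")
target = torch.randn(256, 1024, device="cuda")

for _ in range(10):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():         # runs eligible ops in FP16 on Tensor Cores
        loss = nn.functional.mse_loss(model(x), target)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```

The batch size is also kept at a multiple of 8 here, since matrix dimensions on all sides of a GEMM affect whether Tensor Core kernels are selected.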