Large language models (LLMs) have shown remarkable generalization capabilities in natural language processing (NLP). They are used in a wide range of applications, including translation, digital assistants, recommendation systems, context analysis, code generation, cybersecurity, and more. In automotive applications, there is growing demand for LLM-based solutions for both autonomous driving and…
The past decade has seen a remarkable surge in the adoption of deep learning techniques for computer vision (CV) tasks. Convolutional neural networks (CNNs) have been the cornerstone of this revolution, exhibiting exceptional performance and enabling significant advancements in visual perception. By employing localized filters and hierarchical architectures, CNNs have proven adept at…
The training stage of deep learning (DL) models consists of learning numerous dense floating-point weight matrices, which results in a massive amount of floating-point computations during inference. Research has shown that many of those computations can be skipped by forcing some weights to be zero, with little impact on the final accuracy. In parallel, previous posts have shown that…
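The truncated teaser does not show the post's specific sparsification scheme, but the idea it describes can be sketched with plain magnitude pruning in PyTorch. This is an illustrative assumption, not the post's method; the layer sizes and the 50% sparsity target are also assumed.

```python
# Minimal sketch: magnitude-based weight pruning in PyTorch (illustrative only;
# layer sizes and the 50% sparsity target are assumptions, not from the post).
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Force the smallest-magnitude 50% of weights in each Linear layer to zero.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # make the zeros permanent

# Zeroed weights contribute nothing to the matmul, so a sparsity-aware runtime
# can skip those multiply-accumulates at inference time.
total = sum(p.numel() for p in model.parameters() if p.dim() > 1)
zeros = sum((p == 0).sum().item() for p in model.parameters() if p.dim() > 1)
print(f"sparsity: {zeros / total:.2%}")
```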
We’re excited to announce the NVIDIA Quantization-Aware Training (QAT) Toolkit for TensorFlow 2, with the goal of accelerating quantized networks with NVIDIA TensorRT on NVIDIA GPUs. This toolkit provides you with an easy-to-use API to quantize…
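The teaser cuts off before showing the toolkit's API, so as a rough, hedged illustration of what quantization-aware training looks like in TensorFlow 2, here is the generic tensorflow_model_optimization flow. This is not the NVIDIA QAT Toolkit's own interface, and the toy model is an assumption.

```python
# Generic quantization-aware training flow in TensorFlow 2 using
# tensorflow_model_optimization. NOT the NVIDIA QAT Toolkit API; it only
# illustrates the QAT idea: insert fake-quantization ops, then fine-tune.
import tensorflow as tf
import tensorflow_model_optimization as tfmot

base_model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
])

# Wrap the model so weights and activations are fake-quantized during training.
qat_model = tfmot.quantization.keras.quantize_model(base_model)

qat_model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
# qat_model.fit(train_images, train_labels, epochs=1)  # fine-tune as usual
```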
This post was updated July 20, 2021 to reflect NVIDIA TensorRT 8.0 updates. NVIDIA TensorRT is an SDK for deep learning inference. TensorRT provides APIs and parsers to import trained models from all major deep learning frameworks. It then generates optimized runtime engines deployable in the datacenter as well as in automotive and embedded environments. This post provides a simple…
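As a minimal sketch of the import-and-build flow described here, the TensorRT Python API can parse an ONNX file and serialize an optimized engine. The model.onnx and model.engine file names below are assumptions for illustration, not paths from the post.

```python
# Minimal sketch: build a TensorRT engine from an ONNX model (file names assumed).
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

# Import the trained model through the ONNX parser.
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parse failed")

# Build and serialize an optimized runtime engine for deployment.
config = builder.create_builder_config()
serialized_engine = builder.build_serialized_network(network, config)

with open("model.engine", "wb") as f:
    f.write(serialized_engine)
```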
After the first successes of deep learning, designing neural network architectures with desirable performance criteria for a given task (for example, high accuracy or low latency) has been a challenging problem. Some call it alchemy and some intuition, but the task of discovering a novel architecture often involves a tedious and costly trial-and-error process of searching in an exponentially large…
TensorRT is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and a runtime that delivers low latency and high throughput for deep learning applications. TensorRT uses the ONNX format as an intermediate representation for converting models from major frameworks such as TensorFlow and PyTorch. In this post, you learn how to convert PyTorch…
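A common way to produce that ONNX intermediate representation from PyTorch is torch.onnx.export. The pretrained ResNet-50, input shape, opset, and output file name below are illustrative assumptions, not necessarily the model used in the post.

```python
# Export a PyTorch model to ONNX so TensorRT's ONNX parser can import it.
# Model choice, file name, input shape, and opset are assumptions for illustration.
import torch
import torchvision

model = torchvision.models.resnet50(pretrained=True).eval()
dummy_input = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model,
    dummy_input,
    "resnet50.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=13,
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
)
```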
Autonomous vehicles require fast and accurate perception of the surrounding environment in order to accomplish a wide set of tasks concurrently in real time. Systems need to handle obstacle detection, lane boundary determination, intersection detection, and sign recognition, among many other functions, across a wide variety of environments, conditions, and situations, and do this work…