NVIDIA Releases Updates to CUDA-X AI Software

NVIDIA CUDA-X AI are deep learning libraries for researchers and software developers to build high performance GPU-accelerated applications for conversational AI, recommendation systems and computer vision.

Learn what’s new in the latest releases of CUDA-X AI libraries.

Refer to each package’s release notes in documentation for additional information.

NVIDIA Riva Open Beta

NVIDIA Riva is an application framework for multimodal conversational AI services that delivers real-time performance on GPUs. This version of Riva includes:

ASR, NLU, and TTS models trained on thousands of hours of speech data.
Transfer Learning Toolkit with zero coding approach to re-train on custom data.
Fully accelerated deep learning pipelines optimized to run as scalable services.
End-to-end workflow and tools to deploy services using one line of code.

Transfer Learning Toolkit 3.0 Developer Preview

NVIDIA released new pre-trained models for computer vision and conversational AI that can be easily fine-tuned with Transfer Learning Toolkit (TLT) 3.0 with a zero-coding approach.

Key highlights:

New vision AI pre-trained models: license plate detection and recognition, heart rate monitoring, gesture recognition, gaze estimation, emotion recognition, face detection, and facial landmark estimation?
Newly added support for automatic speech recognition (ASR) and natural language processing (NLP)?
Choice of training with popular network architectures such as EfficientNet, YoloV4, and UNET
Support for NVIDIA Ampere GPUs with third-generation tensor cores for performance boost?

Triton Inference Server 2.7

Triton Inference Server is an open source multi-framework, cross platform inference serving software designed to simplify model production deployment. Version 2.7 includes:

Model Analyzer – automatically finds best model configuration to maximize performance based on user-specified requirements?
Model Repo Agent API? – enables custom operations to be performed to models being loaded (such as decrypting, checksumming, applying TF-TRT optimization, etc)
Added support for ONNX Runtime backend in Triton Windows build
Added an example Java and Scala client based on GRPC-generated API

Read full release notes here.

TensorRT 7.2 is Now Available

NVIDIA TensorRT is a platform for high-performance deep learning inference. This version of TensorRT includes:

New Polygraphy toolkit, assists in prototyping and debugging deep learning models in various frameworks
Support for Python 3.8

Merlin Open Beta

Merlin is an application framework and ecosystem that enables end-to-end development of recommender systems, accelerated on NVIDIA GPUs. Merlin Open Beta highlights include:

NVTabular and HugeCTR inference support in Triton Inference Server
Cloud configurations and cloud support (AWS/GCP)
Dataset analysis and generation tools
New PythonAPI for HugeCTR similar to Keras with no JSON configuration anymore

DeepStream SDK 5.1

NVIDIA DeepStream SDK is a streaming analytics toolkit for AI-based multi-sensor processing.

Key highlights for DeepStream SDK 5.1 (General Availability)

New Python apps for using optical flow, segmentation networks, and analytics using ROI and line crossing
Support for audio analytics with a sample application highlighting audio classifier usage
Support for NVIDIA Ampere GPUs with third-generation tensor cores and various performance optimizations

nvJPEG2000 0.2

nvJPEG2000 is a new library for GPU-accelerated JPEG2000 image decoding. This version of nvJPEG2000 includes:

Support for multi-tile and multi-layer decoding.?
Partial decode by specifying an area of interest for increased efficiency
New API for commonly used GDAL interface for geospatial images
4x faster lossless decoding for 5-3 wavelet decoding and 7x faster loss decoding for 9-7 wavelet transform. Achieve further speed up by pipelining decoding of multiple images.

NVIDIA NeMo 1.0.0b4

NVIDIA NeMo is a toolkit to build, train and fine-tune state-of-the-art speech and language models easily. Highlights of this version include:

Compatible with Riva 1.0.0b2 public beta and TLT 3.0 releases

Deep Learning Examples

Deep Learning Examples provide state-of-the-art reference examples that are easy to train and deploy, achieving the best reproducible accuracy and performance with NVIDIA CUDA-X AI software stack running on NVIDIA Volta, Turing, and Ampere GPUs.

New Model Scripts available from the NGC Catalog:

nnUNet/PyT: A Self-adapting Framework for U-Net for state-of-the-art Segmentation across distinct entities, image modalities, image geometries, and dataset sizes, with no manual adjustments between datasets.?
Wide and Deep/TF2: Wide & Deep refers to a class of networks that use the output of two parts working in parallel – wide model and deep model – to make a binary prediction of CTR.?
EfficientNet PyT & TF2: A model that scales the depth, width, and resolution to achieve better performance across different datasets. EfficientNet B4 achieves state-of-the-art 82.78% top-1 accuracy on ImageNet, while being 8.4x smaller and 6.1x faster on inference than the best existing ConvNet.
Electra: A novel pre-training method for language representations which outperforms existing techniques, given the same compute budget on a wide array of Natural Language Processing (NLP) tasks.