The rapid development of solutions using retrieval augmented generation (RAG) for question-and-answer LLM workflows has led to new types of system architectures. Our work at NVIDIA using AI for internal operations has led to several important findings for finding alignment between system capabilities and user expectations. We found that regardless of the intended scope or use case…
]]>Expanding the open-source Meta Llama collection of models, the Llama 3.2 collection includes vision language models (VLMs), small language models (SLMs), and an updated Llama Guard model with support for vision. When paired with the NVIDIA accelerated computing platform, Llama 3.2 offers developers, researchers, and enterprises valuable new capabilities and optimizations to realize their…
]]>Join our contest that runs through June 17 and showcase your innovation using cutting-edge generative AI-powered applications using NVIDIA and LangChain technologies. To get you started, we explore a few applications for inspiring your creative journey, while sharing tips and best practices to help you succeed in the development process. There are many different practical applications…
]]>At NVIDIA GTC 2024, it was announced that RAPIDS cuDF can now bring GPU acceleration to 9.5M million pandas users without requiring them to change their code. Update: RAPIDS cuDF now instantly accelerates pandas with zero code changes in Google Colab. Try out the tutorial in a Colab notebook today. pandas, a flexible and powerful data analysis and manipulation library for Python…
]]>If you are looking to take your machine learning (ML) projects to new levels of speed and scalability, GPU-accelerated data analytics can help you deliver insights quickly with breakthrough performance. From faster computation to efficient model training, GPUs bring many benefits to everyday ML tasks. Update: The below blog describes how to use GPU-only RAPIDS cuDF…
]]>Join the NVIDIA Triton and NVIDIA TensorRT community to stay current on the latest product updates, bug fixes, content, best practices, and more. As of March 18, 2025, NVIDIA Triton Inference Server is now part of the NVIDIA Dynamo Platform and has been renamed to NVIDIA Dynamo Triton, accordingly. Imagine that you have trained your model with PyTorch, TensorFlow, or the framework of…
]]>Join the NVIDIA Triton and NVIDIA TensorRT community to stay current on the latest product updates, bug fixes, content, best practices, and more. Today NVIDIA released TensorRT 8.2, with optimizations for billion parameter NLU models. These include T5 and GPT-2, used for translation and text generation, making it possible to run NLU apps in real time. TensorRT is a high-performance…
]]>Join the NVIDIA Triton and NVIDIA TensorRT community to stay current on the latest product updates, bug fixes, content, best practices, and more. The transformer architecture has wholly transformed (pun intended) the domain of natural language processing (NLP). Over the recent years, many novel network architectures have been built on the transformer building blocks: BERT, GPT, and T5…
]]>At NVIDIA GTC this November, new software tools were announced that help developers build real-time speech applications, optimize inference for a variety of use-cases, optimize open-source interoperability for recommender systems, and more. Watch the keynote from CEO, Jensen Huang, to learn about the latest NVIDIA breakthroughs. Today, NVIDIA unveiled a new version of NVIDIA Riva with a…
]]>Join NVIDIA November 8-11, showcasing over 500 GTC sessions covering the latest breakthroughs in AI and deep learning, as well as many other GPU technology interest areas. Below is a preview of some of the top AI and deep learning sessions including topics such as training, inference, frameworks, and tools—featuring speakers from NVIDIA. Deep Learning Demystified: AI has evolved and…
]]>Join the NVIDIA Triton and NVIDIA TensorRT community to stay current on the latest product updates, bug fixes, content, best practices, and more. Today, NVIDIA announced TensorRT 8.0 which brings BERT-Large inference latency down to 1.2 ms with new optimizations. This version also delivers 2x the accuracy for INT8 precision with Quantization Aware Training…
]]>Join the NVIDIA Triton and NVIDIA TensorRT community to stay current on the latest product updates, bug fixes, content, best practices, and more. Deep learning is revolutionizing the way that industries are delivering products and services. These services include object detection, classification, and segmentation for computer vision, and text extraction, classification…
]]>This post was updated July 20, 2021 to reflect NVIDIA TensorRT 8.0 updates. Join the NVIDIA Triton and NVIDIA TensorRT community to stay current on the latest product updates, bug fixes, content, best practices, and more. When deploying a neural network, it’s useful to think about how the network could be made to run faster or take less space. A more efficient network can make better…
]]>