NVIDIA Enterprise Reference Architectures (Enterprise RAs) can reduce the time and cost of deploying AI infrastructure solutions. They provide a streamlined approach for building flexible and cost-effective accelerated infrastructure while ensuring compatibility and interoperability. The latest Enterprise RA details an optimized cluster configuration for systems integrated with NVIDIA GH200…
NVIDIA recently announced a new generation of PC GPUs, the GeForce RTX 50 Series, alongside new AI-powered SDKs and tools for developers. Powered by the NVIDIA Blackwell architecture, fifth-generation Tensor Cores, and fourth-generation RT Cores, the GeForce RTX 50 Series delivers breakthroughs in AI-driven rendering, including neural shaders, digital human technologies, geometry, and lighting.
Includes C++ runtime support on Windows, enhanced dynamic shape support in converters, and support for PyTorch 2.4, CUDA 12.4, TensorRT 10.1, and Python 3.12.
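As a hedged sketch of what compiling with a dynamic input shape looks like in Torch-TensorRT (assumes a torchvision ResNet as a stand-in model and the `torch_tensorrt.Input` shape-range API; this is not code from the release notes):

```python
import torch
import torch_tensorrt
import torchvision.models as models

# Stand-in model for illustration.
model = models.resnet18().eval().cuda()

# Dynamic batch dimension: the engine is built to accept shapes from min to max.
inputs = [
    torch_tensorrt.Input(
        min_shape=(1, 3, 224, 224),
        opt_shape=(8, 3, 224, 224),
        max_shape=(16, 3, 224, 224),
        dtype=torch.float32,
    )
]

trt_model = torch_tensorrt.compile(model, inputs=inputs, enabled_precisions={torch.float32})
out = trt_model(torch.randn(4, 3, 224, 224, device="cuda"))  # any batch size in range
print(out.shape)
```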
Data loading is a critical aspect of deep learning workflows, whether you're focused on training or inference. However, it often presents a paradox: the need for a highly convenient solution that is simultaneously customizable. These two goals are notoriously difficult to reconcile. One of the traditional solutions to this problem is to scale out the processing and parallelize the user…
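As a minimal illustration of the "scale out and parallelize" approach mentioned above (a plain PyTorch DataLoader with worker processes and a synthetic stand-in dataset; not the post's own code):

```python
import torch
from torch.utils.data import DataLoader, Dataset

class SyntheticImages(Dataset):
    """Stand-in dataset; a real pipeline would decode and augment files here."""
    def __len__(self):
        return 10_000

    def __getitem__(self, idx):
        image = torch.randn(3, 224, 224)   # pretend this came from JPEG decoding
        label = idx % 10
        return image, label

if __name__ == "__main__":
    # Per-sample preprocessing is parallelized across 8 worker processes.
    loader = DataLoader(SyntheticImages(), batch_size=64, num_workers=8, pin_memory=True)
    images, labels = next(iter(loader))
    print(images.shape, labels.shape)
```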
Scientists have enabled a stroke survivor, who is unable to speak, to communicate in both Spanish and English by training a neuroprosthesis implant to decode his bilingual brain activity. The research, published in Nature Biomedical Engineering, comes from the lab of University of California, San Francisco professor Dr. Edward Chang. It builds on his groundbreaking work from 2021 with the…
Due to the adoption of multicamera inputs and deep convolutional backbone networks, the GPU memory footprint for training autonomous driving perception models is large. Existing methods for reducing memory usage often result in additional computational overheads or imbalanced workloads. This post describes joint research between NVIDIA and NIO, a developer of smart electric vehicles.
The NVIDIA PyG container, now generally available, packages PyTorch Geometric with accelerations for GNN models, dataloading, and pre-processing using cuGraph-Ops, cuGraph, and cuDF from NVIDIA RAPIDS, all with an effortless out-of-the-box experience.
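A minimal sketch of the PyTorch Geometric workflow the container packages (a tiny two-layer GCN on a toy graph, run on the GPU; illustrative only, not container-specific code):

```python
import torch
import torch.nn.functional as F
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv

# Toy graph: 3 nodes, 2 undirected edges, 8 features per node.
edge_index = torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]], dtype=torch.long)
data = Data(x=torch.randn(3, 8), edge_index=edge_index).to("cuda")

class GCN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(8, 16)
        self.conv2 = GCNConv(16, 2)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)

model = GCN().to("cuda")
out = model(data.x, data.edge_index)   # per-node class scores, shape [3, 2]
print(out.shape)
```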
From credit card transactions, social networks, and recommendation systems to transportation networks and protein-protein interactions in biology, graphs are the go-to data structure for modeling and analyzing intricate connections. Graph neural networks (GNNs), with their ability to learn and reason over graph-structured data, have emerged as a game-changer across various domains. However…
Efficiency is paramount in industrial manufacturing, where even minor gains can have significant financial implications. According to the American Society of Quality, "Many organizations will have true quality-related costs as high as 15-20% of sales revenue, some going as high as 40% of total operations." These staggering statistics reveal a stark reality: defects in industrial applications not…
Interested in developing LLM-based applications? Get started with this exploration of the open-source ecosystem.
Learn how to train the largest neural networks and deploy them to production.
The NVIDIA AI Red Team is focused on scaling secure development practices across the data science and AI ecosystems. We participate in open-source security initiatives, release tools, present at industry conferences, host educational competitions, and provide innovative training. Covering 3 years and totaling almost 140 GB of source code, the recently released Meta Kaggle for Code dataset is…
NVIDIA PhysicsNeMo is now part of the NVIDIA AI Enterprise suite, with support for PyTorch 2.0 and CUDA 12, plus new samples.
A primary goal in the field of neuroscience is understanding how the brain controls movement. By improving pose estimation, neurobiologists can more precisely quantify natural movement and, in turn, better understand the neural activity that drives it. This enhances scientists' ability to characterize animal intelligence, social interaction, and health.
NVIDIA PhysicsNeMo is a framework for building, training, and fine-tuning deep learning models for physical systems, otherwise known as physics-informed machine learning (physics-ML) models. PhysicsNeMo is available as OSS (Apache 2.0 license) to support the growing physics-ML community. The latest PhysicsNeMo software update, version 23.05, brings together new capabilities…
The training stage of deep learning (DL) models consists of learning numerous dense floating-point weight matrices, which results in a massive amount of floating-point computations during inference. Research has shown that many of those computations can be skipped by forcing some weights to be zero, with little impact on the final accuracy. In parallel to that, previous posts have shown that…
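As a generic illustration of zeroing low-magnitude weights (plain PyTorch magnitude pruning on a single layer; not the structured-sparsity workflow the post itself describes):

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(256, 256)

# Force the 50% smallest-magnitude weights to zero; the original weights are kept
# behind a mask until the pruning is made permanent.
prune.l1_unstructured(layer, name="weight", amount=0.5)
print(f"sparsity: {(layer.weight == 0).float().mean().item():.2f}")

prune.remove(layer, "weight")   # bake the zeros into the weight tensor
```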
Deep learning models require hundreds of gigabytes of data to generalize well on unseen samples. Data augmentation helps by increasing the variability of examples in datasets. The traditional approach to data augmentation dates back to statistical learning, when the choice of augmentation relied on the domain knowledge, skill, and intuition of the engineers who set up the model training.
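For example, a classic hand-designed augmentation pipeline of this kind might look like the following sketch (torchvision transforms applied to a placeholder image; illustrative only):

```python
from PIL import Image
from torchvision import transforms

# Each transform encodes an engineer's prior about which variations the model
# should be invariant to.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.ToTensor(),
])

image = Image.new("RGB", (256, 256), color=(128, 64, 32))   # placeholder image
augmented = augment(image)
print(augmented.shape)   # torch.Size([3, 224, 224])
```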
Transformers are one of the most influential AI model architectures today and are shaping the direction of future AI R&D. First invented as a tool for natural language processing (NLP), transformers are now used in almost every AI task, including computer vision, automatic speech recognition, molecular structure classification, and financial data processing. In Korea…
The Dataiku platform for everyday AI simplifies deep learning. Use cases are far-reaching, from image classification to object detection and natural language processing (NLP). Dataiku helps you with labeling, model training, explainability, model deployment, and centralized management of code and code environments. This post dives into high-level Dataiku and NVIDIA integrations for image…
Learn the basic concepts, implementations, and applications of graph neural networks (GNNs) in this new self-paced course from NVIDIA Deep Learning Institute.
Learn how to decrease model training time by distributing data to multiple GPUs, while retaining the accuracy of training on a single GPU.
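A minimal, self-contained sketch of the data-parallel pattern this course covers (PyTorch DistributedDataParallel on a synthetic dataset; assumes launch via torchrun and is not course material):

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

# Launch with: torchrun --nproc_per_node=<num_gpus> train.py
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

dataset = TensorDataset(torch.randn(1024, 16), torch.randn(1024, 1))
sampler = DistributedSampler(dataset)                     # each rank sees a distinct shard
loader = DataLoader(dataset, batch_size=64, sampler=sampler)

model = DDP(nn.Linear(16, 1).cuda(), device_ids=[local_rank])
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for epoch in range(3):
    sampler.set_epoch(epoch)                              # reshuffle shards each epoch
    for x, y in loader:
        x, y = x.cuda(), y.cuda()
        opt.zero_grad()
        loss_fn(model(x), y).backward()                   # gradients are all-reduced by DDP
        opt.step()

dist.destroy_process_group()
```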
Join us on October 24 for a deep dive into MONAI, the essential framework for AI workflows in healthcare, including use cases, building blocks, and more.
Join us for these GTC 2022 sessions to learn about optimizing PyTorch models, accelerating graph neural networks, improving GPU performance, and more.
Research on neural fields has been an increasingly hot topic in computer graphics and computer vision in recent years. Neural fields can represent 3D data like shape, appearance, motion, and other physical quantities by using a neural network that takes coordinates as input and outputs the corresponding data at that location. These representations have been proven to be useful in various…
If you're an active Jetson developer, you know that one of the key benefits of NVIDIA Jetson is that it combines a CPU and GPU into a single module, giving you the expansive NVIDIA software stack in a small, low-power package that can be deployed at the edge. Jetson also features a variety of other processors, including hardware-accelerated encoders and decoders, an image signal processor…
Join the NVIDIA Triton and NVIDIA TensorRT community to stay current on the latest product updates, bug fixes, content, best practices, and more. As of 3/18/25, NVIDIA Triton Inference Server is now NVIDIA Dynamo. Imagine that you have trained your model with PyTorch, TensorFlow, or the framework of your choice, are satisfied with its accuracy, and are considering deploying it as a…
Computer vision is a rapidly growing field in research and applications. Advances in computer vision research are now more directly and immediately applicable to the commercial world. AI developers are implementing computer vision solutions that identify and classify objects and even react to them in real time. Image classification, face detection, pose estimation, and optical flow are some…
Big data, new algorithms, and fast computation are three main factors that make the modern AI revolution possible. However, data poses many challenges for enterprises: difficulty in data labeling, ineffective data governance, limited data availability, data privacy, and so on. Synthetically generated data is a potential solution to address these challenges because it generates data points by…
In this post, we detail the recently released NVIDIA Time Series Prediction Platform (TSPP), a tool designed to easily compare and experiment with arbitrary combinations of forecasting models, time-series datasets, and other configurations. The TSPP also provides functionality to explore the hyperparameter search space, run accelerated model training using distributed training and Automatic Mixed…
The NVIDIA NGC catalog is a hub for GPU-optimized deep learning, machine learning, and HPC applications. With highly performant software containers, pretrained models, industry-specific SDKs, and Jupyter Notebooks, the content helps simplify and accelerate end-to-end workflows. New features, software, and updates to help you streamline your workflow and build your solutions faster on NGC…
Join the NVIDIA Triton and NVIDIA TensorRT community to stay current on the latest product updates, bug fixes, content, best practices, and more. Today NVIDIA released TensorRT 8.2, with optimizations for billion parameter NLU models. These include T5 and GPT-2, used for translation and text generation, making it possible to run NLU apps in real time. TensorRT is a high-performance…
Join the NVIDIA Triton and NVIDIA TensorRT community to stay current on the latest product updates, bug fixes, content, best practices, and more. I'm excited about Torch-TensorRT, the new integration of PyTorch with NVIDIA TensorRT, which accelerates inference with one line of code. PyTorch is a leading deep learning framework today, with millions of users worldwide. TensorRT is an SDK…
In MLPerf HPC v1.0, NVIDIA-powered systems won four of five new industry metrics focused on AI performance in HPC. As an industry-wide AI consortium, MLPerf HPC evaluates a suite of performance benchmarks covering a range of widely used AI workloads. In this round, NVIDIA delivered 5x better results for CosmoFlow, and 7x more performance on DeepCAM, compared to strong scaling results from…
NVIDIA GTC is the must-attend AI conference for developers. It's a place where practitioners, leaders, and innovators share their ideas about the latest trends in data science. Here are six top data science GTC sessions worth attending. Thursday, Nov 11, 5:00 AM to 5:25 AM PST: Domino's Pizza delivers thousands of pizzas a day and requires real-time planning and logistics capabilities.
SE(3)-Transformers are versatile graph neural networks unveiled at NeurIPS 2020. NVIDIA just released an open-source optimized implementation that uses 43x less memory and is up to 21x faster than the baseline official implementation. SE(3)-Transformers are useful in dealing with problems with geometric symmetries, like small molecule processing, protein refinement…
The NGC team is hosting a webinar with live Q&A to dive into how to build AI models using PyTorch Lightning, an AI framework built on top of PyTorch, from the NGC catalog. Simplify and Accelerate AI Model Development with PyTorch Lightning, NGC, and AWS: September 2 at 10 a.m. PT. Organizations across industries are using AI to help build better products, streamline operations…
In part 1 of this series, we introduced the new API functions cudaMallocAsync and cudaFreeAsync, which enable memory allocation and deallocation to be stream-ordered operations. In this post, we highlight the benefits of this new capability by sharing some big data benchmark results and provide a code migration guide for modifying your existing applications. We also cover advanced topics to take advantage of stream-ordered…
Most CUDA developers are familiar with the cudaMalloc and cudaFree API functions to allocate GPU-accessible memory. However, there has long been an obstacle with these API functions: they aren't stream ordered. In this post, we introduce the new API functions cudaMallocAsync and cudaFreeAsync, which enable memory allocation and deallocation to be stream-ordered operations. In part 2 of this series, we highlight the benefits of this new…
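These two posts describe the CUDA C APIs directly. As a rough Python-side illustration of the same stream-ordered allocation idea, CuPy can route its allocations through the cudaMallocAsync pool (assumes CuPy 9+ and CUDA 11.2+; this is a stand-in, not code from the posts):

```python
import cupy as cp

# Route CuPy allocations through the stream-ordered (cudaMallocAsync) memory pool.
cp.cuda.set_allocator(cp.cuda.MemoryAsyncPool().malloc)

stream = cp.cuda.Stream(non_blocking=True)
with stream:
    x = cp.random.rand(1 << 20)      # allocation is ordered on this stream
    y = float((x * 2.0).sum())       # copying the scalar to host synchronizes
print(y)
```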
King's College London, along with partner hospitals and university collaborators, unveiled new details today about one of the first projects on Cambridge-1, the United Kingdom's most powerful supercomputer. The Synthetic Brain Project is focused on building deep learning models that can synthesize artificial 3D MRI images of human brains. These models can help scientists understand what a human…
Solving a mystery that stumped scientists for decades, last November a group of computational biologists from Alphabet's DeepMind used AI to predict a protein's structure from its amino acid sequence. Not even a year later, a new study offers a more powerful model, capable of computing protein structures in as little as 10 minutes, on one gaming computer. The research…
Deep learning research requires working at scale. Training on massive datasets or multilayered deep networks is computationally intensive and can take an impractically long time, as deep learning models are bound by memory. The key here is to compose deep learning models in a structured way so that they are decoupled from the engineering and data, enabling researchers to iterate quickly.
NVIDIA recently announced Morpheus, an AI application framework that provides cybersecurity developers with a highly optimized AI pipeline and pre-trained AI capabilities. Morpheus allows developers, for the first time, to instantaneously inspect all IP network communications through their data center fabric. Attacks are becoming more frequent and dangerous despite the advancements in…
This post was originally published on the RAPIDS AI blog. In this post, we take a look at how to use cuDF, the RAPIDS dataframe library, to do some of the preprocessing steps required to get the mortgage data into a format that PyTorch can process, so that we can explore the performance of deep learning on tabular data and compare it to the XGBoost method. Deep learning is…
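A rough sketch of the kind of cuDF-to-PyTorch handoff described here, using synthetic rows in place of the mortgage data (the to_dlpack/from_dlpack calls are the assumed zero-copy path; not the post's exact code):

```python
import cudf
import torch
from torch.utils.dlpack import from_dlpack

# Synthetic rows standing in for the mortgage dataset.
df = cudf.DataFrame({
    "loan_age": [1.0, 2.0, None, 4.0],
    "rate": [3.5, 4.0, 3.9, 4.2],
    "delinquent": [0, 0, 1, 0],
})

# Typical GPU-side preprocessing: impute missing values, cast to a uniform dtype.
df["loan_age"] = df["loan_age"].fillna(df["loan_age"].mean())
features = df[["loan_age", "rate"]].astype("float32")

# Hand the columns to PyTorch via DLPack; the data stays on the GPU.
X = from_dlpack(features.to_dlpack())
y = from_dlpack(df["delinquent"].astype("float32").to_dlpack())
print(X.shape, y.shape)
```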
Recently, NVIDIA CEO Jensen Huang announced updates to the open beta of NVIDIA Merlin, an end-to-end framework that democratizes the development of large-scale deep learning recommenders. With NVIDIA Merlin, data scientists, machine learning engineers, and researchers can accelerate their entire workflow pipeline from ingesting and training to deploying GPU-accelerated recommenders (Figure 1).
Software profiling is key for achieving the best performance on a system, and that's true for data science and machine learning applications as well. In the era of GPU-accelerated deep learning, when profiling deep neural networks, it is important to understand CPU, GPU, and even memory bottlenecks, which could cause slowdowns in training or inference. In this post…
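The post covers NVIDIA's profiling tools; as a quick illustration of surfacing CPU/GPU bottlenecks from Python, PyTorch's built-in profiler can be used like this (a stand-in example, not the tooling the post itself describes):

```python
import torch
import torchvision.models as models
from torch.profiler import ProfilerActivity, profile

model = models.resnet18().cuda().eval()
x = torch.randn(8, 3, 224, 224, device="cuda")

# Capture CPU- and GPU-side activity for a few inference steps.
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    with torch.no_grad():
        for _ in range(5):
            model(x)

print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```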
Imagine that you have just finished implementing an awesome, interactive deep learning pipeline on your NVIDIA-accelerated data science workstation, using OpenCV for capturing your webcam stream and rendering the output. A colleague of yours mentions that exploiting the novel TF32 compute mode of the Ampere microarchitecture third-generation Tensor Cores might significantly accelerate your…
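For the PyTorch side of such a pipeline, TF32 can be switched on with two flags, sketched below (illustrative only; the post's pipeline also involves the OpenCV capture and rendering steps mentioned above):

```python
import torch

# Opt in to TF32 Tensor Core math for matmuls and cuDNN operations (Ampere or newer GPUs).
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")
c = a @ b   # executed with TF32 Tensor Cores when the hardware supports it
print(c.shape)
```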
AI is going mainstream and is quickly becoming pervasive in every industry, from autonomous vehicles to drug discovery. However, developing and deploying AI applications is a challenging endeavor. The process requires building a scalable infrastructure by combining hardware, software, and intricate workflows, which can be time-consuming as well as error-prone. To accelerate the end-to-end AI…
WSL2 is available on Windows 11 outside the Windows Insider Preview. For more information about what is supported, see the CUDA on WSL User Guide. In response to popular demand, Microsoft announced a new feature of the Windows Subsystem for Linux 2 (WSL 2), GPU acceleration, at the Build conference in May 2020. This feature opens the gate for many compute applications, professional tools…
This post is the second in a series (Part 1) that addresses the challenges of training an accurate deep learning model using a large public dataset and deploying the model on the edge for real-time inference using NVIDIA DeepStream. In the previous post, you learned how to train a RetinaNet network with a ResNet34 backbone for object detection. This included pulling a container…
Some of the biggest challenges in deploying an AI-based application are the accuracy of the model and being able to extract insights in real time. There's a trade-off between accuracy and inference throughput: making the model more accurate makes it larger, which reduces the inference throughput. This post series addresses both challenges. In part 1, you train an accurate…
Most deep learning frameworks, including PyTorch, train using 32-bit floating point (FP32) arithmetic by default. However, using FP32 for all operations is not essential to achieve full accuracy for many state-of-the-art deep neural networks (DNNs). In 2017, NVIDIA researchers developed a methodology for mixed-precision training in which a few operations are executed in FP32 while the majority…
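A minimal sketch of mixed-precision training with PyTorch automatic mixed precision (a generic example on a toy model, not the methodology post's own code):

```python
import torch
import torch.nn as nn

model = nn.Linear(1024, 1024).cuda()
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

data = torch.randn(64, 1024, device="cuda")
target = torch.randn(64, 1024, device="cuda")

for _ in range(10):
    opt.zero_grad()
    with torch.cuda.amp.autocast():          # FP16 where safe, FP32 elsewhere
        loss = nn.functional.mse_loss(model(data), target)
    scaler.scale(loss).backward()            # scale the loss to avoid FP16 gradient underflow
    scaler.step(opt)
    scaler.update()
```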
The pace of AI adoption across diverse industries depends on maximizing data scientists' productivity. NVIDIA releases optimized NGC containers every month with improved performance for deep learning frameworks and libraries, helping scientists maximize their potential. NVIDIA continuously invests in the full data science stack, including GPU architecture, systems, and software stacks.
From Siri to Google Translate, deep neural networks have enabled breakthroughs in machine understanding of natural language. Most of these models treat language as a flat sequence of words or characters, and use a kind of model called a recurrent neural network (RNN) to process this sequence. But many linguists think that language is best understood as a hierarchical tree of phrases…