NVIDIA Enterprise Reference Architectures (Enterprise RAs) can reduce the time and cost of deploying AI infrastructure solutions. They provide a streamlined approach for building flexible and cost-effective accelerated infrastructure while ensuring compatibility and interoperability. The latest Enterprise RA details an optimized cluster configuration for systems integrated with NVIDIA GH200…
NVIDIA recently announced a new generation of PC GPUs, the GeForce RTX 50 Series, alongside new AI-powered SDKs and tools for developers. Powered by the NVIDIA Blackwell architecture, fifth-generation Tensor Cores, and fourth-generation RT Cores, the GeForce RTX 50 Series delivers breakthroughs in AI-driven rendering, including neural shaders, digital human technologies, geometry, and lighting.
Includes C++ runtime support on Windows, enhanced dynamic shape support in converters, and support for PyTorch 2.4, CUDA 12.4, TensorRT 10.1, and Python 3.12.
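As a hedged sketch of what compiling with a dynamic input shape looks like in Torch-TensorRT (assumes a torchvision ResNet as a stand-in model and the `torch_tensorrt.Input` shape-range API; this is not code from the release notes):

```python
import torch
import torch_tensorrt
import torchvision.models as models

# Stand-in model for illustration.
model = models.resnet18().eval().cuda()

# Dynamic batch dimension: the engine is built to accept shapes from min to max.
inputs = [
    torch_tensorrt.Input(
        min_shape=(1, 3, 224, 224),
        opt_shape=(8, 3, 224, 224),
        max_shape=(16, 3, 224, 224),
        dtype=torch.float32,
    )
]

trt_model = torch_tensorrt.compile(model, inputs=inputs, enabled_precisions={torch.float32})
out = trt_model(torch.randn(4, 3, 224, 224, device="cuda"))  # any batch size in range
print(out.shape)
```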
Data loading is a critical aspect of deep learning workflows, whether you're focused on training or inference. However, it often presents a paradox: the need for a highly convenient solution that is simultaneously customizable. These two goals are notoriously difficult to reconcile. One of the traditional solutions to this problem is to scale out the processing and parallelize the user…
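As a minimal illustration of the "scale out and parallelize" approach mentioned above (a plain PyTorch DataLoader with worker processes and a synthetic stand-in dataset; not the post's own code):

```python
import torch
from torch.utils.data import DataLoader, Dataset

class SyntheticImages(Dataset):
    """Stand-in dataset; a real pipeline would decode and augment files here."""
    def __len__(self):
        return 10_000

    def __getitem__(self, idx):
        image = torch.randn(3, 224, 224)   # pretend this came from JPEG decoding
        label = idx % 10
        return image, label

if __name__ == "__main__":
    # Per-sample preprocessing is parallelized across 8 worker processes.
    loader = DataLoader(SyntheticImages(), batch_size=64, num_workers=8, pin_memory=True)
    images, labels = next(iter(loader))
    print(images.shape, labels.shape)
```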
Scientists have enabled a stroke survivor, who is unable to speak, to communicate in both Spanish and English by training a neuroprosthesis implant to decode his bilingual brain activity. The research, published in Nature Biomedical Engineering, comes from the lab of University of California, San Francisco professor Dr. Edward Chang. It builds on his groundbreaking work from 2021 with the…
Due to the adoption of multicamera inputs and deep convolutional backbone networks, the GPU memory footprint for training autonomous driving perception models is large. Existing methods for reducing memory usage often result in additional computational overheads or imbalanced workloads. This post describes joint research between NVIDIA and NIO, a developer of smart electric vehicles.
The NVIDIA PyG container, now generally available, packages PyTorch Geometric with accelerations for GNN models, dataloading, and pre-processing using cuGraph-Ops, cuGraph, and cuDF from NVIDIA RAPIDS, all with an effortless out-of-the-box experience.
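A minimal sketch of the PyTorch Geometric workflow the container packages (a tiny two-layer GCN on a toy graph, run on the GPU; illustrative only, not container-specific code):

```python
import torch
import torch.nn.functional as F
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv

# Toy graph: 3 nodes, 2 undirected edges, 8 features per node.
edge_index = torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]], dtype=torch.long)
data = Data(x=torch.randn(3, 8), edge_index=edge_index).to("cuda")

class GCN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(8, 16)
        self.conv2 = GCNConv(16, 2)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)

model = GCN().to("cuda")
out = model(data.x, data.edge_index)   # per-node class scores, shape [3, 2]
print(out.shape)
```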
From credit card transactions, social networks, and recommendation systems to transportation networks and protein-protein interactions in biology, graphs are the go-to data structure for modeling and analyzing intricate connections. Graph neural networks (GNNs), with their ability to learn and reason over graph-structured data, have emerged as a game-changer across various domains. However…
Efficiency is paramount in industrial manufacturing, where even minor gains can have significant financial implications. According to the American Society of Quality, "Many organizations will have true quality-related costs as high as 15-20% of sales revenue, some going as high as 40% of total operations." These staggering statistics reveal a stark reality: defects in industrial applications not…
Interested in developing LLM-based applications? Get started with this exploration of the open-source ecosystem.
Learn how to train the largest neural networks and deploy them to production.
The NVIDIA AI Red Team is focused on scaling secure development practices across the data science and AI ecosystems. We participate in open-source security initiatives, release tools, present at industry conferences, host educational competitions, and provide innovative training. Covering 3 years and totaling almost 140 GB of source code, the recently released Meta Kaggle for Code dataset is…
NVIDIA PhysicsNeMo is now part of the NVIDIA AI Enterprise suite, with support for PyTorch 2.0 and CUDA 12, plus new samples.
A primary goal in the field of neuroscience is understanding how the brain controls movement. By improving pose estimation, neurobiologists can more precisely quantify natural movement and, in turn, better understand the neural activity that drives it. This enhances scientists' ability to characterize animal intelligence, social interaction, and health.
NVIDIA PhysicsNeMo is a framework for building, training, and fine-tuning deep learning models for physical systems, otherwise known as physics-informed machine learning (physics-ML) models. PhysicsNeMo is available as OSS (Apache 2.0 license) to support the growing physics-ML community. The latest PhysicsNeMo software update, version 23.05, brings together new capabilities…
The training stage of deep learning (DL) models consists of learning numerous dense floating-point weight matrices, which results in a massive amount of floating-point computations during inference. Research has shown that many of those computations can be skipped by forcing some weights to be zero, with little impact on the final accuracy. In parallel to that, previous posts have shown that…
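As a generic illustration of zeroing low-magnitude weights (plain PyTorch magnitude pruning on a single layer; not the structured-sparsity workflow the post itself describes):

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(256, 256)

# Force the 50% smallest-magnitude weights to zero; the original weights are kept
# behind a mask until the pruning is made permanent.
prune.l1_unstructured(layer, name="weight", amount=0.5)
print(f"sparsity: {(layer.weight == 0).float().mean().item():.2f}")

prune.remove(layer, "weight")   # bake the zeros into the weight tensor
```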
Deep learning models require hundreds of gigabytes of data to generalize well on unseen samples. Data augmentation helps by increasing the variability of examples in datasets. The traditional approach to data augmentation dates back to statistical learning, when the choice of augmentation relied on the domain knowledge, skill, and intuition of the engineers who set up the model training.
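For example, a classic hand-designed augmentation pipeline of this kind might look like the following sketch (torchvision transforms applied to a placeholder image; illustrative only):

```python
from PIL import Image
from torchvision import transforms

# Each transform encodes an engineer's prior about which variations the model
# should be invariant to.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.ToTensor(),
])

image = Image.new("RGB", (256, 256), color=(128, 64, 32))   # placeholder image
augmented = augment(image)
print(augmented.shape)   # torch.Size([3, 224, 224])
```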
Transformers are one of the most influential AI model architectures today and are shaping the direction of future AI R&D. First invented as a tool for natural language processing (NLP), transformers are now used in almost every AI task, including computer vision, automatic speech recognition, molecular structure classification, and financial data processing. In Korea…
The Dataiku platform for everyday AI simplifies deep learning. Use cases are far-reaching, from image classification to object detection and natural language processing (NLP). Dataiku helps you with labeling, model training, explainability, model deployment, and centralized management of code and code environments. This post dives into high-level Dataiku and NVIDIA integrations for image…
Learn the basic concepts, implementations, and applications of graph neural networks (GNNs) in this new self-paced course from NVIDIA Deep Learning Institute.
Learn how to decrease model training time by distributing data to multiple GPUs, while retaining the accuracy of training on a single GPU.
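A minimal, self-contained sketch of the data-parallel pattern this course covers (PyTorch DistributedDataParallel on a synthetic dataset; assumes launch via torchrun and is not course material):

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

# Launch with: torchrun --nproc_per_node=<num_gpus> train.py
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

dataset = TensorDataset(torch.randn(1024, 16), torch.randn(1024, 1))
sampler = DistributedSampler(dataset)                     # each rank sees a distinct shard
loader = DataLoader(dataset, batch_size=64, sampler=sampler)

model = DDP(nn.Linear(16, 1).cuda(), device_ids=[local_rank])
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for epoch in range(3):
    sampler.set_epoch(epoch)                              # reshuffle shards each epoch
    for x, y in loader:
        x, y = x.cuda(), y.cuda()
        opt.zero_grad()
        loss_fn(model(x), y).backward()                   # gradients are all-reduced by DDP
        opt.step()

dist.destroy_process_group()
```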
Join us on October 24 for a deep dive into MONAI, the essential framework for AI workflows in healthcare, including use cases, building blocks, and more.
Join us for these GTC 2022 sessions to learn about optimizing PyTorch models, accelerating graph neural networks, improving GPU performance, and more.
Research on neural fields has been an increasingly hot topic in computer graphics and computer vision in recent years. Neural fields can represent 3D data like shape, appearance, motion, and other physical quantities by using a neural network that takes coordinates as input and outputs the corresponding data at that location. These representations have been proven to be useful in various…
If you're an active Jetson developer, you know that one of the key benefits of NVIDIA Jetson is that it combines a CPU and GPU into a single module, giving you the expansive NVIDIA software stack in a small, low-power package that can be deployed at the edge. Jetson also features a variety of other processors, including hardware-accelerated encoders and decoders, an image signal processor…
Join the NVIDIA Triton and NVIDIA TensorRT community to stay current on the latest product updates, bug fixes, content, best practices, and more. As of 3/18/25, NVIDIA Triton Inference Server is now NVIDIA Dynamo. Imagine that you have trained your model with PyTorch, TensorFlow, or the framework of your choice, are satisfied with its accuracy, and are considering deploying it as a…
Computer vision is a rapidly growing field in research and applications. Advances in computer vision research are now more directly and immediately applicable to the commercial world. AI developers are implementing computer vision solutions that identify and classify objects and even react to them in real time. Image classification, face detection, pose estimation, and optical flow are some…
Big data, new algorithms, and fast computation are three main factors that make the modern AI revolution possible. However, data poses many challenges for enterprises: difficulty in data labeling, ineffective data governance, limited data availability, data privacy, and so on. Synthetically generated data is a potential solution to address these challenges because it generates data points by…
In this post, we detail the recently released NVIDIA Time Series Prediction Platform (TSPP), a tool designed to easily compare and experiment with arbitrary combinations of forecasting models, time-series datasets, and other configurations. The TSPP also provides functionality to explore the hyperparameter search space, run accelerated model training using distributed training and Automatic Mixed…
The NVIDIA NGC catalog is a hub for GPU-optimized deep learning, machine learning, and HPC applications. With highly performant software containers, pretrained models, industry-specific SDKs, and Jupyter Notebooks, the content helps simplify and accelerate end-to-end workflows. New features, software, and updates to help you streamline your workflow and build your solutions faster on NGC…
Join the NVIDIA Triton and NVIDIA TensorRT community to stay current on the latest product updates, bug fixes, content, best practices, and more. Today NVIDIA released TensorRT 8.2, with optimizations for billion parameter NLU models. These include T5 and GPT-2, used for translation and text generation, making it possible to run NLU apps in real time. TensorRT is a high-performance…
Join the NVIDIA Triton and NVIDIA TensorRT community to stay current on the latest product updates, bug fixes, content, best practices, and more. I'm excited about Torch-TensorRT, the new integration of PyTorch with NVIDIA TensorRT, which accelerates inference with one line of code. PyTorch is a leading deep learning framework today, with millions of users worldwide. TensorRT is an SDK…
In MLPerf HPC v1.0, NVIDIA-powered systems won four of five new industry metrics focused on AI performance in HPC. As an industry-wide AI consortium, MLPerf HPC evaluates a suite of performance benchmarks covering a range of widely used AI workloads. In this round, NVIDIA delivered 5x better results for CosmoFlow, and 7x more performance on DeepCAM, compared to strong scaling results from…
NVIDIA GTC is the must-attend AI conference for developers. It's a place where practitioners, leaders, and innovators share their ideas about the latest trends in data science. Here are six top data science GTC sessions worth attending. Thursday, Nov 11, 5:00 AM to 5:25 AM PST: Domino's Pizza delivers thousands of pizzas a day and requires real-time planning and logistics capabilities.
SE(3)-Transformers are versatile graph neural networks unveiled at NeurIPS 2020. NVIDIA just released an open-source optimized implementation that uses 43x less memory and is up to 21x faster than the baseline official implementation. SE(3)-Transformers are useful in dealing with problems with geometric symmetries, like small molecule processing, protein refinement…
The NGC team is hosting a webinar with live Q&A to dive into how to build AI models using PyTorch Lightning, an AI framework built on top of PyTorch, from the NGC catalog. Simplify and Accelerate AI Model Development with PyTorch Lightning, NGC, and AWS: September 2 at 10 a.m. PT. Organizations across industries are using AI to help build better products, streamline operations…
In part 1 of this series, we introduced the new API functions cudaMallocAsync and cudaFreeAsync, which enable memory allocation and deallocation to be stream-ordered operations. In this post, we highlight the benefits of this new capability by sharing some big data benchmark results and provide a code migration guide for modifying your existing applications. We also cover advanced topics to take advantage of stream-ordered…
Most CUDA developers are familiar with the cudaMalloc and cudaFree API functions to allocate GPU-accessible memory. However, there has long been an obstacle with these API functions: they aren't stream ordered. In this post, we introduce the new API functions cudaMallocAsync and cudaFreeAsync, which enable memory allocation and deallocation to be stream-ordered operations. In part 2 of this series, we highlight the benefits of this new…
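These two posts describe the CUDA C APIs directly. As a rough Python-side illustration of the same stream-ordered allocation idea, CuPy can route its allocations through the cudaMallocAsync pool (assumes CuPy 9+ and CUDA 11.2+; this is a stand-in, not code from the posts):

```python
import cupy as cp

# Route CuPy allocations through the stream-ordered (cudaMallocAsync) memory pool.
cp.cuda.set_allocator(cp.cuda.MemoryAsyncPool().malloc)

stream = cp.cuda.Stream(non_blocking=True)
with stream:
    x = cp.random.rand(1 << 20)      # allocation is ordered on this stream
    y = float((x * 2.0).sum())       # copying the scalar to host synchronizes
print(y)
```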
King's College London, along with partner hospitals and university collaborators, unveiled new details today about one of the first projects on Cambridge-1, the United Kingdom's most powerful supercomputer. The Synthetic Brain Project is focused on building deep learning models that can synthesize artificial 3D MRI images of human brains. These models can help scientists understand what a human…
Solving a mystery that stumped scientists for decades, last November a group of computational biologists from Alphabet's DeepMind used AI to predict a protein's structure from its amino acid sequence. Not even a year later, a new study offers a more powerful model, capable of computing protein structures in as little as 10 minutes, on one gaming computer. The research…
Deep learning research requires working at scale. Training on massive datasets or multilayered deep networks is computationally intensive and can take an impractically long time, as deep learning models are bound by memory. The key here is to compose deep learning models in a structured way so that they are decoupled from the engineering and data, enabling researchers to iterate quickly.
NVIDIA recently announced Morpheus, an AI application framework that provides cybersecurity developers with a highly optimized AI pipeline and pre-trained AI capabilities. Morpheus allows developers, for the first time, to instantaneously inspect all IP network communications through their data center fabric. Attacks are becoming more frequent and dangerous despite the advancements in…
This post was originally published on the RAPIDS AI blog. In this post, we take a look at how to use cuDF, the RAPIDS dataframe library, to do some of the preprocessing steps required to get the mortgage data into a format that PyTorch can process, so that we can explore the performance of deep learning on tabular data and compare it to the XGBoost method. Deep learning is…
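A rough sketch of the kind of cuDF-to-PyTorch handoff described here, using synthetic rows in place of the mortgage data (the to_dlpack/from_dlpack calls are the assumed zero-copy path; not the post's exact code):

```python
import cudf
import torch
from torch.utils.dlpack import from_dlpack

# Synthetic rows standing in for the mortgage dataset.
df = cudf.DataFrame({
    "loan_age": [1.0, 2.0, None, 4.0],
    "rate": [3.5, 4.0, 3.9, 4.2],
    "delinquent": [0, 0, 1, 0],
})

# Typical GPU-side preprocessing: impute missing values, cast to a uniform dtype.
df["loan_age"] = df["loan_age"].fillna(df["loan_age"].mean())
features = df[["loan_age", "rate"]].astype("float32")

# Hand the columns to PyTorch via DLPack; the data stays on the GPU.
X = from_dlpack(features.to_dlpack())
y = from_dlpack(df["delinquent"].astype("float32").to_dlpack())
print(X.shape, y.shape)
```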
Recently, NVIDIA CEO Jensen Huang announced updates to the open beta of NVIDIA Merlin, an end-to-end framework that democratizes the development of large-scale deep learning recommenders. With NVIDIA Merlin, data scientists, machine learning engineers, and researchers can accelerate their entire workflow pipeline from ingesting and training to deploying GPU-accelerated recommenders (Figure 1).
Software profiling is key for achieving the best performance on a system, and that's true for data science and machine learning applications as well. In the era of GPU-accelerated deep learning, when profiling deep neural networks, it is important to understand CPU, GPU, and even memory bottlenecks, which could cause slowdowns in training or inference. In this post…
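The post covers NVIDIA's profiling tools; as a quick illustration of surfacing CPU/GPU bottlenecks from Python, PyTorch's built-in profiler can be used like this (a stand-in example, not the tooling the post itself describes):

```python
import torch
import torchvision.models as models
from torch.profiler import ProfilerActivity, profile

model = models.resnet18().cuda().eval()
x = torch.randn(8, 3, 224, 224, device="cuda")

# Capture CPU- and GPU-side activity for a few inference steps.
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    with torch.no_grad():
        for _ in range(5):
            model(x)

print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```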
Imagine that you have just finished implementing an awesome, interactive deep learning pipeline on your NVIDIA-accelerated data science workstation, using OpenCV for capturing your webcam stream and rendering the output. A colleague of yours mentions that exploiting the novel TF32 compute mode of the Ampere microarchitecture third-generation Tensor Cores might significantly accelerate your…
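For the PyTorch side of such a pipeline, TF32 can be switched on with two flags, sketched below (illustrative only; the post's pipeline also involves the OpenCV capture and rendering steps mentioned above):

```python
import torch

# Opt in to TF32 Tensor Core math for matmuls and cuDNN operations (Ampere or newer GPUs).
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")
c = a @ b   # executed with TF32 Tensor Cores when the hardware supports it
print(c.shape)
```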
AI is going mainstream and is quickly becoming pervasive in every industry, from autonomous vehicles to drug discovery. However, developing and deploying AI applications is a challenging endeavor. The process requires building a scalable infrastructure by combining hardware, software, and intricate workflows, which can be time-consuming as well as error-prone. To accelerate the end-to-end AI…
WSL2 is available on Windows 11 outside the Windows Insider Preview. For more information about what is supported, see the CUDA on WSL User Guide. In response to popular demand, Microsoft announced a new feature of the Windows Subsystem for Linux 2 (WSL 2), GPU acceleration, at the Build conference in May 2020. This feature opens the gate for many compute applications, professional tools…
This post is the second in a series (Part 1) that addresses the challenges of training an accurate deep learning model using a large public dataset and deploying the model on the edge for real-time inference using NVIDIA DeepStream. In the previous post, you learned how to train a RetinaNet network with a ResNet34 backbone for object detection. This included pulling a container…
Some of the biggest challenges in deploying an AI-based application are the accuracy of the model and being able to extract insights in real time. There's a trade-off between accuracy and inference throughput: making the model more accurate makes it larger, which reduces the inference throughput. This post series addresses both challenges. In part 1, you train an accurate…
Most deep learning frameworks, including PyTorch, train using 32-bit floating point (FP32) arithmetic by default. However, using FP32 for all operations is not essential to achieve full accuracy for many state-of-the-art deep neural networks (DNNs). In 2017, NVIDIA researchers developed a methodology for mixed-precision training in which a few operations are executed in FP32 while the majority…
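A minimal sketch of mixed-precision training with PyTorch automatic mixed precision (a generic example on a toy model, not the methodology post's own code):

```python
import torch
import torch.nn as nn

model = nn.Linear(1024, 1024).cuda()
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

data = torch.randn(64, 1024, device="cuda")
target = torch.randn(64, 1024, device="cuda")

for _ in range(10):
    opt.zero_grad()
    with torch.cuda.amp.autocast():          # FP16 where safe, FP32 elsewhere
        loss = nn.functional.mse_loss(model(data), target)
    scaler.scale(loss).backward()            # scale the loss to avoid FP16 gradient underflow
    scaler.step(opt)
    scaler.update()
```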
The pace of AI adoption across diverse industries depends on maximizing data scientists' productivity. NVIDIA releases optimized NGC containers every month with improved performance for deep learning frameworks and libraries, helping scientists maximize their potential. NVIDIA continuously invests in the full data science stack, including GPU architecture, systems, and software stacks.
From Siri to Google Translate, deep neural networks have enabled breakthroughs in machine understanding of natural language. Most of these models treat language as a flat sequence of words or characters, and use a kind of model called a recurrent neural network (RNN) to process this sequence. But many linguists think that language is best understood as a hierarchical tree of phrases…