The advent of large language models (LLMs) marks a significant shift in how industries leverage AI to enhance operations and services. By automating routine tasks and streamlining processes, LLMs free up human resources for more strategic endeavors, thus improving overall efficiency and productivity. Training and customizing LLMs for high accuracy is fraught with challenges…
Learn how to build scalable data processing pipelines to create high-quality datasets.
In the latest round of MLPerf Inference, a suite of standardized, peer-reviewed inference benchmarks, the NVIDIA platform delivered outstanding performance across the board. Among the many submissions made using the NVIDIA platform were results using the NVIDIA GH200 Grace Hopper Superchip. GH200 tightly couples an NVIDIA Grace CPU with an NVIDIA Hopper GPU using NVIDIA NVLink-C2C…
Deep learning models require hundreds of gigabytes of data to generalize well on unseen samples. Data augmentation helps by increasing the variability of examples in datasets. The traditional approach to data augmentation dates back to the era of statistical learning, when the choice of augmentations relied on the domain knowledge, skill, and intuition of the engineers who set up the model training.
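As a minimal sketch of that hand-crafted approach (using torchvision as one common choice; the specific transforms and parameters here are illustrative, not taken from the post):

```python
import torchvision.transforms as T

# A hand-tuned recipe of the kind the post describes: each transform
# and its parameters encode an engineer's judgment about which
# variations the model should learn to be invariant to.
train_transforms = T.Compose([
    T.RandomResizedCrop(224),          # random scale and crop
    T.RandomHorizontalFlip(p=0.5),     # mirror half the samples
    T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
    T.ToTensor(),
])
```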
Efficient processing of string data is vital for many data science applications. To extract valuable information from string data, RAPIDS libcudf provides powerful tools for accelerating string data transformations. libcudf is a C++ GPU DataFrame library used for loading, joining, aggregating, and filtering data. In data science, string data represents speech, text, genetic sequences…
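libcudf's string kernels are also exposed in Python through RAPIDS cuDF; a minimal sketch with made-up sample strings:

```python
import cudf

# GPU-backed string column; each .str operation below dispatches
# to a libcudf kernel running on the GPU.
s = cudf.Series(["Thanks for reading!", "GPU DataFrames", "string DATA"])
print(s.str.lower())           # case normalization
print(s.str.contains("GPU"))   # vectorized pattern matching
print(s.str.split(" "))        # tokenization on whitespace
```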
Deep learning models require vast amounts of data to produce accurate predictions, and this need becomes more acute every day as models grow in size and complexity. Even large datasets, such as the well-known ImageNet with more than a million images, are not sufficient to achieve state-of-the-art results in modern computer vision tasks. For this purpose, data augmentation techniques are…
This post is an update to an older post. Deep learning models require training with vast amounts of data to achieve accurate results. Raw data usually cannot be fed directly into a neural network, for reasons such as differing storage formats, compression, varying data formats and sizes, and the limited amount of high-quality data. Addressing these issues requires extensive data preparation…
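For context, a minimal sketch of what a DALI preprocessing pipeline looks like (the dataset path and the specific augmentations are hypothetical):

```python
from nvidia.dali import pipeline_def
import nvidia.dali.fn as fn
import nvidia.dali.types as types

@pipeline_def(batch_size=32, num_threads=4, device_id=0)
def training_pipeline():
    # "data/train" is a hypothetical folder of class-labeled images.
    jpegs, labels = fn.readers.file(
        file_root="data/train", random_shuffle=True, name="Reader"
    )
    images = fn.decoders.image(jpegs, device="mixed")  # GPU-accelerated decode
    images = fn.random_resized_crop(images, size=(224, 224))
    images = fn.crop_mirror_normalize(
        images,
        dtype=types.FLOAT,
        mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
        std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
        mirror=fn.random.coin_flip(),
    )
    return images, labels

pipe = training_pipeline()
pipe.build()
images, labels = pipe.run()
```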
Relying on the capabilities of GPUs, a team from Facebook AI Research has developed a faster, more efficient way for AI to run similarity searches. The study, published in IEEE Transactions on Big Data, presents a deep learning algorithm capable of handling and comparing high-dimensional data from media that is notably faster than, yet just as accurate as, previous techniques. In a world with an…
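This line of work underpins the team's open-source Faiss library; a minimal sketch of a similarity search with it (random vectors stand in for real media embeddings):

```python
import faiss
import numpy as np

d = 128                                               # embedding dimensionality
xb = np.random.random((10_000, d)).astype("float32")  # vectors to index
xq = np.random.random((5, d)).astype("float32")       # query vectors

index = faiss.IndexFlatL2(d)           # exact (brute-force) L2 search
index.add(xb)
distances, ids = index.search(xq, 4)   # 4 nearest neighbors per query
```

With a GPU build of Faiss, the same index can be moved to GPUs (for example via faiss.index_cpu_to_all_gpus) for the speedups the post describes.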
As explained in the Batch Normalization paper, training neural networks becomes much easier if their inputs are Gaussian. And if your model inputs are not Gaussian, RAPIDS can transform them to Gaussian in the blink of an eye. Gauss rank transformation is a novel standardization technique for transforming input data for training deep neural networks. Recently, we used this technique…
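Conceptually, the transform ranks each value, rescales the ranks to (-1, 1), and maps them through the inverse error function; a CPU sketch with NumPy and SciPy (the RAPIDS version runs the same steps on the GPU, e.g. with CuPy):

```python
import numpy as np
from scipy.special import erfinv

def gauss_rank_transform(x, epsilon=1e-6):
    # Rank the values, rescale ranks to (-1, 1), clip away the
    # endpoints (erfinv diverges at +/-1), then map through the
    # inverse error function so the output is approximately Gaussian.
    ranks = np.argsort(np.argsort(x))
    scaled = 2.0 * ranks / len(x) - 1.0
    scaled = np.clip(scaled, -1.0 + epsilon, 1.0 - epsilon)
    return erfinv(scaled)

x = np.random.exponential(size=10_000)  # heavily skewed input
z = gauss_rank_transform(x)             # now roughly bell-shaped
```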
Recommender systems drive engagement on many of the most popular online platforms. As data volume grows exponentially, data scientists increasingly turn from traditional machine learning methods to highly expressive, deep learning models to improve recommendation quality. Often, the recommendations are framed as modeling the completion of a user-item matrix, in which the user-item entry is the…
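A minimal sketch of that matrix-completion framing as learned matrix factorization (the embedding dimension and names are illustrative):

```python
import torch
import torch.nn as nn

class MatrixFactorization(nn.Module):
    # Learns user and item embeddings whose dot product predicts the
    # (possibly missing) user-item matrix entry.
    def __init__(self, n_users, n_items, dim=32):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.item_emb = nn.Embedding(n_items, dim)

    def forward(self, users, items):
        return (self.user_emb(users) * self.item_emb(items)).sum(dim=1)

model = MatrixFactorization(n_users=1000, n_items=500)
scores = model(torch.tensor([0, 1]), torch.tensor([42, 7]))
```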
Join the NVIDIA Triton and NVIDIA TensorRT community to stay current on the latest product updates, bug fixes, content, best practices, and more. When you are working on optimizing inference scenarios for the best performance, you may underestimate the effect of data preprocessing. These are the operations required before forwarding an input sample through the model. This post highlights the…
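As an illustration of the kind of preprocessing meant here, a typical image-preprocessing step before inference (the normalization values are the common ImageNet statistics; the function name is ours):

```python
import numpy as np
from PIL import Image

def preprocess(path, size=(224, 224)):
    # Decode, resize, scale to [0, 1], normalize with ImageNet
    # statistics, and reorder HWC -> NCHW with a batch dimension.
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    img = Image.open(path).convert("RGB").resize(size)
    x = np.asarray(img, dtype=np.float32) / 255.0
    x = (x - mean) / std
    return x.transpose(2, 0, 1)[None]
```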
Editor's Note: This post has been updated; here is the revised post. Training deep learning models with vast amounts of data is necessary to achieve accurate results. Data in the wild, or even prepared datasets, is usually not in a form that can be directly fed into a neural network. This is where NVIDIA DALI data preprocessing comes into play. There are various reasons for that…
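A sketch of handing DALI output to a training framework, assuming a pipeline like the one sketched earlier whose file reader was created with name="Reader":

```python
from nvidia.dali.plugin.pytorch import DALIGenericIterator

# Wrap the DALI pipeline so it behaves like a PyTorch data loader;
# reader_name lets the iterator size epochs from the named reader.
train_loader = DALIGenericIterator(
    pipe, output_map=["images", "labels"], reader_name="Reader"
)

for batch in train_loader:
    images = batch[0]["images"]  # already decoded, augmented, on GPU
    labels = batch[0]["labels"]
    # ... forward/backward pass ...
```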