Corey Nolet – NVIDIA Technical Blog

News and tutorials for developers, data scientists, and IT admins (feed: http://www.open-lab.net/blog/feed/, updated 2024-11-18)

Even Faster and More Scalable UMAP on the GPU with RAPIDS cuML
Corey Nolet, published 2024-10-31, updated 2024-11-14, http://www.open-lab.net/blog/?p=91198

UMAP is a popular dimension reduction algorithm used in fields like bioinformatics, NLP topic modeling, and ML preprocessing. It works by creating a k-nearest neighbors (k-NN) graph, known in the literature as an all-neighbors graph. From this graph it builds a fuzzy topological representation of the data, which is then used to embed high-dimensional data into lower dimensions. RAPIDS cuML already contained…
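
For readers new to the first step, the following minimal sketch builds a k-NN (all-neighbors) graph by brute force in pure Python. It assumes Euclidean distance and a small in-memory dataset. The function `knn_graph` is a hypothetical helper, not part of the cuML API. It stops at the graph itself and leaves out the edge weighting that turns the graph into UMAP's fuzzy topological representation.

```python
import heapq
import math


def knn_graph(points, k):
    """Return, for each point, the indices of its k nearest neighbors.

    Brute-force all-neighbors graph: every point is compared with every
    other point (O(n^2 * d) work). A point is never its own neighbor.
    """
    if not 0 < k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    graph = []
    for i, p in enumerate(points):
        # (distance, index) pairs from point i to every other point.
        candidates = (
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        # Keep the k smallest pairs, nearest first.
        nearest = heapq.nsmallest(k, candidates)
        graph.append([j for _, j in nearest])
    return graph


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.5), (10.0, 10.0)]
    print(knn_graph(pts, 2))  # prints [[1, 2], [0, 2], [0, 1], [2, 1]]
```

This brute-force approach costs time quadratic in the number of points, which is what makes graph construction expensive for large datasets.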

Event: Community Over Code
Corey Nolet, published 2024-10-03, updated 2024-10-17, http://www.open-lab.net/blog/?p=89692

Learn about accelerating vector search with NVIDIA cuVS and Apache Solr on October 10 at Community Over Code.

Bringing Confidentiality to Vector Search with Cyborg and NVIDIA cuVS
Corey Nolet, published 2024-08-15, updated 2024-10-03, http://www.open-lab.net/blog/?p=87131

In the era of generative AI, vector databases have become indispensable for storing and querying high-dimensional data efficiently. However, like all databases, vector databases are vulnerable to a range of attacks, including cyber threats, phishing attempts, and unauthorized access. This vulnerability is particularly concerning considering that these databases often contain sensitive and…

Accelerating Vector Search: NVIDIA cuVS IVF-PQ Part 2, Performance Tuning
Corey Nolet, published 2024-07-18, updated 2024-10-03, http://www.open-lab.net/blog/?p=81681

In the first part of the series, we presented an overview of the IVF-PQ algorithm and explained how it builds on top of the IVF-Flat algorithm, using the Product Quantization (PQ) technique to compress the index and support larger datasets. In this second part of the IVF-PQ post, we cover the practical aspects of tuning IVF-PQ performance. It’s worth noting again that IVF-PQ uses a lossy…
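
The compression idea behind PQ can be illustrated with a small sketch of plain product quantization. This is not cuVS's IVF-PQ implementation. Each vector is split into `m` sub-vectors, and each sub-vector is replaced by the index of its nearest centroid in a per-sub-space codebook. The codebooks here are hand-picked for illustration; a real index would learn them, typically with k-means. `pq_encode` and `pq_decode` are hypothetical names.

```python
import math


def split(vec, m):
    """Split a vector into m equal-length sub-vectors."""
    if len(vec) % m:
        raise ValueError("vector length must be divisible by m")
    step = len(vec) // m
    return [vec[i * step:(i + 1) * step] for i in range(m)]


def pq_encode(vec, codebooks):
    """Encode vec as one centroid index per sub-space.

    codebooks[s] is the list of centroids for sub-space s.
    """
    subs = split(vec, len(codebooks))
    return [
        # Index of the centroid closest to this sub-vector.
        min(range(len(book)), key=lambda c: math.dist(sub, book[c]))
        for sub, book in zip(subs, codebooks)
    ]


def pq_decode(codes, codebooks):
    """Approximately reconstruct a vector by concatenating chosen centroids."""
    out = []
    for code, book in zip(codes, codebooks):
        out.extend(book[code])
    return out


if __name__ == "__main__":
    # Two sub-spaces of 2 dimensions each, with 2 centroids per codebook.
    codebooks = [
        [(0.0, 0.0), (1.0, 1.0)],
        [(0.0, 1.0), (1.0, 0.0)],
    ]
    codes = pq_encode([0.9, 1.2, 0.1, 0.8], codebooks)
    print(codes)                        # prints [1, 0]
    print(pq_decode(codes, codebooks))  # prints [1.0, 1.0, 0.0, 1.0]
```

Decoding returns only an approximation of the input, and that gap is the lossy trade-off. With at most 256 centroids per codebook, each sub-vector is stored as a single byte instead of several floats.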

Accelerating Vector Search: NVIDIA cuVS IVF-PQ Part 1, Deep Dive
Corey Nolet, published 2024-07-18, updated 2024-10-03, http://www.open-lab.net/blog/?p=81652

In this post, we continue the series on accelerating vector search using NVIDIA cuVS. Our previous post in the series introduced IVF-Flat, a fast algorithm for accelerating approximate nearest neighbors (ANN) search on GPUs. We discussed how using an inverted file index (IVF) provides an intuitive way to reduce the complexity of the nearest neighbor search by limiting it to only a small subset of…
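
The IVF idea of searching only a small subset of the data can be sketched as follows. The sketch assumes the cluster centroids are already known, whereas an IVF index would normally train them with k-means. Each vector is filed in the inverted list of its nearest centroid, and a query scans only the `n_probes` closest lists. All function and parameter names are hypothetical, and this is not the cuVS API.

```python
import heapq
import math


def nearest_centroids(vec, centroids, n):
    """Indices of the n centroids closest to vec."""
    return heapq.nsmallest(
        n, range(len(centroids)), key=lambda c: math.dist(vec, centroids[c])
    )


def build_ivf(dataset, centroids):
    """Inverted lists: list c holds ids of vectors whose nearest centroid is c."""
    lists = [[] for _ in centroids]
    for i, vec in enumerate(dataset):
        lists[nearest_centroids(vec, centroids, 1)[0]].append(i)
    return lists


def ivf_search(query, dataset, centroids, lists, k, n_probes):
    """Approximate k-NN: only scan the n_probes lists closest to the query."""
    candidates = [
        i for c in nearest_centroids(query, centroids, n_probes) for i in lists[c]
    ]
    # Exact distances, but only over the probed candidates.
    return heapq.nsmallest(k, candidates, key=lambda i: math.dist(query, dataset[i]))


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.5, 0.2), (5.0, 5.0), (5.5, 4.8), (0.2, 0.4)]
    cents = [(0.0, 0.0), (5.0, 5.0)]
    lists = build_ivf(data, cents)
    print(lists)  # prints [[0, 1, 4], [2, 3]]
    print(ivf_search((0.3, 0.3), data, cents, lists, k=2, n_probes=1))  # prints [4, 1]
```

Raising `n_probes` scans more lists, trading speed for a higher chance of finding the true nearest neighbors. When `n_probes` equals the number of lists, the search becomes exact.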

Accelerating Vector Search: Fine-Tuning GPU Index Algorithms
Corey Nolet, published 2023-09-11, updated 2024-11-18, http://www.open-lab.net/blog/?p=69885

In this post, we dive deeper into each of the GPU-accelerated indexes mentioned in part 1 and give a brief explanation of how the algorithms work, along with a summary of important parameters to fine-tune their behavior. We then go through a simple end-to-end example to demonstrate cuVS’ Python APIs on a question-and-answer problem with a pretrained large language model and provide a…

Accelerating Vector Search: Using GPU-Powered Indexes with NVIDIA cuVS
Corey Nolet, published 2023-09-11, updated 2024-11-07, http://www.open-lab.net/blog/?p=69884

In the current AI landscape, vector search is one of the hottest topics due to its applications in large language models (LLMs) and generative AI. Semantic vector search enables a broad range of important tasks like detecting fraudulent transactions, recommending products to users, using contextual information to augment full-text searches, and finding actors that pose potential security risks.

GPU-Accelerated Single-Cell RNA Analysis with RAPIDS-singlecell
Corey Nolet, published 2023-06-27, updated 2023-07-24, http://www.open-lab.net/blog/?p=67047

Single-cell sequencing has become one of the most prominent technologies used in biomedical research. Its ability to decipher changes in the transcriptome and epigenome on a cell level has enabled researchers to gain valuable new insights. As a result, single-cell experiments have grown in size and complexity by a factor of over 100, with experiments involving more than 1 million cells becoming…

Reusable Computational Patterns for Machine Learning and Information Retrieval with RAPIDS RAFT
Corey Nolet, published 2023-03-22, updated 2023-10-13, http://www.open-lab.net/blog/?p=62315

RAPIDS is a suite of accelerated libraries for data science and machine learning on GPUs, including cuDF for pandas-like data structures, cuGraph for graph data, and cuML. In many data analytics and machine learning algorithms, computational bottlenecks tend to come from a small subset of steps that dominate the end-to-end performance. Reusable solutions for these steps often require low-level primitives that are non-trivial and time-consuming to write well.

Faster HDBSCAN Soft Clustering with RAPIDS cuML
Corey Nolet, published 2022-12-06, updated 2023-07-11, http://www.open-lab.net/blog/?p=58016

HDBSCAN is a state-of-the-art, density-based clustering algorithm that has become popular in domains as varied as topic modeling, genomics, and geospatial analytics. RAPIDS cuML has provided accelerated HDBSCAN since the 21.10 release in October 2021, as detailed in GPU-Accelerated Hierarchical DBSCAN with RAPIDS cuML – Let’s Get Back To The Future. However, support for soft clustering (also…

Faster Text Classification with Naive Bayes and GPUs
Corey Nolet, published 2022-07-25, updated 2022-08-22, http://www.open-lab.net/blog/?p=49926

Naive Bayes (NB) is a simple but powerful probabilistic classification technique that parallelizes well and can scale to datasets of massive size. If you have been working with text processing tasks in data science, you know that machine learning models can take a long time to train. Using GPU-accelerated computing on those models has often resulted in significant gains in time performance…
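
To show why Naive Bayes parallelizes well, here is a minimal multinomial Naive Bayes sketch in pure Python with Laplace smoothing (`alpha`). Training reduces to counting tokens per class, and counts from separate data shards can simply be added together. The functions and the toy spam/ham data are hypothetical illustrations, not cuML's implementation.

```python
import math
from collections import Counter


def train_nb(docs, labels, alpha=1.0):
    """Fit multinomial Naive Bayes on tokenized docs with Laplace smoothing."""
    classes = sorted(set(labels))
    vocab = {tok for doc in docs for tok in doc}
    log_prior, log_lik = {}, {}
    for c in classes:
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        # Per-class token counts: the only statistic training needs.
        counts = Counter(tok for d in class_docs for tok in d)
        total = sum(counts.values()) + alpha * len(vocab)
        log_lik[c] = {t: math.log((counts[t] + alpha) / total) for t in vocab}
    return log_prior, log_lik


def predict_nb(model, doc):
    """Return the class with the highest posterior log-probability."""
    log_prior, log_lik = model

    def score(c):
        # Tokens never seen during training are ignored.
        return log_prior[c] + sum(log_lik[c][t] for t in doc if t in log_lik[c])

    return max(log_prior, key=score)


if __name__ == "__main__":
    docs = [["cheap", "pills"], ["cheap", "offer", "now"],
            ["meeting", "agenda"], ["agenda", "notes", "meeting"]]
    labels = ["spam", "spam", "ham", "ham"]
    model = train_nb(docs, labels)
    print(predict_nb(model, ["cheap", "offer"]))     # prints spam
    print(predict_nb(model, ["meeting", "notes"]))   # prints ham
```

Working in log space avoids numeric underflow when many small per-token probabilities are multiplied together.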

GPU-Accelerated Hierarchical DBSCAN with RAPIDS cuML – Let’s Get Back To The Future
Corey Nolet, published 2021-10-06, updated 2022-09-29, http://www.open-lab.net/blog/?p=38121

Data scientists across various domains use clustering methods to find naturally ‘similar’ groups of observations in their datasets. Popular clustering methods can be: The Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) algorithm is a density-based clustering method that is robust to noise (accounting for points in sparser regions as either cluster…
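
One ingredient behind HDBSCAN's robustness to noise is the mutual reachability distance, which inflates distances involving points in sparse regions. The sketch below computes it by brute force. It assumes Euclidean distance and defines the core distance as the distance to the `min_samples`-th nearest other point; libraries differ on whether a point counts as its own neighbor. The full algorithm goes on to build a minimum spanning tree over these distances and extract a cluster hierarchy, which is omitted here. The function names are hypothetical.

```python
import math


def core_distances(points, min_samples):
    """Distance from each point to its min_samples-th nearest other point."""
    if not 0 < min_samples < len(points):
        raise ValueError("min_samples must be between 1 and len(points) - 1")
    core = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        core.append(dists[min_samples - 1])
    return core


def mutual_reachability(points, min_samples):
    """Pairwise mutual reachability: max(core(a), core(b), d(a, b))."""
    core = core_distances(points, min_samples)
    n = len(points)
    return [
        [0.0 if i == j else max(core[i], core[j], math.dist(points[i], points[j]))
         for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (2.0,), (10.0,)]
    print(core_distances(pts, 1))          # prints [1.0, 1.0, 1.0, 8.0]
    print(mutual_reachability(pts, 1)[3])  # prints [10.0, 9.0, 8.0, 0.0]
```

The isolated point at 10.0 ends up at least 8.0 away from everything under this metric. That makes it easier for the later stages to treat it as noise rather than merge it into the dense group.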

Analyzing the RNA-Sequence of 1.3M Mouse Brain Cells with RAPIDS on NVIDIA GPUs
Corey Nolet, published 2021-09-08, updated 2022-08-21, http://www.open-lab.net/blog/?p=37158

Single-cell genomics research continues to advance drug discovery for disease prevention. For example, it has been pivotal in developing treatments for the current COVID-19 pandemic, identifying cells susceptible to infection, and revealing changes in the immune systems of infected patients. However, with the growing availability of large-scale single-cell datasets, it’s clear that computing…
