NVIDIA cuVS

NVIDIA cuVS is an open-source library for GPU-accelerated vector search and data clustering that enables faster vector searches and index builds.

It supports scalable data analysis, enhances semantic search efficiency, and helps developers accelerate existing systems or compose new ones from the ground up. Integrated with key libraries and databases, cuVS also manages complex code updates as new NVIDIA architectures and NVIDIA CUDA versions are released, ensuring peak performance and seamless scalability.


How NVIDIA cuVS Works

NVIDIA cuVS is designed to accelerate and optimize vector index builds and vector search for existing databases and vector search libraries. It enables developers to enhance data mining and semantic search workloads, such as recommender systems and retrieval-augmented generation (RAG). Built on top of the NVIDIA CUDA software stack, it contains many building blocks for composing vector search systems and exposes easy-to-use APIs for C, C++, Rust, Java, Python, and Go.

Introductory Blog

Get an introduction to accelerating vector search with cuVS, its popular applications, and a performance comparison of GPU-accelerated vector search indexes vs. CPU indexes.

Read the Blog

Getting Started Guide

Understand the differences between vector search indexes and fully-fledged vector databases.

Get the Primer

Notebooks

Build an IVF-PQ index and use it to search for approximate nearest neighbors (ANN), or learn how to run approximate nearest neighbor search using the cuVS IVF-Flat algorithm.

Get Started on GitHub
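To make the IVF-Flat idea mentioned above concrete, here is a minimal CPU-only sketch in plain Python. It is not the cuVS API or the notebook code: the function names (`build_ivf_flat`, `search_ivf_flat`) and the parameters `n_lists` and `n_probes` are illustrative assumptions. The sketch clusters the vectors with a simple k-means, stores each vector id in the list of its nearest centroid, and at query time scans only the few lists closest to the query.

```python
import random


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_to_lists(vectors, centroids):
    """Group vector ids by the index of their nearest centroid."""
    lists = [[] for _ in centroids]
    for idx, vec in enumerate(vectors):
        nearest = min(range(len(centroids)),
                      key=lambda c: squared_l2(vec, centroids[c]))
        lists[nearest].append(idx)
    return lists


def build_ivf_flat(vectors, n_lists, n_iters=5, seed=0):
    """Toy IVF-Flat build (hypothetical helper): k-means centroids plus one id list each."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_lists)]
    for _ in range(n_iters):
        lists = assign_to_lists(vectors, centroids)
        for c, members in enumerate(lists):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*(vectors[i] for i in members))]
    # Final assignment so the lists match the final centroids.
    return centroids, assign_to_lists(vectors, centroids)


def search_ivf_flat(index, vectors, query, k, n_probes):
    """Scan only the n_probes lists whose centroids are closest to the query."""
    centroids, lists = index
    probed = sorted(range(len(centroids)),
                    key=lambda c: squared_l2(query, centroids[c]))[:n_probes]
    candidates = [i for c in probed for i in lists[c]]
    candidates.sort(key=lambda i: squared_l2(query, vectors[i]))
    return candidates[:k]


if __name__ == "__main__":
    rng = random.Random(42)
    data = [[rng.random() for _ in range(8)] for _ in range(500)]
    index = build_ivf_flat(data, n_lists=10)
    query = [rng.random() for _ in range(8)]
    print(search_ivf_flat(index, data, query, k=5, n_probes=3))
```

Probing every list reproduces an exact search, while probing fewer lists trades some recall for less work per query. That trade-off is what makes the search approximate.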

Examples

Get access to drop-in samples to build a new application with cuVS, or use them in an existing project. See the cuVS installation docs.

Check Out on GitHub

Key Features

GPU-Accelerated Indexing Algorithms

Optimized GPU indexing enables high-quality index builds and low-latency search. cuVS delivers advanced algorithms for indexing vector embeddings, including exact, tree-based, and graph-based indexes.

Real-Time Updates for Large Language Models (LLMs)
cuVS enables real-time updates to search indexes by dynamically integrating new embeddings and data without rebuilding the entire index. By integrating cuVS with LLMs, search results remain fresh and relevant.
High-Efficiency Indexing
GPU indexing lowers cost compared to CPU-only workflows while maintaining quality at scale. Additionally, the ability to build large indexes out-of-core enables more flexible GPU selection and ultimately lower costs per gigabyte.
Scalable Index Building
For real-time applications and large-scale deployments, cuVS enables both scale-up and scale-out for index creation and search in a fraction of the time it takes on a CPU, without compromising quality.

GPU-Accelerated Search Algorithms

cuVS transforms vector search by integrating optimized CUDA-based algorithms for approximate nearest neighbors and clustering, ideal for large-scale, time-sensitive workloads.

Low-Latency Performance
cuVS provides ultra-fast response times for applications such as semantic search, where speed and accuracy are critical. Furthermore, support for binary, 8-, 16-, and 32-bit types means memory use is optimized for high-throughput applications.
High-Throughput Processing
GPUs can handle hundreds of thousands of queries per second, making cuVS well suited to demanding use cases like machine learning, data mining, and real-time analytics.
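The effect of component width on memory use can be checked with simple arithmetic. The sketch below is illustrative only: the helper `dataset_bytes` and the workload of 1 million 768-dimensional embeddings are assumptions, and it counts raw vector storage without any index overhead.

```python
# Bits used to store one vector component for each supported width.
BITS_PER_COMPONENT = {"32-bit": 32, "16-bit": 16, "8-bit": 8, "binary": 1}


def dataset_bytes(n_vectors, dim, component_type):
    """Raw storage for n_vectors of dimension dim, each vector padded to whole bytes."""
    bits_per_vector = dim * BITS_PER_COMPONENT[component_type]
    bytes_per_vector = (bits_per_vector + 7) // 8  # ceiling division
    return n_vectors * bytes_per_vector


if __name__ == "__main__":
    # Hypothetical workload: 1 million embeddings with 768 dimensions each.
    for name in BITS_PER_COMPONENT:
        size_gb = dataset_bytes(1_000_000, 768, name) / 1e9
        print(f"{name:>7}: {size_gb:.3f} GB")
```

For this hypothetical workload the raw vectors take 3.072 GB at 32 bits per component and 0.096 GB as binary, a 32x reduction before any index structures are added.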

Get Started

Select the right path to get started using cuVS. Integrate it into your existing vector search systems, pipelines, or applications and accelerate your semantic search for data mining use cases in production.

Evaluate with cuVS Bench

Evaluate

Start with cuVS Bench, a benchmarking tool designed for reproducible comparisons of ANN search implementations, especially between GPU and CPU. It helps you optimize index configurations and analyze performance across different hardware environments.

Start Evaluating With cuVS Bench
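Comparisons of ANN implementations usually weigh speed against result quality, and a common quality measure is recall against an exact search. The sketch below shows that calculation in plain Python. It is not taken from cuVS Bench: the helper names `exact_neighbors` and `recall_at_k` are assumptions made for illustration.

```python
def exact_neighbors(dataset, query, k):
    """Brute-force ground truth: ids of the k vectors closest to the query."""
    def dist(vec):
        return sum((x - y) ** 2 for x, y in zip(vec, query))
    return sorted(range(len(dataset)), key=lambda i: dist(dataset[i]))[:k]


def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of true top-k neighbors that the approximate search returned.

    approx_ids and exact_ids each hold one list of neighbor ids per query.
    """
    hits = sum(len(set(a[:k]) & set(e[:k]))
               for a, e in zip(approx_ids, exact_ids))
    return hits / (k * len(exact_ids))


if __name__ == "__main__":
    dataset = [[0.0], [1.0], [2.0], [3.0]]
    truth = [exact_neighbors(dataset, [0.9], k=2)]
    # Pretend an approximate index returned ids 1 and 3 for the same query.
    print(recall_at_k([[1, 3]], truth, k=2))
```

A recall of 1.0 means the approximate index returned exactly the neighbors an exhaustive search would have found. Lower values show how much accuracy a faster configuration gives up.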

Download Library (GitHub)

Develop

NVIDIA cuVS is available on GitHub with end-to-end examples and an automated tuning guide. Access the source code to get started.

Download Library (GitHub)

Launch Through Integrations

Launch

cuVS can be used as a standalone library or deployed through a number of SDK and vector database integrations, such as FAISS, Milvus, Lucene, and Kinetica.

Launch Through Integrations

Performance: World's Fastest Vector Search

NVIDIA cuVS exploits the parallel architecture of NVIDIA GPUs, allowing for easy deployment of popular and performance-critical algorithms. GPU acceleration of vector similarity search sets benchmark records for large-scale, high-performance solutions.

21x Faster Indexing

Lower is Better.

Time to build an index on GPU (8x A10G) vs CPU (Intel Ice Lake) in the cloud (AWS). Build time drops from hours to minutes.

12.5x Lower Cost

Lower is Better.

Cost to build an index on GPU (8x A10G) vs CPU (Intel Ice Lake) in the cloud (AWS).
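The two build figures can be related with a short calculation. Assuming both describe the same build and that cost equals build time multiplied by the hourly instance price, a 21x speedup with a 12.5x cost reduction implies that the GPU instance costs about 21 / 12.5 = 1.68 times as much per hour as the CPU instance. The sketch below works this through. The 7-hour CPU build and the $5 hourly price are hypothetical inputs, not published figures.

```python
def implied_price_ratio(speedup, cost_reduction):
    """GPU-to-CPU hourly price ratio implied by cost = build_time * hourly_price."""
    return speedup / cost_reduction


def gpu_build(cpu_hours, cpu_price_per_hour, speedup=21.0, cost_reduction=12.5):
    """Return (gpu_hours, gpu_cost) for a CPU build, using the two ratios above."""
    cpu_cost = cpu_hours * cpu_price_per_hour
    return cpu_hours / speedup, cpu_cost / cost_reduction


if __name__ == "__main__":
    print(f"implied hourly price ratio: {implied_price_ratio(21.0, 12.5):.2f}x")
    # Hypothetical CPU build: 7 hours on an instance priced at $5 per hour.
    hours, cost = gpu_build(cpu_hours=7.0, cpu_price_per_hour=5.0)
    print(f"GPU build: {hours * 60:.0f} minutes for ${cost:.2f}")
```

Under these assumptions the GPU build finishes in 20 minutes for $2.80, compared with 7 hours and $35.00 on the CPU.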

29x Higher Throughput

Higher is Better.

Number of vectors that can be queried per second on a GPU (H100) vs CPU (Intel Xeon Platinum 8470Q) when submitted 10,000 at a time.

11x Lower Latency

Lower is Better.

Average time to process each query on a GPU (H100) vs CPU (Intel Xeon Platinum 8470Q) when submitted one at a time.
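Throughput and latency are measured differently, which is why the two charts use different submission patterns. A minimal sketch of both calculations follows. The function names and all timing values are hypothetical and are not taken from the benchmark.

```python
def throughput_qps(batch_size, batch_seconds):
    """Queries answered per second when a whole batch is submitted at once."""
    return batch_size / batch_seconds


def mean_latency_ms(per_query_seconds):
    """Average time per query, in milliseconds, when queries are sent one at a time."""
    return 1000.0 * sum(per_query_seconds) / len(per_query_seconds)


if __name__ == "__main__":
    # Hypothetical timings: one batch of 10,000 queries answered in 0.5 seconds.
    print(throughput_qps(10_000, 0.5), "queries per second")
    # Hypothetical timings for two queries submitted individually.
    print(mean_latency_ms([0.002, 0.004]), "ms average latency")
```

Large batches keep the GPU fully occupied and favor throughput, while single-query submission isolates how long an individual request takes.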

Starter Kits for NVIDIA cuVS

Start accelerating your libraries, databases, and applications with cuVS by accessing tutorials, notebooks, forums, release notes, and comprehensive documentation.

For Library Development

cuVS provides easy-to-use Python APIs, which enable straightforward integration into libraries for data mining and analysis. cuVS is also integrated into the popular FAISS library for CPU and GPU interoperability.

Explore Example Python Notebooks
Read the Getting Started Guide
Read API Documentation

For Database Development

cuVS building blocks are built in C++ and wrapped in popular languages like C, Python, Rust, Java, and Go, making them easy to integrate into existing databases and vector indexing tools.