NVIDIA cuVS
NVIDIA cuVS is an open-source library for GPU-accelerated vector search and data clustering that enables faster vector searches and index builds.
It supports scalable data analysis, enhances semantic search efficiency, and helps developers accelerate existing systems or compose new ones from the ground up. Integrated with key libraries and databases, cuVS also manages complex code updates as new NVIDIA architectures and NVIDIA® CUDA® versions are released, ensuring peak performance and seamless scalability.
How NVIDIA cuVS Works
NVIDIA cuVS is designed to accelerate and optimize vector index builds and vector search for existing databases and vector search libraries. It enables developers to enhance data mining and semantic search workloads, such as recommender systems and retrieval-augmented generation (RAG). Built on top of the NVIDIA CUDA software stack, it contains many building blocks for composing vector search systems and exposes easy-to-use APIs for C, C++, Rust, Java, Python, and Go.
Introductory Blog
Get an introduction to accelerating vector search with cuVS, popular applications, and performance comparisons of GPU-accelerated vector search indexes vs. CPU.
Getting Started Guide
Understand the differences between vector search indexes and fully-fledged vector databases.
Notebooks
Build an IVF-PQ index and use it to search for approximate nearest neighbors (ANN), or learn how to run approximate nearest neighbor search using the cuVS IVF-Flat algorithm.
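To make the IVF idea concrete, here is a minimal CPU-only sketch in plain Python. It is not the cuVS API and not how the notebooks are written. It only illustrates the general inverted-file technique: vectors are grouped into lists around coarse centroids, and a query scans only the few lists whose centroids are closest. The function names, `n_lists`, `n_probes`, and the toy data are assumptions chosen for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centroids, v):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda c: sq_dist(centroids[c], v))

def build_ivf_flat(vectors, n_lists, n_iters=10, seed=0):
    """Cluster vectors with a few k-means iterations, then fill inverted lists."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_lists)]
    for _ in range(n_iters):
        members = [[] for _ in range(n_lists)]
        for v in vectors:
            members[nearest(centroids, v)].append(v)
        for c, group in enumerate(members):
            if group:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(col) / len(group) for col in zip(*group)]
    lists = [[] for _ in range(n_lists)]
    for i, v in enumerate(vectors):
        lists[nearest(centroids, v)].append(i)  # store ids, vectors stay uncompressed
    return centroids, lists

def search_ivf_flat(index, vectors, query, k, n_probes):
    """Scan only the n_probes closest lists and return the ids of the k best matches."""
    centroids, lists = index
    by_distance = sorted(range(len(centroids)), key=lambda c: sq_dist(centroids[c], query))
    candidates = [i for c in by_distance[:n_probes] for i in lists[c]]
    return sorted(candidates, key=lambda i: sq_dist(vectors[i], query))[:k]

# Toy data: 500 random 8-dimensional vectors (hypothetical, for illustration only).
rng = random.Random(42)
data = [[rng.random() for _ in range(8)] for _ in range(500)]
index = build_ivf_flat(data, n_lists=16)
query = data[123]
approx = search_ivf_flat(index, data, query, k=5, n_probes=4)   # scans 4 of 16 lists
exact = search_ivf_flat(index, data, query, k=5, n_probes=16)   # scans every list
```

Probing fewer lists cuts the work per query at the risk of missing some true neighbors, which is the basic speed versus recall trade-off of ANN search. IVF-PQ follows the same structure but additionally compresses the stored vectors.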
Examples
Get access to drop-in samples to build a new application with cuVS, or use it in an existing project. See cuVS installation docs.
Key Features
GPU-Accelerated Indexing Algorithms
Optimized GPU indexing enables high-quality index builds and low-latency search. cuVS delivers advanced algorithms for indexing vector embeddings, including exact, tree-based, and graph-based indexes.
Real-Time Updates for Large Language Models (LLMs)
cuVS enables real-time updates to search indexes by dynamically integrating new embeddings and data without rebuilding the entire index. By integrating cuVS with LLMs, search results remain fresh and relevant.
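The page does not describe how updates are applied internally. As an illustration only, the sketch below shows one common way an inverted-file index can accept new vectors without a rebuild: the coarse centroids stay fixed and each new vector is appended to the list of its closest centroid. The class `TinyIvfIndex` and its methods are hypothetical names, not part of cuVS.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

class TinyIvfIndex:
    """Toy inverted-file index with fixed centroids (illustrative only)."""

    def __init__(self, centroids):
        self.centroids = [list(c) for c in centroids]
        self.lists = [[] for _ in centroids]  # per-centroid (id, vector) entries
        self.size = 0

    def _closest(self, v):
        return min(range(len(self.centroids)),
                   key=lambda c: sq_dist(self.centroids[c], v))

    def extend(self, new_vectors):
        """Append vectors to existing lists; centroids are not retrained."""
        ids = []
        for v in new_vectors:
            self.lists[self._closest(v)].append((self.size, list(v)))
            ids.append(self.size)
            self.size += 1
        return ids

    def search(self, query, k, n_probes):
        """Return ids of the k closest entries among the n_probes nearest lists."""
        order = sorted(range(len(self.centroids)),
                       key=lambda c: sq_dist(self.centroids[c], query))
        cands = [e for c in order[:n_probes] for e in self.lists[c]]
        cands.sort(key=lambda e: sq_dist(e[1], query))
        return [i for i, _ in cands[:k]]

index = TinyIvfIndex(centroids=[[0.0, 0.0], [10.0, 10.0]])
index.extend([[0.1, 0.2], [9.8, 10.1], [0.3, 0.1]])
before = index.search([9.0, 9.0], k=1, n_probes=1)
new_ids = index.extend([[9.1, 9.0]])  # a new embedding arrives later, no rebuild
after = index.search([9.0, 9.0], k=1, n_probes=1)
```

In this toy design the new vector is searchable immediately after `extend` returns. The cost is that centroids trained on older data may fit newer data less well over time, so a real system would likely still retrain periodically.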
High-Efficiency Indexing
GPU indexing lowers cost compared to CPU-only workflows while maintaining quality at scale. Additionally, the ability to build large indexes out-of-core enables more flexible GPU selection and ultimately lower costs per gigabyte.
Scalable Index Building
For real-time applications and large-scale deployments, cuVS enables both scale-up and scale-out for index creation and search in a fraction of the time it takes on a CPU, without compromising quality.
GPU-Accelerated Search Algorithms
cuVS transforms vector search by integrating optimized CUDA-based algorithms for approximate nearest neighbors and clustering, ideal for large-scale, time-sensitive workloads.
Low-Latency Performance
cuVS provides ultra-fast response times for applications such as semantic search, where speed and accuracy are critical. Furthermore, support for binary, 8-, 16-, and 32-bit types means memory use is optimized for high-throughput applications.
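The effect of element width on memory is simple arithmetic. The sketch below computes the raw storage needed for a set of vectors at each of the widths mentioned above. The vector count and dimensionality are hypothetical values picked for illustration, and the figures ignore any index overhead.

```python
# Bits used per vector component for each element width named in the text.
BITS_PER_COMPONENT = {"binary": 1, "8-bit": 8, "16-bit": 16, "32-bit": 32}

def raw_vector_bytes(n_vectors, dim, width):
    """Bytes needed to store the raw vectors, ignoring index overhead."""
    total_bits = n_vectors * dim * BITS_PER_COMPONENT[width]
    return (total_bits + 7) // 8  # round up to whole bytes

# Hypothetical workload: one million 768-dimensional embeddings.
for width in BITS_PER_COMPONENT:
    size = raw_vector_bytes(1_000_000, 768, width)
    print(f"{width:>6}: {size / 1e9:.3f} GB")
```

Halving the element width halves the raw footprint, so the same GPU memory can hold more vectors, usually at some cost in precision.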
High-Throughput Processing
GPUs handle hundreds of thousands of queries per second, making cuVS perfect for demanding use cases like machine learning, data mining, and real-time analytics.
Get Started
Select the right path to get started using cuVS. Integrate it into your existing vector search systems, pipelines, or applications and accelerate your semantic search for data mining use cases in production.
Evaluate
Use the cuVS benchmarking tool to run reproducible comparisons of ANN search implementations, especially between GPU and CPU. It helps you tune index configurations and analyze performance across different hardware environments.
Develop
NVIDIA cuVS is available on GitHub with end-to-end examples and an automated tuning guide. Access the source code to get started.
Download Library (GitHub)
Launch
cuVS can be used as a standalone library or deployed through a number of SDK and vector database integrations like FAISS, Milvus, Lucene, Kinetica, and more.
Launch Through Integrations
Performance: World's Fastest Vector Search
NVIDIA cuVS exploits the parallel architecture of NVIDIA GPUs, allowing for easy deployment of popular and performance-critical algorithms. GPU-acceleration of vector similarity search sets benchmark records for large-scale, high-performance solutions.
21x Faster Indexing
Lower is Better.
Time to build an index on GPU (8x A10G) vs. CPU (Intel Ice Lake) in the cloud (AWS). Build time drops from hours to minutes.
12.5x Lower Cost
Lower is Better.
Cost to build an index on GPU (8x A10G) vs. CPU (Intel Ice Lake) in the cloud (AWS).
29x Higher Throughput
Higher is Better.
Number of vectors that can be queried per second on a GPU (H100) vs CPU (Intel Xeon Platinum 8470Q) when submitted 10,000 at a time.
11x Lower Latency
Lower is Better.
Average time to process each query on a GPU (H100) vs CPU (Intel Xeon Platinum 8470Q) when submitted one at a time.
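The throughput and latency figures above come from two different submission modes: many queries per call versus one query per call. The sketch below shows how the two metrics could be measured for any search function. It is a generic timing harness, not the cuVS benchmarking tool, and `toy_search`, `DATA`, and the query set are hypothetical stand-ins.

```python
import time

def measure_throughput(search_batch, queries, batch_size):
    """Queries per second when queries are submitted in batches."""
    start = time.perf_counter()
    for i in range(0, len(queries), batch_size):
        search_batch(queries[i:i + batch_size])
    elapsed = time.perf_counter() - start
    return len(queries) / elapsed

def measure_latency(search_batch, queries):
    """Mean seconds per query when queries are submitted one at a time."""
    start = time.perf_counter()
    for q in queries:
        search_batch([q])
    elapsed = time.perf_counter() - start
    return elapsed / len(queries)

# Stand-in search function: brute-force nearest neighbor over a tiny dataset.
DATA = [[float(i), float(i % 7)] for i in range(200)]

def toy_search(batch):
    """Return, for each query in the batch, the id of its closest vector in DATA."""
    return [
        min(range(len(DATA)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(DATA[j], q)))
        for q in batch
    ]

queries = [[float(i % 50), 3.0] for i in range(300)]
qps = measure_throughput(toy_search, queries, batch_size=100)
latency = measure_latency(toy_search, queries)
print(f"throughput: {qps:.0f} queries/s, mean latency: {latency * 1e3:.3f} ms")
```

On a CPU-only toy like this the two modes perform about the same. On a GPU, large batches expose far more parallel work per call, which is why throughput and single-query latency are reported as separate numbers.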
Starter Kits for NVIDIA cuVS
Start accelerating your libraries, databases, and applications with cuVS by accessing tutorials, notebooks, forums, release notes, and comprehensive documentation.
For Library Development
cuVS provides easy-to-use Python APIs, which enable straightforward integration into libraries for data mining and analysis. cuVS is also integrated into the popular FAISS library for CPU and GPU interoperability.
For Database Development
cuVS building blocks are built in C++ and wrapped in popular languages like C, Python, Rust, Java, and Go, making them easy to integrate into existing databases and vector indexing tools.
For Application Development
cuVS can be used directly or through several database and library integrations to supercharge your applications and workflows with GPU acceleration.