CUDA

Oct 09, 2024
Just Released: Updated Math Libraries in CUDA Toolkit 12.6.2
CUDA Toolkit 12.6.2 improves performance and provides new features in cuBLAS, cuSOLVER, and cuFFT LTO libraries.
1 MIN READ

Oct 07, 2024
Accelerating Reality Capture Workflows with AI and NVIDIA RTX GPUs
Reality capture creates highly accurate, detailed, and immersive digital representations of environments. Innovations in site scanning and accelerated data...
10 MIN READ

Oct 02, 2024
Webinar: Accelerating Python with GPUs
Join us on October 9 to learn how your applications can benefit from NVIDIA CUDA Python software initiatives.
1 MIN READ

Oct 02, 2024
Accelerating LLMs with llama.cpp on NVIDIA RTX Systems
The NVIDIA RTX AI for Windows PCs platform offers a thriving ecosystem of thousands of open-source models for application developers to leverage and integrate...
5 MIN READ

Sep 30, 2024
Advancing Quantum Algorithm Design with GPTs
AI techniques like large language models (LLMs) are rapidly transforming many scientific disciplines. Quantum computing is no exception. A collaboration between...
8 MIN READ

Sep 24, 2024
Accelerating Leaderboard-Topping ASR Models 10x with NVIDIA NeMo
NVIDIA NeMo has consistently developed automatic speech recognition (ASR) models that set the benchmark in the industry, particularly those topping the Hugging...
13 MIN READ

Sep 19, 2024
Just Released: Torch-TensorRT v2.4.0
Includes C++ runtime support on Windows, enhanced dynamic shape support in converters, and support for PyTorch 2.4, CUDA 12.4, TensorRT 10.1, and Python 3.12.
1 MIN READ

Sep 17, 2024
Accelerating Oracle Database Generative AI Workloads with NVIDIA NIM and NVIDIA cuVS
The vast majority of the world's data remains untapped, and enterprises are looking to generate value from this data by creating the next wave of generative AI...
6 MIN READ

Sep 11, 2024
Advanced Strategies for High-Performance GPU Programming with NVIDIA CUDA
Stephen Jones, a leading expert and distinguished NVIDIA CUDA architect, offers his guidance and insights with a deep dive into the complexities of mapping...
2 MIN READ

Sep 11, 2024
Constant Time Launch for Straight-Line CUDA Graphs and Other Performance Enhancements
CUDA Graphs are a way to define and batch GPU operations as a graph rather than a sequence of stream launches. A CUDA Graph groups a set of CUDA kernels and...
8 MIN READ

Sep 10, 2024
Accelerating the HPCG Benchmark with NVIDIA Math Sparse Libraries
NVIDIA has continually advanced high-performance computing (HPC) by offering its highly optimized NVIDIA High-Performance Conjugate...
9 MIN READ

Sep 06, 2024
Enhancing Application Portability and Compatibility across New Platforms Using NVIDIA Magnum IO NVSHMEM 3.0
NVSHMEM is a parallel programming interface that provides efficient and scalable communication for NVIDIA GPU clusters. Part of NVIDIA Magnum IO and based on...
7 MIN READ

Aug 29, 2024
Spotlight: clicOH Accelerates Last-Mile Delivery 20x with NVIDIA cuOpt
Driven by shifts in consumer behavior and the pandemic, e-commerce continues its explosive growth and transformation. As a result, logistics and transportation...
3 MIN READ

Aug 29, 2024
Boosting CUDA Efficiency with Essential Techniques for New Developers
To fully harness the capabilities of NVIDIA GPUs, optimizing NVIDIA CUDA performance is essential, particularly for developers new to GPU programming. This talk...
2 MIN READ

Aug 08, 2024
Improving GPU Performance by Reducing Instruction Cache Misses
GPUs are specially designed to crunch through massive amounts of data at high speed. They have a large amount of compute resources, called streaming...
11 MIN READ

Aug 07, 2024
Optimizing llama.cpp AI Inference with CUDA Graphs
The open-source llama.cpp code base was originally released in 2023 as a lightweight but efficient framework for performing inference on Meta Llama models....
8 MIN READ