CUDA

Mar 13, 2025
Networking Reliability and Observability at Scale with NCCL 2.24
The NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multinode (MGMN) communication primitives optimized for NVIDIA GPUs and networking....
14 MIN READ

Mar 12, 2025
Understanding PTX, the Assembly Language of CUDA GPU Computing
Parallel thread execution (PTX) is a virtual machine instruction set architecture that has been part of CUDA from its beginning. You can think of PTX as the...
13 MIN READ

Mar 10, 2025
Optimizing Compile Times for CUDA C++
In modern software development, time is an incredibly valuable resource, especially during the compilation process. For developers working with CUDA C++ on...
10 MIN READ

Mar 04, 2025
GPU-Accelerate Algorithmic Trading Simulations by over 100x with Numba
Quantitative developers need to run back-testing simulations to see how financial algorithms perform from a profit and loss (P&L) standpoint. Statistical...
12 MIN READ

Feb 25, 2025
NVIDIA cuDSS Advances Solver Technologies for Engineering and Scientific Computing
NVIDIA cuDSS is a first-generation sparse direct solver library designed to accelerate engineering and scientific computing. cuDSS is increasingly adopted in...
12 MIN READ

Feb 10, 2025
NVIDIA Open GPU Datacenter Drivers for RHEL9 Signed by Red Hat
NVIDIA and Red Hat have partnered to bring continued improvements to the precompiled NVIDIA Driver introduced in 2020. Last month, NVIDIA announced that the...
4 MIN READ

Feb 04, 2025
AI Foundation Model Enhances Cancer Diagnosis and Tailors Treatment
A new study and AI model from researchers at Stanford University are streamlining cancer diagnostics, treatment planning, and prognosis prediction. Named MUSK...
4 MIN READ

Feb 03, 2025
Just Released: CUTLASS 3.8
Provides support for the NVIDIA Blackwell SM100 architecture. CUTLASS is a collection of CUDA C++ templates and abstractions for implementing high-performance...
1 MIN READ

Jan 31, 2025
New Scaling Algorithm and Initialization with NVIDIA Collective Communications Library 2.23
The NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multinode communication primitives optimized for NVIDIA GPUs and networking. NCCL...
9 MIN READ

Jan 31, 2025
Dynamic Loading in the CUDA Runtime
Historically, GPU device code has been compiled alongside the application with offline tools such as nvcc. In this case, the GPU device code is managed internally...
8 MIN READ

Jan 31, 2025
CUDA Toolkit Now Available for NVIDIA Blackwell
The latest release of the CUDA Toolkit, version 12.8, continues to push accelerated computing performance in data sciences, AI, scientific computing, and...
9 MIN READ

Jan 30, 2025
New AI SDKs and Tools Released for NVIDIA Blackwell GeForce RTX 50 Series GPUs
NVIDIA recently announced a new generation of PC GPUs—the GeForce RTX 50 Series—alongside new AI-powered SDKs and tools for developers. Powered by the...
6 MIN READ

Jan 29, 2025
Advancing Rare Disease Detection with AI-Powered Cellular Profiling
Rare diseases are difficult to diagnose due to limitations in traditional genomic sequencing. Wolfgang Pernice, assistant professor at Columbia University, is...
3 MIN READ

Jan 14, 2025
Upcoming Event: CUDA Developer Meet Up in Silicon Valley
Whether you’re just starting your GPU programming journey or you’re a CUDA ninja looking to share advanced techniques, join us in San Jose on 1/30/25.
1 MIN READ

Dec 19, 2024
RAPIDS 24.12 Introduces cuDF on PyPI, CUDA Unified Memory for Polars, and Faster GNNs
RAPIDS 24.12 introduces cuDF packages to PyPI, speeds up groupby aggregations and reading files from AWS S3, enables larger-than-GPU memory queries in the...
8 MIN READ

Dec 18, 2024
Security for Data Privacy in Federated Learning with CUDA-Accelerated Homomorphic Encryption in XGBoost
XGBoost is a machine learning algorithm widely used for tabular data modeling. To expand the XGBoost model from single-site learning to multisite collaborative...
10 MIN READ