CUDA – NVIDIA Technical BlogNews and tutorials for developers, data scientists, and IT admins2025-04-29T17:23:06Zhttp://www.open-lab.net/blog/feed/Bo Dong<![CDATA[NVIDIA cuPyNumeric 25.03 Now Fully Open Source with PIP and HDF5 Support]]>http://www.open-lab.net/blog/?p=990892025-04-23T19:26:15Z2025-04-23T19:26:07ZNVIDIA cuPyNumeric is a library that aims to provide a distributed and accelerated drop-in replacement for NumPy built on top of the Legate framework. It brings...
]]>0Daniel Rodriguez<![CDATA[Announcing ComputeEval, an Open-Source Framework for Evaluating LLMs on CUDA]]>http://www.open-lab.net/blog/?p=988852025-04-22T23:39:35Z2025-04-16T16:48:07ZLarge language models (LLMs) are revolutionizing how developers code and how they learn to code. For seasoned or junior developers alike, today's...
]]>0Ben Williams<![CDATA[Networking Reliability and Observability at Scale with NCCL 2.24]]>http://www.open-lab.net/blog/?p=967312025-04-23T00:32:27Z2025-03-13T16:30:00ZThe NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multinode (MGMN) communication primitives optimized for NVIDIA GPUs and networking....
]]>0Tony Scudiero<![CDATA[Understanding PTX, the Assembly Language of CUDA GPU Computing]]>http://www.open-lab.net/blog/?p=968912025-04-23T00:32:55Z2025-03-12T18:00:00ZParallel thread execution (PTX) is a virtual machine instruction set architecture that has been part of CUDA from its beginning. You can think of PTX as the...
]]>0Nikhil Gupta<![CDATA[Optimizing Compile Times for CUDA C++]]>http://www.open-lab.net/blog/?p=967752025-04-23T00:36:07Z2025-03-10T18:02:27ZIn modern software development, time is an incredibly valuable resource, especially during the compilation process. For developers working with CUDA C++ on...
]]>0Mark J. Bennett<![CDATA[GPU-Accelerate Algorithmic Trading Simulations by over 100x with Numba]]>http://www.open-lab.net/blog/?p=966522025-03-10T23:13:45Z2025-03-04T21:44:01ZQuantitative developers need to run back-testing simulations to see how financial algorithms perform from a profit and loss (P&L) standpoint. Statistical...
]]>0Anton Anders<![CDATA[NVIDIA cuDSS Advances Solver Technologies for Engineering and Scientific Computing]]>http://www.open-lab.net/blog/?p=964662025-04-23T02:36:28Z2025-02-25T18:30:56ZNVIDIA cuDSS is a first-generation sparse direct solver library designed to accelerate engineering and scientific computing. cuDSS is increasingly adopted in...
]]>0Jesus Alvarez<![CDATA[NVIDIA Open GPU Datacenter Drivers for RHEL9 Signed by Red Hat]]>http://www.open-lab.net/blog/?p=950692025-04-23T02:52:36Z2025-02-10T17:48:26ZNVIDIA and Red Hat have partnered to bring continued improvements to the precompiled NVIDIA Driver introduced in 2020. Last month, NVIDIA announced that the...
]]>3Michelle Horton<![CDATA[AI Foundation Model Enhances Cancer Diagnosis and Tailors Treatment]]>http://www.open-lab.net/blog/?p=957222025-04-23T02:48:13Z2025-02-04T17:16:54ZA new study and AI model from researchers at Stanford University are streamlining cancer diagnostics, treatment planning, and prognosis prediction. Named MUSK...
]]>1Matthew Nicely<![CDATA[Just Released: CUTLASS 3.8]]>http://www.open-lab.net/blog/?p=957162025-02-06T19:33:50Z2025-02-03T23:54:16ZProvides support for the NVIDIA Blackwell SM100 architecture. CUTLASS is a collection of CUDA C++ templates and abstractions for implementing high-performance...
]]>0Sylvain Jeaugey<![CDATA[New Scaling Algorithm and Initialization with NVIDIA Collective Communications Library 2.23]]>http://www.open-lab.net/blog/?p=954122025-04-23T02:48:19Z2025-01-31T22:47:37ZThe NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multinode communication primitives optimized for NVIDIA GPUs and networking. NCCL...
]]>0Zachary Bourque<![CDATA[Dynamic Loading in the CUDA Runtime]]>http://www.open-lab.net/blog/?p=939582025-04-23T14:57:41Z2025-01-31T20:03:32ZHistorically, the GPU device code is compiled alongside the application with offline tools such as nvcc. In this case, the GPU device code is managed internally...
]]>0Jonathan Bentz<![CDATA[CUDA Toolkit Now Available for NVIDIA Blackwell]]>http://www.open-lab.net/blog/?p=953582025-04-23T14:58:16Z2025-01-31T19:17:12ZThe latest release of the CUDA Toolkit, version 12.8, continues to push accelerated computing performance in data sciences, AI, scientific computing, and...
]]>0Annamalai Chockalingam<![CDATA[New AI SDKs and Tools Released for NVIDIA Blackwell GeForce RTX 50 Series GPUs]]>http://www.open-lab.net/blog/?p=955262025-04-23T15:00:41Z2025-01-30T14:00:00ZNVIDIA recently announced a new generation of PC GPUs, the GeForce RTX 50 Series, alongside new AI-powered SDKs and tools for developers. Powered by the...
]]>0Michelle Horton<![CDATA[Advancing Rare Disease Detection with AI-Powered Cellular Profiling]]>http://www.open-lab.net/blog/?p=954982025-04-23T15:01:14Z2025-01-29T20:45:46ZRare diseases are difficult to diagnose due to limitations in traditional genomic sequencing. Wolfgang Pernice, assistant professor at Columbia University, is...
]]>0Fred Oh<![CDATA[Upcoming Event: CUDA Developer Meet Up in Silicon Valley]]>http://www.open-lab.net/blog/?p=950352025-01-23T19:54:25Z2025-01-15T04:25:31ZWhether you're just starting your GPU programming journey or you're a CUDA ninja looking to share advanced techniques, join us in San Jose on 1/30/25.
]]>0Nick Becker<![CDATA[RAPIDS 24.12 Introduces cuDF on PyPI, CUDA Unified Memory for Polars, and Faster GNNs]]>http://www.open-lab.net/blog/?p=944152024-12-19T21:46:07Z2024-12-19T21:21:42ZRAPIDS 24.12 introduces cuDF packages to PyPI, speeds up groupby aggregations and reading files from AWS S3, enables larger-than-GPU memory queries in the...
]]>0Ziyue Xu<![CDATA[Security for Data Privacy in Federated Learning with CUDA-Accelerated Homomorphic Encryption in XGBoost]]>http://www.open-lab.net/blog/?p=938702024-12-17T19:33:44Z2024-12-18T21:30:00ZXGBoost is a machine learning algorithm widely used for tabular data modeling. To expand the XGBoost model from single-site learning to multisite collaborative...
]]>0Miles Macklin<![CDATA[Introducing Tile-Based Programming in Warp 1.5.0]]>http://www.open-lab.net/blog/?p=940022025-03-11T23:13:10Z2024-12-14T21:15:45ZWith the latest release of Warp 1.5.0, developers now have access to new tile-based programming primitives in Python. Leveraging cuBLASDx and cuFFTDx, these new...
]]>0Amr Elmeleegy<![CDATA[Spotlight: Perplexity AI Serves 400 Million Search Queries a Month Using NVIDIA Inference Stack]]>http://www.open-lab.net/blog/?p=933962025-03-18T18:26:38Z2024-12-05T17:58:43ZThe demand for AI-enabled services continues to grow rapidly, placing increasing pressure on IT and infrastructure teams. These teams are tasked with...
]]>0Ben Zaitlenhttps://www.linkedin.com/in/benjamin-zaitlen-62ab7b4/<![CDATA[Best Practices for Multi-GPU Data Analysis Using RAPIDS with Dask]]>http://www.open-lab.net/blog/?p=924802024-12-12T19:38:40Z2024-11-21T19:02:03ZAs we move towards a more dense computing infrastructure, with more compute, more GPUs, accelerated networking, and so forth, multi-GPU training and analysis...
]]>0Sungho Shin<![CDATA[NVIDIA cuDSS Library Removes Barriers to Optimizing the US Power Grid]]>http://www.open-lab.net/blog/?p=920652024-11-19T18:26:47Z2024-11-19T17:00:00ZIn the wake of ever-growing power demands, power systems optimization (PSO) of power grids is crucial for ensuring efficient resource management,...
]]>0Alex McCaskey<![CDATA[Introducing NVIDIA CUDA-QX Libraries for Accelerated Quantum Supercomputing]]>http://www.open-lab.net/blog/?p=919292024-11-19T23:17:40Z2024-11-18T18:30:00ZAccelerated quantum supercomputing combines the benefits of AI supercomputing with quantum processing units (QPUs) to develop solutions to some of the world's...
]]>0Szymon Karpiński<![CDATA[Fusing Epilog Operations with Matrix Multiplication Using nvmath-python]]>http://www.open-lab.net/blog/?p=920982025-04-01T18:19:57Z2024-11-18T18:30:00Znvmath-python (Beta) is an open-source Python library, providing Python programmers with access to high-performance mathematical operations from NVIDIA CUDA-X...
]]>1Wonchan Lee<![CDATA[Effortlessly Scale NumPy from Laptops to Supercomputers with NVIDIA cuPyNumeric]]>http://www.open-lab.net/blog/?p=916822025-04-10T23:02:00Z2024-11-18T17:00:00ZPython is the most common programming language for data science, machine learning, and numerical computing. It continues to grow in popularity among scientists...
]]>1Kyle Tretina<![CDATA[Boost AlphaFold2 Protein Structure Prediction with GPU-Accelerated MMseqs2]]>http://www.open-lab.net/blog/?p=916232024-11-14T17:10:35Z2024-11-13T17:00:00ZThe ability to compare the sequences of multiple related proteins is a foundational task for many life science researchers. This is often done in the form of a...
]]>0Michelle Horton<![CDATA[AI That "Hears" Heart Disease May Help Vets Diagnose Dogs]]>http://www.open-lab.net/blog/?p=916192024-11-14T17:10:40Z2024-11-12T15:49:17ZA new machine-learning algorithm that listens to digital heartbeat data could help veterinarians diagnose murmurs and early-stage heart disease in dogs....
]]>0Michael Yh Wang<![CDATA[Bridging the CUDA C++ Ecosystem and Python Developers with Numbast]]>http://www.open-lab.net/blog/?p=900862024-10-31T16:26:15Z2024-10-24T16:30:00ZBy enabling CUDA kernels to be written in Python similar to how they can be implemented within C++, Numba bridges the gap between the Python ecosystem and the...
]]>0Elias Wolfberg<![CDATA[AI Medical Imagery Model Offers Fast, Cost-Efficient Expert Analysis]]>http://www.open-lab.net/blog/?p=903922025-01-07T20:23:08Z2024-10-17T18:28:20ZResearchers at UCLA have developed a new AI model that can expertly analyze 3D medical images of diseases in a fraction of the time it would otherwise take a...
]]>0Brad Nemire<![CDATA[Just Released: Updated Math Libraries in CUDA Toolkit 12.6.2]]>http://www.open-lab.net/blog/?p=901272024-10-17T18:19:05Z2024-10-09T16:53:54ZCUDA Toolkit 12.6.2 improves performance and provides new features in cuBLAS, cuSOLVER, and cuFFT LTO libraries.
]]>0Paul Logan<![CDATA[Accelerating Reality Capture Workflows with AI and NVIDIA RTX GPUs]]>http://www.open-lab.net/blog/?p=897192024-10-17T18:19:11Z2024-10-07T23:03:48ZReality capture creates highly accurate, detailed, and immersive digital representations of environments. Innovations in site scanning and accelerated data...
]]>0Tanya Lenz<![CDATA[Webinar: Accelerating Python with GPUs]]>http://www.open-lab.net/blog/?p=896592024-10-17T19:07:02Z2024-10-02T18:00:00ZJoin us on October 9 to learn how your applications can benefit from NVIDIA CUDA Python software initiatives.
]]>0Annamalai Chockalingam<![CDATA[Accelerating LLMs with llama.cpp on NVIDIA RTX Systems]]>http://www.open-lab.net/blog/?p=896632024-11-22T23:11:17Z2024-10-02T13:00:00ZThe NVIDIA RTX AI for Windows PCs platform offers a thriving ecosystem of thousands of open-source models for application developers to leverage and integrate...
]]>0Mark Wolf<![CDATA[Advancing Quantum Algorithm Design with GPTs]]>http://www.open-lab.net/blog/?p=891732024-10-17T19:07:09Z2024-09-30T16:00:00ZAI techniques like large language models (LLMs) are rapidly transforming many scientific disciplines. Quantum computing is no exception. A collaboration between...
]]>0Daniel Galvez<![CDATA[Accelerating Leaderboard-Topping ASR Models 10x with NVIDIA NeMo]]>http://www.open-lab.net/blog/?p=893302024-10-17T19:07:17Z2024-09-24T18:27:35ZNVIDIA NeMo has consistently developed automatic speech recognition (ASR) models that set the benchmark in the industry, particularly those topping the Hugging...
]]>0William Hill<![CDATA[Just Released: Torch-TensorRT v2.4.0]]>http://www.open-lab.net/blog/?p=892292024-09-19T17:50:49Z2024-09-19T17:50:46ZIncludes C++ runtime support on Windows, enhanced dynamic shape support in converters, and support for PyTorch 2.4, CUDA 12.4, TensorRT 10.1, and Python 3.12.
]]>0Richard Wang<![CDATA[Accelerating Oracle Database Generative AI Workloads with NVIDIA NIM and NVIDIA cuVS]]>http://www.open-lab.net/blog/?p=889632024-10-28T21:54:43Z2024-09-17T19:04:16ZThe vast majority of the world's data remains untapped, and enterprises are looking to generate value from this data by creating the next wave of generative AI...
]]>0Michelle Horton<![CDATA[Advanced Strategies for High-Performance GPU Programming with NVIDIA CUDA]]>http://www.open-lab.net/blog/?p=880692024-09-19T19:31:59Z2024-09-11T16:25:00ZStephen Jones, a leading expert and distinguished NVIDIA CUDA architect, offers his guidance and insights with a deep dive into the complexities of mapping...
]]>1Houston Hoffman<![CDATA[Constant Time Launch for Straight-Line CUDA Graphs and Other Performance Enhancements]]>http://www.open-lab.net/blog/?p=886312024-09-19T19:32:10Z2024-09-11T16:00:00ZCUDA Graphs are a way to define and batch GPU operations as a graph rather than a sequence of stream launches. A CUDA Graph groups a set of CUDA kernels and...
]]>1Mohammad Almasri<![CDATA[Accelerating the HPCG Benchmark with NVIDIA Math Sparse Libraries]]>http://www.open-lab.net/blog/?p=885662024-09-19T19:32:22Z2024-09-10T16:30:00ZIn the realm of high-performance computing (HPC), NVIDIA has continually advanced HPC by offering its highly optimized NVIDIA High-Performance Conjugate...
]]>0Akhil Langer<![CDATA[Enhancing Application Portability and Compatibility across New Platforms Using NVIDIA Magnum IO NVSHMEM 3.0]]>http://www.open-lab.net/blog/?p=885502024-09-19T19:34:01Z2024-09-06T20:30:09ZNVSHMEM is a parallel programming interface that provides efficient and scalable communication for NVIDIA GPU clusters. Part of NVIDIA Magnum IO and based on...
]]>0Oscar Javier Aldana<![CDATA[Spotlight: clicOH Accelerates Last-Mile Delivery 20x with NVIDIA cuOpt]]>http://www.open-lab.net/blog/?p=883632024-09-05T17:57:11Z2024-08-29T22:18:14ZDriven by shifts in consumer behavior and the pandemic, e-commerce continues its explosive growth and transformation. As a result, logistics and transportation...
]]>0Michelle Horton<![CDATA[Boosting CUDA Efficiency with Essential Techniques for New Developers]]>http://www.open-lab.net/blog/?p=878232024-09-05T17:57:12Z2024-08-29T17:00:00ZTo fully harness the capabilities of NVIDIA GPUs, optimizing NVIDIA CUDA performance is essential, particularly for developers new to GPU programming. This talk...
]]>1Rob Van der Wijngaart<![CDATA[Improving GPU Performance by Reducing Instruction Cache Misses]]>http://www.open-lab.net/blog/?p=868682025-01-22T17:57:59Z2024-08-08T16:30:00ZGPUs are specially designed to crunch through massive amounts of data at high speed. They have a large amount of compute resources, called streaming...
]]>6Alan Gray<![CDATA[Optimizing llama.cpp AI Inference with CUDA Graphs]]>http://www.open-lab.net/blog/?p=868452024-11-14T16:03:17Z2024-08-07T20:00:00ZThe open-source llama.cpp code base was originally released in 2023 as a lightweight but efficient framework for performing inference on Meta Llama models....
]]>0Rob Armstrong<![CDATA[Just Released: CUDA Toolkit 12.6]]>http://www.open-lab.net/blog/?p=866752024-08-28T17:29:07Z2024-08-01T20:00:00ZThe release supports GB100 capabilities and new library enhancements to cuBLAS, cuFFT, cuSOLVER, cuSPARSE, as well as the release of Nsight Compute 2024.3.
]]>0Rob Armstrong<![CDATA[NVIDIA Transitions Fully Towards Open-Source GPU Kernel Modules]]>http://www.open-lab.net/blog/?p=853312024-08-08T18:48:48Z2024-07-17T16:40:27ZWith the R515 driver, NVIDIA released a set of Linux GPU kernel modules in May 2022 as open source with dual GPL and MIT licensing. The initial release targeted...
]]>5Vijay Thakkar<![CDATA[Next Generation of FlashAttention]]>http://www.open-lab.net/blog/?p=852192024-07-25T18:19:05Z2024-07-11T17:46:06ZNVIDIA is excited to collaborate with Colfax, Together.ai, Meta, and Princeton University on their recent achievement to exploit the Hopper GPU architecture and...
]]>0Robert Jensen<![CDATA[Just Released: nvmath-python]]>http://www.open-lab.net/blog/?p=844392024-07-25T18:19:11Z2024-07-09T16:00:00Znvmath-python is an open-source Python library that provides high performance access to the core mathematical operations in the NVIDIA Math Libraries. Available...
]]>0Robert Jensen<![CDATA[Just Released: cuDSS 0.3.0]]>http://www.open-lab.net/blog/?p=844342024-07-25T18:19:14Z2024-07-03T15:00:00ZcuDSS (Preview) is an accelerated direct sparse solver. It now supports multi-GPU multi-node platforms, and introduces a hybrid memory mode.
]]>0Steven Gurfinkel<![CDATA[Checkpointing CUDA Applications with CRIU]]>http://www.open-lab.net/blog/?p=842362024-07-25T18:19:18Z2024-07-02T16:00:00ZCheckpoint and restore functionality for CUDA is exposed through a command-line utility called cuda-checkpoint. This utility can be used to transparently...
]]>1Jon Waxman<![CDATA[Runtime Fatbin Creation Using the NVIDIA CUDA Toolkit 12.4 Compiler]]>http://www.open-lab.net/blog/?p=839922024-06-27T18:17:56Z2024-06-18T17:28:55ZCUDA Toolkit 12.4 introduced a new nvFatbin library for creating fatbins at runtime. Fatbins, otherwise known as NVIDIA device code fat binaries, are containers...
]]>1Elena Agostini<![CDATA[Unlocking GPU-Accelerated RDMA with NVIDIA DOCA GPUNetIO]]>http://www.open-lab.net/blog/?p=839982024-06-27T23:59:16Z2024-06-13T20:43:59ZNVIDIA DOCA GPUNetIO is a library within the NVIDIA DOCA SDK, specifically designed for real-time inline GPU packet processing. It combines technologies like...
]]>4Babak Hejazi<![CDATA[Introducing Grouped GEMM APIs in cuBLAS and More Performance Updates]]>http://www.open-lab.net/blog/?p=838882024-07-16T17:19:07Z2024-06-12T20:30:00ZThe latest release of NVIDIA cuBLAS library, version 12.5, continues to deliver functionality and performance to deep learning (DL) and high-performance...
]]>0Jen Witsoe<![CDATA[Just Released: Nsight Compute 2024.2]]>http://www.open-lab.net/blog/?p=827892024-08-28T17:29:39Z2024-05-22T16:57:50ZNsight Compute 2024.2 adds Python syntax highlighting and call stacks, a redesigned report header, and source page statistics to make CUDA optimization easier.
]]>0Tanya Lenz<![CDATA[Just Released: CUDA Toolkit 12.5]]>http://www.open-lab.net/blog/?p=828402024-05-30T19:55:49Z2024-05-21T20:29:43ZCUDA Toolkit 12.5 supports new NVIDIA L20 and H20 GPUs and simultaneous compute and graphics to DirectX, and updates Nsight Compute and CUDA-X Libraries.
]]>0Jason Gaiser<![CDATA[Dynamic Control Flow in CUDA Graphs with Conditional Nodes]]>http://www.open-lab.net/blog/?p=810122025-02-03T22:25:21Z2024-05-10T18:43:37ZPost updated on February 3, 2025 with details about CUDA 12.8. CUDA Graphs can provide a significant performance increase, as the driver is able to optimize...
]]>2Tianna Nguy<![CDATA[NVIDIA GTC Training Labs On Demand Available Now]]>http://www.open-lab.net/blog/?p=821572024-05-07T17:47:17Z2024-05-07T17:02:57ZMissed GTC or want to replay your favorite training labs? Find them on demand with the NVIDIA GTC Training Labs playlist.
]]>0Rob Van der Wijngaart<![CDATA[Measuring the GPU Occupancy of Multi-stream Workloads]]>http://www.open-lab.net/blog/?p=810742025-01-03T00:33:09Z2024-04-19T16:00:00ZNVIDIA GPUs are becoming increasingly powerful with each new generation. This increase generally comes in two forms. Each streaming multi-processor (SM), the...
]]>0Paul Graham<![CDATA[Efficient CUDA Debugging: Using NVIDIA Compute Sanitizer with NVIDIA Tools Extension and Creating Custom Tools]]>http://www.open-lab.net/blog/?p=803832024-08-28T17:30:34Z2024-03-27T20:29:15ZNVIDIA Compute Sanitizer is a powerful tool that can save you time and effort while improving the reliability and performance of your CUDA applications....
]]>1Robert Jensen<![CDATA[Building High-Performance Applications in the Era of Accelerated Computing]]>http://www.open-lab.net/blog/?p=800672024-08-28T17:32:20Z2024-03-25T16:00:00ZAI is augmenting high-performance computing (HPC) with novel approaches to data processing, simulation, and modeling. Because of the computational requirements...
]]>0Robert Jensen<![CDATA[Just Released: NVIDIA cuSPARSELt 0.6]]>http://www.open-lab.net/blog/?p=786832024-04-09T23:45:24Z2024-03-14T16:00:00ZNVIDIA cuSPARSELt harnesses Sparse Tensor Cores to accelerate general matrix multiplications. Version 0.6 adds support for the NVIDIA Hopper architecture.
]]>0Rob Armstrong<![CDATA[CUDA Toolkit 12.4 Enhances Support for NVIDIA Grace Hopper and Confidential Computing]]>http://www.open-lab.net/blog/?p=791192024-08-28T17:32:44Z2024-03-06T19:55:00ZThe latest release of CUDA Toolkit, version 12.4, continues to push accelerated computing performance using the latest NVIDIA GPUs. This post explains the new...
]]>0Feiwen Zhu<![CDATA[Optimizing OpenFold Training for Drug Discovery]]>http://www.open-lab.net/blog/?p=783462024-03-07T19:18:52Z2024-02-28T19:29:02ZPredicting 3D protein structures from amino acid sequences has been an important long-standing question in bioinformatics. In recent years, deep...
]]>0Robert Jensen<![CDATA[Just Released: cuBLASDx]]>http://www.open-lab.net/blog/?p=765352024-01-25T18:17:35Z2024-01-12T18:58:48ZcuBLASDx allows you to perform BLAS calculations inside your CUDA kernel, improving the performance of your application. Available to download in Preview...
]]>0Rahul Ramasubramanian<![CDATA[Improving CUDA Initialization Times Using cgroups in Certain Scenarios]]>http://www.open-lab.net/blog/?p=755342024-01-11T19:49:33Z2024-01-05T22:14:41ZMany CUDA applications running on multi-GPU platforms usually use a single GPU for their compute needs. In such scenarios, a performance penalty is paid by...
]]>0Robert Jensen<![CDATA[Just Released: cuBLASMp]]>http://www.open-lab.net/blog/?p=751702023-12-20T18:06:09Z2023-12-20T18:00:00ZcuBLASMp is a high-performance, multi-process, GPU-accelerated library for distributed basic dense linear algebra. It is available to download in Preview now.
]]>0Efrat Shabtai<![CDATA[CUDA-Q 0.5 Delivers New Features for Quantum-Classical Computing]]>http://www.open-lab.net/blog/?p=743162024-05-07T19:29:27Z2023-11-29T17:00:00ZNVIDIA CUDA-Q is a platform for building quantum-classical computing applications. It is an open-source programming model for heterogeneous computing such as...
]]>0Alexey Panteleev<![CDATA[Unlocking GPU Intrinsics in HLSL]]>http://www.open-lab.net/blog/?p=720952023-11-30T19:43:27Z2023-11-21T18:08:42ZThere are some useful intrinsic functions in the NVIDIA GPU instruction set that are not included in standard graphics APIs. Updated from the original 2016 post...
]]>0Asawaree Bhide<![CDATA[Boosting Custom ROS Graphs Using NVIDIA Isaac Transport for ROS]]>http://www.open-lab.net/blog/?p=740082023-11-30T19:43:29Z2023-11-17T21:11:29ZNVIDIA Isaac Transport for ROS (NITROS) is the implementation of two hardware-acceleration features introduced with ROS 2 Humble: type adaptation and type...
]]>1Graham Lopez<![CDATA[Unlock the Power of NVIDIA Grace and NVIDIA Hopper Architectures with Foundational HPC Software]]>http://www.open-lab.net/blog/?p=729772024-08-28T17:33:20Z2023-11-16T19:07:51ZHigh-performance computing (HPC) powers applications in simulation and modeling, healthcare and life sciences, industry and engineering, and more. In the modern...
]]>0Balakumar Sundaralingam<![CDATA[CUDA-Accelerated Robot Motion Generation in Milliseconds with NVIDIA cuRobo]]>http://www.open-lab.net/blog/?p=724242023-12-05T19:04:45Z2023-11-07T22:22:37ZReal-time autonomous robot navigation powered by a fast motion-generation algorithm can enable applications in several industries such as food and services,...
]]>0Joseph Chandler<![CDATA[ICYMI: Leveraging the Power of GPUs with CuPy in Python]]>http://www.open-lab.net/blog/?p=726372023-11-16T19:16:47Z2023-11-06T19:17:06ZSee how KDNuggets achieved 500x speedup using CuPy and NVIDIA CUDA on 3D arrays.
]]>0Rob Armstrong<![CDATA[CUDA Toolkit 12.3 Delivers New Features for Accelerated Computing]]>http://www.open-lab.net/blog/?p=717352024-08-28T17:33:55Z2023-11-01T16:00:00ZThe latest release of CUDA Toolkit continues to push the envelope of accelerated computing performance using the latest NVIDIA GPUs. New features of this...
]]>0Mozhgan Kabiri Chimeh<![CDATA[Efficient CUDA Debugging: Memory Initialization and Thread Synchronization with NVIDIA Compute Sanitizer]]>http://www.open-lab.net/blog/?p=719252024-03-21T22:25:40Z2023-10-24T16:00:00ZNVIDIA Compute Sanitizer is a powerful tool that can save you time and effort while improving the reliability and performance of your CUDA applications. In...
]]>1Sai Bangaru<![CDATA[Differentiable Slang: Example Applications]]>http://www.open-lab.net/blog/?p=720182023-11-02T20:23:30Z2023-10-23T04:03:00ZDifferentiable Slang easily integrates with existing codebases (from Python, PyTorch, and CUDA to HLSL) to aid multiple computer graphics tasks and enable...
]]>0Sai Bangaru<![CDATA[Differentiable Slang: A Shading Language for Renderers That Learn]]>http://www.open-lab.net/blog/?p=720112023-11-02T20:23:44Z2023-10-23T04:02:00ZNVIDIA just released a SIGGRAPH Asia 2023 research paper, SLANG.D: Fast, Modular and Differentiable Shader Programming. The paper shows how a single language...
]]>0Prabhu Ramamoorthy<![CDATA[NVIDIA H100 System for HPC and Generative AI Sets Record for Financial Risk Calculations]]>http://www.open-lab.net/blog/?p=711962024-08-28T17:35:19Z2023-09-28T15:25:49ZGenerative AI is taking the world by storm, from large language models (LLMs) to generative pretrained transformer (GPT) models to diffusion models. NVIDIA is...
]]>1Jackson Marusarz<![CDATA[New Video Series: CUDA Developer Tools Tutorials]]>http://www.open-lab.net/blog/?p=710582024-08-28T17:35:38Z2023-09-25T17:00:00ZGPU acceleration is enabling faster and more intelligent applications than ever before, and the CUDA Toolkit is key to harnessing acceleration on NVIDIA GPUs....
]]>0Zachary Bourque<![CDATA[NVIDIA CUDA Toolkit Symbol Server]]>http://www.open-lab.net/blog/?p=704932023-09-21T17:56:27Z2023-09-07T19:10:21ZNVIDIA has already made available a GPU driver binary symbols server for Windows. Now, NVIDIA is making available a repository of CUDA Toolkit symbols for...
]]>2Robert Jensen<![CDATA[New Video Tutorial: Profiling and Debugging NVIDIA CUDA Applications]]>http://www.open-lab.net/blog/?p=700942024-08-28T17:36:19Z2023-08-30T16:00:00ZEpisode 5 of the NVIDIA CUDA Tutorials Video series is out. Jackson Marusarz, product manager for Compute Developer Tools at NVIDIA, introduces a suite of tools...
]]>0John Hubbard<![CDATA[Simplifying GPU Application Development with Heterogeneous Memory Management]]>http://www.open-lab.net/blog/?p=695422023-09-13T17:07:34Z2023-08-22T17:00:00ZHeterogeneous Memory Management (HMM) is a CUDA memory management feature that extends the simplicity and productivity of the CUDA Unified Memory programming...
]]>0Michelle Horton<![CDATA[Ask Me Anything: NVIDIA CUDA Toolkit 12]]>http://www.open-lab.net/blog/?p=684402023-08-10T17:11:18Z2023-07-25T18:21:38ZOn July 26, connect with NVIDIA CUDA product team experts on the latest CUDA Toolkit 12.
]]>0Zohim Chandani<![CDATA[Programming the Quantum-Classical Supercomputer]]>http://www.open-lab.net/blog/?p=680442024-05-07T19:30:16Z2023-07-19T16:00:00ZHeterogeneous computing architectures, those that incorporate a variety of processor types working in tandem, have proven extremely valuable in the continued...
]]>0Joel Lashmore<![CDATA[GPUs for ETL? Run Faster, Less Costly Workloads with NVIDIA RAPIDS Accelerator for Apache Spark and Databricks]]>http://www.open-lab.net/blog/?p=675032023-11-10T01:27:07Z2023-07-17T18:08:30ZWe were stuck. Really stuck. With a hard delivery deadline looming, our team needed to figure out how to process a complex extract-transform-load (ETL) job on...
]]>0Guy Salton<![CDATA[Train Your AI Model Once and Deploy on Any Cloud with NVIDIA and Run:ai]]>http://www.open-lab.net/blog/?p=670352023-09-11T21:36:55Z2023-07-07T16:38:25ZOrganizations are increasingly adopting hybrid and multi-cloud strategies to access the latest compute resources, consistently support worldwide customers, and...
]]>2Rob Armstrong<![CDATA[CUDA Toolkit 12.2 Unleashes Powerful Features for Boosting Applications]]>http://www.open-lab.net/blog/?p=677052024-08-28T17:39:00Z2023-07-06T19:16:56ZThe latest release of CUDA Toolkit 12.2 introduces a range of essential new features, modifications to the programming model, and enhanced support for hardware...
]]>0Fred Oh<![CDATA[Event: CUDA 12.2 YouTube Premiere]]>http://www.open-lab.net/blog/?p=675042023-07-27T18:54:26Z2023-07-03T19:00:00ZWatch on-demand as experts deep dive into CUDA 12.2, including support for confidential computing.
]]>0Paul Graham<![CDATA[Efficient CUDA Debugging: How to Hunt Bugs with NVIDIA Compute Sanitizer]]>http://www.open-lab.net/blog/?p=669152024-03-21T22:32:29Z2023-06-29T18:21:00ZDebugging code is a crucial aspect of software development but can be both challenging and time-consuming. Parallel programming with thousands of threads can...
]]>7Ashraf Eassa<![CDATA[Breaking MLPerf Training Records with NVIDIA H100 GPUs]]>http://www.open-lab.net/blog/?p=669192023-07-13T19:00:28Z2023-06-27T16:00:00ZAt the heart of the rapidly expanding set of AI-powered applications are powerful AI models. Before these models can be deployed, they must be trained through a...
]]>0Deepak Unnikrishnan<![CDATA[CUDA 12.1 Supports Large Kernel Parameters]]>http://www.open-lab.net/blog/?p=660582024-08-28T17:39:46Z2023-06-05T17:00:00ZCUDA kernel function parameters are passed to the device through constant memory and have been limited to 4,096 bytes. CUDA 12.1 increases this parameter limit...
]]>4Michael Balint<![CDATA[Harnessing the Power of NVIDIA AI Enterprise on Azure Machine Learning]]>http://www.open-lab.net/blog/?p=660162023-06-14T19:45:42Z2023-06-02T18:08:43ZAI is transforming industries, automating processes, and opening new opportunities for innovation in the rapidly evolving technological landscape. As more...
]]>0Tom Lubowe<![CDATA[QHack Results Highlight Quantum Computing Applications and Tools on GPUs]]>http://www.open-lab.net/blog/?p=647812024-05-07T19:30:32Z2023-05-18T19:00:00ZQHack is an educational conference and the world's largest quantum machine learning (QML) hackathon. This year at QHack 2023, 2,850 individuals from 105...
]]>0Michelle Horton<![CDATA[Webinar: Performant Multiphase Flow Simulation at Leadership-Class Scale]]>http://www.open-lab.net/blog/?p=649072023-08-18T20:53:46Z2023-05-17T22:10:15ZOn June 6, learn how researchers use OpenACC for GPU acceleration of multiphase and compressible flow solvers that obtain speedups at scale.
]]>0Joseph Cavanaugh<![CDATA[Advanced API Performance: CPUs]]>http://www.open-lab.net/blog/?p=641532023-10-02T05:00:51Z2023-05-17T18:00:00ZThis post covers CPU best practices when working with NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API...
]]>0Jonathan Wong<![CDATA[Asynchronous Error Reporting: When printf Just Won't Do]]>http://www.open-lab.net/blog/?p=603772023-06-12T07:56:12Z2023-05-16T18:18:39ZSome programming situations call for reporting "soft" errors asynchronously. While printf can be a useful tool, it can increase register use and impact...
]]>0Yury Uralsky<![CDATA[Advanced API Performance: Sampler Feedback]]>http://www.open-lab.net/blog/?p=629082023-10-02T05:02:21Z2023-05-04T17:11:42ZThis post covers best practices for using sampler feedback on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API...
]]>0Gene Pache<![CDATA[Microsoft and TempoQuest Accelerate Wind Energy Forecasts with AceCast]]>http://www.open-lab.net/blog/?p=640912023-06-09T20:29:27Z2023-04-28T14:00:00ZAccurate weather modeling is essential for companies to properly forecast renewable energy production and plan for natural disasters. Ineffective and...
]]>1Peter Entschev<![CDATA[Debugging a Mixed Python and C Language Stack]]>http://www.open-lab.net/blog/?p=636412023-06-09T22:28:19Z2023-04-20T17:00:00ZDebugging is difficult. Debugging across multiple languages is especially challenging, and debugging across devices often requires a team with varying skill...