The NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multi-node (MGMN) communication primitives optimized for NVIDIA GPUs and networking. NCCL is a central piece of software for multi-GPU deep learning training. It handles any kind of inter-GPU communication, be it over PCI, NVLink, or networking. It uses advanced topology detection, optimized communication graphs…
Parallel thread execution (PTX) is a virtual machine instruction set architecture that has been part of CUDA from its beginning. You can think of PTX as the assembly language of the NVIDIA CUDA GPU computing platform. In this post, we'll explain what that means, what PTX is for, and what you need to know about it to make the most of CUDA for your applications. We'll start by walking through…
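To make the idea concrete, here is a small, hypothetical sketch (not taken from the post) of a CUDA C++ kernel that embeds a single line of inline PTX; the kernel and variable names are illustrative.

```cuda
// Hypothetical kernel mixing CUDA C++ with one line of inline PTX.
// add.s32 is the PTX instruction for 32-bit signed integer addition.
__global__ void add_with_ptx(const int *a, const int *b, int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int result;
        asm("add.s32 %0, %1, %2;" : "=r"(result) : "r"(a[i]), "r"(b[i]));
        out[i] = result;
    }
}
```

Compiling a .cu file with `nvcc -ptx` emits the PTX the compiler generates for the whole kernel, which is a convenient way to start reading PTX.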
In modern software development, time is an incredibly valuable resource, especially during the compilation process. For developers working with CUDA C++ on large-scale GPU-accelerated applications, optimizing compile times can significantly enhance productivity and streamline the entire development cycle. When using the compiler for offline compilation, efficient compilation times enable…
Quantitative developers need to run back-testing simulations to see how financial algorithms perform from a profit and loss (P&L) standpoint. Statistical techniques are important to visualize the possible outcomes of the algorithms in terms of the possible P&L paths. GPUs can greatly reduce the amount of time needed to do this. In the broader picture, mathematical modeling of financial…
NVIDIA cuDSS is a first-generation sparse direct solver library designed to accelerate engineering and scientific computing. cuDSS is increasingly adopted in data centers and other environments and supports single-GPU, multi-GPU, and multi-node (MGMN) configurations. cuDSS has become a key tool for accelerating computer-aided engineering (CAE) workflows and scientific computations across…
NVIDIA and Red Hat have partnered to bring continued improvements to the precompiled NVIDIA Driver introduced in 2020. Last month, NVIDIA announced that the open GPU driver modules will become the default recommended way to enable NVIDIA graphics hardware. Today, NVIDIA announced that Red Hat is now compiling and signing the NVIDIA open GPU kernel modules to further streamline the usage for…
A new study and AI model from researchers at Stanford University are streamlining cancer diagnostics, treatment planning, and prognosis prediction. Named MUSK (Multimodal transformer with Unified maSKed modeling), the research aims to advance precision oncology, tailoring treatment plans to each patient based on their unique medical data. "Multimodal foundation models are a new frontier in…
Provides support for the NVIDIA Blackwell SM100 architecture. CUTLASS is a collection of CUDA C++ templates and abstractions for implementing high-performance GEMM computations.
The NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multi-node communication primitives optimized for NVIDIA GPUs and networking. NCCL is a central piece of software for multi-GPU deep learning training. It handles any kind of inter-GPU communication, be it over PCI, NVLink, or networking. It uses advanced topology detection, optimized communication graphs…
Historically, GPU device code is compiled alongside the application with offline tools such as nvcc. In this case, the GPU device code is managed internally to the CUDA runtime. You can then launch kernels with the <<<...>>> execution configuration syntax, and the CUDA runtime ensures that the invoked kernel is launched. However, in some cases, GPU device code needs to be dynamically compiled and loaded. This post shows a way to…
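One common way to do this, sketched below under the assumption that NVRTC and the CUDA driver API are used (kernel and variable names are illustrative, and error checking is omitted), is to compile CUDA C++ source to PTX at run time and load it with the driver API.

```cuda
// Sketch of runtime compilation with NVRTC plus the CUDA driver API.
// Build with: nvcc main.cpp -lnvrtc -lcuda
#include <cuda.h>
#include <nvrtc.h>
#include <string>

const char *kSource =
    "extern \"C\" __global__ void scale(float *x, float s, int n) {\n"
    "  int i = blockIdx.x * blockDim.x + threadIdx.x;\n"
    "  if (i < n) x[i] *= s;\n"
    "}\n";

int main() {
    // Compile CUDA C++ source to PTX at run time.
    nvrtcProgram prog;
    nvrtcCreateProgram(&prog, kSource, "scale.cu", 0, nullptr, nullptr);
    const char *opts[] = {"--gpu-architecture=compute_80"};
    nvrtcCompileProgram(prog, 1, opts);
    size_t ptxSize = 0;
    nvrtcGetPTXSize(prog, &ptxSize);
    std::string ptx(ptxSize, '\0');
    nvrtcGetPTX(prog, &ptx[0]);
    nvrtcDestroyProgram(&prog);

    // Load the PTX and launch the kernel through the driver API.
    cuInit(0);
    CUdevice dev;   cuDeviceGet(&dev, 0);
    CUcontext ctx;  cuCtxCreate(&ctx, 0, dev);
    CUmodule mod;   cuModuleLoadData(&mod, ptx.c_str());
    CUfunction fn;  cuModuleGetFunction(&fn, mod, "scale");

    int n = 1024;
    float s = 2.0f;
    CUdeviceptr x;  cuMemAlloc(&x, n * sizeof(float));
    void *args[] = {&x, &s, &n};
    cuLaunchKernel(fn, (n + 255) / 256, 1, 1, 256, 1, 1, 0, nullptr, args, nullptr);
    cuCtxSynchronize();

    cuMemFree(x);
    cuModuleUnload(mod);
    cuCtxDestroy(ctx);
    return 0;
}
```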
The latest release of the CUDA Toolkit, version 12.8, continues to push accelerated computing performance in data sciences, AI, scientific computing, and computer graphics and simulation, using the latest NVIDIA CPUs and GPUs. This post highlights some of the new features and enhancements included with this release: CUDA Toolkit 12.8 is the first version of the Toolkit to support…
NVIDIA recently announced a new generation of PC GPUs, the GeForce RTX 50 Series, alongside new AI-powered SDKs and tools for developers. Powered by the NVIDIA Blackwell architecture, fifth-generation Tensor Cores, and fourth-generation RT Cores, the GeForce RTX 50 Series delivers breakthroughs in AI-driven rendering, including neural shaders, digital human technologies, geometry and lighting.
Rare diseases are difficult to diagnose due to limitations in traditional genomic sequencing. Wolfgang Pernice, assistant professor at Columbia University, is using AI-powered cellular profiling to bridge these gaps and advance personalized medicine. At NVIDIA GTC 2024, Pernice shared insights from his lab's work with diseases like Charcot-Marie-Tooth (CMT) and mitochondrial disorders.
Whether you're just starting your GPU programming journey or you're a CUDA ninja looking to share advanced techniques, join us in San Jose on 1/30/25.
RAPIDS 24.12 introduces cuDF packages to PyPI, speeds up aggregations and reading files from AWS S3, enables larger-than-GPU-memory queries in the Polars GPU engine, and delivers faster graph neural network (GNN) training on real-world graphs. Starting with the 24.12 release of RAPIDS, CUDA 12 builds of the RAPIDS libraries and all of their dependencies are now available on PyPI. As a result…
XGBoost is a machine learning algorithm widely used for tabular data modeling. To expand the XGBoost model from single-site learning to multisite collaborative training, NVIDIA has developed Federated XGBoost, an XGBoost plugin for federated learning. It covers vertical collaboration settings to jointly train XGBoost models across decentralized data sources, as well as horizontal histogram-based…
With the latest release of Warp 1.5.0, developers now have access to new tile-based programming primitives in Python. Leveraging cuBLASDx and cuFFTDx, these new tools provide developers with efficient matrix multiplication and Fourier transforms in Python kernels for accelerated simulation and scientific computing. In this blog post, we'll introduce these new features and show how they can be used…
As of 3/18/25, NVIDIA Triton Inference Server is now NVIDIA Dynamo. The demand for AI-enabled services continues to grow rapidly, placing increasing pressure on IT and infrastructure teams. These teams are tasked with provisioning the necessary hardware and software to meet that demand while simultaneously balancing cost efficiency with optimal user experience. This challenge was faced by the…
As we move toward denser computing infrastructure, with more compute, more GPUs, accelerated networking, and so forth, multi-GPU training and analysis grow in popularity. Developers and practitioners moving from CPU to GPU clusters need tools and best practices. RAPIDS is a suite of open-source GPU-accelerated data science and AI libraries. These libraries can easily scale out for…
In the wake of ever-growing power demands, power systems optimization (PSO) of power grids is crucial for ensuring efficient resource management, sustainability, and energy security. The Eastern Interconnection, a major North American power grid, consists of approximately 70K nodes (Figure 1). Aside from sheer size, optimizing such a grid is complicated by uncertainties such as catastrophic…
Accelerated quantum supercomputing combines the benefits of AI supercomputing with quantum processing units (QPUs) to develop solutions to some of the world's hardest problems. Realizing such a device involves the seamless integration of one or more QPUs into a traditional CPU and GPU supercomputing architecture. An essential component of any accelerated quantum supercomputer is a programming…
nvmath-python (Beta) is an open-source Python library, providing Python programmers with access to high-performance mathematical operations from NVIDIA CUDA-X math libraries. nvmath-python provides both low-level bindings to the underlying libraries and higher-level Pythonic abstractions. It is interoperable with existing Python packages, such as PyTorch and CuPy. In this post, I show how to…
Python is the most common programming language for data science, machine learning, and numerical computing. It continues to grow in popularity among scientists and researchers. In the Python ecosystem, NumPy is the foundational Python library for performing array-based numerical computations. NumPy's standard implementation operates on a single CPU core, with only a limited set of operations…
The ability to compare the sequences of multiple related proteins is a foundational task for many life science researchers. This is often done in the form of a multiple sequence alignment (MSA), and the evolutionary information retrieved from these alignments can yield insights into protein structure, function, and evolutionary history. Now, with MMseqs2-GPU, an updated GPU-accelerated…
A new machine-learning algorithm that listens to digital heartbeat data could help veterinarians diagnose murmurs and early-stage heart disease in dogs. Developed by a team of researchers from the University of Cambridge, the study analyzes electronic stethoscope recordings to grade murmur intensity and diagnose the stage of myxomatous mitral valve disease (MMVD), the most common form of heart…
By enabling CUDA kernels to be written in Python similar to how they can be implemented within C++, Numba bridges the gap between the Python ecosystem and the performance of CUDA. However, CUDA C++ developers have access to many libraries that presently have no exposure in Python. These include the CUDA Core Compute Libraries (CCCL), cuRAND, and header-based implementations of numeric types…
Researchers at UCLA have developed a new AI model that can expertly analyze 3D medical images of diseases in a fraction of the time it would otherwise take a human clinical specialist. The deep-learning framework, named SLIViT (SLice Integration by Vision Transformer), analyzes images from different imagery modalities, including retinal scans, ultrasound videos, CTs, MRIs, and others…
CUDA Toolkit 12.6.2 improves performance and provides new features in the cuBLAS, cuSOLVER, and cuFFT LTO libraries.
Reality capture creates highly accurate, detailed, and immersive digital representations of environments. Innovations in site scanning and accelerated data processing, and emerging technologies like neural radiance fields (NeRFs) and Gaussian splatting, are significantly enhancing the capabilities of reality capture. These technologies are revolutionizing interactions with and analyses of the…
Join us on October 9 to learn how your applications can benefit from NVIDIA CUDA Python software initiatives.
The NVIDIA RTX AI for Windows PCs platform offers a thriving ecosystem of thousands of open-source models for application developers to leverage and integrate into Windows applications. Notably, llama.cpp is one popular tool, with over 65K GitHub stars at the time of writing. Originally released in 2023, this open-source repository is a lightweight, efficient framework for large language model…
AI techniques like large language models (LLMs) are rapidly transforming many scientific disciplines. Quantum computing is no exception. A collaboration between NVIDIA, the University of Toronto, and Saint Jude Children's Research Hospital is bringing generative pre-trained transformers (GPTs) to the design of new quantum algorithms, including the Generative Quantum Eigensolver (GQE) technique.
NVIDIA NeMo has consistently developed automatic speech recognition (ASR) models that set the benchmark in the industry, particularly those topping the Hugging Face Open ASR Leaderboard. These NVIDIA NeMo ASR models that transcribe speech into text offer a range of architectures designed to optimize both speed and accuracy: Previously, these models faced speed performance…
Includes C++ runtime support on Windows, enhanced dynamic shape support in converters, PyTorch 2.4, CUDA 12.4, TensorRT 10.1, and Python 3.12.
The vast majority of the world's data remains untapped, and enterprises are looking to generate value from this data by creating the next wave of generative AI applications that will make a transformative business impact. Retrieval-augmented generation (RAG) pipelines are a key part of this, enabling users to have conversations with large corpora of data and turning manuals, policy documents…
Stephen Jones, a leading expert and distinguished NVIDIA CUDA architect, offers his guidance and insights with a deep dive into the complexities of mapping applications onto massively parallel machines. Going beyond the basics to explore the intricacies of GPU programming, he focuses on practical techniques such as parallel program design and specific details of GPU optimization for improving the…
CUDA Graphs are a way to define and batch GPU operations as a graph rather than a sequence of stream launches. A CUDA Graph groups a set of CUDA kernels and other CUDA operations together and executes them with a specified dependency tree. It speeds up the workflow by combining the driver activities associated with CUDA kernel launches and CUDA API calls. It also enforces the dependencies with…
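As a rough illustration of the idea (a minimal sketch assuming CUDA 12, not code from the post), a sequence of kernel launches can be captured from a stream into a graph, instantiated once, and then replayed many times with a single launch call.

```cuda
// Minimal sketch: capture repeated kernel launches into a CUDA graph and
// replay them. Names are illustrative; error checking is omitted.
#include <cuda_runtime.h>

__global__ void step(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] += 1.0f;
}

int main() {
    const int n = 1 << 20;
    float *x; cudaMalloc(&x, n * sizeof(float));
    cudaStream_t stream; cudaStreamCreate(&stream);

    // Capture the work issued to the stream instead of executing it directly.
    cudaGraph_t graph;
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    for (int i = 0; i < 10; ++i)
        step<<<(n + 255) / 256, 256, 0, stream>>>(x, n);
    cudaStreamEndCapture(stream, &graph);

    // Instantiate once, then launch the whole batch with a single call.
    cudaGraphExec_t exec;
    cudaGraphInstantiate(&exec, graph, 0);  // 3-argument form, CUDA 12
    for (int iter = 0; iter < 100; ++iter)
        cudaGraphLaunch(exec, stream);
    cudaStreamSynchronize(stream);

    cudaGraphExecDestroy(exec);
    cudaGraphDestroy(graph);
    cudaStreamDestroy(stream);
    cudaFree(x);
    return 0;
}
```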
NVIDIA has continually advanced high-performance computing (HPC) by offering its highly optimized NVIDIA High-Performance Conjugate Gradient (HPCG) benchmark program as part of the NVIDIA HPC benchmark program collection. We now provide the NVIDIA HPCG benchmark program in the /NVIDIA/nvidia-hpcg GitHub repo, using its high-performance math libraries, cuSPARSE…
NVSHMEM is a parallel programming interface that provides efficient and scalable communication for NVIDIA GPU clusters. Part of NVIDIA Magnum IO and based on OpenSHMEM, NVSHMEM creates a global address space for data that spans the memory of multiple GPUs and can be accessed with fine-grained GPU-initiated operations, CPU-initiated operations, and operations on CUDA streams.
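A minimal sketch of the programming model, assuming NVSHMEM's default bootstrap (for example, a launch under mpirun) and illustrative names, might look like the following: each processing element (PE) writes into the symmetric memory of its neighbor with a GPU-initiated put.

```cuda
// Sketch of an NVSHMEM ring exchange. Illustrative only; error checking and
// performance considerations are omitted.
#include <nvshmem.h>
#include <cuda_runtime.h>
#include <cstdio>

__global__ void put_to_neighbor(int *dst, int mype, int npes) {
    // GPU-initiated, fine-grained put into the symmetric memory of the next PE.
    nvshmem_int_p(dst, mype, (mype + 1) % npes);
}

int main() {
    nvshmem_init();
    int mype = nvshmem_my_pe();
    int npes = nvshmem_n_pes();

    // Simple single-node device assignment; adjust for multi-node runs.
    int ndev = 1; cudaGetDeviceCount(&ndev);
    cudaSetDevice(mype % ndev);

    // Symmetric allocation: the same buffer exists on every PE.
    int *dst = (int *)nvshmem_malloc(sizeof(int));

    put_to_neighbor<<<1, 1>>>(dst, mype, npes);
    cudaDeviceSynchronize();
    nvshmem_barrier_all();

    int received = -1;
    cudaMemcpy(&received, dst, sizeof(int), cudaMemcpyDeviceToHost);
    printf("PE %d received %d\n", mype, received);

    nvshmem_free(dst);
    nvshmem_finalize();
    return 0;
}
```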
Driven by shifts in consumer behavior and the pandemic, e-commerce continues its explosive growth and transformation. As a result, logistics and transportation firms find themselves at the forefront of a parcel delivery revolution. This new reality is especially evident in last-mile delivery, which is now the most expensive element of supply chain logistics. It represents more than 41%…
To fully harness the capabilities of NVIDIA GPUs, optimizing NVIDIA CUDA performance is essential, particularly for developers new to GPU programming. This talk is specifically designed for those stepping into the world of CUDA, providing a solid foundation in GPU architecture principles and optimization techniques. Athena Elafrou, a developer technology engineer at NVIDIA…
GPUs are specially designed to crunch through massive amounts of data at high speed. They have many compute resources, called streaming multiprocessors (SMs), and an array of facilities to keep them fed with data: high bandwidth to memory, sizable data caches, and the capability to switch to other teams of workers (warps) without any overhead if an active team has run out of data.
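One standard way to give the SMs enough independent warps to hide memory latency is a grid-stride loop sized from the device's SM count; the sketch below is illustrative rather than taken from the post.

```cuda
// Grid-stride loop sketch: launch enough blocks to occupy every SM while
// letting each thread process multiple elements, so warps that stall on
// memory can be swapped for ready ones. Names are illustrative.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        y[i] = a * x[i] + y[i];
}

// Typical launch: size the grid from the device's SM count.
// int numSMs;
// cudaDeviceGetAttribute(&numSMs, cudaDevAttrMultiProcessorCount, 0);
// saxpy<<<32 * numSMs, 256>>>(n, 2.0f, x, y);
```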
The open-source llama.cpp code base was originally released in 2023 as a lightweight but efficient framework for performing inference on Meta Llama models. Built on the GGML library released the previous year, llama.cpp quickly became attractive to many users and developers (particularly for use on personal workstations) due to its focus on C/C++ without the need for complex dependencies.
The release supports GB100 capabilities and adds new library enhancements to cuBLAS, cuFFT, cuSOLVER, and cuSPARSE, along with the release of Nsight Compute 2024.3.
With the R515 driver, NVIDIA released a set of Linux GPU kernel modules in May 2022 as open source with dual GPL and MIT licensing. The initial release targeted datacenter compute GPUs, with GeForce and Workstation GPUs in an alpha state. At the time, we announced that more robust and fully featured GeForce and Workstation Linux support would follow in subsequent releases and the NVIDIA Open…
NVIDIA is excited to collaborate with Colfax, Together.ai, Meta, and Princeton University on their recent work exploiting the Hopper GPU architecture and Tensor Cores to accelerate key Fused Attention kernels using CUTLASS 3. FlashAttention-3 incorporates key techniques to achieve 1.5-2.0x faster performance than FlashAttention-2 with FP16, up to 740 TFLOPS. With FP8…
nvmath-python is an open-source Python library that provides high-performance access to the core mathematical operations in the NVIDIA Math Libraries. Available now in beta.
cuDSS (Preview) is an accelerated direct sparse solver. It now supports multi-GPU, multi-node platforms and introduces a hybrid memory mode.
NVIDIA DOCA GPUNetIO is a library within the NVIDIA DOCA SDK, specifically designed for real-time inline GPU packet processing. It combines technologies like GPUDirect RDMA and GPUDirect Async to enable the creation of GPU-centric applications where a CUDA kernel can directly communicate with the network interface card (NIC) for sending and receiving packets, bypassing the CPU and excluding it…
The latest release of the NVIDIA cuBLAS library, version 12.5, continues to deliver functionality and performance to deep learning (DL) and high-performance computing (HPC) workloads. This post provides an overview of the following updates on cuBLAS matrix multiplications (matmuls) since version 12.0, and a walkthrough: Grouped GEMM APIs can be viewed as a generalization of the batched…
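For context, the sketch below shows the uniform strided batched GEMM that grouped GEMM generalizes: every problem in the batch shares one shape (m, n, k) and set of leading dimensions. It is an illustrative example, not code from the post.

```cuda
// Strided batched SGEMM sketch: C[i] = A[i] * B[i] for batchCount problems of
// identical shape, stored back to back in column-major order. Error checking
// is omitted; the helper name is illustrative.
#include <cublas_v2.h>

void batched_sgemm(cublasHandle_t handle,
                   int m, int n, int k, int batchCount,
                   const float *A, const float *B, float *C) {
    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemmStridedBatched(handle,
                              CUBLAS_OP_N, CUBLAS_OP_N,
                              m, n, k,
                              &alpha,
                              A, m, (long long)m * k,   // lda, strideA
                              B, k, (long long)k * n,   // ldb, strideB
                              &beta,
                              C, m, (long long)m * n,   // ldc, strideC
                              batchCount);
}
```

Grouped GEMM relaxes the "one shape per batch" restriction, which is what the post walks through.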
Nsight Compute 2024.2 adds Python syntax highlighting and call stacks, a redesigned report header, and source page statistics to make CUDA optimization easier.
CUDA Toolkit 12.5 supports the new NVIDIA L20 and H20 GPUs, brings simultaneous compute and graphics to DirectX, and updates Nsight Compute and the CUDA-X Libraries.
Post updated on February 3, 2025 with details about CUDA 12.8. CUDA Graphs can provide a significant performance increase, as the driver is able to optimize execution using the complete description of tasks and dependencies. Graphs provide incredible benefits for static workflows where the overhead of graph creation can be amortized over many successive launches. However…
Missed GTC or want to replay your favorite training labs? Find them on demand with the NVIDIA GTC Training Labs playlist.
NVIDIA GPUs are becoming increasingly powerful with each new generation. This increase generally comes in two forms. Each streaming multiprocessor (SM), the workhorse of the GPU, can execute instructions faster and faster, and the memory system can deliver data to the SMs at an ever-increasing pace. At the same time, the number of SMs also typically increases with each generation…
AI is augmenting high-performance computing (HPC) with novel approaches to data processing, simulation, and modeling. Because of the computational requirements of these new AI workloads, HPC is scaling up at a rapid pace. To enable applications to scale to multi-GPU and multi-node platforms, HPC tools and libraries must support that growth. NVIDIA provides a comprehensive ecosystem of…
NVIDIA cuSPARSELt harnesses Sparse Tensor Cores to accelerate general matrix multiplications. Version 0.6 adds support for the NVIDIA Hopper architecture.
The latest release of CUDA Toolkit, version 12.4, continues to push accelerated computing performance using the latest NVIDIA GPUs. This post explains the new features and enhancements included in this release: CUDA and the CUDA Toolkit software provide the foundation for all NVIDIA GPU-accelerated computing applications in data science and analytics, machine learning…
Predicting 3D protein structures from amino acid sequences has been an important long-standing question in bioinformatics. In recent years, deep learning-based computational methods have been emerging and have shown promising results. Among these lines of work, AlphaFold2 is the first method that has achieved results comparable to slower physics-based computational methods.
cuBLASDx allows you to perform BLAS calculations inside your CUDA kernel, improving the performance of your application. Available to download in Preview now.
Many CUDA applications running on multi-GPU platforms use only a single GPU for their compute needs. In such scenarios, applications pay a performance penalty because CUDA has to enumerate and initialize all the GPUs on the system. If a CUDA application does not require other GPUs to be visible and accessible, you can launch such applications by isolating the unwanted GPUs from the CUDA…
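The isolation is typically done with the CUDA_VISIBLE_DEVICES environment variable; a minimal sketch (illustrative, not from the post) sets it before the first runtime call so that only the listed GPU is enumerated. The same effect is usually achieved from the shell, for example `CUDA_VISIBLE_DEVICES=0 ./app`.

```cuda
// Sketch: restrict a CUDA process to one GPU by setting CUDA_VISIBLE_DEVICES
// before the runtime initializes (Linux; setenv is POSIX).
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

int main() {
    // Must happen before the first CUDA runtime call, because the runtime
    // reads the variable when it enumerates devices.
    setenv("CUDA_VISIBLE_DEVICES", "0", 1);

    int count = 0;
    cudaGetDeviceCount(&count);
    printf("Visible GPUs: %d\n", count);  // 1, even on a multi-GPU system
    return 0;
}
```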
cuBLASMp is a high-performance, multi-process, GPU-accelerated library for distributed basic dense linear algebra. It is available to download in Preview now.
NVIDIA CUDA-Q is a platform for building quantum-classical computing applications. It is an open-source programming model for heterogeneous computing across quantum processing units (QPUs), GPUs, and CPUs. CUDA-Q accelerates workflows such as quantum simulation, quantum machine learning, quantum chemistry, and more. It optimizes these workflows as part of its compiler toolchain and uses the…
There are some useful intrinsic functions in the NVIDIA GPU instruction set that are not included in standard graphics APIs. Updated from the original 2016 post to add information about new intrinsics and cross-vendor APIs in DirectX and Vulkan. For example, a shader can use warp shuffle instructions to exchange data between threads in a warp without going through shared memory…
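The CUDA analog of those shader intrinsics is the __shfl_*_sync family; a minimal illustrative sketch of a warp-wide sum using a butterfly shuffle (not code from the post):

```cuda
// Butterfly (XOR) shuffle reduction across the 32 threads of a warp, without
// touching shared memory. The helper name is illustrative.
__inline__ __device__ float warp_reduce_sum(float val) {
    // 0xffffffff: all lanes in the warp participate.
    for (int offset = 16; offset > 0; offset >>= 1)
        val += __shfl_xor_sync(0xffffffff, val, offset);
    return val;  // every lane ends up holding the warp-wide sum
}
```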
NVIDIA Isaac Transport for ROS (NITROS) is the implementation of two hardware-acceleration features introduced with ROS 2 Humble: type adaptation and type negotiation. Type adaptation enables ROS nodes to work in a data format optimized for specific hardware accelerators. The adapted type is used by processing graphs to eliminate memory copies between the CPU and the accelerator memory.
High-performance computing (HPC) powers applications in simulation and modeling, healthcare and life sciences, industry and engineering, and more. In the modern data center, HPC synergizes with AI, harnessing data in transformative new ways. The performance and throughput demands of next-generation HPC applications call for an accelerated computing platform that can handle diverse workloads…
Real-time autonomous robot navigation powered by a fast motion-generation algorithm can enable applications in several industries such as food and services, warehouse automation, and machine tending. Motion generation for manipulators is extremely challenging, as it requires satisfying complex constraints and minimizing several cost terms. In addition, manipulators can have many…
See how KDnuggets achieved a 500x speedup using CuPy and NVIDIA CUDA on 3D arrays.
The latest release of CUDA Toolkit continues to push the envelope of accelerated computing performance using the latest NVIDIA GPUs. New features of this release, version 12.3, include: CUDA and the CUDA Toolkit continue to provide the foundation for all accelerated computing applications in data science, machine learning and deep learning, generative AI with LLMs for both training and…
Differentiable Slang easily integrates with existing codebases, from Python, PyTorch, and CUDA to HLSL, to aid multiple computer graphics tasks and enable novel data-driven and neural research. In this post, we introduce several code examples using differentiable Slang to demonstrate the potential use across different rendering applications and the ease of integration. This is part of a series…
NVIDIA just released a SIGGRAPH Asia 2023 research paper, SLANG.D: Fast, Modular and Differentiable Shader Programming. The paper shows how a single language can serve as a unified platform for real-time, inverse, and differentiable rendering. The work is a collaboration between MIT, UCSD, UW, and NVIDIA researchers. This is part of a series on Differentiable Slang. For more information about…
This NVIDIA HPC SDK 23.9 update expands platform support and provides minor updates.
Generative AI is taking the world by storm, from large language models (LLMs) to generative pretrained transformer (GPT) models to diffusion models. NVIDIA is uniquely positioned to accelerate not only generative AI workloads but also those for data processing, analytics, high-performance computing (HPC), quantitative financial applications, and more. NVIDIA offers a one-stop solution for diverse workloads…
GPU acceleration is enabling faster and more intelligent applications than ever before, and the CUDA Toolkit is key to harnessing acceleration on NVIDIA GPUs. But debugging, profiling, and optimizing CUDA can be a challenge, especially if you are unable to inspect hardware-level throughput and performance. To help you harness CUDA acceleration, NVIDIA offers Nsight Developer Tools.
NVIDIA has already made available a GPU driver binary symbols server for Windows. Now, NVIDIA is making available a repository of CUDA Toolkit symbols for Linux to enhance application development. During application development, you can now download obfuscated symbols for NVIDIA libraries that are being debugged or profiled in…
Episode 5 of the NVIDIA CUDA Tutorials Video series is out. Jackson Marusarz, product manager for Compute Developer Tools at NVIDIA, introduces a suite of tools to help you build, debug, and optimize CUDA applications, making development easy and more efficient. This includes: IDEs and debuggers: integration with popular IDEs like NVIDIA Nsight Visual Studio Edition…
On July 26, connect with NVIDIA CUDA product team experts on the latest CUDA Toolkit 12.
Heterogeneous computing architectures, those that incorporate a variety of processor types working in tandem, have proven extremely valuable in the continued scalability of computational workloads in AI, machine learning (ML), quantum physics, and general data science. Critical to this development has been the ability to abstract away the heterogeneous architecture and promote a framework that…
We were stuck. Really stuck. With a hard delivery deadline looming, our team needed to figure out how to process a complex extract-transform-load (ETL) job on trillions of point-of-sale transaction records in a few hours. The results of this job would feed a series of downstream machine learning (ML) models that would make critical retail assortment allocation decisions for a global retailer.
Organizations are increasingly adopting hybrid and multi-cloud strategies to access the latest compute resources, consistently support worldwide customers, and optimize cost. However, a major challenge that engineering teams face is operationalizing AI applications across different platforms as the stack changes. This requires MLOps teams to familiarize themselves with different environments and…
The latest release of CUDA Toolkit 12.2 introduces a range of essential new features, modifications to the programming model, and enhanced support for hardware capabilities accelerating CUDA applications. Now generally available from NVIDIA, CUDA Toolkit 12.2 includes many new capabilities, both major and minor. The following post offers an overview of many of the key…
Watch on-demand as experts deep dive into CUDA 12.2, including support for confidential computing.
At the heart of the rapidly expanding set of AI-powered applications are powerful AI models. Before these models can be deployed, they must be trained through a process that requires an immense amount of AI computing power. AI training is also an ongoing process, with models constantly retrained with new data to ensure high-quality results. Faster model training means that AI-powered applications…
AI is transforming industries, automating processes, and opening new opportunities for innovation in the rapidly evolving technological landscape. As more businesses recognize the value of incorporating AI into their operations, they face the challenge of implementing these technologies efficiently, effectively, and reliably. Enter NVIDIA AI Enterprise, a comprehensive software suite…
QHack is an educational conference and the world's largest quantum machine learning (QML) hackathon. This year at QHack 2023, 2,850 individuals from 105 different countries competed for 8 days to build the most innovative solutions for quantum computing applications using NVIDIA quantum technology. The event was organized by Xanadu, with NVIDIA sponsoring the QHack 2023 NVIDIA Challenge.
On June 6, learn how researchers use OpenACC for GPU acceleration of multiphase and compressible flow solvers that obtain speedups at scale.
This post covers CPU best practices when working with NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips. To get the best performance from your NVIDIA GPU, pair it with efficient work delegation on the CPU. Frame-rate caps, stutter, and other subpar application performance events can often be traced back to a bottleneck on the CPU.
This post covers best practices for using sampler feedback on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips. Sampler feedback is a DirectX 12 Ultimate feature for capturing and recording texture sampling information and locations. Sampler feedback was designed to provide better support for streaming and texture-space shading.
Accurate weather modeling is essential for companies to properly forecast renewable energy production and plan for natural disasters. Ineffective and unforecasted weather events cost an estimated $714 billion in 2022 alone. To avoid this, companies need faster, cheaper, and more accurate weather models. In a recent GTC session, Microsoft and TempoQuest detailed their work with NVIDIA to address…
Debugging is difficult. Debugging across multiple languages is especially challenging, and debugging across devices often requires a team with varying skill sets and expertise to reveal the underlying problem. Yet projects often require using multiple languages, to ensure high performance where necessary, a user-friendly experience, and compatibility where possible. Unfortunately…
GPUs continue to get faster with each new generation, and it is often the case that each activity on the GPU (such as a kernel or memory copy) completes very quickly. In the past, each activity had to be separately scheduled (launched) by the CPU, and associated overheads could accumulate to become a performance bottleneck. The CUDA Graphs facility addresses this problem by enabling multiple GPU…
The Dataiku platform for everyday AI simplifies deep learning. Use cases are far-reaching, from image classification to object detection and natural language processing (NLP). Dataiku helps you with labeling, model training, explainability, model deployment, and centralized management of code and code environments. This post dives into high-level Dataiku and NVIDIA integrations for image…