CUDA – NVIDIA Technical Blog: news and tutorials for developers, data scientists, and IT admins (http://www.open-lab.net/blog/feed/, updated 2025-03-19)

Ben Williams: Networking Reliability and Observability at Scale with NCCL 2.24 (http://www.open-lab.net/blog/?p=96731, 2025-03-13)

The NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multinode (MGMN) communication primitives optimized for NVIDIA GPUs and networking. NCCL is a central piece of software for multi-GPU deep learning training. It handles any kind of inter-GPU communication, be it over PCI, NVLink, or networking. It uses advanced topology detection, optimized communication graphs…
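NCCL's collectives (allreduce, allgather, broadcast, and so on) are typically built on ring algorithms. As a rough illustration of the idea, and not NCCL's actual implementation, here is a pure-Python simulation of a ring allreduce across a handful of "ranks":

```python
def ring_allreduce(buffers):
    """Simulate a ring allreduce: each 'rank' starts with its own vector and
    ends with the elementwise sum across all ranks. Pure-Python sketch of the
    ring algorithm used by NCCL-style libraries, not NCCL itself."""
    n = len(buffers)                       # number of simulated ranks
    chunks = [list(b) for b in buffers]    # work on copies
    size = len(chunks[0])
    assert size % n == 0, "for simplicity, vector length must divide by rank count"
    c = size // n                          # elements per chunk

    # Phase 1: reduce-scatter. In step s, rank r sends chunk (r - s) mod n to
    # rank (r + 1) mod n, which accumulates it. Messages are collected first,
    # then applied, to model simultaneous sends within a step.
    for s in range(n - 1):
        sends = [(r, (r + 1) % n, (r - s) % n) for r in range(n)]
        payloads = [chunks[src][i * c:(i + 1) * c] for src, _, i in sends]
        for (_, dst, i), data in zip(sends, payloads):
            for k in range(c):
                chunks[dst][i * c + k] += data[k]

    # Phase 2: allgather. Rank r now owns the fully reduced chunk (r + 1) mod n
    # and circulates complete chunks around the ring, overwriting partial ones.
    for s in range(n - 1):
        sends = [(r, (r + 1) % n, (r + 1 - s) % n) for r in range(n)]
        payloads = [chunks[src][i * c:(i + 1) * c] for src, _, i in sends]
        for (_, dst, i), data in zip(sends, payloads):
            chunks[dst][i * c:(i + 1) * c] = data
    return chunks
```

Each rank sends and receives only 2·(n−1) chunk-sized messages regardless of vector length, which is why the ring pattern scales well over PCI, NVLink, and network links alike.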

Tony Scudiero: Understanding PTX, the Assembly Language of CUDA GPU Computing (http://www.open-lab.net/blog/?p=96891, 2025-03-12)

Parallel thread execution (PTX) is a virtual machine instruction set architecture that has been part of CUDA from its beginning. You can think of PTX as the assembly language of the NVIDIA CUDA GPU computing platform. In this post, we'll explain what that means, what PTX is for, and what you need to know about it to make the most of CUDA for your applications. We'll start by walking through…
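To give a flavor of what PTX looks like: a CUDA C++ kernel body such as `out[i] = a[i] + b[i]` compiles (for example, via `nvcc -ptx`) to instructions along these lines. The listing below is abbreviated and hand-simplified; exact output varies with compiler version and target architecture.

```ptx
// Simplified PTX for a float vector-add kernel (illustrative only).
.visible .entry add_kernel(
    .param .u64 out, .param .u64 a, .param .u64 b)
{
    mov.u32     %r1, %ctaid.x;          // blockIdx.x
    mov.u32     %r2, %ntid.x;           // blockDim.x
    mov.u32     %r3, %tid.x;            // threadIdx.x
    mad.lo.s32  %r4, %r1, %r2, %r3;     // i = blockIdx.x * blockDim.x + threadIdx.x
    // ... address computation elided ...
    ld.global.f32  %f1, [%rd1];         // load a[i]
    ld.global.f32  %f2, [%rd2];         // load b[i]
    add.f32        %f3, %f1, %f2;
    st.global.f32  [%rd3], %f3;         // store out[i]
    ret;
}
```

Because PTX is a virtual ISA, this code is compiled again, by ptxas or by the driver at load time, into the native SASS machine code of whichever GPU it actually runs on.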

Nikhil Gupta: Optimizing Compile Times for CUDA C++ (http://www.open-lab.net/blog/?p=96775, 2025-03-10)

In modern software development, time is an incredibly valuable resource, especially during the compilation process. For developers working with CUDA C++ on large-scale GPU-accelerated applications, optimizing compile times can significantly enhance productivity and streamline the entire development cycle. When using the compiler for offline compilation, efficient compilation times enable…

Mark J. Bennett: GPU-Accelerate Algorithmic Trading Simulations by over 100x with Numba (http://www.open-lab.net/blog/?p=96652, 2025-03-04)

Quantitative developers need to run back-testing simulations to see how financial algorithms perform from a profit and loss (P&L) standpoint. Statistical techniques are important to visualize the possible outcomes of the algorithms in terms of the possible P&L paths. GPUs can greatly reduce the amount of time needed to do this. In the broader picture, mathematical modeling of financial…
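The kind of Monte Carlo P&L simulation the post describes can be sketched as follows. This is a minimal NumPy version under assumed model parameters (geometric Brownian motion, buy-and-hold P&L); in the Numba-accelerated setup the post covers, each simulated path would map naturally onto a GPU thread.

```python
import numpy as np

def simulate_pnl_paths(s0, mu, sigma, days, n_paths, seed=0):
    """Simulate GBM price paths and the terminal P&L of a buy-and-hold
    position. A sketch of a Monte Carlo back-test, not the post's exact code."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / 252.0                                   # one trading day
    z = rng.standard_normal((n_paths, days))
    # GBM log-returns: (mu - sigma^2 / 2) * dt + sigma * sqrt(dt) * Z
    log_ret = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    prices = s0 * np.exp(np.cumsum(log_ret, axis=1))   # one row per path
    pnl = prices[:, -1] - s0                           # terminal P&L per path
    return prices, pnl

prices, pnl = simulate_pnl_paths(s0=100.0, mu=0.05, sigma=0.2,
                                 days=252, n_paths=10_000)
```

Because the paths are independent, the simulation is embarrassingly parallel, which is exactly the structure that lets Numba's CUDA target deliver the large speedups the title refers to.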

Anton Anders: NVIDIA cuDSS Advances Solver Technologies for Engineering and Scientific Computing (http://www.open-lab.net/blog/?p=96466, 2025-02-25)

NVIDIA cuDSS is a first-generation sparse direct solver library designed to accelerate engineering and scientific computing. cuDSS is increasingly adopted in data centers and other environments and supports single-GPU, multi-GPU, and multi-node (MGMN) configurations. cuDSS has become a key tool for accelerating computer-aided engineering (CAE) workflows and scientific computations across…

Jesus Alvarez: NVIDIA Open GPU Datacenter Drivers for RHEL9 Signed by Red Hat (http://www.open-lab.net/blog/?p=95069, 2025-02-10)

NVIDIA and Red Hat have partnered to bring continued improvements to the precompiled NVIDIA Driver introduced in 2020. Last month, NVIDIA announced that the open GPU driver modules will become the default recommended way to enable NVIDIA graphics hardware. Today, NVIDIA announced that Red Hat is now compiling and signing the NVIDIA open GPU kernel modules to further streamline the usage for…

Michelle Horton: AI Foundation Model Enhances Cancer Diagnosis and Tailors Treatment (http://www.open-lab.net/blog/?p=95722, 2025-02-04)

A new study and AI model from researchers at Stanford University is streamlining cancer diagnostics, treatment planning, and prognosis prediction. Named MUSK (Multimodal transformer with Unified maSKed modeling), the research aims to advance precision oncology, tailoring treatment plans to each patient based on their unique medical data. "Multimodal foundation models are a new frontier in…

Matthew Nicely: Just Released: CUTLASS 3.8 (http://www.open-lab.net/blog/?p=95716, 2025-02-03)

CUTLASS 3.8 provides support for the NVIDIA Blackwell SM100 architecture. CUTLASS is a collection of CUDA C++ templates and abstractions for implementing high-performance GEMM computations.

Sylvain Jeaugey: New Scaling Algorithm and Initialization with NVIDIA Collective Communications Library 2.23 (http://www.open-lab.net/blog/?p=95412, 2025-01-31)

The NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multinode communication primitives optimized for NVIDIA GPUs and networking. NCCL is a central piece of software for multi-GPU deep learning training. It handles any kind of inter-GPU communication, be it over PCI, NVLink, or networking. It uses advanced topology detection, optimized communication graphs…

Zachary Bourque: Dynamic Loading in the CUDA Runtime (http://www.open-lab.net/blog/?p=93958, 2025-01-31)

Historically, GPU device code is compiled alongside the application with offline tools such as nvcc. In this case, the GPU device code is managed internally by the CUDA runtime. You can then launch kernels, and the CUDA runtime ensures that the invoked kernel is launched. However, in some cases, GPU device code needs to be dynamically compiled and loaded. This post shows a way to…

Jonathan Bentz: CUDA Toolkit Now Available for NVIDIA Blackwell (http://www.open-lab.net/blog/?p=95358, 2025-01-31)

The latest release of the CUDA Toolkit, version 12.8, continues to push accelerated computing performance in data sciences, AI, scientific computing, and computer graphics and simulation, using the latest NVIDIA CPUs and GPUs. This post highlights some of the new features and enhancements included with this release: CUDA Toolkit 12.8 is the first version of the Toolkit to support…

Annamalai Chockalingam: New AI SDKs and Tools Released for NVIDIA Blackwell GeForce RTX 50 Series GPUs (http://www.open-lab.net/blog/?p=95526, 2025-01-30)

NVIDIA recently announced a new generation of PC GPUs, the GeForce RTX 50 Series, alongside new AI-powered SDKs and tools for developers. Powered by the NVIDIA Blackwell architecture, fifth-generation Tensor Cores, and fourth-generation RT Cores, the GeForce RTX 50 Series delivers breakthroughs in AI-driven rendering, including neural shaders, digital human technologies, geometry, and lighting.

Michelle Horton: Advancing Rare Disease Detection with AI-Powered Cellular Profiling (http://www.open-lab.net/blog/?p=95498, 2025-01-29)

Rare diseases are difficult to diagnose due to limitations in traditional genomic sequencing. Wolfgang Pernice, assistant professor at Columbia University, is using AI-powered cellular profiling to bridge these gaps and advance personalized medicine. At NVIDIA GTC 2024, Pernice shared insights from his lab's work with diseases like Charcot-Marie-Tooth (CMT) and mitochondrial disorders.

Fred Oh: Upcoming Event: CUDA Developer Meet Up in Silicon Valley (http://www.open-lab.net/blog/?p=95035, 2025-01-15)

Whether you're just starting your GPU programming journey or you're a CUDA ninja looking to share advanced techniques, join us in San Jose on 1/30/25.

Nick Becker: RAPIDS 24.12 Introduces cuDF on PyPI, CUDA Unified Memory for Polars, and Faster GNNs (http://www.open-lab.net/blog/?p=94415, 2024-12-19)

RAPIDS 24.12 introduces cuDF packages on PyPI, speeds up groupby aggregations and reading files from AWS S3, enables larger-than-GPU-memory queries in the Polars GPU engine, and delivers faster graph neural network (GNN) training on real-world graphs. Starting with the 24.12 release of RAPIDS, CUDA 12 builds of the cuDF packages and all of their dependencies are now available on PyPI. As a result…

Ziyue Xu: Security for Data Privacy in Federated Learning with CUDA-Accelerated Homomorphic Encryption in XGBoost (http://www.open-lab.net/blog/?p=93870, 2024-12-18)

XGBoost is a machine learning algorithm widely used for tabular data modeling. To expand the XGBoost model from single-site learning to multisite collaborative training, NVIDIA has developed Federated XGBoost, an XGBoost plugin for federated learning. It covers vertical collaboration settings to jointly train XGBoost models across decentralized data sources, as well as horizontal histogram-based…
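The core idea behind homomorphic encryption in this setting is that a central aggregator can sum encrypted per-site histogram counts without ever seeing any site's plaintext. Here is a toy, deliberately insecure illustration using the Paillier scheme with tiny hardcoded primes (exposition only; real deployments, including Federated XGBoost, use vetted libraries and large keys):

```python
import math
import random

# Toy Paillier cryptosystem with tiny primes: insecure, illustration only.
p, q = 17, 19
n = p * q                        # public modulus
n2 = n * n
g = n + 1                        # standard generator choice
lam = math.lcm(p - 1, q - 1)     # private key component
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)  # inverse of L(g^lam mod n^2)

_rng = random.Random(42)

def encrypt(m):
    """Encrypt integer m < n. Randomized: the same count encrypts differently."""
    r = _rng.randrange(2, n)
    while math.gcd(r, n) != 1:
        r = _rng.randrange(2, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n) * mu % n

# Each site encrypts its local histogram bin count; multiplying ciphertexts
# corresponds to ADDING the underlying plaintexts (additive homomorphism).
site_counts = [5, 7, 11]
aggregate = 1
for count in site_counts:
    aggregate = (aggregate * encrypt(count)) % n2

assert decrypt(aggregate) == sum(site_counts)  # 23, recovered without any single plaintext
```

The expensive part in practice is the modular exponentiation on large keys, which is exactly the workload the post describes offloading to CUDA.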

Miles Macklin: Introducing Tile-Based Programming in Warp 1.5.0 (http://www.open-lab.net/blog/?p=94002, 2024-12-14)

With the latest release of Warp 1.5.0, developers now have access to new tile-based programming primitives in Python. Leveraging cuBLASDx and cuFFTDx, these new tools provide developers with efficient matrix multiplication and Fourier transforms in Python kernels for accelerated simulation and scientific computing. In this blog post, we'll introduce these new features and show how they can be used…

Amr Elmeleegy: Spotlight: Perplexity AI Serves 400 Million Search Queries a Month Using NVIDIA Inference Stack (http://www.open-lab.net/blog/?p=93396, 2024-12-05)

As of 3/18/25, NVIDIA Triton Inference Server is now NVIDIA Dynamo. The demand for AI-enabled services continues to grow rapidly, placing increasing pressure on IT and infrastructure teams. These teams are tasked with provisioning the necessary hardware and software to meet that demand while simultaneously balancing cost efficiency with optimal user experience. This challenge was faced by the…

Ben Zaitlen (https://www.linkedin.com/in/benjamin-zaitlen-62ab7b4/): Best Practices for Multi-GPU Data Analysis Using RAPIDS with Dask (http://www.open-lab.net/blog/?p=92480, 2024-11-21)

As we move toward a denser computing infrastructure, with more compute, more GPUs, accelerated networking, and so forth, multi-GPU training and analysis grows in popularity. Developers and practitioners moving from CPU to GPU clusters need both tools and best practices. RAPIDS is a suite of open-source GPU-accelerated data science and AI libraries. These libraries can easily scale out for…

Sungho Shin: NVIDIA cuDSS Library Removes Barriers to Optimizing the US Power Grid (http://www.open-lab.net/blog/?p=92065, 2024-11-19)

In the wake of ever-growing power demands, power systems optimization (PSO) of power grids is crucial for ensuring efficient resource management, sustainability, and energy security. The Eastern Interconnection, a major North American power grid, consists of approximately 70K nodes (Figure 1). Aside from sheer size, optimizing such a grid is complicated by uncertainties such as catastrophic…

Alex McCaskey: Introducing NVIDIA CUDA-QX Libraries for Accelerated Quantum Supercomputing (http://www.open-lab.net/blog/?p=91929, 2024-11-18)

Accelerated quantum supercomputing combines the benefits of AI supercomputing with quantum processing units (QPUs) to develop solutions to some of the world's hardest problems. Realizing such a device involves the seamless integration of one or more QPUs into a traditional CPU and GPU supercomputing architecture. An essential component of any accelerated quantum supercomputer is a programming…

Szymon Karpiński: Fusing Epilog Operations with Matrix Multiplication Using nvmath-python (http://www.open-lab.net/blog/?p=92098, 2024-11-18)

nvmath-python (Beta) is an open-source Python library, providing Python programmers with access to high-performance mathematical operations from NVIDIA CUDA-X math libraries. nvmath-python provides both low-level bindings to the underlying libraries and higher-level Pythonic abstractions. It is interoperable with existing Python packages, such as PyTorch and CuPy. In this post, I show how to…

Wonchan Lee: Effortlessly Scale NumPy from Laptops to Supercomputers with NVIDIA cuPyNumeric (http://www.open-lab.net/blog/?p=91682, 2024-11-18)

Python is the most common programming language for data science, machine learning, and numerical computing. It continues to grow in popularity among scientists and researchers. In the Python ecosystem, NumPy is the foundational Python library for performing array-based numerical computations. NumPy's standard implementation operates on a single CPU core, with only a limited set of operations…
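cuPyNumeric's pitch is that array code written against the NumPy API can scale by changing only the import. A sketch of what that looks like, run here with standard NumPy (per the post, swapping in `import cupynumeric as np` lets the same code run across GPUs and nodes):

```python
import numpy as np   # with cuPyNumeric installed: import cupynumeric as np

def jacobi_step(grid):
    """One Jacobi relaxation sweep, written purely against the NumPy array
    API so the identical code can run on one CPU core or, via cuPyNumeric,
    be distributed across many GPUs."""
    out = grid.copy()
    # Each interior point becomes the average of its four neighbors.
    out[1:-1, 1:-1] = 0.25 * (grid[:-2, 1:-1] + grid[2:, 1:-1] +
                              grid[1:-1, :-2] + grid[1:-1, 2:])
    return out

grid = np.zeros((64, 64))
grid[0, :] = 1.0                 # hot boundary on one edge
for _ in range(100):
    grid = jacobi_step(grid)     # heat diffuses inward from the boundary
```

Because the function uses only slicing and elementwise arithmetic, there is nothing GPU-specific to rewrite; the runtime decides how to partition the arrays.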

Kyle Tretina: Boost Alphafold2 Protein Structure Prediction with GPU-Accelerated MMseqs2 (http://www.open-lab.net/blog/?p=91623, 2024-11-13)

The ability to compare the sequences of multiple related proteins is a foundational task for many life science researchers. This is often done in the form of a multiple sequence alignment (MSA), and the evolutionary information retrieved from these alignments can yield insights into protein structure, function, and evolutionary history. Now, with MMseqs2-GPU, an updated GPU-accelerated…
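At the core of alignment tools like MMseqs2 are dynamic-programming sequence comparisons. As a minimal illustration of the principle (not MMseqs2's vectorized GPU kernels), here is a Needleman-Wunsch global alignment scorer with a toy scoring scheme (match +1, mismatch −1, gap −1):

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score via dynamic programming."""
    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = dp[i - 1][0] + gap          # b exhausted: leading gaps
    for j in range(1, cols):
        dp[0][j] = dp[0][j - 1] + gap          # a exhausted: leading gaps
    for i in range(1, rows):
        for j in range(1, cols):
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            dp[i][j] = max(diag,               # align a[i-1] with b[j-1]
                           dp[i - 1][j] + gap, # gap in b
                           dp[i][j - 1] + gap) # gap in a
    return dp[-1][-1]
```

The dependency structure of this table (each cell needs its left, upper, and diagonal neighbors) is what GPU implementations exploit by sweeping anti-diagonals in parallel.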

Michelle Horton: AI That "Hears" Heart Disease May Help Vets Diagnose Dogs (http://www.open-lab.net/blog/?p=91619, 2024-11-12)

A new machine-learning algorithm that listens to digital heartbeat data could help veterinarians diagnose murmurs and early-stage heart disease in dogs. Developed by a team of researchers from the University of Cambridge, the study analyzes electronic stethoscope recordings to grade murmur intensity and diagnose the stage of myxomatous mitral valve disease (MMVD), the most common form of heart…

Michael Yh Wang: Bridging the CUDA C++ Ecosystem and Python Developers with Numbast (http://www.open-lab.net/blog/?p=90086, 2024-10-24)

By enabling CUDA kernels to be written in Python similar to how they can be implemented within C++, Numba bridges the gap between the Python ecosystem and the performance of CUDA. However, CUDA C++ developers have access to many libraries that presently have no exposure in Python. These include the CUDA Core Compute Libraries (CCCL), cuRAND, and header-based implementations of numeric types…

Elias Wolfberg: AI Medical Imagery Model Offers Fast, Cost-Efficient Expert Analysis (http://www.open-lab.net/blog/?p=90392, 2024-10-17)

Researchers at UCLA have developed a new AI model that can expertly analyze 3D medical images of diseases in a fraction of the time it would otherwise take a human clinical specialist. The deep-learning framework, named SLIViT (SLice Integration by Vision Transformer), analyzes images from different imagery modalities, including retinal scans, ultrasound videos, CTs, MRIs, and others…

Brad Nemire: Just Released: Updated Math Libraries in CUDA Toolkit 12.6.2 (http://www.open-lab.net/blog/?p=90127, 2024-10-09)

CUDA Toolkit 12.6.2 improves performance and provides new features in cuBLAS, cuSOLVER, and cuFFT LTO libraries.

Paul Logan: Accelerating Reality Capture Workflows with AI and NVIDIA RTX GPUs (http://www.open-lab.net/blog/?p=89719, 2024-10-07)

Reality capture creates highly accurate, detailed, and immersive digital representations of environments. Innovations in site scanning and accelerated data processing, together with emerging technologies like neural radiance fields (NeRFs) and Gaussian splatting, are significantly enhancing the capabilities of reality capture. These technologies are revolutionizing interactions with and analyses of the…

Tanya Lenz: Webinar: Accelerating Python with GPUs (http://www.open-lab.net/blog/?p=89659, 2024-10-02)

Join us on October 9 to learn how your applications can benefit from NVIDIA CUDA Python software initiatives.

Annamalai Chockalingam: Accelerating LLMs with llama.cpp on NVIDIA RTX Systems (http://www.open-lab.net/blog/?p=89663, 2024-10-02)

The NVIDIA RTX AI for Windows PCs platform offers a thriving ecosystem of thousands of open-source models for application developers to leverage and integrate into Windows applications. Notably, llama.cpp is one popular tool, with over 65K GitHub stars at the time of writing. Originally released in 2023, this open-source repository is a lightweight, efficient framework for large language model…

Mark Wolf: Advancing Quantum Algorithm Design with GPTs (http://www.open-lab.net/blog/?p=89173, 2024-09-30)

AI techniques like large language models (LLMs) are rapidly transforming many scientific disciplines. Quantum computing is no exception. A collaboration between NVIDIA, the University of Toronto, and Saint Jude Children's Research Hospital is bringing generative pre-trained transformers (GPTs) to the design of new quantum algorithms, including the Generative Quantum Eigensolver (GQE) technique.

Daniel Galvez: Accelerating Leaderboard-Topping ASR Models 10x with NVIDIA NeMo (http://www.open-lab.net/blog/?p=89330, 2024-09-24)

NVIDIA NeMo has consistently developed automatic speech recognition (ASR) models that set the benchmark in the industry, particularly those topping the Hugging Face Open ASR Leaderboard. These NVIDIA NeMo ASR models that transcribe speech into text offer a range of architectures designed to optimize both speed and accuracy. Previously, these models faced speed performance…

William Hill: Just Released: Torch-TensorRT v2.4.0 (http://www.open-lab.net/blog/?p=89229, 2024-09-19)

Includes C++ runtime support on Windows, enhanced dynamic shape support in converters, PyTorch 2.4, CUDA 12.4, TensorRT 10.1, and Python 3.12.

Richard Wang: Accelerating Oracle Database Generative AI Workloads with NVIDIA NIM and NVIDIA cuVS (http://www.open-lab.net/blog/?p=88963, 2024-09-17)

The vast majority of the world's data remains untapped, and enterprises are looking to generate value from this data by creating the next wave of generative AI applications that will make a transformative business impact. Retrieval-augmented generation (RAG) pipelines are a key part of this, enabling users to have conversations with large corpuses of data and turning manuals, policy documents…

Michelle Horton: Advanced Strategies for High-Performance GPU Programming with NVIDIA CUDA (http://www.open-lab.net/blog/?p=88069, 2024-09-11)

Stephen Jones, a leading expert and distinguished NVIDIA CUDA architect, offers his guidance and insights with a deep dive into the complexities of mapping applications onto massively parallel machines. Going beyond the basics to explore the intricacies of GPU programming, he focuses on practical techniques such as parallel program design and specific details of GPU optimization for improving the…

Source

]]>
1
Houston Hoffman <![CDATA[Constant Time Launch for Straight-Line CUDA Graphs and Other Performance Enhancements]]> http://www.open-lab.net/blog/?p=88631 2024-09-19T19:32:10Z 2024-09-11T16:00:00Z CUDA Graphs are a way to define and batch GPU operations as a graph rather than a sequence of stream launches. A CUDA Graph groups a set of CUDA kernels and...]]> CUDA Graphs are a way to define and batch GPU operations as a graph rather than a sequence of stream launches. A CUDA Graph groups a set of CUDA kernels and...Decorative image of light fields in green, purple, and blue.

CUDA Graphs are a way to define and batch GPU operations as a graph rather than a sequence of stream launches. A CUDA Graph groups a set of CUDA kernels and other CUDA operations together and executes them with a specified dependency tree. It speeds up the workflow by combining the driver activities associated with CUDA kernel launches and CUDA API calls. It also enforces the dependencies with…
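The capture-instantiate-replay pattern described above can be sketched as follows. This is a minimal illustration, not code from the post; the kernel, sizes, and iteration counts are placeholders.

```cuda
// Sketch: capture a stream of kernel launches into a CUDA Graph once,
// then replay the whole batch cheaply many times.
#include <cuda_runtime.h>

__global__ void scale(float *x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    const int n = 1 << 20;
    float *d;
    cudaMalloc(&d, n * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Capture the work (three dependent launches) instead of executing it.
    cudaGraph_t graph;
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    for (int k = 0; k < 3; ++k)
        scale<<<(n + 255) / 256, 256, 0, stream>>>(d, 0.5f, n);
    cudaStreamEndCapture(stream, &graph);

    // Instantiate once, launch many times: the per-launch driver
    // overhead is paid once at instantiation.
    cudaGraphExec_t exec;
    cudaGraphInstantiate(&exec, graph, nullptr, nullptr, 0);
    for (int iter = 0; iter < 100; ++iter)
        cudaGraphLaunch(exec, stream);
    cudaStreamSynchronize(stream);

    cudaGraphExecDestroy(exec);
    cudaGraphDestroy(graph);
    cudaStreamDestroy(stream);
    cudaFree(d);
    return 0;
}
```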

Source

]]>
0
Mohammad Almasri <![CDATA[Accelerating the HPCG Benchmark with NVIDIA Math Sparse Libraries]]> http://www.open-lab.net/blog/?p=88566 2024-09-19T19:32:22Z 2024-09-10T16:30:00Z In the realm of high-performance computing (HPC), NVIDIA has continually advanced HPC by offering its highly optimized NVIDIA High-Performance Conjugate...]]> In the realm of high-performance computing (HPC), NVIDIA has continually advanced HPC by offering its highly optimized NVIDIA High-Performance Conjugate...Decorative image of light fields in green, purple, and blue.

In the realm of high-performance computing (HPC), NVIDIA has continually advanced the field by offering its highly optimized NVIDIA High-Performance Conjugate Gradient (HPCG) benchmark as part of the NVIDIA HPC benchmark collection. The NVIDIA HPCG benchmark program is now available in the /NVIDIA/nvidia-hpcg GitHub repo, using its high-performance math libraries, cuSPARSE…

Source

]]>
0
Akhil Langer <![CDATA[Enhancing Application Portability and Compatibility across New Platforms Using NVIDIA Magnum IO NVSHMEM 3.0]]> http://www.open-lab.net/blog/?p=88550 2024-09-19T19:34:01Z 2024-09-06T20:30:09Z NVSHMEM is a parallel programming interface that provides efficient and scalable communication for NVIDIA GPU clusters. Part of NVIDIA Magnum IO and based on...]]> NVSHMEM is a parallel programming interface that provides efficient and scalable communication for NVIDIA GPU clusters. Part of NVIDIA Magnum IO and based on...

NVSHMEM is a parallel programming interface that provides efficient and scalable communication for NVIDIA GPU clusters. Part of NVIDIA Magnum IO and based on OpenSHMEM, NVSHMEM creates a global address space for data that spans the memory of multiple GPUs and can be accessed with fine-grained GPU-initiated operations, CPU-initiated operations, and operations on CUDA streams.
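The symmetric global address space can be sketched with a minimal one-sided put between processing elements (PEs). This is an illustrative sketch, not code from the post; error handling and job-launcher details are omitted.

```cuda
// Sketch: every PE allocates the same symmetric buffer, and a fine-grained
// one-sided put writes directly into a remote PE's GPU memory.
#include <nvshmem.h>
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    nvshmem_init();
    int mype = nvshmem_my_pe();
    int npes = nvshmem_n_pes();

    // Symmetric allocation: the same buffer exists on every PE.
    int *sym = (int *)nvshmem_malloc(sizeof(int));

    // CPU-initiated one-sided put: write my PE id into my right neighbor.
    int peer = (mype + 1) % npes;
    nvshmem_int_p(sym, mype, peer);
    nvshmem_barrier_all();

    int val;
    cudaMemcpy(&val, sym, sizeof(int), cudaMemcpyDeviceToHost);
    printf("PE %d received %d from its left neighbor\n", mype, val);

    nvshmem_free(sym);
    nvshmem_finalize();
    return 0;
}
```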

Source

]]>
0
Oscar Javier Aldana <![CDATA[Spotlight: clicOH Accelerates Last-Mile Delivery 20x with NVIDIA cuOpt]]> http://www.open-lab.net/blog/?p=88363 2024-09-05T17:57:11Z 2024-08-29T22:18:14Z Driven by shifts in consumer behavior and the pandemic, e-commerce continues its explosive growth and transformation. As a result, logistics and transportation...]]> Driven by shifts in consumer behavior and the pandemic, e-commerce continues its explosive growth and transformation. As a result, logistics and transportation...

Driven by shifts in consumer behavior and the pandemic, e-commerce continues its explosive growth and transformation. As a result, logistics and transportation firms find themselves at the forefront of a parcel delivery revolution. This new reality is especially evident in last-mile delivery, which is now the most expensive element of supply chain logistics. It represents more than 41%

Source

]]>
0
Michelle Horton <![CDATA[Boosting CUDA Efficiency with Essential Techniques for New Developers]]> http://www.open-lab.net/blog/?p=87823 2024-09-05T17:57:12Z 2024-08-29T17:00:00Z To fully harness the capabilities of NVIDIA GPUs, optimizing NVIDIA CUDA performance is essential, particularly for developers new to GPU programming. This talk...]]> To fully harness the capabilities of NVIDIA GPUs, optimizing NVIDIA CUDA performance is essential, particularly for developers new to GPU programming. This talk...An illustration representing CUDA.

To fully harness the capabilities of NVIDIA GPUs, optimizing NVIDIA CUDA performance is essential, particularly for developers new to GPU programming. This talk is specifically designed for those stepping into the world of CUDA, providing a solid foundation in GPU architecture principles and optimization techniques. Athena Elafrou, a developer technology engineer at NVIDIA…

Source

]]>
1
Rob Van der Wijngaart <![CDATA[Improving GPU Performance by Reducing Instruction Cache Misses]]> http://www.open-lab.net/blog/?p=86868 2025-01-22T17:57:59Z 2024-08-08T16:30:00Z GPUs are specially designed to crunch through massive amounts of data at high speed. They have a large amount of compute resources, called streaming...]]> GPUs are specially designed to crunch through massive amounts of data at high speed. They have a large amount of compute resources, called streaming...Decorative image of light fields in green, purple, and blue.

GPUs are specially designed to crunch through massive amounts of data at high speed. They have a large amount of compute resources, called streaming multiprocessors (SMs), and an array of facilities to keep them fed with data: high bandwidth to memory, sizable data caches, and the capability to switch to other teams of workers (warps) without any overhead if an active team has run out of data.

Source

]]>
4
Alan Gray <![CDATA[Optimizing llama.cpp AI Inference with CUDA Graphs]]> http://www.open-lab.net/blog/?p=86845 2024-11-14T16:03:17Z 2024-08-07T20:00:00Z The open-source llama.cpp code base was originally released in 2023 as a lightweight but efficient framework for performing inference on Meta Llama models....]]> The open-source llama.cpp code base was originally released in 2023 as a lightweight but efficient framework for performing inference on Meta Llama models....

The open-source llama.cpp code base was originally released in 2023 as a lightweight but efficient framework for performing inference on Meta Llama models. Built on the GGML library released the previous year, llama.cpp quickly became attractive to many users and developers (particularly for use on personal workstations) due to its focus on C/C++ without the need for complex dependencies.

Source

]]>
0
Rob Armstrong <![CDATA[Just Released: CUDA Toolkit 12.6]]> http://www.open-lab.net/blog/?p=86675 2024-08-28T17:29:07Z 2024-08-01T20:00:00Z The release supports GB100 capabilities and new library enhancements to cuBLAS, cuFFT, cuSOLVER, cuSPARSE, as well as the release of Nsight Compute 2024.3.]]> The release supports GB100 capabilities and new library enhancements to cuBLAS, cuFFT, cuSOLVER, cuSPARSE, as well as the release of Nsight Compute 2024.3.

The release supports GB100 capabilities and new library enhancements to cuBLAS, cuFFT, cuSOLVER, cuSPARSE, as well as the release of Nsight Compute 2024.3.

Source

]]>
0
Rob Armstrong <![CDATA[NVIDIA Transitions Fully Towards Open-Source GPU Kernel Modules]]> http://www.open-lab.net/blog/?p=85331 2024-08-08T18:48:48Z 2024-07-17T16:40:27Z With the R515 driver, NVIDIA released a set of Linux GPU kernel modules in May 2022 as open source with dual GPL and MIT licensing. The initial release targeted...]]> With the R515 driver, NVIDIA released a set of Linux GPU kernel modules in May 2022 as open source with dual GPL and MIT licensing. The initial release targeted...Decorative image of light fields in green, purple, and blue.

With the R515 driver, NVIDIA released a set of Linux GPU kernel modules in May 2022 as open source with dual GPL and MIT licensing. The initial release targeted datacenter compute GPUs, with GeForce and Workstation GPUs in an alpha state. At the time, we announced that more robust and fully-featured GeForce and Workstation Linux support would follow in subsequent releases and the NVIDIA Open…

Source

]]>
5
Vijay Thakkar <![CDATA[Next Generation of FlashAttention]]> http://www.open-lab.net/blog/?p=85219 2024-07-25T18:19:05Z 2024-07-11T17:46:06Z NVIDIA is excited to collaborate with Colfax, Together.ai, Meta, and Princeton University on their recent achievement to exploit the Hopper GPU architecture and...]]> NVIDIA is excited to collaborate with Colfax, Together.ai, Meta, and Princeton University on their recent achievement to exploit the Hopper GPU architecture and...

NVIDIA is excited to collaborate with Colfax, Together.ai, Meta, and Princeton University on their recent achievement to exploit the Hopper GPU architecture and Tensor Cores and accelerate key Fused Attention kernels using CUTLASS 3. FlashAttention-3 incorporates key techniques to achieve 1.5–2.0x faster performance than FlashAttention-2 with FP16, up to 740 TFLOPS. With FP8…

Source

]]>
0
Robert Jensen <![CDATA[Just Released: nvmath-python]]> http://www.open-lab.net/blog/?p=84439 2024-07-25T18:19:11Z 2024-07-09T16:00:00Z nvmath-python is an open-source Python library that provides high performance access to the core mathematical operations in the NVIDIA Math Libraries. Available...]]> nvmath-python is an open-source Python library that provides high performance access to the core mathematical operations in the NVIDIA Math Libraries. Available...

nvmath-python is an open-source Python library that provides high performance access to the core mathematical operations in the NVIDIA Math Libraries. Available now in beta.

Source

]]>
0
Robert Jensen <![CDATA[Just Released: cuDSS 0.3.0]]> http://www.open-lab.net/blog/?p=84434 2024-07-25T18:19:14Z 2024-07-03T15:00:00Z cuDSS (Preview) is an accelerated direct sparse solver. It now supports multi-GPU multi-node platforms, and introduces a hybrid memory mode.]]> cuDSS (Preview) is an accelerated direct sparse solver. It now supports multi-GPU multi-node platforms, and introduces a hybrid memory mode.

cuDSS (Preview) is an accelerated direct sparse solver. It now supports multi-GPU multi-node platforms, and introduces a hybrid memory mode.

Source

]]>
0
Steven Gurfinkel <![CDATA[Checkpointing CUDA Applications with CRIU]]> http://www.open-lab.net/blog/?p=84236 2024-07-25T18:19:18Z 2024-07-02T16:00:00Z Checkpoint and restore functionality for CUDA is exposed through a command-line utility called cuda-checkpoint. This utility can be used to transparently...]]> Checkpoint and restore functionality for CUDA is exposed through a command-line utility called cuda-checkpoint. This utility can be used to transparently...

Source

]]>
0
Jon Waxman <![CDATA[Runtime Fatbin Creation Using the NVIDIA CUDA Toolkit 12.4 Compiler]]> http://www.open-lab.net/blog/?p=83992 2024-06-27T18:17:56Z 2024-06-18T17:28:55Z CUDA Toolkit 12.4 introduced a new nvFatbin library for creating fatbins at runtime. Fatbins, otherwise known as NVIDIA device code fat binaries, are containers...]]> CUDA Toolkit 12.4 introduced a new nvFatbin library for creating fatbins at runtime. Fatbins, otherwise known as NVIDIA device code fat binaries, are containers...Decorative image of light fields in green, purple, and blue.

Source

]]>
1
Elena Agostini <![CDATA[Unlocking GPU-Accelerated RDMA with NVIDIA DOCA GPUNetIO]]> http://www.open-lab.net/blog/?p=83998 2024-06-27T23:59:16Z 2024-06-13T20:43:59Z NVIDIA DOCA GPUNetIO is a library within the NVIDIA DOCA SDK, specifically designed for real-time inline GPU packet processing. It combines technologies like...]]> NVIDIA DOCA GPUNetIO is a library within the NVIDIA DOCA SDK, specifically designed for real-time inline GPU packet processing. It combines technologies like...

NVIDIA DOCA GPUNetIO is a library within the NVIDIA DOCA SDK, specifically designed for real-time inline GPU packet processing. It combines technologies like GPUDirect RDMA and GPUDirect Async to enable the creation of GPU-centric applications where a CUDA kernel can directly communicate with the network interface card (NIC) for sending and receiving packets, bypassing the CPU and excluding it…

Source

]]>
3
Babak Hejazi <![CDATA[Introducing Grouped GEMM APIs in cuBLAS and More Performance Updates]]> http://www.open-lab.net/blog/?p=83888 2024-07-16T17:19:07Z 2024-06-12T20:30:00Z The latest release of NVIDIA cuBLAS library, version 12.5, continues to deliver functionality and performance to deep learning (DL) and high-performance...]]> The latest release of NVIDIA cuBLAS library, version 12.5, continues to deliver functionality and performance to deep learning (DL) and high-performance...

The latest release of NVIDIA cuBLAS library, version 12.5, continues to deliver functionality and performance to deep learning (DL) and high-performance computing (HPC) workloads. This post provides an overview of the following updates on cuBLAS matrix multiplications (matmuls) since version 12.0, and a walkthrough: Grouped GEMM APIs can be viewed as a generalization of the batched…

Source

]]>
0
Jen Witsoe <![CDATA[Just Released: Nsight Compute 2024.2]]> http://www.open-lab.net/blog/?p=82789 2024-08-28T17:29:39Z 2024-05-22T16:57:50Z Nsight Compute 2024.2 adds Python syntax highlighting and call stacks, a redesigned report header, and source page statistics to make CUDA optimization easier.]]> Nsight Compute 2024.2 adds Python syntax highlighting and call stacks, a redesigned report header, and source page statistics to make CUDA optimization easier.Nsight Compute screenshot.

Nsight Compute 2024.2 adds Python syntax highlighting and call stacks, a redesigned report header, and source page statistics to make CUDA optimization easier.

Source

]]>
0
Tanya Lenz <![CDATA[Just Released: CUDA Toolkit 12.5]]> http://www.open-lab.net/blog/?p=82840 2024-05-30T19:55:49Z 2024-05-21T20:29:43Z CUDA Toolkit 12.5 supports new NVIDIA L20 and H20 GPUs and simultaneous compute and graphics to DirectX, and updates Nsight Compute and CUDA-X Libraries.]]> CUDA Toolkit 12.5 supports new NVIDIA L20 and H20 GPUs and simultaneous compute and graphics to DirectX, and updates Nsight Compute and CUDA-X Libraries.

CUDA Toolkit 12.5 supports new NVIDIA L20 and H20 GPUs and simultaneous compute and graphics to DirectX, and updates Nsight Compute and CUDA-X Libraries.

Source

]]>
0
Jason Gaiser <![CDATA[Dynamic Control Flow in CUDA Graphs with Conditional Nodes]]> http://www.open-lab.net/blog/?p=81012 2025-02-03T22:25:21Z 2024-05-10T18:43:37Z Post updated on February 3, 2025 with details about CUDA 12.8. CUDA Graphs can provide a significant performance increase, as the driver is able to optimize...]]> Post updated on February 3, 2025 with details about CUDA 12.8. CUDA Graphs can provide a significant performance increase, as the driver is able to optimize...

Post updated on February 3, 2025 with details about CUDA 12.8. CUDA Graphs can provide a significant performance increase, as the driver is able to optimize execution using the complete description of tasks and dependencies. Graphs provide incredible benefits for static workflows where the overhead of graph creation can be amortized over many successive launches. However…

Source

]]>
2
Tianna Nguy <![CDATA[NVIDIA GTC Training Labs On Demand Available Now]]> http://www.open-lab.net/blog/?p=82157 2024-05-07T17:47:17Z 2024-05-07T17:02:57Z Missed GTC or want to replay your favorite training labs? Find it on demand with the NVIDIA GTC Training Labs playlist.]]> Missed GTC or want to replay your favorite training labs? Find it on demand with the NVIDIA GTC Training Labs playlist.nearly 100 training labs from GTC available on demand

Missed GTC or want to replay your favorite training labs? Find it on demand with the NVIDIA GTC Training Labs playlist.

Source

]]>
0
Rob Van der Wijngaart <![CDATA[Measuring the GPU Occupancy of Multi-stream Workloads]]> http://www.open-lab.net/blog/?p=81074 2025-01-03T00:33:09Z 2024-04-19T16:00:00Z NVIDIA GPUs are becoming increasingly powerful with each new generation. This increase generally comes in two forms. Each streaming multi-processor (SM), the...]]> NVIDIA GPUs are becoming increasingly powerful with each new generation. This increase generally comes in two forms. Each streaming multi-processor (SM), the...Image of Nsight Systems report.

NVIDIA GPUs are becoming increasingly powerful with each new generation. This increase generally comes in two forms. Each streaming multi-processor (SM), the workhorse of the GPU, can execute instructions faster and faster, and the memory system can deliver data to the SMs at an ever-increasing pace. At the same time, the number of SMs also typically increases with each generation…
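The multi-stream workload the post measures can be sketched as follows: several small, independent kernels launched into separate streams so that the hardware is free to run them concurrently on idle SMs. This is an illustrative sketch, not code from the post; the kernel and sizes are placeholders.

```cuda
// Sketch: independent kernels in separate streams can share the GPU's SMs.
#include <cuda_runtime.h>

__global__ void busywork(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * 2.0f + 1.0f;
}

int main() {
    const int nstreams = 4;
    const int n = 1 << 16;  // deliberately small: one grid can't fill the GPU
    cudaStream_t streams[nstreams];
    float *buf[nstreams];

    for (int s = 0; s < nstreams; ++s) {
        cudaStreamCreate(&streams[s]);
        cudaMalloc(&buf[s], n * sizeof(float));
    }
    // No cross-stream dependencies, so the scheduler may overlap these
    // launches whenever a single kernel leaves SMs idle.
    for (int s = 0; s < nstreams; ++s)
        busywork<<<(n + 255) / 256, 256, 0, streams[s]>>>(buf[s], n);
    cudaDeviceSynchronize();

    for (int s = 0; s < nstreams; ++s) {
        cudaStreamDestroy(streams[s]);
        cudaFree(buf[s]);
    }
    return 0;
}
```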

Source

]]>
0
Paul Graham <![CDATA[Efficient CUDA Debugging: Using NVIDIA Compute Sanitizer with NVIDIA Tools Extension and Creating Custom Tools]]> http://www.open-lab.net/blog/?p=80383 2024-08-28T17:30:34Z 2024-03-27T20:29:15Z NVIDIA Compute Sanitizer is a powerful tool that can save you time and effort while improving the reliability and performance of your CUDA applications....]]> NVIDIA Compute Sanitizer is a powerful tool that can save you time and effort while improving the reliability and performance of your CUDA applications....Decorative image of bugs crawling over a computer chip.

Source

]]>
1
Robert Jensen <![CDATA[Building High-Performance Applications in the Era of Accelerated Computing]]> http://www.open-lab.net/blog/?p=80067 2024-08-28T17:32:20Z 2024-03-25T16:00:00Z AI is augmenting high-performance computing (HPC) with novel approaches to data processing, simulation, and modeling. Because of the computational requirements...]]> AI is augmenting high-performance computing (HPC) with novel approaches to data processing, simulation, and modeling. Because of the computational requirements...Illustration representing HPC.

AI is augmenting high-performance computing (HPC) with novel approaches to data processing, simulation, and modeling. Because of the computational requirements of these new AI workloads, HPC is scaling up at a rapid pace. To enable applications to scale to multi-GPU and multi-node platforms, HPC tools and libraries must support that growth. NVIDIA provides a comprehensive ecosystem of…

Source

]]>
0
Robert Jensen <![CDATA[Just Released: NVIDIA cuSPARSELt 0.6]]> http://www.open-lab.net/blog/?p=78683 2024-04-09T23:45:24Z 2024-03-14T16:00:00Z NVIDIA cuSPARSELt harnesses Sparse Tensor Cores to accelerate general matrix multiplications. Version 0.6. adds support for the NVIDIA Hopper architecture.]]> NVIDIA cuSPARSELt harnesses Sparse Tensor Cores to accelerate general matrix multiplications. Version 0.6. adds support for the NVIDIA Hopper architecture.Decorative image of structured sparsity

NVIDIA cuSPARSELt harnesses Sparse Tensor Cores to accelerate general matrix multiplications. Version 0.6 adds support for the NVIDIA Hopper architecture.

Source

]]>
0
Rob Armstrong <![CDATA[CUDA Toolkit 12.4 Enhances Support for NVIDIA Grace Hopper and Confidential Computing]]> http://www.open-lab.net/blog/?p=79119 2024-08-28T17:32:44Z 2024-03-06T19:55:00Z The latest release of CUDA Toolkit, version 12.4, continues to push accelerated computing performance using the latest NVIDIA GPUs. This post explains the new...]]> The latest release of CUDA Toolkit, version 12.4, continues to push accelerated computing performance using the latest NVIDIA GPUs. This post explains the new...

The latest release of CUDA Toolkit, version 12.4, continues to push accelerated computing performance using the latest NVIDIA GPUs. This post explains the new features and enhancements included in this release: CUDA and the CUDA Toolkit software provide the foundation for all NVIDIA GPU-accelerated computing applications in data science and analytics, machine learning…

Source

]]>
0
Feiwen Zhu <![CDATA[Optimizing OpenFold Training for Drug Discovery]]> http://www.open-lab.net/blog/?p=78346 2024-03-07T19:18:52Z 2024-02-28T19:29:02Z Predicting 3D protein structures from amino acid sequences has been an important long-standing question in bioinformatics. In recent years, deep...]]> Predicting 3D protein structures from amino acid sequences has been an important long-standing question in bioinformatics. In recent years, deep...Decorative image of colorful protein structures.

Predicting 3D protein structures from amino acid sequences has been an important long-standing question in bioinformatics. In recent years, deep learning-based computational methods have been emerging and have shown promising results. Among these lines of work, AlphaFold2 is the first method that has achieved results comparable to slower physics-based computational methods.

Source

]]>
0
Robert Jensen <![CDATA[Just Released: cuBLASDx]]> http://www.open-lab.net/blog/?p=76535 2024-01-25T18:17:35Z 2024-01-12T18:58:48Z cuBLASDx allows you to perform BLAS calculations inside your CUDA kernel, improving the performance of your application. Available to download in Preview...]]> cuBLASDx allows you to perform BLAS calculations inside your CUDA kernel, improving the performance of your application. Available to download in Preview...

cuBLASDx allows you to perform BLAS calculations inside your CUDA kernel, improving the performance of your application. Available to download in Preview now.

Source

]]>
0
Rahul Ramasubramanian <![CDATA[Improving CUDA Initialization Times Using cgroups in Certain Scenarios]]> http://www.open-lab.net/blog/?p=75534 2024-01-11T19:49:33Z 2024-01-05T22:14:41Z Many CUDA applications running on multi-GPU platforms usually use a single GPU for their compute needs. In such scenarios, a performance penalty is paid by...]]> Many CUDA applications running on multi-GPU platforms usually use a single GPU for their compute needs. In such scenarios, a performance penalty is paid by...Decorative image of light fields in green, purple, and blue.

Many CUDA applications running on multi-GPU platforms usually use a single GPU for their compute needs. In such scenarios, a performance penalty is paid by applications because CUDA has to enumerate/initialize all the GPUs on the system. If a CUDA application does not require other GPUs to be visible and accessible, you can launch such applications by isolating the unwanted GPUs from the CUDA…
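The post's approach uses cgroups at the OS level; a related, simpler user-level mechanism for limiting which GPUs the CUDA runtime enumerates is the `CUDA_VISIBLE_DEVICES` environment variable. The device index below is illustrative.

```shell
# Restrict the child process so CUDA enumerates only device 0.
# (An application launched this way sees a single GPU and skips
# initializing the rest.)
CUDA_VISIBLE_DEVICES=0 sh -c 'echo "CUDA would enumerate devices: $CUDA_VISIBLE_DEVICES"'
```

Note that this only hides devices from the CUDA runtime inside that process; the cgroups approach in the post enforces isolation at the operating-system level.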

Source

]]>
0
Robert Jensen <![CDATA[Just Released: cuBLASMp]]> http://www.open-lab.net/blog/?p=75170 2023-12-20T18:06:09Z 2023-12-20T18:00:00Z cuBLASMp is a high-performance, multi-process, GPU-accelerated library for distributed basic dense linear algebra. It is available to download in Preview now.]]> cuBLASMp is a high-performance, multi-process, GPU-accelerated library for distributed basic dense linear algebra. It is available to download in Preview now.Logo for cuBLAS

cuBLASMp is a high-performance, multi-process, GPU-accelerated library for distributed basic dense linear algebra. It is available to download in Preview now.

Source

]]>
0
Efrat Shabtai <![CDATA[CUDA-Q 0.5 Delivers New Features for Quantum-Classical Computing]]> http://www.open-lab.net/blog/?p=74316 2024-05-07T19:29:27Z 2023-11-29T17:00:00Z NVIDIA CUDA-Q is a platform for building quantum-classical computing applications. It is an open-source programming model for heterogeneous computing such as...]]> NVIDIA CUDA-Q is a platform for building quantum-classical computing applications. It is an open-source programming model for heterogeneous computing such as...Intricate image of a QPU.

NVIDIA CUDA-Q is a platform for building quantum-classical computing applications. It is an open-source programming model for heterogeneous computing such as quantum processor units (QPUs), GPUs, and CPUs. CUDA-Q accelerates workflows such as quantum simulation, quantum machine learning, quantum chemistry, and more. It optimizes these workflows as part of its compiler toolchain and uses the…

Source

]]>
0
Alexey Panteleev <![CDATA[Unlocking GPU Intrinsics in HLSL]]> http://www.open-lab.net/blog/?p=72095 2023-11-30T19:43:27Z 2023-11-21T18:08:42Z There are some useful intrinsic functions in the NVIDIA GPU instruction set that are not included in standard graphics APIs. Updated from the original 2016 post...]]> There are some useful intrinsic functions in the NVIDIA GPU instruction set that are not included in standard graphics APIs. Updated from the original 2016 post...Decorative image of blocks of light against a dark background.

There are some useful intrinsic functions in the NVIDIA GPU instruction set that are not included in standard graphics APIs. Updated from the original 2016 post to add information about new intrinsics and cross-vendor APIs in DirectX and Vulkan. For example, a shader can use warp shuffle instructions to exchange data between threads in a warp without going through shared memory…
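In CUDA, where these intrinsics are exposed directly, the warp shuffle idiom looks like the sketch below: a warp-wide sum in which lanes exchange register values with no shared memory. The kernel is illustrative and assumes a launch with a single 32-thread warp.

```cuda
// Sketch: butterfly (XOR) reduction across the 32 lanes of one warp.
#include <cuda_runtime.h>

__global__ void warp_sum(const int *in, int *out) {
    int v = in[threadIdx.x];
    // At each step, every lane adds the value held by its XOR partner;
    // data moves register-to-register, never touching shared memory.
    for (int offset = 16; offset > 0; offset >>= 1)
        v += __shfl_xor_sync(0xffffffff, v, offset);
    // After the loop, every lane holds the full warp sum; lane 0 writes it.
    if (threadIdx.x == 0) *out = v;
}
```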

Source

]]>
0
Asawaree Bhide <![CDATA[Boosting Custom ROS Graphs Using NVIDIA Isaac Transport for ROS]]> http://www.open-lab.net/blog/?p=74008 2023-11-30T19:43:29Z 2023-11-17T21:11:29Z NVIDIA Isaac Transport for ROS (NITROS) is the implementation of two hardware-acceleration features introduced with ROS 2 Humble-type adaptation and type...]]> NVIDIA Isaac Transport for ROS (NITROS) is the implementation of two hardware-acceleration features introduced with ROS 2 Humble-type adaptation and type...

NVIDIA Isaac Transport for ROS (NITROS) is the implementation of two hardware-acceleration features introduced with ROS 2 Humble: type adaptation and type negotiation. Type adaptation enables ROS nodes to work in a data format optimized for specific hardware accelerators. The adapted type is used by processing graphs to eliminate memory copies between the CPU and the memory accelerator.

Source

]]>
1
Graham Lopez <![CDATA[Unlock the Power of NVIDIA Grace and NVIDIA Hopper Architectures with Foundational HPC Software]]> http://www.open-lab.net/blog/?p=72977 2024-08-28T17:33:20Z 2023-11-16T19:07:51Z High-performance computing (HPC) powers applications in simulation and modeling, healthcare and life sciences, industry and engineering, and more. In the modern...]]> High-performance computing (HPC) powers applications in simulation and modeling, healthcare and life sciences, industry and engineering, and more. In the modern...An illustration representing HPC applications.

High-performance computing (HPC) powers applications in simulation and modeling, healthcare and life sciences, industry and engineering, and more. In the modern data center, HPC synergizes with AI, harnessing data in transformative new ways. The performance and throughput demands of next-generation HPC applications call for an accelerated computing platform that can handle diverse workloads…

Source

]]>
0
Balakumar Sundaralingam <![CDATA[CUDA-Accelerated Robot Motion Generation in Milliseconds with NVIDIA cuRobo]]> http://www.open-lab.net/blog/?p=72424 2023-12-05T19:04:45Z 2023-11-07T22:22:37Z Real-time autonomous robot navigation powered by a fast motion-generation algorithm can enable applications in several industries such as food and services,...]]> Real-time autonomous robot navigation powered by a fast motion-generation algorithm can enable applications in several industries such as food and services,...

Real-time autonomous robot navigation powered by a fast motion-generation algorithm can enable applications in several industries such as food and services, warehouse automation, and machine tending. Motion generation for manipulators is extremely challenging, as it requires satisfying complex constraints and minimizing several cost terms. In addition, manipulators can have many…

Source

]]>
0
Joseph Chandler <![CDATA[ICYMI: Leveraging the Power of GPUs with CuPy in Python]]> http://www.open-lab.net/blog/?p=72637 2023-11-16T19:16:47Z 2023-11-06T19:17:06Z See how KDNuggets achieved 500x speedup using CuPy and NVIDIA CUDA on 3D arrays.]]> See how KDNuggets achieved 500x speedup using CuPy and NVIDIA CUDA on 3D arrays.Written text:

See how KDNuggets achieved 500x speedup using CuPy and NVIDIA CUDA on 3D arrays.
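The appeal of CuPy is that it mirrors the NumPy API, so moving a 3D-array workload to the GPU is often a one-line import change. The sketch below is illustrative (not the KDNuggets benchmark) and falls back to NumPy when no GPU is available so it stays runnable.

```python
# Sketch: the same array code runs on GPU (CuPy) or CPU (NumPy).
try:
    import cupy as xp  # GPU path: kernels run on the device
except ImportError:
    import numpy as xp  # CPU fallback with an identical API

a = xp.random.rand(4, 64, 64)
b = xp.random.rand(4, 64, 64)
# Elementwise work plus a reduction over each 64x64 slice of the 3D stack.
c = (a * b + 1.0).sum(axis=(1, 2))
print(c.shape)  # (4,)
```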

Source

]]>
0
Rob Armstrong <![CDATA[CUDA Toolkit 12.3 Delivers New Features for Accelerated Computing]]> http://www.open-lab.net/blog/?p=71735 2024-08-28T17:33:55Z 2023-11-01T16:00:00Z The latest release of CUDA Toolkit continues to push the envelope of accelerated computing performance using the latest NVIDIA GPUs. New features of this...]]> The latest release of CUDA Toolkit continues to push the envelope of accelerated computing performance using the latest NVIDIA GPUs. New features of this...

The latest release of CUDA Toolkit continues to push the envelope of accelerated computing performance using the latest NVIDIA GPUs. New features of this release, version 12.3, include: CUDA and the CUDA Toolkit continue to provide the foundation for all accelerated computing applications in data science, machine learning and deep learning, generative AI with LLMs for both training and…

Source

]]>
0
Mozhgan Kabiri Chimeh <![CDATA[Efficient CUDA Debugging: Memory Initialization and Thread Synchronization with NVIDIA Compute Sanitizer]]> http://www.open-lab.net/blog/?p=71925 2024-03-21T22:25:40Z 2023-10-24T16:00:00Z NVIDIA Compute Sanitizer is a powerful tool that can save you time and effort while improving the reliability and performance of your CUDA applications.? In...]]> NVIDIA Compute Sanitizer is a powerful tool that can save you time and effort while improving the reliability and performance of your CUDA applications.? In...

Source

]]>
1
Sai Bangaru <![CDATA[Differentiable Slang: Example Applications]]> http://www.open-lab.net/blog/?p=72018 2023-11-02T20:23:30Z 2023-10-23T04:03:00Z Differentiable Slang easily integrates with existing codebases��from Python, PyTorch, and CUDA to HLSL��to aid multiple computer graphics tasks and enable...]]> Differentiable Slang easily integrates with existing codebases��from Python, PyTorch, and CUDA to HLSL��to aid multiple computer graphics tasks and enable...Decorative image of green transparent cube with tiered white lights inside.

Differentiable Slang easily integrates with existing codebases, from Python, PyTorch, and CUDA to HLSL, to aid multiple computer graphics tasks and enable novel data-driven and neural research. In this post, we introduce several code examples using differentiable Slang to demonstrate the potential use across different rendering applications and the ease of integration. This is part of a series…

Source

]]>
0
Sai Bangaru <![CDATA[Differentiable Slang: A Shading Language for Renderers That Learn]]> http://www.open-lab.net/blog/?p=72011 2023-11-02T20:23:44Z 2023-10-23T04:02:00Z NVIDIA just released a SIGGRAPH Asia 2023 research paper, SLANG.D: Fast, Modular and Differentiable Shader Programming. The paper shows how a single language...]]> NVIDIA just released a SIGGRAPH Asia 2023 research paper, SLANG.D: Fast, Modular and Differentiable Shader Programming. The paper shows how a single language...

NVIDIA just released a SIGGRAPH Asia 2023 research paper, SLANG.D: Fast, Modular and Differentiable Shader Programming. The paper shows how a single language can serve as a unified platform for real-time, inverse, and differentiable rendering. The work is a collaboration between MIT, UCSD, UW, and NVIDIA researchers. This is part of a series on Differentiable Slang. For more information about…

Source

]]>
0
Tanya Lenz <![CDATA[Just Released: NVIDIA HPC SDK 23.9]]> http://www.open-lab.net/blog/?p=71163 2023-11-02T18:14:44Z 2023-10-05T20:00:00Z This NVIDIA HPC SDK 23.9 update expands platform support and provides minor updates.]]> This NVIDIA HPC SDK 23.9 update expands platform support and provides minor updates.

This NVIDIA HPC SDK 23.9 update expands platform support and provides minor updates.

Source

]]>
0
Prabhu Ramamoorthy <![CDATA[NVIDIA H100 System for HPC and Generative AI Sets Record for Financial Risk Calculations]]> http://www.open-lab.net/blog/?p=71196 2024-08-28T17:35:19Z 2023-09-28T15:25:49Z Generative AI is taking the world by storm, from large language models (LLMs) to generative pretrained transformer (GPT) models to diffusion models. NVIDIA is...]]> Generative AI is taking the world by storm, from large language models (LLMs) to generative pretrained transformer (GPT) models to diffusion models. NVIDIA is...Image of GPU on black background with an artful spotlight.

Generative AI is taking the world by storm, from large language models (LLMs) to generative pretrained transformer (GPT) models to diffusion models. NVIDIA is uniquely positioned to accelerate generative AI workloads, but also those for data processing, analytics, high-performance computing (HPC), quantitative financial applications, and more. NVIDIA offers a one-stop solution for diverse workload…

Source

]]>
1
Jackson Marusarz <![CDATA[New Video Series: CUDA Developer Tools Tutorials]]> http://www.open-lab.net/blog/?p=71058 2024-08-28T17:35:38Z 2023-09-25T17:00:00Z GPU acceleration is enabling faster and more intelligent applications than ever before, and the CUDA Toolkit is key to harnessing acceleration on NVIDIA GPUs....]]> GPU acceleration is enabling faster and more intelligent applications than ever before, and the CUDA Toolkit is key to harnessing acceleration on NVIDIA GPUs....Class learning with laptops

GPU acceleration is enabling faster and more intelligent applications than ever before, and the CUDA Toolkit is key to harnessing acceleration on NVIDIA GPUs. But debugging, profiling, and optimizing CUDA can be a challenge, especially if you are unable to inspect hardware-level throughput and performance. To help you harness CUDA acceleration, NVIDIA offers Nsight Developer Tools.

Source

]]>
0
Zachary Bourque <![CDATA[NVIDIA CUDA Toolkit Symbol Server]]> http://www.open-lab.net/blog/?p=70493 2023-09-21T17:56:27Z 2023-09-07T19:10:21Z NVIDIA has already made available a GPU driver binary symbols server for Windows. Now, NVIDIA is making available a repository of CUDA Toolkit symbols for...]]> NVIDIA has already made available a GPU driver binary symbols server for Windows. Now, NVIDIA is making available a repository of CUDA Toolkit symbols for...Decorative image of two boxes with libcuda.sym labels.

NVIDIA has already made available a GPU driver binary symbols server for Windows. Now, NVIDIA is making available a repository of CUDA Toolkit symbols for Linux. NVIDIA is introducing CUDA Toolkit symbols for Linux for an application development enhancement. During application development, you can now download obfuscated symbols for NVIDIA libraries that are being debugged or profiled in…

Source

]]>
2
Robert Jensen <![CDATA[New Video Tutorial: Profiling and Debugging NVIDIA CUDA Applications]]> http://www.open-lab.net/blog/?p=70094 2024-08-28T17:36:19Z 2023-08-30T16:00:00Z Episode 5 of the NVIDIA CUDA Tutorials Video series is out. Jackson Marusarz, product manager for Compute Developer Tools at NVIDIA, introduces a suite of tools...]]> Episode 5 of the NVIDIA CUDA Tutorials Video series is out. Jackson Marusarz, product manager for Compute Developer Tools at NVIDIA, introduces a suite of tools...A woman working at a laptop.

Episode 5 of the NVIDIA CUDA Tutorials Video series is out. Jackson Marusarz, product manager for Compute Developer Tools at NVIDIA, introduces a suite of tools to help you build, debug, and optimize CUDA applications, making development easy and more efficient. This includes: IDEs and debuggers: integration with popular IDEs like NVIDIA Nsight Visual Studio Edition…

Source

]]>
0
John Hubbard <![CDATA[Simplifying GPU Application Development with Heterogeneous Memory Management]]> http://www.open-lab.net/blog/?p=69542 2023-09-13T17:07:34Z 2023-08-22T17:00:00Z Heterogeneous Memory Management (HMM) is a CUDA memory management feature that extends the simplicity and productivity of the CUDA Unified Memory programming...]]> Heterogeneous Memory Management (HMM) is a CUDA memory management feature that extends the simplicity and productivity of the CUDA Unified Memory programming...

Source

]]>
0
Michelle Horton <![CDATA[Ask Me Anything: NVIDIA CUDA Toolkit 12]]> http://www.open-lab.net/blog/?p=68440 2023-08-10T17:11:18Z 2023-07-25T18:21:38Z On July 26, connect with NVIDIA CUDA product team experts on the latest CUDA Toolkit 12.]]> On July 26, connect with NVIDIA CUDA product team experts on the latest CUDA Toolkit 12. CUDA Toolkit AMA promo card.

On July 26, connect with NVIDIA CUDA product team experts on the latest CUDA Toolkit 12.

Source

]]>
0
Zohim Chandani <![CDATA[Programming the Quantum-Classical Supercomputer]]> http://www.open-lab.net/blog/?p=68044 2024-05-07T19:30:16Z 2023-07-19T16:00:00Z Heterogeneous computing architectures, those that incorporate a variety of processor types working in tandem, have proven extremely valuable in the continued...]]> Heterogeneous computing architectures, those that incorporate a variety of processor types working in tandem, have proven extremely valuable in the continued...Illustration of a DGX GH200.

Heterogeneous computing architectures, those that incorporate a variety of processor types working in tandem, have proven extremely valuable in the continued scalability of computational workloads in AI, machine learning (ML), quantum physics, and general data science. Critical to this development has been the ability to abstract away the heterogeneous architecture and promote a framework that…

Source

]]>
0
Joel Lashmore <![CDATA[GPUs for ETL? Run Faster, Less Costly Workloads with NVIDIA RAPIDS Accelerator for Apache Spark and Databricks]]> http://www.open-lab.net/blog/?p=67503 2023-11-10T01:27:07Z 2023-07-17T18:08:30Z We were stuck. Really stuck. With a hard delivery deadline looming, our team needed to figure out how to process a complex extract-transform-load (ETL) job on...]]> We were stuck. Really stuck. With a hard delivery deadline looming, our team needed to figure out how to process a complex extract-transform-load (ETL) job on...Stylized image of a computer chip.

We were stuck. Really stuck. With a hard delivery deadline looming, our team needed to figure out how to process a complex extract-transform-load (ETL) job on trillions of point-of-sale transaction records in a few hours. The results of this job would feed a series of downstream machine learning (ML) models that would make critical retail assortment allocation decisions for a global retailer.

Source

]]>
0
Guy Salton <![CDATA[Train Your AI Model Once and Deploy on Any Cloud with NVIDIA and Run:ai]]> http://www.open-lab.net/blog/?p=67035 2023-09-11T21:36:55Z 2023-07-07T16:38:25Z Organizations are increasingly adopting hybrid and multi-cloud strategies to access the latest compute resources, consistently support worldwide customers, and...]]> Organizations are increasingly adopting hybrid and multi-cloud strategies to access the latest compute resources, consistently support worldwide customers, and...

Organizations are increasingly adopting hybrid and multi-cloud strategies to access the latest compute resources, consistently support worldwide customers, and optimize cost. However, a major challenge that engineering teams face is operationalizing AI applications across different platforms as the stack changes. This requires MLOps teams to familiarize themselves with different environments and��

Source

]]>
2
Rob Armstrong <![CDATA[CUDA Toolkit 12.2 Unleashes Powerful Features for Boosting Applications]]> http://www.open-lab.net/blog/?p=67705 2024-08-28T17:39:00Z 2023-07-06T19:16:56Z The latest release of CUDA Toolkit 12.2 introduces a range of essential new features, modifications to the programming model, and enhanced support for hardware...]]> The latest release of CUDA Toolkit 12.2 introduces a range of essential new features, modifications to the programming model, and enhanced support for hardware...CUDA abstract image.

The latest release of CUDA Toolkit 12.2 introduces a range of essential new features, modifications to the programming model, and enhanced support for hardware capabilities accelerating CUDA applications. Now out through general availability from NVIDIA, CUDA Toolkit 12.2 includes many new capabilities, both major and minor. The following post offers an overview of many of the key…

Source

]]>
0
Fred Oh <![CDATA[Event: CUDA 12.2 YouTube Premiere]]> http://www.open-lab.net/blog/?p=67504 2023-07-27T18:54:26Z 2023-07-03T19:00:00Z Watch on-demand as experts deep dive into CUDA 12.2, including support for confidential computing.]]> Watch on-demand as experts deep dive into CUDA 12.2, including support for confidential computing.A screenshot of the YouTube page for the event.

Watch on-demand as experts deep dive into CUDA 12.2, including support for confidential computing.

Source

]]>
0
Paul Graham <![CDATA[Efficient CUDA Debugging: How to Hunt Bugs with NVIDIA Compute Sanitizer]]> http://www.open-lab.net/blog/?p=66915 2024-03-21T22:32:29Z 2023-06-29T18:21:00Z Debugging code is a crucial aspect of software development but can be both challenging and time-consuming. Parallel programming with thousands of threads can...]]> Debugging code is a crucial aspect of software development but can be both challenging and time-consuming. Parallel programming with thousands of threads can...Stylized image of a beetle on lines of code.

Source

]]>
7
Ashraf Eassa <![CDATA[Breaking MLPerf Training Records with NVIDIA H100 GPUs]]> http://www.open-lab.net/blog/?p=66919 2023-07-13T19:00:28Z 2023-06-27T16:00:00Z At the heart of the rapidly expanding set of AI-powered applications are powerful AI models. Before these models can be deployed, they must be trained through a...]]> At the heart of the rapidly expanding set of AI-powered applications are powerful AI models. Before these models can be deployed, they must be trained through a...Data center

At the heart of the rapidly expanding set of AI-powered applications are powerful AI models. Before these models can be deployed, they must be trained through a process that requires an immense amount of AI computing power. AI training is also an ongoing process, with models constantly retrained with new data to ensure high-quality results. Faster model training means that AI-powered applications…

Source

]]>
0
Deepak Unnikrishnan <![CDATA[CUDA 12.1 Supports Large Kernel Parameters]]> http://www.open-lab.net/blog/?p=66058 2024-08-28T17:39:46Z 2023-06-05T17:00:00Z CUDA kernel function parameters are passed to the device through constant memory and have been limited to 4,096 bytes. CUDA 12.1 increases this parameter limit...]]> CUDA kernel function parameters are passed to the device through constant memory and have been limited to 4,096 bytes. CUDA 12.1 increases this parameter limit...Abstract image
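To make the change concrete, here is a minimal sketch (hypothetical struct and kernel names) of what the raised limit enables: a parameter struct larger than 4,096 bytes passed to a kernel by value, which requires CUDA 12.1 or later, where the limit rises to 32,764 bytes.

```cuda
#include <cuda_runtime.h>

// 2,048 floats = 8,192 bytes, which exceeds the old 4,096-byte cap.
// Before CUDA 12.1 this struct had to be copied to device memory and
// passed by pointer; with 12.1+ it can be passed by value directly.
struct KernelParams {
    float coeffs[2048];
};

__global__ void apply(KernelParams p, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = p.coeffs[i % 2048] * out[i];
}

// Launched as usual (requires CUDA 12.1+ toolkit and driver):
// apply<<<grid, block>>>(params, d_out, n);
```

Passing by value avoids an explicit host-to-device copy and a pointer indirection in the kernel, at the cost of constant-memory space for the parameters.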

Source

]]>
4
Michael Balint <![CDATA[Harnessing the Power of NVIDIA AI Enterprise on Azure Machine Learning]]> http://www.open-lab.net/blog/?p=66016 2023-06-14T19:45:42Z 2023-06-02T18:08:43Z AI is transforming industries, automating processes, and opening new opportunities for innovation in the rapidly evolving technological landscape. As more...]]> AI is transforming industries, automating processes, and opening new opportunities for innovation in the rapidly evolving technological landscape. As more...

AI is transforming industries, automating processes, and opening new opportunities for innovation in the rapidly evolving technological landscape. As more businesses recognize the value of incorporating AI into their operations, they face the challenge of implementing these technologies efficiently, effectively, and reliably. Enter NVIDIA AI Enterprise, a comprehensive software suite…

Source

]]>
0
Tom Lubowe <![CDATA[QHack Results Highlight Quantum Computing Applications and Tools on GPUs]]> http://www.open-lab.net/blog/?p=64781 2024-05-07T19:30:32Z 2023-05-18T19:00:00Z QHack is an educational conference and the world's largest quantum machine learning (QML) hackathon. This year at QHack 2023, 2,850 individuals from 105...]]> QHack is an educational conference and the world's largest quantum machine learning (QML) hackathon. This year at QHack 2023, 2,850 individuals from 105...cuQuantum graphic

QHack is an educational conference and the world's largest quantum machine learning (QML) hackathon. This year at QHack 2023, 2,850 individuals from 105 different countries competed for 8 days to build the most innovative solutions for quantum computing applications using NVIDIA quantum technology. The event was organized by Xanadu, with NVIDIA sponsoring the QHack 2023 NVIDIA Challenge.

Source

]]>
0
Michelle Horton <![CDATA[Webinar: Performant Multiphase Flow Simulation at Leadership-Class Scale]]> http://www.open-lab.net/blog/?p=64907 2023-08-18T20:53:46Z 2023-05-17T22:10:15Z On June 6, learn how researchers use OpenACC for GPU acceleration of multiphase and compressible flow solvers that obtain speedups at scale.]]> On June 6, learn how researchers use OpenACC for GPU acceleration of multiphase and compressible flow solvers that obtain speedups at scale.An abstract visualization of droplets.

On June 6, learn how researchers use OpenACC for GPU acceleration of multiphase and compressible flow solvers that obtain speedups at scale.

Source

]]>
0
Joseph Cavanaugh <![CDATA[Advanced API Performance: CPUs]]> http://www.open-lab.net/blog/?p=64153 2023-10-02T05:00:51Z 2023-05-17T18:00:00Z This post covers CPU best practices when working with NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API...]]> This post covers CPU best practices when working with NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API...A graphic of a computer sending code to multiple stacks.

This post covers CPU best practices when working with NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips. To get the best performance from your NVIDIA GPU, pair it with efficient work delegation on the CPU. Frame-rate caps, stutter, and other subpar application performance events can often be traced back to a bottleneck on the CPU.

Source

]]>
0
Jonathan Wong <![CDATA[Asynchronous Error Reporting: When printf Just Won't Do]]> http://www.open-lab.net/blog/?p=60377 2023-06-12T07:56:12Z 2023-05-16T18:18:39Z Some programming situations call for reporting "soft" errors asynchronously. While printf can be a useful tool, it can increase register use and impact...]]> Some programming situations call for reporting "soft" errors asynchronously. While printf can be a useful tool, it can increase register use and impact...Picture of a 25 mph speed limit sign and actual speed of 27 mph displayed.
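One common alternative to device-side printf, sketched below with hypothetical names, is to append fixed-size error records to a global buffer with an atomic counter and let the host read them back after the kernel completes. This is a generic pattern illustrating the idea, not necessarily the exact mechanism the linked post describes.

```cuda
#include <cuda_runtime.h>

// Fixed-size record kept deliberately small to limit register pressure.
struct ErrorRecord { int code; int thread; };

// Statically allocated __device__ variables are zero-initialized at
// module load; the host can also reset them with cudaMemcpyToSymbol.
__device__ unsigned int g_errorCount;
__device__ ErrorRecord  g_errors[1024];

__device__ void reportError(int code) {
    unsigned int slot = atomicAdd(&g_errorCount, 1u);
    if (slot < 1024) {  // drop overflow rather than corrupt the buffer
        g_errors[slot].code   = code;
        g_errors[slot].thread = blockIdx.x * blockDim.x + threadIdx.x;
    }
}

__global__ void checkInputs(const float *in, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && in[i] < 0.0f) reportError(1);  // "soft" error: keep going
}

// After the kernel, the host retrieves the count and records with
// cudaMemcpyFromSymbol and reports them asynchronously.
```

The kernel keeps running after recording an error, which is the point: reporting is deferred to the host instead of serializing threads through printf.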

Source

]]>
0
Yury Uralsky <![CDATA[Advanced API Performance: Sampler Feedback]]> http://www.open-lab.net/blog/?p=62908 2023-10-02T05:02:21Z 2023-05-04T17:11:42Z This post covers best practices for using sampler feedback on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API...]]> This post covers best practices for using sampler feedback on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API...A graphic of a computer sending code to multiple stacks.

This post covers best practices for using sampler feedback on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips. Sampler feedback is a DirectX 12 Ultimate feature for capturing and recording texture sampling information and locations. Sampler feedback was designed to provide better support for streaming and texture-space shading.

Source

]]>
0
Gene Pache <![CDATA[Microsoft and TempoQuest Accelerate Wind Energy Forecasts with AceCast]]> http://www.open-lab.net/blog/?p=64091 2023-06-09T20:29:27Z 2023-04-28T14:00:00Z Accurate weather modeling is essential for companies to properly forecast renewable energy production and plan for natural disasters. Ineffective and...]]> Accurate weather modeling is essential for companies to properly forecast renewable energy production and plan for natural disasters. Ineffective and...

Accurate weather modeling is essential for companies to properly forecast renewable energy production and plan for natural disasters. Ineffective and non-forecasted weather cost an estimated $714 billion in 2022 alone. To avoid this, companies need faster, cheaper, and more accurate weather models. In a recent GTC session, Microsoft and TempoQuest detailed their work with NVIDIA to address…

Source

]]>
1
Peter Entschev <![CDATA[Debugging a Mixed Python and C Language Stack]]> http://www.open-lab.net/blog/?p=63641 2023-06-09T22:28:19Z 2023-04-20T17:00:00Z Debugging is difficult. Debugging across multiple languages is especially challenging, and debugging across devices often requires a team with varying skill...]]> Debugging is difficult. Debugging across multiple languages is especially challenging, and debugging across devices often requires a team with varying skill...

Debugging is difficult. Debugging across multiple languages is especially challenging, and debugging across devices often requires a team with varying skill sets and expertise to reveal the underlying problem. Yet projects often require using multiple languages, to ensure high performance where necessary, a user-friendly experience, and compatibility where possible. Unfortunately…

Source

]]>
0
Alan Gray <![CDATA[A Guide to CUDA Graphs in GROMACS 2023]]> http://www.open-lab.net/blog/?p=63250 2023-06-09T22:31:08Z 2023-04-14T18:10:14Z GPUs continue to get faster with each new generation, and it is often the case that each activity on the GPU (such as a kernel or memory copy) completes very...]]> GPUs continue to get faster with each new generation, and it is often the case that each activity on the GPU (such as a kernel or memory copy) completes very...Biomolecule

GPUs continue to get faster with each new generation, and it is often the case that each activity on the GPU (such as a kernel or memory copy) completes very quickly. In the past, each activity had to be separately scheduled (launched) by the CPU, and associated overheads could accumulate to become a performance bottleneck. The CUDA Graphs facility addresses this problem by enabling multiple GPU…
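A minimal stream-capture sketch of the idea (hypothetical kernel launches shown as comments; error checking omitted): the work recorded between begin and end capture is instantiated once, then relaunched as a single graph, paying one launch overhead per iteration instead of one per activity.

```cuda
#include <cuda_runtime.h>

void runWithGraph(cudaStream_t stream, int iters) {
    cudaGraph_t graph;
    cudaGraphExec_t exec;

    // Record all work issued to the stream into a graph.
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    // k1<<<grid, block, 0, stream>>>(...);
    // k2<<<grid, block, 0, stream>>>(...);
    cudaStreamEndCapture(stream, &graph);

    // Instantiate once (CUDA 12 signature), then relaunch cheaply.
    cudaGraphInstantiate(&exec, graph, 0);
    for (int i = 0; i < iters; ++i)
        cudaGraphLaunch(exec, stream);  // one launch covers all activities
    cudaStreamSynchronize(stream);

    cudaGraphExecDestroy(exec);
    cudaGraphDestroy(graph);
}
```

This is the general CUDA Graphs recipe; the GROMACS post linked above covers the additional work needed to apply it across a real application's iteration loop.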

Source

]]>
1
Shashank Gaur <![CDATA[Topic Modeling and Image Classification with Dataiku and NVIDIA Data Science]]> http://www.open-lab.net/blog/?p=62857 2023-11-03T07:15:04Z 2023-04-04T18:30:00Z The Dataiku platform for everyday AI simplifies deep learning. Use cases are far-reaching, from image classification to object detection and natural language...]]> The Dataiku platform for everyday AI simplifies deep learning. Use cases are far-reaching, from image classification to object detection and natural language...Twitter topic model Dataiku diagram

The Dataiku platform for everyday AI simplifies deep learning. Use cases are far-reaching, from image classification to object detection and natural language processing (NLP). Dataiku helps you with labeling, model training, explainability, model deployment, and centralized management of code and code environments. This post dives into high-level Dataiku and NVIDIA integrations for image…

Source

]]>
0