Nikolay Sakharnykh – NVIDIA Technical Blog

News and tutorials for developers, data scientists, and IT admins. Feed: http://www.open-lab.net/blog/feed/ (last updated 2024-07-30)

Simplifying GPU Application Development with Heterogeneous Memory Management
Published 2023-08-22, updated 2023-09-13. http://www.open-lab.net/blog/?p=69542

Heterogeneous Memory Management (HMM) is a CUDA memory management feature that extends the simplicity and productivity of the CUDA Unified Memory programming…

Maximizing Performance with Massively Parallel Hash Maps on GPUs
Published 2023-03-06, updated 2023-05-23. http://www.open-lab.net/blog/?p=61480

Decades of computer science history have been devoted to devising solutions for efficient storage and retrieval of information. Hash maps (or hash tables) are a popular data structure for information storage given their amortized constant-time guarantees for the insertion and retrieval of elements. However, despite their prevalence, hash maps are seldom discussed in the context of GPU…
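
A common way to build a GPU hash map is open addressing with linear probing, where each thread claims a slot with an atomic compare-and-swap (CUDA's `atomicCAS`). The sketch below shows only that probing logic, sequentially and in Python. The class name `OpenAddressingMap` and its `_atomic_cas` helper are hypothetical: the helper merely emulates CAS semantics, and the sketch is not meant to reproduce the implementation described in the post.

```python
EMPTY = None  # sentinel marking an unused slot


class OpenAddressingMap:
    """Fixed-capacity hash map: open addressing with linear probing (hypothetical sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.keys = [EMPTY] * capacity
        self.values = [EMPTY] * capacity

    def _atomic_cas(self, slot, expected, desired):
        # Emulates atomicCAS: write `desired` only if the slot still holds
        # `expected`, and always return the value that was there before.
        old = self.keys[slot]
        if old == expected:
            self.keys[slot] = desired
        return old

    def insert(self, key, value):
        slot = hash(key) % self.capacity
        for _ in range(self.capacity):
            old = self._atomic_cas(slot, EMPTY, key)
            if old is EMPTY or old == key:
                # We claimed an empty slot, or the key was already present
                # (in which case this sketch overwrites its value).
                self.values[slot] = value
                return True
            slot = (slot + 1) % self.capacity  # linear probing
        return False  # every slot is taken by other keys

    def find(self, key):
        slot = hash(key) % self.capacity
        for _ in range(self.capacity):
            k = self.keys[slot]
            if k is EMPTY:
                return None  # an empty slot ends the probe sequence
            if k == key:
                return self.values[slot]
            slot = (slot + 1) % self.capacity
        return None
```

A real GPU implementation also has to make the key and value writes appear together to concurrent readers, for example by packing both into a single CAS. This sequential sketch ignores that issue.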

Accelerating Lossless GPU Compression with New Flexible Interfaces in NVIDIA nvCOMP
Published 2022-03-18, updated 2023-06-12. http://www.open-lab.net/blog/?p=45097

Compression can improve performance in a variety of use cases, such as deep learning (DL) workloads, databases, and general HPC. On the GPU, compression can accelerate inter-GPU communication in collaborative workflows. It can increase the size of datasets that a single GPU can handle by compressing data before it is stored in global memory. It can also speed up transfers over the link between the CPU and GPU.
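
As a minimal illustration of lossless compression on the kind of integer columns found in databases, the sketch below chains delta encoding and run-length encoding (RLE). nvCOMP's Cascaded scheme layers similar stages (run-length, delta, and bit-packing). However, this Python code is a toy example with hypothetical function names. It is neither nvCOMP's format nor its API.

```python
def delta_encode(values):
    # Keep the first value, then store differences between neighbors.
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    # A running sum undoes the delta step.
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def rle_encode(values):
    # Collapse runs of equal values into [value, count] pairs.
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


def rle_decode(runs):
    return [v for v, count in runs for _ in range(count)]


def compress(values):
    return rle_encode(delta_encode(values))


def decompress(runs):
    return delta_decode(rle_decode(runs))
```

For sorted IDs such as `list(range(100, 200))`, the deltas are all 1, so 100 integers shrink to two runs: `[[100, 1], [1, 99]]`. Random data has no runs, so the output would be larger than the input. This is why practical compressors choose their stages based on the data.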

Improving GPU Memory Oversubscription Performance
Published 2021-10-05, updated 2022-08-21. http://www.open-lab.net/blog/?p=37205

Since its introduction more than 7 years ago, the CUDA Unified Memory programming model has kept gaining popularity among developers. Unified Memory provides a simple interface for prototyping GPU applications without manually migrating memory between host and device. Starting from the NVIDIA Pascal GPU architecture, Unified Memory enabled applications to use all available CPU and GPU memory…

Optimizing Data Transfer Using Lossless Compression with NVIDIA nvcomp
Published 2020-12-18, updated 2022-08-21. http://www.open-lab.net/blog/?p=22884

One of the most interesting applications of compression is optimizing communications in GPU applications. GPUs are getting faster every year. For some apps,…
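
Whether compression speeds up communication depends on how fast data can be compressed and decompressed relative to the link. The sketch below is a simplified cost model under stated assumptions: compression, transfer, and decompression run back to back with no overlap, and throughputs are measured on uncompressed bytes. The function and its parameters are hypothetical, and the example numbers are illustrative rather than measurements.

```python
def transfer_time(size_bytes, link_bw, ratio=1.0, comp_bw=None, decomp_bw=None):
    """Seconds to move size_bytes of uncompressed data over a link.

    link_bw, comp_bw, and decomp_bw are in bytes per second; comp_bw and
    decomp_bw are measured on uncompressed bytes. Compression, transfer,
    and decompression are assumed to run back to back (no overlap).
    """
    wire_time = size_bytes / ratio / link_bw
    if ratio == 1.0:
        return wire_time  # send the raw data
    if comp_bw is None or decomp_bw is None:
        raise ValueError("compressed transfers need comp_bw and decomp_bw")
    return size_bytes / comp_bw + wire_time + size_bytes / decomp_bw
```

Consider 1 GB over an illustrative 25 GB/s link. Sent raw, it takes 40 ms. At a 2:1 ratio with compression and decompression running at 200 GB/s (8x the link), the transfer takes 30 ms. At 20 GB/s, the same transfer takes 120 ms. Pipelining the stages, which this model does not capture, shifts the break-even point further in compression's favor.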

Introducing Low-Level GPU Virtual Memory Management
Published 2020-04-15, updated 2024-07-30. http://www.open-lab.net/blog/?p=16913

There is a growing need among CUDA applications to manage memory as quickly and as efficiently as possible. Before CUDA 10.2, the number of options available to…

Maximizing Unified Memory Performance in CUDA
Published 2017-11-20, updated 2022-08-21. http://www.open-lab.net/blog/parallelforall/?p=8603

Many of today’s applications process large volumes of data. While GPU architectures have very fast HBM or GDDR memory, they have limited capacity. Making the most of GPU performance requires the data to be as close to the GPU as possible. This is especially important for applications that iterate over the same data multiple times or have a high flops/byte ratio. Many real-world codes have to…

Beyond GPU Memory Limits with Unified Memory on Pascal
Published 2016-12-14, updated 2022-08-21. http://www.open-lab.net/blog/parallelforall/?p=7233

Figure 1: Dimethyl ether jet simulations designed to study complex new fuels. Image courtesy of…

Modern computer architectures have a hierarchy of memories of varying size and performance. GPU architectures are approaching a terabyte per second of memory bandwidth, which, coupled with high-throughput computational cores, makes them ideal devices for data-intensive tasks. However, everybody knows that fast memory is expensive. Modern applications striving to solve larger and larger problems can be…

High-Performance Geometric Multi-Grid with GPU Acceleration
Published 2016-02-23, updated 2023-02-10. http://www.open-lab.net/blog/parallelforall/?p=6313

Linear solvers are probably the most common tool in scientific computing applications. There are two basic classes of methods that can be used to solve an equation: direct and iterative. Direct methods are usually robust, but have additional computational complexity and memory capacity requirements. Unlike direct solvers, iterative solvers require minimal memory overhead and feature better…
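
To make the iterative, multi-grid idea concrete, here is a textbook geometric multigrid V-cycle for the 1D Poisson problem -u'' = f on (0, 1) with zero boundary values, written in pure Python. The components are assumptions chosen for simplicity: weighted Jacobi smoothing (3 sweeps, omega = 2/3), full-weighting restriction, linear-interpolation prolongation, and n = 2^k - 1 interior points. Production GPU solvers work on 3D grids with GPU smoothers. This sketch shows only the algorithmic structure.

```python
def apply_A(u, h):
    # Second-difference operator (2u_i - u_{i-1} - u_{i+1}) / h^2, zero boundaries.
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append((2.0 * u[i] - left - right) / (h * h))
    return out


def residual(u, f, h):
    return [fi - ai for fi, ai in zip(f, apply_A(u, h))]


def smooth(u, f, h, sweeps=3, omega=2.0 / 3.0):
    # Weighted Jacobi damps the high-frequency components of the error.
    n = len(u)
    for _ in range(sweeps):
        new = []
        for i in range(n):
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < n - 1 else 0.0
            jacobi = (f[i] * h * h + left + right) / 2.0
            new.append((1.0 - omega) * u[i] + omega * jacobi)
        u = new
    return u


def restrict(r):
    # Full weighting: coarse point j sits on fine point 2j + 1.
    return [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2]
            for j in range((len(r) - 1) // 2)]


def prolong(e, n_fine):
    # Linear interpolation from the coarse grid back to the fine grid.
    fine = [0.0] * n_fine
    for j, ej in enumerate(e):
        fine[2 * j] += 0.5 * ej
        fine[2 * j + 1] += ej
        fine[2 * j + 2] += 0.5 * ej
    return fine


def v_cycle(u, f, h):
    """One V-cycle; len(u) must be 2**k - 1 and h = 1 / (len(u) + 1)."""
    if len(u) == 1:
        return [f[0] * h * h / 2.0]  # coarsest grid: solve 2u/h^2 = f exactly
    u = smooth(u, f, h)                     # pre-smoothing
    r_coarse = restrict(residual(u, f, h))  # residual moved to the coarse grid
    e_coarse = v_cycle([0.0] * len(r_coarse), r_coarse, 2.0 * h)
    u = [ui + ei for ui, ei in zip(u, prolong(e_coarse, len(u)))]
    return smooth(u, f, h)                  # post-smoothing
```

Each V-cycle costs O(n) work and typically reduces the error by a roughly constant factor that does not depend on n. This is where multigrid gains its advantage over running a plain smoother such as Jacobi on the fine grid alone.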

Combine OpenACC and Unified Memory for Productivity and Performance
Published 2015-09-17, updated 2022-08-21. http://www.open-lab.net/blog/parallelforall/?p=5830

The post Getting Started with OpenACC covered four steps to progressively accelerate your code with OpenACC. It’s often necessary to use OpenACC directives to express both loop parallelism and data locality in order to get good performance with accelerators. After you express the available parallelism, excessive data movement generated by the compiler can become a bottleneck, and correcting this by adding…

GPU Pro Tip: Fast Histograms Using Shared Atomics on Maxwell
Published 2015-03-17, updated 2022-08-21. http://www.open-lab.net/blog/parallelforall/?p=4175

Histograms are an important data representation with many applications in computer vision, data analytics and medical imaging. A histogram is a graphical representation of the data distribution across predefined bins. The input data set and the number of bins can vary greatly depending on the domain, so let’s focus on one of the most common use cases: an image histogram using 256 bins for each…
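
A common two-phase structure for fast GPU histograms is privatization. In the first phase, each thread block accumulates into its own copy of the bins, held in shared memory and updated with shared-memory atomics. In the second phase, the per-block results are merged into the global histogram. The Python sketch below mirrors only that structure for one 8-bit channel with 256 bins. The chunking stands in for thread blocks, and plain adds stand in for the atomics. The function names and `block_size` are hypothetical.

```python
NUM_BINS = 256  # one bin per 8-bit intensity value


def block_histogram(pixels):
    # Phase 1: one "thread block" accumulates into its private bins.
    # On the GPU these bins would live in shared memory and be updated with
    # shared-memory atomics; a plain list stands in for them here.
    local = [0] * NUM_BINS
    for p in pixels:
        local[p] += 1
    return local


def histogram(pixels, block_size=1024):
    total = [0] * NUM_BINS
    for start in range(0, len(pixels), block_size):
        local = block_histogram(pixels[start:start + block_size])
        # Phase 2: merge this block's bins into the global histogram
        # (global-memory atomics on the GPU, a plain add here).
        for b in range(NUM_BINS):
            total[b] += local[b]
    return total
```

Privatization pays off on the GPU because most updates hit fast on-chip memory. Contention on the global bins is then limited to one merge per block rather than one update per pixel.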
