In the previous two posts we looked at how to move data efficiently between the host and device. In this sixth post of our CUDA C/C++ series we discuss how to efficiently access device memory, in particular global memory, from within kernels. There are several kinds of memory on a CUDA device, each with different scope, lifetime, and caching behavior. So far in this series we have used global…
In the previous two posts we looked at how to move data efficiently between the host and device. In this sixth post of our CUDA Fortran series we discuss how to efficiently access device memory, in particular global memory, from within kernels. There are several kinds of memory on a CUDA device, each with different scope, lifetime, and caching behavior. So far in this series we have used global…
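As a concrete illustration of what "accessing global memory from within a kernel" looks like, here is a minimal CUDA C/C++ sketch; it is not code from either post, and the kernel name and arguments are placeholders. Each thread reads and writes one array element, so consecutive threads touch consecutive global addresses, the kind of access pattern whose efficiency these posts examine.

```cuda
// Minimal sketch (illustrative, not from the posts): each thread scales one
// element of an array that lives in global memory. Consecutive threads
// access consecutive addresses.
__global__ void scale(float *a, float b, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        a[i] = b * a[i];   // one global load and one global store per thread
}
```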
In the previous three posts of this CUDA C/C++ series we laid the groundwork for the major thrust of the series: how to optimize CUDA C/C++ code. In this and the following post we begin our discussion of code optimization with how to efficiently transfer data between the host and device. The peak bandwidth between the device memory and the GPU is much higher (144 GB/s on the NVIDIA Tesla C2050…
In the previous three posts of this CUDA Fortran series we laid the groundwork for the major thrust of the series: how to optimize CUDA Fortran code. In this and the following post we begin our discussion of code optimization with how to efficiently transfer data between the host and device. The peak bandwidth between the device memory and the GPU is much higher (144 GB/s on the NVIDIA Tesla C2050…
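The transfers these two posts set out to optimize are the explicit copies between host and device memory. Below is a hedged CUDA C/C++ sketch, not code from the posts; the variable names and array size are assumptions. It allocates pinned (page-locked) host memory, one of the transfer optimizations the posts discuss, and copies an array to the device and back.

```cuda
#include <cuda_runtime.h>

int main(void)
{
    const int N = 1 << 20;
    const size_t bytes = N * sizeof(float);

    float *h_a, *d_a;

    // Pinned (page-locked) host memory typically transfers faster than
    // pageable memory; this is one of the optimizations the posts cover.
    cudaMallocHost((void**)&h_a, bytes);
    cudaMalloc((void**)&d_a, bytes);

    for (int i = 0; i < N; i++) h_a[i] = 1.0f;

    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);  // host -> device
    cudaMemcpy(h_a, d_a, bytes, cudaMemcpyDeviceToHost);  // device -> host

    cudaFree(d_a);
    cudaFreeHost(h_a);
    return 0;
}
```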
In the first post of this series we looked at the basic elements of CUDA C/C++ by examining a CUDA C/C++ implementation of SAXPY. In this second post we discuss how to analyze the performance of this and other CUDA C/C++ codes. We will rely on these performance measurement techniques in future posts where performance optimization will be increasingly important. CUDA performance measurement is…
In the first post of this series we looked at the basic elements of CUDA Fortran by examining a CUDA Fortran implementation of SAXPY. In this second post we discuss how to analyze the performance of this and other CUDA Fortran codes. We will rely on these performance measurement techniques in future posts where performance optimization will be increasingly important.
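To make the SAXPY and timing discussion concrete, here is a hedged CUDA C/C++ sketch, not code taken from the posts; the array size, block size, and initialization are assumptions. It launches a SAXPY kernel (y = a*x + y, the example used throughout the series) and measures its run time with CUDA events, one common way to time device work without relying on host timers.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// SAXPY: y = a*x + y, the example kernel used throughout the series.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main(void)
{
    const int N = 1 << 20;
    float *d_x, *d_y;
    cudaMalloc((void**)&d_x, N * sizeof(float));
    cudaMalloc((void**)&d_y, N * sizeof(float));
    cudaMemset(d_x, 0, N * sizeof(float));   // placeholder initialization
    cudaMemset(d_y, 0, N * sizeof(float));

    // CUDA events record timestamps on the device.
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    saxpy<<<(N + 255) / 256, 256>>>(N, 2.0f, d_x, d_y);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("saxpy kernel time: %f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_x);
    cudaFree(d_y);
    return 0;
}
```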
This post is the first in a series on CUDA Fortran, which is the Fortran interface to the CUDA parallel computing platform. If you are familiar with CUDA C, then you are already well on your way to using CUDA Fortran as it is based on the CUDA C runtime API. There are a few differences in how CUDA concepts are expressed using Fortran 90 constructs, but the programming model for both CUDA Fortran…
After a recent talk I gave called "CUDA 101: Intro to GPU Computing", a student asked "What's the best way for me to get experience in parallel programming and CUDA?". This is a question I struggled a lot with when I was in college and one I still ask myself about various topics today. The first step is to realize that it's hard to get useful experience without having some skill in an area.