The previous post How to Accelerate Quantitative Finance with ISO C++ Standard Parallelism demonstrated how to write a Black-Scholes simulation using ISO C++ standard parallelism with the code found in the /NVIDIA/accelerated-quant-finance GitHub repo. This approach enables you to productively write code that is both concise and portable. Using solely standard C++, it’s possible to write an…
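For readers who have not seen that post, here is a minimal sketch of the idea rather than the repo's actual code: the Option struct, the norm_cdf helper, and the rate and volatility values are illustrative, and the only non-standard ingredient is a compiler (such as nvc++ with -stdpar=gpu) that maps a std::transform with a parallel execution policy onto the GPU.

```cpp
// A minimal sketch (not the repo's actual code): pricing a batch of European
// calls with the Black-Scholes formula using the ISO C++ parallel algorithms.
// Compiled with nvc++ -stdpar=gpu, the std::transform below can run on the GPU.
#include <algorithm>
#include <execution>
#include <vector>
#include <cmath>
#include <cstdio>

struct Option { double S, K, T; };      // spot, strike, time to expiry (illustrative)

// Standard normal CDF via the complementary error function.
inline double norm_cdf(double x) { return 0.5 * std::erfc(-x / std::sqrt(2.0)); }

int main() {
    const double r = 0.02, sigma = 0.2; // risk-free rate and volatility (illustrative)
    std::vector<Option> opts(1'000'000, Option{100.0, 105.0, 1.0});
    std::vector<double> prices(opts.size());

    // A parallel, vectorizable loop expressed purely in standard C++.
    std::transform(std::execution::par_unseq, opts.begin(), opts.end(), prices.begin(),
                   [=](const Option& o) {
                       double d1 = (std::log(o.S / o.K) + (r + 0.5 * sigma * sigma) * o.T)
                                   / (sigma * std::sqrt(o.T));
                       double d2 = d1 - sigma * std::sqrt(o.T);
                       return o.S * norm_cdf(d1) - o.K * std::exp(-r * o.T) * norm_cdf(d2);
                   });

    std::printf("price[0] = %f\n", prices[0]);
}
```

The same source compiles unchanged for multicore CPUs, which is the portability point the post makes.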
Quantitative finance libraries are software packages that consist of mathematical, statistical, and, more recently, machine learning models designed for use in quantitative investment contexts. They contain a wide range of functionalities, often proprietary, to support the valuation, risk management, construction, and optimization of investment portfolios. Financial firms that develop such…
The new hardware developments in NVIDIA Grace Hopper Superchip systems enable some dramatic changes to the way developers approach GPU programming. Most notably, the bidirectional, high-bandwidth, and cache-coherent connection between CPU and GPU memory means that the user can develop their application for both processors while using a single, unified address space.
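As an illustration of what that unified address space buys you (a sketch assuming a coherent Grace Hopper system and an offloading compiler such as nvc++ -stdpar=gpu, not code from the post): data allocated by ordinary host code can be handed straight to a GPU-offloaded standard algorithm and read back afterward, with no explicit copies or special allocators.

```cpp
// A minimal sketch: with cache-coherent CPU/GPU memory there is a single address
// space, so a plain host-side std::vector can be written by a GPU-offloaded
// standard algorithm and then consumed again on the CPU without explicit copies.
#include <algorithm>
#include <execution>
#include <numeric>
#include <vector>
#include <cstdio>

int main() {
    std::vector<double> x(1 << 24);
    std::iota(x.begin(), x.end(), 0.0);          // initialized on the CPU

    // May be offloaded to the GPU; the same pointers remain valid on both sides.
    std::for_each(std::execution::par_unseq, x.begin(), x.end(),
                  [](double& v) { v = v * v; });

    double sum = std::accumulate(x.begin(), x.end(), 0.0);  // consumed on the CPU
    std::printf("sum = %e\n", sum);
}
```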
Standard languages have begun adding features that compilers can use for accelerated GPU and CPU parallel programming, for instance, loops and array math intrinsics in Fortran. This is the fourth post in the Standard Parallel Programming series, which aims to instruct developers on the advantages of using parallelism in standard languages for accelerated computing: Using standard…
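On the C++ side of the series, the corresponding standard-language feature is the parallel algorithms library. A minimal sketch of a dot product expressed entirely in standard C++ (illustrative sizes and values; an offloading compiler such as nvc++ is assumed):

```cpp
// A minimal sketch: a dot product written with a standard parallel algorithm,
// which a compiler such as nvc++ (-stdpar=gpu) can offload to a GPU or map to
// CPU threads, with no vendor-specific syntax in the source.
#include <execution>
#include <numeric>
#include <vector>
#include <cstdio>

int main() {
    std::vector<double> a(1 << 20, 1.5), b(1 << 20, 2.0);

    double dot = std::transform_reduce(std::execution::par_unseq,
                                       a.begin(), a.end(), b.begin(), 0.0);

    std::printf("dot = %f\n", dot);  // expect 1.5 * 2.0 * 2^20
}
```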
It may seem natural to expect that the performance of your CPU-to-GPU port will fall short of that of a dedicated HPC code. After all, you are limited by the constraints of the software architecture, the established API, and the need to account for sophisticated extra features expected by the user base. Not only that, the simplistic programming model of C++ standard parallelism allows for less…
The difficulty of porting an application to GPUs varies from one case to another. In the best-case scenario, you can accelerate critical code sections by calling into an existing GPU-optimized library. This is the case, for example, when the building blocks of your simulation software consist of BLAS linear algebra functions, which can be accelerated using cuBLAS. This is the second post in the…
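As a sketch of that best-case path (placeholder sizes and data, not taken from the post), a double-precision matrix multiply handed to cuBLAS looks like this; a real port would simply swap an existing CPU BLAS dgemm call for the cublasDgemm call below.

```cpp
// A minimal sketch of the "call an existing GPU library" path: a double-precision
// GEMM computed by cuBLAS. C = alpha * A * B + beta * C, column-major as BLAS expects.
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>
#include <cstdio>

int main() {
    const int n = 512;
    std::vector<double> hA(n * n, 1.0), hB(n * n, 2.0), hC(n * n, 0.0);

    double *dA, *dB, *dC;
    cudaMalloc(&dA, n * n * sizeof(double));
    cudaMalloc(&dB, n * n * sizeof(double));
    cudaMalloc(&dC, n * n * sizeof(double));
    cudaMemcpy(dA, hA.data(), n * n * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), n * n * sizeof(double), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    const double alpha = 1.0, beta = 0.0;
    cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, dA, n, dB, n, &beta, dC, n);

    cudaMemcpy(hC.data(), dC, n * n * sizeof(double), cudaMemcpyDeviceToHost);
    std::printf("C[0] = %f\n", hC[0]);  // expect 1.0 * 2.0 * 512 = 1024

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
}
```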
The NVIDIA platform is the most mature and complete platform for accelerated computing. In this post, I address the simplest, most productive, and most portable approach to accelerated computing. This is the first post in the Standard Parallel Programming series, which aims to instruct developers on the advantages of using parallelism in standard languages for accelerated computing…
This week NVIDIA has released the NVIDIA OpenACC Toolkit, a starting point for anyone interested in using OpenACC. OpenACC gives scientists and researchers a simple and powerful way to accelerate scientific computing without significant programming effort. The toolkit includes the PGI OpenACC Compiler, the NVIDIA Visual Profiler with CPU and GPU profiling, and the new OpenACC Programming and Best…
Often when profiling GPU-accelerated applications that run on clusters, one needs to visualize MPI (Message Passing Interface) calls on the GPU timeline in the profiler. While tools like Vampir and Tau allow programmers to see a big-picture view of how a parallel application performs, sometimes all you need is a look at how MPI is affecting GPU performance on a single node using a simple tool…
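One lightweight way to get MPI activity onto the profiler timeline, and not necessarily the exact tool the post describes, is to intercept MPI calls through the PMPI profiling interface and mark them with NVTX ranges. A minimal sketch for a single call:

```cpp
// A minimal sketch (one common approach, not necessarily the post's exact tool):
// intercept MPI_Send through the PMPI profiling interface and wrap it in an NVTX
// range so the call appears as a labeled region on the profiler timeline.
#include <mpi.h>
#include <nvToolsExt.h>

extern "C" int MPI_Send(const void* buf, int count, MPI_Datatype type,
                        int dest, int tag, MPI_Comm comm) {
    nvtxRangePushA("MPI_Send");                       // visible marker in the timeline
    int err = PMPI_Send(buf, count, type, dest, tag, comm);
    nvtxRangePop();
    return err;
}
```

Linking this wrapper into the application (or preloading it as a shared library) makes each intercepted MPI call show up as a named range alongside the CUDA activity in the timeline.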
OpenACC is a high-level programming model for accelerating applications with GPUs and other devices, using compiler directives to specify loops and regions of code in standard C, C++, and Fortran to offload from a host CPU to an attached accelerator. OpenACC simplifies accelerating applications with GPUs. OpenACC tutorial: Three Steps to More Science. An often-overlooked…
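A minimal sketch of that directive-based style (an illustrative saxpy kernel, compiled with an OpenACC compiler such as nvc++ -acc), where a single pragma asks the compiler to offload the loop and handle the data movement:

```cpp
// A minimal sketch: a saxpy loop in standard C++ offloaded with one OpenACC directive.
#include <vector>
#include <cstdio>

int main() {
    const int n = 1 << 20;
    const float a = 2.0f;
    std::vector<float> x(n, 1.0f), y(n, 3.0f);
    float* xp = x.data();
    float* yp = y.data();

    // The directive offloads the loop and manages copying xp and yp to and from the device.
    #pragma acc parallel loop copyin(xp[0:n]) copy(yp[0:n])
    for (int i = 0; i < n; ++i)
        yp[i] = a * xp[i] + yp[i];

    std::printf("y[0] = %f\n", y[0]);  // expect 2*1 + 3 = 5
}
```

Removing the pragma leaves a valid serial C++ program, which is how the directives preserve a single portable source.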
OpenACC is a high-level programming model for accelerators, such as NVIDIA GPUs, that allows programmers to accelerate applications using compiler directives to specify loops and regions of code in standard C, C++, and Fortran to be offloaded to an accelerator. Through the use of compiler directives, OpenACC allows programmers to maintain a single source code for the CPU and GPU that is portable…