The new release includes several enhancements to the Math Libraries and improvements for C++ programming.
]]>The new release includes several new features including improved stdpar programming and Arm processor support.
]]>CUDA Graphs are a way to define and batch GPU operations as a graph rather than a sequence of stream launches. A CUDA Graph groups a set of CUDA kernels and other CUDA operations together and executes them with a specified dependency tree. It speeds up the workflow by combining the driver activities associated with CUDA kernel launches and CUDA API calls. It also enforces the dependencies with��
]]>The previous post How to Accelerate Quantitative Finance with ISO C++ Standard Parallelism demonstrated how to write a Black-Scholes simulation using ISO C++ standard parallelism with the code found in the /NVIDIA/accelerated-quant-finance GitHub repo. This approach enables you to productively write code that is both concise and portable. Using solely standard C++, it��s possible to write an��
]]>The new release delivers support for Ubuntu 24.04, new Fortran interfaces for CUDA Graphs, and a major version NVSHMEM API update. It is the last release to support RHEL 7.
]]>NVIDIA HPC SDK 24.5 updates include support for new NVPL components and CUDA 12.4.
]]>AI is augmenting high-performance computing (HPC) with novel approaches to data processing, simulation, and modeling. Because of the computational requirements of these new AI workloads, HPC is scaling up at a rapid pace. To enable applications to scale to multi-GPU and multi-node platforms, HPC tools and libraries must support that growth. NVIDIA provides a comprehensive ecosystem of��
]]>This NVIDIA HPC SDK update includes the cuBLASMp preview library, along with minor bug fixes and enhancements.
]]>Explore the status of Quantum ESPRESSO porting strategies that enable state-of-the-art performance on HPC systems.
]]>On December 7, learn how to verify OpenACC implementations across compilers and system architectures with the validation testsuite.
]]>High-performance computing (HPC) powers applications in simulation and modeling, healthcare and life sciences, industry and engineering, and more. In the modern data center, HPC synergizes with AI, harnessing data in transformative new ways. The performance and throughput demands of next-generation HPC applications call for an accelerated computing platform that can handle diverse workloads��
]]>The new hardware developments in NVIDIA Grace Hopper Superchip systems enable some dramatic changes to the way developers approach GPU programming. Most notably, the bidirectional, high-bandwidth, and cache-coherent connection between CPU and GPU memory means that the user can develop their application for both processors while using a single, unified address space.
]]>This NVIDIA HPC SDK 23.9 update expands platform support and provides minor updates.
]]>NVIDIA HPC SDK version 23.7 is now available and provides minor updates and enhancements.
]]>This update expands platform support and provides minor updates.
]]>This version 23.05 update to the NVIDIA PhysicsNeMo platform expands support for physics-ML and provides minor updates.
]]>Version 23.3 expands platform support and provides minor updates to the NVIDIA HPC SDK.
]]>Celebrating the SuperComputing 2022 international conference, NVIDIA announces the release of HPC Software Development Kit (SDK) v22.11. Members of the NVIDIA Developer Program can download the release now for free. The NVIDIA HPC SDK is a comprehensive suite of compilers, libraries, and tools for high performance computing (HPC) developers. It provides everything developers need to��
]]>This version 22.9 update to the NVIDIA HPC SDK includes fixes and minor enhancements.
]]>The latest NVIDIA HPC SDK update expands portability and now supports the Arm-based AWS Graviton3 processor. In this post, you learn how to enable Scalable Vector Extension (SVE) auto-vectorization with the NVIDIA compilers to maximize the performance of HPC applications running on the AWS Graviton3 CPU. The NVIDIA HPC SDK includes the proven compilers, libraries��
]]>This release includes enhancements, fixes, and new support for Arm SVE, Rocky Linux OS, and Amazon EC2 C7g instances, powered by the latest generation AWS Graviton3 processors.
]]>High-performance computing (HPC) has become the essential instrument of scientific discovery. Whether it is discovering new, life-saving drugs, battling climate change, or creating accurate simulations of our world, these solutions demand an enormous��and rapidly growing��amount of processing power. They are increasingly out of reach of traditional computing approaches.
]]>It may seem natural to expect that the performance of your CPU-to-GPU port will range below that of a dedicated HPC code. After all, you are limited by the constraints of the software architecture, the established API, and the need to account for sophisticated extra features expected by the user base. Not only that, the simplistic programming model of C++ standard parallelism allows for less��
]]>The difficulty of porting an application to GPUs varies from one case to another. In the best-case scenario, you can accelerate critical code sections by calling into an existing GPU-optimized library. This is, for example, when the building blocks of your simulation software consist of BLAS linear algebra functions, which can be accelerated using cuBLAS. This is the second post in the��
]]>Our weekly roundup covers the most recent software updates, learning resources, events, and notable news. This week we have several software releases. Software releases The NVIDIA HPC SDK is a comprehensive suite of compilers, libraries, and tools for developing accelerated HPC applications. With a breadth of flexible support options, users can create applications with a��
]]>The NVIDIA platform is the most mature and complete platform for accelerated computing. In this post, I address the simplest, most productive, and most portable approach to accelerated computing. This is the first post in the Standard Parallel Programming series, which aims to instruct developers on the advantages of using parallelism in standard languages for accelerated computing��
]]>At the Supercomputing Conference (SC21) NVIDIA preannounced the next update to the HPC SDK. Today, the HPC SDK 21.11 release was posted for free download to Developer Program members. The NVIDIA HPC SDK is a comprehensive suite of compilers and libraries for high performance computing development. It includes a wide variety of tools proven to maximize developer productivity, as well as the��
]]>Today, NVIDIA announced the upcoming HPC SDK 21.11 release with new Library enhancements. This software will be available free of charge in the coming weeks. The NVIDIA HPC SDK is a comprehensive suite of compilers and libraries for high-performance computing development. It includes a wide variety of tools proven to maximize developer productivity, as well as the performance and portability��
]]>In July of 2021, NVIDIA announced the availability of the NVIDIA Arm HPC Developer Kit for preordering, along with the NVIDIA HPC SDK. Since then NVIDIA and its partners have been working hard to get units into the hands of developers, to increase global availability, and enhance the software stack. The NVIDIA Arm HPC Developer Kit is based on the GIGABYTE G242-P32 2U server.
]]>Today NVIDIA announced the availability of the NVIDIA Arm HPC Developer Kit with the NVIDIA HPC SDK version 21.7. The DevKit is an integrated hardware-software platform for creating, evaluating, and benchmarking HPC, AI, and scientific computing applications for Arm server based accelerated platforms. The HPC SDK v21.7 is the latest update of the software development kit, and fully supports the��
]]>Tensor Cores, which are programmable matrix multiply and accumulate units, were first introduced in the V100 GPUs where they operated on half-precision (16-bit) multiplicands. Tensor Core functionality has been expanded in the following architectures, and in the Ampere A100 GPUs (compute capability 8.0) support for other data types was added, including double precision.
]]>Back in 2012, NVIDIAN Mark Harris wrote Six Ways to Saxpy, demonstrating how to perform the SAXPY operation on a GPU in multiple ways, using different languages and libraries. Since then, programming paradigms have evolved and so has the NVIDIA HPC SDK. In this post, I demonstrate five ways to implement a simple SAXPY computation using NVIDIA GPUs. Why is this interesting?
]]>Parallel Compiler Assisted Software Testing (PCAST) is a feature available in the NVIDIA HPC Fortran, C++, and C compilers. PCAST has two use cases. The first is testing changes to parts of a program, new compile-time flags, or a port to a new compiler or to a new processor. You might want to test whether a new library gives the same result, or test the safety of adding OpenMP parallelism��
]]>HPC development environments are typically complex configurations composed of multiple software packages, each providing unique capabilities. In addition to the core set of compilers used for building software from source code, they often include a number of specialty packages covering a broad range of operations such as communications, data structures, mathematics, I/O control��
]]>Fortran developers have long been able to accelerate their programs using CUDA Fortran or OpenACC. For more up-to-date information, please read Using Fortran Standard Parallel Programming for GPU Acceleration, which aims to instruct developers on the advantages of using parallelism in standard languages for accelerated computing. Now with the latest 20.11 release of the NVIDIA HPC SDK��
]]>The C++ standard library contains a rich collection of containers, iterators, and algorithms that can be composed to produce elegant solutions to complex problems. Most importantly, they are fast, making C++ an attractive choice for writing highly performant code. NVIDIA recently introduced stdpar: a way to automatically accelerate the execution of C++ standard library algorithms on GPUs��
]]>Historically, accelerating your C++ code with GPUs has not been possible in Standard C++ without using language extensions or additional libraries: In many cases, the results of these ports are worth the effort. But what if you could get the same effect without that cost? What if you could take your Standard C++ code and accelerate on a GPU? Now you can!
]]>