Explore the status of Quantum ESPRESSO porting strategies that enable state-of-the-art performance on HPC systems.
On December 7, learn how to verify OpenACC implementations across compilers and system architectures with the validation test suite.
The NVIDIA HPC SDK 23.9 update expands platform support and provides minor fixes and enhancements.
NVIDIA HPC SDK version 23.7 is now available, providing minor updates and enhancements.
This NVIDIA HPC SDK update expands platform support and provides minor fixes.
On June 6, learn how researchers use OpenACC for GPU acceleration of multiphase and compressible flow solvers that achieve speedups at scale.
Accurate weather modeling is essential for companies to forecast renewable energy production and plan for natural disasters. Unforecasted severe weather cost an estimated $714 billion in 2022 alone. To avoid such losses, companies need faster, cheaper, and more accurate weather models. In a recent GTC session, Microsoft and TempoQuest detailed their work with NVIDIA to address…
Version 23.3 expands platform support and provides minor updates to the NVIDIA HPC SDK.
Version 23.1 of the NVIDIA HPC SDK introduces CUDA 12 support, fixes, and minor enhancements.
Celebrating the Supercomputing 2022 international conference, NVIDIA announces the release of HPC Software Development Kit (SDK) v22.11. Members of the NVIDIA Developer Program can download the release now for free. The NVIDIA HPC SDK is a comprehensive suite of compilers, libraries, and tools for high performance computing (HPC) developers. It provides everything developers need to…
Join this digital conference from August 2-4 to learn how science is being advanced through the work done at Open Hackathons or accelerated using OpenACC.
Standard languages have begun adding features that compilers can use for accelerated GPU and CPU parallel programming, for instance, loops and array math intrinsics in Fortran. This is the fourth post in the Standard Parallel Programming series, which aims to instruct developers on the advantages of using parallelism in standard languages for accelerated computing: Using standard…
While the world is continuously changing, one constant is the ongoing drive of developers to tackle challenges using innovative technologies. The recent Taiwan Computing Cloud (TWCC) GPU Hackathon exemplified such a drive, serving as a catalyst for developers and engineers to advance their HPC and AI projects using GPUs. A collaboration between the National Center for High-Performance…
Parallel Compiler Assisted Software Testing (PCAST) is a feature available in the NVIDIA HPC Fortran, C++, and C compilers. PCAST has two use cases. The first is testing changes to parts of a program, new compile-time flags, or a port to a new compiler or to a new processor. You might want to test whether a new library gives the same result, or test the safety of adding OpenMP parallelism…
This year's OpenACC 2020 Summit is going digital. Scheduled from August 31 to September 4, the OpenACC Summit brings together users of the OpenACC programming model and members of the OpenACC organization across national laboratories, research institutions, and industry. This year the Summit will be completely online and feature a keynote from Martijn Marsman from the University of Vienna…
The Government of India's Center for Development of Advanced Computing (C-DAC), under the Ministry of Electronics and IT (MeitY), in association with NVIDIA and OpenACC, organized the SAMHAR-COVID19 Hackathon to help researchers combat the ongoing COVID-19 pandemic and help the scientific community predict future outbreaks. Through C-DAC's program, Supercomputing using artificial intelligence…
Developers of the world's leading HPC application for atomic-scale modeling, the Vienna Ab initio Simulation Package (VASP), rolled out VASP 6.1.0, which adds new and expanded acceleration on NVIDIA GPUs through OpenACC. VASP is one of the most widely used codes for electronic-structure calculations and first-principles molecular dynamics. Senior scientist and VASP lead developer Dr.
The new PGI Community Edition supports NVIDIA V100 Tensor Cores in CUDA Fortran, the full C++17 language, PCAST CPU/GPU auto-compare directives, OpenACC 2.6, and more. PGI Compilers & Tools are for scientists and engineers developing high-performance computing (HPC) applications. PGI products deliver world-class multicore CPU performance, an easy on-ramp to GPU computing with OpenACC directives…
PGI Compilers & Tools are used by scientists and engineers developing applications for high-performance computing (HPC). PGI products deliver world-class multicore CPU performance, an easy on-ramp to GPU computing with OpenACC directives, and performance portability across all major HPC platforms. Available for free download. New Features in PGI 19.4: Link to full description of…
At CES in Las Vegas, Nevada, The Weather Company, an IBM subsidiary, announced a new GPU-accelerated global weather forecasting system that uses crowdsourced data to deliver hourly weather updates worldwide. The new system, named GRAF (Global High-Resolution Atmospheric Forecasting System), can predict events as small as thunderstorms globally. "Compared to existing models, GRAF will provide a…
In the age of exascale, scientists are striving to use the latest generation of supercomputers to do more science faster. At the same time, many researchers find themselves trapped in new, complex technologies and architectures that are not always easy to grasp; they need tools that help them spend less time programming for new machines and more time on science. OpenACC is a directive…
A new blog details the history of the OpenACC GCC implementation, its availability, and enhancements to OpenACC support in GCC. You will also learn about a recent project to assess and improve the performance of codes compiled with GCC's OpenACC support. A scalar optimizing compiler has a really good day when it gets an optimization that boosts performance by 5%.
PGI Compilers & Tools are used by scientists and engineers developing applications for high-performance computing (HPC). PGI products deliver world-class multicore CPU performance, an easy on-ramp to GPU computing with OpenACC directives, and performance portability across all major HPC platforms. Version 17.10 is available now for users with current PGI Professional support.
PGI compilers & tools are used by scientists and engineers who develop applications for high-performance computing (HPC) systems. They deliver world-class multicore CPU performance, an easy on-ramp to GPU computing with OpenACC directives, and performance portability across all major HPC platforms. Version 17.7 is available now for users with current PGI Professional support. New Features in PGI 17.7…
PGI compilers and tools are used by scientists and engineers who develop applications for high-performance computing (HPC) systems. They deliver world-class multicore CPU performance, an easy on-ramp to GPU computing with OpenACC directives, and performance portability across all major HPC platforms. A new update is now available at no cost. PGI 17.4 Community Edition: Download now…
Todd Raeker, Research Technology Consultant at the University of Michigan, shares how a group of 50 researchers at the University of Michigan are using GPUs and OpenACC to accelerate the codes for their data-driven physics simulations. The current versions of the codes use MPI and depend on finer and finer meshes for higher accuracy, which are computationally demanding. To overcome the demands…
Janus Juul Eriksen, a Ph.D. fellow at Aarhus University in Denmark, shares how he is using OpenACC to optimize and accelerate the quantum chemistry code LSDalton on the Titan supercomputer at Oak Ridge National Laboratory. "OpenACC makes GPU computing approachable for domain scientists," said Eriksen. "Initial OpenACC implementation required only minor effort, and more importantly…
Anne Severt, a PhD student at Forschungszentrum Jülich in Germany, shares how she is using NVIDIA Tesla K80s and OpenACC with complex geometries to create real-time simulations of smoke propagation to better prepare firefighters for real-life situations, such as where smoke will propagate from underground metro stations over time. To learn more, view Anne's poster from this year's…
Oak Ridge National Laboratory, NVIDIA, and PGI launched the OpenACC Hackathon initiative last year to help scientists accelerate applications on GPUs. OpenACC was selected as a primary tool since it offers acceleration without significant programming effort and works well with existing application codes. The University of Delaware (UDEL) hosted a five-day Hackathon last week. Selected teams of scientific…
In partnership with the Jülich Supercomputing Centre and Oak Ridge National Laboratory, TU Dresden is hosting a "EuroHack" GPU Hackathon February 29 to March 4, 2016, at its Germany campus. Paired with two GPU mentors each, teams of scientific application developers will set forth on a five-day project to accelerate their code with GPUs. The mentors provide guidance based on extensive experience…
Stony Brook University researchers are exploring the physics of Type Ia supernovae using the Tesla-accelerated Titan supercomputer at Oak Ridge National Laboratory. It's been estimated that Type Ia supernovae can be used to calculate distances to within 10 percent accuracy, good enough to help scientists determine that the expansion of the universe is accelerating, a discovery that garnered…
OpenACC gives scientists and researchers a simple and powerful way to accelerate scientific computing applications incrementally. The OpenACC API describes a collection of compiler directives to specify loops and regions of code in standard C, C++, and Fortran to be offloaded from a host CPU to an attached accelerator. OpenACC is designed for portability across operating systems, host CPUs…
The new PGI compiler release includes support for C++ and Fortran applications running in parallel on multicore CPUs or GPU accelerators. OpenACC gives scientists and researchers a simple and powerful way to accelerate scientific computing applications incrementally. With the PGI Compiler 15.10 release, OpenACC enables performance portability between accelerators and multicore CPUs.
We love seeing all of the social media posts from developers using NVIDIA GPUs; here are a few highlights from the week: https://twitter.com/tnybny/status/650845294117191680 On Twitter? Follow @GPUComputing and @mention us and/or use hashtags so we're able to keep track of what you're up to: #CUDA, #cuDNN, #OpenACC.
Interactive lectures, hands-on labs, and live office hours. Learn everything you need to start accelerating your code on GPUs and CPUs. Join the HPC industry's OpenACC experts for a free online course comprising four instructor-led classes with interactive lectures, hands-on exercises, and office hours with the instructors. You'll learn everything you need to start…
The post Getting Started with OpenACC covered four steps to progressively accelerate your code with OpenACC. It's often necessary to use OpenACC directives to express both loop parallelism and data locality to get good performance on accelerators. After expressing the available parallelism, excessive data movement generated by the compiler can become a bottleneck; correcting this by adding…
For this interview, I reached out to Janus Juul Eriksen, a Ph.D. fellow at Aarhus University in Denmark. Janus is a chemist by trade without any formal education in computer science, but he is getting up to a 12x speedup over his CPU-only code after modifying fewer than 100 lines of code with one week of programming effort. How did he do it? He used OpenACC. OpenACC is a simple…
This week NVIDIA released the NVIDIA OpenACC Toolkit, a starting point for anyone interested in using OpenACC. OpenACC gives scientists and researchers a simple and powerful way to accelerate scientific computing without significant programming effort. The toolkit includes the PGI OpenACC compiler, the NVIDIA Visual Profiler with CPU and GPU profiling, and the new OpenACC Programming and Best…
Programmability is crucial to accelerated computing, and NVIDIA's CUDA Toolkit has been critical to the success of GPU computing. Over three million CUDA Toolkits have been downloaded since its first launch. However, many scientists and researchers have yet to benefit from GPU computing. These scientists have limited time to learn and apply a parallel programming language, and they often have…
Six scientific computing teams from around the world spent an intense week late last year porting their applications to GPUs using OpenACC directives. The Oak Ridge Leadership Computing Facility (OLCF) hosted its first ever OpenACC Hackathon in Knoxville, Tennessee. Paired with two GPU mentors, each team of scientific developers set forth on the journey to accelerate their code with GPUs. Dr.
With one week to go until we all descend on GTC 2015, I've scoured the list of Accelerated Computing sessions and put together 12 diverse "not to miss" talks you should add to your planner. This year, the conference highlights the revolution in deep learning that will affect every aspect of computing. GTC 2015 includes over 40 session categories, including deep learning and machine…
Every year NVIDIA's GPU Technology Conference (GTC) gets bigger and better. One of the aims of GTC is to give developers, scientists, and practitioners opportunities to learn, through hands-on labs, how to use accelerated computing in their work. This year we are nearly doubling the amount of hands-on training from last year, with almost 2,400 lab hours available to GTC attendees!
As a CUDA Educator at NVIDIA, I work to give everyone access to massively parallel programming education and training, whether or not they have GPUs in their own machines. This is why, in partnership with qwikLABS, NVIDIA has made the hands-on content we use to train thousands of developers at the Supercomputing Conference and the GPU Technology Conference online and accessible from…
OpenACC is a high-level programming model for accelerating applications with GPUs and other devices. It uses compiler directives to specify loops and regions of code in standard C, C++, and Fortran to offload from a host CPU to an attached accelerator, simplifying GPU acceleration of applications. OpenACC tutorial: Three Steps to More Science. An often-overlooked…
When I profile MPI+CUDA applications, performance issues sometimes occur only for certain MPI ranks. To fix these, it's necessary to identify the MPI rank where the issue occurs. Before CUDA 6.5 this was hard because the CUDA profiler only shows the PIDs of the processes, leaving the developer to figure out the mapping from PIDs to MPI ranks. Although the mapping can be done…
Computational Fluid Dynamics (CFD) is a valuable tool for studying the behavior of fluids. Today, many areas of engineering use CFD. For example, the automotive industry uses CFD to study airflow around cars and to optimize body shapes to reduce drag and improve fuel efficiency. To get accurate results in fluid simulation it is necessary to capture complex phenomena such as turbulence…
OpenACC is a high-level programming model for accelerators, such as NVIDIA GPUs, that allows programmers to accelerate applications using compiler directives to specify loops and regions of code in standard C, C++, and Fortran to be offloaded to an accelerator. Through the use of compiler directives, OpenACC allows programmers to maintain a single source code for the CPU and GPU that is portable…
I introduced CUDA-aware MPI in my last post, with an introduction to MPI and a description of the functionality and benefits of CUDA-aware MPI. In this post I will demonstrate its performance through both synthetic and realistic benchmarks. Since you now know why CUDA-aware MPI is more efficient in theory, let's take a look at the results of MPI bandwidth and…
MPI, the Message Passing Interface, is a standard API for communicating data via messages between distributed processes, commonly used in HPC to build applications that can scale to multi-node computer clusters. As such, MPI is fully compatible with CUDA, which is designed for parallel computing on a single computer or node. There are many reasons for wanting to combine the two parallel…
You may want to read the more recent post Getting Started with OpenACC by Jeff Larkin. In my previous post I added three lines of OpenACC directives to a Jacobi iteration code, achieving more than a 2x speedup by running it on a GPU. In this post I'll continue where I left off and demonstrate how to use OpenACC clauses to take more explicit control over how the compiler parallelizes our…
You may want to read the more recent post Getting Started with OpenACC by Jeff Larkin. In this post I'll continue where I left off in my introductory post about OpenACC and provide a somewhat more realistic example. This simple C/Fortran code example demonstrates a 2x speedup with the addition of just a few lines of OpenACC directives, and in the next post I'll add just a few more lines to push…