Every year NVIDIA’s GPU Technology Conference (GTC) gets bigger and better. One of the aims of GTC is to give developers, scientists, and practitioners the opportunity to learn, through hands-on labs, how to use accelerated computing in their work. This year we are nearly doubling the amount of hands-on training offered last year, with almost 2,400 lab hours available to GTC attendees!
As a CUDA Educator at NVIDIA, I work to make massively parallel programming education and training accessible to everyone, whether or not they have GPUs in their own machines. That is why, in partnership with qwikLABS, NVIDIA has put the hands-on content we use to train thousands of developers at the Supercomputing Conference and the GPU Technology Conference online, accessible from…
In the previous CUDACasts episode, we saw how to flash your Jetson TK1 to the latest release of Linux4Tegra and install both the CUDA toolkit and the OpenCV SDK. We’ll continue exploring the power efficiency that the Jetson TK1’s Kepler-based GPU brings to computer vision by porting a simple OpenCV sample to run on the GPU. We’ll explore computer vision further in a future CUDACast when we look at the…
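To give a flavor of the port, here is a minimal sketch using OpenCV’s gpu module (the OpenCV 2.4-era API available for the Jetson TK1); the filter and file name are illustrative, not necessarily the exact sample from the episode:

    #include <opencv2/opencv.hpp>
    #include <opencv2/gpu/gpu.hpp>

    int main()
    {
        // Load a test image on the host (file name is illustrative).
        cv::Mat src = cv::imread("input.png", CV_LOAD_IMAGE_GRAYSCALE);
        if (src.empty()) return 1;

        // CPU version: runs entirely on the ARM cores.
        cv::Mat dst_cpu;
        cv::GaussianBlur(src, dst_cpu, cv::Size(7, 7), 1.5);

        // GPU version: upload, run the cv::gpu counterpart, download.
        cv::gpu::GpuMat d_src(src), d_dst;
        cv::gpu::GaussianBlur(d_src, d_dst, cv::Size(7, 7), 1.5);

        cv::Mat dst_gpu;
        d_dst.download(dst_gpu);
        return 0;
    }

The upload/process/download pattern is the same for most functions in the gpu module.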
The Jetson TK1 development kit has fast become a must-have for mobile and embedded parallel computing due to the amazing level of performance packed into such a low-power board. In this and the following CUDACast, you’ll learn how to get started building computer vision applications on your Jetson TK1 using CUDA and the OpenCV library. CUDACasts are short how-to screencast videos about new…
In the world of high-performance computing, it is important to understand how your code affects the operating characteristics of your hardware. For example, if your program executes inefficient code, it may cause the GPU to work harder than it needs to, leading to higher power consumption and a potential slowdown due to throttling. A new profiling feature in CUDA 5.5 allows you to profile the…
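In the CUDA 5.5-era toolkit this capability is exposed through nvprof; a sketch of the invocation (the application name is a placeholder, and flag spellings have changed across toolkit versions, so check nvprof --help):

    nvprof --system-profiling on ./my_app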
So far in the CUDA Python mini-series on CUDACasts, I introduced you to using the decorator and CUDA libraries, two different methods for accelerating code on NVIDIA GPUs. In today’s CUDACast, I’ll demonstrate how to use the NumbaPro compiler from Continuum Analytics to write CUDA Python code that runs on the GPU. In CUDACast #12, we’ll continue using the Monte Carlo options pricing…
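As a minimal sketch of what a CUDA Python kernel looks like (spelled here with a string signature as in later Numba releases; NumbaPro-era code passed the types in a slightly different form):

    import numpy as np
    from numbapro import cuda

    @cuda.jit('void(float32[:], float32[:], float32[:])')
    def vadd(a, b, c):
        i = cuda.grid(1)              # global thread index
        if i < c.shape[0]:
            c[i] = a[i] + b[i]

    n = 1024
    a = np.arange(n, dtype=np.float32)
    b = 2 * a
    c = np.empty_like(a)
    vadd[(n + 255) // 256, 256](a, b, c)   # launch: grid dim, block dim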
In the previous episode of CUDACasts, I introduced you to NumbaPro, the high-performance Python compiler from Continuum Analytics, and demonstrated how to accelerate simple Python functions on the GPU. Continuing the Python theme, today’s CUDACast demonstrates NumbaPro’s support for CUDA libraries. The optimized algorithms in GPU-accelerated libraries often provide the easiest way to accelerate…
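For instance, NumbaPro shipped Python wrappers for the CUDA libraries; the sketch below uses its cuRAND wrapper to estimate pi (module and class names are from memory of the NumbaPro API, so treat them as assumptions):

    import numpy as np
    from numbapro.cudalib import curand

    # Generate uniform random numbers on the GPU, then copy them back.
    prng = curand.PRNG(rndtype=curand.PRNG.XORWOW)
    samples = np.empty(1000000, dtype=np.float64)
    prng.uniform(samples)

    # Monte Carlo estimate of pi from pairs of uniform samples.
    x, y = samples[0::2], samples[1::2]
    print(4.0 * np.mean(x * x + y * y <= 1.0))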
This week’s CUDACast continues the Parallel Forall Python theme kicked off in last week’s post by Mark Harris, demonstrating exciting new support for CUDA acceleration in Python with NumbaPro. This video is the first in a three-part series showing various ways to accelerate your Python code on NVIDIA GPUs. Tomorrow, you won’t want to miss the chance to learn about Python GPU acceleration with…
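The decorator approach from that first video looks roughly like this (NumbaPro used target='gpu'; the open-source Numba that succeeded it spells the target 'cuda'):

    import numpy as np
    from numbapro import vectorize

    @vectorize(['float32(float32, float32)'], target='gpu')
    def rel_diff(x, y):
        # Scalar function, compiled into an element-wise GPU ufunc.
        return 2.0 * (x - y) / (x + y)

    a = np.random.random(1000000).astype(np.float32)
    b = np.random.random(1000000).astype(np.float32)
    out = rel_diff(a, b)   # runs element-wise on the GPU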
Visual tools offer a very efficient way to develop and debug applications. When working on massively parallel code built on the CUDA Platform, this visual approach is even more important, because you could be dealing with tens of thousands of parallel threads. With the free NVIDIA Nsight Eclipse Edition IDE, you can quickly and easily examine the GPU memory state in a running CUDA C…
GPU libraries provide an easy way to accelerate applications without writing any GPU-specific code. With CUDA 5.5, the NVIDIA CUFFT Fast Fourier Transform library makes FFT acceleration even easier with new support for the popular FFTW API. It is now extremely simple for developers to accelerate existing FFTW library calls on the GPU, sometimes with no code changes!
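Concretely, a standard single-precision FFTW program can target the GPU just by switching the header and link line; a minimal sketch (build line shown as a comment):

    #include <cufftw.h>   /* drop-in replacement for <fftw3.h> */

    #define N 1024

    int main(void)
    {
        /* The fftwf_* calls below are serviced by CUFFT on the GPU. */
        fftwf_complex *in  = fftwf_malloc(sizeof(fftwf_complex) * N);
        fftwf_complex *out = fftwf_malloc(sizeof(fftwf_complex) * N);

        fftwf_plan p = fftwf_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
        /* ... fill in[] with data ... */
        fftwf_execute(p);

        fftwf_destroy_plan(p);
        fftwf_free(in);
        fftwf_free(out);
        return 0;
    }
    /* build: nvcc fft.c -lcufftw -lcufft   (instead of -lfftw3f) */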
The NVIDIA System Management Interface, nvidia-smi, is a command-line interface to the NVIDIA Management Library, NVML. nvidia-smi provides Linux system administrators with powerful GPU configuration and monitoring tools. HPC cluster system administrators need to be able to monitor resource utilization (processor time, memory usage, etc.) on their systems. This resource monitoring is typically…
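Because nvidia-smi is a front end to NVML, the same data can be collected programmatically. A minimal sketch against the NVML C API (error checking omitted for brevity; link with -lnvidia-ml):

    #include <stdio.h>
    #include <nvml.h>

    int main(void)
    {
        nvmlDevice_t dev;
        nvmlUtilization_t util;
        nvmlMemory_t mem;

        nvmlInit();
        nvmlDeviceGetHandleByIndex(0, &dev);        /* first GPU */
        nvmlDeviceGetUtilizationRates(dev, &util);  /* percentages */
        nvmlDeviceGetMemoryInfo(dev, &mem);         /* bytes */

        printf("GPU util: %u%%  mem used: %llu MiB\n",
               util.gpu, mem.used / (1024 * 1024));

        nvmlShutdown();
        return 0;
    }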
In CUDACast #5, we saw how to use the new NVIDIA RPM and Debian packages to install the CUDA toolkit, samples, and driver on a supported Linux OS with a standard package manager. With CUDA 5.5, it is now possible to compile and run CUDA applications on ARM-based systems such as the Kayla development platform. In addition to native compilation on an ARM-based CPU system, it is also possible to…
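Cross-compiling from an x86 host looked roughly like this with the CUDA 5.5 toolchain (flag spellings from memory, so verify them against nvcc --help; the cross-compiler path is an example):

    nvcc -m32 -target-cpu-arch=ARM \
         -ccbin=/usr/bin/arm-linux-gnueabihf-g++ \
         -o app app.cu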
Today, CUDA 5.5 has been officially released! To continue our CUDACasts mini-series on new CUDA 5.5 features, we will explore a new method for installing the CUDA platform on a supported Linux OS. In previous versions of CUDA, you would have used the run-file installer, a utility that handled installing the CUDA Toolkit, samples, and NVIDIA driver. While the run-file installer is still…
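With the package-manager route, installation on a Debian-based distribution reduces to a few commands (the repository package file name below is a placeholder that varies by distro and release):

    sudo dpkg -i cuda-repo-<distro>_<version>_amd64.deb
    sudo apt-get update
    sudo apt-get install cuda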
After a recent talk I gave called “CUDA 101: Intro to GPU Computing”, a student asked, “What’s the best way for me to get experience in parallel programming and CUDA?” This is a question I struggled with a lot in college, and one I still ask myself about various topics today. The first step is to realize that it’s hard to get useful experience without having some skill in an area.