NVIDIA cuDSS is a first-generation sparse direct solver library designed to accelerate engineering and scientific computing. cuDSS is increasingly adopted in data centers and other environments and supports single-GPU, multi-GPU and multi-node (MGMN) configurations. cuDSS has become a key tool for accelerating computer-aided engineering (CAE) workflows and scientific computations across…
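To make the workflow concrete, below is a minimal sketch of solving A x = b with cuDSS in CUDA C, following the analysis/factorization/solve phases described in the cuDSS documentation. The function name, the choice of an SPD matrix in CSR format, and the assumption that all buffers already reside in device memory are illustrative; exact signatures and enum names should be checked against the installed cudss.h.

```cpp
// Sketch: solving A x = b with cuDSS (analysis -> factorization -> solve).
// Assumes CSR arrays (d_rowOffsets, d_colIndices, d_values) and dense vectors
// d_b, d_x already live in device memory; error checks omitted for brevity.
#include <cuda_runtime.h>
#include <cudss.h>

void solve_spd_system(int64_t n, int64_t nnz,
                      int *d_rowOffsets, int *d_colIndices, double *d_values,
                      double *d_b, double *d_x)
{
    cudssHandle_t handle;
    cudssCreate(&handle);

    cudssConfig_t config;
    cudssData_t   data;
    cudssConfigCreate(&config);
    cudssDataCreate(handle, &data);

    // Wrap the user-owned device buffers in cuDSS matrix descriptors.
    cudssMatrix_t A, x, b;
    cudssMatrixCreateCsr(&A, n, n, nnz, d_rowOffsets, NULL, d_colIndices, d_values,
                         CUDA_R_32I, CUDA_R_64F, CUDSS_MTYPE_SPD,
                         CUDSS_MVIEW_UPPER, CUDSS_BASE_ZERO);
    cudssMatrixCreateDn(&x, n, 1, n, d_x, CUDA_R_64F, CUDSS_LAYOUT_COL_MAJOR);
    cudssMatrixCreateDn(&b, n, 1, n, d_b, CUDA_R_64F, CUDSS_LAYOUT_COL_MAJOR);

    // The three phases of a sparse direct solve.
    cudssExecute(handle, CUDSS_PHASE_ANALYSIS,      config, data, A, x, b);
    cudssExecute(handle, CUDSS_PHASE_FACTORIZATION, config, data, A, x, b);
    cudssExecute(handle, CUDSS_PHASE_SOLVE,         config, data, A, x, b);

    cudssMatrixDestroy(A);
    cudssMatrixDestroy(x);
    cudssMatrixDestroy(b);
    cudssDataDestroy(handle, data);
    cudssConfigDestroy(config);
    cudssDestroy(handle);
}
```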
As a data scientist, you know that evaluating machine learning model performance is a crucial aspect of your work. To do so effectively, you have a wide range of statistical metrics at your disposal, each with its own strengths and weaknesses. By developing a solid understanding of these metrics, you are not only better equipped to choose the best one for optimizing your model but also to explain your…
Update May 21, 2018: CUTLASS 1.0 is now available as Open Source software at the CUTLASS repository. CUTLASS 1.0 has changed substantially from our preview release described in the blog post below. We have decomposed the structure of the GEMM computation into deeper, structured primitives for loading data, computing predicate masks, streaming data at each level of the GEMM hierarchy…
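CUTLASS itself is a C++ template library; as a rough illustration of just the outermost (thread-block tile) level of the GEMM hierarchy it decomposes, here is a plain CUDA sketch of a shared-memory tiled SGEMM. It is not CUTLASS code, and it assumes row-major matrices with dimensions that are multiples of the tile size.

```cpp
// Outermost (thread-block tile) level of a GEMM hierarchy: stage TILE x TILE
// tiles of A and B in shared memory, then each thread accumulates one element
// of the C tile. Launch with grid(N/TILE, M/TILE) and block(TILE, TILE).
#define TILE 16

__global__ void tiled_sgemm(int M, int N, int K,
                            const float *A, const float *B, float *C,
                            float alpha, float beta)
{
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;   // row of C owned by this thread
    int col = blockIdx.x * TILE + threadIdx.x;   // column of C owned by this thread
    float acc = 0.0f;

    for (int k0 = 0; k0 < K; k0 += TILE) {
        // Cooperative load of one A tile and one B tile into shared memory.
        As[threadIdx.y][threadIdx.x] = A[row * K + (k0 + threadIdx.x)];
        Bs[threadIdx.y][threadIdx.x] = B[(k0 + threadIdx.y) * N + col];
        __syncthreads();

        // Multiply-accumulate over the staged tiles.
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }
    C[row * N + col] = alpha * acc + beta * C[row * N + col];
}
```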
A defining feature of the new NVIDIA Volta GPU architecture is Tensor Cores, which give the NVIDIA V100 accelerator a peak throughput that is 12x the 32-bit floating point throughput of the previous-generation NVIDIA P100. Tensor Cores enable you to use mixed-precision for higher throughput without sacrificing accuracy. Tensor Cores provide a huge boost to convolutions and matrix operations.
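As one illustration of how Tensor Cores are exposed to CUDA C++ code, the sketch below uses the WMMA API (nvcuda::wmma) to have a single warp compute a 16x16x16 mixed-precision matrix multiply-accumulate; the kernel name and single-warp launch are illustrative choices, not code from the post.

```cpp
// One warp computes a single 16x16 tile of C = A*B on Tensor Cores via the
// CUDA WMMA API: half-precision inputs, float accumulation (mixed precision).
// Requires compute capability 7.0+ (e.g. compile with -arch=sm_70) and a
// launch such as wmma_16x16x16<<<1, 32>>>(dA, dB, dC).
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

__global__ void wmma_16x16x16(const half *A, const half *B, float *C)
{
    // Fragments live in registers, distributed across the 32 threads of the warp.
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);
    wmma::load_matrix_sync(a_frag, A, 16);            // leading dimension 16
    wmma::load_matrix_sync(b_frag, B, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);   // Tensor Core MMA
    wmma::store_matrix_sync(C, c_frag, 16, wmma::mem_row_major);
}
```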
There's a new computational workhorse in town. For decades, general matrix-matrix multiply, known as GEMM in Basic Linear Algebra Subroutines (BLAS) libraries, has been a standard benchmark for computational performance. GEMM is possibly the most optimized and widely used routine in scientific computing. Expert implementations are available for every architecture and quickly achieve the peak…
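For reference, GEMM computes C ← αAB + βC. A minimal sketch of invoking it on the GPU through cuBLAS (single precision, column-major, device pointers, error checks omitted) might look like the following; the wrapper function name is illustrative.

```cpp
// Sketch: single-precision GEMM (C = alpha*A*B + beta*C) through cuBLAS.
// dA, dB, dC are device pointers to column-major matrices.
#include <cublas_v2.h>

void gemm_example(int m, int n, int k,
                  const float *dA, const float *dB, float *dC)
{
    cublasHandle_t handle;
    cublasCreate(&handle);

    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                m, n, k,
                &alpha, dA, m,   // lda = m (column-major)
                        dB, k,   // ldb = k
                &beta,  dC, m);  // ldc = m

    cublasDestroy(handle);
}
```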
[Note: Lung Sheng Chien from NVIDIA also contributed to this post.] A key bottleneck for most science and engineering simulations is the solution of sparse linear systems of equations, which can account for up to 95% of total simulation time. There are two types of solvers for these systems: iterative and direct solvers. Iterative solvers are favored for the largest systems these days (see my…
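As a sketch of the structure of the simplest such iterative method, here is an unpreconditioned conjugate gradient solver on a CSR matrix, written in plain C++ for clarity (the post itself concerns GPU implementations; a GPU version would replace the SpMV, dot products, and vector updates with cuSPARSE/cuBLAS calls). The Csr struct and helper names are illustrative.

```cpp
// Unpreconditioned conjugate gradient for a symmetric positive definite CSR
// matrix. Assumes the initial guess x is the zero vector.
#include <cmath>
#include <vector>

struct Csr { int n; std::vector<int> rowPtr, col; std::vector<double> val; };

static void spmv(const Csr &A, const std::vector<double> &x, std::vector<double> &y) {
    for (int i = 0; i < A.n; ++i) {
        double s = 0.0;
        for (int k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k)
            s += A.val[k] * x[A.col[k]];
        y[i] = s;
    }
}

static double dot(const std::vector<double> &a, const std::vector<double> &b) {
    double s = 0.0;
    for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

void cg(const Csr &A, const std::vector<double> &b, std::vector<double> &x,
        int maxIter = 1000, double tol = 1e-10) {
    std::vector<double> r = b, p = b, Ap(A.n);   // r = b - A*0 = b
    double rr = dot(r, r);
    for (int it = 0; it < maxIter && std::sqrt(rr) > tol; ++it) {
        spmv(A, p, Ap);
        double alpha = rr / dot(p, Ap);          // step length
        for (int i = 0; i < A.n; ++i) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
        double rrNew = dot(r, r);
        double beta = rrNew / rr;                // new search direction weight
        for (int i = 0; i < A.n; ++i) p[i] = r[i] + beta * p[i];
        rr = rrNew;
    }
}
```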
[This post was co-written by Everett Phillips and Massimiliano Fatica.] The High Performance Conjugate Gradient Benchmark (HPCG) is a new benchmark intended to complement the High-Performance Linpack (HPL) benchmark currently used to rank supercomputers in the TOP500 list. This new benchmark solves a large sparse linear system using a multigrid preconditioned conjugate gradient (PCG) algorithm.
A Givens rotation [1] represents a rotation in a coordinate plane and is represented by a matrix of the form $G(i, j, \theta)$: the identity matrix, except that the intersections of the $i$th and $j$th rows and columns contain the values $c = \cos\theta$ and $s = \sin\theta$. Multiplying a vector by a Givens rotation matrix rotates the vector in the $(i, j)$ plane by $\theta$ radians. According to Wikipedia, the main use of Givens rotations in numerical linear algebra is to introduce zeros in…
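Written out explicitly (standard definitions, consistent with the text above):

```latex
% Givens rotation: identity everywhere except rows/columns i and j.
G(i,j,\theta) \;=\;
\begin{pmatrix}
1 &        &        &        &        &        &   \\
  & \ddots &        &        &        &        &   \\
  &        &   c    & \cdots &   s    &        &   \\
  &        & \vdots & \ddots & \vdots &        &   \\
  &        &  -s    & \cdots &   c    &        &   \\
  &        &        &        &        & \ddots &   \\
  &        &        &        &        &        & 1
\end{pmatrix},
\qquad c = \cos\theta,\; s = \sin\theta .

% Zeroing a component: for a vector with entries (a, b) in positions (i, j),
% choose r = \sqrt{a^2 + b^2}, c = a/r, s = b/r, so that
\begin{pmatrix} c & s \\ -s & c \end{pmatrix}
\begin{pmatrix} a \\ b \end{pmatrix}
=
\begin{pmatrix} r \\ 0 \end{pmatrix}.
```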