Author: Julien Demouth | NVIDIA Technical Blog

Julien Demouth

Julien is a Senior Manager in the GPU Architecture group at NVIDIA. He is one of the co-authors of many of the low-level implementations for Deep Learning in cuDNN and TensorRT. Among other things, Julien wrote the first version of FFT-based 2D convolutions for cuDNN, wrote a large fraction of the Implicit GEMM convolutions for Maxwell, Pascal, and Volta GPUs, and is the author of several Winograd implementations. Julien holds a Ph.D. in Computational Geometry from INRIA in France.

Posts by Julien Demouth

Simulation / Modeling / Design Dec 05, 2017

CUTLASS: Fast Linear Algebra in CUDA C++

Update May 21, 2018: CUTLASS 1.0 is now available as Open Source software at the CUTLASS repository. CUTLASS 1.0 has changed substantially from our preview... 25 MIN READ

Data Center / Cloud Dec 16, 2014

How We Achieved Record Finance Benchmark Performance on Tesla K80

STAC Research develops financial benchmarks in partnership with leading banks and software or hardware vendors. The STAC-A2 suite of benchmarks aims to... 7 MIN READ

Simulation / Modeling / Design Jun 04, 2014

CUDA Pro Tip: Minimize the Tail Effect

When I work on the optimization of CUDA kernels, I sometimes see a discrepancy between Achieved and Theoretical Occupancies. The Theoretical Occupancy is the... 3 MIN READ