Julien Demouth

Julien is a Senior Manager in the GPU Architecture group at NVIDIA. He is one of the co-authors of many of the low-level implementations for Deep Learning in cuDNN and TensorRT. Among other things, Julien wrote the first version of FFT-based 2D convolutions for cuDNN, wrote a large fraction of the Implicit GEMM convolutions for Maxwell, Pascal and Volta GPUs, and is the author of several Winograd implementations. Julien holds a Ph.D. in Computational Geometry from INRIA in France.

Posts by Julien Demouth

Simulation / Modeling / Design

CUTLASS: Fast Linear Algebra in CUDA C++

Update May 21, 2018: CUTLASS 1.0 is now available as Open Source software at the CUTLASS repository. CUTLASS 1.0 has changed substantially from our preview...
Data Center / Cloud

How We Achieved Record Finance Benchmark Performance on Tesla K80

STAC Research develops financial benchmarks in partnership with leading banks and software or hardware vendors. The STAC-A2 suite of benchmarks aims to...
GPU Pro Tip
Simulation / Modeling / Design

CUDA Pro Tip: Minimize the Tail Effect

When I work on the optimization of CUDA kernels, I sometimes see a discrepancy between Achieved and Theoretical Occupancies. The Theoretical Occupancy is the...