Jatin Gangani

Jatin Gangani is a senior computer architect in the deep learning compute group at NVIDIA. He is passionate about pushing the limits of hardware and software performance for AI inference in the data center. His recent focus is on enhancing the performance of TensorRT-LLM software. Jatin holds an M.Sc. in Computer Engineering from North Carolina State University.

Posts by Jatin Gangani

Generative AI

Optimizing Qwen2.5-Coder Throughput with NVIDIA TensorRT-LLM Lookahead Decoding

Large language models (LLMs) that specialize in coding have been steadily adopted into developer workflows. From pair programming to self-improving AI agents,...
7 MIN READ

Generative AI

Boost Llama 3.3 70B Inference Throughput 3x with NVIDIA TensorRT-LLM Speculative Decoding

Meta's Llama collection of open large language models (LLMs) continues to grow with the recent addition of Llama 3.3 70B, a text-only...
8 MIN READ