Posts by Lalit Vaidya
Generative AI
Feb 14, 2025
Optimizing Qwen2.5-Coder Throughput with NVIDIA TensorRT-LLM Lookahead Decoding
Large language models (LLMs) that specialize in coding have been steadily adopted into developer workflows. From pair programming to self-improving AI agents,...
7 MIN READ
Generative AI
Dec 17, 2024
Boost Llama 3.3 70B Inference Throughput 3x with NVIDIA TensorRT-LLM Speculative Decoding
Meta's Llama collection of open large language models (LLMs) continues to grow with the recent addition of Llama 3.3 70B, a text-only...
8 MIN READ
Generative AI
Dec 02, 2024
TensorRT-LLM Speculative Decoding Boosts Inference Throughput by up to 3.6x
NVIDIA TensorRT-LLM support for speculative decoding now delivers more than a 3x speedup in total token throughput. TensorRT-LLM is an open-source library that...
9 MIN READ