Lalit Vaidya

Lalit Vaidya is a performance engineer at NVIDIA. He focuses on providing benchmark data for inference and training. He holds a B.Sc. in computer science from the University of the Pacific.

Posts by Lalit Vaidya

Generative AI

Optimizing Qwen2.5-Coder Throughput with NVIDIA TensorRT-LLM Lookahead Decoding

Large language models (LLMs) that specialize in coding have been steadily adopted into developer workflows. From pair programming to self-improving AI agents,... 7 MIN READ
Generative AI

Boost Llama 3.3 70B Inference Throughput 3x with NVIDIA TensorRT-LLM Speculative Decoding

Meta's Llama collection of open large language models (LLMs) continues to grow with the recent addition of Llama 3.3 70B, a text-only... 8 MIN READ
Generative AI

TensorRT-LLM Speculative Decoding Boosts Inference Throughput by up to 3.6x

NVIDIA TensorRT-LLM support for speculative decoding now provides over 3x the speedup in total token throughput. TensorRT-LLM is an open-source library that... 9 MIN READ