Author: Harry Kim | NVIDIA Technical Blog

Harry Kim

Harry Kim is a Principal Product Manager at NVIDIA, enabling performant and scalable AI/ML inference with Triton. He has worked on recommendation systems at Meta, AI infrastructure at Intel AI, and ads ranking and recommendation at Google. He holds a PhD in Statistics from UC Berkeley.

Posts by Harry Kim

Development & Optimization Mar 18, 2025

Introducing NVIDIA Dynamo, A Low-Latency Distributed Inference Framework for Scaling Reasoning AI Models

NVIDIA announced the release of NVIDIA Dynamo today at GTC 2025. NVIDIA Dynamo is a high-throughput, low-latency open-source inference serving framework for...

14 MIN READ

Generative AI Aug 01, 2024

Measuring Generative AI Model Performance Using NVIDIA GenAI-Perf and an OpenAI-Compatible API

NVIDIA offers tools like Perf Analyzer and Model Analyzer to assist machine learning engineers with measuring and balancing the trade-off between latency and...

6 MIN READ