Harry Kim – NVIDIA Technical Blog
News and tutorials for developers, data scientists, and IT admins
Feed: http://www.open-lab.net/blog/feed/ (last updated 2025-03-24)

Introducing NVIDIA Dynamo, A Low-Latency Distributed Inference Framework for Scaling Reasoning AI Models
Harry Kim | Published 2025-03-18 | http://www.open-lab.net/blog/?p=95274

NVIDIA announced the release of NVIDIA Dynamo today at GTC 2025. NVIDIA Dynamo is a high-throughput, low-latency, open-source inference serving framework for deploying generative AI and reasoning models in large-scale distributed environments. The framework boosts the number of requests served by up to 30x when running the open-source DeepSeek-R1 models on NVIDIA Blackwell.
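As a rough illustration of what serving through such a framework looks like from the client side, the sketch below sends a chat completion request to an OpenAI-compatible HTTP endpoint. The endpoint URL, port, and model identifier are placeholders assumed for this example, not values taken from the post.

```python
# Minimal client sketch: query an OpenAI-compatible inference endpoint.
# ENDPOINT and MODEL are assumed placeholders for a local deployment.
import requests

ENDPOINT = "http://localhost:8000/v1/chat/completions"  # assumed local server address
MODEL = "deepseek-ai/DeepSeek-R1"                        # assumed model identifier

payload = {
    "model": MODEL,
    "messages": [
        {"role": "user", "content": "Explain disaggregated prefill and decode in one sentence."}
    ],
    "max_tokens": 128,
    "stream": False,
}

# Post the request and print the generated reply.
response = requests.post(ENDPOINT, json=payload, timeout=120)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```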


Measuring Generative AI Model Performance Using NVIDIA GenAI-Perf and an OpenAI-Compatible API
Harry Kim | Published 2024-08-01 | http://www.open-lab.net/blog/?p=85839

NVIDIA offers tools like Perf Analyzer and Model Analyzer to assist machine learning engineers with measuring and balancing the trade-off between latency and throughput, which is crucial for optimizing ML inference performance. Model Analyzer has been embraced by leading organizations such as Snap to identify optimal configurations that enhance throughput and reduce deployment costs. However…
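To make the latency/throughput trade-off concrete, the sketch below measures per-request latency and aggregate throughput by hand against an OpenAI-compatible endpoint. It illustrates the kind of measurement GenAI-Perf automates; it is not GenAI-Perf itself, and the endpoint URL, model name, and concurrency level are assumed placeholders.

```python
# Hand-rolled latency/throughput measurement against an OpenAI-compatible
# endpoint. Illustrative only; URL, model, and concurrency are assumptions.
import time
import statistics
from concurrent.futures import ThreadPoolExecutor

import requests

ENDPOINT = "http://localhost:8000/v1/chat/completions"  # assumed endpoint
MODEL = "my-model"                                       # assumed model name
CONCURRENCY = 8                                          # assumed concurrent clients
NUM_REQUESTS = 64

def one_request(_):
    """Send one chat completion request and return its end-to-end latency in seconds."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": "Summarize the history of GPUs."}],
        "max_tokens": 64,
    }
    start = time.perf_counter()
    requests.post(ENDPOINT, json=payload, timeout=120).raise_for_status()
    return time.perf_counter() - start

# Run the requests with a fixed concurrency and record wall-clock time.
wall_start = time.perf_counter()
with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    latencies = list(pool.map(one_request, range(NUM_REQUESTS)))
wall_time = time.perf_counter() - wall_start

print(f"p50 latency: {statistics.median(latencies):.3f} s")
print(f"throughput:  {NUM_REQUESTS / wall_time:.2f} req/s")
```

Raising the concurrency typically increases throughput while pushing per-request latency up, which is exactly the balance the post's tooling is designed to quantify.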

