Ethem Can – NVIDIA Technical Blog
News and tutorials for developers, data scientists, and IT admins
Feed: http://www.open-lab.net/blog/feed/ (last updated 2025-03-18T18:22:39Z)

Ethem Can
Serving ML Model Pipelines on NVIDIA Triton Inference Server with Ensemble Models
http://www.open-lab.net/blog/?p=61372
Published 2023-03-13T14:00:00Z | Updated 2025-03-18T18:22:39Z

As of 3/18/25, NVIDIA Triton Inference Server is now NVIDIA Dynamo. In many production-level machine learning (ML) applications, inference is not limited to running a forward pass on a single ML model. Instead, a pipeline of ML models often needs to be executed. Take, for example, a conversational AI pipeline that consists of three modules: an automatic speech recognition (ASR) module to…
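The point of an ensemble is that the client addresses the whole pipeline as if it were a single model, while Triton schedules the intermediate steps server-side. Below is a minimal, hedged sketch of calling such an ensemble with the Triton Python HTTP client; the ensemble name "asr_pipeline" and the tensor names "AUDIO" and "TRANSCRIPT" are assumptions for illustration, not names from the post.

```python
# Illustrative only: model and tensor names are assumed, not taken from the post.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Fake 1-second mono waveform at 16 kHz; a real pipeline would read an audio file.
audio = np.random.randn(1, 16000).astype(np.float32)

inputs = [httpclient.InferInput("AUDIO", audio.shape, "FP32")]
inputs[0].set_data_from_numpy(audio)
outputs = [httpclient.InferRequestedOutput("TRANSCRIPT")]

# The ensemble is invoked like any single model; Triton runs the internal
# ASR -> downstream steps on the server and returns only the final output.
result = client.infer("asr_pipeline", inputs, outputs=outputs)
print(result.as_numpy("TRANSCRIPT"))
```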

Source

Ethem Can
Accelerating Machine Learning Model Inference on Google Cloud Dataflow with NVIDIA GPUs
http://www.open-lab.net/blog/?p=34954
Published 2021-07-21T19:30:00Z | Updated 2022-08-21T23:52:18Z

Today, in partnership with NVIDIA, Google Cloud announced Dataflow is bringing GPUs to the world of big data processing to unlock new possibilities. With Dataflow GPU, users can now leverage the power of NVIDIA GPUs in their machine learning inference workflows. Here we show you how to access these performance benefits with BERT. Google Cloud’s Dataflow is a managed service for executing a…
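The usual pattern for GPU inference in a Beam/Dataflow pipeline is to load the model once per worker in a DoFn's setup() and run inference per element in process(). The sketch below is an assumption-laden illustration of that pattern, not the exact code from the post; the bucket paths, the SavedModel loading, and the tokenize() helper are all hypothetical.

```python
# Hedged sketch: model paths, the tokenize() helper, and runner/GPU options are assumptions.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


class BertInferenceFn(beam.DoFn):
    def setup(self):
        # Called once per worker: load the model so it lands on the GPU that
        # Dataflow attached to the worker VM (assumed TensorFlow SavedModel).
        import tensorflow as tf
        self.model = tf.saved_model.load("gs://my-bucket/bert_savedmodel")

    def process(self, element):
        # element is assumed to be a pre-tokenized feature dict.
        yield self.model(element)


def run():
    options = PipelineOptions()  # Dataflow runner and GPU worker options go here.
    with beam.Pipeline(options=options) as p:
        (p
         | "Read" >> beam.io.ReadFromText("gs://my-bucket/inputs.txt")
         | "Tokenize" >> beam.Map(lambda line: tokenize(line))  # hypothetical helper
         | "Infer" >> beam.ParDo(BertInferenceFn())
         | "Write" >> beam.io.WriteToText("gs://my-bucket/outputs"))


if __name__ == "__main__":
    run()
```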

Source

Ethem Can
Profiling and Optimizing Deep Neural Networks with DLProf and PyProf
http://www.open-lab.net/blog/?p=21005
Published 2020-09-28T18:33:08Z | Updated 2024-08-28T17:55:38Z

Software profiling is key for achieving the best performance on a system, and that’s true for data science and machine learning applications as well. In the era of GPU-accelerated deep learning, when profiling deep neural networks, it is important to understand CPU, GPU, and even memory bottlenecks, which could cause slowdowns in training or inference. In this post…
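A minimal sketch of the PyTorch side of this workflow is below: PyProf patches PyTorch ops to emit NVTX ranges, and emit_nvtx() plus the CUDA profiler start/stop calls limit profiling to the region of interest so DLProf (or Nsight Systems) can attribute GPU time to layers. The pyprof.init() call follows PyProf's documented usage, and the DLProf invocation in the trailing comment is an assumption that may vary by version.

```python
# Hedged sketch; pyprof usage and the dlprof command line are assumptions.
import torch
import torch.cuda.profiler as cuda_profiler

import pyprof
pyprof.init()  # patch PyTorch ops so they emit NVTX ranges

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
data = torch.randn(64, 1024, device="cuda")
target = torch.randn(64, 1024, device="cuda")

with torch.autograd.profiler.emit_nvtx():
    cuda_profiler.start()          # profile only this training region
    for step in range(10):
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(data), target)
        loss.backward()
        optimizer.step()
    cuda_profiler.stop()

# Then launch under the profiler, e.g. (flags may differ by DLProf version):
#   dlprof --mode=pytorch python train.py
```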

Source
