Carl (Izzy) Putterman – NVIDIA Technical Blog
News and tutorials for developers, data scientists, and IT admins
2025-01-11T17:32:51Z http://www.open-lab.net/blog/feed/

Carl (Izzy) Putterman <![CDATA[Boost Llama 3.3 70B Inference Throughput 3x with NVIDIA TensorRT-LLM Speculative Decoding]]> http://www.open-lab.net/blog/?p=94146 2024-12-19T23:03:40Z 2024-12-17T17:00:00Z Meta's Llama collection of open large language models (LLMs) continues to grow with the recent addition of Llama 3.3 70B, a text-only...]]>

Meta’s Llama collection of open large language models (LLMs) continues to grow with the recent addition of Llama 3.3 70B, a text-only instruction-tuned model. Llama 3.3 delivers improved performance relative to the older Llama 3.1 70B model and can even match the capabilities of the larger, more computationally expensive Llama 3.1 405B model on several tasks, including math, reasoning, coding…

Source

]]>
Carl (Izzy) Putterman <![CDATA[TensorRT-LLM Speculative Decoding Boosts Inference Throughput by up to 3.6x]]> http://www.open-lab.net/blog/?p=92847 2025-01-11T17:32:51Z 2024-12-02T23:09:43Z NVIDIA TensorRT-LLM support for speculative decoding now provides over 3x the speedup in total token throughput. TensorRT-LLM is an open-source library that...]]>

NVIDIA TensorRT-LLM support for speculative decoding now provides over 3x the speedup in total token throughput. TensorRT-LLM is an open-source library that provides blazing-fast inference support for numerous popular large language models (LLMs) on NVIDIA GPUs. By adding support for speculative decoding on single-GPU and single-node multi-GPU configurations, the library further expands its supported…
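As background, speculative decoding lets a cheap "draft" model propose a short run of tokens that the expensive target model then verifies in one batched pass, accepting the longest agreeing prefix. The toy sketch below illustrates that accept/reject loop with simple stand-in "models" (plain Python functions following made-up rules, not the TensorRT-LLM API):

```python
# Toy sketch of speculative decoding. Both "models" are hypothetical
# stand-ins: simple next-token rules, not real LLMs.

def draft_model(context):
    # Cheap proposer: guesses the last token plus one, wrapping at 10.
    return (context[-1] + 1) % 10

def target_model(context):
    # Expensive "ground truth" model: same rule, but wrapping at 5.
    return (context[-1] + 1) % 5

def speculative_step(context, num_draft=4):
    """Propose num_draft tokens with the draft model, then verify them
    against the target model: keep the longest agreeing prefix, plus the
    target's correction on a mismatch (or a bonus token on full accept)."""
    proposed = []
    ctx = list(context)
    for _ in range(num_draft):
        tok = draft_model(ctx)
        proposed.append(tok)
        ctx.append(tok)

    accepted = []
    ctx = list(context)
    for tok in proposed:
        expected = target_model(ctx)
        if tok == expected:
            accepted.append(tok)
            ctx.append(tok)
        else:
            accepted.append(expected)  # target's correction ends the step
            break
    else:
        accepted.append(target_model(ctx))  # bonus token on full accept

    return accepted

print(speculative_step([0]))  # all drafts accepted, plus a bonus token
print(speculative_step([4]))  # first draft rejected, target's token kept
```

When every draft token is accepted, the single verification pass yields one extra token on top of the drafted run, which is where the throughput gain over one-token-at-a-time decoding comes from.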

Source

]]>
Carl (Izzy) Putterman <![CDATA[NVIDIA NIM 1.4 Ready to Deploy with 2.4x Faster Inference]]> http://www.open-lab.net/blog/?p=92172 2024-11-20T04:40:21Z 2024-11-16T00:41:54Z The demand for ready-to-deploy high-performance inference is growing as generative AI reshapes industries. NVIDIA NIM provides production-ready microservice...]]>

The demand for ready-to-deploy, high-performance inference is growing as generative AI reshapes industries. NVIDIA NIM provides production-ready microservice containers for AI model inference, continually improving enterprise-grade generative AI performance. With the upcoming NIM version 1.4 scheduled for release in early December, request performance improves by up to 2.4x out of the box with…

Source

]]>
Carl (Izzy) Putterman <![CDATA[Beating SOTA Inference Performance on NVIDIA GPUs with GPUNet]]> http://www.open-lab.net/blog/?p=54172 2023-06-12T09:01:44Z 2022-08-30T20:02:14Z Crafted by AI for AI, GPUNet is a class of convolutional neural networks designed to maximize the performance of NVIDIA GPUs using NVIDIA TensorRT. Built using...]]>

Crafted by AI for AI, GPUNet is a class of convolutional neural networks designed to maximize the performance of NVIDIA GPUs using NVIDIA TensorRT. Built using novel neural architecture search (NAS) methods, GPUNet demonstrates state-of-the-art inference performance up to 2x faster than EfficientNet-X and FBNet-V3. The NAS methodology helps build GPUNet for a wide range of applications…
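In its simplest random-search form, neural architecture search of the kind described above amounts to sampling candidate configurations from a search space and keeping the best-scoring one under a latency budget. The search space, latency model, and accuracy proxy below are illustrative stand-ins, not GPUNet's actual space or predictors:

```python
# Minimal random-search NAS sketch with hypothetical stand-in functions.
import random

SEARCH_SPACE = {
    "depth": [8, 12, 16, 20],
    "width": [32, 64, 128],
    "kernel": [3, 5, 7],
}

def latency(cfg):
    # Hypothetical latency proxy: deeper/wider/larger-kernel nets cost more.
    return cfg["depth"] * cfg["width"] * cfg["kernel"] / 1000.0

def accuracy_proxy(cfg):
    # Hypothetical accuracy proxy: capacity helps, with saturation.
    capacity = cfg["depth"] * cfg["width"]
    return capacity / (capacity + 500.0)

def random_search(budget_ms, trials=200, seed=0):
    """Sample configurations and keep the most accurate one whose
    predicted latency fits under the budget."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        cfg = {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}
        if latency(cfg) <= budget_ms and (
            best is None or accuracy_proxy(cfg) > accuracy_proxy(best)
        ):
            best = cfg
    return best

best = random_search(budget_ms=5.0)
print(best)
```

Real NAS systems replace the random sampler with learned or evolutionary search and the proxies with trained latency and accuracy predictors, but the budget-constrained selection loop is the same shape.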

Source

]]>
Carl (Izzy) Putterman <![CDATA[Time Series Forecasting with the NVIDIA Time Series Prediction Platform and Triton Inference Server]]> http://www.open-lab.net/blog/?p=44168 2022-08-21T23:53:25Z 2022-02-15T16:00:00Z In this post, we detail the recently released NVIDIA Time Series Prediction Platform (TSPP), a tool designed to compare easily and experiment with arbitrary...]]>

In this post, we detail the recently released NVIDIA Time Series Prediction Platform (TSPP), a tool designed to easily compare and experiment with arbitrary combinations of forecasting models, time-series datasets, and other configurations. The TSPP also provides functionality to explore the hyperparameter search space, run accelerated model training using distributed training and Automatic Mixed…
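The model-comparison loop that a tool like the TSPP automates can be sketched as: run each candidate forecaster over the same series with rolling one-step-ahead evaluation, then rank by error. The two toy forecasters and the metric below are minimal stand-ins, not TSPP components:

```python
# Toy forecasting-model comparison with stand-in models and metric.

def naive_forecast(history):
    # Predict the last observed value.
    return history[-1]

def moving_average_forecast(history, window=3):
    # Predict the mean of the last `window` observations.
    tail = history[-window:]
    return sum(tail) / len(tail)

def evaluate(model, series, warmup=3):
    """Rolling one-step-ahead evaluation: mean absolute error of
    predicting series[t] from series[:t]."""
    errors = [
        abs(model(series[:t]) - series[t])
        for t in range(warmup, len(series))
    ]
    return sum(errors) / len(errors)

series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
results = {
    "naive": evaluate(naive_forecast, series),
    "moving_average": evaluate(moving_average_forecast, series),
}
best = min(results, key=results.get)
print(best, results)
```

On this steadily rising series the naive forecaster lags by one step (error 1.0) while the moving average lags by more, so "naive" wins; the point of a platform like the TSPP is running this kind of comparison across real models, datasets, and hyperparameters at scale.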

Source

]]>