Hao Wu – NVIDIA Technical Blog News and tutorials for developers, data scientists, and IT admins 2023-06-12T21:09:34Z http://www.open-lab.net/blog/feed/ Hao Wu <![CDATA[Fast, Terabyte-Scale Recommender Training Made Easy with NVIDIA Merlin Distributed-Embeddings]]> http://www.open-lab.net/blog/?p=54372 2022-09-01T23:00:57Z 2022-08-31T16:00:00Z Embeddings play a key role in deep learning recommender models. They are used to map encoded categorical inputs in data to numerical values that can be...]]>

Embeddings play a key role in deep learning recommender models. They are used to map encoded categorical inputs in data to numerical values that can be processed by the math layers or multilayer perceptrons (MLPs). Embeddings often constitute most of the parameters in deep learning recommender models and can be quite large, even reaching into the terabyte scale. It can be difficult to fit…
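The categorical-to-dense mapping described above can be sketched with a plain lookup table. This is an illustrative sketch only (plain NumPy, not Merlin Distributed-Embeddings); the table sizes and IDs are hypothetical:

```python
import numpy as np

# Hypothetical embedding table: each categorical ID indexes one row of a
# dense matrix whose values would be learned during training.
vocab_size, embedding_dim = 1000, 16
rng = np.random.default_rng(0)
table = rng.normal(size=(vocab_size, embedding_dim)).astype(np.float32)

def embed(ids):
    # Map encoded categorical IDs to dense vectors the MLP layers can consume.
    return table[ids]

batch = np.array([3, 17, 3])   # e.g., item IDs in one batch
dense = embed(batch)           # shape (3, 16)
```

At terabyte scale the table no longer fits on one GPU, which is why the post's subject, distributed embeddings, shards rows across devices while keeping this same lookup semantics.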


]]>
Hao Wu <![CDATA[Achieving FP32 Accuracy for INT8 Inference Using Quantization Aware Training with NVIDIA TensorRT]]> http://www.open-lab.net/blog/?p=34216 2023-06-12T21:09:34Z 2021-07-20T13:00:00Z Deep learning is revolutionizing the way that industries are delivering products and services. These services include object detection, classification, and...]]>

Deep learning is revolutionizing the way that industries are delivering products and services. These services include object detection, classification, and segmentation for computer vision, and text extraction, classification…
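The core idea behind quantization-aware training is to simulate INT8 rounding during the forward pass so the network learns weights that tolerate it. A minimal sketch of this "fake quantization" step, assuming a simple symmetric per-tensor scale (not TensorRT's actual implementation):

```python
import numpy as np

def fake_quant_int8(x, scale):
    # Simulate INT8 quantization: quantize to the integer grid, clamp to
    # the signed 8-bit range, then dequantize back to float so training
    # sees the same rounding error that inference will.
    q = np.clip(np.round(x / scale), -128, 127)
    return (q * scale).astype(np.float32)

w = np.array([0.5, -1.2, 0.031], dtype=np.float32)
scale = np.abs(w).max() / 127.0   # symmetric per-tensor scale (assumption)
w_q = fake_quant_int8(w, scale)   # close to w, but snapped to the INT8 grid
```

Because the rounding error is visible during training, gradient updates steer the weights toward values that survive quantization, which is how QAT recovers FP32-level accuracy at INT8.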


]]>
Hao Wu <![CDATA[Int4 Precision for AI Inference]]> http://www.open-lab.net/blog/?p=15821 2023-02-13T17:33:48Z 2019-11-06T18:00:57Z INT4 Precision Can Bring an Additional 59% Speedup Compared to INT8 If there’s one constant in AI and deep learning, it’s never-ending optimization to wring...]]>

If there’s one constant in AI and deep learning, it’s never-ending optimization to wring every possible bit of performance out of a given platform. Many inference applications benefit from reduced precision, whether it’s mixed precision for recurrent neural networks (RNNs) or INT8 for convolutional neural networks (CNNs), where applications can get 3x+ speedups. NVIDIA’s Turing architecture…
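The trade-off behind these reduced-precision gains can be illustrated with a small comparison: INT4 offers only 16 representable levels versus INT8's 256, so the same symmetric quantization scheme produces a coarser grid and larger rounding error. A hedged sketch (plain NumPy, hypothetical values, not NVIDIA's kernels):

```python
import numpy as np

def quantize(x, bits):
    # Symmetric uniform quantization to a signed integer grid.
    qmax = 2 ** (bits - 1) - 1          # 127 for INT8, 7 for INT4
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale, scale

x = np.linspace(-1.0, 1.0, 9, dtype=np.float32)
x8, s8 = quantize(x, 8)   # fine grid: 256 levels
x4, s4 = quantize(x, 4)   # coarse grid: 16 levels, larger rounding error
```

Narrower integers halve the memory traffic and let tensor-core hardware issue more operations per cycle, which is where the additional speedup over INT8 comes from, provided the coarser grid does not cost too much accuracy.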


]]>