NVIDIA TensorRT – Inference ??? ? ???? ?? NVIDIA? Toolkit

Reading Time: 4 minutes

TensorRT? ??? Deep Learning ??? ????? NVIDIA GPU ???? Inference ??? ?? ~ ??? ?? ???? Deep Learning ??? TCO (Total Cost of Ownership) ? ????? ??? ? ? ?? ?? ??? ?????. ? ?????? Deep Learning ??? ?? ? ??? ???? ?? ?? ??? ? ?????? ?? TensorRT? ??? ????, ?? ??? TensorRT ?? ??? ?? ???? ???? ??? ???? ???. TensorRT? project? ????? ????, ?? ? ????? ???? ??? ??? ??, NVIDIA? Deep Learning ?? ???? Solutions Architect ? Data Scientist?? ??? ????? ????.

Introduction

TensorRT? NVIDIA GPU ??? ??? ??? ???? ???? ??? ????? Optimizer? ??? GPU?? ????? ???? Runtime Engine? ?????. TensorRT? ???? Deep Learning Frameworks (TensorFlow, PyTorch ?) ?? ??? ??? ????, NVIDIA Datacenter, Automotive, Embedded ??? ? ???? NVIDIA GPU ???? ??? ???? ?? ????, ??? Deep Learning model Inference ??? ?????.

TensorRT developments

TensorRT? C++ ? Python ??? API ???? ???? ?? ???, GPU programming language? CUDA ??? ??? ???? Deep Learning ??? ????? ?? ??? ? ???, Figure 2.? ??? ?? ??? ? ????.

TensorRT? GPU? ???? ?? ??? ??? ?? ??? ???? ??? ? ??? Runtime binary? ????? ???, Latency ? Throughput? ?? ???? ? ??, ?? ?? Deep Learning ?? ???? ? ???? ???? ??? ?????. ?? ?? ??? NVIDIA? Datacenter, Automotive, Embedded ?? ?? ???? ?? ??? Kernel? ???? ? ??, ? ????? ?? ??? ???? ?????. ??? ??? Deep Learning layer ? ??? ??? customization? ? ?? ???? ???? ?? ?? ????? ???? TensorRT? ??? ? ??? ??? ????. ?? ??? ??? ????? ?? ???? ???? ??? ?? ??? ?????.

TensorRT optimizations

TensorRT? NVIDIA platform?? ??? Inference ??? ? ? ??? Network compression, Network optimization ??? GPU ??? ???? ?? Deep Learning ??? ???? ?????. ????, TensorRT?? Inference ??? ?? ???? ???? ???? ??? ????? ?????.

Quantization & Precision Calibration

?? Deep Learning Training ? Inference??? Precision reduction ? ?? ???? ??? ?????. ?? Precision? Network? ?? data? ?? ? weight?? bit?? ?? ??? ? ??? ???? ??? ?????. ?? ?? Quantization ??? ?, TensorRT? Symmetric Linear Quantization(Figure 4.)? ???? ???, ?? ??? Deep Learning Framework? ???? FP32? data? FP16 ? INT8? data type?? precision? ?? ? ????.

????? FP16??? precision down-scale? Network? accuracy drop? ? ??? ??? ???, INT8?? down-scale? accuracy drop? ??? ? ??? Network? ????? ???? calibration ???? ?????. (Figure 5. ??) ?? ??? TensorRT???EntronpyCalibrator, EntropyCalibrator2 ??? MinMaxCalibrator? ???? ???, ?? ???? quantization? weight ? intermediate tensor?? ??? ??? ??? ? ? ????.

Graph Optimization

????? Graph Optimization? Deep Learning Network?? ???? primitive ?? ??, compound ?? ??? graph node?? ? platform? ???? code? ???? ??? ?????. TensorRT??? ?? ???? Layer Fusion ??? Tensor Fusion ??? ??? ???? ????. Figure 6? ?? Layer Fusion? Vertical Layer Fusion, Horizontal Layer Fusion ??? Tensor Fusion? ???? model graph? ??? ???? ?? ??? model? layer ??? ?? ???? ???.

Figure 6. Layer & Tensor Fusion and its results

Kernel Auto-tuning

?? ??? ?? ?? TensorRT? NVIDIA? ??? platform ? architecture? ?? Runtime ??? ?? ???. ? ???? CUDA engine? ??, architecture, memory ??? specialized engine ?? ??? ?? optimize? kernel ? ??? ??? ?? TensorRT Runtime engine build?? ????? ???? ??? engine binary ??? ????.

Dynamic Tensor Memory & Multi-stream execution

? ??? Memory management? ??? footprint? ?? ???? ? ? ??? ???? Dynamic tensor memory ??? ?? ??, CUDA stream ??? ???? multiple input stream? scheduling? ?? ?? ??? ??? ? ? ?? Multi-stream execution?? ?? ?????.

TensorRT performances

?? ??? TenosrRT? ???? ??? ?? ? ?? ?? ??? ????? ?????. ???? ResNet50 ???? ?? ??? GPU?? TensorRT? ???? ????? ?? 8? ??? ?? ?? ??? ????. (Figure 7. ??) ??? ?? ?? ?? ??? ??? ??? TensorRT Inference? Deep Learning ???? ???? ??? ??? ???? ??? ??? ? ??? ?? ?? ? ????. ?? ??? ?? ?? ??? ??? URL (http://www.open-lab.net/deep-learning-performance-training-Inference#deeplearningperformance_Inference) ?? NVIDIA platform ? Network?? benchmark?? ??? ?? ? ? ????.

Figure 7. ResNet50 Performance Benchmark

???? NVIDIA? Deep Learning Inference ??? ?? solution? TensorRT? ??? ???????. TensorRT? ??? Deep Learning Framework? ???? ?? training ? Neural Network?? ? domain? ?? NVIDIA? GPU ????? ????? Inference? ?? ?? Toolkit ?? library???. ?? ???? ??? ? ?? ??? ?? Deep Learning ?? ??? ?? ?? ??? ?? ??? ??? ???? ????. ???? ?? ?? ?? ? ?? ?? ?? ?? ??? NVIDIA?? ??? ??? ???? ???? ?? ??? ????.