• <xmp id="om0om">
  • <table id="om0om"><noscript id="om0om"></noscript></table>
  • NVIDIA TensorRT – Inference ??? ? ???? ?? NVIDIA? Toolkit

    Reading Time: 4 minutes

    TensorRT? ??? Deep Learning ??? ????? NVIDIA GPU ???? Inference ??? ?? ~ ??? ?? ???? Deep Learning ??? TCO (Total Cost of Ownership) ?  ????? ??? ? ? ?? ?? ??? ?????. ? ?????? Deep Learning ??? ?? ? ??? ???? ?? ?? ??? ? ?????? ?? TensorRT? ??? ????, ?? ??? TensorRT ?? ??? ?? ???? ???? ??? ???? ???. TensorRT? project? ????? ????, ?? ? ????? ???? ??? ??? ??, NVIDIA? Deep Learning ?? ???? Solutions Architect ? Data Scientist?? ??? ????? ????.

    1. Introduction

    TensorRT? NVIDIA GPU ??? ??? ??? ???? ???? ??? ????? Optimizer? ??? GPU?? ????? ???? Runtime Engine? ?????. TensorRT? ???? Deep Learning Frameworks (TensorFlow, PyTorch ?) ?? ??? ??? ????, NVIDIA Datacenter, Automotive, Embedded ??? ? ???? NVIDIA GPU ???? ??? ???? ?? ????, ??? Deep Learning model Inference ??? ?????.

    TensorRT introduction
    Figure 1. TensorRT introduction
    1. TensorRT developments

    TensorRT? C++ ? Python ??? API ???? ???? ?? ???, GPU programming language? CUDA ??? ??? ???? Deep Learning ??? ????? ?? ??? ? ???, Figure 2.? ??? ?? ??? ? ????.

    TensorRT Workflow
    Figure 2. TensorRT Workflow

    TensorRT? GPU? ???? ?? ??? ??? ?? ??? ???? ??? ? ??? Runtime binary? ????? ???, Latency ? Throughput? ?? ???? ? ??, ?? ?? Deep Learning ?? ???? ? ???? ???? ??? ?????. ?? ?? ??? NVIDIA? Datacenter, Automotive, Embedded ?? ?? ???? ?? ??? Kernel? ???? ? ??, ? ????? ?? ??? ???? ?????. ??? ??? Deep Learning layer ? ??? ??? customization? ? ?? ???? ???? ?? ?? ????? ???? TensorRT? ??? ? ??? ??? ????. ?? ??? ??? ????? ?? ???? ???? ??? ?? ??? ?????.

    1. TensorRT optimizations

    TensorRT? NVIDIA platform?? ??? Inference ??? ? ? ??? Network compression, Network optimization ??? GPU ??? ???? ?? Deep Learning ??? ???? ?????. ????, TensorRT?? Inference ??? ?? ???? ???? ???? ??? ????? ?????.

     TensorRT optimizations
    Figure 3. TensorRT optimizations
    • Quantization & Precision Calibration

    ?? Deep Learning Training ? Inference??? Precision reduction ? ?? ???? ??? ?????. ?? Precision? Network? ?? data? ?? ? weight?? bit?? ?? ??? ? ??? ???? ??? ?????. ?? ?? Quantization ??? ?, TensorRT? Symmetric Linear Quantization(Figure 4.)? ???? ???, ?? ??? Deep Learning Framework? ???? FP32? data? FP16 ? INT8? data type?? precision? ?? ? ????.

    Figure 4. Symmetric Linear Quantization
    Figure 4. Symmetric Linear Quantization

    ????? FP16??? precision down-scale? Network? accuracy drop? ? ??? ??? ???, INT8?? down-scale? accuracy drop? ??? ? ??? Network? ????? ???? calibration ???? ?????. (Figure 5. ??) ?? ??? TensorRT???EntronpyCalibrator, EntropyCalibrator2 ??? MinMaxCalibrator? ???? ???, ?? ???? quantization? weight ? intermediate tensor?? ??? ??? ??? ? ? ????.

    Figure 5. Calibration methodology
    Figure 5. Calibration methodology
    • Graph Optimization

    ????? Graph Optimization? Deep Learning Network?? ???? primitive ?? ??, compound ?? ??? graph node?? ? platform? ???? code? ???? ??? ?????. TensorRT??? ?? ???? Layer Fusion ??? Tensor Fusion ??? ??? ???? ????. Figure 6? ?? Layer Fusion? Vertical Layer Fusion, Horizontal Layer Fusion ??? Tensor Fusion? ???? model graph? ??? ???? ?? ??? model? layer ??? ?? ???? ???.

    Figure 6. Layer & Tensor Fusion and its results
    Figure 6. Layer & Tensor Fusion and its results
    • Kernel Auto-tuning

    ?? ??? ?? ?? TensorRT? NVIDIA? ??? platform ? architecture? ?? Runtime ??? ?? ???. ? ???? CUDA engine? ??, architecture, memory ??? specialized engine ?? ??? ?? optimize? kernel ? ??? ??? ?? TensorRT Runtime engine build?? ????? ???? ??? engine binary ??? ????.

    • Dynamic Tensor Memory & Multi-stream execution

    ? ??? Memory management? ??? footprint? ?? ???? ? ? ??? ???? Dynamic tensor memory ??? ?? ??, CUDA stream ??? ???? multiple input stream? scheduling? ?? ?? ??? ??? ? ? ?? Multi-stream execution?? ?? ?????.

    1. TensorRT performances

    ?? ??? TenosrRT? ???? ??? ?? ? ?? ?? ??? ????? ?????. ???? ResNet50 ???? ?? ??? GPU?? TensorRT? ???? ????? ?? 8? ??? ?? ?? ??? ????. (Figure 7. ??) ??? ?? ?? ?? ??? ??? ??? TensorRT  Inference? Deep Learning ???? ???? ??? ??? ???? ??? ??? ? ??? ?? ?? ? ????. ?? ??? ?? ?? ??? ??? URL (http://www.open-lab.net/deep-learning-performance-training-Inference#deeplearningperformance_Inference) ?? NVIDIA platform ? Network?? benchmark?? ??? ?? ? ? ????.

    Figure 7. ResNet50 Performance Benchmark
    Figure 7. ResNet50 Performance Benchmark
    1. ??

    ???? NVIDIA? Deep Learning Inference ??? ?? solution? TensorRT? ??? ???????. TensorRT? ??? Deep Learning Framework? ???? ?? training ? Neural Network?? ? domain? ?? NVIDIA? GPU ????? ????? Inference? ?? ?? Toolkit ?? library???. ?? ???? ??? ? ?? ??? ?? Deep Learning ?? ??? ?? ?? ??? ?? ??? ??? ???? ????. ???? ?? ?? ?? ? ?? ?? ?? ?? ??? NVIDIA?? ??? ??? ???? ???? ?? ??? ????.

    Discuss (0)
    +2

    Tags

    ?? ???

    人人超碰97caoporen国产