
TensorRT


TensorRT is a set of runtime libraries that depends on the CUDA runtime. The Python binding is distributed as a wheel that can be installed with pip into a custom environment.

Installation

TensorRT 7.2.2.3 for Linux and CUDA 11.0

os="ubuntu1604"
tag="cuda11.0-trt7.2.2.3-ga-20201211"
sudo dpkg -i nv-tensorrt-repo-${os}-${tag}_1-1_amd64.deb
sudo apt-key add /var/nv-tensorrt-repo-${tag}/7fa2af80.pub
sudo apt-get update

# Installing the tensorrt meta-package pulls in everything, including CUDA 11, through dependencies
# sudo apt-get install tensorrt

# TensorRT runtime, development, and Python packages only
version="7.2.2-1+cuda11.0"
sudo apt-get install libnvinfer7=${version} libnvonnxparsers7=${version} \
    libnvparsers7=${version} libnvinfer-plugin7=${version} \
    libnvinfer-dev=${version} libnvonnxparsers-dev=${version} \
    libnvparsers-dev=${version} libnvinfer-plugin-dev=${version} \
    python-libnvinfer=${version} python3-libnvinfer=${version}

TensorRT 7.1.3 for Linux and CUDA 10.2

## TensorRT 7.1.3 using CUDA 10.2
version="7.1.3-1+cuda10.2"

sudo apt-get install libnvinfer7=${version} libnvonnxparsers7=${version} \
    libnvparsers7=${version} libnvinfer-plugin7=${version} \
    libnvinfer-dev=${version} libnvonnxparsers-dev=${version} \
    libnvparsers-dev=${version} libnvinfer-plugin-dev=${version} \
    python-libnvinfer=${version} python3-libnvinfer=${version}
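
After either install, the Python binding can be checked against the runtime version (a minimal sanity check):

import tensorrt as trt

# Should print the installed TensorRT version, e.g. 7.2.2.3 or 7.1.3.4,
# matching the libnvinfer packages installed above.
print(trt.__version__)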

Prerequisites

  • CUDA Toolkit
    • official from NVIDIA: drivers, toolchains, and libraries
    • conda distributions:
      • cudatoolkit: runtime libraries
      • cudatoolkit-dev: compiler toolchains
  • TensorRT
    • official distribution:
      • build tools (deb)
      • runtime libraries (deb)
      • Python bindings (zip)
        • tensorrt-7.1.3.4-cp37-none-linux_x86_64.whl for custom env
  • CuDNN
    • conda install cudnn=7.6.5
  • PyCUDA
    • pip install pycuda
  • torch2trt
    • pip install --install-option='--plugins' git+https://github.com/NVIDIA-AI-IOT/torch2trt.git@b0cc8e77a0fbd61e96b971a66bbc11326f77c6b5
To set up the official deb repository for TensorRT 7.1.3.4 with CUDA 10.2 and install everything through it:

os="ubuntu1604"
tag="cuda10.2-trt7.1.3.4-ga-20200617"
sudo dpkg -i nv-tensorrt-repo-${os}-${tag}_1-1_amd64.deb

sudo apt-get update
sudo apt-get install tensorrt libcudnn8 cuda-nvrtc-10-2
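
With the prerequisites in place, a minimal torch2trt conversion sketch; resnet18 from torchvision is only a placeholder model:

import torch
from torch2trt import torch2trt
from torchvision.models import resnet18

model = resnet18(pretrained=True).cuda().eval()
x = torch.randn(1, 3, 224, 224).cuda()  # example input used to trace the model

# torch2trt builds a TensorRT engine and wraps it in a torch.nn.Module
model_trt = torch2trt(model, [x])

# The converted module should agree with the original within fp32 tolerance
print(torch.max(torch.abs(model(x) - model_trt(x))))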

Inference Engine

Runtime considerations to configure at engine build time (see the build sketch after this list):

  • batch size
  • workspace size
  • mixed precision
  • bounds on dynamic shapes
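
A minimal Python build sketch covering these settings, assuming an ONNX model model.onnx whose input tensor is named input (both names are placeholders):

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
# Explicit batch is required for ONNX models; the batch size is then
# bounded by the optimization profile instead of builder.max_batch_size.
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.max_workspace_size = 1 << 30              # workspace size (1 GiB)
if builder.platform_has_fast_fp16:
    config.set_flag(trt.BuilderFlag.FP16)        # mixed precision

profile = builder.create_optimization_profile()  # bounds on dynamic shapes
profile.set_shape("input", (1, 3, 224, 224),     # min
                           (8, 3, 224, 224),     # opt
                           (32, 3, 224, 224))    # max
config.add_optimization_profile(profile)

engine = builder.build_engine(network, config)

# The engine is specific to this GPU model and TensorRT version, so
# serialize per deployment target and rebuild when either changes.
with open("model.engine", "wb") as f:
    f.write(engine.serialize())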

Optimizations:

  • Elimination of layers whose outputs are not used
  • Elimination of operations that are equivalent to no-ops
  • The fusion of convolution, bias and ReLU operations
  • Aggregation of operations with sufficiently similar parameters and the same source tensor
  • Merging of concatenation layers by directing layer outputs to the correct eventual destination
  • Modification of the precision of weights if necessary with calibration to determine the dynamic range of intermediate activations, and hence the appropriate scaling factors for quantization

The optimized engine is specific to:

  • target platform
  • TensorRT version
  • exact GPU model

Optimization Considerations

  • TensorRT allows you to increase the GPU memory footprint available during the engine building phase with the setMaxWorkspaceSize function (max_workspace_size in Python, as in the build sketch above). Increasing the limit may reduce the number of applications that can share the GPU at the same time, while setting it too low may filter out several algorithms and create a suboptimal engine.
  • In general, CUDA streams are a way of organizing asynchronous work. Asynchronous commands put into a stream are guaranteed to run in sequence but may execute out of order with respect to other streams. In particular, asynchronous commands in two streams may be scheduled to run concurrently (subject to hardware limitations). In the context of TensorRT inference, each layer of the optimized network requires work on the GPU, but not all layers can fully utilize the computation capability of the hardware. Scheduling requests in separate streams allows work to run as soon as the hardware becomes available, without unnecessary synchronization. Even if only some layers can be overlapped, overall performance improves; see the multi-stream sketch below.
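
A multi-stream sketch with pycuda, assuming a serialized classifier engine model.engine with a single (1, 3, 224, 224) float32 input binding and a (1, 1000) float32 output binding; all names and shapes here are assumptions:

import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("model.engine", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())

def launch(batch):
    """Enqueue one request on its own stream without synchronizing."""
    context = engine.create_execution_context()  # contexts share engine weights
    stream = cuda.Stream()
    out = np.empty((1, 1000), dtype=np.float32)
    d_in = cuda.mem_alloc(batch.nbytes)
    d_out = cuda.mem_alloc(out.nbytes)
    cuda.memcpy_htod_async(d_in, batch, stream)
    context.execute_async_v2(bindings=[int(d_in), int(d_out)],
                             stream_handle=stream.handle)
    cuda.memcpy_dtoh_async(out, d_out, stream)
    # Keep the context and device buffers alive until the stream completes,
    # since pycuda frees allocations on garbage collection.
    return {"stream": stream, "out": out, "keepalive": (context, d_in, d_out)}

a = np.random.rand(1, 3, 224, 224).astype(np.float32)
b = np.random.rand(1, 3, 224, 224).astype(np.float32)

# Both requests are enqueued back to back, so the GPU may overlap their
# layers across the two streams when resources allow.
reqs = [launch(a), launch(b)]
for r in reqs:
    r["stream"].synchronize()
print(reqs[0]["out"].argmax(), reqs[1]["out"].argmax())

Each request gets its own execution context because a single context cannot safely run concurrent enqueues; multiple contexts can share one engine.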

CUDA Binding and Issues

If a CUDNN_STATUS_MAPPING_ERROR occurs, try initializing CUDA with pycuda before importing PyTorch:

import pycuda.autoinit  # create the CUDA context before PyTorch initializes the device
import torch

Alternatively, manage host/CUDA memory with PyTorch only, as shown by torch2trt.
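
A sketch of that alternative, where PyTorch owns all device memory and TensorRT only borrows the tensors' pointers; the engine path, binding order, and shapes are assumptions:

import torch
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("model.engine", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# PyTorch allocates and frees the buffers; TensorRT only reads and writes
# through the raw device pointers.
inp = torch.randn(1, 3, 224, 224, device="cuda")
out = torch.empty(1, 1000, device="cuda")

context.execute_async_v2(
    bindings=[inp.data_ptr(), out.data_ptr()],
    stream_handle=torch.cuda.current_stream().cuda_stream)
torch.cuda.current_stream().synchronize()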
