
BF16 failure of TensorRT 10.0 when running trtexec on GPU RTX4090 #3778

Closed

bernardrb opened this issue Apr 5, 2024 · 4 comments
Assignees
Labels
triaged Issue has been triaged by maintainers

Comments

@bernardrb

Description

We are trying to set the precision to BF16 using trtexec:

trtexec --onnx=onnx/xl1_encoder.onnx --minShapes=input_image:1x3x1024x1024 --optShapes=input_image:4x3x1024x1024 --maxShapes=input_image:4x3x1024x1024 --bf16 --saveEngine=trtexec/xl1_encoder_bf16.engine --profilingVerbosity=detailed --dumpLayerInfo --dumpProfile --exportLayerInfo=trtexec/layer_info/xl1_encoder_bf16.json --exportProfile=trtexec/profile_info/xl1_encoder_bf16.json

We are using the nvcr.io/nvidia/tensorrt:24.02-py3 container, and to upgrade to the latest version of TensorRT we run:

pip install --extra-index-url https://pypi.nvidia.com tensorrt==10.0.0b6

However, this does not work: trtexec does not recognize the argument, which is confirmed by the --bf16 option not appearing in its help output.
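One way to make the mismatch visible (a hedged sketch; `trt_match` is a hypothetical helper, not part of TensorRT) is to compare the major version reported by the pip-installed Python package against the version the preinstalled trtexec binary prints in its banner:

```shell
# Hypothetical helper: pip install updates the Python bindings, but the
# trtexec binary baked into the 24.02-py3 container is untouched. Comparing
# major versions makes the mismatch obvious.
trt_match() {
  # succeed only when both version strings share the same major version
  local py_ver="$1" bin_ver="$2"
  [ "${py_ver%%.*}" = "${bin_ver%%.*}" ]
}

# pip reports 10.0.0b6, but the container still ships a TRT 8.6 trtexec:
trt_match "10.0.0b6" "8.6.1" && echo "versions match" || echo "MISMATCH: trtexec predates --bf16"
```

An older-major trtexec explains why `--bf16` is rejected even though `pip list` shows 10.0.0b6.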

On https://docs.nvidia.com/deeplearning/tensorrt/support-matrix/index.html#hardware-precision-matrix, our GPU (RTX 4090) is not included among the example devices. However, since it has the Ada Lovelace architecture, which like Hopper supports BF16, we assumed it would support BF16.

$ pip list
tensorrt                 10.0.0b6
tensorrt-bindings        9.3.0.post12.dev1
tensorrt-cu12            10.0.0b6
tensorrt-cu12_bindings   10.0.0b6
tensorrt-cu12_libs       10.0.0b6
tensorrt-libs            9.3.0.post11.dev1

Environment

Fri Apr  5 13:45:51 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:01:00.0 Off |                  Off |
| 40%   31C    P8              5W /  450W |      11MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+


@nvpohanh
Collaborator

nvpohanh commented Apr 5, 2024

I don't think pip install ... installs trtexec, so you are probably still using trtexec from TRT 8.6. Could you build the container with TRT 10.0.0 by following the steps here: https://github.com/NVIDIA/TensorRT?tab=readme-ov-file#setting-up-the-build-environment ?

@nvpohanh
Collaborator

nvpohanh commented Apr 5, 2024

Or get the new trtexec by downloading the tarball: https://github.com/NVIDIA/TensorRT/blob/release/10.0/docker/ubuntu-22.04.Dockerfile#L85-L99
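As a sketch of the tarball route (directory and file names below are assumptions; the linked Dockerfile lines have the exact URL and version), the tarball's prebuilt trtexec only needs to be put on PATH together with its shared libraries:

```shell
# setup_trt is a hypothetical helper; point it at wherever the TensorRT
# tarball was extracted (the exact directory name depends on the release).
setup_trt() {
  local trt_dir="$1"
  export PATH="${trt_dir}/bin:${PATH}"                          # prebuilt trtexec lives in bin/
  export LD_LIBRARY_PATH="${trt_dir}/lib:${LD_LIBRARY_PATH:-}"  # its shared libraries in lib/
}

# e.g. after extracting the tarball (file name here is illustrative):
#   tar -xzf TensorRT-10.0.*.Linux.x86_64-gnu.cuda-12.4.tar.gz
setup_trt "$PWD/TensorRT-10.0.0"
# `trtexec` should now resolve to the 10.0 binary, which accepts --bf16.
```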

@bernardrb
Author

> Or get the new trtexec by downloading the tarball: https://github.com/NVIDIA/TensorRT/blob/release/10.0/docker/ubuntu-22.04.Dockerfile#L85-L99

Thank you! This worked.

@zerollzeng zerollzeng added the triaged Issue has been triaged by maintainers label Apr 7, 2024
@nvpohanh
Collaborator

nvpohanh commented Apr 7, 2024

When TRT 10.0 GA is released, the NGC TensorRT container will also be updated to TRT 10.0.

@nvpohanh nvpohanh closed this as completed Apr 7, 2024