
BF16 failure of TensorRT 10.0 when running trtexec on GPU RTX4090 #3778

Closed

bernardrb opened this issue Apr 5, 2024 · 4 comments
Assignees
Labels
triaged Issue has been triaged by maintainers

Comments

@bernardrb

Description

We are trying to set the precision to BF16 using trtexec:

trtexec --onnx=onnx/xl1_encoder.onnx --minShapes=input_image:1x3x1024x1024 --optShapes=input_image:4x3x1024x1024 --maxShapes=input_image:4x3x1024x1024 --bf16 --saveEngine=trtexec/xl1_encoder_bf16.engine --profilingVerbosity=detailed --dumpLayerInfo --dumpProfile --exportLayerInfo=trtexec/layer_info/xl1_encoder_bf16.json --exportProfile=trtexec/profile_info/xl1_encoder_bf16.json

We are using the nvcr.io/nvidia/tensorrt:24.02-py3 container, and to upgrade to the latest version of TensorRT we run:

pip install --extra-index-url https://pypi.nvidia.com tensorrt==10.0.0b6

However, this does not work: trtexec does not recognize the argument, which is confirmed by the --bf16 option not appearing in its help output.
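One way to make the mismatch visible (a hedged sketch; `trt_match` is a hypothetical helper, not part of TensorRT) is to compare the major version reported by the pip-installed Python package against the version the preinstalled trtexec binary prints in its banner:

```shell
# Hypothetical helper: pip install updates the Python bindings, but the
# trtexec binary baked into the 24.02-py3 container is untouched. Comparing
# major versions makes the mismatch obvious.
trt_match() {
  # succeed only when both version strings share the same major version
  local py_ver="$1" bin_ver="$2"
  [ "${py_ver%%.*}" = "${bin_ver%%.*}" ]
}

# pip reports 10.0.0b6, but the container still ships a TRT 8.6 trtexec:
trt_match "10.0.0b6" "8.6.1" && echo "versions match" || echo "MISMATCH: trtexec predates --bf16"
```

An older-major trtexec explains why `--bf16` is rejected even though `pip list` shows 10.0.0b6.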

On https://docs.nvidia.com/deeplearning/tensorrt/support-matrix/index.html#hardware-precision-matrix, our GPU (RTX 4090) is not included among the example devices. However, since it has the Ada Lovelace architecture, which like Hopper supports BF16, we assumed it would support BF16.

$ pip list
tensorrt                 10.0.0b6
tensorrt-bindings        9.3.0.post12.dev1
tensorrt-cu12            10.0.0b6
tensorrt-cu12_bindings   10.0.0b6
tensorrt-cu12_libs       10.0.0b6
tensorrt-libs            9.3.0.post11.dev1

Environment

Fri Apr  5 13:45:51 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:01:00.0 Off |                  Off |
| 40%   31C    P8              5W /  450W |      11MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+


@nvpohanh
Collaborator

nvpohanh commented Apr 5, 2024

I don't think pip install ... installs trtexec, so you are probably still using trtexec from TRT 8.6. Could you build the container with TRT 10.0.0 by following the steps here: https://github.com/NVIDIA/TensorRT?tab=readme-ov-file#setting-up-the-build-environment ?

@nvpohanh
Collaborator

nvpohanh commented Apr 5, 2024

Or get the new trtexec by downloading the tarball: https://github.com/NVIDIA/TensorRT/blob/release/10.0/docker/ubuntu-22.04.Dockerfile#L85-L99
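As a sketch of the tarball route (directory and file names below are assumptions; the linked Dockerfile lines have the exact URL and version), the tarball's prebuilt trtexec only needs to be put on PATH together with its shared libraries:

```shell
# setup_trt is a hypothetical helper; point it at wherever the TensorRT
# tarball was extracted (the exact directory name depends on the release).
setup_trt() {
  local trt_dir="$1"
  export PATH="${trt_dir}/bin:${PATH}"                          # prebuilt trtexec lives in bin/
  export LD_LIBRARY_PATH="${trt_dir}/lib:${LD_LIBRARY_PATH:-}"  # its shared libraries in lib/
}

# e.g. after extracting the tarball (file name here is illustrative):
#   tar -xzf TensorRT-10.0.*.Linux.x86_64-gnu.cuda-12.4.tar.gz
setup_trt "$PWD/TensorRT-10.0.0"
# `trtexec` should now resolve to the 10.0 binary, which accepts --bf16.
```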

@bernardrb
Author

> Or get the new trtexec by downloading the tarball: https://github.com/NVIDIA/TensorRT/blob/release/10.0/docker/ubuntu-22.04.Dockerfile#L85-L99

Thank you! This worked.

@zerollzeng zerollzeng added the triaged Issue has been triaged by maintainers label Apr 7, 2024
@nvpohanh
Collaborator

nvpohanh commented Apr 7, 2024

When TRT 10.0 GA is released, the NGC TensorRT container will also be updated to TRT 10.0.

@nvpohanh nvpohanh closed this as completed Apr 7, 2024