Is there any GPU memory optimization in TensorRT 8.6.1.6 compared to 8.4.0.6? #3743

Closed
Jsy0220 opened this issue Mar 27, 2024 · 3 comments
Labels: triaged (Issue has been triaged by maintainers)

Jsy0220 commented Mar 27, 2024

Hi, I tested the same model on the same machine with TensorRT 8.6.1.6 and 8.4.0.6, using the following steps:

  1. Use bin/trtexec to convert the ONNX model to an engine file with 8.6 and 8.4 respectively, using the same command: bin/trtexec --onnx=xxxx.onnx --saveEngine=xxx.engine (this model has fixed input shapes).
  2. Load the engine in the same C++ code and run it (see the sketch after this list).
  3. Check GPU memory usage with nvidia-smi.
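
For reference, step 2 is roughly equivalent to the following minimal sketch (TensorRT 8.x C++ API); the engine path, buffer sizes and error handling are simplified placeholders, not the exact code used here:

#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <fstream>
#include <iostream>
#include <vector>

// Minimal logger required by the TensorRT runtime.
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::cout << msg << std::endl;
    }
} gLogger;

int main() {
    // Read the serialized engine produced by trtexec.
    std::ifstream file("xxx.engine", std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                           std::istreambuf_iterator<char>());

    auto* runtime = nvinfer1::createInferRuntime(gLogger);
    auto* engine  = runtime->deserializeCudaEngine(blob.data(), blob.size());

    // createExecutionContext() allocates the activation (scratch) memory
    // internally; this allocation is part of what nvidia-smi reports.
    auto* context = engine->createExecutionContext();

    // Allocate a device buffer per binding (sizes are model-specific;
    // 1 MiB is a placeholder).
    std::vector<void*> bindings(engine->getNbBindings(), nullptr);
    for (int i = 0; i < engine->getNbBindings(); ++i) {
        cudaMalloc(&bindings[i], 1 << 20);
    }

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    context->enqueueV2(bindings.data(), stream, nullptr);
    cudaStreamSynchronize(stream);

    // Cleanup omitted for brevity.
    return 0;
}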

The following are the results
8.6

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   66C    P0    63W /  70W |    283MiB / 15360MiB |     61%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

8.4

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   69C    P0    73W /  70W |    853MiB / 15360MiB |     59%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

As can be seen above, GPU memory consumption is significantly reduced with 8.6 (283 MiB vs. 853 MiB).

So:

  1. Is there any GPU memory optimization in TensorRT 8.6.1.6 compared to 8.4.0.6?
  2. If not, is there any special parameter that should be set when using bin/trtexec in 8.4 but that is the default in 8.6?

Thanks !!!

zerollzeng (Collaborator) commented

We do keep optimizing build-time memory consumption across versions.

zerollzeng self-assigned this Mar 28, 2024
zerollzeng added the triaged label Mar 28, 2024
zerollzeng (Collaborator) commented

Closing this, feel free to reopen if you have any further questions.

Jsy0220 (Author) commented May 28, 2024

Hi, I have a further question about the difference between createExecutionContext and createExecutionContextWithoutDeviceMemory. Is there any GPU memory difference when using the corresponding context?
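
For reference, my rough understanding of the two APIs (a sketch based on the TensorRT 8.x headers, not the exact code in use here) is:

#include <NvInfer.h>
#include <cuda_runtime_api.h>

void makeContexts(nvinfer1::ICudaEngine* engine) {
    // Option A: TensorRT allocates the activation/scratch memory for the
    // context internally when it is created.
    nvinfer1::IExecutionContext* ctxA = engine->createExecutionContext();

    // Option B: TensorRT allocates no activation memory; the caller must
    // provide a buffer of at least getDeviceMemorySize() bytes before
    // enqueueing, e.g. one buffer shared by several contexts that never
    // run concurrently.
    nvinfer1::IExecutionContext* ctxB =
        engine->createExecutionContextWithoutDeviceMemory();
    void* scratch = nullptr;
    cudaMalloc(&scratch, engine->getDeviceMemorySize());
    ctxB->setDeviceMemory(scratch);

    // ... run inference, then free the scratch buffer and both contexts.
}

In other words, is the total GPU usage the same and just owned by the caller in the second case, or can it actually be reduced (for example by sharing one buffer across contexts)?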
