
Out of memory failure of TensorRT 10.5 when running Flux DiT on GPU L40S #4214

Open

QZH-eng opened this issue Oct 21, 2024 · 5 comments

Labels: Demo: Diffusion (Issues regarding demoDiffusion), triaged (Issue has been triaged by maintainers)

Comments


QZH-eng commented Oct 21, 2024

Description

I tried to convert the Flux DiT model on an L40S with TensorRT 10.5 and found that peak GPU memory usage exceeded 46068 MiB during conversion, while only 23597 MiB of GPU memory is occupied during inference. Is this normal? If so, what measures can be taken to reduce GPU memory usage during model conversion so that Flux TensorRT inference can run normally on the L40S?

[10/17/2024-11:07:02] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 22681 MiB, GPU 49917 MiB

Environment

TensorRT Version: 10.5

NVIDIA GPU: L40S

NVIDIA Driver Version: 535.129.03

CUDA Version: 12.2

CUDNN Version:

Operating System:

Python Version (if applicable):

Tensorflow Version (if applicable):

PyTorch Version (if applicable):

Baremetal or Container (if so, version):

Relevant Files

Model link:

Steps To Reproduce

Commands or scripts:

Have you tried the latest release?:

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):

@algorithmconquer

@QZH-eng I also encountered this OOM (Out of Memory) issue on an L40S.

@algorithmconquer

@QZH-eng I encountered an OOM (Out of Memory) error during inference. Specifically, it occurs when executing engine_from_bytes(bytes_from_path(self.engine_path)). Can you share your code?
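For reference, a minimal sketch of that deserialization step, assuming the polygraphy TensorRT backend; the engine path is a placeholder, not the actual file from this thread:

```python
# Minimal sketch of the engine-deserialization step quoted above, assuming
# the polygraphy TensorRT backend; the engine path is a placeholder.
from polygraphy.backend.common import bytes_from_path
from polygraphy.backend.trt import engine_from_bytes

engine_path = "flux_transformer.plan"  # placeholder path to the serialized plan

# Deserializing the plan allocates device memory for the engine's weights,
# so a large BF16 Flux engine can exhaust a 48 GB L40S on its own when other
# pipeline components (CLIP/T5/VAE) are already resident on the same GPU.
engine = engine_from_bytes(bytes_from_path(engine_path))
```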

QZH-eng (Author) commented Oct 21, 2024

> @QZH-eng I encountered an OOM (Out of Memory) error during inference. Specifically, it occurs when executing engine_from_bytes(bytes_from_path(self.engine_path)). Can you share your code?

I ran out of GPU memory during model conversion on the L40S, when executing trtexec on the command line to convert the ONNX model to a BF16 plan.
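For illustration, a hedged sketch of that kind of conversion, driven from Python; the file names are placeholders and the flags shown (--onnx, --saveEngine, --bf16, --memPoolSize) are standard trtexec options, not necessarily the exact command used here:

```python
# Hedged sketch of an ONNX -> BF16 plan conversion via trtexec, driven from
# Python. File names are placeholders; this is not necessarily the exact
# command that produced the OOM above.
import subprocess

cmd = [
    "trtexec",
    "--onnx=flux_transformer.onnx",        # placeholder ONNX export of the Flux DiT
    "--saveEngine=flux_transformer.plan",  # serialized engine output
    "--bf16",                              # allow BF16 kernels in addition to FP32
    "--memPoolSize=workspace:8192",        # cap the builder's scratch pool (MiB);
                                           # this bounds tactic scratch memory but
                                           # not the memory needed to hold weights
]
subprocess.run(cmd, check=True)
```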

@yuanyao-nv (Collaborator)

You can try quantization with modelopt to reduce the engine size.
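A hedged sketch of that route using nvidia-modelopt's PyTorch quantization API; the model handle, calibration loop, and FP8 config choice are illustrative assumptions, not the exact demoDiffusion recipe:

```python
# Hedged sketch of quantizing the Flux transformer with nvidia-modelopt
# (modelopt.torch.quantization). The model handle, calibration loop, and the
# FP8 config choice are illustrative assumptions, not the demoDiffusion recipe.
import torch
import modelopt.torch.quantization as mtq

def quantize_flux(flux_transformer: torch.nn.Module) -> torch.nn.Module:
    def forward_loop(model: torch.nn.Module) -> None:
        # Run a few representative denoising steps here so activation ranges
        # can be calibrated; omitted for brevity.
        pass

    # Quantized weights/activations shrink the exported ONNX and the built
    # engine relative to a BF16 build.
    return mtq.quantize(flux_transformer, mtq.FP8_DEFAULT_CFG, forward_loop)
```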

yuanyao-nv added the triaged (Issue has been triaged by maintainers) and Demo: Diffusion (Issues regarding demoDiffusion) labels on Oct 31, 2024
@asfiyab-nvidia (Collaborator)

@QZH-eng The Flux demo should now run on L40S, as we have added memory optimizations in release/10.6. Can you please try again and update here?
