
GPU memory usage #51

Open
sunyi000 opened this issue Jul 18, 2023 · 2 comments

sunyi000 commented Jul 18, 2023

We have NVIDIA GPUs with 12 GB of memory each.

If I allocate fewer than 3 GPUs to run the example mono_channel_3D.ipynb, I get a GPU out-of-memory error.
From nvidia-smi I can see that all 3 GPUs are fully utilized, hence the OOM errors.
The error comes from cell 5, 'Run segmentation', of the example notebook.

If I allocate 5 GPUs to the same notebook, it seems only the first 3 GPUs are used; the 4th and 5th are never touched, so I still get OOM errors.

Also, torch.cuda.empty_cache() does not seem to do anything. The only way to clear GPU memory is to close the Jupyter notebook and reopen it.

I'm not sure how to work around this issue or how to tweak the model.

I'd appreciate any help on this.
Thank you

@kevinjohncutler (Owner)

Hi @sunyi000, do you have 12 GB of memory per GPU, or in total? How large are your images? You can use the tile=True parameter to run inference on smaller sections of your images and avoid using too much memory. This is especially necessary when running 3D models on low-VRAM GPUs. The tyx parameter lets you set the size of each tile.
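
A minimal sketch of what that might look like (the import path, model name, file path, and tile size below are placeholders; tile and tyx are the parameters described above):

```python
from skimage import io
from cellpose_omni import models  # import path may differ between Omnipose versions

# Placeholder path: load your 3D image however the example notebook does.
img = io.imread('your_volume.tif')

# Hypothetical model choice: substitute whatever model the notebook actually loads.
model = models.CellposeModel(gpu=True, model_type='plant_omni')

masks, flows, styles = model.eval(
    img,              # your 3D image array
    tile=True,        # run inference on overlapping tiles rather than the whole volume
    tyx=(128, 128),   # tile size; smaller values lower peak GPU memory (values are a guess)
)
```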

Unfortunately, I have run into the same empty_cache() issue and do not know of a solution. The problem seems to arise once a memory error has occurred: empty_cache() can work before an error, but not after (from what I have observed). Restarting the kernel is indeed the only fix I know of.
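
For reference, the usual generic PyTorch cleanup pattern is sketched below (variable names are illustrative, nothing Omnipose-specific); as noted, in my experience it does not recover memory once an OOM has already happened, so restarting the kernel may still be required.

```python
import gc
import torch

# Drop references to large GPU-backed results first (names are illustrative).
del masks, flows, styles
gc.collect()              # let Python actually free the tensors
torch.cuda.empty_cache()  # then hand cached blocks back to the CUDA driver
```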

@sunyi000 (Author)

Thanks @kevinjohncutler
I have 12 GB per GPU, and I'm running the example notebook mono_channel_3D.ipynb.

I'm running this JupyterLab notebook inside a Singularity container. From the nvidia-smi output I can see that the notebook uses GPU 1 initially; when its memory is used up, it moves on to GPU 2 and GPU 3, but it never uses GPU 4 or 5.

The example notebook uses img02.png, which is just 130 KB, but it still uses a lot of memory. I don't know where the issue is.

Yi
