Trainer does not release all CUDA memory #567
Comments
Currently getting the same issue.
Did you find a solution in the meantime, @lopozz?
Hi, this is likely a problem in sentence-transformers. I started to collect information there and linked it from a few other issues; this might give you some pointers: UKPLab/sentence-transformers#1793. As far as I know, it is still unsolved.
Thank you very much for your response, @chschroeder! Since this issue is still unsolved, I tried exploring alternative approaches to address it. The main reason I encountered this error is that I need to initialize the model anew at every iteration of my loop, which keeps allocating CUDA memory. To work around this, I modified my approach: instead of re-creating the model each time, I reset its parameters in place and reuse the same instance.

Here's the snippet for the `reset_parameters` function:

```python
import torch

def reset_parameters(model):
    def reset_model_body(model_body):
        if model_body is not None:
            def init_weights(module):
                # Prefer the module's own reset logic when it exists.
                if hasattr(module, 'reset_parameters'):
                    module.reset_parameters()
                    print("Model body parameters reset using `reset_parameters` function.")
                # Fall back to Xavier init for linear/embedding layers.
                elif isinstance(module, (torch.nn.Linear, torch.nn.Embedding)):
                    torch.nn.init.xavier_uniform_(module.weight)
                    if module.bias is not None:
                        torch.nn.init.zeros_(module.bias)
                    print("Model body parameters reset using Xavier initialization and zero bias.")

            model_body.apply(init_weights)
            print("Model body parameters have been successfully reset.")

    if model.model_body is not None:
        reset_model_body(model.model_body)

    if hasattr(model.model_head, 'apply'):
        model.model_head.apply(model.model_head._init_weight)
        print("Model head parameters reset using `_init_weight` function.")
    if hasattr(model.model_head.linear, 'reset_parameters'):
        model.model_head.linear.reset_parameters()
        print("Model head linear parameters reset using `reset_parameters` function.")

# Use this inside the loop
reset_parameters(model)
```

With that approach, I call `reset_parameters(model)` at the start of each iteration instead of re-instantiating the model. Using this approach, I managed to avoid the CUDA memory issue for now. I hope this helps anyone facing a similar challenge!
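To make the usage concrete, here is a minimal sketch of how this fits into a k-fold loop; `examples` and `build_trainer` are placeholders for whatever data and trainer-construction code you use per fold, not names from this issue:

```python
from sklearn.model_selection import KFold

# Reuse one model instance across folds; re-randomize it in place
# instead of constructing a new SetFitModel on every iteration.
kf = KFold(n_splits=5)
for fold, (train_idx, val_idx) in enumerate(kf.split(examples)):
    reset_parameters(model)
    trainer = build_trainer(model, train_idx, val_idx)  # hypothetical helper
    trainer.train()
    print(fold, trainer.evaluate())
```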
Interesting, thanks for the feedback. I was wondering, though: why should resetting the weights free any memory? This means you suspect gradients to be the cause of the memory increase, right?
Yup, I believe gradients could be one of the possible causes, but I don't rule out other possibilities either.
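If gradients are indeed the culprit, one thing worth trying is to drop them explicitly before cleanup and then inspect the allocator. This is a sketch, not verified against this exact setup; it assumes `model.model_body` is the underlying torch module, as in the snippet above:

```python
import gc
import torch

# Gradients live in .grad tensors on each parameter; zero_grad with
# set_to_none=True drops those tensors so their CUDA memory can be freed.
model.model_body.zero_grad(set_to_none=True)

gc.collect()
torch.cuda.empty_cache()

# memory_summary() breaks usage down into allocated vs. reserved, which
# helps distinguish live tensors from the caching allocator's pool.
print(torch.cuda.memory_summary())
```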
I am currently trying to run a k-fold training loop. At the end of each iteration I free memory using `gc.collect()` and `torch.cuda.empty_cache()` (roughly as in the sketch below), but that does not seem to do the job completely. I leave the code here, along with my setup:
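A minimal sketch of that per-fold cleanup, assuming a trainer object is built per fold (`make_trainer` and `n_splits` are placeholders, not the actual code from this issue):

```python
import gc
import torch

for fold in range(n_splits):
    trainer = make_trainer(fold)  # hypothetical helper building the trainer
    trainer.train()

    # empty_cache() can only return blocks whose tensors are no longer
    # referenced anywhere in Python, so drop the trainer (and with it the
    # optimizer state) before collecting.
    del trainer
    gc.collect()
    torch.cuda.empty_cache()
```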
I also include the memory readings printed at each iteration:
```
Memory allocated: 279.685546875
Memory reserved: 596.0
Memory allocated: 279.685546875
Memory reserved: 342.0
Memory allocated: 411.4501953125
Memory reserved: 738.0
Memory allocated: 411.4501953125
Memory reserved: 484.0
Memory allocated: 542.93359375
Memory reserved: 876.0
Memory allocated: 542.93359375
Memory reserved: 626.0
Memory allocated: 674.4638671875
Memory reserved: 1052.0
Memory allocated: 674.4638671875
Memory reserved: 780.0
```
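These figures look like megabytes, i.e. the raw byte counts divided by `1024**2`; a sketch of a logger that would produce output in this shape:

```python
import torch

def log_cuda_memory():
    # memory_allocated(): bytes currently occupied by live tensors.
    # memory_reserved(): bytes held by PyTorch's caching allocator,
    # including freed blocks kept for reuse, so it is >= allocated.
    print(f"Memory allocated: {torch.cuda.memory_allocated() / 1024**2}")
    print(f"Memory reserved: {torch.cuda.memory_reserved() / 1024**2}")
```

Notably, the allocated figure climbs by roughly 130 MB per fold and never drops back, which points at live tensors (for example a model's parameters) surviving each iteration, rather than at memory merely cached by the allocator.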
Could anyone suggest the reason?