🐛 [BUG] Cannot restart run with different dataset #349
Comments
Hi @pablo-unzueta, thanks for your interest in our code! I'm not sure why this is happening, but you could also try adding
Thanks for your advice! I tried that, and I also tried
I couldn't figure out how to set global_scale=None in the config, so I just set global_scale_scale: 1.1e-6 so it wouldn't raise the ValueError for being below the threshold. Does this seem ok?
Yes, in principle if you just set it to some number, it will get overridden by the loaded state dict, but I'm still not totally sure why this is happening at all. If you do this, does it pass sanity checks? That is, are the starting validation and training losses the same as before if you restart with the same dataset?
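For what it's worth, one way to run that sanity check outside of training is to compare the stored rescale value before and after the restart. The sketch below is not NequIP's API, just a plain torch.load over the saved checkpoints; the file paths and the assumption that the RescaleOutput buffer key ends in scale_by (and may sit under a "state_dict" entry) are mine and may need adjusting to your run directories.

```python
# Rough sanity check (not NequIP's API): compare the rescale buffer stored in
# the original run's checkpoint with the one in the restarted run, to confirm
# the placeholder value from the config really was overridden by the loaded
# state dict. The paths and the "state_dict" / "scale_by" key names are
# assumptions about the saved files, not documented behavior.
import torch


def find_scale(path):
    ckpt = torch.load(path, map_location="cpu")
    state = ckpt.get("state_dict", ckpt)  # fall back to a bare state dict
    for key, value in state.items():
        if key.endswith("scale_by"):
            return key, value.reshape(-1)
    raise KeyError(f"no scale_by entry found in {path}")


old_key, old_scale = find_scale("original_run/last_model.pth")  # hypothetical path
new_key, new_scale = find_scale("restart_run/last_model.pth")   # hypothetical path
print(old_key, old_scale)
print(new_key, new_scale)
assert torch.allclose(old_scale, new_scale), "rescale value changed across the restart"
```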
Describe the bug
I am trying to restart a training instance using load_model_state or initialize_from_state. I keep receiving an error that scale_by in the checkpoint state_dict is a scalar (shape torch.Size([])) while in the new run it has shape torch.Size([1]):
RuntimeError: Error(s) in loading state_dict for RescaleOutput:
size mismatch for scale_by: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
I also tried load_model_state_strict: false, but that yielded the same error.
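The strict: false result is consistent with plain PyTorch behavior: load_state_dict's strict flag only relaxes missing or unexpected keys, so a shape mismatch on a key that exists in both state dicts still raises. A minimal, NequIP-independent reproduction of this class of error, plus the kind of offline reshape that would sidestep it, might look like the following (class names here are made up for illustration):

```python
# Minimal, NequIP-independent reproduction: a buffer saved as a 0-d scalar
# cannot be copied into a module that now registers it with shape [1], and
# strict=False does not help because strict only governs missing/unexpected
# keys, not size mismatches.
import torch
import torch.nn as nn


class OldRescale(nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer("scale_by", torch.tensor(2.5))    # shape torch.Size([])


class NewRescale(nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer("scale_by", torch.tensor([1.0]))  # shape torch.Size([1])


state = OldRescale().state_dict()

try:
    NewRescale().load_state_dict(state, strict=False)
except RuntimeError as err:
    print(err)  # size mismatch for scale_by: torch.Size([]) vs torch.Size([1])

# One possible offline workaround (untested against NequIP's checkpoints):
# reshape the offending entry to match the new model before loading.
state["scale_by"] = state["scale_by"].reshape(1)
NewRescale().load_state_dict(state)  # loads cleanly
```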
To Reproduce
Attached are the yaml files I used. I start training with energy_only.yaml. After training for some time, I want to restart with a different dataset using the restart.yaml file.
Expected behavior
Training should resume, as described in #343 or #297.
Environment (please complete the following information):
Additional context
configs.zip