Save model weights for each epoch 1720 #1921
This is a simple and elegant solution to this problem. Really appreciate the high-quality PR! Some minor changes I would suggest are:
Additionally, if we want to make this field usable in the RV-as-a-framework context, we would also want to add it to |
Adds `save_all_checkpoints` to `PyTorchlearnerBackendConfig`
…l_checkpoints` to `LearnerConfig`
Looks like there are linter errors. You can run If you are using VS Code, you can install the |
Codecov Report
@@            Coverage Diff             @@
##           master    #1921      +/-   ##
==========================================
- Coverage   82.49%   82.47%   -0.02%
==========================================
  Files         190      190
  Lines        9316     9325       +9
==========================================
+ Hits         7685     7691       +6
- Misses       1631     1634       +3
This looks good to me. Thank you!
Thanks for your help.
* Save model's weights for each epoch azavea#1720
* Fixes azavea#1720 (azavea#1921 (comment) - first point) Adds `save_all_checkpoints` to `LearnerConfig`
* Fixes azavea#1720 (azavea#1921 (comment) (comment) - second point) Adds `save_all_checkpoints` to `PyTorchlearnerBackendConfig`
* Formats code (yapf) azavea#1720
Overview
Save model weights for each epoch.
Notes
From #1720, proposition 1 is followed. The last checkpoint is still kept as `last-model.pth`. Instead of being overwritten, the previous checkpoint is copied to `model-ckpt-epoch-{N}.pth`, where `N` is the epoch number.
Testing Instructions
`Learner` is extended. The option `save_all_checkpoints` can be added as in
Closes #1720
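The checkpoint-retention behavior described in the Notes can be sketched roughly as follows. This is a minimal, standalone illustration using only the standard library, not the code from this PR: the function name `save_epoch_weights` and the `write_weights` callback are hypothetical, and it assumes `N` refers to the epoch that produced the weights being preserved.

```python
import shutil
from pathlib import Path
from typing import Callable


def save_epoch_weights(output_dir: str,
                       epoch: int,
                       write_weights: Callable[[Path], None],
                       save_all_checkpoints: bool = False) -> Path:
    """Write the current weights to last-model.pth.

    If save_all_checkpoints is set, first copy the existing
    last-model.pth to model-ckpt-epoch-{N}.pth so it is not lost.
    Hypothetical helper; names and signature are illustrative only.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    last_path = out / 'last-model.pth'

    if save_all_checkpoints and epoch > 0 and last_path.exists():
        # Assumption: N is the epoch whose weights are being preserved,
        # i.e. the epoch before the one about to be written.
        shutil.copy(last_path, out / f'model-ckpt-epoch-{epoch - 1}.pth')

    # The caller decides how weights are serialized (for example, a
    # function that calls torch.save on a state dict).
    write_weights(last_path)
    return last_path
```

With `save_all_checkpoints=False` (the assumed default in this sketch), only `last-model.pth` remains after training, which matches the behavior before this change.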