Modify training script to account for stored metrics when resuming training #2286
Unanswered
CCanchilaM
asked this question in
Ideas
Replies: 0 comments
Problem
When resuming training from a checkpoint, the metric from the previously saved best model is ignored. `CheckpointSaver` is initialized with `self.best_epoch = None` and `self.best_metric = None`, completely ignoring previously saved results. The provided training script (`train.py`) does not provide an option to read these metrics.

I'm not sure if people are aware of this issue; at least I didn't realize it until I restarted a training run and the new best model had lower accuracy than previously reported.
Proposed solution
This can be easily solved with something like:
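A minimal sketch of the idea, assuming the loaded checkpoint dict stores its validation metric under a `metric` key and its epoch under an `epoch` key. The helper name `restore_best_from_checkpoint` and the `SimpleNamespace` stand-in for `CheckpointSaver` are hypothetical and only illustrate the approach; they are not part of the library:

```python
from types import SimpleNamespace


def restore_best_from_checkpoint(saver, checkpoint):
    """Seed saver.best_metric / saver.best_epoch from a loaded checkpoint dict.

    Assumption: the checkpoint stores its validation metric under "metric"
    and its epoch under "epoch". Returns True if the saver was updated.
    """
    metric = checkpoint.get("metric")
    if metric is None:
        # Nothing stored in the checkpoint; leave the saver untouched.
        return False
    saver.best_metric = metric
    saver.best_epoch = checkpoint.get("epoch")
    return True


# Stand-in for CheckpointSaver, which starts with both fields set to None.
saver = SimpleNamespace(best_metric=None, best_epoch=None)

# In a real script this dict would come from loading the checkpoint file,
# e.g. torch.load(path, map_location="cpu"). The values here are made up.
checkpoint = {"epoch": 41, "metric": 78.3}

restore_best_from_checkpoint(saver, checkpoint)
print(saver.best_epoch, saver.best_metric)
```

In `train.py` this would presumably run right after the saver is created and only when resuming. Note that the metric should come from the previously saved best checkpoint rather than the most recent one, since the two can differ.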
Please share if there is a better way to do this or if I missed something 😅