Unable to use `records_per_epoch` for scheduling? 🤔 [question] #6869
Comments
`records_per_epoch` was deprecated for PyTorch trials in 0.21.0. The doc you listed needs to be updated to reflect this, and I will make a ticket to do that. Previously, this value was only necessary because our code made it impossible to determine the length of the dataset at the time we initialized the trial. After some refactoring, we no longer have this lifecycle issue, so we opted not to require users to provide this value. You are correct that we now use the chief worker's dataset length to determine the epoch length.
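To make the current behavior concrete, here is a minimal plain-PyTorch sketch (dataset size and batch size are made-up): the number of batches the training dataloader yields is what now counts as one epoch, rather than anything derived from `records_per_epoch`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical dataset: 10,000 records with 8 features each.
dataset = TensorDataset(torch.randn(10_000, 8), torch.randint(0, 2, (10_000,)))
loader = DataLoader(dataset, batch_size=64)

# Recent Determined versions take the epoch length from the dataloader
# built on the chief worker, i.e. len(loader) batches per epoch.
print(len(loader))  # ceil(10_000 / 64) == 157 batches
```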
This still seems undesirable, though, because as you mention in the docs, the length of a dataset/dataloader can vary depending on your augmentation or sampling strategy.
Hi @charles-viss, can you expand on your use case a bit more? Is it that you have a fixed dataset, but then run some augmentation so that the size of the dataset is effectively expanded? If so, is it safe to assume that the size of the expanded dataset is the same for every epoch, or will that also vary? We're thinking about how we might best address your scenario, but would like to make sure we understand the situation precisely first.
Also, which version of Determined are you using?
For example, one use case is training over a fixed dataset with or without category-weighted sampling. Because custom data samplers change the size of the dataloader, I've used the `records_per_epoch` field to keep the epoch length consistent across experiments.
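To illustrate why that matters (a sketch with made-up sizes and class counts): with a category-weighted sampler such as PyTorch's `WeightedRandomSampler`, the dataloader length is governed by `num_samples` rather than by the dataset size, so the effective epoch length moves with the sampling strategy; conversely, pinning `num_samples` is one way to hold it fixed.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Hypothetical imbalanced dataset: 9,000 records of class 0, 1,000 of class 1.
labels = torch.cat([torch.zeros(9_000), torch.ones(1_000)]).long()
dataset = TensorDataset(torch.randn(10_000, 8), labels)

# Inverse-frequency weights per record for category-weighted sampling.
class_counts = torch.bincount(labels).float()
weights = 1.0 / class_counts[labels]

# num_samples controls how many records one pass over the loader yields,
# so the "epoch" length changes with the sampling strategy...
sampler = WeightedRandomSampler(weights, num_samples=4_000, replacement=True)
loader = DataLoader(dataset, batch_size=64, sampler=sampler)
print(len(loader))  # ceil(4_000 / 64) == 63 batches, not ceil(10_000 / 64)

# ...though pinning num_samples to a fixed value is one way to keep the
# epoch length constant across experiments without records_per_epoch.
```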
One helpful feature could be to have the length of an epoch defined by the length of the dataloader by default, but allow that to be overridden if a `records_per_epoch` value is specified in the config.
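Purely as an illustration of that suggestion, not of any existing Determined behavior, the override could amount to something like the following, where `records_per_epoch` and `global_batch_size` are assumed to come from the experiment config:

```python
import math

def batches_per_epoch(records_per_epoch, global_batch_size, dataloader):
    # Hypothetical helper: fall back to the dataloader's length unless the
    # config explicitly sets records_per_epoch, which then takes precedence.
    if records_per_epoch is not None:
        return math.ceil(records_per_epoch / global_batch_size)
    return len(dataloader)
```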
We are currently using Determined version 0.21.2.
In 0.21.2, I think converting to using …
Thank you for the suggestion; we will consider adding such a feature. One more question about your use case: are you using …
In one situation we use a …
Describe your question

According to the docs, `records_per_epoch` can be used to schedule validation and checkpoint frequencies in conjunction with the `epoch` scheduling unit in the config file: https://docs.determined.ai/latest/reference/reference-training/experiment-config-reference.html?highlight=gc%20policy#config-records-per-epoch. However, upon upgrading to a newer version of Determined, experiments seem to ignore the `records_per_epoch` field and instead define an epoch by the length of the dataloader. Is there a way to still use `records_per_epoch` to define epoch length instead?
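For reference, the pattern the linked docs describe looks roughly like this, written here as a Python dict mirroring the experiment-config YAML; all values are made-up examples:

```python
# Sketch of the epoch-based scheduling described in the linked docs:
# records_per_epoch defines what one "epoch" means, and the periods
# below are expressed in that unit.
experiment_config = {
    "records_per_epoch": 50_000,              # hypothetical dataset size
    "min_validation_period": {"epochs": 1},   # validate once per epoch
    "min_checkpoint_period": {"epochs": 2},   # checkpoint every 2 epochs
    "searcher": {
        "name": "single",
        "metric": "validation_loss",
        "max_length": {"epochs": 10},
    },
}
```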