Thank you for this exciting project! I have been using it for some time, and I recently found a potential problem that may lead to very small batch sizes in some training epochs. Please feel free to correct me if any of my understanding is wrong.
I noticed the batch size for the dataloader is set to None, as here. So I think each worker will return a batch with the size defined in the config file (like here). As a result, the last batch yielded by each worker may be very small if the length of the data list returned by the sampling step is not divisible by the batch size and the remainder is small. The number of such small batches equals the number of workers. This leads to some noisy backward passes with a small batch size. If the amount of training data is small, the effect of these backward passes may be non-negligible (think of an extreme case: 8 workers, batch size 128, and each worker's data list has 129 items; then there are 8 updates with batch size 128 and 8 updates with batch size 1).
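To make this concrete, here is a minimal sketch of the behavior I mean (the class and numbers are made up for illustration, not the project's actual dataset code): a dataset that batches internally, combined with `batch_size=None` in the `DataLoader`, produces one small remainder batch per worker.

```python
import torch
from torch.utils.data import IterableDataset, DataLoader

class BatchingDataset(IterableDataset):
    """Toy stand-in for a dataset that builds its own batches."""

    def __init__(self, items_per_worker=129, batch_size=128):
        self.items_per_worker = items_per_worker
        self.batch_size = batch_size

    def __iter__(self):
        # Every worker runs its own copy of this iterator, so every
        # worker emits its own small remainder batch at the end.
        data = list(range(self.items_per_worker))
        for i in range(0, len(data), self.batch_size):
            yield torch.tensor(data[i:i + self.batch_size])

if __name__ == "__main__":
    # batch_size=None disables the DataLoader's own batching, mirroring the config.
    loader = DataLoader(BatchingDataset(), batch_size=None, num_workers=8)
    print(sorted(batch.shape[0] for batch in loader))
    # -> [1, 1, 1, 1, 1, 1, 1, 1, 128, 128, 128, 128, 128, 128, 128, 128]
```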
What do you think of it? Thanks!
Thank you for bringing this potential issue to our attention.
It is important to consider the impact of small batch sizes on the training process, especially for tasks with an imbalanced data distribution.
If your models fail to converge because of this problem, one potential solution is to set a fixed batch size for the dataloader, ensuring that all batches have the same size and minimizing the impact of noisy gradients.
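A hedged sketch of that idea, assuming the dataset can be changed to yield individual samples so that the `DataLoader` handles batching and `drop_last=True` can discard each worker's incomplete trailing batch (the class name and numbers below are illustrative, not the project's actual code):

```python
import torch
from torch.utils.data import IterableDataset, DataLoader

class SampleDataset(IterableDataset):
    """Hypothetical per-sample dataset; the real project yields pre-built batches."""

    def __iter__(self):
        # Yield one sample at a time instead of a pre-batched chunk.
        for i in range(129):
            yield torch.tensor(i)

if __name__ == "__main__":
    # With batching moved into the DataLoader, drop_last=True discards the
    # incomplete trailing batch that each worker would otherwise emit.
    loader = DataLoader(SampleDataset(), batch_size=128, num_workers=8, drop_last=True)
    print([batch.shape[0] for batch in loader])  # eight full batches of 128, no stragglers
```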
Also, for a dataset that contains enough training samples, we are interested in whether the current training behavior still leads to good performance.
Thank you for your contribution and for helping to improve the project!