A potential problem on training batch size #164

Open
tomato18463 opened this issue May 11, 2024 · 2 comments

@tomato18463

Hi,

Thank you for this exciting project! I have been using it for some time and recently found a potential problem that may lead to very small batch sizes in some training steps. Please feel free to correct me if any of my understanding is wrong.

I note that the batch size for the dataloader is set to None here. So I think each worker returns batches with the size defined in the config file (like here). As a result, the last batch produced by each worker may be very small if the length of the data list returned by the sampling step is not divisible by the batch size and the remainder is small. The number of such small batches equals the number of workers. This leads to some noisy backward passes with a small batch size. If the amount of training data is small, the effect of these updates may be non-negligible (consider an extreme case: 8 workers, batch size 128, and a data list of 129 items per worker; then there are 8 updates with batch size 128 and 8 updates with batch size 1).
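
For illustration, here is a minimal sketch with a plain PyTorch IterableDataset (not the project's actual dataset code; the class and numbers are just stand-ins mirroring the extreme case above):

```python
import torch
from torch.utils.data import IterableDataset, DataLoader

class ShardedBatchingDataset(IterableDataset):
    """Stand-in dataset: every worker holds 129 items and yields batches of 128."""
    def __init__(self, items_per_worker=129, batch_size=128):
        self.items_per_worker = items_per_worker
        self.batch_size = batch_size

    def __iter__(self):
        # Each worker batches its own data list; the remainder becomes a tiny batch.
        data = list(range(self.items_per_worker))
        for i in range(0, len(data), self.batch_size):
            yield torch.tensor(data[i:i + self.batch_size])

if __name__ == "__main__":
    # batch_size=None disables automatic batching, so each worker-built batch
    # passes through the loader as-is.
    loader = DataLoader(ShardedBatchingDataset(), batch_size=None, num_workers=8)
    print(sorted(b.shape[0] for b in loader))  # eight batches of 1 and eight of 128
```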

What do you think of it? Thanks!

@mlxu995 (Collaborator) commented May 14, 2024

Thank you for bringing this potential issue to our attention.

It is important to consider the impact of small batch sizes on the training process, especially for some tasks with imbalanced data distribution.
If your models fail to converge because of this problem, one potential solution is to set a fixed batch size for the dataloader, ensuring that all batches have the same size and minimizing the impact of noisy gradients.
Also, for datasets that contain enough training samples, we are interested in whether the current training process still leads to better performance.
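
A rough sketch of that workaround, assuming the dataset can be changed to yield individual samples so the DataLoader does the batching itself (the dataset below is just a placeholder, not part of this project):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder standing in for a dataset that yields one sample at a time.
per_sample_dataset = TensorDataset(torch.randn(1032, 16))

loader = DataLoader(
    per_sample_dataset,
    batch_size=128,   # batching handled by the DataLoader, so every kept batch has 128 samples
    num_workers=8,
    shuffle=True,
    drop_last=True,   # drop the short remainder batch instead of training on it
)
```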

Thank you for your contribution and for helping to improve the project!

@tomato18463 (Author)

I see. Thanks!
