What changed -- DataParallel removed from train.py script #310
-
Hello. Thanks for sharing this great code. In contrast to train.py, validate.py still contains args.num_gpu but not args.cpu. Best regards.
-
@jinseok-karl yes, I removed support for DataParallel in the train script. It wasn't worth maintaining as it conflicts with a number of the other useful training options and seems to be a lower priority for the PyTorch team these days. It is slower than DDP and all around not so useful. DDP is really easy to use via the shell script here for multi-gpu single machine training. DataParallel is still used for validation because it's hard to get 100% correct multi-gpu validation for ALL samples in a validation set without it (or some extra fiddly code); the default DDP data setup pads the last few samples (illustrated below). See:
See:
https://pytorch.org/docs/stable/generated/torch.nn.DataParallel.html
ht…
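To make the padding point concrete, here's a minimal sketch (not from this repo; the dataset size and process count are made up) of how torch.utils.data.distributed.DistributedSampler pads a dataset so every process sees the same number of samples, duplicating a few at the end:

```python
# Minimal sketch: DistributedSampler pads so each of 4 processes gets
# ceil(10 / 4) = 3 samples, i.e. 12 indices total for a 10-sample dataset.
import torch
from torch.utils.data import TensorDataset
from torch.utils.data.distributed import DistributedSampler

dataset = TensorDataset(torch.arange(10))  # hypothetical 10-sample set

indices = []
for rank in range(4):
    # Passing num_replicas/rank explicitly avoids needing an initialized
    # process group; shuffle=False keeps the padding easy to see.
    sampler = DistributedSampler(dataset, num_replicas=4, rank=rank, shuffle=False)
    indices.extend(list(sampler))

print(sorted(indices))
# [0, 0, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9] -- samples 0 and 1 are evaluated twice
```

Because two samples are counted twice, metrics averaged over the padded set can differ slightly from the true values, which is why exact single-pass validation is simpler with DataParallel (or the extra bookkeeping code mentioned above).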