Change Log
Feature
- Implement
SOAP
optimizer. (#275) - Support
AdEMAMix
variants. (#276)bnb_ademamix8bit
,bnb_ademamix32bit
,bnb_paged_ademamix8bit
,bnb_paged_ademamix32bit
- Support 8/4bit, fp8 optimizers. (#208, #281)
torchao_adamw8bit
,torchao_adamw4bit
,torchao_adamwfp8
.
- Support a module-name-level (e.g.
LayerNorm
) weight decay exclusion forget_optimizer_parameters
. (#282, #283) - Implement
CPUOffloadOptimizer
, which offloads optimizer to CPU for single-GPU training. (#284) - Support a regex-based filter for searching names of optimizers, lr schedulers, and loss functions.
Bug
Contributions
thanks to @Vectorrent