v0.4
Summary of release notes:
Along with many improvements to experiment tracking, rollout logging, and configuration flexibility, new highlight features include:
- Support for T5-based student models. Check out this example, where we show how to fine-tune a FLAN-T5 model on CNN/DailyMail for summarization (a rough sketch of the setup follows this list).
- Support for parameter-efficient tuning methods. Some of our preliminary results have shown LoRA to be a promising technique for scaling RLHF under low-resource settings, and we hope users get the chance to explore its potential. We've seen a ~30% reduction in memory usage and a ~20% reduction in wallclock time for the same performance (quick report here); see the LoRA sketch below.
- Out-of-the-box support for 8-bit Adam(W) optimizers via TimDettmers/bitsandbytes, leading to a 15% decrease in memory allocation in one of our baseline examples (related report); a usage sketch also follows this list.
Other interesting examples are in the works, so stay tuned!
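
As a taste of the FLAN-T5 summarization setup, here is a rough sketch. The exact `trlx.train` keyword arguments, config keys, and the `ppo_config_cnn_daily.yml` path are assumptions that may differ from the shipped example, and the placeholder reward below stands in for the ROUGE-style scoring used there:

```python
# Hypothetical sketch: fine-tuning FLAN-T5 on CNN/DailyMail summaries with trlX.
# The trlx.train signature and config keys are assumptions; see the repository's
# T5 example for the real script.
from datasets import load_dataset

import trlx
from trlx.data.configs import TRLConfig


def reward_fn(samples, **kwargs):
    # Placeholder reward: prefer summaries close to ~60 words.
    # The actual example scores generations against reference summaries.
    return [-abs(len(sample.split()) - 60) / 60.0 for sample in samples]


dataset = load_dataset("cnn_dailymail", "3.0.0", split="train[:1000]")
prompts = ["Summarize: " + article for article in dataset["article"]]

# Assumed config path; the config should point model.model_path at a FLAN-T5
# checkpoint (e.g. "google/flan-t5-large") and select the seq2seq architecture.
config = TRLConfig.load_yaml("configs/ppo_config_cnn_daily.yml")

trainer = trlx.train(
    reward_fn=reward_fn,
    prompts=prompts[:512],
    eval_prompts=prompts[512:576],
    config=config,
)
```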
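For readers unfamiliar with LoRA, here is a minimal, self-contained illustration of the idea in plain PyTorch. It is not trlX's implementation (adapters are wired in through the model config), just the low-rank-update pattern the technique relies on:

```python
# Minimal LoRA sketch: the frozen base weight is augmented with a trainable
# low-rank update B @ A, scaled by alpha / r.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weights
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        # Base projection plus the low-rank update; only lora_a / lora_b get gradients.
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling


layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable params: {trainable}")  # ~12K vs ~590K in the full layer
```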
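And a small sketch of the 8-bit optimizer swap itself. In trlX the optimizer is selected through the config, so the direct `bitsandbytes` call below is only for illustration; `AdamW8bit` is a drop-in replacement for `torch.optim.AdamW` that stores optimizer state in 8 bits:

```python
# Illustration only: using bitsandbytes' 8-bit AdamW directly on a toy model.
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(1024, 1024).cuda()

# Drop-in replacement for torch.optim.AdamW that keeps the first and second
# moment estimates in 8-bit, cutting optimizer-state memory roughly by 4x.
optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=1e-4, betas=(0.9, 0.95))

loss = model(torch.randn(8, 1024, device="cuda")).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```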
What's Changed
- ILQL indices on wrong device by @cat-state in #105
- Fix ppo ratio inaccuracy by @reciprocated in #108
- Set RNG seeds across multiple dependencies by @jon-tow in #113
- Set seed after default config instantiation by @jon-tow in #114
- Move queries on the device by @reciprocated in #115
- Add ppo randomwalks example by @reciprocated in #119
- Add unit tests to ensure valid example configs by @jon-tow in #120
- updating gptj-config by @Dahoas in #109
- Fix get distributed config by @reciprocated in #122
- Add local rollout logging by @thomfoster in #124
- Add support for more `CausalLM`s by @jon-tow in #103
- Add hydra head support for `GPTNeo` by @jon-tow in #126
- Add `BloomModel` hydra support by @jon-tow in #129
- Simplifying logic to merge configs by @leshanbog in #134
- add: load function for AccelerateRLModel by @dongs0104 in #136
- Add `OptimizerConfig` and `SchedulerConfig` by @jon-tow in #135
- Remove incorrect default config settings by @jon-tow in #137
- Update TRL acknowledgement by @osanseviero in #138
- Fix context overflow by @reciprocated in #131
- Fix seeding per process by @reciprocated in #141
- Set device-specific seeding with global rank by @jon-tow in #143
- Freeze hydra model branches by @jon-tow in #140
- Refactor RL model wrapper into a `trainer` module by @jon-tow in #144
- Logging learning rate by @leshanbog in #147
- Fix instantiating base transformer from a custom config by @reciprocated in #149
- Linear LR scheduler by @leshanbog in #150
- Update `pre-commit` version and add `isort` by @jon-tow in #152
- fix: configure flake8, fix errors, add `trackers` config by @Mistobaan in #157
- Features/use-python-3.8-in-ci by @Mistobaan in #159
- Add `bitsandbytes` optimizer support by @aicrumb in #133
- initial commit for trlx LORA support by @ethankim00 in #110
- Fix default `delta_kwargs` handling by @jon-tow in #171
- Add T5 model by @PhungVanDuy in #145
- Fix wandb.errors.RequireError as reported in #162 by @ayulockin in #167
- Update README.md by @LouisCastricato in #180
- Update ILQL details by @reciprocated in #156
- Add OpenAI Summarize RLHF with trlX by @PhungVanDuy in #175
- Fix HuggingFace `model.save_pretrained` for DDP by @jon-tow in #181
- Update generation utilities by @reciprocated in #172
New Contributors
- @thomfoster made their first contribution in #124
- @leshanbog made their first contribution in #134
- @dongs0104 made their first contribution in #136
- @osanseviero made their first contribution in #138
- @Mistobaan made their first contribution in #157
- @aicrumb made their first contribution in #133
- @ethankim00 made their first contribution in #110
- @PhungVanDuy made their first contribution in #145
Full Changelog: v0.3...v0.4