v0.4
Summary of release notes:
Along with many improvements to experiment tracking, rollout logging, and configuration flexibility, new highlight features include:
- Support for T5-based student models. Check out this example, where we show how to fine-tune a FLAN-T5 model on CNN/DailyMail for summarization (a rough sketch of the setup follows this list).
- Support for parameter-efficient tuning methods. Some of our preliminary results have shown LoRA to be a promising technique for scaling RLHF under low-resource settings, and we hope users get the chance to explore its potential. We've seen a ~30% reduction in memory usage and a ~20% reduction in wallclock time for the same performance (quick report here); see the LoRA sketch below.
- Out-of-the-box support for 8-bit Adam(W) optimizers via TimDettmers/bitsandbytes, leading to a 15% decrease in memory allocation in one of our baseline examples (related report); a usage sketch also follows this list.
Other interesting examples are in the works, so stay tuned!
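
As a taste of the FLAN-T5 summarization setup, here is a rough sketch. The exact `trlx.train` keyword arguments, config keys, and the `ppo_config_cnn_daily.yml` path are assumptions that may differ from the shipped example, and the placeholder reward below stands in for the ROUGE-style scoring used there:

```python
# Hypothetical sketch: fine-tuning FLAN-T5 on CNN/DailyMail summaries with trlX.
# The trlx.train signature and config keys are assumptions; see the repository's
# T5 example for the real script.
from datasets import load_dataset

import trlx
from trlx.data.configs import TRLConfig


def reward_fn(samples, **kwargs):
    # Placeholder reward: prefer summaries close to ~60 words.
    # The actual example scores generations against reference summaries.
    return [-abs(len(sample.split()) - 60) / 60.0 for sample in samples]


dataset = load_dataset("cnn_dailymail", "3.0.0", split="train[:1000]")
prompts = ["Summarize: " + article for article in dataset["article"]]

# Assumed config path; the config should point model.model_path at a FLAN-T5
# checkpoint (e.g. "google/flan-t5-large") and select the seq2seq architecture.
config = TRLConfig.load_yaml("configs/ppo_config_cnn_daily.yml")

trainer = trlx.train(
    reward_fn=reward_fn,
    prompts=prompts[:512],
    eval_prompts=prompts[512:576],
    config=config,
)
```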
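For readers unfamiliar with LoRA, here is a minimal, self-contained illustration of the idea in plain PyTorch. It is not trlX's implementation (adapters are wired in through the model config), just the low-rank-update pattern the technique relies on:

```python
# Minimal LoRA sketch: the frozen base weight is augmented with a trainable
# low-rank update B @ A, scaled by alpha / r.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weights
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        # Base projection plus the low-rank update; only lora_a / lora_b get gradients.
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling


layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable params: {trainable}")  # ~12K vs ~590K in the full layer
```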
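And a small sketch of the 8-bit optimizer swap itself. In trlX the optimizer is selected through the config, so the direct `bitsandbytes` call below is only for illustration; `AdamW8bit` is a drop-in replacement for `torch.optim.AdamW` that stores optimizer state in 8 bits:

```python
# Illustration only: using bitsandbytes' 8-bit AdamW directly on a toy model.
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(1024, 1024).cuda()

# Drop-in replacement for torch.optim.AdamW that keeps the first and second
# moment estimates in 8-bit, cutting optimizer-state memory roughly by 4x.
optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=1e-4, betas=(0.9, 0.95))

loss = model(torch.randn(8, 1024, device="cuda")).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```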
What's Changed
- ILQL indices on wrong device by @cat-state in #105
- Fix ppo ratio inaccuracy by @reciprocated in #108
- Set RNG seeds across multiple dependencies by @jon-tow in #113
- Set seed after default config instantiation by @jon-tow in #114
- Move queries on the device by @reciprocated in #115
- Add ppo randomwalks example by @reciprocated in #119
- Add unit tests to ensure valid example configs by @jon-tow in #120
- updating gptj-config by @Dahoas in #109
- Fix get distributed config by @reciprocated in #122
- Add local rollout logging by @thomfoster in #124
- Add support for more `CausalLM`s by @jon-tow in #103
- Add hydra head support for `GPTNeo` by @jon-tow in #126
- Add `BloomModel` hydra support by @jon-tow in #129
- Simplifying logic to merge configs by @leshanbog in #134
- add: load function for AccelerateRLModel by @dongs0104 in #136
- Add `OptimizerConfig` and `SchedulerConfig` by @jon-tow in #135
- Remove incorrect default config settings by @jon-tow in #137
- Update TRL acknowledgement by @osanseviero in #138
- Fix context overflow by @reciprocated in #131
- Fix seeding per process by @reciprocated in #141
- Set device-specific seeding with global rank by @jon-tow in #143
- Freeze hydra model branches by @jon-tow in #140
- Refactor RL model wrapper into a `trainer` module by @jon-tow in #144
- Logging learning rate by @leshanbog in #147
- Fix instantiating base transformer from a custom config by @reciprocated in #149
- Linear LR scheduler by @leshanbog in #150
- Update `pre-commit` version and add `isort` by @jon-tow in #152
- fix: configure flake8, fix errors, add `trackers` config by @Mistobaan in #157
- Features/use-python-3.8-in-ci by @Mistobaan in #159
- Add `bitsandbytes` optimizer support by @aicrumb in #133
- initial commit for trlx LORA support by @ethankim00 in #110
- Fix default `delta_kwargs` handling by @jon-tow in #171
- Add T5 model by @PhungVanDuy in #145
- Fix wandb.errors.RequireError as reported in #162 by @ayulockin in #167
- Update README.md by @LouisCastricato in #180
- Update ILQL details by @reciprocated in #156
- Add OpenAI Summarize RLHF with trlX by @PhungVanDuy in #175
- Fix HuggingFace `model.save_pretrained` for DDP by @jon-tow in #181
- Update generation utilities by @reciprocated in #172
New Contributors
- @thomfoster made their first contribution in #124
- @leshanbog made their first contribution in #134
- @dongs0104 made their first contribution in #136
- @osanseviero made their first contribution in #138
- @Mistobaan made their first contribution in #157
- @aicrumb made their first contribution in #133
- @ethankim00 made their first contribution in #110
- @PhungVanDuy made their first contribution in #145
Full Changelog: v0.3...v0.4