Tools and training scripts I have developed for building large language models in PyTorch.
This repository provides:
- data preprocessing scripts,
- training scripts, and
- training guides.
This repository is the successor to my old training tools, BERT-PyTorch, which had accumulated a lot of technical debt and was not well tested. Compared to the old repository, this codebase aims for better code health and maintainability through tests, type checking, linters, and documentation.
See the Installation Guide.
See the available Guides.