- Added fast inference for tortoise with HiFi Decoder (inspired by xtts by coquiTTS 🐸; check out their multilingual model for noncommercial uses)
- Added custom tokenizer for non-English models
- Bug fixes
- Added Apple Silicon Support
- Updated Transformer version
- Bug fixes
- Added kv_cache support (5x faster)
- Added deepspeed support (10x faster)
- Added half precision support
- Removed CVVP model. Found that it does not, in fact, make an appreciable difference in the output.
- Added better debugging support; existing tools now spit out debug files which can be used to reproduce bad runs.
- New CLVP-large model for further improved decoding guidance.
- Improvements to read.py and do_tts.py (new options)
- Added several new voices from the training set.
- Automated redaction. Wrap text in brackets if it should be used to prompt the model but not be spoken.
- Bug fixes
- Added ability to produce totally random voices.
- Added ability to download voice conditioning latent via a script, and then use a user-provided conditioning latent.
- Added ability to use your own pretrained models.
- Refactored directory structures.
- Performance improvements & bug fixes.
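
The automated redaction entry above describes wrapping prompt-only text in brackets. As a rough illustration of the text side of that convention only, here is a minimal sketch; `split_prompt` is a hypothetical helper written for this example and is not part of Tortoise, and how Tortoise removes the bracketed portion from the generated audio is not described in this changelog.

```python
import re
from typing import Tuple

# Hypothetical helper for illustration; not a Tortoise API.
# Matches one bracketed span such as "[I am really sad,]" (no nesting).
BRACKETED = re.compile(r"\[[^\[\]]*\]")


def split_prompt(text: str) -> Tuple[str, str]:
    """Return (full_prompt, spoken_text) for a bracket-annotated string.

    full_prompt keeps the bracketed words but drops the bracket characters;
    spoken_text drops the bracketed spans entirely.
    """
    full_prompt = re.sub(r"[\[\]]", "", text)
    spoken_text = BRACKETED.sub("", text)
    # Collapse whitespace left behind by the removals.
    full_prompt = re.sub(r"\s+", " ", full_prompt).strip()
    spoken_text = re.sub(r"\s+", " ", spoken_text).strip()
    return full_prompt, spoken_text


prompt, spoken = split_prompt("[I am really sad,] Please feed me.")
print(prompt)
print(spoken)
```

The bracketed clause stays in the prompt to steer the delivery, while only the remaining text is meant to be heard.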