What's Changed
- Dynamic language tag handling by @Uinelj in #57
- Validate/Fix rebuilding for OSCAR Doc by @Uinelj in #65
- KenLM based content detection by @Uinelj in #72
- Locality sensitive hashing annotation by @Uinelj in #69
- Fix bug in MeanLength filter by @sadra-barikbin in #71
- feat(blocklists): ability to use multiple blocklists by @Uinelj in #76
- Removal of custom domain blocklists from the CLI by @Uinelj in #80
- refactor: remove old pipelines, old io code and old langtags by @Uinelj in #82
- Move IO out of Ungoliant by @Uinelj in #83
- Change `annotation` to `quality_warnings` by @Uinelj in #85
- Move TLSH out of annotations by @Uinelj in #86
New Contributors
- @sadra-barikbin made their first contribution in #71
Full Changelog: v1.2.3...v2.0.0