Releases: pemistahl/lingua-rs
Lingua 1.6.2
Improvements
- Type stubs for the Python bindings are now available, allowing better static code analysis, better code completion in supported IDEs and easier understanding of the library's API.
Bug Fixes
- The method LanguageDetector.detect_multiple_languages_of still returned character indices instead of byte indices when only a single DetectionResult was produced. This has been fixed.
Lingua 1.6.1
Bug Fixes
- The method LanguageDetector.detect_multiple_languages_of returns byte indices. For creating string slices in Python and JavaScript, character indices are needed but were not provided. This resulted in incorrect DetectionResults for Python and JavaScript. This has been fixed by converting the byte indices to character indices. (pemistahl/lingua-py#192)
- Some minor bugs in the WASM module have been fixed to prepare the first release of Lingua for JavaScript.
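The difference between byte indices and character indices matters whenever the text contains multi-byte UTF-8 characters. The following is a minimal sketch of such a conversion using only the Rust standard library. The function name byte_to_char_index is hypothetical and is not part of Lingua's API; how the bindings perform the conversion internally is not described in these notes.

```rust
/// Converts a byte offset into `text` to the corresponding character offset.
/// Hypothetical helper for illustration only, not part of Lingua's API.
///
/// Returns `None` if `byte_index` lies beyond the end of the text or does
/// not fall on a UTF-8 character boundary.
fn byte_to_char_index(text: &str, byte_index: usize) -> Option<usize> {
    // `is_char_boundary` is false for offsets past the end of the string
    // and for offsets that point into the middle of a multi-byte character.
    if !text.is_char_boundary(byte_index) {
        return None;
    }
    // Count the characters that precede the byte offset.
    Some(text[..byte_index].chars().count())
}
```

For the text "Grüße, world", the letter w starts at byte offset 9 but at character offset 7, because ü and ß each occupy two bytes in UTF-8.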
Lingua 1.6.0
Features
- Python bindings are now available for the library. These bindings replace the pure Python implementation of Lingua in order to benefit from Rust's performance in any Python software. (#262)
- Parallel equivalents for all methods in LanguageDetector have been added to give the user the choice of using the library single-threaded or multi-threaded. (#271)
Bug Fixes
- Several bugs in multiple-language detection have been fixed that caused incomplete results to be returned in several cases.
- A significant number of Kazakh texts were incorrectly classified as Mongolian. This has been fixed.
Lingua 1.5.0
Features
- The new method LanguageDetector.detect_multiple_languages_of() has been introduced. It allows detecting multiple languages in mixed-language text. (#1)
- The new method LanguageDetectorBuilder.with_low_accuracy_mode() has been introduced. By activating it, detection accuracy for short text is reduced in favor of a smaller memory footprint and faster detection performance. (#119)
- The new method LanguageDetector.compute_language_confidence() has been introduced. It allows retrieving the confidence value for one specific language only, given the input text. (#102)
Improvements
- The computation of the confidence values has been revised and the softmax function is now applied to the values, making them easier to compare because they behave more like real probabilities. (#120)
- The WASM API has been revised. It now makes use of the same builder pattern as the Rust API. (#122)
- The language model files are now compressed with the Brotli algorithm, which reduces the file size by 15 % on average. (#189)
- The language model ngrams are now stored in a CompactString type, which reduces the amount of consumed memory by 20 %. (#198)
- Several performance optimizations have been applied which make the library nearly twice as fast as the previous version. Big thanks go out to @serega and @koute for their help. (#82, #148, #177)
- The enums IsoCode639_1 and IsoCode639_3 now implement some new traits such as Copy, Hash and Serde's Serialize and Deserialize. The enum Language now implements Copy as well. (#175)
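To illustrate the softmax step mentioned in the first improvement above, here is a minimal sketch of the general softmax function in Rust. It is not Lingua's actual implementation: the function name, the use of f64, and the max-subtraction for numerical stability are assumptions, and these notes do not say which raw scores the library feeds into it.

```rust
/// Applies the softmax function to a slice of raw scores.
/// Illustrative sketch only; not taken from Lingua's source code.
///
/// The returned values are positive, sum to 1, and keep the ordering
/// of the input scores.
fn softmax(scores: &[f64]) -> Vec<f64> {
    if scores.is_empty() {
        return Vec::new();
    }
    // Subtracting the maximum keeps `exp` from overflowing for large
    // scores and does not change the result.
    let max = scores.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = scores.iter().map(|s| (s - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.into_iter().map(|e| e / sum).collect()
}
```

Because the outputs always sum to 1, values computed for different input texts live on the same scale, which is what makes them behave more like probabilities.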
Lingua 1.4.0
Features
- The library can now be compiled to WebAssembly and be used in any JavaScript project. Big thanks to @martindisch for bringing this forward. (#14)
Improvements
- Some minor performance tweaks have been applied to the rule engine.
Lingua 1.3.3
Bug Fixes
- This release updates outdated dependencies and fixes an incompatibility between different versions of the include_dir crate which are used in the main lingua crate and the language model crates.
Lingua 1.3.2
Bug Fixes
- Another compilation error has been fixed which occurred when the Latin language was left out as a Cargo feature.
Lingua 1.3.1
Bug Fixes
- When Chinese, Japanese or Korean were left out as Cargo features, there were compilation errors. This has been fixed.
Lingua 1.3.0
Features
- The language model dependencies are separate Cargo features now. Users can decide which languages shall be downloaded and used in the library. (#12)
Improvements
- The code that does the lazy-loading of the language models has been refactored significantly, making the code more stable and less error-prone.
Bug Fixes
Lingua 1.2.2
Features
- The enums Language, IsoCode639_1 and IsoCode639_3 now implement std::str::FromStr in order to instantiate enum variants by string values. This comes in handy for JavaScript bindings and the like. (#15)
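As a rough sketch of what implementing std::str::FromStr for an enum looks like, consider the following example. The enum DemoLanguage, its error type, and the case-insensitive matching are all hypothetical; they only illustrate the trait and do not reproduce Lingua's Language enum or its accepted string values.

```rust
use std::str::FromStr;

/// Hypothetical stand-in for an enum of languages; not Lingua's `Language`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DemoLanguage {
    English,
    German,
}

/// Hypothetical error type carrying the string that could not be parsed.
#[derive(Debug, PartialEq, Eq)]
struct ParseDemoLanguageError(String);

impl FromStr for DemoLanguage {
    type Err = ParseDemoLanguageError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Case-insensitive matching is an assumption made for this sketch.
        match s.to_lowercase().as_str() {
            "english" => Ok(DemoLanguage::English),
            "german" => Ok(DemoLanguage::German),
            _ => Err(ParseDemoLanguageError(s.to_string())),
        }
    }
}
```

With such an implementation in place, a variant can be created through str::parse, for example "German".parse::<DemoLanguage>(), which is convenient when the language name arrives as a plain string from another runtime.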
Improvements
- The performance of preloading the language models has been improved.
Bug Fixes
- Language detection for sentences with more than 120 characters was supposed to be done by iterating through trigrams only but this was never the case. This has been corrected.
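For readers unfamiliar with the term, a trigram here is a run of three consecutive characters. The sketch below shows one simple way to enumerate character trigrams in Rust. The function name char_trigrams is hypothetical, and the sketch makes no claim about how Lingua itself extracts or filters ngrams.

```rust
/// Returns all overlapping character trigrams of `text`, in order.
/// Illustrative helper only; not taken from Lingua's source code.
fn char_trigrams(text: &str) -> Vec<String> {
    // Collect characters first so that windows are counted in characters,
    // not bytes, which matters for non-ASCII text.
    let chars: Vec<char> = text.chars().collect();
    chars
        .windows(3)
        .map(|window| window.iter().collect::<String>())
        .collect()
}
```

Texts shorter than three characters yield no trigrams, so the returned vector is empty in that case.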