Releases: jacksonllee/pycantonese
v3.1.0.dev3
This is another development release towards v3.1.0. Compared to v3.1.0.dev2, this dev release fixes more word segmentation issues in order to improve the part-of-speech tagger under development.
Installing this version from the GitHub source requires Git LFS; install it first if it is not already on your system.
Corresponding PyPI release: https://pypi.org/project/pycantonese/3.1.0.dev3/
v3.1.0.dev2
This is a development release to tag some unreleased features, particularly a part-of-speech tagger under development. (Installing this version from the GitHub source likely requires Git LFS on your system.)
Corresponding PyPI release: https://pypi.org/project/pycantonese/3.1.0.dev2/
v3.0.0
[3.0.0] - 2020-10-25
Added
- Word segmentation:
  - Segmentation is now customizable for the following:
    - Maximum word length
    - A user-supplied list of words to allow as words
    - A user-supplied list of words to disallow as words
  - The default segmentation model has been improved with the rime-cantonese data (CC BY 4.0 license).
- Characters-to-Jyutping conversion:
  - The conversion returns results in a word-segmented form.
  - The conversion model has been improved with the rime-cantonese data (CC BY 4.0 license).
- Added the following functions; they are equivalent to their (now deprecated) x2y counterparts:
  - characters_to_jyutping
  - jyutping_to_tipa
  - jyutping_to_yale
- Added support for Python 3.9.
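To illustrate what the three segmentation options above control, here is a minimal, self-contained sketch of greedy longest-match segmentation. It is not pycantonese's implementation: the function name `segment_sketch`, its parameters, and the tiny vocabulary are all hypothetical and exist only for this example.

```python
def segment_sketch(text, vocabulary, max_word_length=5, allow=None, disallow=None):
    """Greedy longest-match segmentation (illustrative only, not pycantonese's code)."""
    if max_word_length < 1:
        raise ValueError("max_word_length must be at least 1")
    # Effective vocabulary: base words plus allowed words, minus disallowed words.
    words = (set(vocabulary) | set(allow or ())) - set(disallow or ())
    result = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to a single character.
        longest = min(max_word_length, len(text) - i)
        for length in range(longest, 0, -1):
            candidate = text[i:i + length]
            # A single character is always accepted as a fallback.
            if length == 1 or candidate in words:
                result.append(candidate)
                i += length
                break
    return result


# Hypothetical toy vocabulary, hand-written for this example.
toy_vocabulary = {"廣東話", "廣東", "好", "難"}

print(segment_sketch("廣東話好難", toy_vocabulary))
print(segment_sketch("廣東話好難", toy_vocabulary, disallow={"廣東話"}))
print(segment_sketch("廣東話好難", toy_vocabulary, allow={"好難"}))
print(segment_sketch("廣東話好難", toy_vocabulary, max_word_length=2))
```

In this sketch, disallowing a word forces the matcher to fall back to shorter pieces, allowing a word lets a longer match win, and lowering the maximum word length caps how far ahead the matcher looks.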
Changed
API-breaking Changes
- jyutping_to_yale: The default value of the keyword argument as_list has been changed from False to True, so that this function is now more in line with the other "jyutping_to_X" functions for returning a list.
- characters_to_jyutping: The returned value is now a list of segmented words, where each is a 2-tuple of (Cantonese characters, Jyutping). Previously, it was a list of Jyutping strings for the individual Cantonese characters.
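Code that relied on the old per-character return value of the characters-to-Jyutping conversion may need a small adapter. The sketch below is one possible way to flatten the new list of (characters, Jyutping) 2-tuples into per-syllable strings. The helper name `flatten_to_syllables` is hypothetical, the sample data is hand-written rather than actual library output, and the handling of `None` assumes that a word without a known romanization carries `None` in place of a Jyutping string.

```python
import re

# One Jyutping syllable: lowercase letters followed by a tone digit from 1 to 6.
_SYLLABLE = re.compile(r"[a-z]+[1-6]")


def flatten_to_syllables(segmented):
    """Flatten [(characters, jyutping), ...] into a list of syllable strings."""
    syllables = []
    for characters, jyutping in segmented:
        if jyutping is None:
            # Assumption: no romanization available; keep one placeholder per character.
            syllables.extend([None] * len(characters))
            continue
        syllables.extend(_SYLLABLE.findall(jyutping))
    return syllables


# Hand-written sample in the new 2-tuple shape (not actual library output).
new_style = [("香港", "hoeng1gong2"), ("人", "jan4")]
print(flatten_to_syllables(new_style))
```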
Non-API-breaking Changes
- Switched documentation to the readthedocs theme and numpydoc docstring style.
- Improved CircleCI builds with orbs.
Deprecated
- The following x2y functions have been deprecated in favor of their counterparts named as x_to_y:
  - characters2jyutping
  - jyutping2tipa
  - jyutping2yale
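A common way to keep an old name working while steering callers to a new one is a thin wrapper that emits a `DeprecationWarning`. The sketch below shows that general pattern with stand-in functions; the `_stub` names are hypothetical, and this is not necessarily how pycantonese implements its deprecated x2y functions.

```python
import warnings


def characters_to_jyutping_stub(chars):
    """Stand-in for a new-style function; simply echoes its input."""
    return chars


def characters2jyutping_stub(chars):
    """Stand-in for a deprecated alias that forwards to the new-style function."""
    warnings.warn(
        "characters2jyutping_stub is deprecated; "
        "use characters_to_jyutping_stub instead.",
        DeprecationWarning,
        stacklevel=2,  # Attribute the warning to the caller's line.
    )
    return characters_to_jyutping_stub(chars)


# Surface the warning explicitly, since Python may hide DeprecationWarning by default.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    characters2jyutping_stub("香港")
    print(caught[0].category.__name__)
```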
Security
- Turned on HTTPS for the pycantonese.org domain.
v2.4.1
[2.4.1] - 2020-10-10
Fixed
- Switched the wordseg dependency to the PyPI source instead of a GitHub direct link.
v2.4.0
[2.4.0] - 2020-10-10
Added
- Added the characters2jyutping() function for converting Cantonese characters to Jyutping romanization.
- Added the segment() function for word segmentation.
v2.3.0
[2.3.0] - 2020-07-24
Added
- Added support for Python 3.7 and 3.8.
Removed
- Dropped support for Python 3.4 and 3.5 (supporting 3.6, 3.7, and 3.8 now).
v2.2.0
[2.2.0] - 2018-06-30
Added
- 104 stop words.
v2.1.0
[2.1.0] - 2018-06-11
Added
- Exposed the exclude parameter in various reader methods for excluding specific participants. This parameter was implemented in pylangacq v0.10.0.
Fixed
- Allowed "n" to be a syllabic nasal.
- Fixed corpus reader not picking up the characters.
v2.0.0 release
Major update: Shift to the CHAT transcription format for HKCanCor and custom corpus datasets.
v1.0 release
- Overall code restructuring
- Only Python 3.x is supported from this point onwards
- Used generators instead of lists for corpus access methods
- Added the part-of-speech search criterion
- Added Jyutping-to-Yale conversion
- Added Jyutping-to-TIPA conversion
- Disabled the function for reading a custom corpus dataset (it will come back)