Releases · jacksonllee/pycantonese

28 Dec 21:34

jacksonllee

v3.4.0

8541495

[3.4.0] - 2021-12-28

Added

Added the parse_text function for analyzing Cantonese text data.
Characters-to-Jyutping conversion:
- The characters_to_jyutping function now has the segmenter kwarg for
  customizing word segmentation.
Added support for Python 3.10.
Turned on Windows testing on CircleCI.
Added pyproject.toml, in connection with the switch to setup.cfg for specifying
build metadata and options.
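
Why a segmenter matters for characters-to-Jyutping conversion can be shown with a small, self-contained sketch. Everything in it is hypothetical: characters_to_jyutping_sketch, the two segmenters, and the tiny lookup tables are assumptions made for illustration, not PyCantonese's implementation or data. It shows a word-level lookup choosing a different reading than a character-by-character lookup (行 on its own is commonly hang4, while 銀行 is ngan4hong4).

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical toy lexicons for illustration only (not PyCantonese's data).
WORD_READINGS: Dict[str, str] = {"銀行": "ngan4hong4"}
CHAR_READINGS: Dict[str, str] = {"銀": "ngan4", "行": "hang4", "去": "heoi3"}

# A segmenter here is any callable that turns a string into a list of words.
Segmenter = Callable[[str], List[str]]


def per_character(text: str) -> List[str]:
    """Trivial segmenter: every character is its own word."""
    return list(text)


def longest_match(text: str, max_len: int = 4) -> List[str]:
    """Greedy left-to-right longest match against WORD_READINGS."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i : i + size]
            # Accept a known multi-character word, or fall back to one character.
            if size == 1 or candidate in WORD_READINGS:
                words.append(candidate)
                i += size
                break
    return words


def characters_to_jyutping_sketch(
    text: str, segmenter: Optional[Segmenter] = None
) -> List[Tuple[str, Optional[str]]]:
    """Segment text, then look up each word, falling back to per-character readings."""
    segment = segmenter or longest_match
    result = []
    for word in segment(text):
        if word in WORD_READINGS:
            reading = WORD_READINGS[word]
        else:
            parts = [CHAR_READINGS.get(char) for char in word]
            # Any unknown character makes the whole word's reading unknown.
            reading = None if None in parts else "".join(parts)
        result.append((word, reading))
    return result
```

With the default segmenter, characters_to_jyutping_sketch("去銀行") keeps 銀行 together and yields ngan4hong4 for it; passing segmenter=per_character splits it and yields hang4 for 行. For what the real segmenter kwarg accepts, consult the PyCantonese documentation.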

Changed

Characters-to-Jyutping conversion:
- For the characters_to_jyutping function,
  where rime-cantonese and HKCanCor disagree,
  the (more accurate) rime-cantonese data is preferred.
- Updated the rime-cantonese data to the latest 2021.05.16 release,
  improving both characters-to-Jyutping conversion and word segmentation.
Updated the PyLangAcq dependency to v0.16.0, allowing PyCantonese's CHATReader
to use the new methods to_chat, to_strs, info, head, and tail.
Switched to setup.cfg to fully specify build metadata and options,
while keeping a minimal setup.py for backward compatibility.
Related to the new pyproject.toml.
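
The precedence rule amounts to a dictionary merge in which one source overwrites the other on conflicts. A minimal sketch, assuming both sources are plain word-to-Jyutping mappings; merge_readings and the placeholder entries are hypothetical, and PyCantonese's internal data structures may differ:

```python
from typing import Dict, List, Tuple


def merge_readings(
    rime: Dict[str, str], hkcancor: Dict[str, str]
) -> Tuple[Dict[str, str], List[str]]:
    """Combine two word-to-Jyutping tables, preferring rime-cantonese on conflicts.

    Returns the merged table and the sorted list of words where the sources disagreed.
    """
    conflicts = sorted(
        word for word in rime.keys() & hkcancor.keys() if rime[word] != hkcancor[word]
    )
    merged = dict(hkcancor)
    merged.update(rime)  # rime-cantonese entries overwrite HKCanCor entries
    return merged, conflicts


# Placeholder entries, not real readings.
merged, conflicts = merge_readings(
    {"word_a": "jp_rime_a", "word_b": "jp_b"},
    {"word_a": "jp_hkcancor_a", "word_b": "jp_b", "word_c": "jp_c"},
)
```

Here word_a takes the rime-cantonese value, word_c survives from HKCanCor alone, and only word_a is reported as a conflict.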

Removed

Dropped support for Python 3.6.

Security

Turned on safety and bandit checks in CircleCI builds.

15 May 00:34

jacksonllee

v3.3.1

f03314e

[3.3.1] - 2021-05-14

Fixed

Allowed PyLangAcq v0.14.* for real.

14 May 05:30

jacksonllee

v3.3.0

fc6c524

[3.3.0] - 2021-05-14

Changed

Allowed PyLangAcq v0.14.*, thereby bringing the new features of the filter method to CHATReader
as well as optional parallelization for CHAT data processing.

Fixed

Fixed the search method of CHATReader when by_tokens is False.

08 May 05:08

jacksonllee

v3.2.4

29aef7d

[3.2.4] - 2021-05-07

Fixed

Fixed the previously non-functional methods append, append_left, extend, and extend_left
of the class CHATReader, via a fix in the upstream PyLangAcq package.
Retrained the part-of-speech tagger following the minor character fix in v3.2.3.
Raised NotImplementedError for the method ipsyn of CHATReader,
since the upstream method works only for English.
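
The pattern behind the ipsyn change is a subclass overriding an inherited method so that it fails loudly instead of returning a misleading result. A minimal sketch with hypothetical class names: UpstreamReader and CantoneseReader only stand in for PyLangAcq's reader and PyCantonese's CHATReader, and this is not their actual code.

```python
class UpstreamReader:
    """Stand-in for an upstream reader class (hypothetical)."""

    def ipsyn(self) -> int:
        # Placeholder for an English-specific measure (IPSyn, the Index of Productive Syntax).
        return 0


class CantoneseReader(UpstreamReader):
    """Subclass that disables an inherited, English-only method."""

    def ipsyn(self) -> int:
        # Refuse rather than silently apply English-specific scoring to Cantonese data.
        raise NotImplementedError("ipsyn is implemented upstream for English only")
```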

12 Apr 14:03

jacksonllee

v3.2.3

f42f39a

[3.2.3] - 2021-04-12

Fixed

Fixed character issues in the built-in HKCanCor data: 𥄫

23 Mar 17:41

jacksonllee

v3.2.2

99e4544

[3.2.2] - 2021-03-23

Fixed

Fixed a CHAT parsing issue when correction and repetition are combined,
by bumping the PyLangAcq dependency from v0.13.0 to v0.13.1.

21 Mar 16:54

jacksonllee

v3.2.1

9bd0d74

[3.2.1] - 2021-03-21

Fixed

Fixed character issues in the built-in HKCanCor data: 𠮩𠹌, 𠻗

20 Mar 14:50

jacksonllee

v3.2.0

e746fd3

[3.2.0] - 2021-03-20

Note: The underlying CHAT parser, the PyLangAcq package, has been bumped to v0.13.0.
All of the updates of PyLangAcq's CHAT reader apply to this PyCantonese release as well.
The details are in PyLangAcq's changelog for v0.13.0.
The changelog entries below only document updates specific to PyCantonese.

Added

Defined the Jyutping class to better represent parsed Jyutping romanization.

Changed

Bumped the PyLangAcq dependency to v0.13.0.
The function parse_jyutping now returns a list of Jyutping objects,
rather than tuples of strings.
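
To show what a structured representation buys over tuples of strings, here is a simplified, standalone parser that splits Jyutping into onset, nucleus, coda, and tone. The Jyutping NamedTuple and parse_jyutping_sketch are hypothetical stand-ins; the field names are an assumption for this sketch, and the real class and parse_jyutping may differ in fields, validation, and handling of less common syllables.

```python
import re
from typing import List, NamedTuple


class Jyutping(NamedTuple):
    """One syllable broken into its parts (field names assumed for this sketch)."""

    onset: str
    nucleus: str
    coda: str
    tone: str


# Two-letter onsets and nuclei are listed first so that the longest match wins.
ONSETS = ("ng", "gw", "kw", "b", "p", "m", "f", "d", "t", "n", "l",
          "g", "k", "h", "w", "z", "c", "s", "j")
NUCLEI = ("aa", "oe", "eo", "yu", "a", "e", "i", "o", "u")
CODAS = ("", "p", "t", "k", "m", "n", "ng", "i", "u")


def parse_jyutping_sketch(text: str) -> List[Jyutping]:
    """Split a Jyutping string such as 'gwong2dung1waa2' into Jyutping objects."""
    if not re.fullmatch(r"(?:[a-z]+[1-6])*", text):
        raise ValueError(f"not a valid Jyutping string: {text!r}")
    syllables = []
    for syllable in re.findall(r"[a-z]+[1-6]", text):
        body, tone = syllable[:-1], syllable[-1]
        if body in ("m", "ng"):  # syllabic nasals, e.g. m4, ng5
            syllables.append(Jyutping("", body, "", tone))
            continue
        onset = next((o for o in ONSETS if body.startswith(o)), "")
        rest = body[len(onset):]
        nucleus = next((n for n in NUCLEI if rest.startswith(n)), "")
        coda = rest[len(nucleus):]
        if not nucleus or coda not in CODAS:
            raise ValueError(f"cannot parse syllable: {syllable!r}")
        syllables.append(Jyutping(onset, nucleus, coda, tone))
    return syllables
```

Because each syllable is a NamedTuple, its parts are reachable by name (for example, syllable.tone) while it still compares equal to a plain tuple of the same strings.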

Deprecated

The following methods in the CHATReader class have been deprecated:
- character_sents (use characters with by_utterances=True instead)
- jyutping_sents (use jyutping with by_utterances=True instead)
The following arguments of the search method of CHATReader have been deprecated:
- sent_range (use utterance_range instead)
- tagged (use by_tokens instead)
- sents (use by_utterances instead)
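
The keyword renamings listed above can be applied mechanically when migrating code. A minimal sketch using a hypothetical helper, migrate_search_kwargs (not part of PyCantonese), that rewrites old keyword names to new ones and emits a DeprecationWarning for each:

```python
import warnings
from typing import Any, Dict

# Old keyword -> new keyword, as listed in the v3.2.0 deprecations.
RENAMED_KWARGS: Dict[str, str] = {
    "sent_range": "utterance_range",
    "tagged": "by_tokens",
    "sents": "by_utterances",
}


def migrate_search_kwargs(**kwargs: Any) -> Dict[str, Any]:
    """Rename deprecated search() keywords, warning once per renamed keyword."""
    migrated: Dict[str, Any] = {}
    for name, value in kwargs.items():
        new_name = RENAMED_KWARGS.get(name, name)
        if new_name != name:
            if new_name in kwargs:
                # Passing both the old and the new spelling is ambiguous.
                raise TypeError(f"got both {name!r} and {new_name!r}")
            warnings.warn(
                f"{name!r} is deprecated; use {new_name!r} instead",
                DeprecationWarning,
                stacklevel=2,
            )
        migrated[new_name] = value
    return migrated
```

The method deprecations follow the same idea: reader.character_sents() becomes reader.characters(by_utterances=True), and reader.jyutping_sents() becomes reader.jyutping(by_utterances=True).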

Fixed

Fixed the character issues in the built-in HKCanCor data: 𠺢, 𠺝, 𡁜, 𧕴, 𥊙, 𡃓, 𠴕, 𡀔

18 Mar 12:15

jacksonllee

v3.1.1

3f2919e

[3.1.1] - 2021-03-18

Fixed

Pinned pylangacq at 0.12.0 (the new 0.13.0 has breaking changes).

21 Feb 17:24

jacksonllee

v3.1.0

a463b72

[3.1.0] - 2021-02-21

Added

Part-of-speech tagging:
- Added the function pos_tag that takes a segmented sentence or phrase
  and returns its part-of-speech tags.
- Added the function hkcancor_to_ud that maps a part-of-speech tag
  from the original HKCanCor annotated data to one of the tags from the
  Universal Dependencies v2 tagset.
Word segmentation:
- Improved segmentation quality by revising the underlying wordlist data.
The test suite now covers code snippets in both the docstrings and .rst doc files.
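
Conceptually, hkcancor_to_ud is a lookup from HKCanCor tags to the Universal Dependencies v2 tagset. A minimal sketch with a small, illustrative subset of tag pairs; the pairs, the case normalization, and the fallback to UD's catch-all tag X are assumptions here, and the authoritative table is the one behind PyCantonese's own hkcancor_to_ud:

```python
from typing import Dict

# Illustrative subset only; these pairs are assumptions, not PyCantonese's table.
HKCANCOR_TO_UD_SUBSET: Dict[str, str] = {
    "N": "NOUN",
    "NR": "PROPN",
    "V": "VERB",
    "A": "ADJ",
    "D": "ADV",
    "R": "PRON",
    "M": "NUM",
    "P": "ADP",
    "C": "CCONJ",
    "E": "INTJ",
    "Y": "PART",
}


def hkcancor_to_ud_sketch(tag: str) -> str:
    """Map an HKCanCor-style tag to a UD v2 tag, or "X" (other) if unknown."""
    return HKCANCOR_TO_UD_SUBSET.get(tag.strip().upper(), "X")
```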

Fixed

Fixed the issue of text files not being opened with UTF-8 encoding
(a possible issue on Windows).
jyutping_to_yale and parse_jyutping now return a null value
(rather than raise an error) when the input is null.
The word segmentation function segment now strips all whitespace
from the input unsegmented string before segmenting it.
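
Both fixes above come down to standard-library behavior. Python's open() and Path.read_text() fall back to a platform-dependent default encoding when none is given (on Windows this is often not UTF-8), and str.split() with no arguments splits on any Unicode whitespace, including the full-width space U+3000. A minimal sketch with hypothetical helper names, not PyCantonese's code:

```python
import tempfile
from pathlib import Path


def read_cantonese_text(path: Path) -> str:
    """Read a text file as UTF-8 explicitly instead of the platform's default encoding."""
    return path.read_text(encoding="utf-8")


def strip_all_whitespace(text: str) -> str:
    """Remove every whitespace character (spaces, tabs, newlines, full-width spaces)."""
    return "".join(text.split())


def round_trip_demo() -> str:
    """Write sample text as UTF-8 to a temporary file, read it back, strip whitespace."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "sample.txt"
        path.write_text("我 哋\n講\u3000廣東話\n", encoding="utf-8")
        return strip_all_whitespace(read_cantonese_text(path))
```

round_trip_demo() returns 我哋講廣東話 regardless of the platform's default encoding, because both the write and the read name UTF-8 explicitly.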
