Update dependency charset-normalizer to v3 #11

Open: wants to merge 1 commit into base: main

@renovate renovate bot commented Mar 17, 2023

This PR contains the following updates:

| Package | Change | Age | Adoption | Passing | Confidence |
|---|---|---|---|---|---|
| charset-normalizer | ==2.0.12 -> ==3.4.0 | age | adoption | passing | confidence |

Release Notes

Ousret/charset_normalizer (charset-normalizer)

v3.4.0

Compare Source

Added
  • Argument --no-preemptive in the CLI to prevent the detector from searching for hints.
  • Support for Python 3.13 (#​512)
Fixed
  • Relax the TypeError exception thrown when trying to compare a CharsetMatch with anything other than a CharsetMatch.
  • Improved the general reliability of the detector based on user feedback. (#520) (#509) (#498) (#407) (#537)
  • Declared charset in content (preemptive detection) not changed when converting to utf-8 bytes. (#​381)

v3.3.2

Compare Source

Fixed
  • Unintentional memory usage regression when using large payloads that match several encodings (#376)
  • Regression on some detection cases showcased in the documentation (#371)
Added
  • Noise (md) probe that identifies malformed Arabic representation due to the presence of letters in isolated form (credit to my wife)

v3.3.1

Compare Source

Changed
  • Optional mypyc compilation upgraded to version 1.6.1 for Python >= 3.8
  • Improved the general detection reliability based on reports from the community

v3.3.0

Compare Source

Added
  • Allow executing the CLI (e.g. normalizer) through python -m charset_normalizer.cli or python -m charset_normalizer
  • Support for 9 forgotten encodings that are supported by Python but unlisted in encoding.aliases as they have no alias (#323)
Removed
  • (internal) Redundant utils.is_ascii function and unused function is_private_use_only
  • (internal) charset_normalizer.assets is moved inside charset_normalizer.constant
Changed
  • (internal) Unicode code blocks in constants are updated using the latest v15.0.0 definition to improve detection
  • Optional mypyc compilation upgraded to version 1.5.1 for Python >= 3.7
Fixed
  • Unable to properly sort CharsetMatch when both chaos/noise and coherence were close due to an unreachable condition in __lt__ (#​350)

v3.2.0

Compare Source

Changed
  • Type hint for function from_path no longer enforces PathLike as its first argument
  • Minor improvement over the global detection reliability
Added
  • Introduce function is_binary, which relies on the main capabilities and is optimized to detect binaries
  • Propagate the enable_fallback argument throughout from_bytes, from_path, and from_fp, allowing deeper control over the detection (default True)
  • Explicit support for Python 3.12
Fixed
  • Edge-case detection failure where a file would contain a 'very-long' camel-cased word (Issue #289)
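
For reviewers deciding whether to adopt the v3.2.0 additions, here is a minimal sketch of how is_binary and enable_fallback might be combined. The helper name inspect_payload and its allow_fallback parameter are hypothetical and only illustrate the idea. The import is guarded so the snippet still runs where charset-normalizer is missing or older than 3.2.0.

```python
# Guarded import: is_binary only exists in charset-normalizer >= 3.2.0.
try:
    from charset_normalizer import from_bytes, is_binary
except ImportError:
    from_bytes = is_binary = None


def inspect_payload(payload, allow_fallback=True):
    """Hypothetical helper: classify a bytes payload.

    Returns "binary", a detected encoding name, or None when nothing
    could be determined (or when the library is unavailable).
    """
    if from_bytes is None or is_binary is None:
        return None
    if is_binary(payload):
        return "binary"
    # enable_fallback is forwarded to the detector (default True upstream).
    best = from_bytes(payload, enable_fallback=allow_fallback).best()
    return best.encoding if best is not None else None
```

Passing allow_fallback=False is one way to see whether a result depends on the fallback matches, but whether that distinction matters is specific to each project.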

v3.1.0

Compare Source

Added
  • Argument should_rename_legacy for the legacy function detect, which now disregards any new arguments without errors (PR #262)
Removed
  • Support for Python 3.6 (PR #​260)
Changed
  • Optional speedup provided by mypy/c 1.0.1

v3.0.1

Compare Source

Fixed
  • Multi-bytes cutter/chunk generator did not always cut correctly (PR #​233)
Changed
  • Speedup provided by mypy/c 0.990 on Python >= 3.7

v3.0.0

Compare Source

Added
  • Extend the capability of explain=True: when cp_isolation contains at most two entries (min one), it logs the details of the Mess-detector results
  • Support for alternative language frequency set in charset_normalizer.assets.FREQUENCIES
  • Add parameter language_threshold in from_bytes, from_path and from_fp to adjust the minimum expected coherence ratio
  • normalizer --version now specifies whether the current version provides extra speedup (meaning mypyc compilation whl)
Changed
  • Build with static metadata using 'build' frontend
  • Make the language detection stricter
  • Optional: Module md.py can be compiled using Mypyc to provide an extra speedup up to 4x faster than v2.1
Fixed
  • CLI with opt --normalize fails when using full paths for files
  • TooManyAccentuatedPlugin induces false positives on the mess detection when too few alpha characters have been fed to it
  • Sphinx warnings when generating the documentation
Removed
  • Coherence detector no longer returns 'Simple English'; it returns 'English' instead
  • Coherence detector no longer returns 'Classical Chinese'; it returns 'Chinese' instead
  • Breaking: Method first() and best() from CharsetMatch
  • UTF-7 will no longer appear as "detected" without a recognized SIG/mark (it is unreliable and conflicts with ASCII)
  • Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
  • Breaking: Top-level function normalize
  • Breaking: Properties chaos_secondary_pass, coherence_non_latin and w_counter from CharsetMatch
  • Support for the backport unicodedata2
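
Since the v3.0.0 "Removed" list above includes breaking changes, it may be worth checking the codebase for usages before merging. The following is a rough sketch using only the standard library. The function name find_removed_usages is hypothetical and the name sets are copied from the list above. It is a heuristic: it deliberately skips first() and best() because those method names are too generic to match reliably.

```python
import ast
from typing import List, Tuple

# Top-level names removed in v3.0.0 (taken from the release notes above).
REMOVED_TOP_LEVEL = frozenset({
    "normalize", "CharsetDetector", "CharsetDoctor",
    "CharsetNormalizerMatch", "CharsetNormalizerMatches",
})
# CharsetMatch properties removed in v3.0.0; distinctive enough to match by name.
REMOVED_PROPERTIES = frozenset({
    "chaos_secondary_pass", "coherence_non_latin", "w_counter",
})


def find_removed_usages(source: str) -> List[Tuple[int, str]]:
    """Return sorted (line number, name) pairs for likely removed-API usages."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom):
            # from charset_normalizer import normalize, ...
            module = (node.module or "").split(".")[0]
            if module == "charset_normalizer":
                for alias in node.names:
                    if alias.name in REMOVED_TOP_LEVEL:
                        hits.append((node.lineno, alias.name))
        elif isinstance(node, ast.Attribute):
            # charset_normalizer.normalize(...) accessed through the module name
            via_module = (
                isinstance(node.value, ast.Name)
                and node.value.id == "charset_normalizer"
                and node.attr in REMOVED_TOP_LEVEL
            )
            if via_module or node.attr in REMOVED_PROPERTIES:
                hits.append((node.lineno, node.attr))
    return sorted(hits)
```

Running it over each project file (for example via pathlib.Path.rglob("*.py") and read_text()) gives a quick list of lines to review. An empty result does not prove compatibility, since dynamic attribute access and aliased module imports are not covered.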

v2.1.1

Compare Source

Deprecated
  • Function normalize scheduled for removal in 3.0
Changed
  • Removed useless call to decode in fn is_unprintable (#​206)
Fixed

v2.1.0

Compare Source

Added
  • Output the Unicode table version when running the CLI with --version (PR #​194)
Changed
Fixed
  • Work around a potential bug in CPython where Zero Width No-Break Space, located in Arabic Presentation Forms-B (Unicode 1.1), is not acknowledged as a space (PR #175)
  • CLI default threshold aligned with the API threshold from @​oleksandr-kuzmenko (PR #​181)
Removed
  • Support for Python 3.5 (PR #​192)
Deprecated
  • Use of backport unicodedata from unicodedata2 as Python is quickly catching up, scheduled for removal in 3.0 (PR #​194)

Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.

@renovate renovate bot force-pushed the renovate/charset-normalizer-3.x branch from e711745 to bb9d828 Compare July 8, 2023 08:34
@renovate renovate bot force-pushed the renovate/charset-normalizer-3.x branch from bb9d828 to f73a71d Compare October 1, 2023 03:00
@renovate renovate bot force-pushed the renovate/charset-normalizer-3.x branch from f73a71d to 0ec5a61 Compare October 23, 2023 05:03
@renovate renovate bot force-pushed the renovate/charset-normalizer-3.x branch from 0ec5a61 to 532b447 Compare November 2, 2023 08:17
@renovate renovate bot force-pushed the renovate/charset-normalizer-3.x branch from 532b447 to 36f8613 Compare October 11, 2024 08:44