BUG: Can't handle nan #291

Closed
Zeroto521 opened this issue Nov 22, 2022 · 3 comments
Labels
enhancement New feature or request

Comments

@Zeroto521

In [1]: from rapidfuzz import fuzz

In [2]: fuzz.ratio("this is a test", float("nan"))  # same as `np.nan`
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [2], in <cell line: 1>()
----> 1 fuzz.ratio("this is a test", float("nan"))
File src/rapidfuzz/fuzz_cpp.pyx:72, in rapidfuzz.fuzz_cpp.ratio()
File ./src/rapidfuzz/cpp_common.pxd:379, in cpp_common.preprocess_strings()
File ./src/rapidfuzz/cpp_common.pxd:332, in cpp_common.conv_sequence()
File ./src/rapidfuzz/cpp_common.pxd:300, in cpp_common.hash_sequence()
TypeError: object of type 'float' has no len()

Similar to seatgeek/thefuzz#41.

@maxbachmann maxbachmann added the enhancement New feature or request label Nov 26, 2022
@maxbachmann
Member

Since this was never supported, I do not think this is really a bug. However, I think it would be a reasonable extension to handle float("nan") the same way as None.

@maxbachmann
Member

rapidfuzz.fuzz.*, rapidfuzz.distance.*.normalized_distance and rapidfuzz.distance.*.normalized_similarity are now able to handle both None and nan. Other scorers do not support them, since it is unclear what the result in this case should be. This is supported by rapidfuzz.process.* as well. The only exception is rapidfuzz.process.cdist as described in #293.
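For readers on a version without this change, the idea can be sketched roughly as follows. This is only an illustration, not rapidfuzz's implementation: it assumes a missing input is scored as 0 (the comment above does not state the value), it uses a difflib-based stand-in scorer instead of rapidfuzz, and `is_missing`, `stand_in_ratio` and `guarded_ratio` are hypothetical names.

```python
import math
from difflib import SequenceMatcher


def is_missing(value):
    """Return True for None and for float nan (np.nan is a plain float nan)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def stand_in_ratio(s1, s2):
    """Stand-in similarity on a 0-100 scale; NOT rapidfuzz's algorithm."""
    return 100.0 * SequenceMatcher(None, s1, s2).ratio()


def guarded_ratio(s1, s2, scorer=stand_in_ratio, missing_score=0.0):
    """Score two strings, short-circuiting when either input is missing.

    missing_score=0.0 is an assumption made for this sketch.
    """
    if is_missing(s1) or is_missing(s2):
        return missing_score
    return scorer(s1, s2)


# The call from the report no longer raises TypeError.
print(guarded_ratio("this is a test", float("nan")))
print(guarded_ratio("this is a test", None))
```

Any two-argument scorer could presumably be passed as `scorer`; the explicit `math.isnan` check is needed because nan is not caught by an `is None` test and has no `len()`.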

Zeroto521 added a commit to Zeroto521/my-data-toolkit that referenced this issue Dec 7, 2022
@Zeroto521
Author

Zeroto521 commented Dec 7, 2022

Other scorers do not support them, since it is unclear what the result in this case should be.

In these cases the scorers could return None or nan.

  • None or nan signals that the result is missing or unknown.
  • Returning a value instead of raising an error also avoids interrupting the calculation.
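The proposal above could look something like the following user-side wrapper. It is a sketch under assumptions, not something rapidfuzz provides: `propagate_missing` and `toy_distance` are hypothetical names, and the toy distance only stands in for a scorer whose result for missing input is undefined.

```python
import math
from functools import wraps


def propagate_missing(scorer):
    """Wrap a two-argument scorer so that None/nan inputs yield nan, not an error."""

    @wraps(scorer)
    def wrapper(s1, s2, *args, **kwargs):
        for value in (s1, s2):
            if value is None or (isinstance(value, float) and math.isnan(value)):
                return math.nan  # result is unknown, so report it as missing
        return scorer(s1, s2, *args, **kwargs)

    return wrapper


@propagate_missing
def toy_distance(s1, s2):
    """Toy distance: differing positions plus the length difference."""
    return sum(a != b for a, b in zip(s1, s2)) + abs(len(s1) - len(s2))


pairs = [("abc", "abd"), ("abc", None), ("abc", float("nan"))]
scores = [toy_distance(a, b) for a, b in pairs]
print(scores)  # missing pairs show up as nan and the loop keeps running
```

Returning nan rather than None keeps the result numeric, so it can sit in a float column and be filtered afterwards with `math.isnan`.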

Zeroto521 added a commit to Zeroto521/my-data-toolkit that referenced this issue Dec 9, 2022
Zeroto521 added a commit to Zeroto521/my-data-toolkit that referenced this issue Dec 18, 2022
* PERF: Speed up

* BOT: auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update optional arguments

* use sphinx anchor

* update test case

* simplify a bit

* update via advice from RapidFuzz#291

based on rapidfuzz/RapidFuzz#291

* update documentation

* remove importing

* update testcase

* add missing splash quote

* link to source

* Update condition

* Add notes

* compare nan require rapidfuzz > 2.13.4

* use `rtol` instead of `check_less_precise`

* remove workers

* Revert "remove workers"

This reverts commit e41ab01.

* Update test_textdistance_matrix.py

* reorder methods

* Handle nan

* BOT: auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>