You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
But it doesn't seem to just end there. difflib, the fallback for when you don't have python-Levenshtein installed, doesn't seem to use Levenshtein distance either. Issue 128 in the older repository/version of this package mentioned that difflib uses "Ratcliff-Obershelp algorithm, not the Levenshtein distance".
To conclude:
This library doesn't seem to use Levenshtein distance at all.
Installing python-Levenshtein doesn't just simply give speed up. It will produce different results because different algorithms are used.
Note: This library uses ratio function from Levenshtein package or difflib for all of its function with ratio in its name.
The text was updated successfully, but these errors were encountered:
Note that the Indel distance is the same as the Levenshtein distance with substitutions weighted as 2. Still I agree this is misleading. I think to a part this stems from python-Levenshtein, where Levenshtein.distance calculates the Levenshtein distance, but Levenshtein.ratio calculates the normalized Indel similarity (In my defense this was implemented before I became the maintainer of python-Levenshtein 😅 )
Note that the Indel distance is the same as the Levenshtein distance with substitutions weighted as 2.
Yeah, I also mentioned they're similar. Still, they're not the same.
I think to a part this stems from python-Levenshtein, where Levenshtein.distance calculates the Levenshtein distance, but Levenshtein.ratio calculates the normalized Indel similarity
Yup. I mentioned that too. I can't help but wonder why. The library is literally named Levenshtein. It could've been an option instead of the default, having "Indel" in the function name like it does for Jaro, or as an optional parameter.
Well, I think the readme should at least be changed to tell people about this. It's where they'd read first and currently it says "It uses Levenshtein Distance" at the very beginning, while it doesn't.
This library is described as fuzzy string matching with Levenshtein distance. However, it doesn't seem to use Levenshtein at all?
fuzz.ratio("tide", "diet") returns:
This is because python-Levenshtein uses Indel distance for ratio instead of Levenshtein. Issue 47 in python-Levenshtein mentioned that "Levenshtein.ratio is based upon the Indel distance and not on the Levenshtein distance". (That's the correct repo I think). Indel distance is similar to Levenshtein, but they are not the same and may produce different results.
But it doesn't seem to just end there. difflib, the fallback for when you don't have python-Levenshtein installed, doesn't seem to use Levenshtein distance either. Issue 128 in the older repository/version of this package mentioned that difflib uses "Ratcliff-Obershelp algorithm, not the Levenshtein distance".
To conclude:
Note: This library uses ratio function from Levenshtein package or difflib for all of its function with ratio in its name.
The text was updated successfully, but these errors were encountered: