Yes, there can be two or more tokenized forms corresponding to a single surface form. In fact, the default dictionary file does contain a few entries having such alternatives. However, I thought it would be more convenient to get the result as a single tokenized form rather than an array. This decision may not sound very thoughtful--especially if you have your own dict files that have many entries with multiple tokenized forms. Still, I would like to keep it this way so as not to break the existing code of many users. Thanks anyway!
I would like to keep it this way so as not to break the existing code of many users.
Well, if this behavior is not quite what users expect and we can improve it, maybe we could consider a release that includes this kind of "breaking" change?
A concern like "break the existing code of many users" should be guarded by the gem version.
The logic
w, s = line.split(/\s+/)
handles only the first two matches, even in cases with three matches:
lemmatizer/lib/lemmatizer/lemmatizer.rb
Lines 123 to 129 in af70f99
For example:
lemmatizer/lib/dict/noun.exc
Line 2046 in af70f99
The word
zemindari
is outside the computed range. Is this a bug?
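To make the behavior concrete, here is a minimal sketch of what the two-variable destructuring does with a three-field line, next to a splat-based alternative that keeps every field. The method names and the sample line are hypothetical and chosen only for illustration; they are not taken from the gem or from noun.exc.

```ruby
# Hypothetical helper mirroring the two-variable destructuring quoted above:
# only the first two whitespace-separated fields are kept.
def parse_first_only(line)
  w, s = line.split(/\s+/)
  [w, s]
end

# Hypothetical alternative: a splat collects every field after the surface
# form, so a second (or third) lemma is not silently dropped.
def parse_all(line)
  word, *lemmas = line.split(/\s+/)
  [word, lemmas]
end

# Made-up line in the same shape as an exception-list entry with two lemmas.
line = "surface_form first_lemma second_lemma"

p parse_first_only(line)
p parse_all(line)
```

With the first method, the third field never reaches the caller, which matches what is reported for zemindari. Whether returning an array is desirable is the compatibility question discussed in the reply.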