I noticed that this program uses jieba.cut to segment Chinese text, but it doesn't seem to work well in some cases.
For example, for the Chinese phrase 永永远远是龙的传人, jieba.cut produces 永永远远/是/龙的传人, but jieba.cut_for_search produces 永远/远远/永永远远/是/传人/龙的传人, which I think is better for index search.
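For reference, a minimal reproduction with the Python jieba package (the outputs shown are the ones reported above; the exact segmentation may vary with the dictionary version):

```python
import jieba

text = "永永远远是龙的传人"

# Default mode: coarse-grained segmentation.
print("/".join(jieba.cut(text)))
# 永永远远/是/龙的传人

# Search-engine mode: also emits the shorter words contained in long words.
print("/".join(jieba.cut_for_search(text)))
# 永远/远远/永永远远/是/传人/龙的传人
```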
Hello @sivdead,
you're right, using cut_for_search would increase the recall of Meilisearch by splitting words in different ways.
However, Meilisearch relies on word positions for queries, and jieba.cut_for_search doesn't give any clue about the position of each token; moreover, charabia does not support shifting tokens.
In order to support this kind of position-shifting behavior, the charabia output would have to be changed to a tree shape; for instance, 永永远远是龙的传人 would be shaped as:
永永远远 ──┬─► 是 ─┬─► 龙的传人
永远 ──────┤       └─► 传人
远远 ──────┘
This is not possible without a significant amount of work,
but I have to admit that it would significantly enhance the search recall.
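For illustration only, here is a rough sketch of the data relationships behind that tree: the overlapping tokens from search mode can be grouped under the positions of the coarse segmentation by using jieba.tokenize, which exposes character offsets. This uses the Python jieba API, while charabia relies on a Rust implementation, and it is not a proposal for charabia's actual output format:

```python
import jieba

text = "永永远远是龙的传人"

# Coarse segmentation with character offsets: one token per position.
coarse = list(jieba.tokenize(text))               # [(word, start, end), ...]

# Fine-grained segmentation: overlapping tokens from search mode.
fine = list(jieba.tokenize(text, mode="search"))

# Give each fine-grained token the position of the coarse token whose
# character span contains it, so 永远 / 远远 / 永永远远 share position 0
# and 传人 / 龙的传人 share position 2.
tokens_with_positions = []
for word, start, end in fine:
    for pos, (_, c_start, c_end) in enumerate(coarse):
        if start >= c_start and end <= c_end:
            tokens_with_positions.append((word, pos))
            break

print(tokens_with_positions)
# roughly [('永远', 0), ('远远', 0), ('永永远远', 0), ('是', 1), ('传人', 2), ('龙的传人', 2)]
```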
Thank you for your report, and sorry for the delay in answering.