Redesigned Search Algorithm Issue #65

LTR14 · 2023-06-26T13:12:52Z

Source: (read-in-spreadsheet branch) PPUC/PxPUC/views.py in ResearcherSearchList

Description: In an attempt to sort sentences by a fuzz ratio rank assigned to each one, the loop from the original algorithm was removed, and the new algorithm works on the full user query (after stopwords are removed). The current structure of the algorithm:

1. Prefetch_queryset is created using sentences that contain the current query.
2. Each sentence is looped over and annotated with a new score field.
3. This score is the fuzz.token_set_ratio between the current sentence text and the tokenized user query.
4. The prefetch is ordered by these scores before being passed to the location_queryset.
5. A count of sentences containing the user query is created per location.
6. The location_queryset annotates a new field for the count and connects locations to their corresponding sentences, excluding locations where the count is 0.
7. This queryset is the one returned to the frontend.
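
The steps above can be sketched in plain Python, outside the ORM, to make the intended data flow easier to check. This is only an illustration under assumptions: the dictionary keys (`location`, `text`), the function name `search_locations`, and the substring test used for "contains the query" are hypothetical and not taken from views.py. The scorer is imported from thefuzz (fuzzywuzzy exposes the same `fuzz.token_set_ratio`); if neither is installed, a rough difflib stand-in is used, which does not reproduce the real scores.

```python
from collections import defaultdict
from difflib import SequenceMatcher

try:
    from thefuzz import fuzz  # assumption: thefuzz is the installed package
    score_fn = fuzz.token_set_ratio
except ImportError:
    def score_fn(a, b):
        # Rough stand-in only: compares sorted unique tokens with difflib.
        # It does NOT reproduce fuzz.token_set_ratio exactly.
        ta = " ".join(sorted(set(a.lower().split())))
        tb = " ".join(sorted(set(b.lower().split())))
        return round(100 * SequenceMatcher(None, ta, tb).ratio())


def search_locations(sentences, query, scorer=score_fn):
    """sentences: list of dicts with hypothetical keys 'location' and 'text'."""
    # Step 1: keep only sentences that contain the full query.
    matching = [s for s in sentences if query.lower() in s["text"].lower()]

    # Steps 2-3: compute one score per sentence in Python.
    scored = [dict(s, score=scorer(s["text"], query)) for s in matching]

    # Step 4: order by score, best first.
    scored.sort(key=lambda s: s["score"], reverse=True)

    # Steps 5-6: group sentences per location. Locations with no matching
    # sentence never get a key, so a count of 0 is excluded automatically.
    by_location = defaultdict(list)
    for s in scored:
        by_location[s["location"]].append(s)

    # Step 7: the structure that would be handed to the frontend.
    return [
        {"location": loc, "count": len(items), "sentences": items}
        for loc, items in by_location.items()
    ]
```

Because the scorer is a parameter, the same function can be run with a different ratio (or with the original fragmented-query logic) to compare result quality.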

Issue: Steps 1 and 5 may not return the best results, because they now work with the full query rather than the fragmented query used in the original algorithm. Another problem comes from the loop in step 2: annotations don't seem to work that way, so another way to give each sentence a unique rank must be found.
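
One possible direction for the step 2 problem, offered as a sketch rather than a fix: Django's `annotate()` adds a single SQL expression to the whole queryset, so a value computed in a Python loop cannot be attached row by row. The scores can instead be computed in Python first, turned into a unique rank per sentence (ties broken by id), and then either used to sort in Python or mapped back onto the queryset with a `Case`/`When` expression. The names `rank_sentences` and `build_rank_expression` and the id-to-score mapping are hypothetical.

```python
def rank_sentences(scores):
    """Map {sentence_id: score} to {sentence_id: rank}, where rank 1 is best.

    Ties on score are broken by the lower id, so every rank is unique.
    """
    ordered = sorted(scores, key=lambda sid: (-scores[sid], sid))
    return {sid: rank for rank, sid in enumerate(ordered, start=1)}


def build_rank_expression(ranks):
    """Build a Django expression mapping each primary key to its rank.

    Assumes Django is installed; imported lazily so rank_sentences can be
    used without it.
    """
    from django.db.models import Case, IntegerField, Value, When

    return Case(
        *[When(pk=sid, then=Value(rank)) for sid, rank in ranks.items()],
        output_field=IntegerField(),
    )


# Hypothetical usage inside the view, given a sentence queryset `qs`:
#   scores = {s.pk: fuzz.token_set_ratio(s.text, query) for s in qs}
#   ranks = rank_sentences(scores)
#   qs = qs.annotate(rank=build_rank_expression(ranks)).order_by("rank")
```

The `Case`/`When` expression grows with the number of matching sentences, so for large result sets sorting the evaluated objects in Python may be the simpler choice.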

Additional Notes: https://www.jashds.com/blog/2019/05/13/fuzzy-stringmatching-python#:~:text=This%20ratio%20uses%20a%20simple,differences%20existing%20between%20both%20strings.

LTR14 added the bug and help wanted labels on Jun 26, 2023
