Redesigned Search Algorithm Issue #65

Open
LTR14 opened this issue Jun 26, 2023 · 0 comments
Labels: bug (Something isn't working), help wanted (Extra attention is needed)

Comments

LTR14 (Collaborator) commented Jun 26, 2023

Source: PPUC/PxPUC/views.py, ResearcherSearchList (read-in-spreadsheet branch)

Description: To sort sentences by an assigned fuzzy-matching (fuzz ratio) score, the redesigned algorithm removes the loop from the original algorithm and instead works on the full user query (after stopwords are removed). The algorithm is currently structured as follows:

  1. The prefetch_queryset is built from the sentences that contain the current query.
  2. Each sentence is looped over and annotated with a new score field.
  3. This score is the fuzz.token_set_ratio between the sentence text and the tokenized user query.
  4. The prefetch is ordered by these scores before being passed to location_queryset.
  5. A per-location count of sentences containing the user query is computed.
  6. location_queryset is annotated with this count and linked to each location's corresponding sentences; locations with a count of 0 are excluded.
  7. The resulting queryset is returned to the frontend.
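The seven steps above can be sketched in plain Python over in-memory data, which makes the behaviour easy to try outside Django. This is a minimal sketch, not the code in views.py: Sentence, search and token_set_ratio are hypothetical names, the Django querysets are replaced by lists and dicts, and token_set_ratio is a difflib-based approximation of fuzzywuzzy's fuzz.token_set_ratio (it lowercases and splits on whitespace but does not strip punctuation).

```python
from collections import defaultdict
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Sentence:
    """Hypothetical stand-in for the Sentence model (only the fields used here)."""
    pk: int
    location_id: int
    text: str


def _ratio(a: str, b: str) -> int:
    """0..100 similarity, like fuzzywuzzy's difflib-based fuzz.ratio fallback."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def token_set_ratio(a: str, b: str) -> int:
    """Approximation of fuzz.token_set_ratio (lowercases, splits on whitespace)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0
    common = " ".join(sorted(ta & tb))
    combined_a = f"{common} {' '.join(sorted(ta - tb))}".strip()
    combined_b = f"{common} {' '.join(sorted(tb - ta))}".strip()
    # Best of: shared tokens vs each side, and the two sides against each other.
    return max(_ratio(common, combined_a), _ratio(common, combined_b),
               _ratio(combined_a, combined_b))


def search(sentences: list[Sentence],
           query_tokens: list[str]) -> dict[int, tuple[int, list[Sentence]]]:
    """In-memory version of steps 1-7 for the full (stopword-free) query."""
    query = " ".join(query_tokens).lower()
    # Step 1: keep only sentences containing the whole query string.
    matches = [s for s in sentences if query in s.text.lower()]
    # Steps 2-4: score each sentence against the query and order best-first.
    scored = sorted(matches, key=lambda s: token_set_ratio(s.text, query), reverse=True)
    # Step 6: group the ordered sentences under their location.
    by_location: dict[int, list[Sentence]] = defaultdict(list)
    for s in scored:
        by_location[s.location_id].append(s)
    # Steps 5 and 7: count per location; locations with count 0 never appear.
    return {loc: (len(sents), sents) for loc, sents in by_location.items()}


if __name__ == "__main__":
    demo = [
        Sentence(1, 10, "officers enforce policy daily"),
        Sentence(2, 10, "the force policy was updated"),
        Sentence(3, 20, "body cameras are required"),
        Sentence(4, 20, "use of force policy"),
    ]
    for loc, (count, sents) in search(demo, ["force", "policy"]).items():
        print(loc, count, [s.pk for s in sents])
```

In the demo, sentence 1 still passes step 1 because "enforce policy" contains the substring "force policy", but it scores below 100 and is listed after sentence 2. The other matching sentences contain both query words, and token_set_ratio returns 100 whenever every query token also appears as a word in the sentence, so after the step 1 filter many sentences may tie at 100 and the step 4 ordering may carry little information.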

Issue: Steps 1 and 5 may not return the best results, because they now match against the full query instead of the fragmented query used in the original algorithm, which likely misses sentences that match only part of the query. A second problem is the loop in step 2: annotations don't seem to work that way, so another way to assign a unique rank to each sentence is needed.
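Both problems can be prototyped outside Django. The sketch below assumes that "fragmented" means matching each query word separately, and it replaces the per-sentence annotation loop with a {pk: score} map computed once in Python. A likely reason the loop in step 2 fails is that QuerySet.annotate() adds one SQL expression to every row rather than storing a per-object Python value, and re-evaluating or re-ordering the queryset fetches fresh instances, discarding attributes set inside a loop. Sentence, score and fragment_search are hypothetical names, and score is a plain difflib ratio standing in for fuzz.token_set_ratio.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Sentence:
    """Hypothetical stand-in for the Sentence model."""
    pk: int
    location_id: int
    text: str


def score(text: str, query: str) -> int:
    """Stand-in scorer (difflib ratio scaled to 0..100); swap in fuzz.token_set_ratio."""
    return round(100 * SequenceMatcher(None, text.lower(), query.lower()).ratio())


def fragment_search(sentences: list[Sentence], query_tokens: list[str]):
    """Match on any query fragment, rank by a precomputed score map, count per location."""
    tokens = [t.lower() for t in query_tokens]
    query = " ".join(tokens)
    # Steps 1 and 5 (alternative): a sentence matches if it contains ANY fragment,
    # not only the whole query string.
    matches = [s for s in sentences if any(t in s.text.lower() for t in tokens)]
    # Step 2 (alternative): score each sentence once and keep a {pk: score} map
    # instead of annotating inside a loop.
    scores = {s.pk: score(s.text, query) for s in matches}
    # Highest score first; ties broken by pk so the order is deterministic.
    ranked = sorted(matches, key=lambda s: (-scores[s.pk], s.pk))
    # Per-location counts; locations with no match never get a key (count 0 excluded).
    counts: dict[int, int] = {}
    for s in ranked:
        counts[s.location_id] = counts.get(s.location_id, 0) + 1
    return scores, ranked, counts


if __name__ == "__main__":
    demo = [
        Sentence(1, 10, "use of force policy"),
        Sentence(2, 10, "force was used"),
        Sentence(3, 20, "new policy adopted"),
        Sentence(4, 30, "body cameras"),
    ]
    scores, ranked, counts = fragment_search(demo, ["force", "policy"])
    print([s.pk for s in ranked], counts)
```

With full-query matching only sentence 1 would be found; fragment matching also picks up sentences 2 and 3 while still ranking the full phrase first. To keep the ordering in the database, one option (untested here) is to turn the map into a conditional expression, Case(*[When(pk=pk, then=Value(s)) for pk, s in scores.items()], default=Value(0), output_field=IntegerField()), annotate the prefetch queryset with it and order_by that annotation. Note that the substring test also matches fragments inside longer words (force in enforce), which may or may not be wanted.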

Additional Notes: https://www.jashds.com/blog/2019/05/13/fuzzy-stringmatching-python#:~:text=This%20ratio%20uses%20a%20simple,differences%20existing%20between%20both%20strings.

LTR14 added the bug (Something isn't working) and help wanted (Extra attention is needed) labels Jun 26, 2023