Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split name part of the query by . in addition to space. #23

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

panglesd
Copy link
Contributor

@panglesd panglesd commented Feb 6, 2024

This allows to find ListLabels.map when searching for List.map.

Remarks:

  • Would it make sense to also split the name by . when indexing? Docstrings are indexed "by words"
  • Certainly the cost function could be improved.

What do you think?

This allows to find `ListLabels.map` when searching for `List.map`.

Signed-off-by: Paul-Elliot <[email protected]>
Co-authored-by: Emile Trotignon <[email protected]>
@art-w
Copy link
Owner

art-w commented Feb 6, 2024

Nice thanks! I've caught myself wanting to write Foo.bar out of muscle memory, when Foo bar was actually a better search (since the bar value is often hidden in a submodule)... but at the same time I liked that the extra "dot" precision in the query was rewarded by more precise search results!

But now that I think about it, the Query.Name_cost already favors "nice" word boundaries (eg with dots on either side of List and map) and the fact that words are found in query order (so Foo_map.List is worse), so I believe that by accident the exact matches will still be favored!

edit: Your PR has been deployed to doc.sherlocode.com so that we can try it out :)

If there's no big issue, I think it make sense to split the names by . during indexing as it should result in a nice database size improvement! (... or not) We had to comment the test for the opam release as the exact sizes varies depending on the OCaml version but you could test it here:

$ du -s *.js *.gz

@art-w
Copy link
Owner

art-w commented Feb 6, 2024

After testing this a bit, I really like it :)

There's a tiny issue when searching for +. as the dot is lost, but it's not a blocker as searching for operators already has similar issues that we should address in a future PR #26

@panglesd
Copy link
Contributor Author

panglesd commented Feb 6, 2024

Thanks! Very cool that it is already deployed!

If there's no big issue, I think it make sense to split the names by . during indexing as it should result in a nice database size improvement! (... or not)

I'll do that and test the change in size, and undraft the PR when it's done :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants