PR #143 broke our hyphen-based test, which searches for `beta-secretase`. If we use the query `(beta\-secretase) OR (beta\-secretase*)` (as we did before PR #143), everything works fine. But when we change it to `(beta\-secretase*)` (as we do in PR #143), it stops working, although (and this is the weird bit) `(beta\-secretase)` works just fine (i.e. when autocomplete=False).
I'm not sure what's wrong. The one clue I have is that hyphens in the preferred name still work, so this probably has something to do with the `solr.StandardTokenizerFactory` we use to tokenize names, which specifically splits on hyphens. The answer might be to choose a different tokenizer.
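One possible explanation, offered as an assumption rather than a confirmed diagnosis: Solr generally does not run wildcard and prefix terms through the field's tokenizer, so `beta\-secretase*` would be compared as the single prefix `beta-secretase` against index tokens that were already split into `beta` and `secretase`. The toy model below is not Solr; `toy_tokenize`, `phrase_matches` and `prefix_matches` are hypothetical helpers that only mimic that behaviour to show why the three queries could behave differently.

```python
import re


def toy_tokenize(text):
    """Rough stand-in for a hyphen-splitting tokenizer (NOT the real
    StandardTokenizer): lowercase, then split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def phrase_matches(query, indexed_tokens):
    """Analyzed query: the query text is tokenized the same way as the
    index, then its tokens must appear as a contiguous run."""
    q = toy_tokenize(query)
    n = len(q)
    return n > 0 and any(
        indexed_tokens[i:i + n] == q
        for i in range(len(indexed_tokens) - n + 1)
    )


def prefix_matches(prefix, indexed_tokens):
    """Unanalyzed prefix query: the prefix is compared as-is against
    each individual indexed token."""
    p = prefix.lower()
    return any(tok.startswith(p) for tok in indexed_tokens)


# A name as it might be indexed (assumed example value).
indexed = toy_tokenize("beta-secretase 1")  # ['beta', 'secretase', '1']

# (beta\-secretase): analyzed, so it is split like the index and matches.
exact_ok = phrase_matches("beta-secretase", indexed)

# (beta\-secretase*): no single token starts with "beta-secretase".
escaped_prefix_ok = prefix_matches("beta-secretase", indexed)

# (beta secretase*): "beta" matches a token, "secretase" matches as a prefix.
spaced_ok = phrase_matches("beta", indexed) and prefix_matches("secretase", indexed)
```

In this model `exact_ok` and `spaced_ok` are true while `escaped_prefix_ok` is false, which mirrors the behaviour reported above. Whether this is what actually happens in our Solr configuration still needs to be checked.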
For now, the most comprehensive solution appears to be to replace special characters with spaces in autocomplete (i.e. `(beta secretase*)`), but to escape them without autocomplete (i.e. `(beta\-secretase)`). That's what I've done in PR #143, but we should figure out if there's a better solution here.
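The workaround can be sketched roughly as follows. This is a minimal illustration, not the code from PR #143: the function name `build_query` and the exact set of special characters (taken from the Solr standard query parser's reserved characters) are assumptions.

```python
import re

# Characters reserved by the Solr/Lucene standard query parser
# (assumed list; adjust to whatever the service actually treats as special).
SPECIAL_CHARS = r'+-&|!(){}[]^"~*?:\/'
_SPECIAL_RE = re.compile("[" + re.escape(SPECIAL_CHARS) + "]")


def build_query(text: str, autocomplete: bool) -> str:
    """Hypothetical helper that builds the query string for one search term."""
    if autocomplete:
        # Autocomplete: replace special characters with spaces, collapse
        # runs of whitespace, and append the trailing wildcard.
        cleaned = " ".join(_SPECIAL_RE.sub(" ", text).split())
        return f"({cleaned}*)" if cleaned else ""
    # No autocomplete: keep the characters but backslash-escape them.
    escaped = _SPECIAL_RE.sub(lambda m: "\\" + m.group(0), text.strip())
    return f"({escaped})" if escaped else ""


print(build_query("beta-secretase", autocomplete=True))   # (beta secretase*)
print(build_query("beta-secretase", autocomplete=False))  # (beta\-secretase)
```

Keeping both branches in one function makes the asymmetry explicit, so it is easy to remove once a better fix is found.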
This PR combines several improvements to search, results and filtering:
* It updates the search so that an autocomplete request no longer duplicates the search term in the query (see #142).
* This breaks hyphenated search terms in the autocomplete query, and I can't figure out why. For now, I've set it up so that we replace special characters with spaces in the autocomplete query (i.e. beta-secretase becomes `(beta secretase*)`) but escape special characters in the non-autocomplete query (i.e. beta-secretase becomes `(beta\-secretase)`), since that still appears to work. I'll dig into this more deeply in #146.
* It adds taxon and clique identifier count to values indexed during data loading.
* It incorporates clique identifier count into the returned results and into the boosting and sorting of those results. It also tweaks the boosting values used in query fields and phrase fields.
* It adds an `only_taxa` input field that allows filtering results to a list of NCBITaxon taxon identifiers (note that this will only work for terms that have taxon information, which at the moment is only cliques containing NCBIGene identifiers).
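As an illustration of the `only_taxa` filter, here is a minimal sketch of how a list of taxon identifiers could be turned into a Solr filter query. The helper name `taxa_filter` and the Solr field name `taxa` are hypothetical; the actual field name and input format used in the PR may differ.

```python
def taxa_filter(only_taxa, field="taxa"):
    """Build a Solr filter-query string restricting results to the given
    taxon identifiers. `field` is a hypothetical Solr field name.

    Returns None when no identifiers are supplied, meaning "no filter".
    """
    curies = [t.strip() for t in only_taxa if t.strip()]
    if not curies:
        return None
    # Quote each identifier so the colon in "NCBITaxon:9606" is not parsed
    # as a field separator; escape backslashes and quotes inside the value.
    quoted = " OR ".join(
        '"' + c.replace("\\", "\\\\").replace('"', '\\"') + '"'
        for c in curies
    )
    return f"{field}:({quoted})"


print(taxa_filter(["NCBITaxon:9606", "NCBITaxon:10090"]))
# taxa:("NCBITaxon:9606" OR "NCBITaxon:10090")
```

A string like this would typically be passed as an `fq` parameter, so it narrows the result set without affecting scoring.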