Spike: Explore the possibility of boosting the precise title field to get better results with exact title searching #1346
Comments
Here's how I reproduced one of the examples, the "Central Station" one. This is a title search, so let's look at the definition of a title search.
From this we know how to replicate the search in Solr.
The results should mostly match what the committee reported, as long as the data in the selected collection is the same as in prod. At the bottom of the results, Solr provides a 'debug' section, which includes an 'explain' subsection. This shows the math behind the scores that determine our relevancy ranking. The two most important explanations are for the result the committee thinks should be higher (id '990000746450302486', score 264.82336) and for the result above it (id '990022621180302486', score 270.9209). Full and truncated explanations are in SharePoint. The main takeaway is that the lower-ranking result has more text in its description than the higher-ranking one, so "Central Station" makes up a smaller share of that item's description than it does for the higher-ranking result.
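For anyone who wants to repeat this, here is a minimal Python sketch of requesting the 'debug' section and pulling out the 'explain' entries for the two records. The host, the collection name, and anything passed in `extra_params` are placeholders, not our actual configuration; the real title-search parameters come from the search definition mentioned above. It assumes Solr's default JSON debug output, where `debug.explain` maps each document id to its explanation.

```python
from urllib.parse import urlencode


def build_debug_url(base_url, collection, query, extra_params=None):
    """Build a Solr /select URL that asks for the debug/explain output.

    base_url, collection, and extra_params are placeholders; substitute
    the values from the real title-search definition.
    """
    params = {
        "q": query,
        "wt": "json",
        "debugQuery": "true",  # standard Solr parameter that adds the 'debug' section
        "rows": 20,
    }
    if extra_params:
        params.update(extra_params)
    return f"{base_url}/{collection}/select?{urlencode(params)}"


def pick_explanations(solr_response, doc_ids):
    """Return the 'explain' entry for each requested id (None if absent)."""
    explain = solr_response.get("debug", {}).get("explain", {})
    return {doc_id: explain.get(doc_id) for doc_id in doc_ids}


# Hypothetical local Solr and collection name, for illustration only.
url = build_debug_url("http://localhost:8983/solr", "my-collection", "Central Station")
print(url)
```

After fetching that URL and parsing the JSON, `pick_explanations(response, ["990000746450302486", "990022621180302486"])` gives the two explanations side by side for comparison.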
@abelemlih I am surprised to see
@tclayton33 I must warn you and the committee: changes in relevance will have knock-on effects. We cannot boost one term without effectively de-boosting all the rest. If the committee is happy with results generally, there is no way to change boosts without changing those other results.
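To make the knock-on effect concrete, here is a toy Python sketch. It is deliberately simplified: it treats a document's score as a plain weighted sum of per-field match scores, which is not exactly how Solr combines fields, and the field names, documents, and numbers are all invented for illustration.

```python
def score(field_scores, boosts):
    """Simplified relevance: weighted sum of per-field match scores.

    Real Solr scoring is more involved; this only illustrates how
    boosts interact across fields.
    """
    return sum(boosts.get(field, 1.0) * s for field, s in field_scores.items())


def rank(docs, boosts):
    """Return document names ordered from highest to lowest score."""
    return sorted(docs, key=lambda name: score(docs[name], boosts), reverse=True)


# Invented per-field match scores for two hypothetical records.
docs = {
    "exact_title": {"title_precise": 2.0, "title": 1.0, "subject": 0.0},
    "subject_heavy": {"title_precise": 0.0, "title": 1.0, "subject": 3.0},
}

before = rank(docs, {"title_precise": 1.0, "title": 1.0, "subject": 1.0})
after = rank(docs, {"title_precise": 5.0, "title": 1.0, "subject": 1.0})

print(before)  # subject_heavy ranks first (4.0 vs 3.0)
print(after)   # exact_title ranks first (11.0 vs 4.0)
```

Raising the one boost moves the exact-title record up, but only by pushing the other record down. Every search where users currently expect the second record first is affected by the same change.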
@rotated8 @abelemlih The committee did discuss that any changes we make in this area could have undesirable consequences, and we do want to prevent that. But a considerable number of members are also dissatisfied with some of the results for short, exact titles. I just added a new example that came in from a faculty member last week (no. 7). I've also run some comparable searches in Stanford's catalog and added those links to the example document. I'm not sure what Stanford is doing (it may be a lot more complicated than boosting the one field the committee was proposing), but I think their title search results for al-Khaṣāʼiṣ, JAMA, Radiographics, and Traditio are more in line with the behavior our users are expecting. It's hard to talk about possible consequences in the abstract. We were hoping to address the concern about unfavorable consequences by testing thoroughly in the blackcat-test environment. Because production and blackcat-test use the same indexes, the committee members would be able to run side-by-side comparisons to make sure the results are acceptable...or not.
@abelemlih and @rotated8 Pardon my newbie question, but that score that Ayoub assigned is just to complete the research for the spike, correct?
I have several examples from the Library Search Committee of title searches (mostly of short, exact titles) that are producing unsatisfactory results. I'd like to have a discussion with the developers to explore what could happen if the precise title field is boosted.
Here is the example file