
Spike: Explore the possibility of boosting the precise title field to get better results with exact title searching #1346

Open
tclayton33 opened this issue Apr 25, 2023 · 5 comments


@tclayton33

I have several examples from the Library Search Committee of title searches (mostly of short, exact titles) that are producing unsatisfactory results. I'd like to have a discussion with the developers to explore what could happen if the precise title field is boosted.

Here is the example file

@rotated8
Member

rotated8 commented May 30, 2023

Here's how I reproduced one of the examples: the "Central Station" one.

This is a title search, so let's look at the definition of a title search.

From this we know how to replicate the search in Solr.

  • Go to the Solr interface for the environment we want to try this query in, and select the collection with the appropriate data.
  • On the query page, set q to our query, "Central Station"; select the checkboxes for debugQuery and edismax; and finally, concatenate the list of title fields from above (do this programmatically unless you enjoy removing all the quotes) and set the result as qf.
  • Click the 'Execute Query' button.
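The qf-building step above can be sketched as follows; the field names and boosts here are hypothetical placeholders, not the actual title-search definition:

```python
# Sketch of the "concatenate the title fields" step: build the qf parameter
# programmatically instead of hand-editing quotes. The field names and boost
# values below are made up for illustration.
title_fields = [
    "title_precise_tesim^5",  # hypothetical field^boost entries
    "title_tesim^2",
    "title_addl_tesim",
]

# Solr's qf parameter is a single space-separated string of field^boost terms.
qf = " ".join(title_fields)

query_params = {
    "q": '"Central Station"',
    "defType": "edismax",  # equivalent to checking the edismax box
    "debugQuery": "true",  # equivalent to checking the debugQuery box
    "qf": qf,
}
print(query_params["qf"])
```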

The results should mostly match what was reported by the committee, as long as the data in the selected collection is the same as prod.

At the bottom of the results, Solr will provide a 'debug' section, which includes an 'explain' subsection. This shows the math behind the scores that determine relevancy. The two most important explanations are for the result the committee thinks should rank higher (id '990000746450302486', score 264.82336), and for the result currently above it (id '990022621180302486', score 270.9209). Full and truncated explanations are in Sharepoint.

The main takeaway is that the lower-ranking result has more text in its description than the higher-ranking one, so "Central Station" makes up a smaller share of that item's description than it does for the higher-ranking result.
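That length effect is what Lucene's default BM25 similarity produces, assuming this index is on the default similarity; here is a minimal sketch with illustrative numbers (not values from the actual explain output):

```python
def bm25_term_score(tf, doc_len, avg_doc_len, idf, k1=1.2, b=0.75):
    # Lucene-style BM25 term score: the doc_len / avg_doc_len ratio in the
    # denominator is the length normalization that penalizes matches that
    # occur in longer fields.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + norm)

idf = 3.0  # illustrative value, not the real idf for "Central Station"
short_doc = bm25_term_score(tf=1, doc_len=10, avg_doc_len=50, idf=idf)
long_doc = bm25_term_score(tf=1, doc_len=200, avg_doc_len=50, idf=idf)
assert short_doc > long_doc  # same term frequency, but the shorter field wins
```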

@rotated8
Member

rotated8 commented May 30, 2023

@abelemlih I am surprised to see all_text_timv affecting the score of a title search. Can you look into that? I should note that I cannot get similar results from Solr if pf='' is set, as in the search definition.

@rotated8
Member

@tclayton33 I must warn you and the committee: changes in relevance will have knock-on effects. We cannot boost one term without effectively de-boosting all the rest. If the committee is happy with results generally, there is no way to change boosts without changing those other results.
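A toy illustration of that knock-on effect, using hypothetical field names and made-up match scores (this is a simple weighted sum, not Solr's actual scorer):

```python
# A document's toy score here is a boost-weighted sum of per-field match
# scores. Raising the precise-title boost reorders documents that had been
# ranking well on matches in other fields.
def score(field_scores, boosts):
    return sum(boosts[f] * s for f, s in field_scores.items())

boosts_before = {"title_precise": 1.0, "title_addl": 1.0}
boosts_after = {"title_precise": 10.0, "title_addl": 1.0}

doc_a = {"title_precise": 0.2, "title_addl": 0.9}  # matches mostly on title_addl
doc_b = {"title_precise": 0.5, "title_addl": 0.1}  # matches on the precise title

# Before the boost, doc_a outranks doc_b; after it, the order flips.
assert score(doc_a, boosts_before) > score(doc_b, boosts_before)
assert score(doc_b, boosts_after) > score(doc_a, boosts_after)
```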

@tclayton33
Author

@rotated8 @abelemlih The committee did discuss that any changes we make in this area could have undesirable consequences, and we do want to prevent that. But a considerable number of members are also dissatisfied with some of the results for short, exact titles. I just added a new example that came in from a faculty member last week (no. 7). I've also run some comparable searches in Stanford's catalog and added those links to the example document. I'm not sure what Stanford is doing (it may be a lot more complicated than boosting the one field the committee was proposing), but I think their title search results for al-Khaṣāʼiṣ, JAMA, Radiographics, and Traditio are more in line with the behavior our users are expecting.

It's hard to talk about possible consequences in the abstract. We were hoping to ultimately alleviate the concern about creating unfavorable consequences by conducting thorough testing in the blackcat-test environment. Because production and blackcat-test use the same indexes, the committee members would be able to run side-by-side comparisons to make sure the results are acceptable... or not.
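One way to sketch that side-by-side comparison, assuming the ranked result IDs from each environment have already been fetched (the fetching itself is omitted; the sample IDs are the two from the "Central Station" example):

```python
# Given the ranked result IDs for the same query from production and from
# blackcat-test, report how far each document moved between environments.
def rank_changes(prod_ids, test_ids):
    prod_rank = {doc_id: i for i, doc_id in enumerate(prod_ids)}
    changes = {}
    for i, doc_id in enumerate(test_ids):
        if doc_id in prod_rank:
            changes[doc_id] = prod_rank[doc_id] - i  # positive = moved up
        else:
            changes[doc_id] = None  # only appears in the test environment
    return changes

prod = ["990022621180302486", "990000746450302486"]
test = ["990000746450302486", "990022621180302486"]
print(rank_changes(prod, test))
```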

@tclayton33
Author

@abelemlih and @rotated8 Pardon my newbie question, but that score that Ayoub assigned is just to complete the research for the spike, correct?
