Authority search very poor #6089

Closed
RichardTaylor opened this issue Jan 30, 2021 · 16 comments · Fixed by #6120

Comments

@RichardTaylor

The search box at:

https://www.whatdotheyknow.com/select_authority

is particularly poor; it produces worse results than the general site search.

A search for cabinet office on the select authority page on WhatDoTheyKnow.com currently gives no results.

I suspect a number of the cases of reported problems logged at #1179 are not actually issues with the general site search, but with the select authority search.

See also: #4426

@garethrees
Member

#1179 is the same mechanism and #4426 covers the most annoying specific issue, so closing this as it just covers the same ground.

@garethrees garethrees added f:search improvement Improves existing functionality (UI tweaks, refactoring, performance, etc) user-experience x:uk labels Feb 1, 2021
@RichardTaylor
Author

I want to stress that the /select_authority search produces different results to the search box in the header and at /search.

The /select_authority search often produces no results when a public body name is entered into it.

Currently, for example, on WhatDoTheyKnow.com /select_authority has no results for cabinet office, whereas the main search returns many and the Cabinet Office is second.

Many public body names result in no hits on the authority search, e.g.:
https://www.whatdotheyknow.com/select_authority?utf8=%E2%9C%93&query=cambridgeshire+county+council&bodies=1&commit=Search
https://www.whatdotheyknow.com/select_authority?utf8=%E2%9C%93&query=oxfordshire+county+council&bodies=1&commit=Search

Where there are results, putting bodies with many requests to the top (#40470 might help too.

[Screenshots: caboffice2, caboffice1]

@RichardTaylor RichardTaylor added the bug Breaks expected functionality label Feb 2, 2021
@RichardTaylor
Author

Email to WhatDoTheyKnow today:

I am trying to make a new request to the Department of Health and Social Care. When I put that in the search box and click Search nothing happens.

They are correct, there are no results at:

https://www.whatdotheyknow.com/select_authority?utf8=%E2%9C%93&query=Department+of+Health+and+Social+Care&bodies=1&commit=Search

I'm going to reopen this; it's clearly a bug affecting many users whose workflow takes them via the /select_authority page.

@RichardTaylor RichardTaylor reopened this Feb 2, 2021
@garethrees
Member

Yeah okay something looks wrong here. Thanks for clarifying.

@garethrees
Member

Seems related: notanumber/xapian-haystack#154

@RichardTaylor
Author

A WhatDoTheyKnow user has asked if we can make searches for HS2 and High Speed 2 return the entry we have for High Speed Two (HS2) Limited; currently those terms don't result in a hit for the body.

Also related: where the /select_authority search has no hits, there is no "no results" message; it just appears to do nothing.
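
To illustrate the missing feedback, here is a minimal sketch of a helper that always produces a message, even for an empty search. The method name `results_summary` and the message wording are hypothetical and not taken from Alaveteli; the sketch also assumes the results arrive as either `nil` or a plain array.

```ruby
# Hypothetical helper, not Alaveteli code: turns a search result into a
# user-facing summary so that an empty (or nil) result is never silent.
def results_summary(query, results)
  results = Array(results) # Array(nil) => [], so a nil search is handled too
  if results.empty?
    "No authorities found matching \"#{query}\"."
  else
    noun = results.size == 1 ? 'authority' : 'authorities'
    "#{results.size} matching #{noun} found."
  end
end

puts results_summary('Cabinet Office', nil)
puts results_summary('Cabinet Office', ['Cabinet Office'])
```

The point of the design is only that the empty case gets its own branch, so the page never looks as if the search did nothing.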

@RichardTaylor
Author

This isn't just an issue of case sensitivity, the search for Cabinet Office doesn't result in any hits irrespective of capitalisation.

Interestingly, when typing cabinet office into the search box letter by letter the instant results shown while typing include the cabinet office while the search term is cabi, cabin, cabine, cabinet, cabinet o, and cabinet of but not when further letters are added.

@garethrees
Member

> This isn't just an issue of case sensitivity, the search for Cabinet Office doesn't result in any hits irrespective of capitalisation.

Yet "cabinet office" and "Cabinet Office" (quoted) do provide reasonable results.

> Interestingly, when typing cabinet office into the search box letter by letter the instant results shown while typing include the cabinet office while the search term is cabi, cabin, cabine, cabinet, cabinet o, and cabinet of but not when further letters are added.

Yeah, notanumber/xapian-haystack#154 mentions stemming issues, which are related to this behaviour.
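
As a rough illustration of how stemming and prefix (typeahead) matching can interact badly, here is a toy model. It is not Alaveteli's or Xapian's code: `toy_stem`, `index_terms` and `prefix_match?` are hypothetical names, and the "stemmer" just strips a trailing "e" or "s". It does not reproduce the exact cut-off point reported above; it only shows the general mechanism, where an unstemmed, fully typed word fails to match a stemmed index term.

```ruby
# Toy model only: NOT Alaveteli's or Xapian's actual code.
# Hypothetical stemmer: strips one trailing "e" or "s", standing in for a
# real stemming algorithm.
def toy_stem(word)
  word.downcase.sub(/(e|s)\z/, '')
end

# Index a body name as a list of stemmed terms.
def index_terms(name)
  name.split.map { |w| toy_stem(w) }
end

# Every query word except the last must equal an indexed term exactly;
# the last word is treated as a prefix (typeahead-style wildcard).
def prefix_match?(query, terms, stem_query: false)
  words = query.downcase.split
  return false if words.empty?
  words = words.map { |w| toy_stem(w) } if stem_query
  *complete, partial = words
  complete.all? { |w| terms.include?(w) } &&
    terms.any? { |t| t.start_with?(partial) }
end

terms = index_terms('Cabinet Office') # => ["cabinet", "offic"]

prefix_match?('cabinet of', terms)                       # => true
prefix_match?('cabinet office', terms)                   # => false ("office" is not a prefix of "offic")
prefix_match?('cabinet office', terms, stem_query: true) # => true
```

In this toy, the partial query succeeds but the complete one fails unless the query is stemmed the same way as the index, which is the kind of mismatch worth ruling out.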

@garethrees
Member

garethrees commented Feb 10, 2021

Noting that I think we upgraded Xapian (or at least the underlying data format – can't remember) recently, and that I need to dig out the issue where this was discussed. (EDIT discussion: https://github.com/mysociety/sysadmin/issues/1305#issuecomment-630688979 / commit d54ba8e)

@gbp
Member

gbp commented Feb 15, 2021

@garethrees I don't think this is doing a Xapian search. It looks like this is all happening in SQL via the PublicBody.with_query method.

@gbp
Member

gbp commented Feb 15, 2021

Actually, I'm wrong. I was on the "View authorities" action at /body/list/all. Guess this means we have 3 different ways of searching for an authority.

@garethrees
Member

garethrees commented Feb 15, 2021

Yeah, /select_authority uses the xapian-based typeahead_search.

@gbp
Member

gbp commented Feb 15, 2021

Also seems to be affecting the Pro batch authority search at /alaveteli_pro/batch_request_authority_searches

@garethrees garethrees added professional and removed improvement Improves existing functionality (UI tweaks, refactoring, performance, etc) labels Feb 15, 2021
@gbp
Member

gbp commented Feb 15, 2021

In production console:

irb(main):001:0> TypeaheadSearch.new('Cabinet Office', { model: PublicBody }).xapian_search
=> nil

Dev:

[1] pry(main)> TypeaheadSearch.new('Cabinet Office', { model: PublicBody }).xapian_search
=> #<ActsAsXapian::Search:0x00007f99d3f4dea8 snip>

@gbp
Member

gbp commented Feb 16, 2021

This fix has now been deployed

@RichardTaylor
Author

Difficulty has been reported finding the body Transport for the North via the search at

https://www.whatdotheyknow.com/select_authority

When searching for

Transport for the North

the body with that name appears in 14th place in those authority search results, but it appears in 2nd place in the results of a search via the general site header.

There is a related recent support inbox thread, subject: "Add authority - Transport for the North"
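
One possible ordering that would address this, sketched under assumptions: put exact name matches first, then order the rest by number of requests, as suggested earlier in this thread. The `Body` struct, the `rank_bodies` method and the request counts below are all invented for illustration; this is not how Alaveteli currently ranks results.

```ruby
# Hypothetical ranking sketch, not Alaveteli code.
# Body is an illustrative stand-in for a public body record.
Body = Struct.new(:name, :request_count)

# Exact (case-insensitive) name matches first, then most-requested first,
# then alphabetical as a stable tie-breaker.
def rank_bodies(query, bodies)
  q = query.strip.downcase
  bodies.sort_by do |b|
    exact = b.name.downcase == q ? 0 : 1
    [exact, -b.request_count, b.name]
  end
end

# Request counts here are made up purely for the example.
bodies = [
  Body.new('Example Transport Board', 500),
  Body.new('Transport for the North', 20),
  Body.new('Example Northern Council', 90)
]

puts rank_bodies('Transport for the North', bodies).map(&:name)
# Transport for the North
# Example Transport Board
# Example Northern Council
```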
