[Sanity Check] Run query once per search #207

Merged
merged 1 commit into LibriVox:master on Mar 16, 2024

Conversation

redrun45
Collaborator

Not urgent, but if I'm on the right track, we can improve search performance by a fair bit.

It looks like controllers/catalog/Search.php is running each search query twice: once for the items on the current page, and then again to count the total number of matches for the search. There doesn't seem to be a clean way to count the total results, but here's one way.

We can't use a regular COUNT(*), because that would stop at our LIMIT of 25. So: the OVER() call ([1], [2]) lets us COUNT before we discard the other rows. But doing this in each of our UNION sets would give a separate count per set, which we'd somehow need to add together. Collecting them all in a sub-query (WITH results AS (...)) gives us one combined set to count.

I don't know how much of a hack this is, or how much overhead it adds, but it at least seems much faster than running the entire search query twice.

It looks like controllers/catalog/Search.php is running each search
query twice: once for the items on the current page, and then again to
count the total number of matches for the search.

There doesn't seem to be a clean way to count the results, but here's
_one_ way.  We bundle up the whole bunch of `UNION` sets as a sub-query,
and we do a count(*) on that set.  We can't do just a _regular_ count(*),
or it would return our limit of 25.

I'm not sure just how much of a hack this `OVER()` call[1][2] is, but it
at least seems faster than running the entire query twice.

[1]: https://stackoverflow.com/a/8242764
[2]: https://stackoverflow.com/a/28888696
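To make the shape concrete, here's a rough sketch. The column names are placeholders, not the actual query that Search.php builds:

    -- Sketch only: column names are made up; the real query is assembled in Search.php.
    WITH results AS (
        SELECT project_id, project_title
          FROM search_table
         WHERE project_title LIKE '%wells%'
        UNION
        SELECT project_id, project_title
          FROM search_table
         WHERE authors LIKE '%wells%'
    )
    SELECT results.*,
           COUNT(*) OVER () AS total_count  -- window count runs before LIMIT discards rows
      FROM results
     ORDER BY project_title
     LIMIT 25 OFFSET 0;

Every returned row carries the same total_count, so the controller can read it off the first row of the page instead of issuing a second query. (This does rely on the CTE and window-function support added in MariaDB 10.2.)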
@garethsime
Contributor

I'll throw in a "works on my machine" for the advanced search (quick search is broken, but that's expected as it hasn't been updated with the window functions yet). It seems to be correct in that it still returns the same results with the same pagination.

As far as performance goes, in a very simple test on my machine it does seem to be faster. ab (ApacheBench) is what I had handy, so I ran it against my local install:

ab -n 100 -v 2 'https://librivox.org/advanced_search?title=wells&author=&reader=&keywords=&genre_id=0&status=all&project_type=either&recorded_language=&sort_order=catalog_date&search_page=1&search_form=advanced&q='

New query:

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        5    5   0.8      5      12
Processing:    73   76   2.1     75      86
Waiting:       73   76   2.1     75      86
Total:         78   81   2.5     80      98

Percentage of the requests served within a certain time (ms)
  50%     80
  66%     81
  75%     81
  80%     82
  90%     83
  95%     86
  98%     89
  99%     98
 100%     98 (longest request)

Old query:

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        5    5   0.4      5       8
Processing:   136  146  13.9    141     208
Waiting:      136  146  13.9    141     208
Total:        141  152  14.0    146     214

Percentage of the requests served within a certain time (ms)
  50%    146
  66%    151
  75%    155
  80%    157
  90%    163
  95%    171
  98%    214
  99%    214
 100%    214 (longest request)

I tried a few other searches, and the results were pretty much the same: the old version doing two queries took about twice as long as the new one doing a single query.

I have no idea how to check the cost of a query on MariaDB à la EXPLAIN on Postgres; from googling around, it doesn't seem to be a doable thing? Either way, the faster responses imply we're probably tying up fewer DB resources.
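That said, if someone more familiar with MariaDB wants to dig in: I believe it does at least ship a plain EXPLAIN, plus an ANALYZE statement that runs the query and reports estimated vs. actual rows, though I'm not sure how close that gets to Postgres-style cost estimates. A rough sketch, with a stand-in query rather than the real generated search SQL:

    -- Stand-in query only; swap in the actual SQL that Search.php generates.
    EXPLAIN
    SELECT * FROM search_table WHERE project_title LIKE '%wells%' LIMIT 25;

    -- ANALYZE executes the statement and shows estimated vs. actual rows per step.
    ANALYZE
    SELECT * FROM search_table WHERE project_title LIKE '%wells%' LIMIT 25;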

I had a play around with adding different indexes to the search_table table, but I only ended up making performance worse, and a p99 of 98 ms is already going to be pretty hard to beat, I think.

@notartom
Member

notartom commented Mar 16, 2024

I'm sure there are many other things wrong with search that we can improve, fix, and change, but as small incremental improvements go, this one is pretty good. @redrun45, you wrote "[Sanity check]"; are you opposed to merging this?

@redrun45
Collaborator Author

@notartom if the technical direction is sound, then no, I'm not opposed. I just have to patch the simplesearch part too. I'll get a second commit ready soon!

@notartom notartom merged commit cc24635 into LibriVox:master Mar 16, 2024
1 check passed