-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Seeking views of administrators on proposal to expose audiobook keywords and promote exploration of our catalog via keyword hyperlinks #210
Comments
Personally, I like this a lot. As for consensus: discussion has just begun. Public discussion here, and admins will probably have an internal discussion as well. On a purely technical note (else I'd post it in the linked forum thread), I think we could benefit from counting up the projects that use each keyword with a |
Thank you! This is a great example of why it’s good to get other people’s perspectives. I had not given any consideration at all to performance issues, and certainly not to this elegant potential solution.
I did a bit of hunting around today for some kind of off-the-shelf load testing application and came across locust, which seems to fit the bill nicely for this use case. Sadly, however, locust is unable to issue a successful GET to any page on my local Apache server because of the dreaded SSL certificate issue, which none of the workarounds I have read about online this afternoon seem to address successfully anymore. I can see locust’s requests in the Apache access log, where they all show up with a 301 return code, while locust itself reports the result of every GET as “SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate (_ssl.c:1007)'))”. That’s a pity, because it looks like locust is a really brilliant tool.
Best wishes,
Peter Dann
… On 20 Mar 2024, at 3:27 am, redrun45 ***@***.***> wrote:
Personally, I like this a lot. As for consensus: discussion has just begun. Public discussion here <https://forum.librivox.org/viewtopic.php?p=2310123#p2310123>, and admins will probably have an internal discussion as well.
On a purely technical note (else I'd post it in the linked forum thread), I think we could benefit from counting up the projects that use each keyword with a cron job, just like we do with authors and genres <https://github.com/LibriVox/librivox-catalog/blob/master/application/controllers/cron/Project_status_stats.php>.
It would grow our database, but I think it would still be much faster than cross-referencing and counting projects on page load. Worth testing!
—
Reply to this email directly, view it on GitHub <#210 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/APUJ53H43FADGUF6KB74QIDYZBRONAVCNFSM6AAAAABE42RKFWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMBXGYZDOMZRGQ>.
You are receiving this because you authored the thread.
|
If you're writing your own 'locustfile' test specs, this might be of use. Apparently, you can use the same options as this module. |
Suggestion from redrun45 proved to be a lifesaver. Tbank you!
I have attached in a zip file reports from all three runs, together with the locust file I used for these tests.
At face value, anyway, this suggests to me that we could display keywords to catalog users with no discernible degradation of server performance provided we kept the keyword usage statistics updated with a cron job, rather than calculating them each time on the fly. |
That's a pretty massive hit for dynamic statistics. We could always go with the Cron job, but I'm curious what the SQL to calculate those was, and whether it can be made faster by adding foreign keys and/or indexes to our schema. |
Good point. I have used the following SQL for a dynamic count of the number of instances where a keyword is in use. As you can see, it performs a join involving project_keywords.keyword.id. This column, however, is not currently indexed in the database.
It would certainly be a good idea to establish a foreign key relationship between project_keywords.keyword_id and keywords.id. My first attempt to set one up was frustrated by the fact that there are 1910 instances where project_keywords.keyword_id = 0. I needed to delete these before I could set up a foreign key on this column. I then set up an index on. project_keywords.keyword_id. Used the following SQL through these adventures: SELECT keyword_id from project_keywords where keyword_id NOT IN (select id from keywords); === DELETE from project_keywords where keyword_id < 1; === ALTER TABLE project_keywords === CREATE INDEX keyword_id After making these DB changes, I load tested my version of the code that counts keyword instances at runtime (ie, no cron job updating required) using the same load test I used previously, and this time got an average response time of 546 ms. I have attached this load test report. |
After sleeping on it, realised that logically there should also be a foreign key relationship between project_keywords and projects table. Set this up: After making this further DB change, ran the same load test again, this time getting an average response time of 571 ms. |
We've kicked this back and forth and not had any objections, other than to changing the 'keyword' term to 'keyterm'. When the PR is ready, I'll be glad to test it, and we might also want this on the staging server for other admins. 😄 |
redrun has suggested to me that before creating any signifcant pull request, it would be a good idea if I were to put what I have in mind before administrators to get their views on whether the proposed change makes sense at all, or perhaps makes sense in principle, but requires modification. I think this is a most reasonable suggestion, but am not sure by what mechanism I should make any such approach.
I have attached a PDF file which I hope shows pretty clearly, at a GUI level at least, what I have in mind, with screenshots taken on my local development environment.
I'd be interested in the thoughts of all stakeholders. If nothing else, I've learned a hell of a lot by exploring this question. If this doesn't go anywhere, I've at least got that to hang onto!
Proposal to expose keyterms to catalog users 2024 03 19.pdf
The text was updated successfully, but these errors were encountered: