Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Do not use SELECT DISTINCT #126

Merged
merged 1 commit into from
Aug 14, 2021
Merged

Conversation

notartom
Copy link
Member

@notartom notartom commented Aug 12, 2021

For now this is just a dumb search and replace. Let's test this and
see what breaks. The performance problems are getting untenable.

Resolves: #82

@notartom notartom force-pushed the fix-82 branch 2 times, most recently from 5933810 to 10f2173 Compare August 14, 2021 20:09
SELECT DISTINCT is bad for performance because it forces the use of
temporary tables [1]. Stop using it. As far as we could tell in our
testing, a simple s/SELECT DISTINCT/SELECT/ is enough in most places,
except two.

First, when listing a reader's projects. Readers are associated with
sections, not projects, and there's a many-to-one relation between
sections and projects, so we need the SELECT DISTINCT to avoid
repeating the project once for every section for a particular reader.

Second, when listing users according to a specific role. There's a
one-to-many relation between users and roles, and the latter are
stored as rows in the database. We need the SELECT DISTINCT to avoid
listing a user once for every role that they have.

Other than that, we seem to be able to get away with just not doing
DISTINCT.

[1] https://dba.stackexchange.com/questions/34674/mysql-performance-tuning-queries-stuck-on-copying-to-tmp-tables

Resolves: LibriVox#82
@notartom notartom merged commit 29255e2 into LibriVox:master Aug 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Avoid SELECT DISTINCT SQL queries where possible
1 participant