-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid SELECT DISTINCT SQL queries where possible #82
Comments
For now this is just a dump search and replace. Let's test this and see what breaks. The performance problems are getting untenable. Resolves: LibriVox#82
For now this is just a dumb search and replace. Let's test this and see what breaks. The performance problems are getting untenable. Resolves: LibriVox#82
SELECT DISTINCT is bad for performance because it foces the use of temporary tables [1]. Stop using it. As far as we could tell in our testing, a simple s/SELECT DISTINCT/SELECT/ is enough in most places, except two. First, when listing a reader's projects. Readers are associated with sections, not projects, and there's a many-to-one relation between sections and projects, so we need the SELECT DISTINC to avoid repeating the project once for every section for a particular reader. Second, when listing users according to a specific role. There's a one-to-many relation between users and roles, and the latter are stored as rows in the database. We need the SELECT DISTINCT to avoid listing a user once for every role that they have. Other than that, we seem to be able to get away with just not doing DISTINCT. [1] https://dba.stackexchange.com/questions/34674/mysql-performance-tuning-queries-stuck-on-copying-to-tmp-tables Resolves: LibriVox#82
SELECT DISTINCT is bad for performance because it forces the use of temporary tables [1]. Stop using it. As far as we could tell in our testing, a simple s/SELECT DISTINCT/SELECT/ is enough in most places, except two. First, when listing a reader's projects. Readers are associated with sections, not projects, and there's a many-to-one relation between sections and projects, so we need the SELECT DISTINCT to avoid repeating the project once for every section for a particular reader. Second, when listing users according to a specific role. There's a one-to-many relation between users and roles, and the latter are stored as rows in the database. We need the SELECT DISTINCT to avoid listing a user once for every role that they have. Other than that, we seem to be able to get away with just not doing DISTINCT. [1] https://dba.stackexchange.com/questions/34674/mysql-performance-tuning-queries-stuck-on-copying-to-tmp-tables Resolves: LibriVox#82
SELECT DISTINCT is bad for performance because it forces the use of temporary tables [1]. Stop using it. As far as we could tell in our testing, a simple s/SELECT DISTINCT/SELECT/ is enough in most places, except two. First, when listing a reader's projects. Readers are associated with sections, not projects, and there's a many-to-one relation between sections and projects, so we need the SELECT DISTINCT to avoid repeating the project once for every section for a particular reader. Second, when listing users according to a specific role. There's a one-to-many relation between users and roles, and the latter are stored as rows in the database. We need the SELECT DISTINCT to avoid listing a user once for every role that they have. Other than that, we seem to be able to get away with just not doing DISTINCT. [1] https://dba.stackexchange.com/questions/34674/mysql-performance-tuning-queries-stuck-on-copying-to-tmp-tables Resolves: #82
Let's re-open this to remind ourselves that there are still 2 spots where SELECT DISTINCT is in use. |
This is what is causing names to show up on the Archive description more than once, when the reader has more than one section, correct? It would be nice to get this fixed. |
... maybe? :) Could you link to an example? If that's indeed the case, we can revert the part of 29255e2 that's causing it. |
Right now this one shows duplicate readers: https://archive.org/details/fairylady_2112_librivox but we MCs go in and edit the description, so you won't see it for long. |
Every so often, running
mytop
shows a bunch of queries running for many seconds (sometimes over 10), in stateCopying
.Copying
means copying to a temporary table, and is inefficient.It looks as though using
SELECT DISTINCT
queries forces the creation of temporary tables [1]. We should get rid of these unless there's no way around them.[1] https://dba.stackexchange.com/questions/34674/mysql-performance-tuning-queries-stuck-on-copying-to-tmp-tables
The text was updated successfully, but these errors were encountered: