Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid SELECT DISTINCT SQL queries where possible #82

Open
notartom opened this issue Feb 7, 2021 · 4 comments · Fixed by #126
Open

Avoid SELECT DISTINCT SQL queries where possible #82

notartom opened this issue Feb 7, 2021 · 4 comments · Fixed by #126

Comments

@notartom
Copy link
Member

notartom commented Feb 7, 2021

Every so often, running mytop shows a bunch of queries running for many seconds (sometimes over 10), in state Copying. Copying means copying to a temporary table, and is inefficient.

It looks as though using SELECT DISTINCT queries forces the creation of temporary tables [1]. We should get rid of these unless there's no way around them.

[1] https://dba.stackexchange.com/questions/34674/mysql-performance-tuning-queries-stuck-on-copying-to-tmp-tables

notartom added a commit to notartom/librivox-catalog that referenced this issue Aug 12, 2021
For now this is just a dump search and replace. Let's test this and
see what breaks. The performance problems are getting untenable.

Resolves: LibriVox#82
notartom added a commit to notartom/librivox-catalog that referenced this issue Aug 13, 2021
For now this is just a dumb search and replace. Let's test this and
see what breaks. The performance problems are getting untenable.

Resolves: LibriVox#82
notartom added a commit to notartom/librivox-catalog that referenced this issue Aug 14, 2021
SELECT DISTINCT is bad for performance because it foces the use of
temporary tables [1]. Stop using it. As far as we could tell in our
testing, a simple s/SELECT DISTINCT/SELECT/ is enough in most places,
except two.

First, when listing a reader's projects. Readers are associated with
sections, not projects, and there's a many-to-one relation between
sections and projects, so we need the SELECT DISTINC to avoid
repeating the project once for every section for a particular reader.

Second, when listing users according to a specific role. There's a
one-to-many relation between users and roles, and the latter are
stored as rows in the database. We need the SELECT DISTINCT to avoid
listing a user once for every role that they have.

Other than that, we seem to be able to get away with just not doing
DISTINCT.

[1] https://dba.stackexchange.com/questions/34674/mysql-performance-tuning-queries-stuck-on-copying-to-tmp-tables

Resolves: LibriVox#82
notartom added a commit to notartom/librivox-catalog that referenced this issue Aug 14, 2021
SELECT DISTINCT is bad for performance because it forces the use of
temporary tables [1]. Stop using it. As far as we could tell in our
testing, a simple s/SELECT DISTINCT/SELECT/ is enough in most places,
except two.

First, when listing a reader's projects. Readers are associated with
sections, not projects, and there's a many-to-one relation between
sections and projects, so we need the SELECT DISTINCT to avoid
repeating the project once for every section for a particular reader.

Second, when listing users according to a specific role. There's a
one-to-many relation between users and roles, and the latter are
stored as rows in the database. We need the SELECT DISTINCT to avoid
listing a user once for every role that they have.

Other than that, we seem to be able to get away with just not doing
DISTINCT.

[1] https://dba.stackexchange.com/questions/34674/mysql-performance-tuning-queries-stuck-on-copying-to-tmp-tables

Resolves: LibriVox#82
notartom added a commit that referenced this issue Aug 14, 2021
SELECT DISTINCT is bad for performance because it forces the use of
temporary tables [1]. Stop using it. As far as we could tell in our
testing, a simple s/SELECT DISTINCT/SELECT/ is enough in most places,
except two.

First, when listing a reader's projects. Readers are associated with
sections, not projects, and there's a many-to-one relation between
sections and projects, so we need the SELECT DISTINCT to avoid
repeating the project once for every section for a particular reader.

Second, when listing users according to a specific role. There's a
one-to-many relation between users and roles, and the latter are
stored as rows in the database. We need the SELECT DISTINCT to avoid
listing a user once for every role that they have.

Other than that, we seem to be able to get away with just not doing
DISTINCT.

[1] https://dba.stackexchange.com/questions/34674/mysql-performance-tuning-queries-stuck-on-copying-to-tmp-tables

Resolves: #82
@notartom
Copy link
Member Author

Let's re-open this to remind ourselves that there are still 2 spots where SELECT DISTINCT is in use.

@notartom notartom reopened this Aug 14, 2021
@twinkietoes-on
Copy link
Collaborator

This is what is causing names to show up on the Archive description more than once, when the reader has more than one section, correct? It would be nice to get this fixed.

@notartom
Copy link
Member Author

This is what is causing names to show up on the Archive description more than once, when the reader has more than one section, correct? It would be nice to get this fixed.

... maybe? :) Could you link to an example? If that's indeed the case, we can revert the part of 29255e2 that's causing it.

@twinkietoes-on
Copy link
Collaborator

This is what is causing names to show up on the Archive description more than once, when the reader has more than one section, correct? It would be nice to get this fixed.

... maybe? :) Could you link to an example? If that's indeed the case, we can revert the part of 29255e2 that's causing it.

Right now this one shows duplicate readers: https://archive.org/details/fairylady_2112_librivox but we MCs go in and edit the description, so you won't see it for long.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants