You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on Jan 18, 2020. It is now read-only.
A typical Varify query in our installation generates between five and six thousand SQL queries to show the initial 20 HTML results.
Varify could benefit greatly from some caching or elimination of redundant database lookups. (If there is caching in place that doesn't seem to be working, could it possibly be due to memcached's default 1MB object limit or a too-small MAX_ENTRIES value in the Django CACHES setting?)
The main problem is highly redundant pmid queries. These add up to about the same as the big money queries (the top 3 slowest) and in terms of overall processing time may be significantly more costly.
I doubt the details of the query matter, but my test query was as follows:
Sample is one of the following:
P-PSeq_0038-P-A from project U01 (CPF1308003)
P-Pseq_0043-P-A from project U01 (CPF1309006)
P-Pseq_0044-P-A from project U01 (CPF1309006)
P-Pseq_0045-P-A from project U01 (CPF1309006)
1007 from project U01 (NEMRCD)
1007p1 from project U01 (NEMRCD)
P-PSeq0065-P-A from project U01 (Pseq_batch17)
Gene Symbol is BRCA1
Effect is Non-Synonymous Coding
SQL was collected for the first page of HTML results.
The biggest offenders were the pmid queries:
1800 SELECT "pubmed"."pmid" FROM "pubmed" INNER JOIN "phenotype_articles" ON ("pubmed"."pmid" = "phenotype_articles"."pubmed_id") WHERE "phenotype_articles"."phenotype_id" = 7061
960 SELECT "pubmed"."pmid" FROM "pubmed" INNER JOIN "phenotype_articles" ON ("pubmed"."pmid" = "phenotype_articles"."pubmed_id") WHERE "phenotype_articles"."phenotype_id" = 7888
360 SELECT "pubmed"."pmid" FROM "pubmed" INNER JOIN "phenotype_articles" ON ("pubmed"."pmid" = "phenotype_articles"."pubmed_id") WHERE "phenotype_articles"."phenotype_id" = 7887
240 SELECT "pubmed"."pmid" FROM "pubmed" INNER JOIN "phenotype_articles" ON ("pubmed"."pmid" = "phenotype_articles"."pubmed_id") WHERE "phenotype_articles"."phenotype_id" = 7374
120 SELECT "pubmed"."pmid" FROM "pubmed" INNER JOIN "phenotype_articles" ON ("pubmed"."pmid" = "phenotype_articles"."pubmed_id") WHERE "phenotype_articles"."phenotype_id" = 8044
120 SELECT "pubmed"."pmid" FROM "pubmed" INNER JOIN "phenotype_articles" ON ("pubmed"."pmid" = "phenotype_articles"."pubmed_id") WHERE "phenotype_articles"."phenotype_id" = 7609
120 SELECT "pubmed"."pmid" FROM "pubmed" INNER JOIN "phenotype_articles" ON ("pubmed"."pmid" = "phenotype_articles"."pubmed_id") WHERE "phenotype_articles"."phenotype_id" = 6878
120 SELECT "pubmed"."pmid" FROM "pubmed" INNER JOIN "phenotype_articles" ON ("pubmed"."pmid" = "phenotype_articles"."pubmed_id") WHERE "phenotype_articles"."phenotype_id" = 12332
120 SELECT "pubmed"."pmid" FROM "pubmed" INNER JOIN "phenotype_articles" ON ("pubmed"."pmid" = "phenotype_articles"."pubmed_id") WHERE "phenotype_articles"."phenotype_id" = 11075
120 SELECT "pubmed"."pmid" FROM "pubmed" INNER JOIN "phenotype_articles" ON ("pubmed"."pmid" = "phenotype_articles"."pubmed_id") WHERE "phenotype_articles"."phenotype_id" = 11074
120 SELECT "pubmed"."pmid" FROM "pubmed" INNER JOIN "phenotype_articles" ON ("pubmed"."pmid" = "phenotype_articles"."pubmed_id") WHERE "phenotype_articles"."phenotype_id" = 11070
120 SELECT "pubmed"."pmid" FROM "pubmed" INNER JOIN "phenotype_articles" ON ("pubmed"."pmid" = "phenotype_articles"."pubmed_id") WHERE "phenotype_articles"."phenotype_id" = 10401
120 SELECT "pubmed"."pmid" FROM "pubmed" INNER JOIN "phenotype_articles" ON ("pubmed"."pmid" = "phenotype_articles"."pubmed_id") WHERE "phenotype_articles"."phenotype_id" = 10397
120 SELECT "pubmed"."pmid" FROM "pubmed" INNER JOIN "gene_pubmed" ON ("pubmed"."pmid" = "gene_pubmed"."pubmed_id") WHERE "gene_pubmed"."gene_id" = 2071
6 SELECT "pubmed"."pmid" FROM "pubmed" INNER JOIN "variant_pubmed" ON ("pubmed"."pmid" = "variant_pubmed"."pubmed_id") WHERE "variant_pubmed"."variant_id" = 254416
5 SELECT "pubmed"."pmid" FROM "pubmed" INNER JOIN "variant_pubmed" ON ("pubmed"."pmid" = "variant_pubmed"."pubmed_id") WHERE "variant_pubmed"."variant_id" = 254415
5 SELECT "pubmed"."pmid" FROM "pubmed" INNER JOIN "variant_pubmed" ON ("pubmed"."pmid" = "variant_pubmed"."pubmed_id") WHERE "variant_pubmed"."variant_id" = 254414
3 SELECT "pubmed"."pmid" FROM "pubmed" INNER JOIN "variant_pubmed" ON ("pubmed"."pmid" = "variant_pubmed"."pubmed_id") WHERE "variant_pubmed"."variant_id" = 1108096
1 SELECT "pubmed"."pmid" FROM "pubmed" INNER JOIN "variant_pubmed" ON ("pubmed"."pmid" = "variant_pubmed"."pubmed_id") WHERE "variant_pubmed"."variant_id" = 254403
There are also a number of queries that are each executed 120 times, e.g.:
20 SELECT "transcript"."id", "transcript"."refseq_id", "transcript"."strand", "transcript"."start", "transcript"."end", "transcript"."coding_start", "transcript"."coding_end", "transcript"."coding_start_status", "transcript"."coding_end_status", "transcript"."exon_count", "transcript"."gene_id" FROM "transcript" WHERE "transcript"."id" = 75417
20 SELECT "transcript"."id", "transcript"."refseq_id", "transcript"."strand", "transcript"."start", "transcript"."end", "transcript"."coding_start", "transcript"."coding_end", "transcript"."coding_start_status", "transcript"."coding_end_status", "transcript"."exon_count", "transcript"."gene_id" FROM "transcript" WHERE "transcript"."id" = 75416
20 SELECT "transcript"."id", "transcript"."refseq_id", "transcript"."strand", "transcript"."start", "transcript"."end", "transcript"."coding_start", "transcript"."coding_end", "transcript"."coding_start_status", "transcript"."coding_end_status", "transcript"."exon_count", "transcript"."gene_id" FROM "transcript" WHERE "transcript"."id" = 75415
20 SELECT "transcript"."id", "transcript"."refseq_id", "transcript"."strand", "transcript"."start", "transcript"."end", "transcript"."coding_start", "transcript"."coding_end", "transcript"."coding_start_status", "transcript"."coding_end_status", "transcript"."exon_count", "transcript"."gene_id" FROM "transcript" WHERE "transcript"."id" = 75414
20 SELECT "transcript"."id", "transcript"."refseq_id", "transcript"."strand", "transcript"."start", "transcript"."end", "transcript"."coding_start", "transcript"."coding_end", "transcript"."coding_start_status", "transcript"."coding_end_status", "transcript"."exon_count", "transcript"."gene_id" FROM "transcript" WHERE "transcript"."id" = 75413
20 SELECT "transcript"."id", "transcript"."refseq_id", "transcript"."strand", "transcript"."start", "transcript"."end", "transcript"."coding_start", "transcript"."coding_end", "transcript"."coding_start_status", "transcript"."coding_end_status", "transcript"."exon_count", "transcript"."gene_id" FROM "transcript" WHERE "transcript"."id" = 75412
These are similar redundant queries for effect, gene, phenotype, pmid, and effect_impact.
I don't really understand the redundancy patterns. E.g. for gene, we have:
120 SELECT "gene"."id", "gene"."chr_id", "gene"."symbol", "gene"."name", "gene"."hgnc_id" FROM "gene" WHERE "gene"."id" = 2071
whereas for phenotype we have:
120 SELECT "phenotype"."id", "phenotype"."term", "phenotype"."description", "phenotype"."hpo_id" FROM "phenotype" INNER JOIN "gene_phenotype" ON ("phenotype"."id" = "gene_phenotype"."phenotype_id") WHERE "gene_phenotype"."gene_id" = 2071
6 SELECT "phenotype"."id", "phenotype"."term", "phenotype"."description", "phenotype"."hpo_id" FROM "phenotype" WHERE "phenotype"."id" = 11075
5 SELECT "phenotype"."id", "phenotype"."term", "phenotype"."description", "phenotype"."hpo_id" FROM "phenotype" WHERE "phenotype"."id" = 11074
5 SELECT "phenotype"."id", "phenotype"."term", "phenotype"."description", "phenotype"."hpo_id" FROM "phenotype" WHERE "phenotype"."id" = 11070
3 SELECT "phenotype"."id", "phenotype"."term", "phenotype"."description", "phenotype"."hpo_id" FROM "phenotype" WHERE "phenotype"."id" = 12332
1 SELECT "phenotype"."id", "phenotype"."term", "phenotype"."description", "phenotype"."hpo_id" FROM "phenotype" WHERE "phenotype"."id" = 7887
1 SELECT "phenotype"."id", "phenotype"."term", "phenotype"."description", "phenotype"."hpo_id" FROM "phenotype" WHERE "phenotype"."id" = 7061
The text was updated successfully, but these errors were encountered:
Expect a roughly six-fold performance improvement for this query if all SQL redundancy is removed. The time for psql to execute all the SQL queries behind this Varify query over the local network is 42.8 sec. If all of the redundant queries are removed via | sort | uniq, the time taken is 7.2 sec. The database server cache was warm in both cases. The real-world effect including Python and caching overhead remains to be seen.
A typical Varify query in our installation generates between five and six thousand SQL queries to show the initial 20 HTML results.
Varify could benefit greatly from some caching or elimination of redundant database lookups. (If there is caching in place that doesn't seem to be working, could it possibly be due to memcached's default 1MB object limit or a too-small MAX_ENTRIES value in the Django CACHES setting?)
The main problem is highly redundant pmid queries. These add up to about the same as the big money queries (the top 3 slowest) and in terms of overall processing time may be significantly more costly.
I doubt the details of the query matter, but my test query was as follows:
Sample is one of the following:
P-PSeq_0038-P-A from project U01 (CPF1308003)
P-Pseq_0043-P-A from project U01 (CPF1309006)
P-Pseq_0044-P-A from project U01 (CPF1309006)
P-Pseq_0045-P-A from project U01 (CPF1309006)
1007 from project U01 (NEMRCD)
1007p1 from project U01 (NEMRCD)
P-PSeq0065-P-A from project U01 (Pseq_batch17)
Gene Symbol is BRCA1
Effect is Non-Synonymous Coding
SQL was collected for the first page of HTML results.
The biggest offenders were the pmid queries:
1800 SELECT "pubmed"."pmid" FROM "pubmed" INNER JOIN "phenotype_articles" ON ("pubmed"."pmid" = "phenotype_articles"."pubmed_id") WHERE "phenotype_articles"."phenotype_id" = 7061
960 SELECT "pubmed"."pmid" FROM "pubmed" INNER JOIN "phenotype_articles" ON ("pubmed"."pmid" = "phenotype_articles"."pubmed_id") WHERE "phenotype_articles"."phenotype_id" = 7888
360 SELECT "pubmed"."pmid" FROM "pubmed" INNER JOIN "phenotype_articles" ON ("pubmed"."pmid" = "phenotype_articles"."pubmed_id") WHERE "phenotype_articles"."phenotype_id" = 7887
240 SELECT "pubmed"."pmid" FROM "pubmed" INNER JOIN "phenotype_articles" ON ("pubmed"."pmid" = "phenotype_articles"."pubmed_id") WHERE "phenotype_articles"."phenotype_id" = 7374
120 SELECT "pubmed"."pmid" FROM "pubmed" INNER JOIN "phenotype_articles" ON ("pubmed"."pmid" = "phenotype_articles"."pubmed_id") WHERE "phenotype_articles"."phenotype_id" = 8044
120 SELECT "pubmed"."pmid" FROM "pubmed" INNER JOIN "phenotype_articles" ON ("pubmed"."pmid" = "phenotype_articles"."pubmed_id") WHERE "phenotype_articles"."phenotype_id" = 7609
120 SELECT "pubmed"."pmid" FROM "pubmed" INNER JOIN "phenotype_articles" ON ("pubmed"."pmid" = "phenotype_articles"."pubmed_id") WHERE "phenotype_articles"."phenotype_id" = 6878
120 SELECT "pubmed"."pmid" FROM "pubmed" INNER JOIN "phenotype_articles" ON ("pubmed"."pmid" = "phenotype_articles"."pubmed_id") WHERE "phenotype_articles"."phenotype_id" = 12332
120 SELECT "pubmed"."pmid" FROM "pubmed" INNER JOIN "phenotype_articles" ON ("pubmed"."pmid" = "phenotype_articles"."pubmed_id") WHERE "phenotype_articles"."phenotype_id" = 11075
120 SELECT "pubmed"."pmid" FROM "pubmed" INNER JOIN "phenotype_articles" ON ("pubmed"."pmid" = "phenotype_articles"."pubmed_id") WHERE "phenotype_articles"."phenotype_id" = 11074
120 SELECT "pubmed"."pmid" FROM "pubmed" INNER JOIN "phenotype_articles" ON ("pubmed"."pmid" = "phenotype_articles"."pubmed_id") WHERE "phenotype_articles"."phenotype_id" = 11070
120 SELECT "pubmed"."pmid" FROM "pubmed" INNER JOIN "phenotype_articles" ON ("pubmed"."pmid" = "phenotype_articles"."pubmed_id") WHERE "phenotype_articles"."phenotype_id" = 10401
120 SELECT "pubmed"."pmid" FROM "pubmed" INNER JOIN "phenotype_articles" ON ("pubmed"."pmid" = "phenotype_articles"."pubmed_id") WHERE "phenotype_articles"."phenotype_id" = 10397
120 SELECT "pubmed"."pmid" FROM "pubmed" INNER JOIN "gene_pubmed" ON ("pubmed"."pmid" = "gene_pubmed"."pubmed_id") WHERE "gene_pubmed"."gene_id" = 2071
6 SELECT "pubmed"."pmid" FROM "pubmed" INNER JOIN "variant_pubmed" ON ("pubmed"."pmid" = "variant_pubmed"."pubmed_id") WHERE "variant_pubmed"."variant_id" = 254416
5 SELECT "pubmed"."pmid" FROM "pubmed" INNER JOIN "variant_pubmed" ON ("pubmed"."pmid" = "variant_pubmed"."pubmed_id") WHERE "variant_pubmed"."variant_id" = 254415
5 SELECT "pubmed"."pmid" FROM "pubmed" INNER JOIN "variant_pubmed" ON ("pubmed"."pmid" = "variant_pubmed"."pubmed_id") WHERE "variant_pubmed"."variant_id" = 254414
3 SELECT "pubmed"."pmid" FROM "pubmed" INNER JOIN "variant_pubmed" ON ("pubmed"."pmid" = "variant_pubmed"."pubmed_id") WHERE "variant_pubmed"."variant_id" = 1108096
1 SELECT "pubmed"."pmid" FROM "pubmed" INNER JOIN "variant_pubmed" ON ("pubmed"."pmid" = "variant_pubmed"."pubmed_id") WHERE "variant_pubmed"."variant_id" = 254403
There are also a number of queries that are each executed 120 times, e.g.:
20 SELECT "transcript"."id", "transcript"."refseq_id", "transcript"."strand", "transcript"."start", "transcript"."end", "transcript"."coding_start", "transcript"."coding_end", "transcript"."coding_start_status", "transcript"."coding_end_status", "transcript"."exon_count", "transcript"."gene_id" FROM "transcript" WHERE "transcript"."id" = 75417
20 SELECT "transcript"."id", "transcript"."refseq_id", "transcript"."strand", "transcript"."start", "transcript"."end", "transcript"."coding_start", "transcript"."coding_end", "transcript"."coding_start_status", "transcript"."coding_end_status", "transcript"."exon_count", "transcript"."gene_id" FROM "transcript" WHERE "transcript"."id" = 75416
20 SELECT "transcript"."id", "transcript"."refseq_id", "transcript"."strand", "transcript"."start", "transcript"."end", "transcript"."coding_start", "transcript"."coding_end", "transcript"."coding_start_status", "transcript"."coding_end_status", "transcript"."exon_count", "transcript"."gene_id" FROM "transcript" WHERE "transcript"."id" = 75415
20 SELECT "transcript"."id", "transcript"."refseq_id", "transcript"."strand", "transcript"."start", "transcript"."end", "transcript"."coding_start", "transcript"."coding_end", "transcript"."coding_start_status", "transcript"."coding_end_status", "transcript"."exon_count", "transcript"."gene_id" FROM "transcript" WHERE "transcript"."id" = 75414
20 SELECT "transcript"."id", "transcript"."refseq_id", "transcript"."strand", "transcript"."start", "transcript"."end", "transcript"."coding_start", "transcript"."coding_end", "transcript"."coding_start_status", "transcript"."coding_end_status", "transcript"."exon_count", "transcript"."gene_id" FROM "transcript" WHERE "transcript"."id" = 75413
20 SELECT "transcript"."id", "transcript"."refseq_id", "transcript"."strand", "transcript"."start", "transcript"."end", "transcript"."coding_start", "transcript"."coding_end", "transcript"."coding_start_status", "transcript"."coding_end_status", "transcript"."exon_count", "transcript"."gene_id" FROM "transcript" WHERE "transcript"."id" = 75412
These are similar redundant queries for effect, gene, phenotype, pmid, and effect_impact.
I don't really understand the redundancy patterns. E.g. for gene, we have:
120 SELECT "gene"."id", "gene"."chr_id", "gene"."symbol", "gene"."name", "gene"."hgnc_id" FROM "gene" WHERE "gene"."id" = 2071
whereas for phenotype we have:
120 SELECT "phenotype"."id", "phenotype"."term", "phenotype"."description", "phenotype"."hpo_id" FROM "phenotype" INNER JOIN "gene_phenotype" ON ("phenotype"."id" = "gene_phenotype"."phenotype_id") WHERE "gene_phenotype"."gene_id" = 2071
6 SELECT "phenotype"."id", "phenotype"."term", "phenotype"."description", "phenotype"."hpo_id" FROM "phenotype" WHERE "phenotype"."id" = 11075
5 SELECT "phenotype"."id", "phenotype"."term", "phenotype"."description", "phenotype"."hpo_id" FROM "phenotype" WHERE "phenotype"."id" = 11074
5 SELECT "phenotype"."id", "phenotype"."term", "phenotype"."description", "phenotype"."hpo_id" FROM "phenotype" WHERE "phenotype"."id" = 11070
3 SELECT "phenotype"."id", "phenotype"."term", "phenotype"."description", "phenotype"."hpo_id" FROM "phenotype" WHERE "phenotype"."id" = 12332
1 SELECT "phenotype"."id", "phenotype"."term", "phenotype"."description", "phenotype"."hpo_id" FROM "phenotype" WHERE "phenotype"."id" = 7887
1 SELECT "phenotype"."id", "phenotype"."term", "phenotype"."description", "phenotype"."hpo_id" FROM "phenotype" WHERE "phenotype"."id" = 7061
The text was updated successfully, but these errors were encountered: