Right now, the API calls made by LineageComparisonComponent (outbreak.info/compare-lineages) return very large payloads, which makes the page slow. As a result, our API backend often crashes when there are too many requests to this endpoint.
To create the heatmap on the page, the function getLineagesComparison calls getCharacteristicMutations(apiurl, lineage, 0, true, includeSublineages), which gives all mutations within a lineage, and then filters it to any mutation which appears in the lineage at a prevalence greater than the frequency threshold (default = 0.75). This step is necessary, because if you set frequency = 0.75, you would be missing data for mutations which exist in the lineage below the threshold:
Incorrect: missing cells for B.1.427 x A67V, B.1.427 x DEL69/70, B.1.427 x T95I, etc., which implies those mutations have not been found in the lineage, as opposed to "have been found, but at low prevalence":
Correct but super slow, since the frequency=0 query is HUGE.
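To make the difference between the two screenshots concrete, here is a minimal Python sketch of the filtering logic (the front-end itself is JavaScript; this only illustrates the idea). The function name `heatmap_cells`, the `records` shape, and the lineage and mutation names in the example are all hypothetical placeholders, not real data, and the `>=` comparison is an assumption about how the threshold is applied.

```python
from typing import Dict, Optional

Records = Dict[str, Dict[str, float]]  # lineage -> {mutation: prevalence}


def heatmap_cells(
    records: Records, threshold: float, prefilter: bool
) -> Dict[str, Dict[str, Optional[float]]]:
    """Build heatmap cells for every mutation above threshold in any lineage.

    prefilter=True mimics asking the API for frequency=threshold: values
    below the threshold never reach the client, so those cells are None.
    prefilter=False mimics the frequency=0 query followed by client-side
    filtering: low-prevalence values are still available.
    """
    if prefilter:
        records = {
            lineage: {m: p for m, p in muts.items() if p >= threshold}
            for lineage, muts in records.items()
        }
    # Mutations to display: above the threshold in at least one lineage.
    keep = sorted(
        {m for muts in records.values() for m, p in muts.items() if p >= threshold}
    )
    return {
        lineage: {m: muts.get(m) for m in keep} for lineage, muts in records.items()
    }


# Placeholder values, purely illustrative.
toy = {"LIN.A": {"mut1": 0.9, "mut2": 0.1}, "LIN.B": {"mut1": 0.05, "mut2": 0.95}}
incorrect = heatmap_cells(toy, 0.75, prefilter=True)   # LIN.A x mut2 is None
correct = heatmap_cells(toy, 0.75, prefilter=False)    # LIN.A x mut2 is 0.1
```

A `None` cell in the prefiltered result is indistinguishable from "never found", which is exactly the problem with the first screenshot.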
To improve this, we could first get all the mutations which exist in the lineages above that threshold, then calculate the mutation prevalence in each lineage.
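A rough Python sketch of that two-step idea, with the API requests abstracted away as injected callables so the logic can be tested without touching the backend. `compare_lineages`, `fetch_above`, and `fetch_prevalence` are hypothetical names introduced for illustration; they are not existing functions in the codebase.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical stand-ins for the two kinds of API request.
FetchAbove = Callable[[str, float], List[str]]            # (lineage, threshold) -> mutations
FetchPrevalence = Callable[[str, str], Optional[float]]   # (lineage, mutation) -> prevalence


def compare_lineages(
    lineages: Sequence[str],
    threshold: float,
    fetch_above: FetchAbove,
    fetch_prevalence: FetchPrevalence,
) -> Dict[str, Dict[str, Optional[float]]]:
    """Two-step comparison that avoids the frequency=0 query.

    Step 1: one small request per lineage for mutations at or above threshold.
    Step 2: look up the prevalence of every such mutation in every lineage.
    """
    mutations: List[str] = []
    seen = set()
    for lineage in lineages:
        for mutation in fetch_above(lineage, threshold):
            if mutation not in seen:  # keep first-seen order, drop duplicates
                seen.add(mutation)
                mutations.append(mutation)

    return {
        lineage: {m: fetch_prevalence(lineage, m) for m in mutations}
        for lineage in lineages
    }
```

Note the trade-off: step 2 makes up to (number of lineages) x (number of mutations) lookups, so whether this beats one huge query depends on how those lookups are batched, which is what the profiling step should tell us.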
The initial API call should identify the mutations which occur in either of those lineages (BA.3 or B.1.427) at 75% or greater. This should identify the following set of mutations for each, just looking at gene == "S":
First steps:
Profile whether this approach would actually improve speed for a realistic set of lineages (for instance, the default set of lineages on outbreak.info/compare-lineages).
If so, implement it in the front-end.
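For the profiling step, something as simple as the following Python harness might be enough. It is only a sketch: `profile_strategies` is a hypothetical helper, and each strategy is assumed to be a zero-argument callable that performs all the requests for one approach (for example, the current frequency=0 query versus the threshold-first approach).

```python
import time
from statistics import median
from typing import Callable, Dict


def profile_strategies(
    strategies: Dict[str, Callable[[], object]], repeats: int = 5
) -> Dict[str, float]:
    """Return the median wall-clock time in seconds for each strategy."""
    timings: Dict[str, float] = {}
    for name, run in strategies.items():
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            run()  # performs every request needed by this strategy
            samples.append(time.perf_counter() - start)
        timings[name] = median(samples)
    return timings
```

Using the median over a few repeats helps smooth out network noise; response caching on the server could still skew the numbers, so it may be worth varying the lineage set between runs.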
Alternative approaches are welcome too.
For instance, on the BA.3 and B.1.427 comparison page, once the gene == "S" mutations above the threshold are known, the prevalence can then be queried with mutations as each of the mutations and pangolin_lineage as each of the lineages (e.g. https://api.outbreak.info/genomics/mutations-by-lineage?mutations=S:A67V&pangolin_lineage=BA.3). You can combine mutations by AND to query several of them in one call; however, mutations that don't exist within the lineage (like S:S13I in BA.3) will cause the entire API call to fail with a status code of 500.
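Because one absent mutation fails a combined request, one option is to query each mutation and lineage pair separately so that a failure only affects a single cell. Below is a minimal Python sketch under some loud assumptions: the HTTP client is passed in as a hypothetical `get` callable returning `(status_code, payload)`, the response payload is returned unparsed because its shape is not described here, and treating a 500 as "mutation not found in this lineage" is a guess, since a 500 could also be an unrelated server error.

```python
from typing import Any, Callable, Optional, Tuple
from urllib.parse import urlencode

BASE_URL = "https://api.outbreak.info/genomics/mutations-by-lineage"

# Hypothetical HTTP getter: url -> (status_code, parsed_payload)
Getter = Callable[[str], Tuple[int, Any]]


def build_url(mutation: str, lineage: str) -> str:
    """Build a single mutation x lineage query, keeping ':' unescaped."""
    query = urlencode({"mutations": mutation, "pangolin_lineage": lineage}, safe=":")
    return f"{BASE_URL}?{query}"


def fetch_cell(get: Getter, mutation: str, lineage: str) -> Optional[Any]:
    """Fetch one heatmap cell; return None when the API answers 500.

    ASSUMPTION: a 500 here means the mutation does not exist in the lineage.
    """
    status, payload = get(build_url(mutation, lineage))
    if status == 500:
        return None
    if status != 200:
        raise RuntimeError(f"unexpected status {status} for {mutation} in {lineage}")
    return payload
```

The cost is many small requests instead of one combined one, so this should be included in the profiling before committing to it.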