Refactor API calls within LineageComparisonComponent #586

flaneuse · 2022-11-19T00:23:34Z

Right now, the API calls made by LineageComparisonComponent (outbreak.info/compare-lineages) return very large payloads, which makes them slow. Because so much data gets shuttled around, our API backend often crashes when this endpoint receives too many requests.

To create the heatmap on the page, the function getLineagesComparison calls getCharacteristicMutations(apiurl, lineage, 0, true, includeSublineages), which returns every mutation found in a lineage (frequency = 0). It then filters the results to mutations that appear in at least one of the compared lineages at a prevalence greater than the frequency threshold (default = 0.75). Fetching at frequency = 0 is necessary because querying with frequency = 0.75 directly would drop data for mutations that exist in a lineage below the threshold:

Incorrect (frequency = 0.75): missing cells for B.1.427 x A67V, B.1.427 x DEL69/70, B.1.427 x T95I, etc., which implies those mutations have not been found in the lineage, as opposed to "have been found, but at low prevalence".

Correct (frequency = 0), but super slow, since the frequency = 0 query is huge.
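The current filtering step can be sketched as follows. This is a minimal Python sketch, not the actual getLineagesComparison code: `filter_for_heatmap` and the `{lineage: {mutation: prevalence}}` input shape are hypothetical. It shows why the below-threshold values have to be fetched at all: they are kept for every mutation that passes the threshold somewhere.

```python
from typing import Dict

# Hypothetical shape: {lineage: {mutation: prevalence}}, i.e. the
# frequency=0 results collected per lineage.
Prevalences = Dict[str, Dict[str, float]]


def filter_for_heatmap(data: Prevalences, threshold: float = 0.75) -> Prevalences:
    """Keep mutations above `threshold` in at least one lineage, and keep
    their prevalence in every lineage so low-prevalence cells still show."""
    # Heatmap rows: mutations characteristic of at least one lineage.
    keep = {
        mutation
        for mutations in data.values()
        for mutation, prevalence in mutations.items()
        if prevalence > threshold
    }
    # Retain below-threshold values for those rows; drop everything else.
    return {
        lineage: {m: p for m, p in mutations.items() if m in keep}
        for lineage, mutations in data.items()
    }
```

Everything outside `keep` is downloaded only to be thrown away, which is the waste the refactor targets.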

To improve this, we could first get only the mutations that occur above the threshold in at least one of the lineages, and then look up the prevalence of just those mutations in each lineage.
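The proposed two-step flow can be sketched as below, in Python for brevity rather than in the front-end's own code. `compare_lineages`, `get_characteristic` and `get_prevalence` are hypothetical names: the two callables stand in for a thresholded query (e.g. frequency = 0.75) and a per-mutation lookup, so the sketch only shows the order of the calls, not the real API.

```python
from typing import Callable, Dict, Iterable, Set


def compare_lineages(
    lineages: Iterable[str],
    get_characteristic: Callable[[str], Set[str]],
    get_prevalence: Callable[[str, str], float],
) -> Dict[str, Dict[str, float]]:
    """Two-step comparison: (1) collect mutations above the threshold in any
    lineage, (2) look up only those mutations in every lineage."""
    lineages = list(lineages)
    # Step 1: one small, thresholded query per lineage.
    rows: Set[str] = set()
    for lineage in lineages:
        rows |= get_characteristic(lineage)
    # Step 2: fill every heatmap cell, including below-threshold ones.
    return {
        lineage: {m: get_prevalence(lineage, m) for m in sorted(rows)}
        for lineage in lineages
    }
```

The frequency = 0 query disappears entirely; the cost moves to step 2, whose size grows with (number of rows) x (number of lineages).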

For instance, on the BA.3 vs. B.1.427 comparison page:

The initial API call should identify the mutations that occur in either lineage (BA.3 or B.1.427) at a prevalence of 75% or greater. Looking only at gene == "S", this gives the following set of mutations for each:

BA.3: ['s:g142d', 's:n211i', 's:d614g', 's:h655y', 's:n679k', 's:a67v', 's:del69/70', 's:n969k', 's:q954h', 's:d796y', 's:p681h', 's:del143/145', 's:del212/212', 's:t95i', 's:n764k'],
B.1.427: ['s:d614g', 's:l452r', 's:s13i', 's:w152c']
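Working through the example lists above: only s:d614g is shared, so the union has 18 rows. If the step-1 response already carries each lineage's prevalence for its own characteristic mutations (an assumption, not confirmed here), only the "cross" cells still need a lookup. The variable names are illustrative.

```python
# Mutation lists copied from the BA.3 / B.1.427 example above (gene S only).
characteristic = {
    "BA.3": ['s:g142d', 's:n211i', 's:d614g', 's:h655y', 's:n679k',
             's:a67v', 's:del69/70', 's:n969k', 's:q954h', 's:d796y',
             's:p681h', 's:del143/145', 's:del212/212', 's:t95i', 's:n764k'],
    "B.1.427": ['s:d614g', 's:l452r', 's:s13i', 's:w152c'],
}

# Union: every mutation that needs a heatmap row.
union = sorted(set().union(*characteristic.values()))

# Cells not covered by step 1: mutations characteristic of another lineage
# but not of this one. ASSUMPTION: step 1 returns prevalence for the rest.
to_look_up = {}
for lineage, muts in characteristic.items():
    own = set(muts)
    to_look_up[lineage] = [m for m in union if m not in own]

print(len(union))  # 18
print({k: len(v) for k, v in to_look_up.items()})  # {'BA.3': 3, 'B.1.427': 14}
```

Under that assumption, 17 cells need a second query instead of a full frequency = 0 download for both lineages.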

Then, you can call https://api.outbreak.info/genomics/mutations-by-lineage with mutations set to each of the mutations and pangolin_lineage set to each of the lineages (e.g. https://api.outbreak.info/genomics/mutations-by-lineage?mutations=S:A67V&pangolin_lineage=BA.3). You can combine mutations with AND to query several of them in a single call; however, a mutation that doesn't exist within the lineage (like S:S13I in BA.3) will cause the entire API call to fail with a status code of 500.
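One way to live with the 500 behaviour is to try a single combined request per lineage and, only if it fails, retry one mutation at a time. The Python sketch below assumes the AND syntax is `"A AND B"` inside the `mutations` parameter (check the API docs), omits any authentication headers the API may need, and returns raw JSON because the response shape is not described here. `build_url`, `fetch_json` and `query_lineage` are hypothetical helpers.

```python
import json
from typing import Callable, List, Tuple
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.outbreak.info/genomics/mutations-by-lineage"


def build_url(mutations: List[str], lineage: str) -> str:
    """Build a mutations-by-lineage URL for one lineage."""
    # ASSUMPTION: several mutations are combined as "A AND B".
    query = {"mutations": " AND ".join(mutations), "pangolin_lineage": lineage}
    return f"{BASE_URL}?{urlencode(query)}"


def fetch_json(url: str) -> dict:
    """GET a URL and decode the JSON body (auth headers, if needed, omitted)."""
    with urlopen(url) as resp:
        return json.load(resp)


def query_lineage(
    mutations: List[str],
    lineage: str,
    fetch: Callable[[str], dict] = fetch_json,
) -> Tuple[List[dict], List[str]]:
    """Try one combined request first; on HTTP 500, retry one mutation at a time.

    Returns (payloads, missing): the JSON responses that succeeded and the
    mutations whose individual request still returned 500.
    """
    try:
        return [fetch(build_url(mutations, lineage))], []
    except HTTPError as err:
        if err.code != 500:
            raise  # only the "mutation not in lineage" failure is expected
    payloads: List[dict] = []
    missing: List[str] = []
    for mutation in mutations:
        try:
            payloads.append(fetch(build_url([mutation], lineage)))
        except HTTPError as err:
            if err.code != 500:
                raise
            # Per the issue, 500 here most likely means "not in this lineage";
            # a genuine server error would look the same.
            missing.append(mutation)
    return payloads, missing
```

In the best case this costs one request per lineage; in the worst case one extra request per mutation. Treating every 500 as "not found" is a guess that a real server fault would silently turn into empty heatmap cells.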

First steps:

1. Profile whether this approach actually improves speed for a realistic set of lineages (for instance, the default set of lineages on outbreak.info/compare-lineages).
2. If so, implement it in the front-end.
Alternative approaches are welcome too.
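For the profiling step, a small harness could time each strategy and record how much data it pulls, since payload size is what strains the backend. This is a minimal sketch assuming each strategy is wrapped in a hypothetical zero-argument function that makes all of its API calls and returns the number of response bytes read. `profile` is an illustrative name.

```python
import statistics
import time
from typing import Callable, Dict


def profile(
    strategies: Dict[str, Callable[[], int]], repeats: int = 5
) -> Dict[str, Dict[str, float]]:
    """Run each strategy `repeats` times; report median time and payload size."""
    report: Dict[str, Dict[str, float]] = {}
    for name, run in strategies.items():
        seconds, sizes = [], []
        for _ in range(repeats):
            start = time.perf_counter()
            n_bytes = run()  # performs the API calls, returns bytes downloaded
            seconds.append(time.perf_counter() - start)
            sizes.append(n_bytes)
        report[name] = {
            "median_seconds": statistics.median(seconds),
            "median_bytes": statistics.median(sizes),
        }
    return report
```

It would be called as `profile({"frequency=0": run_current, "two-step": run_two_step})`, where both wrappers are hypothetical and use the default lineage set. Server-side caching may speed up later repeats, so the first run of each strategy is worth checking separately.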


flaneuse added guru 1-optimization labels Nov 19, 2022

olekkorob mentioned this issue Dec 14, 2022 in "refactor: API call in LineageComparisonComponent #605" (Closed)
