Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quantify how well we pick up saltational variants #337

Open
szhan opened this issue Oct 7, 2024 · 7 comments
Open

Quantify how well we pick up saltational variants #337

szhan opened this issue Oct 7, 2024 · 7 comments

Comments

@szhan
Copy link
Contributor

szhan commented Oct 7, 2024

These are the major saltational VoCs before Omicron:
B.1.1.7 (Alpha)
B.1.351 (Beta)
P.1 (Gamma)
B.1.617.2 (Delta)

We should see in our trees that there are an unusually many mutations occurring along the path from the root to their samples, compared with "pre-VoC" variants (e.g. B.1. and B.1.1.1).

I think a check is to get the 5-point summary of the number of mutations across the samples of each variant. For example, the Alpha samples should have 20 or so mutations, whereas pre-VoCs 5 or so.

@szhan
Copy link
Contributor Author

szhan commented Oct 7, 2024

Should we also check whether the samples of a variant are monophyletic? In some cases, e.g. Alpha, it should be.

@szhan
Copy link
Contributor Author

szhan commented Oct 7, 2024

PR #335 helps with this issue.

@jeromekelleher
Copy link
Owner

We'll need to quantify how monophyletic the samples are, as nothing will have a simple yes/no answer here

@szhan
Copy link
Contributor Author

szhan commented Oct 7, 2024

For a given VoC, in the ideal case, we have the same MRCA whose descendants are all samples assigned to the VoC across the trees.

For each VoC, we can keep track of:

  • The number of unique MRCAs across the trees.
  • The fraction of samples under the MRCA assigned to the VoC per tree.

In the ideal case, the first quantity should be 1, and the fraction is 1.0 for all trees.

@szhan
Copy link
Contributor Author

szhan commented Oct 7, 2024

Something like this would get us the above.

def get_mrca_per_tree(ts, samples):
    """
    Get the MRCA node for all the samples for each tree in a ts.
    Note that if not found, it is tskit.NULL (-1).
    """
    mrcas = np.zeros(ts.num_trees, dtype=np.int32) - 1
    i = 0
    for tree in ts.trees():
        mrcas[i] = tree.mrca(*samples)
        i += 1
    return mrcas


def get_fraction_focal_samples(ts, mrcas, samples):
    """
    Get the fraction of samples under the MRCA, which are among the specified samples.
    """
    num_focal_samples = np.zeros(ts.num_trees, dtype=np.int32)
    num_samples_under_mrca = np.zeros(ts.num_trees, dtype=np.int32)
    i = 0
    for tree in ts.trees():
        if mrcas[i] != tskit.NULL:
            _samples = np.array([s for s in tree.samples(mrcas[i])], dtype=np.int32)
            num_samples_under_mrca[i] = len(_samples)
            num_focal_samples[i] = np.sum(np.isin(samples, _samples))
        i += 1
    frac_focal_samples = num_focal_samples / num_samples_under_mrca
    return frac_focal_samples

@szhan
Copy link
Contributor Author

szhan commented Oct 7, 2024

For example, if we look at the above quantifies for P.1 in clipends-v2-mm_4-noexact-f500-clustloc-mrm_2-rw_10-mgs_10-2021-03-04.ts.tsz, there is a unique MRCA across the trees (node = 125220), and the fraction of focal samples under the MRCA is 0.77777778 across all the trees.

The unique MRCAs for B.1.351 are [72, 147, 46155], and the mean fraction of focal samples under the MRCA across the trees is 0.3260.

@szhan
Copy link
Contributor Author

szhan commented Oct 7, 2024

We should add Beta and Gamma to LineageDetails. For each VoC, we can do:

mrcas = get_mrca_per_tree(ts, samples)
frac_focal_samples = get_fraction_focal_samples(ts, mrcas, samples)
num_uniq_mrcas = np.unique(mrcas)
mean_frac_focal_samples = np.mean(frac_focal_samples)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants