-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Quantify how well we pick up saltational variants #337
Comments
Should we also check whether the samples of a variant are monophyletic? In some cases, e.g. Alpha, it should be. |
PR #335 helps with this issue. |
We'll need to quantify how monophyletic the samples are, as nothing will have a simple yes/no answer here |
For a given VoC, in the ideal case, we have the same MRCA whose descendants are all samples assigned to the VoC across the trees. For each VoC, we can keep track of:
In the ideal case, the first quantity should be 1, and the fraction is 1.0 for all trees. |
Something like this would get us the above. def get_mrca_per_tree(ts, samples):
"""
Get the MRCA node for all the samples for each tree in a ts.
Note that if not found, it is tskit.NULL (-1).
"""
mrcas = np.zeros(ts.num_trees, dtype=np.int32) - 1
i = 0
for tree in ts.trees():
mrcas[i] = tree.mrca(*samples)
i += 1
return mrcas
def get_fraction_focal_samples(ts, mrcas, samples):
"""
Get the fraction of samples under the MRCA, which are among the specified samples.
"""
num_focal_samples = np.zeros(ts.num_trees, dtype=np.int32)
num_samples_under_mrca = np.zeros(ts.num_trees, dtype=np.int32)
i = 0
for tree in ts.trees():
if mrcas[i] != tskit.NULL:
_samples = np.array([s for s in tree.samples(mrcas[i])], dtype=np.int32)
num_samples_under_mrca[i] = len(_samples)
num_focal_samples[i] = np.sum(np.isin(samples, _samples))
i += 1
frac_focal_samples = num_focal_samples / num_samples_under_mrca
return frac_focal_samples |
For example, if we look at the above quantifies for P.1 in The unique MRCAs for B.1.351 are [72, 147, 46155], and the mean fraction of focal samples under the MRCA across the trees is 0.3260. |
We should add Beta and Gamma to LineageDetails. For each VoC, we can do: mrcas = get_mrca_per_tree(ts, samples)
frac_focal_samples = get_fraction_focal_samples(ts, mrcas, samples)
num_uniq_mrcas = np.unique(mrcas)
mean_frac_focal_samples = np.mean(frac_focal_samples) |
These are the major saltational VoCs before Omicron:
B.1.1.7 (Alpha)
B.1.351 (Beta)
P.1 (Gamma)
B.1.617.2 (Delta)
We should see in our trees that there are an unusually many mutations occurring along the path from the root to their samples, compared with "pre-VoC" variants (e.g. B.1. and B.1.1.1).
I think a check is to get the 5-point summary of the number of mutations across the samples of each variant. For example, the Alpha samples should have 20 or so mutations, whereas pre-VoCs 5 or so.
The text was updated successfully, but these errors were encountered: