Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test whether splitting the root node helps or hinders dating #287

Open
hyanwong opened this issue Jul 11, 2023 · 1 comment
Open

Test whether splitting the root node helps or hinders dating #287

hyanwong opened this issue Jul 11, 2023 · 1 comment

Comments

@hyanwong
Copy link
Member

hyanwong commented Jul 11, 2023

Code at tskit-dev/tsinfer#850.

Very preliminary tests don't indicate to me that this helps dating much, though. It would be nice if it helped to fix the Brandt problem.

@hyanwong
Copy link
Member Author

hyanwong commented Jul 11, 2023

Hmm, well when using the variational_gamma method on an inferred tree sequence, it looks to me as if this extra splitting does improve the correlation coefficient, if only marginally:

unsplit_dts = tsdate.date(inferred_ts.simplify(), mutation_rate=1e-8, Ne=1e4, progress=True, method="variational_gamma")

split_dts = tsdate.date(break_root_nodes(inferred_ts_split.simplify()), mutation_rate=1e-8, Ne=1e4, progress=True, method="variational_gamma")


### Now plot etc

fig, axes = plt.subplots(1,2, figsize=(15, 5))

# exclude sites with multiple mutations (e.g. placed by parsimony)
sites, counts1 = np.unique(unsplit_dts.mutations_site, return_counts=True)
sites, counts2 = np.unique(mts.mutations_site, return_counts=True)
sites = sites[np.logical_and(counts1 == 1, counts2 == 1)]
x = mts.nodes_time[mts.mutations_node[np.isin(mts.mutations_site, sites)]]
y = unsplit_dts.nodes_time[unsplit_dts.mutations_node[np.isin(unsplit_dts.mutations_site, sites)]]
axes[0].scatter(x, y, alpha=0.1)
axes[0].set_title("Only ultimate ancestor split")
axes[0].set_xscale('log')
axes[0].set_yscale('log')
axes[0].plot(np.logspace(1, 4), np.logspace(1, 4), c="k")
print("Default corr coeff", np.corrcoef(np.log(x[x * y != 0]), np.log(y[x * y != 0]))[0][1])

sites, counts1 = np.unique(split_dts.mutations_site, return_counts=True)
sites, counts2 = np.unique(mts.mutations_site, return_counts=True)
sites = sites[np.logical_and(counts1 == 1, counts2 == 1)]
x = mts.nodes_time[mts.mutations_node[np.isin(mts.mutations_site, sites)]]
y = split_dts.nodes_time[split_dts.mutations_node[np.isin(split_dts.mutations_site, sites)]]
axes[1].scatter(x, y, alpha=0.1)
axes[1].set_title("Root also split")
axes[1].set_xscale('log')
axes[1].set_yscale('log')
axes[1].plot(np.logspace(1, 4), np.logspace(1, 4), c="k")
print("Split corr coeff", np.corrcoef(np.log(x[x * y != 0]), np.log(y[x * y != 0]))[0][1])
Screenshot 2023-07-11 at 18 10 24

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant