Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Estimate RF distance diameter #58

Open
willdumm opened this issue Jan 17, 2023 · 0 comments
Open

Estimate RF distance diameter #58

willdumm opened this issue Jan 17, 2023 · 0 comments
Assignees

Comments

@willdumm
Copy link
Collaborator

True RF distance diameter:

For a DAG which doesn't contain too many histories, it's possible to compute the true RF distance diameter with something like:

max(dag.optimal_rf_distance(history, optimal_func=max) for history in dag)

However, this is prohibitively slow for large DAGs. It's possible that there's a way to get a good estimate for diameter much more efficiently.

Under-estimate for diameter:

We can always get the maximum RF distance between a reference history and any other history in a DAG with

dag.optimal_rf_distance(ref_history, optimal_func=max)

This gives an under-estimate for the diameter, but possibly not a very good one depending on our choice of ref_history.
I propose it may be pretty good if the reference history is a topological outlier, such as what we get from dag.trim_optimal_sum_rf_distance(dag, optimal_func=max).sample()

Over-estimate for diameter:

Since RF distance is a proper distance, the triangle inequality tells us that given any tree a in the DAG, and a tree b that's as far as possible from a in the DAG, the diameter of the set of trees in the DAG is no more than twice the distance from a to b. This is true regardless of the tree a that we start with, but which tree we choose for a will determine how close the over-estimate (twice the distance from a to b) is to the true value of diameter.

It seems that a good choice for a would be the topological median tree, because that should have generally smaller maximum distance to any other tree in the DAG. So, perhaps a good over-estimate of RF distance diameter for a dag would be

2 * dag.optimal_rf_distance(dag.trim_optimal_sum_rf_distance(dag, optimal_func=min).sample(), optimal_func=max)

Evaluating these estimates:

It would be interesting to implement these over/under-estimates, and compare them to the true diameters for a bunch of DAGs that are small enough to compute the true diameters of.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants