comparison to skbio #2

Open
wdwvt1 opened this issue Nov 13, 2016 · 6 comments

@wdwvt1

wdwvt1 commented Nov 13, 2016

Hi guys - looks super cool! Very excited to try it out. I checked out the paper on bioRxiv and was wondering whether you had benchmarked it against the UniFrac implementation in scikit-bio (which has replaced PyCogent as the backend to QIIME etc.). I suspect the main developers of scikit-bio will be interested in implementing this there.

@dkoslicki
Owner

Thanks! I used PyCogent as that's what the FastUnifrac paper pointed to (if I recall correctly). There are apparently a bunch of different versions of Unifrac out there, so thanks for the pointer about scikit-bio! I'll see if I can't get it installed on my server and modify the Reproducibles.py file to compare to that version too.

@dkoslicki
Owner

Daniel McDonald (from Rob Knight's lab) is implementing/incorporating this in the new update to "state" Unifrac. Apparently we had very similar (if not identical) ideas for the basic Unifrac computation. I'll post an update when he gets the comparisons done. I've seen preliminary results, and Daniel's implementation of the basic idea does compare favorably to the scikit-bio implementation.

@jianshu93

Hi All,

Just curious: EMDUnifrac is not identical to UniFrac, right? For example, take simple unweighted UniFrac on the example below:

Taxon  SampleA  SampleB  SampleC
T1     1        0        1
T2     0        1        0
T3     1        0        1
T4     1        0        0
T5     0        1        0
T6     0        1        0

((T1:0.1,(T2:0.05,T3:0.05):0.02):0.3,(T4:0.2,(T5:0.1,T6:0.15):0.05):0.4);

I get different values for the original (unweighted) UniFrac and EMDUnifrac.

Thanks,
Jianshu
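For reference, here is a minimal sketch of the classic unweighted UniFrac computation (unique branch length divided by total observed branch length) on the tree and table above. It is not taken from scikit-bio or the EMDUnifrac repository; the internal node names `N1` to `N4` and the function names are made up for illustration, and the tree is hard-coded from the Newick string rather than parsed.

```python
from fractions import Fraction

# child -> (parent, branch length to parent), transcribed by hand from the
# Newick string above. Internal node names N1..N4 are hypothetical labels.
TREE = {
    "T1": ("N1", Fraction("0.1")),
    "T2": ("N2", Fraction("0.05")),
    "T3": ("N2", Fraction("0.05")),
    "N2": ("N1", Fraction("0.02")),
    "N1": ("root", Fraction("0.3")),
    "T4": ("N3", Fraction("0.2")),
    "T5": ("N4", Fraction("0.1")),
    "T6": ("N4", Fraction("0.15")),
    "N4": ("N3", Fraction("0.05")),
    "N3": ("root", Fraction("0.4")),
}

# Presence/absence from the feature table above.
SAMPLES = {
    "A": {"T1", "T3", "T4"},
    "B": {"T2", "T5", "T6"},
    "C": {"T1", "T3"},
}


def covered_branches(taxa):
    """Nodes whose branch to the parent lies on a path from a present taxon to the root."""
    covered = set()
    for node in taxa:
        while node != "root":
            covered.add(node)
            node = TREE[node][0]
    return covered


def unweighted_unifrac(taxa_a, taxa_b):
    """Unique branch length over total observed branch length."""
    a, b = covered_branches(taxa_a), covered_branches(taxa_b)
    unique = sum(TREE[n][1] for n in a ^ b)    # branches seen in exactly one sample
    observed = sum(TREE[n][1] for n in a | b)  # branches seen in either sample
    return unique / observed


print(unweighted_unifrac(SAMPLES["A"], SAMPLES["B"]))  # 35/71
print(unweighted_unifrac(SAMPLES["A"], SAMPLES["C"]))  # 60/107
print(unweighted_unifrac(SAMPLES["B"], SAMPLES["C"]))  # 45/61
```

Exact fractions are used so the results can be compared directly with hand calculations. The two branches at the root (0.3 and 0.4) are shared by A and B, so moving the root along that edge does not change the A-versus-B value in this example.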

@jianshu93

Another question: is EMDUnifrac a metric distance, i.e., does it satisfy the triangle inequality etc.?
Thanks,
Jianshu

@dkoslicki
Owner

Hi @jianshu93, EMDUnifrac is an honest-to-goodness distance metric, satisfying all the properties including the triangle inequality.

As for your first question, EMDUnifrac is identical to Unifrac (or should be, barring bugs in the code). You'll see at this line and the following tests that compare EMDUnifrac with manually calculated Unifrac values. Also, the associated manuscript did mathematically prove that they are equivalent.

For your specific example, can you share with me how you are observing differences? Doing it manually on my end for small examples like these shows agreement between EMDUnifrac and Unifrac, so perhaps seeing your calculations will help me notice any discrepancies.

@jianshu93

Hi @dkoslicki,

I was trying to implement EMDUniFrac in Rust and compare it with the original UniFrac. Attached is the example I have: a Newick-format tree and a feature table. Note: the original UniFrac requires a rooted tree (midpoint root), but I did not see a rooting step in EMDUniFrac. I cannot get exactly the same UniFrac value when calculating manually according to EMDUnifrac_weighted() and EMDUnifrac_unweighted(). I did midpoint rooting and then followed the exact method step by step. For unweighted, between samples A and B, I get 29/60 with EMDUniFrac and 35/71 with the original UniFrac. I think there is something I misunderstood.

My code gave the same results: https://github.com/jianshu93/EMDUniFrac-rs. Any help will be appreciated.

Thanks,
Jianshu

Archive.zip
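One possible reading of the 29/60 figure, offered as a sketch rather than a diagnosis: if each sample's presence/absence vector is normalized to sum to 1 and the difference in mass is pushed toward the root (the earth mover's distance on a tree, summing branch length times the absolute net mass crossing each branch), the small A-versus-B example from earlier in this thread gives exactly 29/60. Whether that is what `EMDUnifrac_unweighted()` is meant to compute is not confirmed here. The internal node names `N1` to `N4` and the function names below are hypothetical.

```python
from fractions import Fraction

# child -> (parent, branch length), listed children-before-parents so that a
# single pass in insertion order pushes mass toward the root.
# Internal node names N1..N4 are hypothetical labels.
TREE = {
    "T1": ("N1", Fraction("0.1")),
    "T2": ("N2", Fraction("0.05")),
    "T3": ("N2", Fraction("0.05")),
    "N2": ("N1", Fraction("0.02")),
    "N1": ("root", Fraction("0.3")),
    "T4": ("N3", Fraction("0.2")),
    "T5": ("N4", Fraction("0.1")),
    "T6": ("N4", Fraction("0.15")),
    "N4": ("N3", Fraction("0.05")),
    "N3": ("root", Fraction("0.4")),
}


def uniform_mass(taxa):
    """Presence/absence normalized to a probability vector over the present taxa."""
    return {t: Fraction(1, len(taxa)) for t in taxa}


def tree_emd(p, q):
    """Sum over branches of length * |net mass of (p - q) below that branch|."""
    diff = {node: p.get(node, 0) - q.get(node, 0) for node in TREE}
    diff["root"] = 0
    total = 0
    for node, (parent, length) in TREE.items():
        total += length * abs(diff[node])  # cost of moving the surplus across this branch
        diff[parent] += diff[node]         # surplus continues toward the root
    return total


sample_a = uniform_mass({"T1", "T3", "T4"})
sample_b = uniform_mass({"T2", "T5", "T6"})
print(tree_emd(sample_a, sample_b))  # 29/60
```

By contrast, 35/71 is the unique branch length (0.7) divided by the total observed branch length (1.42) for A and B, so the two figures in the post correspond to two different quantities on the same tree.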
