comparison to skbio #2
Thanks! I used PyCogent because that's what the FastUnifrac paper pointed to (if I recall correctly). There are apparently several different implementations of Unifrac out there, so thanks for the pointer to scikit-bio! I'll see if I can get it installed on my server and modify the Reproducibles.py file to compare against that version too.
Daniel McDonald (from Rob Knight's lab) is incorporating this into the new update to "state" Unifrac. Apparently we had very similar, if not identical, ideas for the basic Unifrac computation. I'll post an update when he finishes the comparisons. I've seen preliminary results, and Daniel's implementation of the basic idea does compare favorably to the scikit-bio implementation.
Hi all, just curious: EMDUnifrac is not identical to UniFrac, right? For example, for simple unweighted UniFrac on the example below:
SampleA SampleB SampleC
T1 1 0 1
((T1:0.1,(T2:0.05,T3:0.05):0.02):0.3,(T4:0.2,(T5:0.1,T6:0.15):0.05):0.4);
I get different values for the original (unweighted) UniFrac and EMDUnifrac. Thanks,
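To make this comparison concrete, here is a minimal Python sketch of the classic unweighted UniFrac definition: branch length unique to one sample, divided by branch length leading to taxa observed in either sample. It hard-codes the tree above, using hypothetical names A to D for the internal nodes. The presence/absence sets are also hypothetical, since only the T1 row of the feature table survives. This is an illustration of the definition, not the EMDUnifrac code.

```python
# Hypothetical encoding of the tree from the comment above:
# ((T1:0.1,(T2:0.05,T3:0.05):0.02):0.3,(T4:0.2,(T5:0.1,T6:0.15):0.05):0.4);
# Internal nodes get made-up names A, B, C, D; each entry maps
# a node to (parent, length of the branch above it).
PARENT = {
    "A": ("root", 0.3), "B": ("root", 0.4),
    "T1": ("A", 0.1), "C": ("A", 0.02),
    "T2": ("C", 0.05), "T3": ("C", 0.05),
    "T4": ("B", 0.2), "D": ("B", 0.05),
    "T5": ("D", 0.1), "T6": ("D", 0.15),
}


def observed_branches(present):
    """Nodes whose parent branch lies on a path from a present taxon to the root."""
    branches = set()
    for leaf in present:
        node = leaf
        while node in PARENT:
            branches.add(node)
            node = PARENT[node][0]
    return branches


def unweighted_unifrac(present_x, present_y):
    """Unique branch length divided by branch length observed in either sample."""
    bx, by = observed_branches(present_x), observed_branches(present_y)
    unique = sum(PARENT[n][1] for n in bx ^ by)  # branches seen in exactly one sample
    total = sum(PARENT[n][1] for n in bx | by)   # branches seen in either sample
    return unique / total


# Hypothetical presence/absence sets, not the table from the post.
print(unweighted_unifrac({"T1", "T2"}, {"T1", "T4"}))  # about 0.626 (0.67 / 1.07)
```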
Another question: is EMDUnifrac a metric, i.e., does it satisfy the triangle inequality and the other metric axioms?
Hi @jianshu93, EMDUnifrac is an honest-to-goodness distance metric: it satisfies all the metric properties, including the triangle inequality. As for your first question, EMDUnifrac is identical to Unifrac (or should be, barring bugs in the code). You'll see at this line, and in the tests that follow it, comparisons of EMDUnifrac against manually calculated Unifrac values. The associated manuscript also proves mathematically that the two are equivalent. For your specific example, can you share how you are observing the differences? Doing it manually on my end, small examples like these show agreement between EMDUnifrac and Unifrac, so seeing your calculations may help me notice any discrepancies.
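To illustrate the metric question, here is a Python sketch of the earth mover's distance on a tree. It computes the sum over branches of branch length times the absolute difference in relative abundance below that branch. This is the classic weighted UniFrac formula without its normalization step. The sketch then spot-checks the triangle inequality, which illustrates the property but does not prove it. The tree is the earlier example with made-up internal node names, and the abundances are hypothetical. This is not the repository's EMDUnifrac_weighted().

```python
from collections import defaultdict
from itertools import permutations

# Same hypothetical encoding of the earlier example tree: node -> (parent, branch length).
PARENT = {
    "A": ("root", 0.3), "B": ("root", 0.4),
    "T1": ("A", 0.1), "C": ("A", 0.02),
    "T2": ("C", 0.05), "T3": ("C", 0.05),
    "T4": ("B", 0.2), "D": ("B", 0.05),
    "T5": ("D", 0.1), "T6": ("D", 0.15),
}


def mass_below(abundances):
    """Total relative abundance beneath each branch (keyed by the branch's child node)."""
    mass = defaultdict(float)
    for leaf, weight in abundances.items():
        node = leaf
        while node in PARENT:
            mass[node] += weight
            node = PARENT[node][0]
    return mass


def normalize(counts):
    """Turn raw counts into relative abundances that sum to 1."""
    total = sum(counts.values())
    return {taxon: c / total for taxon, c in counts.items()}


def emd_weighted_unifrac(counts_x, counts_y):
    """Sum over branches of length * |mass below in X - mass below in Y|."""
    mx = mass_below(normalize(counts_x))
    my = mass_below(normalize(counts_y))
    return sum(length * abs(mx[n] - my[n]) for n, (_, length) in PARENT.items())


# Hypothetical abundance vectors used only to spot-check the triangle inequality.
samples = {
    "x": {"T1": 3, "T2": 1},
    "y": {"T4": 2, "T5": 2},
    "z": {"T1": 1, "T3": 1, "T6": 2},
}
for a, b, c in permutations(samples, 3):
    d_ac = emd_weighted_unifrac(samples[a], samples[c])
    d_via_b = emd_weighted_unifrac(samples[a], samples[b]) + emd_weighted_unifrac(samples[b], samples[c])
    assert d_ac <= d_via_b + 1e-12
print("triangle inequality holds for these samples")
```

Moving all mass from T1 to T4 costs exactly the path length between them (0.1 + 0.3 + 0.4 + 0.2 = 1.0). This is the sense in which the distance is a transport cost on the tree.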
Hi @dkoslicki, I was trying to implement EMDUniFrac in Rust and compare it with the original UniFrac. Attached is my example: a Newick-format tree and a feature table. Note: the original UniFrac requires a rooted tree (midpoint-rooted), but I did not see a rooting step in EMDUniFrac. When I calculate by hand following EMDUnifrac_weighted() and EMDUnifrac_unweighted(), I cannot get exactly the same UniFrac values. I midpoint-rooted the tree and then followed the method step by step; for unweighted, with samples A and B, I get 29/60 for EMDUniFrac and 35/71 for the original UniFrac. I think there is something I misunderstood. My code gives the same results: https://github.com/jianshu93/EMDUniFrac-rs. Any help will be appreciated. Thanks,
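Since the rooting step came up, here is a minimal Python sketch of midpoint rooting. It finds the longest leaf-to-leaf path using two passes of a farthest-node search, which is valid for trees with non-negative branch lengths. It then places the root halfway along that path. The sketch assumes an unrooted tree given as an edge list with hypothetical node names. It only locates the midpoint; it does not rebuild a Newick string.

```python
from collections import defaultdict


def farthest_from(adj, start):
    """Path distances and DFS parents from start; also returns the farthest node."""
    dist, parent = {start: 0.0}, {start: None}
    stack = [start]
    while stack:
        node = stack.pop()
        for nbr, length in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + length
                parent[nbr] = node
                stack.append(nbr)
    return max(dist, key=dist.get), dist, parent


def midpoint_location(edges):
    """Return (node_a, node_b, offset): the midpoint lies offset units from node_a on edge a-b."""
    adj = defaultdict(list)
    for a, b, length in edges:
        adj[a].append((b, length))
        adj[b].append((a, length))
    start = next(n for n in adj if len(adj[n]) == 1)  # any leaf
    u, _, _ = farthest_from(adj, start)                # one end of the longest path
    v, dist, parent = farthest_from(adj, u)            # other end; dist is measured from u
    half = dist[v] / 2
    node = v
    # Walk from v back toward u until the next node would be at or before the midpoint.
    while dist[parent[node]] > half:
        node = parent[node]
    return parent[node], node, half - dist[parent[node]]


# Unrooted version of the earlier example tree; A, B, C, D are hypothetical names.
EDGES = [
    ("A", "B", 0.7), ("A", "T1", 0.1), ("A", "C", 0.02),
    ("C", "T2", 0.05), ("C", "T3", 0.05),
    ("B", "T4", 0.2), ("B", "D", 0.05),
    ("D", "T5", 0.1), ("D", "T6", 0.15),
]
print(midpoint_location(EDGES))  # roughly ('B', 'A', 0.3), up to float rounding
```

Here the two root branches (0.3 and 0.4) are merged into a single 0.7 edge between A and B. The midpoint falls on that edge 0.3 from B, whereas the posted tree puts its root 0.3 from A. In the normalized weighted (EMD) formulation, the root does not change the distance. The mass on one side of a branch is 1 minus the mass on the other side, so |P(e) - Q(e)| is unchanged. The classic unweighted definition, which counts branches on paths to the root, can change with the root placement.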
Hi guys, this looks super cool! Very excited to try it out. I checked out the paper on bioRxiv and was wondering whether you had benchmarked it against the UniFrac implementation in scikit-bio (which has replaced PyCogent as the backend for QIIME, etc.). I suspect the main developers of scikit-bio will be interested in implementing this there.