comparison to skbio #2

wdwvt1 · 2016-11-13T19:51:40Z

Hi guys - looks super cool! Very excited to try it out. I checked out the paper on bioRxiv and was wondering whether you had benchmarked it against the UniFrac implementation in scikit-bio (it has replaced PyCogent as the backend to QIIME etc.). I suspect the main developers of scikit-bio will be interested in implementing this there.

dkoslicki · 2016-11-13T20:27:16Z

Thanks! I used PyCogent as that's what the FastUnifrac paper pointed to (if I recall correctly). There are apparently a bunch of different versions of Unifrac out there, so thanks for the pointer about scikit-bio! I'll see if I can't get it installed on my server and modify the Reproducibles.py file to compare to that version too.

dkoslicki · 2016-12-07T06:06:33Z

Daniel McDonald (from Rob Knight's lab) is incorporating this into the new update to "state" Unifrac. Apparently we had very similar (if not identical) ideas for the basic Unifrac computation. I'll post an update when he gets the comparisons done. I've seen preliminary results, and Daniel's implementation of the basic idea does compare favorably to the scikit-bio implementation.

jianshu93 · 2024-12-17T19:16:20Z

Hi All,

Just curious: EMDUnifrac is not identical to UniFrac, right? For example, take simple unweighted UniFrac on the example below:

Taxon  SampleA  SampleB  SampleC
T1     1        0        1
T2     0        1        0
T3     1        0        1
T4     1        0        0
T5     0        1        0
T6     0        1        0

((T1:0.1,(T2:0.05,T3:0.05):0.02):0.3,(T4:0.2,(T5:0.1,T6:0.15):0.05):0.4);

I get different values for the original (unweighted) UniFrac and EMDUnifrac.
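For reference, here is a minimal sketch of the classical unweighted UniFrac definition (branch length unique to one sample divided by branch length covered by either sample) applied to the tree and table above. It is not the scikit-bio or EMDUnifrac code. The tree is transcribed by hand, used as rooted in the Newick string, and the internal node names (`n23`, `n123`, `n56`, `n456`) are invented for illustration.

```python
from fractions import Fraction
from itertools import combinations

# child -> (parent, branch length), transcribed by hand from the Newick string.
# Internal node names are hypothetical labels, not part of the original tree.
EDGES = {
    "T1": ("n123", Fraction("0.1")),
    "T2": ("n23", Fraction("0.05")),
    "T3": ("n23", Fraction("0.05")),
    "n23": ("n123", Fraction("0.02")),
    "n123": ("root", Fraction("0.3")),
    "T4": ("n456", Fraction("0.2")),
    "T5": ("n56", Fraction("0.1")),
    "T6": ("n56", Fraction("0.15")),
    "n56": ("n456", Fraction("0.05")),
    "n456": ("root", Fraction("0.4")),
}

# Taxa present in each sample, read off the feature table.
SAMPLES = {
    "SampleA": {"T1", "T3", "T4"},
    "SampleB": {"T2", "T5", "T6"},
    "SampleC": {"T1", "T3"},
}


def edges_to_root(taxa):
    """Edges (named by their child node) on any path from a present taxon to the root."""
    covered = set()
    for node in taxa:
        while node in EDGES:
            covered.add(node)
            node = EDGES[node][0]
    return covered


def unweighted_unifrac(taxa_a, taxa_b):
    """Unique branch length divided by total covered branch length."""
    ea, eb = edges_to_root(taxa_a), edges_to_root(taxa_b)
    unique = sum(EDGES[e][1] for e in ea ^ eb)  # covered by exactly one sample
    total = sum(EDGES[e][1] for e in ea | eb)   # covered by at least one sample
    return unique / total


for a, b in combinations(SAMPLES, 2):
    print(a, b, unweighted_unifrac(SAMPLES[a], SAMPLES[b]))
# SampleA SampleB 35/71
# SampleA SampleC 60/107
# SampleB SampleC 45/61
```

Exact fractions are used so the result can be compared with hand calculations without rounding noise.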

Thanks,
Jianshu

jianshu93 · 2024-12-17T19:18:12Z

Another question: is EMDUnifrac a metric distance, i.e., does it satisfy the triangle inequality etc.?
Thanks,
Jianshu

dkoslicki · 2024-12-18T19:35:29Z

Hi @jianshu93, EMDUnifrac is an honest-to-goodness distance metric: it satisfies all the metric properties, including the triangle inequality.
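As a short illustration of why the triangle inequality holds for the weighted form, assume the standard closed form for earth mover's distance on a tree. The symbols below are notation introduced for this sketch, not taken from the thread: $\ell_e$ is the length of edge $e$, and $P_e$ is the total mass that distribution $P$ places on the leaves below $e$.

```latex
\begin{align*}
d(P,Q) &= \sum_{e} \ell_e \,\lvert P_e - Q_e \rvert \\
d(P,R) &= \sum_{e} \ell_e \,\lvert P_e - R_e \rvert
        \le \sum_{e} \ell_e \bigl( \lvert P_e - Q_e \rvert + \lvert Q_e - R_e \rvert \bigr)
        = d(P,Q) + d(Q,R)
\end{align*}
```

Symmetry and non-negativity follow directly from the absolute values. If every branch length is positive and all mass sits on the leaves, $d(P,Q) = 0$ forces $P_e = Q_e$ on every leaf edge, so $P = Q$.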

As for your first question, EMDUnifrac is identical to Unifrac (or should be, barring bugs in the code). You'll see at this line and in the following tests that EMDUnifrac is compared with manually calculated Unifrac values. Also, the associated manuscript proves mathematically that they are equivalent.

For your specific example, can you share with me how you are observing differences? Doing it manually on my end for small examples like these shows agreement between EMDUnifrac and Unifrac, so perhaps seeing your calculations will help me notice any discrepancies.

jianshu93 · 2024-12-20T05:10:19Z

Hi @dkoslicki,

I was trying to implement EMDUniFrac in Rust and compare it with the original UniFrac. Attached is the example I have: a Newick-format tree and a feature table. Note: the original UniFrac requires a rooted tree (midpoint root), but I did not see a rooting step in EMDUniFrac. I cannot get exactly the same UniFrac value when manually calculating according to EMDUnifrac_weighted() and EMDUnifrac_unweighted(). I did midpoint rooting and then followed the method step by step. For unweighted, with samples A and B, I get 29/60 from EMDUniFrac and 35/71 from the original UniFrac. I think there is something I misunderstood.
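One arithmetic observation that may help locate the difference, offered as a sketch rather than a diagnosis of either implementation: on the six-taxon example posted earlier in this thread, summing branch length times the absolute difference in subtree mass, with each sample's presence/absence vector normalized to sum to 1, gives exactly 29/60 for A versus B. The 35/71 figure is what the unique-over-total branch length fraction gives on the same input. The tree is transcribed by hand, the internal node names are invented, and whether the attached archive holds the same data is an assumption.

```python
from fractions import Fraction

# child -> (parent, branch length), transcribed by hand from the Newick string
# posted earlier in the thread. Internal node names are hypothetical labels.
EDGES = {
    "T1": ("n123", Fraction("0.1")),
    "T2": ("n23", Fraction("0.05")),
    "T3": ("n23", Fraction("0.05")),
    "n23": ("n123", Fraction("0.02")),
    "n123": ("root", Fraction("0.3")),
    "T4": ("n456", Fraction("0.2")),
    "T5": ("n56", Fraction("0.1")),
    "T6": ("n56", Fraction("0.15")),
    "n56": ("n456", Fraction("0.05")),
    "n456": ("root", Fraction("0.4")),
}

# Taxa present in each sample (presence/absence only).
PRESENT = {
    "SampleA": {"T1", "T3", "T4"},
    "SampleB": {"T2", "T5", "T6"},
}


def subtree_mass(taxa):
    """Mass below each edge when every present taxon gets weight 1/len(taxa)."""
    mass = {child: Fraction(0) for child in EDGES}
    weight = Fraction(1, len(taxa))
    for leaf in taxa:
        node = leaf
        while node in EDGES:  # walk up to the root, adding this leaf's mass
            mass[node] += weight
            node = EDGES[node][0]
    return mass


def edge_sum_distance(a, b):
    """Sum over edges of branch length times |mass difference below that edge|."""
    pa, pb = subtree_mass(PRESENT[a]), subtree_mass(PRESENT[b])
    return sum(length * abs(pa[c] - pb[c]) for c, (_, length) in EDGES.items())


print(edge_sum_distance("SampleA", "SampleB"))  # 29/60
```

The two quantities normalize differently: one divides unique branch length by total covered branch length, the other moves a unit of mass spread evenly over the present taxa. Whether that accounts for the discrepancy in the Rust port would need to be checked against the actual code.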

My code gave the same results: https://github.com/jianshu93/EMDUniFrac-rs. Any help will be appreciated.

Thanks,
Jianshu

Archive.zip
