Capture HiC data #20

Open
yfarjoun opened this issue Mar 25, 2023 · 3 comments

Comments

@yfarjoun

Hello @XiaoTaoWang.

Thanks for maintaining such an organized site for installing and running EagleC.

I noticed that the paper says EagleC can run on Capture HiC data, but I didn't see any detailed instructions on how to actually do this. In particular, how does one get around the fact that capture data has a particular pattern due to the capture technology? Does normalization help with that? If so, which normalization should be used: CNV or ICE? (As an aside, what does "ICE" stand for?)

Related: what methods/scripts/functions did you use to evaluate performance? There are clearly many ways to compare a call set to a truth set, and the details matter, so I was wondering whether you have made the evaluation scripts publicly available.

Thanks!

Yossi

@XiaoTaoWang
Owner

Dear Yossi,

Thank you for your interest. First of all, ICE refers to "Iterative Correction and Eigenvector decomposition", a Hi-C data normalization method developed by Dr. Leonid A Mirny's lab in 2012 (DOI: 10.1038/nmeth.2148).
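For readers unfamiliar with the "iterative correction" half of ICE, here is a minimal sketch of the balancing idea: repeatedly divide each pixel by the relative coverage of its row and column until all rows sum to the same value. This is an illustration only, not the code EagleC or any Hi-C toolkit actually runs; the function name `ice_balance` and the parameters `max_iter` and `tol` are assumptions made for this example, and real implementations typically also filter low-coverage bins and ignore the first diagonals.

```python
import numpy as np

def ice_balance(matrix, max_iter=200, tol=1e-5):
    """Illustrative iterative correction of a symmetric contact matrix.

    Returns the balanced matrix and the per-bin bias vector, so that
    balanced[i, j] == matrix[i, j] / (bias[i] * bias[j]).
    """
    m = np.asarray(matrix, dtype=float).copy()
    n = m.shape[0]
    bias = np.ones(n)
    # Bins with no contacts at all are left untouched (scale factor 1).
    valid = m.sum(axis=1) > 0
    for _ in range(max_iter):
        row_sums = m.sum(axis=1)
        scale = np.ones(n)
        # Coverage of each bin relative to the mean coverage.
        scale[valid] = row_sums[valid] / row_sums[valid].mean()
        # Divide every pixel by the scale factors of its row and column.
        m /= np.outer(scale, scale)
        bias *= scale
        # Stop once all non-empty rows have (nearly) the same coverage.
        if np.abs(scale[valid] - 1.0).max() < tol:
            break
    return m, bias

# Toy example: a flat contact map distorted by known per-bin biases.
true_bias = np.array([1.0, 2.0, 3.0, 4.0])
observed = np.outer(true_bias, true_bias)
balanced, estimated_bias = ice_balance(observed)
```

In this toy case the estimated bias is proportional to `true_bias`, and every row of `balanced` has the same sum.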

In our paper, we assessed the performance of EagleC on several region capture Hi-C datasets, where we knew the actual SV coordinates, and we found that EagleC accurately predicted them in all cases (achieving 100% recall), with no other pixels on the capture Hi-C maps being identified as SVs (achieving 100% precision). Since region capture Hi-C is essentially high-resolution Hi-C in local regions, it is reasonable to use the same Hi-C guidelines to predict SVs on these contact maps, i.e., predictions should be combined from 5kb, 10kb, and 50kb resolutions, and both raw and normalized matrices can be used (although sensitivity and specificity may differ for different normalization methods).
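As a rough illustration of what "recall" and "precision" mean for SV calls, the sketch below matches predicted breakpoint pairs to a truth set within a distance tolerance. It is not the evaluation script used in the paper; the `SV` tuple, the function names, and the 50 kb default tolerance are all assumptions made for this example, and it assumes the two breakpoints of each SV are listed in a consistent order.

```python
from typing import NamedTuple

class SV(NamedTuple):
    """A hypothetical SV record: two breakpoints, each a (chrom, pos) pair."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int

def is_match(call, truth, tol):
    """True if both breakpoints agree on chromosome and lie within tol bp."""
    return (
        call.chrom1 == truth.chrom1
        and call.chrom2 == truth.chrom2
        and abs(call.pos1 - truth.pos1) <= tol
        and abs(call.pos2 - truth.pos2) <= tol
    )

def precision_recall(calls, truth, tol=50_000):
    """Greedy one-to-one matching of calls against a truth set."""
    unmatched = list(truth)
    true_positives = 0
    for call in calls:
        for t in unmatched:
            if is_match(call, t, tol):
                # Each truth SV can be claimed by at most one call.
                unmatched.remove(t)
                true_positives += 1
                break
    precision = true_positives / len(calls) if calls else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall

# Made-up coordinates, for illustration only.
truth_set = [SV("chr1", 1_000_000, "chr2", 5_000_000)]
call_set = [
    SV("chr1", 1_010_000, "chr2", 4_990_000),  # within tolerance of the truth SV
    SV("chr3", 100_000, "chr4", 200_000),      # no counterpart in the truth set
]
precision, recall = precision_recall(call_set, truth_set)
```

The choice of tolerance, and whether matching is one-to-one, can change the reported numbers noticeably, which is why the details of any such comparison matter.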

However, EagleC has not yet been optimized for promoter capture Hi-C (or any capture Hi-C that enriches discrete loci/elements in the genome). Based on our limited tests, ICE normalization should be used to ensure reasonable accuracy on these platforms.

I hope this information is helpful.

Best,
Xiaotao

@yfarjoun
Author

Thanks for the references and the information!

@clolalan7

Considering that KR is similar to ICE, would it be possible to add support for this normalization? That way, data from pipelines producing .hic files could be used without having to re-normalize.

Thank you for considering.
