CN_Learn Pipeline Issues #1
P.S. Another issue in
This results in a BED file with differing numbers of columns, which bedtools does not like during the
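One quick way to find the offending rows before bedtools sees the file is to compare each line's column count against the first line. This is a minimal awk sketch, assuming a tab-delimited BED file with no header or track lines. The function name check_bed_columns is hypothetical and not part of CN-Learn.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CN-Learn): report BED lines whose number of
# tab-separated columns differs from the first line. Assumes no header/track
# lines. Exits 1 if any mismatch is found, 0 otherwise.
check_bed_columns() {
  local bed="$1"
  awk -F'\t' '
    NR == 1 { ncol = NF }   # the first line defines the expected width
    NF != ncol {
      printf "line %d: %d columns (expected %d)\n", NR, NF, ncol
      bad = 1
    }
    END { exit bad }        # bad is 0 (unset) when every line matched
  ' "$bed"
}
```

For example, check_bed_columns merged_calls.bed (a placeholder filename) prints one line per mismatched row and returns a non-zero status, so it can guard a later bedtools step in a script.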
Hello Eugene,

Our apologies for the delay in responding. Ever since we posted the initial draft of the manuscript on bioRxiv, we have been working on simplifying and testing the pipeline. Specifically, we have simplified the software installation process by providing a Docker image with all the software tools preinstalled. We will address each issue you reported here and respond within the next week. Once you hear back from us, please feel free to download the latest version and try using CN-Learn again. Sorry for the inconvenience. We will keep you posted.

Vijay
Hello Eugene,

Vijay
Hello Girirajan Lab,
Thank you for the recent bioRxiv manuscript on your method for quality control (QC) of WES CNVs. The metrics appear helpful for generating a good, QC'd set of WES CNVs. I am doing QC of some calls of my own and have a few questions regarding some of the code deposited here on GitHub:
What is the major difference between steps 5a/b and step 5? I am having trouble running merge_overlapping_CNVs_readdepth.sh because bedtools intersect uses excessive memory. Is the intended behavior to get the coverage of every base pair over a WES probe, or just the mean coverage across that probe? My bp_coverage_dir/*.bpcov files have approximately 80-90M lines each.

generate_bp_coverage.sh is actually named extract_bp_coverage.sh, which leads to confusion when running the pipeline.

Throughout the pipeline (e.g. merge_overlapping_CNVs_endjoin.sh, calculate_CNV_overlap.sh), you have:

for sample in 'cat ${SAMPLE_LIST}' | head -10;

As programmed, this takes only the first 10 samples in the sample list. Is this intended? I would have expected all samples to be processed through the pipeline.

I think the script merge_overlapping_CNVs_endjoin.sh is slightly broken as written. The line
if using only three callers (as I am), prints what I assume is incorrect output, something like:
rather than what I assume is the intended output of:
I assume this is likely due to the code:
The output from bedtools intersectBed shouldn't change based on the total number of callers used. Or am I mistaken?
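On the memory question for merge_overlapping_CNVs_readdepth.sh: if only the mean depth per probe is needed, the per-base file can be collapsed in one streaming pass instead of intersecting every base. This is a minimal sketch, assuming each .bpcov line is tab-separated as chrom, start, end, offset, depth (the layout that bedtools coverage -d produces) and that all bases of a probe sit on consecutive lines. The real CN-Learn file layout may differ, and mean_probe_coverage is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not part of CN-Learn): collapse per-base depth into one
# mean depth per probe.
# ASSUMPTION: input columns are chrom, start, end, offset, depth (tab-separated),
# with every base of a probe on consecutive lines.
# Memory is constant: only the running sum and count of the current probe are kept.
mean_probe_coverage() {
  awk -F'\t' -v OFS='\t' '
    {
      key = $1 OFS $2 OFS $3            # probe identity: chrom, start, end
      if (NR > 1 && key != prev) {      # probe changed: emit the finished one
        print prev, sum / n
        sum = 0; n = 0
      }
      prev = key
      sum += $5                         # depth column
      n++
    }
    END { if (n > 0) print prev, sum / n }   # emit the last probe
  ' "$@"
}
```

For example, mean_probe_coverage sample1.bpcov > sample1.probe_mean.bed (placeholder filenames) writes chrom, start, end and mean depth per probe. If the per-base intersect is genuinely required, bedtools intersect also accepts -sorted, which uses a lower-memory sweep when both inputs are coordinate-sorted.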
/data/CN_learn/config.params) and the CpG file in extract_gc_map_vals.sh may make it difficult for some users to follow your code. I suppose this depends on how you intend the code base to be used (reproducing your paper, or broader use like my own).

Thanks in advance for the help!
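On the head -10 question, a loop over the whole sample list could look like the following. This is a minimal sketch, assuming SAMPLE_LIST holds one sample ID per line. for_each_sample is a hypothetical helper, not CN-Learn code, and the per-sample command passed to it stands in for whatever each script does per sample.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CN-Learn): run a command once per sample ID
# in a list file, covering every line rather than only the first 10.
for_each_sample() {
  local sample_list="$1"
  shift                            # remaining arguments form the per-sample command
  local sample
  # Read from file descriptor 3 so a per-sample command that reads stdin cannot
  # swallow the rest of the list; `|| [ -n ... ]` keeps a final line that has
  # no trailing newline.
  while IFS= read -r sample <&3 || [ -n "$sample" ]; do
    [ -z "$sample" ] && continue   # skip blank lines
    "$@" "$sample"                 # e.g. process_sample "$sample"
  done 3< "$sample_list"
}
```

For example, for_each_sample "${SAMPLE_LIST}" process_sample runs the placeholder function process_sample on every listed sample. Reading through a separate descriptor is a design choice to avoid the common pitfall where a tool inside the loop consumes the list from stdin.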