TCAG-WGS-CNV-workflow

This repository contains scripts involved in our workflow for detecting CNVs from WGS data using read depth-based methods.

If you use any of these scripts, please cite:

B. Trost, S. Walker, Z. Wang, B. Thiruvahindrapuram, J.R. MacDonald, W.W.L. Sung, S.L. Pereira, J. Whitney, A.J.S. Chan, G. Pellecchia, M.S. Reuter, S. Lok, R.K.C. Yuen, C.R. Marshall, D. Merico, and S.W. Scherer. A comprehensive workflow for read depth-based identification of copy-number variation from whole-genome sequence data. American Journal of Human Genetics 102(1):142-155, 2018.

The raw HuRef and NA12878 sequencing data used in the paper are available from:

https://www.ncbi.nlm.nih.gov/sra/PRJNA542535

The workflow in the above paper relies on two CNV-detection tools, which can be obtained as follows.

ERDS: https://github.com/igm-team/ERDS
CNVnator: https://github.com/abyzovlab/CNVnator

This README file lists, and explains the purpose of, each script. The scripts are divided into three categories:

"main scripts", which are designed to be called directly;
"accessory scripts", which are only meant to be called by/used by the main scripts; and
"commands", which are not meant to be run as scripts but instead contain commands (e.g., for running BWA, GATK and the CNV-detection algorithms) that users can copy and paste, edit to replace the placeholder filenames with their own, and then execute one by one on their system or cluster.

For instructions on each script, as well as example usage, please refer to the comments at the beginning of each one.

Main scripts (designed to be called directly)

benchmark_overlap_counts.py: Output counts summarizing how CNVs in different categories overlap with the benchmark.
CNV_overlap.py: Find overlapping CNV calls from the CNV-detection algorithms, from different benchmark methods, or from both.
CNV_read_depth_checker.sh: Calculate the ratio between the read depth of a CNV and the read depth of the same-size surrounding regions.
compare_CNVs_to_benchmark.py: Compare CNVs output from the CNV-detection algorithms to a CNV benchmark.
compare_with_RLCR_definition.py: Compare CNV calls that have been converted to the common format with the RLCR definition. Requires the "intervaltree" Python module to be installed.
convert_CNV_calls_to_common_format.py: Convert CNV calls to common format.
IQR_samtools_depth.sh: Calculate the interquartile range (IQR) of read depth from a BAM file.
merge_Genome_STRiP.py: Use this script on a Genome STRiP file that has already been converted to the common format using convert_CNV_calls_to_common_format.py in order to merge overlapping calls.
process_cnvs.erds+.sh: Use this script to perform the CNVnator-ERDS merging that was used in Stage 3 of the study (the analysis of rare, genic CNVs in the Autism Speaks MSSNG cohort).
reproduce_results.sh: A script that runs all the other main scripts in an appropriate sequence.
split_HuRef_benchmark.sh: Split the file containing the HuRef benchmark CNVs into separate files, one for each benchmark technology.
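
CNV_overlap.py and compare_CNVs_to_benchmark.py both need a rule for deciding when two CNV calls overlap. The criteria the scripts actually use are documented in their header comments; this sketch assumes one widely used rule, 50% reciprocal overlap, in which the shared bases must cover at least half of each call. The CNV tuple, the 1-based inclusive coordinates, the threshold and the function names are illustrative assumptions, not the scripts' interface.

```python
from typing import Iterable, List, NamedTuple


class CNV(NamedTuple):
    """A CNV call with 1-based, inclusive coordinates (hypothetical layout)."""
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def overlap_length(a: CNV, b: CNV) -> int:
    """Number of bases shared by two calls; 0 if disjoint or on different chromosomes."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def reciprocal_overlap(a: CNV, b: CNV, min_fraction: float = 0.5) -> bool:
    """True if the shared bases cover at least `min_fraction` of EACH call."""
    shared = overlap_length(a, b)
    return shared >= min_fraction * a.length and shared >= min_fraction * b.length


def matched_calls(calls: Iterable[CNV], benchmark: List[CNV],
                  min_fraction: float = 0.5) -> List[CNV]:
    """Calls that reciprocally overlap at least one benchmark CNV."""
    return [c for c in calls
            if any(reciprocal_overlap(c, b, min_fraction) for b in benchmark)]
```

The pairwise check is quadratic; for genome-wide call sets an interval index (such as the intervaltree module that compare_with_RLCR_definition.py already requires) would be the usual choice.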
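
The ratio reported by CNV_read_depth_checker.sh can be illustrated with per-base depths held in memory. This is a minimal sketch under stated assumptions: each flank is taken to be as long as the CNV itself, positions missing from the depth table count as zero coverage, and depth_ratio is a hypothetical name. For a diploid genome, a ratio near 0.5 is what a heterozygous deletion would be expected to give, and roughly 1.5 for a single-copy duplication.

```python
from statistics import mean
from typing import Mapping


def depth_ratio(depth: Mapping[int, int], start: int, end: int) -> float:
    """Mean depth inside the CNV divided by mean depth of its two flanks.

    Assumptions (illustrative only): coordinates are 1-based and inclusive,
    each flank is as long as the CNV, and positions absent from `depth`
    have zero coverage.
    """
    size = end - start + 1
    inside = [depth.get(p, 0) for p in range(start, end + 1)]
    # The left flank is truncated at position 1 near the chromosome start.
    left = [depth.get(p, 0) for p in range(max(1, start - size), start)]
    right = [depth.get(p, 0) for p in range(end + 1, end + size + 1)]
    flank_mean = mean(left + right)
    if flank_mean == 0:
        return float("nan")  # no flanking coverage, so the ratio is undefined
    return mean(inside) / flank_mean
```

For example, if positions 101 to 200 have depth 15 and the surrounding 100 bases on each side have depth 30, depth_ratio returns 0.5.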
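
compare_with_RLCR_definition.py uses the intervaltree module to relate CNV calls to the RLCR definition. This standard-library sketch shows one calculation such a comparison could rely on: the fraction of a CNV's bases covered by a set of possibly overlapping regions. The function name, the coordinate convention and any cutoff applied to the result (for example, treating a call as lying in an RLCR when more than half of it is covered) are assumptions, not the script's documented behavior.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # (start, end) on one chromosome, 1-based inclusive


def covered_fraction(start: int, end: int, regions: List[Region]) -> float:
    """Fraction of the bases in [start, end] that lie inside at least one region.

    Regions may overlap one another; each base is counted once.
    """
    covered = 0
    # Everything left of `cursor` is settled: either already counted, or not
    # reachable by any remaining region (they are sorted by start).
    cursor = start
    for r_start, r_end in sorted(regions):
        lo = max(r_start, cursor)
        hi = min(r_end, end)
        if lo <= hi:
            covered += hi - lo + 1
            cursor = hi + 1
        if cursor > end:
            break
    return covered / (end - start + 1)
```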
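
IQR_samtools_depth.sh presumably summarizes the spread of per-base read depth, as its name suggests. The following sketch computes an IQR from text in the standard samtools depth layout (chromosome, position and depth, tab-separated). The function name is hypothetical, and quartile conventions differ between tools, so the value may not match the script's output exactly. Note also that samtools depth omits zero-depth positions unless run with -a, which changes the IQR.

```python
from statistics import quantiles
from typing import Iterable


def depth_iqr(lines: Iterable[str]) -> float:
    """Interquartile range of the depth column in `samtools depth` output.

    Each non-blank line is expected to read: chromosome<TAB>position<TAB>depth.
    Requires at least two depth values.
    """
    depths = [int(line.split("\t")[2]) for line in lines if line.strip()]
    q1, _median, q3 = quantiles(depths, n=4)  # default "exclusive" method
    return q3 - q1
```

An open text file can be passed directly, because iterating over it yields one line at a time.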
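
merge_Genome_STRiP.py merges overlapping calls after they have been converted to the common format. One simple interpretation, sketched here as an assumption, is to collapse calls on the same chromosome that share at least one base into a single call spanning them. The tuple layout is hypothetical, and a real merge would likely also keep deletions and duplications apart, which this sketch does not do.

```python
from typing import List, Tuple

Call = Tuple[str, int, int]  # (chrom, start, end), 1-based inclusive; hypothetical layout


def merge_overlapping(calls: List[Call]) -> List[Call]:
    """Collapse calls on the same chromosome that share at least one base."""
    merged: List[Call] = []
    # Sorting groups calls by chromosome and orders them by start within each one.
    for chrom, start, end in sorted(calls):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            _, prev_start, prev_end = merged[-1]
            merged[-1] = (chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged
```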

Accessory scripts (NOT to be called directly)

add_features.py: Used by process_cnvs.erds+.sh.
Canvas.py: Custom library for converting Canvas output to common format.
cnMOPS.py: Custom library for converting cn.MOPS output to common format.
CNVnator.py: Custom library for converting CNVnator output to common format.
CNVworkflowlib.py: Custom library of Python functions used by other Python scripts.
ERDS.py: Custom library for converting ERDS output to common format.
format_cnvnator_results.py: Used by process_cnvs.erds+.sh.
format_erds_results.py: Used by process_cnvs.erds+.sh.
functions.py: Used by process_cnvs.erds+.sh.
Genome_STRiP.py: Custom library for converting Genome STRiP output to common format.
get_normalized_depth.py: Used by CNV_read_depth_checker.sh to actually calculate the normalized depth of a CNV.
index_samtools_depth.py: Used by CNV_read_depth_checker.sh to index a "samtools depth" file so that get_normalized_depth.py can retrieve depth values quickly.
IQR_samtools_depth.py: Does most of the work involved in calculating IQR from a BAM file.
merge_cnvnator_results.py: Used by process_cnvs.erds+.sh.
merge_erds_results.py: Used by process_cnvs.erds+.sh.
myvcf.py: Custom library of Python functions for dealing with VCF files.
RDXplorer.py: Custom library for converting RDXplorer output to common format.
SV.py: Custom library of Python functions for representing SVs/CNVs.
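
index_samtools_depth.py and get_normalized_depth.py exist so that a large samtools depth file need not be rescanned for every CNV. A common way to achieve this, sketched here as an assumption about the general idea rather than the scripts' actual index format, is to record the byte offset of the first line in each fixed-size genomic bin and seek straight to it. The sketch assumes the depth file is sorted by chromosome and position; BIN_SIZE, build_index and depths_in_range are hypothetical names.

```python
import io
from typing import BinaryIO, Dict, Iterator, Tuple

BIN_SIZE = 10_000  # hypothetical bin width, in bases

Index = Dict[Tuple[str, int], int]


def build_index(fh: BinaryIO, bin_size: int = BIN_SIZE) -> Index:
    """Map (chrom, bin) to the byte offset of the first line that falls in that bin."""
    index: Index = {}
    offset = fh.tell()
    for line in fh:
        chrom, pos, _depth = line.decode().split("\t")
        index.setdefault((chrom, int(pos) // bin_size), offset)
        offset += len(line)  # track offsets ourselves rather than calling tell() mid-loop
    return index


def depths_in_range(fh: BinaryIO, index: Index, chrom: str, start: int, end: int,
                    bin_size: int = BIN_SIZE) -> Iterator[Tuple[int, int]]:
    """Yield (position, depth) for every line of `chrom` with start <= position <= end."""
    # Walk back to the nearest bin at or before `start` that contains data.
    b = start // bin_size
    while b >= 0 and (chrom, b) not in index:
        b -= 1
    if b >= 0:
        fh.seek(index[(chrom, b)])
    else:
        # No earlier bin: begin at the chromosome's first line, if it has any.
        offsets = [off for (c, _), off in index.items() if c == chrom]
        if not offsets:
            return
        fh.seek(min(offsets))
    for line in fh:
        c, pos_str, depth_str = line.decode().split("\t")
        pos = int(pos_str)
        if c != chrom or pos > end:
            break  # input is sorted, so nothing later can fall in the range
        if pos >= start:
            yield pos, int(depth_str)


if __name__ == "__main__":
    demo = io.BytesIO(b"chr1\t1\t30\nchr1\t2\t31\nchr1\t3\t29\n")
    idx = build_index(demo, bin_size=2)
    print(list(depths_in_range(demo, idx, "chr1", 2, 3, bin_size=2)))  # [(2, 31), (3, 29)]
```

Smaller bins make each lookup read fewer lines at the cost of a larger index held in memory.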

Data files (used by scripts)

hg19_gap.bed: Used by process_cnvs.erds+.sh.
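
Assuming hg19_gap.bed follows the BED convention, its start coordinates are 0-based and its end coordinates exclusive, which is easy to get wrong when comparing against 1-based CNV calls. This sketch assumes only that the first three columns are chromosome, start and end; it converts the intervals to 1-based inclusive coordinates and tests whether a CNV touches a gap. How process_cnvs.erds+.sh actually uses the file is not described in this README, and read_bed and touches_gap are hypothetical names.

```python
from typing import Dict, Iterable, List, Tuple

Gaps = Dict[str, List[Tuple[int, int]]]


def read_bed(lines: Iterable[str]) -> Gaps:
    """Parse the first three BED columns into 1-based, inclusive intervals per chromosome."""
    gaps: Gaps = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and header lines
        chrom, start, end = line.split("\t")[:3]
        # BED starts are 0-based and ends are exclusive, so only the start shifts.
        gaps.setdefault(chrom, []).append((int(start) + 1, int(end)))
    return gaps


def touches_gap(chrom: str, start: int, end: int, gaps: Gaps) -> bool:
    """True if the 1-based, inclusive CNV [start, end] shares any base with a gap."""
    return any(g_start <= end and start <= g_end for g_start, g_end in gaps.get(chrom, []))
```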

Commands (designed for the user to execute commands one-by-one)

COMMANDS.sh: A list of commands for running BWA and the CNV-detection algorithms.
