CNV Array Pipeline ReadMe

CNV Array Analysis Workflow

CNV Pipeline (--workflow cnv)

• Step 1: Parses the input CSV to verify the sampleID, idat_red, and idat_green fields, and checks that the gender values are valid.
• Step 2: IAAP CLI converts the IDAT files to GTC format.
• Step 3: Takes in GTC files and processes them through a series of BCFtools commands to convert them into a sorted, normalized, and indexed VCF file.
• Step 4: Extracts BAF and LRR values from a BCF file, formats the values, and outputs them into separate files 'bcftools_convert.BAF' and 'bcftools_convert.LRR'.
• Step 5: Uses the ASCAT package to analyze the BAF and LRR data and identify CNVs.
• Step 6: Annotates CNV segments with gene information and produces visualizations.
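
To make Step 4 concrete, here is a minimal Python sketch of pulling BAF and LRR values out of VCF text. It is not the pipeline's implementation, which uses BCFtools. The FORMAT tag names `BAF` and `LRR`, the function name `extract_baf_lrr`, and the row layout it returns are all assumptions made for illustration.

```python
def extract_baf_lrr(vcf_text):
    """Return (baf_rows, lrr_rows) parsed from VCF text.

    Each row is (variant_id, chrom, pos, [one value per sample]).
    Assumes the FORMAT column carries tags named BAF and LRR.
    """
    baf_rows, lrr_rows = [], []
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        chrom, pos, variant_id = fields[0], fields[1], fields[2]
        keys = fields[8].split(":")  # FORMAT column, e.g. GT:BAF:LRR
        if "BAF" not in keys or "LRR" not in keys:
            continue  # record has no intensity-derived values
        baf_i, lrr_i = keys.index("BAF"), keys.index("LRR")
        samples = [sample.split(":") for sample in fields[9:]]
        baf_rows.append((variant_id, chrom, pos, [s[baf_i] for s in samples]))
        lrr_rows.append((variant_id, chrom, pos, [s[lrr_i] for s in samples]))
    return baf_rows, lrr_rows
```

Keeping BAF and LRR in two parallel lists mirrors the two separate output files described in Step 4.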

CNV Flowchart

flowchart TB
    p0((Sample))
    p1[IAAP_CLI]
    p2[BCFTOOLS_GTC2VCF]
    p3[BCFTOOLS_QUERY_ASCAT]
    p4[ASCAT]
    p5[ASCAT_ANNOTATION]
    o1([VCF with BAF/LRR]):::output
    o2([BAF File]):::output
    o3([LRR File]):::output
    o4([Raw CNV segments]):::output
    o5([Sample ploidy]):::output
    o6([Additional ASCAT Output]):::output
    o7([Genes Annotated with CNV]):::output
    o8([Annotated CNV Segments]):::output

    p0 -->|IDAT Files\nRed/Green| p1

    subgraph " "
    p1 --> p2
    p2 --> o1
    p2 --> p3
    p3 --> o2
    p3 --> o3
    o2 --> p4
    o3 --> p4
    p4 --> o4
    p4 --> o5
    p4 --> o6
    o4 --> p5
    o5 --> p5
    p5 --> o7
    p5 --> o8
    end

classDef output fill:#90aaff,stroke:#6c8eff,stroke-width:2px,color:#000000
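
The process ordering in the flowchart can also be written down as a small dependency map. The sketch below transcribes only the process-to-process edges (output nodes are omitted) and uses Python's standard `graphlib` module to derive an execution order. The names `DEPENDS_ON` and `run_order` are illustrative and not part of the pipeline.

```python
from graphlib import TopologicalSorter

# Each process maps to the set of processes whose results it consumes,
# transcribed from the flowchart edges (output nodes omitted).
DEPENDS_ON = {
    "IAAP_CLI": set(),
    "BCFTOOLS_GTC2VCF": {"IAAP_CLI"},
    "BCFTOOLS_QUERY_ASCAT": {"BCFTOOLS_GTC2VCF"},
    "ASCAT": {"BCFTOOLS_QUERY_ASCAT"},
    "ASCAT_ANNOTATION": {"ASCAT"},
}


def run_order(depends_on):
    """Return the processes in an order that respects every dependency."""
    return list(TopologicalSorter(depends_on).static_order())
```

Because the five processes form a single chain, there is only one valid order, and it matches Steps 2 through 6.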

Parameters for the Workflow

--pubdir
- Default: /<PATH>
- Comment: The directory in which the saved outputs will be stored.
-w
- Default: /<PATH>
- Comment: The directory used for all intermediary files and Nextflow processes. This directory can become quite large, so it should be a location on /fastscratch or another directory with ample storage.
--bpm
- Default: /<PATH>
- Comment: The path to the BPM file.
--egt
- Default: /<PATH>
- Comment: The path to the EGT file.
--gtc_csv
- Default: /<PATH>
- Comment: The path to the GTC CSV file.
--gtc_output
- Default: /<PATH>
- Comment: The directory of GTC files output from the previous step.
--ref_fa
- Default: /<PATH>
- Comment: The path to the reference FASTA file.
--BAF
- Default: /<PATH>
- Comment: The BAF file output from the BCFTOOLS_QUERY_ASCAT module.
--LRR
- Default: /<PATH>
- Comment: The LRR file output from the BCFTOOLS_QUERY_ASCAT module.
--segments_raw
- Default: *segments_raw.txt
- Comment: The raw segments file.
--ploidy
- Default: *ploidy.txt
- Comment: The ploidy file.
--chromosome_arm
- Default: /<PATH>
- Comment: The path to the chromosome arm file.
--cnv_gene_annotation
- Default: /<PATH>
- Comment: The path to the CNV gene annotation file.
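
As a rough illustration of how these parameters fit together on a command line, the Python sketch below turns a mapping of parameter names to values into an argument list. The helper `build_args` is hypothetical. The launch command and script name are not given in this section, so only the arguments are assembled, and the `/<PATH>` values are the same placeholders used above.

```python
def build_args(params, work_dir=None):
    """Turn a {name: value} mapping into command-line arguments.

    Pipeline parameters take a double dash (--pubdir); the work
    directory is passed with the single-dash -w option.
    """
    args = ["--workflow", "cnv"]
    for name, value in params.items():
        args += [f"--{name}", str(value)]
    if work_dir is not None:
        args += ["-w", str(work_dir)]
    return args


# Placeholder values only; substitute real paths before use.
example = build_args(
    {"pubdir": "/<PATH>", "bpm": "/<PATH>", "egt": "/<PATH>"},
    work_dir="/<PATH>",
)
```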

Pipeline Default Outputs

NOTE: * is a wildcard that stands in for the sample name or ID filled in when the pipeline is run.

Naming Convention	Description
`*_convert.vcf`	VCF file containing B allele frequency and LogR ratios for each SNP in the array
`*_convert_info.tsv`	TSV file containing metadata and auxiliary information extracted from the IDAT files
`*_convert.BAF`	B allele frequency (BAF) values: the estimated proportion of signal supporting the B allele at each variant site
`*_convert.LRR`	Log R ratio (LRR) values: the log ratio of observed to expected signal intensity at each variant site
`*_sample.QC.txt`	Quality control metrics for each sample
`*.png`	PNG image files generated by the ASCAT process
`*_ASCAT_objects.Rdata`	R objects from the ASCAT analysis, containing ASCAT data and quality control metrics
`*.segments_raw.extend.txt`	Raw segmented data, including start and end positions of the chromosomes and the number of probes in each segment
`*.ploidy.txt`	Estimated sample ploidy, as calculated by ASCAT
`*.ensgene_cnvbreak.txt`	Ensembl gene information annotated with CNV breakpoint information

Pipeline Optional Outputs

Naming Convention	Description
`*.gtc`	Genotype call files generated by the IAAP_CLI process
`iaap_cli.log`	Log file capturing the execution details of the IAAP_CLI process
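
The naming conventions in the tables above are glob-style patterns, so a file found in the output directory can be matched back to its description. The Python sketch below shows one way to do that with the standard `fnmatch` module. The `OUTPUT_PATTERNS` list and `describe_output` helper are illustrative, and the descriptions are shortened from the tables.

```python
from fnmatch import fnmatchcase

# (pattern, short description) pairs copied from the output tables.
OUTPUT_PATTERNS = [
    ("*_convert.vcf", "VCF with BAF and LRR per SNP"),
    ("*_convert_info.tsv", "Metadata extracted from the IDAT files"),
    ("*_convert.BAF", "B allele frequency values"),
    ("*_convert.LRR", "Log R ratio values"),
    ("*_sample.QC.txt", "Per-sample quality control metrics"),
    ("*.png", "ASCAT plots"),
    ("*_ASCAT_objects.Rdata", "R objects from the ASCAT analysis"),
    ("*.segments_raw.extend.txt", "Raw segmented data"),
    ("*.ploidy.txt", "Estimated sample ploidy"),
    ("*.ensgene_cnvbreak.txt", "Genes annotated with CNV breakpoints"),
    ("*.gtc", "Genotype call file from IAAP_CLI"),
    ("iaap_cli.log", "IAAP_CLI log"),
]


def describe_output(filename):
    """Return the description of the first matching pattern, or None."""
    for pattern, description in OUTPUT_PATTERNS:
        if fnmatchcase(filename, pattern):  # case-sensitive on every platform
            return description
    return None
```

For example, `describe_output("Sample_42.ploidy.txt")` resolves to the ploidy entry, while a name that matches no pattern returns `None`.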

CSV Input Sample Sheet

The required input header is: sampleID,gender,idat_red,idat_green.

The sampleID column is a unique identifier for each individual sample, which is associated with other samples based on status and patient ID.
The gender column contains gender information for the sample. Accepted values are 'XX', 'XY' or '' (unknown).
The idat_red and idat_green columns must contain absolute paths to the red and green IDAT files output from an Illumina array.

Basic examples:

An example of the csv file:

sampleID,gender,idat_red,idat_green
Sample_42,XY,206967180008_R01C01_Red.idat,206967180008_R01C01_Grn.idat
Sample_101,XY,206967180008_R02C02_Red.idat,206967180008_R02C02_Grn.idat
Sample_10191,,206967180180_R02C02_Red.idat,206967180180_R02C02_Grn.idat
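
The checks described for the sample sheet can be sketched in Python as follows. This is an illustration of the rules stated above, not the pipeline's own parser. The function name `validate_sample_sheet` and the `require_absolute` switch are assumptions, and the header is assumed to match the required one exactly, including column order.

```python
import csv
import io
import os

REQUIRED_HEADER = ["sampleID", "gender", "idat_red", "idat_green"]
VALID_GENDERS = {"XX", "XY", ""}  # empty string means unknown


def validate_sample_sheet(text, require_absolute=False):
    """Return a list of problems found; an empty list means the sheet passed."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != REQUIRED_HEADER:
        return [f"unexpected header: {reader.fieldnames}"]
    problems, seen = [], set()
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        sample = row["sampleID"]
        if not sample:
            problems.append(f"row {row_number}: missing sampleID")
        elif sample in seen:
            problems.append(f"row {row_number}: duplicate sampleID {sample}")
        seen.add(sample)
        if row["gender"] not in VALID_GENDERS:
            problems.append(f"row {row_number}: invalid gender {row['gender']!r}")
        for column in ("idat_red", "idat_green"):
            path = row[column]
            if not path:
                problems.append(f"row {row_number}: missing {column}")
            elif require_absolute and not os.path.isabs(path):
                problems.append(f"row {row_number}: {column} is not an absolute path")
    return problems
```

The absolute-path check is optional here because the example sheet lists bare file names. Pass `require_absolute=True` to enforce the rule stated for the idat_red and idat_green columns.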
