CNV Array Pipeline ReadMe
• Step 1: Takes in a CSV file and parses it to verify the sampleID, idat_red, and idat_green fields and to check for valid gender values.
• Step 2: IAAP CLI converts the IDAT files to GTC format.
• Step 3: Takes in the GTC files and processes them through a series of BCFtools commands to produce a sorted, normalized, and indexed VCF file.
• Step 4: Extracts BAF and LRR values from a BCF file, formats them, and writes them to separate files, 'bcftools_convert.BAF' and 'bcftools_convert.LRR'.
• Step 5: Uses the ASCAT package to analyze the BAF and LRR data and identify CNVs.
• Step 6: Annotates CNV segments with gene information and produces visualizations.
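Step 4 splits per-SNP values into a BAF file and an LRR file. The Python sketch below illustrates that split; it is not the pipeline's code. It assumes the values have already been pulled out of the BCF (for example with bcftools query) as (SNP name, chromosome, position, BAF, LRR) records, and it assumes a tab-delimited layout of the kind ASCAT reads: a leading SNP-name column, then chrs, pos, and one column per sample. SnpRecord and write_ascat_inputs are hypothetical names, and missing values (written as "." by bcftools) are not handled here.

```python
import csv
from typing import Iterable, NamedTuple


class SnpRecord(NamedTuple):
    """One probe's values, e.g. one parsed row of `bcftools query` output (assumed)."""
    name: str
    chrom: str
    pos: int
    baf: float
    lrr: float


def write_ascat_inputs(records: Iterable[SnpRecord], sample_id: str,
                       baf_path: str, lrr_path: str) -> int:
    """Write BAF and LRR into two parallel tab-delimited tables.

    Hypothetical helper. Assumed layout: a header of
    '<empty>, chrs, pos, <sample_id>', then one row per SNP in input order.
    Returns the number of SNP rows written to each file.
    """
    header = ["", "chrs", "pos", sample_id]
    written = 0
    with open(baf_path, "w", newline="") as baf_fh, \
         open(lrr_path, "w", newline="") as lrr_fh:
        baf_out = csv.writer(baf_fh, delimiter="\t", lineterminator="\n")
        lrr_out = csv.writer(lrr_fh, delimiter="\t", lineterminator="\n")
        baf_out.writerow(header)
        lrr_out.writerow(header)
        for rec in records:
            # Same row order in both files, so row i describes the same SNP.
            baf_out.writerow([rec.name, rec.chrom, rec.pos, rec.baf])
            lrr_out.writerow([rec.name, rec.chrom, rec.pos, rec.lrr])
            written += 1
    return written
```

Writing both files in a single pass keeps the SNP order identical in the BAF and LRR tables, which matters because downstream tools pair the two row by row.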
flowchart TB
p0((Sample))
p1[IAAP_CLI]
p2[BCFTOOLS_GTC2VCF]
p3[BCFTOOLS_QUERY_ASCAT]
p4[ASCAT]
p5[ASCAT_ANNOTATION]
o1([VCF with BAF/LRR]):::output
o2([BAF File]):::output
o3([LRR File]):::output
o4([Raw CNV segments]):::output
o5([Sample ploidy]):::output
o6([Additional ASCAT Output]):::output
o7([Genes Annotated with CNV]):::output
o8([Annotated CNV Segments]):::output
p0 -->|IDAT Files\nRed/Green| p1
subgraph " "
p1 --> p2
p2 --> o1
p2 --> p3
p3 --> o2
p3 --> o3
o2 --> p4
o3 --> p4
p4 --> o4
p4 --> o5
p4 --> o6
o4 --> p5
o5 --> p5
p5 --> o7
p5 --> o8
end
classDef output fill:#90aaff,stroke:#6c8eff,stroke-width:2px,color:#000000
Parameter | Default | Description |
---|---|---|
--pubdir | /<PATH> | The directory where the saved outputs will be stored. |
-w | /<PATH> | The directory used by all intermediate files and Nextflow processes. This directory can become quite large, so it should be on /fastscratch or another location with ample storage. |
--bpm | /<PATH> | The path to the BPM (array manifest) file. |
--egt | /<PATH> | The path to the EGT (cluster) file. |
--gtc_csv | /<PATH> | The path to the GTC CSV file. |
--gtc_output | /<PATH> | The directory containing the GTC files output by the IAAP_CLI process. |
--ref_fa | /<PATH> | The path to the reference FASTA file. |
--BAF | /<PATH> | The BAF file output by the BCFTOOLS_QUERY_ASCAT module. |
--LRR | /<PATH> | The LRR file output by the BCFTOOLS_QUERY_ASCAT module. |
--segments_raw | *segments_raw.txt | The raw segments file. |
--ploidy | *ploidy.txt | The ploidy file. |
--chromosome_arm | /<PATH> | The path to the chromosome arm file. |
--cnv_gene_annotation | /<PATH> | The path to the CNV gene annotation file. |
NOTE: * represents a wildcard placeholder that is filled in with sample names/IDs when the pipeline is run.
Naming Convention | Description |
---|---|
*_convert.vcf | VCF file containing the B allele frequency (BAF) and log R ratio (LRR) for each SNP on the array |
*_convert_info.tsv | TSV file with additional information extracted from the IDAT files, including metadata and auxiliary information |
*_convert.BAF | B allele frequency for each SNP: the estimated proportion of the B allele, derived from normalized allele signal intensities |
*_convert.LRR | Log R ratio for each SNP: the log2 ratio of the observed to the expected total signal intensity |
*_sample.QC.txt | Quality control metrics for each sample |
*.png | PNG image files generated by the ASCAT process |
*_ASCAT_objects.Rdata | R objects from the ASCAT analysis containing ASCAT data and quality control metrics |
*.segments_raw.extend.txt | Raw segmented data, including the chromosome, start and end positions, and number of probes for each segment |
*.ploidy.txt | Sample ploidy estimated by ASCAT |
*.ensgene_cnvbreak.txt | Ensembl gene information annotated with CNV breakpoint information |
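BAF and LRR on an array are computed from normalized signal intensities, not from sequencing reads. As a rough illustration, the sketch below implements the widely used Illumina-style definitions: LRR as log2 of observed over expected total intensity R, and BAF as a piecewise-linear interpolation of a SNP's allelic angle theta between the AA, AB, and BB cluster positions (in practice taken from the EGT cluster file). The function names are hypothetical; the pipeline's own values come from its IAAP CLI and BCFtools steps, whose exact computation may differ in detail.

```python
import math


def log_r_ratio(r_observed: float, r_expected: float) -> float:
    """LRR = log2(R_observed / R_expected), where R is total signal intensity."""
    return math.log2(r_observed / r_expected)


def b_allele_freq(theta: float, theta_aa: float, theta_ab: float,
                  theta_bb: float) -> float:
    """Piecewise-linear BAF from a SNP's allelic angle theta.

    Assumes theta_aa < theta_ab < theta_bb are the canonical cluster
    positions for the AA, AB and BB genotypes.
    """
    if theta <= theta_aa:
        return 0.0  # at or beyond the AA cluster: no B allele
    if theta >= theta_bb:
        return 1.0  # at or beyond the BB cluster: only B allele
    if theta < theta_ab:
        # Between AA and AB: scale linearly from 0 to 0.5.
        return 0.5 * (theta - theta_aa) / (theta_ab - theta_aa)
    # Between AB and BB: scale linearly from 0.5 to 1.
    return 0.5 + 0.5 * (theta - theta_ab) / (theta_bb - theta_ab)
```

For a heterozygous SNP in a normal diploid region, theta lies near the AB cluster, so BAF is close to 0.5, and R is close to its expected value, so LRR is close to 0; copy-number changes shift one or both of these values.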
Naming Convention | Description |
---|---|
*.gtc | Genotype call files generated by the IAAP_CLI process |
iaap_cli.log | Log file capturing the execution details of the IAAP_CLI process |
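Because each per-sample output name is the sample name followed by a fixed suffix, a listing of the --pubdir directory can be grouped by sample with plain suffix matching. This is a hypothetical sketch (PATTERNS, match_output, and group_by_sample are illustrative names, not part of the pipeline). It assumes the files sit directly in one directory, and it leaves out *.png and *.gtc, since their names are not defined here purely as sample name plus suffix.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Per-sample naming conventions from the output table; each is "*" plus a
# fixed suffix, where "*" is filled in with the sample name/ID at run time.
PATTERNS = [
    "*_convert.vcf", "*_convert_info.tsv", "*_convert.BAF", "*_convert.LRR",
    "*_sample.QC.txt", "*_ASCAT_objects.Rdata", "*.segments_raw.extend.txt",
    "*.ploidy.txt", "*.ensgene_cnvbreak.txt",
]


def match_output(filename: str) -> Optional[Tuple[str, str]]:
    """Return (sample_id, pattern) if filename fits a naming convention, else None."""
    for pattern in PATTERNS:
        suffix = pattern[1:]  # drop the leading "*"
        if filename.endswith(suffix) and len(filename) > len(suffix):
            return filename[: -len(suffix)], pattern
    return None


def group_by_sample(filenames: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Map sample ID -> {pattern: filename} for every recognised output file."""
    grouped: Dict[str, Dict[str, str]] = defaultdict(dict)
    for name in filenames:
        hit = match_output(name)
        if hit is not None:
            sample_id, pattern = hit
            grouped[sample_id][pattern] = name
    return dict(grouped)
```

For example, group_by_sample(os.listdir(pubdir)), with pubdir set to the --pubdir path, would return one dictionary per sample ID; files such as iaap_cli.log that match no pattern are ignored.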
The required input header is: sampleID,gender,idat_red,idat_green
- The sampleID column is a unique identifier for each individual sample.
- The gender column contains gender information for the sample. Accepted values are 'XX', 'XY', or '' (unknown).
- The idat_red and idat_green columns must contain absolute paths to the red and green IDAT files output from an Illumina array.
sampleID,gender,idat_red,idat_green
Sample_42,XY,206967180008_R01C01_Red.idat,206967180008_R01C01_Grn.idat
Sample_101,XY,206967180008_R02C02_Red.idat,206967180008_R02C02_Grn.idat
Sample_10191,,206967180180_R02C02_Red.idat,206967180180_R02C02_Grn.idat
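The Step 1 checks on this sample sheet can be approximated with Python's csv module. This is a sketch, not the pipeline's own validation code: validate_sample_sheet is a hypothetical name, and it checks only the header, unique non-empty sampleID values, the accepted gender values, and non-empty IDAT columns. It does not test whether the IDAT paths are absolute or exist.

```python
import csv
import io
from typing import List, TextIO

REQUIRED_COLUMNS = ["sampleID", "gender", "idat_red", "idat_green"]
VALID_GENDERS = {"XX", "XY", ""}  # '' means unknown


def validate_sample_sheet(handle: TextIO) -> List[str]:
    """Return a list of problems found in a sample sheet (empty if it looks valid)."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]
    errors: List[str] = []
    seen = set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        # Short rows leave missing fields as None, hence the `or ""`.
        sample = (row["sampleID"] or "").strip()
        if not sample:
            errors.append(f"line {line_no}: empty sampleID")
        elif sample in seen:
            errors.append(f"line {line_no}: duplicate sampleID {sample!r}")
        seen.add(sample)
        if (row["gender"] or "").strip() not in VALID_GENDERS:
            errors.append(f"line {line_no}: invalid gender {row['gender']!r}")
        for col in ("idat_red", "idat_green"):
            if not (row[col] or "").strip():
                errors.append(f"line {line_no}: empty {col}")
    return errors


if __name__ == "__main__":
    example = io.StringIO(
        "sampleID,gender,idat_red,idat_green\n"
        "Sample_42,XY,206967180008_R01C01_Red.idat,206967180008_R01C01_Grn.idat\n"
        "Sample_10191,,206967180180_R02C02_Red.idat,206967180180_R02C02_Grn.idat\n"
        "Sample_7,M,x_Red.idat,x_Grn.idat\n"
    )
    print(validate_sample_sheet(example))  # prints ["line 4: invalid gender 'M'"]
```

Collecting every problem instead of stopping at the first one lets a user fix the whole sheet in one pass; to check a real file, open it with newline="" and pass the handle in.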