
Ancestry Pipeline ReadMe

MikeWLloyd edited this page Jun 5, 2024 · 3 revisions

Genetic Ancestry Estimation Documentation

Ancestry Pipeline (--workflow ancestry)

•	Step 1: BAM Index  
•	Step 2: SNP Region Pileup  
•	Step 3: SNP Calling  
•	Step 4: SNP Filtering  
•	Step 5: SNP Annotation  
•	Step 6: VCF to Eigenstrat format  
•	Step 7: SNPweights Infer Ancestry    
flowchart TD
    p1((Sample\nAlignment File))
    p2[SAMTOOLS_INDEX]
    p3[BCFTOOLS_MPILEUP]
    p4[BCFTOOLS_CALL]
    p5[BCFTOOLS_FILTER]
    p6[BCFTOOLS_ANNOTATE]
    p7[VCF2EIGENSTRAT]
    p8[SNPWEIGHTS_INFERANC]

    o1([Genetic Ancestry Estimation]):::output

    p1 --> p2
    p2 --> p3
    p3 --> p4
    p4 --> p5
    p5 --> p6
    p6 --> p7
    p7 --> p8
    p8 --> o1

    classDef output fill:#90aaff,stroke:#6c8eff,stroke-width:2px,color:#000000
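The seven listed steps correspond one-to-one, in order, to the process nodes in the flowchart. A minimal sketch that writes the pairing out explicitly is given here; the step descriptions and process names are copied from this page, while the `STEPS` list and `describe_chain` helper are illustrative names and not part of the pipeline.

```python
# Step descriptions and process names as they appear in the step list and flowchart.
STEPS = [
    ("BAM Index", "SAMTOOLS_INDEX"),
    ("SNP Region Pileup", "BCFTOOLS_MPILEUP"),
    ("SNP Calling", "BCFTOOLS_CALL"),
    ("SNP Filtering", "BCFTOOLS_FILTER"),
    ("SNP Annotation", "BCFTOOLS_ANNOTATE"),
    ("VCF to Eigenstrat format", "VCF2EIGENSTRAT"),
    ("SNPweights Infer Ancestry", "SNPWEIGHTS_INFERANC"),
]


def describe_chain(steps=STEPS):
    """Return one line per step, in execution order (hypothetical helper)."""
    return [
        f"Step {number}: {description} -> {process}"
        for number, (description, process) in enumerate(steps, start=1)
    ]


if __name__ == "__main__":
    print("\n".join(describe_chain()))
```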

Parameters for Ancestry Pipeline

  • --pubdir

    • Default: /<PATH>
    • Comment: The directory where saved outputs will be stored.
  • --organize_by

    • Default: sample
    • Comment: How to organize the output folder structure. Options: sample or analysis.
  • -w

    • Default: /<PATH>
    • Comment: The directory used for all intermediary files and Nextflow processes. This directory can become quite large, so it should be located on /fastscratch or another directory with ample storage.
  • --sample_folder

    • Default: /<PATH>
    • Comment: The path to the folder that contains all the samples to be run by the pipeline. The files in this path can also be symbolic links.
  • --csv_input

    • Default: null
    • Comment: Provide a CSV manifest file with the header: "sampleID,bam". See below for an example file.
    • --download_data can be specified.
  • --gen_org

    • Default: human
    • Comment: Options: human.
  • --ref_fa

    • Default: '/projects/omics_share/human/GRCh38/genome/sequence/gatk/Homo_sapiens_assembly38.fasta'
    • Comment: The reference fasta to be used throughout the process for alignment as well as any downstream analysis. JAX users should not change this parameter.
  • --genotype_targets

    • Default: '/projects/compsci/omics_share/human/GRCh38/supporting_files/ancestry_panel/snp_panel_v2_targets_annotations.snpwt.bed.gz'
    • Comment: Target SNP bed file for the ancestry panel. Can contain annotation information.
  • --snpID_list

    • Default: '/projects/compsci/omics_share/human/GRCh38/supporting_files/ancestry_panel/snp_panel_v2.list'
    • Comment: List of target SNPs used in the BCFtools filtering step.
  • --snp_annotations

    • Default: '/projects/compsci/omics_share/human/GRCh38/supporting_files/ancestry_panel/snp_panel_v2_targets_annotations.snpwt.bed.gz'
    • Comment: Target SNP bed file with annotations for the ancestry panel.
  • --snpweights_panel

    • Default: '/projects/compsci/omics_share/human/GRCh38/supporting_files/ancestry_panel/ancestry_panel_v2.snpwt'
    • Comment: SNP weights panel in the appropriate format.
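As a rough illustration of how these parameters combine on the command line, the sketch below assembles an argument list for an ancestry run. It only builds and prints the command; nothing is executed. The flags `--workflow ancestry`, `--pubdir`, `-w`, `--organize_by`, `--csv_input`, and `--sample_folder` come from this page. The entry script name `main.nf`, the helper name `build_ancestry_command`, and all example paths are assumptions for illustration, and site-specific options (such as an execution profile) are left out.

```python
import shlex


def build_ancestry_command(pubdir, workdir, csv_input=None, sample_folder=None,
                           organize_by="sample", entry_script="main.nf"):
    """Assemble an argument list for an ancestry run (hypothetical helper).

    entry_script is an assumed placeholder; use the script from your own checkout.
    """
    if organize_by not in ("sample", "analysis"):
        raise ValueError("organize_by must be 'sample' or 'analysis'")
    cmd = [
        "nextflow", "run", entry_script,
        "--workflow", "ancestry",
        "--pubdir", pubdir,
        "-w", workdir,
        "--organize_by", organize_by,
    ]
    # Add whichever input source was supplied.
    if csv_input is not None:
        cmd += ["--csv_input", csv_input]
    if sample_folder is not None:
        cmd += ["--sample_folder", sample_folder]
    return cmd


if __name__ == "__main__":
    # Example paths are placeholders, not real locations.
    command = build_ancestry_command(
        pubdir="/path/to/results",
        workdir="/fastscratch/path/to/work",
        csv_input="/path/to/manifest.csv",
    )
    print(shlex.join(command))
```

The reference and panel parameters (`--ref_fa`, `--genotype_targets`, `--snpID_list`, `--snp_annotations`, `--snpweights_panel`) are omitted from the sketch because they have defaults listed above.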

Pipeline Default Outputs

NOTE: * represents a wildcard, a placeholder for values that are filled in from input file names and/or parameters when the pipeline is run.

| Naming Convention | Description |
| --- | --- |
| ancestry_report.html | Nextflow autogenerated report |
| trace.txt | Nextflow trace of processes |
| *.ancestry.tsv | Genetic ancestry report. See https://www.biorxiv.org/content/10.1101/2022.10.24.513591v1 for details on report and methods |
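After a run, the per-sample reports can be collected from the output directory by matching the `*.ancestry.tsv` naming convention. The sketch below searches recursively, since the folder layout depends on `--organize_by`, and reads each file as plain tab-separated rows without assuming any column names (the report's columns are not described on this page). The helper names `find_ancestry_reports` and `read_rows` are hypothetical.

```python
import csv
from pathlib import Path


def find_ancestry_reports(pubdir):
    """Return sorted paths of all *.ancestry.tsv files under pubdir, recursively."""
    return sorted(Path(pubdir).rglob("*.ancestry.tsv"))


def read_rows(report_path):
    """Read a report as a list of rows; each row is a list of tab-separated fields."""
    with open(report_path, newline="") as handle:
        return list(csv.reader(handle, delimiter="\t"))


if __name__ == "__main__":
    # "/path/to/results" is a placeholder for the value passed to --pubdir.
    for report in find_ancestry_reports("/path/to/results"):
        rows = read_rows(report)
        print(f"{report}: {len(rows)} rows")
```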

CSV Input Sample Sheet

The required input header is: sampleID,bam.

  • The sampleID column is a unique identifier for each individual sample.
  • The bam column must contain the path to the BAM alignment file for that sample.

Basic examples:

An example csv file:

sampleID,bam
Sample_42,/path/to/sample_42_dedup_realigned.bam
Sample_101,/path/to/sample_101_dedup_realigned.bam
Sample_10191,/path/to/sample_10191_dedup_realigned.bam
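A manifest can be checked before launching the pipeline. The sketch below assumes the two-column `sampleID,bam` header given under `--csv_input` and used in the example file; it reports a wrong header, rows that do not have exactly two fields, and repeated sample IDs. The function name `validate_manifest` and the wording of the messages are illustrative, and the pipeline's own validation may differ.

```python
import csv
import io

EXPECTED_HEADER = ["sampleID", "bam"]


def validate_manifest(text):
    """Return a list of problems found in a sampleID,bam manifest (empty if none)."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows or rows[0] != EXPECTED_HEADER:
        return ["header must be " + ",".join(EXPECTED_HEADER)]

    problems = []
    seen_ids = set()
    # Line 1 is the header, so data rows start at line 2.
    for line_number, row in enumerate(rows[1:], start=2):
        if not row:
            continue  # ignore blank lines
        if len(row) != len(EXPECTED_HEADER):
            problems.append(
                f"line {line_number}: expected 2 fields, found {len(row)}"
            )
            continue
        sample_id = row[0]
        if sample_id in seen_ids:
            problems.append(f"line {line_number}: duplicate sampleID {sample_id}")
        seen_ids.add(sample_id)
    return problems


if __name__ == "__main__":
    manifest = (
        "sampleID,bam\n"
        "Sample_42,/path/to/sample_42_dedup_realigned.bam\n"
    )
    print(validate_manifest(manifest) or "manifest looks consistent")
```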