
PACT: A pipeline for analysis of circulating tumor DNA

Developed by the Christopher Maher Lab at Washington University in St. Louis in collaboration with the labs of Dr. Aadel Chaudhuri and Dr. Russell Pachynski.

Overview

PACT provides standardized workflows for sensitive and reproducible detection of both small and large genomic alterations from targeted ctDNA sequencing, shared as a Common Workflow Language (CWL) pipeline.

For additional details and benchmarking, see: Jace Webster, Ha X Dang, Pradeep S Chauhan, Wenjia Feng, Alex Shiang, Peter K Harris, Russell K Pachynski, Aadel A Chaudhuri, Christopher A Maher. 2023. PACT: A pipeline for analysis of circulating tumor DNA. Bioinformatics. 39(8). doi:10.1093/bioinformatics/btad489

Quick Start

Download the repository with git clone https://github.com/ChrisMaherLab/PACT.git

Several tools can run CWL pipelines. In our benchmarking analysis, all pipelines were run with the Cromwell CWL interpreter (v54), which is available from the Cromwell releases page on GitHub. For additional information about using Cromwell, see the Cromwell user guide and configuration tutorials.

PACT is designed for high-performance computing (HPC) environments. Because HPC systems are highly customizable and vary between institutions, a comprehensive guide to configuring different CWL interpreters for specific systems is not possible here. We highly recommend reviewing the above documentation (or the documentation for your preferred CWL interpreter) to ensure correct integration with your HPC environment.

After installation and configuration of Cromwell (if that is your preferred interpreter), the pipeline(s) can be run using:

java -Dconfig.file=<config.file> -jar <cromwell.jar> run -t cwl -i <input_yaml> pipelines/<pipeline>.cwl
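
When several pipelines are launched from a script, the command above can be assembled programmatically. The following is a minimal sketch and not part of PACT: the function name `build_cromwell_command` and every file path are hypothetical placeholders, and only the flags shown in the command above are used.

```python
import shlex
from pathlib import Path


def build_cromwell_command(config_file, cromwell_jar, input_yaml, pipeline_cwl):
    """Return the Cromwell invocation from this README as an argument list."""
    return [
        "java",
        f"-Dconfig.file={config_file}",
        "-jar", str(cromwell_jar),
        "run",
        "-t", "cwl",
        "-i", str(input_yaml),
        str(pipeline_cwl),
    ]


if __name__ == "__main__":
    # All paths below are hypothetical placeholders; substitute your own.
    cmd = build_cromwell_command(
        config_file="cromwell.conf",
        cromwell_jar="cromwell-54.jar",
        input_yaml="example_ymls/sv_example.yml",
        pipeline_cwl=Path("pipelines") / "<pipeline>.cwl",
    )
    # Print the command for review before running it.
    print(shlex.join(cmd))
    # To launch the run, pass the list to subprocess.run(cmd, check=True).
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.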

For additional information about writing, reading and using CWL files, see the official CWL user guide.

To help verify installation and setup, example files (sample bam, matched control bam, healthy bam, targeted regions bed, blacklist bed, low complexity regions bed) are provided in the example_data folder. Because of their size, git lfs may be required to download them. Running these files through the SV pipeline also requires the hg19 reference genome and annotation (see the installation instructions at the bottom of this page). If the run is successful, the output of the SV pipeline should be consistent with example_data/example.out.bedpe, which describes a single translocation between chromosomes 10 and 13. The example_ymls/sv_example.yml file can be used to run this analysis, but the file paths in it must be updated to reflect your PACT installation and the locations of your genome reference and annotation data.
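
One way to check an SV pipeline result against the expected example output is to compare the breakpoint coordinates in the two BEDPE files. The sketch below is an illustration only, not a PACT utility: it assumes tab-separated BEDPE with the standard first six columns (chrom1, start1, end1, chrom2, start2, end2), assumes that any header or comment line starts with `#`, and uses hypothetical function names. Exact coordinate equality may be stricter than the "consistent with" criterion described above.

```python
def read_bedpe_breakpoints(text):
    """Parse BEDPE text into a set of (chrom1, start1, end1, chrom2, start2, end2)."""
    records = set()
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and '#' comment/header lines
        fields = line.split("\t")
        if len(fields) < 6:
            raise ValueError(f"expected at least 6 tab-separated columns: {line!r}")
        chrom1, start1, end1, chrom2, start2, end2 = fields[:6]
        records.add((chrom1, int(start1), int(end1), chrom2, int(start2), int(end2)))
    return records


def same_breakpoints(observed_text, expected_text):
    """True if both BEDPE texts describe exactly the same breakpoint pairs."""
    return read_bedpe_breakpoints(observed_text) == read_bedpe_breakpoints(expected_text)
```

In practice the two texts would be read from your pipeline output and from example_data/example.out.bedpe, for example with `pathlib.Path(...).read_text()`.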

Structure

This repository is organized as follows:

| Directory | Description |
| --- | --- |
| pipelines | Full workflows, which rely on subworkflows and tools |
| subworkflows | Workflows called by pipelines that combine tools to form intermediate files |
| tools | Individual steps in the workflow containing single commands or scripts |
| example_ymls | Example format for input yml files using minimal inputs |
| example_data | Example input and output data for setup and testing purposes |

Inputs

The provided workflows accept a variety of required and optional input files. Example input yaml files are provided in the example_ymls directory; they contain all required inputs and a brief description of the expected values. Further inputs for customizing the pipeline(s) are listed in the inputs section of the corresponding CWL file in the pipelines directory.

Common/required inputs are described below, including how to label the information in an input yaml file, the workflows the file is used in, and a brief description.

Reference Genome Inputs

| Input label | Applicable workflow(s) | Description |
| --- | --- | --- |
| reference | All workflows (required) | Reference genome fasta file. A .fai index file made using samtools faidx and a .dict file made using Picard's CreateSequenceDictionary command should be present in the same directory. |
| ref_genome | SV and CNA workflows (required) | Name of the reference genome used. Should match the name used by any applicable annotation databases (e.g. hg19). |
| ref_flat | CNA workflow (required) | Genome annotation file in refFlat format. |
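
Before starting a run, it can help to confirm that the index files described for the `reference` input exist. The sketch below is an assumption-laden illustration, not part of PACT: the function name is hypothetical, the `.fai` file is expected at `<fasta>.fai` (the location samtools faidx writes to), and the `.dict` file is accepted under either of two common naming patterns because the exact name depends on how CreateSequenceDictionary was invoked.

```python
from pathlib import Path


def missing_reference_files(fasta_path):
    """Return descriptions of reference files that could not be found."""
    fasta = Path(fasta_path)
    missing = []
    if not fasta.is_file():
        missing.append(str(fasta))
    # samtools faidx writes the index next to the fasta as <fasta>.fai
    fai = Path(str(fasta) + ".fai")
    if not fai.is_file():
        missing.append(str(fai))
    # Assumption: accept either ref.dict or ref.fa.dict for a fasta named ref.fa
    dict_candidates = [fasta.with_suffix(".dict"), Path(str(fasta) + ".dict")]
    if not any(candidate.is_file() for candidate in dict_candidates):
        missing.append(" or ".join(str(candidate) for candidate in dict_candidates))
    return missing
```

An empty list means all three files were found; otherwise each entry names a file that still needs to be created.
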

Annotation Information

| Input label | Applicable workflow(s) | Description |
| --- | --- | --- |
| snpEff_data | SV workflow (required) | snpEff annotation database directory. This can be downloaded using snpEff's download command: java -jar snpEff.jar download <database>. |
| vep_cache_dir | SNV workflow (required) | VEP annotation cache directory. See the Ensembl website (https://useast.ensembl.org/info/docs/tools/vep/script/vep_cache.html) for information about downloading the cache. |
| vep_ensembl_assembly | SNV workflow (required) | A string containing the name of the genome assembly associated with the provided VEP cache (e.g. GRCh37). |
| vep_ensembl_version | SNV workflow (required) | A string containing the version number of the provided cache (e.g. 106). |
| all_genes | CNA workflow (required) | Bed file of all annotated genes. The first three columns are standard bed format, the 4th column is the gene name, the 5th column is a score value (an arbitrary number that is not used), and the 6th column is the strand (+/-). No header is expected. |
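
The `all_genes` column layout described above can be checked with a few lines of code before a CNA run. This is a minimal sketch under stated assumptions, not a PACT tool: the function name is hypothetical, and the file is assumed to be tab-separated.

```python
def check_all_genes_bed(text):
    """Return a list of problems found in a 6-column all_genes bed file."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) < 6:
            problems.append(f"line {number}: expected 6 columns, found {len(fields)}")
            continue
        chrom, start, end, gene, score, strand = fields[:6]
        if not (start.isdigit() and end.isdigit()):
            # A header line typically fails here; no header is expected.
            problems.append(f"line {number}: start/end are not integers")
        elif int(start) > int(end):
            problems.append(f"line {number}: start is greater than end")
        if not gene:
            problems.append(f"line {number}: gene name (column 4) is empty")
        if strand not in ("+", "-"):
            problems.append(f"line {number}: strand (column 6) must be + or -")
    return problems
```

The score column is read but not validated, since the description above states that its value is not used.
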

Region and Variant Information

| Input label | Applicable workflow(s) | Description |
| --- | --- | --- |
| target_regions | All workflows (required) | A bed file containing the genomic regions covered by the targeted panel used for sequencing. |
| neither_region | SV workflow (required) | A bed file. All SVs that contain a breakpoint within these regions will be discarded. We recommend the blacklist regions provided by 10xgenomics. Their hg19 bed file can be found here: http://cf.10xgenomics.com/supp/genome/hg19/sv_blacklist.bed. |
| notboth_region | SV workflow (required) | A bed file. SVs with >1 breakpoint within these regions will be discarded. We recommend Heng Li's low complexity regions, found here: https://github.com/lh3/varcmp/raw/master/scripts |
| sv_whitelist | SV workflow (optional) | A bed file containing regions that include expected SV breakpoint sites. The read support requirement is reduced for SVs from these regions, which allows the user to manually review variants of interest. |
| whitelist_vcf | SNV workflow (required) | A VCF file and accompanying .tbi index file (created using the tabix -p vcf command). The VCF lists any whitelisted SNVs/indels. The VCF may be empty (but still properly formatted) if desired. |
| target_genes | CNA workflow (required) | Bed file describing all genes targeted by the target panel. The first three columns are standard bed format, the 4th column is the gene name, and the 5th column is a description. Copy number control genes should be labeled 'CN-control' in the description; all others can use any desired description. |
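
The `whitelist_vcf` entry above allows an empty but properly formatted VCF. The sketch below writes one with only the `##fileformat` line and the eight mandatory column names. It is an illustration under assumptions: the function name and output path are hypothetical, and whether PACT needs further header lines (for example contig definitions) has not been verified here.

```python
# Minimal VCF header: the fileformat line plus the eight fixed columns.
EMPTY_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
)


def write_empty_vcf(path):
    """Write a VCF that contains a header but no variant records."""
    with open(path, "w", newline="\n") as handle:
        handle.write(EMPTY_VCF)
    return path


if __name__ == "__main__":
    # Hypothetical output name; choose your own.
    write_empty_vcf("empty_whitelist.vcf")
    # Afterwards, compress and index outside Python:
    #   bgzip empty_whitelist.vcf
    #   tabix -p vcf empty_whitelist.vcf.gz
```

tabix only indexes bgzip-compressed files, so the compression step comes before indexing.
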

Samples and Controls

| Input label | Applicable workflow(s) | Description |
| --- | --- | --- |
| sample_bams | All workflows (required) | An array of bam files that contain reads generated from targeted sequencing of cfDNA. Arrays can be provided in the input .yaml file as described by the CWL user guide or as shown in our example input .yaml files. |
| matched_control_bams | All workflows (required) | An array of matched control bam files. The array should be in the same order as the sample_bams array (i.e. the nth entry in both arrays should correspond to the nth patient). |
| panel_of_normal_bams | All workflows (required) | An array of bam files containing reads from healthy, normal samples sequenced using the same targeted panel used on the samples/matched controls. If such a panel is unavailable, it can instead be composed of any available matched control samples. |
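
Because `sample_bams` and `matched_control_bams` must be in the same patient order, one option is to generate both arrays from a single list of pairs. The following sketch is not part of PACT: the function names and bam paths are hypothetical, and the output uses the general CWL convention for an array of File inputs (`class: File` with a `path`). Check it against the files in example_ymls before relying on it.

```python
def bam_array_yaml(label, bam_paths):
    """Format a list of bam paths as a CWL File array in yaml."""
    lines = [f"{label}:"]
    for path in bam_paths:
        lines.append("  - class: File")
        lines.append(f"    path: {path}")
    return "\n".join(lines)


def paired_inputs_yaml(pairs):
    """pairs: list of (sample_bam, matched_control_bam), one tuple per patient."""
    samples = [sample for sample, _ in pairs]
    controls = [control for _, control in pairs]
    # Both arrays come from the same list, so the nth entries always match.
    return (
        bam_array_yaml("sample_bams", samples)
        + "\n"
        + bam_array_yaml("matched_control_bams", controls)
    )


if __name__ == "__main__":
    # Hypothetical file names for two patients.
    print(paired_inputs_yaml([
        ("/data/patient1.cfDNA.bam", "/data/patient1.control.bam"),
        ("/data/patient2.cfDNA.bam", "/data/patient2.control.bam"),
    ]))
```

Paths containing characters that are special in yaml (such as `:` followed by a space, or `#`) would need quoting, which this sketch does not handle.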
