To download the pipeline and all associated files, run:
git clone https://github.com/LUMC/PacBio-variantcalling.git
cd PacBio-variantcalling
git submodule update --init --recursive
Next, create the conda environment that is used to execute the pipeline, and activate it:
conda env create --file environment.yml
conda activate PacBio-variantcalling
You can test the workflow using the following two commands. The first command runs the sanity checks to make sure everything required is installed. The second command will run the integration tests.
pytest --kwd tests --tag sanity
pytest --kwd tests --tag integration
To get a better sense of the pipeline and its inputs, you can manually run the most general test case:
cromwell run \
--options tests/data/config/cromwell.options.json \
--inputs tests/data/config/variant_calling.json \
PacBio-variantcalling.wdl
This will run the pipeline for you using the tests/data/config/variant_calling.json example configuration file. After the pipeline has completed, you can find the full execution folder in cromwell-executions. The workflow outputs have also been copied to the test-output folder in the current directory, as is specified in the tests/data/config/cromwell.options.json options file.
To generate an input configuration file for the PacBio pipeline, please run the following command.
womtool inputs --optional-inputs false PacBio-variantcalling.wdl
{
"VariantCalling.samples": "Array[WomCompositeType {\n name -> String\nbamfiles -> Array[File]+ \n}]+",
"VariantCalling.referenceFileDict": "File",
"VariantCalling.referenceFileIndex": "File",
"VariantCalling.referenceFile": "File",
"VariantCalling.referencePrefix": "String"
}
If you also want to see the optional pipeline inputs, you can leave out the --optional-inputs false argument.
Setting | Type | Required | Description |
---|---|---|---|
VariantCalling.samples | Array | Required | One or more sample structs. |
VariantCalling.referenceFileDict | File | Required | The picard dictionary file for the reference. |
VariantCalling.referenceFileIndex | File | Required | The samtools index file for the reference. |
VariantCalling.referenceFile | File | Required | The fasta reference file. |
VariantCalling.referencePrefix | String | Required | The name of the reference. |
VariantCalling.useDeepVariant | Boolean | Optional | Use DeepVariant instead of GATK4 for variant calling. |
VariantCalling.generateGVCF | Boolean | Optional | Generate g.vcf files for all samples. This is extremely slow when used in combination with VariantCalling.useDeepVariant. |
VariantCalling.targetGenes | File | Optional | Bed file containing the target genes. Used to determine the PGx phasing and Picard HsMetrics. |
VariantCalling.dbsnp | File | Optional | dbSNP file used to annotate the discovered variants. The results are displayed in the MultiQC report. |
VariantCalling.dbsnpIndex | File | Optional | Index for the dbSNP file, required when VariantCalling.dbsnp is specified. |
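
To illustrate how the required settings fit together, here is a minimal Python sketch that assembles an inputs dictionary and prints it as JSON. The key names and the sample struct fields (name, bamfiles) are taken from the womtool output above. The helper name `build_inputs`, the sample name, the reference prefix and all file paths are hypothetical placeholders that you would replace with your own values.

```python
import json


def build_inputs(samples, reference, reference_index, reference_dict, prefix):
    """Assemble the required pipeline inputs (hypothetical helper)."""
    return {
        # One struct per sample: a name plus one or more BAM files.
        "VariantCalling.samples": samples,
        "VariantCalling.referenceFile": reference,
        "VariantCalling.referenceFileIndex": reference_index,
        "VariantCalling.referenceFileDict": reference_dict,
        "VariantCalling.referencePrefix": prefix,
    }


# All values below are placeholders, not files shipped with the pipeline.
inputs = build_inputs(
    samples=[{"name": "sample1", "bamfiles": ["/path/to/sample1.bam"]}],
    reference="/path/to/reference.fasta",
    reference_index="/path/to/reference.fasta.fai",
    reference_dict="/path/to/reference.dict",
    prefix="my_reference",
)

# Redirect this output to a file to use it as the --inputs argument.
print(json.dumps(inputs, indent=2))
```

Optional settings such as VariantCalling.useDeepVariant can be added to the same dictionary as extra keys.
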
If you have created your own configuration file, you can use the following command to make sure all inputs are valid. Replace tests/data/config/variant_calling.json with the path to your own configuration file.
womtool validate --inputs tests/data/config/variant_calling.json PacBio-variantcalling.wdl
Success!
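
In addition to womtool, you may want a quick check of the rules described in the settings table, for example that VariantCalling.dbsnpIndex is present whenever VariantCalling.dbsnp is set. The Python sketch below is one possible way to do that. The function `check_inputs` is a hypothetical helper and not part of the pipeline, and the example values are placeholders.

```python
REQUIRED_KEYS = [
    "VariantCalling.samples",
    "VariantCalling.referenceFileDict",
    "VariantCalling.referenceFileIndex",
    "VariantCalling.referenceFile",
    "VariantCalling.referencePrefix",
]


def check_inputs(config):
    """Return a list of problems found in an inputs dictionary."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in config:
            problems.append(f"missing required input: {key}")

    samples = config.get("VariantCalling.samples", [])
    if "VariantCalling.samples" in config and not samples:
        problems.append("VariantCalling.samples needs at least one sample")
    for position, sample in enumerate(samples):
        # Each sample struct needs a name and at least one BAM file.
        if not sample.get("name"):
            problems.append(f"sample {position} has no name")
        if not sample.get("bamfiles"):
            problems.append(f"sample {position} has no bamfiles")

    # The dbSNP index is required when a dbSNP file is specified.
    if "VariantCalling.dbsnp" in config and "VariantCalling.dbsnpIndex" not in config:
        problems.append("VariantCalling.dbsnp is set without VariantCalling.dbsnpIndex")
    return problems


# Placeholder configuration; load your own file with json.load instead.
example = {
    "VariantCalling.samples": [{"name": "sample1", "bamfiles": ["/path/to/sample1.bam"]}],
    "VariantCalling.referenceFileDict": "/path/to/reference.dict",
    "VariantCalling.referenceFileIndex": "/path/to/reference.fasta.fai",
    "VariantCalling.referenceFile": "/path/to/reference.fasta",
    "VariantCalling.referencePrefix": "my_reference",
    "VariantCalling.dbsnp": "/path/to/dbsnp.vcf.gz",
}

for problem in check_inputs(example):
    print(problem)
```

An empty list means none of these checks failed. It does not replace the womtool validation shown above.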