DSL2: genotyping #1016

Merged 121 commits on Mar 20, 2024
Commits
67220ce
add pileupcaller params to config and schema
TCLamnidis Jun 23, 2023
eb61d10
genotyping parameter requirement checks
TCLamnidis Jun 23, 2023
1bf3529
Add required modules
TCLamnidis Jun 30, 2023
05f1b13
Install modules
TCLamnidis Jun 30, 2023
1b82eaa
wip genotyping
TCLamnidis Jul 3, 2023
0e8b3c7
Install GATK UG modules
TCLamnidis Jul 14, 2023
d35ff76
Merge remote-tracking branch 'origin/dev' into dsl2-genotyping
TCLamnidis Jul 14, 2023
c0cd49c
started adding GATK_UG
TCLamnidis Jul 14, 2023
9e6b407
Update gatk3 modules
TCLamnidis Jul 21, 2023
ecc4257
Add genotyping SWF
TCLamnidis Jul 21, 2023
5ca5043
work on gatk ug
TCLamnidis Jul 21, 2023
89a7aab
Add gatk UG
TCLamnidis Jul 21, 2023
1ef7d1a
no intervals in ug call
TCLamnidis Jul 21, 2023
29a3e89
add version
TCLamnidis Jul 21, 2023
2d7b622
emit UG output
TCLamnidis Jul 21, 2023
0b4598e
tweak gatk UG outputs
TCLamnidis Jul 21, 2023
aab831f
rename emissions
TCLamnidis Jul 21, 2023
6e8d819
delete leftover debug print from map.nf
TCLamnidis Jul 21, 2023
8b4b79d
Add params for gatk and gatkUG
TCLamnidis Jul 28, 2023
8575592
WIP adding params to GATK UG
TCLamnidis Jul 28, 2023
60e7eec
reorder params
TCLamnidis Jul 28, 2023
2f9496b
add dbSNP placeholder to be able to test. parameters passes to gatk now
TCLamnidis Jul 28, 2023
020eba2
convert bcftools_stats to skip
TCLamnidis Aug 4, 2023
2fa9508
finish schema
TCLamnidis Aug 4, 2023
dd6ddc6
Merge branch 'dev' into dsl2-genotyping
TCLamnidis Aug 4, 2023
a299a26
Merge branch 'dsl2-genotyping' of github.com:nf-core/eager into dsl2-…
TCLamnidis Aug 4, 2023
b0da6bd
merge conflict
TCLamnidis Aug 4, 2023
16d567a
Update bcftools_stats
TCLamnidis Aug 11, 2023
97f1471
add bcftools stats to UG
TCLamnidis Aug 11, 2023
64d0df5
add todo comment
TCLamnidis Aug 11, 2023
c5bbb3b
Add config for bcftools stats
TCLamnidis Aug 11, 2023
2cf5eac
record manual tests
TCLamnidis Aug 11, 2023
947ee60
remove unnecessary bash block
TCLamnidis Aug 11, 2023
32fdd2c
Merge branch 'dev' into dsl2-genotyping
TCLamnidis Oct 27, 2023
fd16f48
attempt to add dbsnp to reference sheet
TCLamnidis Nov 3, 2023
a056508
pass dbsnp to genotyping
TCLamnidis Nov 3, 2023
f312446
Include ploidy into ref_meta
TCLamnidis Nov 24, 2023
05675d5
gatk UG done with dbsnp
TCLamnidis Nov 24, 2023
dcc3e47
fix indentation
TCLamnidis Nov 24, 2023
a91659e
update haplotypecaller module
TCLamnidis Nov 24, 2023
c88aaa3
port UG channels to HC
TCLamnidis Nov 24, 2023
4e88678
Add gatk HC params. Update some gatk UG param text
TCLamnidis Dec 8, 2023
a8dc2d4
add gatk HC params
TCLamnidis Dec 8, 2023
bf9cd87
add gatk HC. Add patterns to genotyping module publishDir
TCLamnidis Dec 8, 2023
edb5e58
add HC. fix indent. move bcftools
TCLamnidis Dec 8, 2023
49d531d
HC manual tests. update UG tests
TCLamnidis Dec 8, 2023
f86bdad
update TODOs
TCLamnidis Dec 8, 2023
d174435
update freebayes module
TCLamnidis Dec 13, 2023
297ea51
Add Freebayes
TCLamnidis Dec 13, 2023
8f1aa9d
manual tests
TCLamnidis Dec 13, 2023
68386e4
add pileupcaller aux files
TCLamnidis Jan 12, 2024
a963652
remove old dbsnp input. fix genotyping swf cardinality
TCLamnidis Jan 12, 2024
b070eea
add pileupcaller bed and snp files
TCLamnidis Jan 12, 2024
69dae31
Add pileupcaller. simplify input channels.
TCLamnidis Jan 12, 2024
6371e9f
add pileupcaller and samtools mpileup
TCLamnidis Jan 12, 2024
0f1d71b
no mpileup output. add pattern to pileupcaller
TCLamnidis Jan 26, 2024
c7042c0
deal with optional files.
TCLamnidis Jan 26, 2024
d4dc6f9
clearer formatting of Genotyping call
TCLamnidis Jan 26, 2024
599375c
add warning todo for inconsistent options
TCLamnidis Jan 26, 2024
4e0bb0d
manual tests for genotyping. add multiref per block
TCLamnidis Jan 26, 2024
89478b9
add small todos
TCLamnidis Jan 26, 2024
a6e8274
Merge remote-tracking branch 'origin/dev' into dsl2-genotyping
TCLamnidis Feb 2, 2024
cc94772
Merge remote-tracking branch 'origin/dev' into dsl2-genotyping
TCLamnidis Feb 2, 2024
e732f27
remove empty defaults
TCLamnidis Feb 2, 2024
be20f06
Update all modules
TCLamnidis Feb 2, 2024
5d4eaf6
fix linting warnings
TCLamnidis Feb 2, 2024
2e38e63
add collect_genotypes
TCLamnidis Feb 2, 2024
0c9f4ac
add genotype collection
TCLamnidis Feb 2, 2024
7d0fbc4
update manual tests
TCLamnidis Feb 2, 2024
1279891
linting
TCLamnidis Feb 2, 2024
687a2d0
oopsie bugfix
TCLamnidis Feb 2, 2024
5f6d466
add test for each genotyper.
TCLamnidis Feb 6, 2024
4a0366f
Add errors when pileupcaller is used without bed or snp file
TCLamnidis Feb 6, 2024
056e0f6
small tweaks
TCLamnidis Feb 6, 2024
80d28cb
small changes
TCLamnidis Feb 6, 2024
2cab201
reposition a line
TCLamnidis Feb 6, 2024
6e6ea1c
fix error condition
TCLamnidis Feb 6, 2024
bd063fa
fix error conditional
TCLamnidis Feb 6, 2024
73b7d0a
remove library ids from genotyping configs (libs merged)
TCLamnidis Feb 6, 2024
0b19335
fix file name collision in GATK RTC
TCLamnidis Feb 6, 2024
7a3ea2a
remove debug statements. add python version to version yml
TCLamnidis Feb 7, 2024
a3b16a2
add coverage stats. add mqc files to mqc channel
TCLamnidis Feb 7, 2024
97bdc0b
update manual tests
TCLamnidis Feb 7, 2024
d273610
Merge remote-tracking branch 'origin/dev' into dsl2-genotyping
TCLamnidis Feb 8, 2024
6d4ac28
Apply suggestions from code review to modules.conf
TCLamnidis Feb 23, 2024
156e132
remove commented lines, update comments
TCLamnidis Feb 23, 2024
0c2d9ee
Update parameter name for keeping realigned bam
TCLamnidis Feb 23, 2024
d74de23
rename parameter
TCLamnidis Feb 23, 2024
5046dfe
Apply suggestions from code review to schema wording
TCLamnidis Feb 23, 2024
ea27287
standardise mpileup helptext wording
TCLamnidis Feb 23, 2024
756c170
Update genotyping_pileupcaller_method helptext
TCLamnidis Feb 23, 2024
51400cd
Apply suggestions from code review to schema
TCLamnidis Feb 23, 2024
b21e0f7
Remove TODO about parameter validation
TCLamnidis Feb 23, 2024
784103a
Merge branch 'dsl2-genotyping' of github.com:nf-core/eager into dsl2-…
TCLamnidis Feb 23, 2024
9020037
Apply suggestions from code review to genotype swf
TCLamnidis Feb 23, 2024
61df8b9
remove todo about issue #1054
TCLamnidis Feb 23, 2024
3f95223
merge both ploidy parameters into one genotyping_reference_ploidy param
TCLamnidis Mar 1, 2024
bc1c924
Install BCFTOOLS_INDEX
TCLamnidis Mar 1, 2024
b9ef51f
update gatk_HC module
TCLamnidis Mar 1, 2024
24525b1
index VCF files
TCLamnidis Mar 1, 2024
8edd468
add warning about angsd
TCLamnidis Mar 8, 2024
f34045e
rename meta attribute id -> sample_id for consistency
TCLamnidis Mar 8, 2024
c461478
simplify output channels
TCLamnidis Mar 8, 2024
322ffa1
Update GATK_UG
TCLamnidis Mar 14, 2024
83808ed
remove dumps
TCLamnidis Mar 14, 2024
9b4a484
update manual_tests.md
TCLamnidis Mar 14, 2024
9b448f3
add genotyper to meta of genotypes
TCLamnidis Mar 15, 2024
1036da2
Merge branch 'dev' into dsl2-genotyping
TCLamnidis Mar 15, 2024
8df94c4
remove todos
TCLamnidis Mar 15, 2024
d20d2ce
Add output information on genotypers
TCLamnidis Mar 18, 2024
0e359a4
Clarify pileupcaller
TCLamnidis Mar 18, 2024
2fa3b73
add citations
TCLamnidis Mar 19, 2024
780ebaf
add bcftools citation
TCLamnidis Mar 19, 2024
7f9d6b7
Merge branch 'dev' into dsl2-genotyping
TCLamnidis Mar 19, 2024
309b8b8
update modules.json (remove dumpSV)
TCLamnidis Mar 19, 2024
b78c3f1
validate parameter combinations
TCLamnidis Mar 19, 2024
e8b27df
linting
TCLamnidis Mar 19, 2024
0b93d81
remove lib dependency
TCLamnidis Mar 19, 2024
4763601
typo
TCLamnidis Mar 19, 2024
ccd118c
minor edits and linting
TCLamnidis Mar 19, 2024
d39a10a
Merge branch 'dev' into dsl2-genotyping
TCLamnidis Mar 19, 2024
8 changes: 4 additions & 4 deletions .github/workflows/ci.yml
@@ -28,11 +28,11 @@ jobs:
  - "latest-everything"
  PARAMS:
  - "-profile test,docker --preprocessing_tool fastp --preprocessing_adapterlist 'https://github.com/nf-core/test-datasets/raw/modules/data/delete_me/fastp/adapters.fasta'"
- - "-profile test,docker --preprocessing_tool adapterremoval --preprocessing_adapterlist 'https://github.com/nf-core/test-datasets/raw/modules/data/delete_me/adapterremoval/adapterremoval_adapterlist.txt' --sequencing_qc_tool falco"
- - "-profile test,docker --mapping_tool bwamem --run_mapdamage_rescaling --run_pmd_filtering --run_trim_bam"
- - "-profile test,docker --mapping_tool bowtie2 --damagecalculation_tool mapdamage --damagecalculation_mapdamage_downsample 100"
+ - "-profile test,docker --preprocessing_tool adapterremoval --preprocessing_adapterlist 'https://github.com/nf-core/test-datasets/raw/modules/data/delete_me/adapterremoval/adapterremoval_adapterlist.txt' --sequencing_qc_tool falco --run_genotyping --genotyping_tool 'freebayes' --genotyping_source 'raw'"
+ - "-profile test,docker --mapping_tool bwamem --run_mapdamage_rescaling --run_pmd_filtering --run_trim_bam --run_genotyping --genotyping_tool 'ug' --genotyping_source 'trimmed'"
+ - "-profile test,docker --mapping_tool bowtie2 --damagecalculation_tool mapdamage --damagecalculation_mapdamage_downsample 100 --run_genotyping --genotyping_tool 'hc' --genotyping_source 'raw'"
  - "-profile test,docker --skip_preprocessing"
- - "-profile test_humanbam,docker --run_mtnucratio --run_contamination_estimation_angsd --snpcapture_bed 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Human/1240K.pos.list_hs37d5.0based.bed.gz'"
+ - "-profile test_humanbam,docker --run_mtnucratio --run_contamination_estimation_angsd --snpcapture_bed 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Human/1240K.pos.list_hs37d5.0based.bed.gz' --run_genotyping --genotyping_tool 'pileupcaller' --genotyping_source 'raw'"
  - "-profile test_multiref,docker" ## TODO add damage manipulation here instead once it goes multiref
  steps:
  - name: Check out pipeline code
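The updated matrix entries enable genotyping with a different `--genotyping_tool` in each test. As a rough illustration (not part of the PR), the sketch below parses abridged copies of those PARAMS strings and lists which genotyper and source each one exercises. The helper name `genotyping_settings` is hypothetical, and the strings omit the URL-bearing options for brevity.

```python
import shlex

# Abridged copies of the four CI PARAMS entries that enable genotyping
# (adapter list and BED file options left out for brevity).
ci_params = [
    "-profile test,docker --preprocessing_tool adapterremoval --sequencing_qc_tool falco --run_genotyping --genotyping_tool 'freebayes' --genotyping_source 'raw'",
    "-profile test,docker --mapping_tool bwamem --run_mapdamage_rescaling --run_pmd_filtering --run_trim_bam --run_genotyping --genotyping_tool 'ug' --genotyping_source 'trimmed'",
    "-profile test,docker --mapping_tool bowtie2 --damagecalculation_tool mapdamage --run_genotyping --genotyping_tool 'hc' --genotyping_source 'raw'",
    "-profile test_humanbam,docker --run_mtnucratio --run_contamination_estimation_angsd --run_genotyping --genotyping_tool 'pileupcaller' --genotyping_source 'raw'",
]


def genotyping_settings(param_string):
    """Return the genotyping tool and source named in one PARAMS string (hypothetical helper)."""
    tokens = shlex.split(param_string)  # shell-style split; strips the single quotes
    settings = {}
    for flag in ("--genotyping_tool", "--genotyping_source"):
        if flag in tokens:
            settings[flag.lstrip("-")] = tokens[tokens.index(flag) + 1]
    return settings


covered = [genotyping_settings(p) for p in ci_params]
tools = {entry["genotyping_tool"] for entry in covered}

for entry in covered:
    print(entry)
```

Each of the four genotypers added in this PR (`freebayes`, `ug`, `hc`, `pileupcaller`) appears in exactly one matrix entry.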
21 changes: 19 additions & 2 deletions CITATIONS.md
@@ -100,10 +100,27 @@

  - [QualiMap](https://doi.org/10.1093/bioinformatics/btv566)

- > QualiMap Okonechnikov, K., Conesa, A., & García-Alcalde, F. (2016). Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics , 32(2), 292–294. Download: http://qualimap.bioinfo.cipf.es/
+ > QualiMap Okonechnikov, K., Conesa, A., & García-Alcalde, F. (2016). Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics, 32(2), 292–294. doi: [10.1093/bioinformatics/btv566](https://doi.org/10.1093/bioinformatics/btv566).

  - [DamageProfiler](https://doi.org/10.1093/bioinformatics/btab190)
- > DamageProfiler Neukamm, J., Peltzer, A., & Nieselt, K. (2020). DamageProfiler: Fast damage pattern calculation for ancient DNA. In Bioinformatics (btab190). doi: [10.1093/bioinformatics/btab190](https://doi.org/10.1093/bioinformatics/btab190). Download: https://github.com/Integrative-Transcriptomics/DamageProfiler
+
+ > DamageProfiler Neukamm, J., Peltzer, A., & Nieselt, K. (2020). DamageProfiler: Fast damage pattern calculation for ancient DNA. In Bioinformatics (btab190). doi: [10.1093/bioinformatics/btab190](https://doi.org/10.1093/bioinformatics/btab190).
+
+ - [GATK 3.5](https://console.cloud.google.com/storage/browser/gatk)
+
+ > DePristo M, Banks E, Poplin R, Garimella K, Maguire J, Hartl C, Philippakis A, del Angel G, Rivas MA, Hanna M, McKenna A, Fennell T, Kernytsky A, Sivachenko A, Cibulskis K, Gabriel S, Altshuler D, Daly M. (2011). A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genetics, 43(5), 491–498. doi: [10.1038/ng.806](https://doi.org/10.1038/ng.806).
+
+ - [GATK 4.X](https://github.com/broadinstitute/gatk/releases)
+
+ > Poplin R, Ruano-Rubio V, DePristo MA, Fennell TJ, Carneiro MO, Van der Auwera GA, Kling DE, Gauthier LD, Levy-Moonshine A, Roazen D, Shakir K, Thibault J, Chandran S, Whelan C, Lek M, Gabriel S, Daly MJ, Neale B, MacArthur DG, Banks E. (2017). Scaling accurate genetic variant discovery to tens of thousands of samples. bioRxiv, 201178. doi: [10.1101/201178](https://doi.org/10.1101/201178).
+
+ - [FreeBayes](https://github.com/freebayes/freebayes)
+
+ > Garrison E, Marth G. Haplotype-based variant detection from short-read sequencing. arXiv preprint arXiv:1207.3907 \[q-bio.GN] 2012. doi: [10.48550/arXiv.1207.3907](https://doi.org/10.48550/arXiv.1207.3907).
+
+ - [BCFtools](https://github.com/samtools/bcftools)
+
+ > Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics (2011) 27(21) 2987-93. doi: [10.1093/bioinformatics/btr509](https://doi.org/10.1093/bioinformatics/btr509).

  ## Software packaging/containerisation tools

102 changes: 102 additions & 0 deletions bin/collect_genotypes.py
@@ -0,0 +1,102 @@
#!/usr/bin/env python

# MIT License (c) Thiseas C. Lamnidis (@TCLamnidis)

import argparse
import filecmp

## A function to return the number of lines in a file.
def file_len(fname):
    i = -1  # Ensures an empty file is reported as 0 lines instead of raising.
    with open(fname) as f:
        for i, l in enumerate(f):
            pass
    return i + 1

## A function to return the number of genotypes per line in a .geno file.
def file_width(fname):
    with open(fname) as f:
        for i in f:
            return(len(i.strip()))

## A function to check that there are no duplicate individual IDs across ind files.
def check_for_duplicate_ids(indf1, indf2):
    with open(indf1) as f:
        inds1 = [x.strip().split()[0] for x in f.readlines()]
    with open(indf2) as f:
        inds2 = [x.strip().split()[0] for x in f.readlines()]
    intersection = set(inds1).intersection(inds2)
    if len(intersection) > 0:
        raise IOError("Input .ind files contain duplicate individual IDs. Duplicates: {}".format(intersection))

## Function to check that the snp files are identical
def check_snp_files(snpf1, snpf2):
    ## shallow=False forces a content comparison instead of trusting matching os.stat() signatures.
    if not filecmp.cmp(snpf1, snpf2, shallow=False):
        raise IOError("Input .snp files are not identical.")

## Function to check the consistency of an eigenstrat database
def validate_eigenstrat(genof, snpf, indf):
    dimsGeno = [file_len(genof), file_width(genof)]
    linesSnp = file_len(snpf)
    linesInd = file_len(indf)

    # print(dimsGeno,linesSnp,linesInd)
    ##Check geno and snp compatibility
    if dimsGeno[0] != linesSnp:
        raise IOError("Input .snp and .geno files do not match.")

    ##Check geno and ind compatibility
    if dimsGeno[1] != linesInd:
        raise IOError("Input .ind and .geno files do not match.")

VERSION = "1.0.0"

parser = argparse.ArgumentParser(usage="%(prog)s -g1 <GENO FILE 1 NAME> -s1 <SNP FILE 1 NAME> -i1 <IND FILE 1 NAME> -g2 <GENO FILE 2 NAME> -s2 <SNP FILE 2 NAME> -i2 <IND FILE 2 NAME> -o <OUTPUT FILES PREFIX>" , description="A tool to merge two EigenStrat databases that share the same SNP set, after checking that they contain no duplicate individual IDs.")
parser._optionals.title = "Available options"
parser.add_argument("-g1", "--genoFn1", type = str, metavar = "<GENO FILE 1 NAME>", required = True, help = "The path to the input geno file of the first dataset.")
parser.add_argument("-s1", "--snpFn1", type = str, metavar = "<SNP FILE 1 NAME>", required = True, help = "The path to the input snp file of the first dataset.")
parser.add_argument("-i1", "--indFn1", type = str, metavar = "<IND FILE 1 NAME>", required = True, help = "The path to the input ind file of the first dataset.")
parser.add_argument("-g2", "--genoFn2", type = str, metavar = "<GENO FILE 2 NAME>", required = True, help = "The path to the input geno file of the second dataset.")
parser.add_argument("-s2", "--snpFn2", type = str, metavar = "<SNP FILE 2 NAME>", required = True, help = "The path to the input snp file of the second dataset.")
parser.add_argument("-i2", "--indFn2", type = str, metavar = "<IND FILE 2 NAME>", required = True, help = "The path to the input ind file of the second dataset.")
parser.add_argument("-o", "--output", type = str, metavar = "<OUTPUT FILES PREFIX>", required = True, help = "The desired output file prefix. Three output files are created, <OUTPUT FILES PREFIX>.geno , <OUTPUT FILES PREFIX>.snp and <OUTPUT FILES PREFIX>.ind .")
parser.add_argument("-v", "--version", action='version', version="{}".format(VERSION), help="Print the version and exit.")
args = parser.parse_args()

## Open input files
GenoFile1 = open(args.genoFn1, "r")
SnpFile1 = open(args.snpFn1, "r")
IndFile1 = open(args.indFn1, "r")

GenoFile2 = open(args.genoFn2, "r")
# SnpFile2 = open(args.snpFn2, "r") ## Never actually read in line by line
IndFile2 = open(args.indFn2, "r")

## open output files
GenoFileOut = open(args.output + ".geno", "w")
SnpFileOut = open(args.output + ".snp", "w")
IndFileOut = open(args.output + ".ind", "w")

## Perform basic validation on inputs
validate_eigenstrat(args.genoFn1, args.snpFn1, args.indFn1)
validate_eigenstrat(args.genoFn2, args.snpFn2, args.indFn2)
check_for_duplicate_ids(args.indFn1, args.indFn2)
check_snp_files(args.snpFn1, args.snpFn2)

## Now actually merge the data
## Geno
for line1, line2 in zip(GenoFile1, GenoFile2):
    geno_line="{}{}".format(line1.strip(),line2.strip())
    print(geno_line, file=GenoFileOut)

## Snp
## Copying the file would be faster, but this way we do not rely on the os or external packages.
## We already checked that the snp files are byte-identical, so we can just copy one of them.
for line in SnpFile1:
    print(line.strip(), file=SnpFileOut)

## Ind
## The indfiles are simply concatenated in the same order as the geno file.
for line in IndFile1:
    print(line.strip(), file=IndFileOut)
for line in IndFile2:
    print(line.strip(), file=IndFileOut)
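The merge performed by `collect_genotypes.py` amounts to a column-wise concatenation: each `.geno` line holds one genotype character per individual for a single SNP, so joining the two lines for the same SNP appends the second dataset's individuals to the first. A minimal in-memory sketch of that idea follows, assuming both datasets already share an identical SNP list. The function name `merge_eigenstrat` and the toy data are illustrative and not part of the script.

```python
def merge_eigenstrat(geno1, ind1, geno2, ind2):
    """Column-bind two EIGENSTRAT datasets given as lists of lines (hypothetical helper).

    geno1/geno2: one string per SNP, one character per individual.
    ind1/ind2:   one whitespace-separated line per individual, ID first.
    """
    if len(geno1) != len(geno2):
        raise ValueError("The two .geno inputs cover a different number of SNPs.")

    # Same check as the script: no individual ID may appear in both datasets.
    duplicates = {l.split()[0] for l in ind1} & {l.split()[0] for l in ind2}
    if duplicates:
        raise ValueError("Duplicate individual IDs: {}".format(sorted(duplicates)))

    merged_geno = [a.strip() + b.strip() for a, b in zip(geno1, geno2)]
    merged_ind = [l.strip() for l in ind1] + [l.strip() for l in ind2]
    return merged_geno, merged_ind


# Toy data: 3 SNPs; dataset 1 has two individuals, dataset 2 has one.
geno_a = ["20", "19", "02"]
ind_a = ["IND1 U Pop1", "IND2 U Pop1"]
geno_b = ["1", "9", "0"]
ind_b = ["IND3 U Pop2"]

merged_geno, merged_ind = merge_eigenstrat(geno_a, ind_a, geno_b, ind_b)
print(merged_geno)  # ['201', '199', '020']
print(merged_ind)
```

The order of the `.ind` lines must match the column order of the `.geno` file, which is why the individuals of the second dataset are appended after those of the first in both outputs.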
168 changes: 168 additions & 0 deletions conf/modules.config
@@ -961,4 +961,172 @@ process {
            pattern: '*.flagstat'
        ]
    }

    //
    // GENOTYPING
    //

    withName: SAMTOOLS_MPILEUP_PILEUPCALLER {
        tag = { "${meta.reference}|${meta.strandedness}" }
        ext.args = [
            "-B",
            // samtools mpileup: -q sets the minimum mapping quality, -Q the minimum base quality.
            "-q ${params.genotyping_pileupcaller_min_map_quality}",
            "-Q ${params.genotyping_pileupcaller_min_base_quality}",
        ].join(' ').trim()
        ext.prefix = { "${meta.strandedness}_${meta.reference}" }
        publishDir = [
            enabled: false
        ]
    }

    withName: SEQUENCETOOLS_PILEUPCALLER {
        tag = { "${meta.reference}|${meta.strandedness}" }
        ext.args = {[
            "--${params.genotyping_pileupcaller_method}",
            params.genotyping_pileupcaller_transitions_mode == "SkipTransitions" ? "--skipTransitions" : params.genotyping_pileupcaller_transitions_mode == "TransitionsMissing" ? "--transitionsMissing" : "",
            "${meta.strandedness}" == 'single' ? "--singleStrandMode" : "" ,
            "--sampleNames", meta.sample_id.join(","),
            "-e pileupcaller.${meta.strandedness}.${meta.reference}"
        ].join(' ').trim() }
        ext.prefix = { "${meta.strandedness}_${meta.reference}" }
        publishDir = [
            enabled: false // Not published because the output goes through COLLECT_GENOTYPES
        ]
    }

    withName: COLLECT_GENOTYPES {
        tag = { "${meta.reference}" }
        ext.prefix = { "pileupcaller_genotypes_${meta.reference}" }
        publishDir = [
            path: { "${params.outdir}/genotyping/" },
            mode: params.publish_dir_mode,
            enabled: true,
            pattern: '*.{geno,snp,ind}'
        ]
    }

    withName: EIGENSTRATDATABASETOOLS_EIGENSTRATSNPCOVERAGE {
        tag = { "${meta.reference}" }
        ext.args = { "-j ${prefix}.json" }
        ext.prefix = { "pileupcaller_genotypes_${meta.reference}_coverage" }
        publishDir = [
            path: { "${params.outdir}/genotyping/" },
            mode: params.publish_dir_mode,
            enabled: true,
            pattern: '*.{tsv}'
        ]
    }

    withName: GATK_REALIGNERTARGETCREATOR {
        tag = { "${meta.reference}|${meta.sample_id}" }
        ext.args = [
            params.genotyping_gatk_ug_defaultbasequalities > 0 ? "--defaultBaseQualities ${params.genotyping_gatk_ug_defaultbasequalities}" : "", // Empty string since GATK complains if its default of -1 is provided.
        ].join(' ').trim()
        ext.prefix = { "${meta.sample_id}_${meta.reference}_realigntarget" }
        publishDir = [
            enabled: false
        ]
    }

    withName: GATK_INDELREALIGNER {
        tag = { "${meta.reference}|${meta.sample_id}" }
        ext.args = [
            params.genotyping_gatk_ug_defaultbasequalities > 0 ? "--defaultBaseQualities ${params.genotyping_gatk_ug_defaultbasequalities}" : "", // Empty string since GATK complains if its default of -1 is provided.
        ].join(' ').trim()
        ext.prefix = { "${meta.sample_id}_${meta.reference}_realigned" }
        publishDir = [
            path: { "${params.outdir}/genotyping/IndelRealigner" },
            mode: params.publish_dir_mode,
            enabled: params.genotyping_gatk_ug_keeprealignbam,
            pattern: '*.{bam,bai}'
        ]
    }

    withName: GATK_UNIFIEDGENOTYPER {
        tag = { "${meta.reference}|${meta.sample_id}" }
        ext.args = {[
            "--sample_ploidy ${meta2.ploidy}",
            "-stand_call_conf ${params.genotyping_gatk_call_conf}",
            "-dcov ${params.genotyping_gatk_ug_downsample}",
            "--output_mode ${params.genotyping_gatk_ug_out_mode}",
            "--genotype_likelihoods_model ${params.genotyping_gatk_ug_genotype_mode}",
            params.genotyping_gatk_ug_defaultbasequalities > 0 ? "--defaultBaseQualities ${params.genotyping_gatk_ug_defaultbasequalities}" : "", // Empty string since GATK complains if its default of -1 is provided.
        ].join(' ').trim() }
        ext.prefix = { "${meta.sample_id}_${meta.reference}" }
        publishDir = [
            path: { "${params.outdir}/genotyping/" },
            mode: params.publish_dir_mode,
            enabled: true,
            pattern: '*.vcf.gz'
        ]
    }

    withName: BCFTOOLS_INDEX_UG {
        tag = { "${meta.reference}|${meta.sample_id}" }
        ext.args = "--tbi" //tbi indices for consistency with GATK HC
        ext.prefix = { "${meta.sample_id}_${meta.reference}" }
        publishDir = [
            path: { "${params.outdir}/genotyping/" },
            mode: params.publish_dir_mode,
            enabled: true,
            pattern: '*.vcf.gz.tbi'
        ]
    }

    withName: GATK4_HAPLOTYPECALLER {
        tag = { "${meta.reference}|${meta.sample_id}" }
        ext.args = {[
            // Option names have changed from underscore_separated to hyphen-separated in GATK4
            "--sample-ploidy ${meta2.ploidy}",
            "-stand-call-conf ${params.genotyping_gatk_call_conf}",
            "--output-mode ${params.genotyping_gatk_hc_out_mode}",
            "--emit-ref-confidence ${params.genotyping_gatk_hc_emitrefconf}",
        ].join(' ').trim() }
        ext.prefix = { "${meta.sample_id}_${meta.reference}" }
        publishDir = [
            path: { "${params.outdir}/genotyping/" },
            mode: params.publish_dir_mode,
            enabled: true,
            pattern: '*.{vcf.gz,vcf.gz.tbi}'
        ]
    }

    withName: FREEBAYES {
        tag = { "${meta.reference}|${meta.sample_id}" }
        ext.args = {[
            "-p ${ref_meta.ploidy}",
            "-C ${params.genotyping_freebayes_min_alternate_count}",
            params.genotyping_freebayes_skip_coverage != 0 ? "-g ${params.genotyping_freebayes_skip_coverage}" : "",
        ].join(' ').trim() }
        ext.prefix = { "${meta.sample_id}_${meta.reference}" }
        publishDir = [
            path: { "${params.outdir}/genotyping/" },
            mode: params.publish_dir_mode,
            enabled: true,
            pattern: '*.vcf.gz'
        ]
    }

    withName: BCFTOOLS_INDEX_FREEBAYES {
        tag = { "${meta.reference}|${meta.sample_id}" }
        ext.args = "--tbi" //tbi indices for consistency with GATK HC
        ext.prefix = { "${meta.sample_id}_${meta.reference}" }
        publishDir = [
            path: { "${params.outdir}/genotyping/" },
            mode: params.publish_dir_mode,
            enabled: true,
            pattern: '*.vcf.gz.tbi'
        ]
    }

    withName: BCFTOOLS_STATS_GENOTYPING {
        tag = { "${meta.reference}|${meta.sample_id}" }
        ext.prefix = { "${meta.sample_id}_${meta.reference}" }
        publishDir = [
            path: { "${params.outdir}/genotyping/" },
            mode: params.publish_dir_mode,
            enabled: true,
            pattern: '*.txt'
        ]
    }
}
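Several `ext.args` closures above build a command line from a list in which optional flags collapse to an empty string. The Python sketch below mirrors that pattern for the `SEQUENCETOOLS_PILEUPCALLER` block and for the `--defaultBaseQualities` guard, as an illustration only. The function names and example values (`randomHaploid`, `hs37d5`, the sample names) are assumptions made for the sketch. Unlike the config, which joins all elements and trims the result, the sketch drops empty elements before joining so that no double spaces remain.

```python
def pileupcaller_args(method, transitions_mode, strandedness, sample_ids, reference):
    """Assemble pileupCaller arguments the way the config closure does (hypothetical helper)."""
    # Only these two modes add a flag; any other value contributes nothing.
    transition_flags = {
        "SkipTransitions": "--skipTransitions",
        "TransitionsMissing": "--transitionsMissing",
    }
    parts = [
        "--{}".format(method),
        transition_flags.get(transitions_mode, ""),
        "--singleStrandMode" if strandedness == "single" else "",
        "--sampleNames",
        ",".join(sample_ids),
        "-e pileupcaller.{}.{}".format(strandedness, reference),
    ]
    return " ".join(part for part in parts if part)


def default_base_qualities_arg(value):
    """Emit --defaultBaseQualities only for positive values, as in the GATK 3 blocks."""
    return "--defaultBaseQualities {}".format(value) if value > 0 else ""


single = pileupcaller_args("randomHaploid", "SkipTransitions", "single", ["sampleA", "sampleB"], "hs37d5")
double = pileupcaller_args("randomHaploid", "OtherMode", "double", ["sampleC"], "hs37d5")
print(single)
print(double)
print(repr(default_base_qualities_arg(-1)))
```

Because the arguments depend on `meta`, the config wraps the list in a closure (`{[ ... ]}`) so that it is evaluated per task rather than once when the configuration is parsed.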
4 changes: 2 additions & 2 deletions conf/test_humanbam.config
@@ -38,8 +38,8 @@ params {
  // //Sex Determination
  // sexdeterrmine_bedfile = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Human/1240K.pos.list_hs37d5.0based.bed.gz'
  // // Genotyping
- // pileupcaller_bedfile = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Human/1240K.pos.list_hs37d5.0based.bed.gz'
- // pileupcaller_snpfile = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Human/1240K_covered_in_JK2067_downsampled_s0.1.numeric_chromosomes.snp'
+ genotyping_pileupcaller_bedfile = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Human/1240K.pos.list_hs37d5.0based.bed.gz'
+ genotyping_pileupcaller_snpfile = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Human/1240K_covered_in_JK2067_downsampled_s0.1.numeric_chromosomes.snp'


  // BAM filtering