add clumpify-based dedup #970
Open: tomkinsc wants to merge 56 commits into master from ct-add-clumpify
a5d58b5 (tomkinsc) add bbmap.BBMapTool().dedup_clumpify()
595764e (tomkinsc) pass JVMmemory; add read_utils.rmdup_clumpify_bam; dedup_bam WDL task
09901d3 (tomkinsc) switch from mvicuna to clumpify-based dedup in taxon_filter.py deplete
df208ea (tomkinsc) replace unicode apostrophe
98ac4fc (tomkinsc) reduce clumpify max_mismatches 5->3
784877a (tomkinsc) dump dx-toolkit version and update URL to reflect new source
232f9cd (tomkinsc) dedup prior to metagenomics classification in WDL workflows
c01bb5b (tomkinsc) add missing import
e25ef52 (tomkinsc) rename read_utils.wdl -> downsample.wdl, dedup.wdl
6ba96d4 (tomkinsc) rename dedup_bam wdl workflow to "dedup"
e8a4081 (tomkinsc) increase dx instance size for dedup and memory spec
8280063 (tomkinsc) correct argparse parser attachement for rmdup_clumpify_bam
b86b1c9 (tomkinsc) wrap WDL variable in dedup command block for var interpolation
8afe18f (tomkinsc) avoid collision
7d2f45a (tomkinsc) Merge branch 'master' into ct-add-clumpify
f48038e (tomkinsc) Merge branch 'master' into ct-add-clumpify
c78f246 (tomkinsc) Merge branch 'master' into ct-add-clumpify
72fb4cd (tomkinsc) add sambamba since bbtools looks for it?
d97f773 (tomkinsc) Merge branch 'master' into ct-add-clumpify
a685a8a (tomkinsc) remove sambamba
a2ce0f1 (tomkinsc) specify containment=t for bbmap clumpify
6f26717 (tomkinsc) Merge branch 'master' into ct-add-clumpify
b73950e (tomkinsc) Merge branch 'master' into ct-add-clumpify
a3010ea (tomkinsc) Merge branch 'master' into ct-add-clumpify
8199b12 (tomkinsc) enforce containment=False; more tolerant bbmap unit test
6bb3f6b (tomkinsc) update miniconda ssl certs
97eff11 (tomkinsc) increase debug info emitted by build-conda.sh
f6f9b85 (tomkinsc) Merge branch 'master' into ct-add-clumpify
1674c44 (tomkinsc) Merge branch 'master' into ct-add-clumpify
4ca4693 (tomkinsc) Merge branch 'master' into ct-add-clumpify
8f8aaae (tomkinsc) bump bbmap to 38.71; set containment=True for clumpify
218a12b (tomkinsc) Merge branch 'ct-add-clumpify' of ssh://github.com/broadinstitute/vir…
fa5e01e (tomkinsc) update stage number
a0735c7 (tomkinsc) set bbmap jvmMemDefault='2g'; 1g for clumpify test
663deba (tomkinsc) no longer skip demux_metag from validation/compilation
5a7ed3b (tomkinsc) demux_plus/demux_metag: merge linear parts of scatters, run spike-in …
2be4a85 (tomkinsc) add DNAnexus defaults for demux_metag, set inputs in demux_metag
1d691b2 (tomkinsc) rmdup_clumpify_bam: preserve sortorder value of input bam
1ba7415 (tomkinsc) bump bbmap version 38.71 -> 38.73
bb589a1 (tomkinsc) fix bug in conda command quiet calling
472703b (tomkinsc) maintain RG info in clumpify dedup; move processing to bbmap.py
ca726d0 (tomkinsc) demux_plus/demux_metag: update dx defaults and pass explicitly in wor…
3f9f188 (tomkinsc) remove redundant defaults from dx wdl test inputs
995cf0d (tomkinsc) move krakenuniq back outside scatter
d54eff3 (tomkinsc) respecify kaiju deps
21a6ac4 (tomkinsc) WDL dedup_bam: report read count before & after dedup
13f5172 (tomkinsc) switch to clumpify for downsample dedup
c1d18be (tomkinsc) change to clumpify for pre-depletion dedup
7c45da6 (tomkinsc) --JVMmemory=1g for TestDepleteHuman
d91eca5 (tomkinsc) remove rmdup from depletion call
12f73cb (tomkinsc) expand arguments exposed for clumpify dedup
16a2b50 (tomkinsc) update expected depletion output now that we're not running dedup on it
f1f9a40 (tomkinsc) scatter/gather clumpify dedup across libraries
49bffcb (tomkinsc) Merge branch 'master' into ct-add-clumpify
362d0f3 (tomkinsc) pass through single-end IDs for bbmap dedup
f816f6b (tomkinsc) Merge branch 'master' into ct-add-clumpify
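Several of the commits above tune how clumpify is invoked (mismatch tolerance, containment, JVM memory). As a rough illustration only, the Python sketch below assembles a `clumpify.sh` argument list. The helper name `build_clumpify_cmd`, its parameters, and the file names are hypothetical and do not reflect the actual signature of `bbmap.BBMapTool().dedup_clumpify()`. The `dedupe`, `subs`, and `containment` options and the `-Xmx` memory flag are assumed to be the BBTools settings that the commit messages refer to, and the defaults simply mirror values mentioned in those messages.

```python
def build_clumpify_cmd(in_reads, out_reads, max_mismatches=3,
                       containment=True, jvm_memory="2g"):
    """Return an argv list for a clumpify dedup run (hypothetical helper)."""
    def flag(value):
        # BBTools-style boolean: "t" or "f"
        return "t" if value else "f"

    return [
        "clumpify.sh",
        "-Xmx" + jvm_memory,                  # JVM heap size (assumed flag)
        "in=" + in_reads,
        "out=" + out_reads,
        "dedupe=t",                           # remove duplicates, not just cluster
        "subs=" + str(max_mismatches),        # mismatches tolerated between duplicates
        "containment=" + flag(containment),   # treat contained reads as duplicates
    ]


cmd = build_clumpify_cmd("reads.fq.gz", "reads.dedup.fq.gz")
print(" ".join(cmd))
```

The list is only built and printed, never executed, so the sketch runs without BBTools installed.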
@@ -0,0 +1,27 @@
+{
+  "demux_metag.spikein_db":
+    "dx://file-FZY2v7Q0xf5VBy5FFY3z5fz7",
+
+  "demux_metag.bwaDbs": [
+    "dx://file-F9k7Bx00Z3ybJjvY3ZVj7Z9P"
+  ],
+  "demux_metag.blastDbs": [
+    "dx://file-F8B3B6Q09y3bZg3j1FqK7bJ9",
+    "dx://file-F8BjgXj09y3gkfZGPPQZbZkK",
+    "dx://file-F8B3Pp809y3jBpXq7xjxbq94",
+    "dx://file-F8B3B6809y3kK1JP5X8Pg361"
+  ],
+
+  "demux_metag.trim_clip_db":
+    "dx://file-BXF0vYQ0QyBF509G9J12g927",
+
+  "demux_metag.kraken.krakenuniq_db_tar_lz4":
+    "dx://file-FVYQqP006zFF064QBGf022X1",
+  "demux_metag.krona_taxonomy_db_tgz":
+    "dx://file-F4z0fgj07FZ8jg8yP7yz0Qzb",
+
+  "demux_metag.kaiju_db_lz4":
+    "dx://file-FVYQyvQ06zF55bFGBGYJ2XxX",
+  "demux_metag.ncbi_taxonomy_db_tgz":
+    "dx://file-F8KgJK009y3Qgy3FF1791Vgq"
+}
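The defaults file maps fully qualified workflow inputs to DNAnexus file links, either a single link or a list of links. A small sanity check along the following lines could catch a mistyped key or link before compilation. This is a sketch under assumptions: `check_dx_defaults` is an invented helper, and the regular expression is only a guess at the general shape of a `dx://file-...` link, not an official format definition.

```python
import json
import re

# Assumed shape of a DNAnexus file link; not an official specification.
DX_FILE_URI = re.compile(r"^dx://file-[0-9A-Za-z]+$")


def check_dx_defaults(json_text, workflow_name):
    """Return a list of problems found in a defaults JSON string (hypothetical helper)."""
    problems = []
    for key, value in json.loads(json_text).items():
        if not key.startswith(workflow_name + "."):
            problems.append("unexpected key prefix: " + key)
        # Normalize single links and lists of links to one code path.
        values = value if isinstance(value, list) else [value]
        for item in values:
            if not (isinstance(item, str) and DX_FILE_URI.match(item)):
                problems.append("not a dx file link: %s -> %r" % (key, item))
    return problems


example = """{
  "demux_metag.spikein_db": "dx://file-FZY2v7Q0xf5VBy5FFY3z5fz7",
  "demux_metag.bwaDbs": ["dx://file-F9k7Bx00Z3ybJjvY3ZVj7Z9P"]
}"""
print(check_dx_defaults(example, "demux_metag"))
```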
@@ -1,5 +1,16 @@
 import "tasks_metagenomics.wdl" as metagenomics
+import "tasks_read_utils.wdl" as reads
 
 workflow classify_kaiju {
-  call metagenomics.kaiju
+  Array[File] unclassified_bams
+  scatter(reads_bam in unclassified_bams) {
+    call reads.dedup_bam as dedup {
+      input:
+        in_bam = reads_bam
+    }
+  }
+  call metagenomics.kaiju {
+    input:
+      reads_unmapped_bam = dedup.dedup_bam
+  }
 }
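In WDL, a call output referenced outside its enclosing scatter block is gathered into an array, so `dedup.dedup_bam` as used in the final call is an `Array[File]` with one entry per input BAM. Whether `metagenomics.kaiju` declares its input as an array is not visible in this diff. The Python analogy below only illustrates the scatter/gather shape; `dedup_bam` and `classify` are placeholder functions standing in for WDL tasks, and the file names are invented.

```python
def dedup_bam(in_bam):
    # Placeholder for the WDL task: pretend to produce a de-duplicated BAM path.
    return in_bam.replace(".bam", ".dedup.bam")


def classify(reads_unmapped_bams):
    # Placeholder downstream step: it receives the whole gathered list.
    return [bam + ".report" for bam in reads_unmapped_bams]


unclassified_bams = ["s1.bam", "s2.bam"]

# scatter: one dedup call per input; gather: results collected in input order
gathered = [dedup_bam(bam) for bam in unclassified_bams]

reports = classify(gathered)
print(reports)
```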
@@ -0,0 +1,5 @@
+import "tasks_read_utils.wdl" as reads
+
+workflow dedup {
+  call reads.dedup_bam
+}
@@ -1,44 +1,82 @@
-#DX_SKIP_WORKFLOW
-
 import "tasks_demux.wdl" as demux
 import "tasks_metagenomics.wdl" as metagenomics
 import "tasks_taxon_filter.wdl" as taxon_filter
 import "tasks_assembly.wdl" as assembly
 import "tasks_reports.wdl" as reports
+import "tasks_read_utils.wdl" as reads
 
 workflow demux_metag {
   call demux.illumina_demux as illumina_demux
 
+  File spikein_db
+  File trim_clip_db
+  Array[File]? bmtaggerDbs  # .tar.gz, .tgz, .tar.bz2, .tar.lz4, .fasta, or .fasta.gz
+  Array[File]? blastDbs  # .tar.gz, .tgz, .tar.bz2, .tar.lz4, .fasta, or .fasta.gz
+  Array[File]? bwaDbs
+  File krona_taxonomy_db_tgz
+  File kaiju_db_lz4
+  File ncbi_taxonomy_db_tgz
+
   scatter(raw_reads in illumina_demux.raw_reads_unaligned_bams) {
+    # de-duplicate raw reads
+    call reads.dedup_bam as dedup {
+      input:
+        in_bam = raw_reads
+    }
+
+    # count spike-ins in the sample
+    # NB: the spike-in report is created from raw reads
+    # that have NOT been de-duplicated
     call reports.spikein_report as spikein {
       input:
-        reads_bam = raw_reads
+        reads_bam = raw_reads,
+        spikein_db = spikein_db
     }
+
+    # deplete human/host genomic reads
     call taxon_filter.deplete_taxa as deplete {
       input:
-        raw_reads_unmapped_bam = raw_reads
+        raw_reads_unmapped_bam = dedup.dedup_bam,
+        bmtaggerDbs = bmtaggerDbs,
+        blastDbs = blastDbs,
+        bwaDbs = bwaDbs
     }
+
+    # create de novo contigs from depleted reads via spades
     call assembly.assemble as spades {
       input:
         assembler = "spades",
-        reads_unmapped_bam = deplete.cleaned_bam
+        reads_unmapped_bam = deplete.cleaned_bam,
+        trim_clip_db = trim_clip_db,
+        always_succeed = true
     }
+
+    # classify de-duplicated reads to taxa via kaiju
+    call metagenomics.kaiju as kaiju {
+      input:
+        reads_unmapped_bam = dedup.dedup_bam,
+        krona_taxonomy_db_tgz = krona_taxonomy_db_tgz,
+        kaiju_db_lz4 = kaiju_db_lz4,
+        ncbi_taxonomy_db_tgz = ncbi_taxonomy_db_tgz
+    }
   }
 
+  # classify de-duplicated reads to taxa via krakenuniq
   call metagenomics.krakenuniq as kraken {
     input:
-      reads_unmapped_bam = illumina_demux.raw_reads_unaligned_bams,
-  }
-  call reports.aggregate_metagenomics_reports as metag_summary_report {
-    input:
-      kraken_summary_reports = kraken.krakenuniq_summary_reports
+      reads_unmapped_bam = dedup.dedup_bam,
+      krona_taxonomy_db_tgz = krona_taxonomy_db_tgz
   }
+
+  # summarize spike-in reports from all samples
   call reports.spikein_summary as spike_summary {
     input:
       spikein_count_txt = spikein.report
   }
-  call metagenomics.kaiju as kaiju {
-    input:
-      reads_unmapped_bam = illumina_demux.raw_reads_unaligned_bams,
+  # summarize kraken reports from all samples
+  call reports.aggregate_metagenomics_reports as metag_summary_report {
+    input:
+      kraken_summary_reports = kraken.krakenuniq_summary_reports
   }
+  # TODO: summarize kaiju reports from all samples
 }
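The workflow comments note that the spike-in report is built from raw reads, while depletion and classification consume de-duplicated reads. The toy Python sketch below shows why that distinction matters for count-based reports: collapsing duplicates changes per-spike-in totals. It uses exact matching as a stand-in for dedup and is not clumpify's algorithm; the read data and spike-in names are invented for illustration.

```python
from collections import Counter

# (spike-in name, sequence) pairs standing in for aligned reads; invented data.
raw_reads = [
    ("spike-A", "ACGT"), ("spike-A", "ACGT"), ("spike-A", "ACGT"),
    ("spike-B", "TTGA"), ("spike-B", "TTGC"),
]


def dedup_exact(reads):
    """Keep the first occurrence of each distinct read (toy stand-in for dedup)."""
    seen, kept = set(), []
    for read in reads:
        if read not in seen:
            seen.add(read)
            kept.append(read)
    return kept


raw_counts = Counter(name for name, _ in raw_reads)
dedup_counts = Counter(name for name, _ in dedup_exact(raw_reads))
print("raw:  ", dict(raw_counts))
print("dedup:", dict(dedup_counts))
```

In this toy data the two spike-ins swap rank after dedup, which is the kind of distortion that counting on raw reads avoids.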
File renamed without changes.
If you think this is ready and want to try it out, shouldn't you remove #DX_SKIP_WORKFLOW?
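For context, the `#DX_SKIP_WORKFLOW` marker suggests that the build tooling excludes marked workflows from DNAnexus validation or compilation. How the repository actually implements this is not shown on this page. The sketch below is one hypothetical way such a check could work; `should_skip` is an invented name and the "marker on its own line" convention is an assumption.

```python
SKIP_MARKER = "#DX_SKIP_WORKFLOW"


def should_skip(wdl_text):
    """Return True if any line of the WDL source is exactly the skip marker (assumed convention)."""
    return any(line.strip() == SKIP_MARKER for line in wdl_text.splitlines())


marked = '#DX_SKIP_WORKFLOW\n\nimport "tasks_demux.wdl" as demux\n'
unmarked = 'import "tasks_demux.wdl" as demux\n'
print(should_skip(marked), should_skip(unmarked))
```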