Implement cram chunking for PacBio and Nanopore #130

Merged on Oct 24, 2024 (39 commits; changes shown from 33 commits)
Commits (39)

- 52266c9 Merge pull request #109 from sanger-tol/dev (tkchafin, Aug 22, 2024)
- 0985352 apply chunking and parallelisation for align_pacbio and align_ont (reichan1998, Sep 24, 2024)
- 2d108e9 fix cannot allocate resource samtools_addreplcerg (reichan1998, Sep 24, 2024)
- 65145ce patch 1.3.1 (tkchafin, Sep 24, 2024)
- bfd2ceb Merge pull request #127 from tkchafin/patch (tkchafin, Sep 24, 2024)
- 35b26b7 pass interleaved fastq after cram conversion (tkchafin, Sep 24, 2024)
- 414ac2a Update align_short.nf (tkchafin, Sep 24, 2024)
- 0c96567 Update CHANGELOG.md (tkchafin, Sep 24, 2024)
- 5b4f685 Update nextflow.config (tkchafin, Sep 24, 2024)
- e4d4398 Update LICENSE (tkchafin, Sep 24, 2024)
- 86a3862 Update download_pipeline.yml (tkchafin, Sep 24, 2024)
- d882db6 Update linting.yml (tkchafin, Sep 24, 2024)
- e846b1d Update download_pipeline.yml (tkchafin, Sep 24, 2024)
- d8b32e9 Delete .github/workflows/download_pipeline.yml (tkchafin, Sep 24, 2024)
- 1db66a4 BBtools citation (tkchafin, Sep 24, 2024)
- 1ab5635 Merge pull request #129 from sanger-tol/tkchafin-patch-1 (tkchafin, Sep 24, 2024)
- 830a861 replace seqtk/subseq by bbmap/filterbyread to fix filtering step for … (reichan1998, Sep 25, 2024)
- c687234 Update CHANGELOG.md (tkchafin, Sep 25, 2024)
- 168dbfc Update conf/base.config (tkchafin, Sep 25, 2024)
- 2f85112 Update CITATIONS.md (tkchafin, Sep 25, 2024)
- 485bf9d prettier linting (tkchafin, Sep 25, 2024)
- da8899c Merge pull request #131 from tkchafin/patch (tkchafin, Sep 25, 2024)
- ab98d60 Update CHANGELOG.md (tkchafin, Sep 25, 2024)
- 54a11d2 prettier linting (tkchafin, Sep 25, 2024)
- 4796c8c Merge pull request #132 from tkchafin/patch (tkchafin, Sep 25, 2024)
- 4751ca5 Merge pull request #128 from sanger-tol/patch (tkchafin, Sep 25, 2024)
- 2de5ac4 fix editorconfig (reichan1998, Sep 26, 2024)
- 9942a92 update patch 1.3.1 (reichan1998, Sep 26, 2024)
- 9ab1daf fix EC (reichan1998, Sep 26, 2024)
- 1e76d93 fix accidental commit (reichan1998, Sep 26, 2024)
- 1873e42 Merge branch 'dev' into main (tkchafin, Oct 1, 2024)
- c39bb79 Merge pull request #8 from tkchafin/main (tkchafin, Oct 1, 2024)
- b089fa8 Merge branch 'cram_handling' into cram_handling (tkchafin, Oct 1, 2024)
- 5a83851 Change default to scale cpus (tkchafin, Oct 1, 2024)
- 9df8391 Update LICENSE (tkchafin, Oct 1, 2024)
- 5e6f2d0 remove the warning and the config entries for the older aligner calls (reichan1998, Oct 1, 2024)
- 815289d Merge branch 'cram_handling' of https://github.com/reichan1998/readma… (reichan1998, Oct 1, 2024)
- 83dd8cd add chunking before filtering for PacBio (reichan1998, Oct 15, 2024)
- 3cd2c01 Revert "add chunking before filtering for PacBio" (reichan1998, Oct 15, 2024)
88 changes: 0 additions & 88 deletions .github/workflows/download_pipeline.yml

This file was deleted.

2 changes: 1 addition & 1 deletion .github/workflows/linting.yml
@@ -19,7 +19,7 @@ jobs:
- uses: actions/setup-node@v3

- name: Install editorconfig-checker
run: npm install -g editorconfig-checker
run: npm install -g editorconfig-checker@3.0.2

- name: Run ECLint check
run: editorconfig-checker -exclude README.md $(find .* -type f | grep -v '.git\|.py\|.md\|json\|yml\|yaml\|html\|css\|work\|.nextflow\|build\|nf_core.egg-info\|log.txt\|Makefile')
12 changes: 12 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,18 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [[1.3.1](https://github.com/sanger-tol/readmapping/releases/tag/1.3.0)] - Antipodean Opaleye (patch 1) - [2024-09-24]

### Enhancements & fixes

- Fixed bug in handling CRAM HiC inputs introduced in 1.1.0
- Fixed bug in handling PacBio FASTQ inputs introduced in 1.3.0

| Dependency | Old version | New version |
| ---------- | ----------- | ----------- |
| `bbtools` | | 39.01 |
| `seqtk` | 1.4 | |

## [[1.3.0](https://github.com/sanger-tol/readmapping/releases/tag/1.3.0)] - Antipodean Opaleye - [2024-08-23]

### Enhancements & fixes
12 changes: 6 additions & 6 deletions CITATIONS.md
@@ -10,6 +10,10 @@

## Pipeline tools

- [BBTools](http://sourceforge.net/projects/bbmap/)

> Bushnell B. BBTools software package. 2014. http://sourceforge.net/projects/bbmap/

- [Blast](https://pubmed.ncbi.nlm.nih.gov/20003500/)

> Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. BLAST+: architecture and applications. BMC Bioinformatics. 2009 Dec 15;10:421. doi: 10.1186/1471-2105-10-421. PMID: 20003500; PMCID: PMC2803857.
@@ -18,7 +22,7 @@

> Vasimuddin Md, Misra S, Li H, Aluru S. Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems. 2019 IEEE International Parallel and Distributed Processing Symposium. 2019 May;314–24. doi: 10.1109/IPDPS.2019.00041.

- [CRUMBLE]
- [CRUMBLE](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6330002/)

> Bonfield JK, McCarthy SA, Durbin R. Crumble: reference free lossy compression of sequence quality values. Bioinformatics. 2019 Jan;35(2):337-339. doi: 10.1093/bioinformatics/bty608. PubMed PMID: 29992288; PMCID: PMC6330002.

@@ -30,14 +34,10 @@

> Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, Li H. Twelve years of SAMtools and BCFtools. Gigascience. 2021 Feb 16;10(2):giab008. doi: 10.1093/gigascience/giab008. PMID: 33590861; PMCID: PMC7931819.

- [SeqKit]
- [SeqKit](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5051824/)

> Shen W, Le S, Li Y, Hu F. SeqKit: A cross-platform and ultrafast toolkit for FASTA/Q file manipulation. PLoS One. 2016 Oct 5;11(10):e0163962. doi: 10.1371/journal.pone.0163962. PubMed PMID: 27706213; PMCID: PMC5051824.

- [Seqtk]

> Li H. Toolkit for processing sequences in FASTA/Q formats. GitHub Repository. 2012. https://github.com/lh3/seqtk. Accessed August 2024.

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

Copyright (c) @priyanka-surana
Copyright (c) 2022-2024 Genome Research Ltd.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
16 changes: 16 additions & 0 deletions conf/base.config
@@ -41,6 +41,11 @@ process {
memory = { check_max( ((meta.datatype == "pacbio_clr" || meta.datatype == "ont") ? 2.GB : 1.GB) * task.attempt, 'memory' ) }
}

// minimum 1GB memory
withName: 'BBMAP_FILTERBYNAME' {
memory = { check_max( 1.GB * task.attempt, 'memory' ) }
}

withName: 'SAMTOOLS_COLLATETOFASTA' {
cpus = { log_increase_cpus(4, 2*task.attempt, 1, 2) }
memory = { check_max( 1.GB * Math.ceil( meta.read_count / 1000000 ) * task.attempt, 'memory' ) }
@@ -58,6 +63,12 @@ process {
time = { check_max( 2.h * Math.ceil( meta.read_count / 100000000 ) * task.attempt / log_increase_cpus(2, 6*task.attempt, 1, 2), 'time' ) }
}

withName: SAMTOOLS_ADDREPLACERG {
cpus = { log_increase_cpus(2, 6*task.attempt, 1, 2) }
memory = { check_max( 4.GB + 850.MB * log_increase_cpus(2, 6*task.attempt, 1, 2) * task.attempt + 0.6.GB * Math.ceil( meta.read_count / 100000000 ), 'memory' ) }
time = { check_max( 2.h * Math.ceil( meta.read_count / 100000000 ) * task.attempt / log_increase_cpus(2, 6*task.attempt, 1, 2), 'time' ) }
}

withName: BLAST_BLASTN {
time = { check_max( 2.hour * Math.ceil( meta.read_count / 1000000 ) * task.attempt, 'time' ) }
memory = { check_max( 100.MB + 20.MB * Math.ceil( meta.read_count / 1000000 ) * task.attempt, 'memory' ) }
@@ -109,6 +120,11 @@ process {
memory = { check_max( 1.GB * Math.ceil( 30 * fasta.size() / 1e+9 ) * task.attempt, 'memory' ) }
}

withName: GENERATE_CRAM_CSV {
cpus = { check_max( 4 * task.attempt, 'cpus' ) }
memory = { check_max( 16.GB * task.attempt, 'memory' ) }
}

withName: CRUMBLE {
// No correlation between memory usage and the number of reads or the genome size.
// Most genomes seem happy with 1 GB, then some with 2 GB, then some with 5 GB.
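The resource closures added to `conf/base.config` in this PR are plain arithmetic on `task.attempt` and `meta.read_count`, so their effect can be checked outside Nextflow. The Python sketch below is only an illustration under stated assumptions: the CPU count that `log_increase_cpus(2, 6*task.attempt, 1, 2)` would return is passed in directly (that helper is defined elsewhere in the pipeline and is not modelled here), `check_max` is approximated by a hypothetical `min()` stand-in that caps a request at a configured maximum, and memory is expressed in MB using Nextflow's binary units (1 GB = 1024 MB). Note that, by operator precedence, only the 850 MB per-CPU term of the `SAMTOOLS_ADDREPLACERG` memory request grows with the retry attempt.

```python
import math

MB_PER_GB = 1024  # Nextflow memory units are binary: 1.GB == 1024.MB


def check_max(value, limit):
    """Hypothetical stand-in for check_max: cap a resource request at a maximum."""
    return min(value, limit)


def read_blocks(read_count: int) -> int:
    """Number of started blocks of 100 million reads (Math.ceil(read_count / 1e8))."""
    return math.ceil(read_count / 100_000_000)


def addreplacerg_memory_mb(cpus: int, attempt: int, read_count: int) -> float:
    """SAMTOOLS_ADDREPLACERG memory before capping: 4 GB + 850 MB * cpus * attempt + 0.6 GB per block."""
    return 4 * MB_PER_GB + 850 * cpus * attempt + 0.6 * MB_PER_GB * read_blocks(read_count)


def addreplacerg_time_h(cpus: int, attempt: int, read_count: int) -> float:
    """SAMTOOLS_ADDREPLACERG time before capping: 2 h per block, times attempt, divided by cpus."""
    return 2 * read_blocks(read_count) * attempt / cpus


def generate_cram_csv_request(attempt: int, max_cpus: int, max_memory_gb: int):
    """GENERATE_CRAM_CSV: 4 CPUs and 16 GB per attempt, each capped by the stand-in check_max."""
    return check_max(4 * attempt, max_cpus), check_max(16 * attempt, max_memory_gb)


if __name__ == "__main__":
    # Hypothetical sample: 250 M reads, 2 CPUs, first attempt.
    print(addreplacerg_memory_mb(cpus=2, attempt=1, read_count=250_000_000))
    print(addreplacerg_time_h(cpus=2, attempt=1, read_count=250_000_000))
    # Hypothetical limits of 8 CPUs and 32 GB.
    for attempt in (1, 2, 3):
        print(attempt, generate_cram_csv_request(attempt, max_cpus=8, max_memory_gb=32))
```

For 250 million reads on 2 CPUs, the first attempt asks for about 7639 MB (4096 + 1700 + 3 × 614.4) and 3 hours. With the hypothetical 8 CPU / 32 GB limits, `GENERATE_CRAM_CSV` requests (4, 16) on the first attempt and is capped at (8, 32) from the second attempt onward.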
28 changes: 28 additions & 0 deletions conf/modules.config
tkchafin marked this conversation as resolved.
@@ -16,6 +16,10 @@ process {
ext.args = '-F 0x200 -nt'
}

withName: BBMAP_FILTERBYNAME {
ext.args = 'include=f'
}

withName: SAMTOOLS_MERGE {
beforeScript = { "export REF_PATH=spoof"}
ext.args = { "-c -p" }
@@ -107,6 +111,30 @@ process {
ext.args = { "-ax map-ont -R ${meta.read_group} -I" + Math.ceil(meta2.genome_size/1e9) + 'G' }
}

withName: ".*:ALIGN_HIFI:.*:CRAM_FILTER_MINIMAP2_FILTER5END_FIXMATE_SORT" {
ext.args = ""
ext.args1 = { "-F 0x200 -nt" }
ext.args2 = { "-ax map-hifi --cs=short -I" + Math.ceil(meta.genome_size/1e9) + 'G' }
ext.args3 = "-mpu"
ext.args4 = { "--write-index -l1" }
}

withName: ".*:ALIGN_CLR:.*:CRAM_FILTER_MINIMAP2_FILTER5END_FIXMATE_SORT" {
ext.args = ""
ext.args1 = { "-F 0x200 -nt" }
ext.args2 = { "-ax map-pb -I" + Math.ceil(meta.genome_size/1e9) + 'G' }
ext.args3 = "-mpu"
ext.args4 = { "--write-index -l1" }
}

withName: ".*:ALIGN_ONT:.*:CRAM_FILTER_MINIMAP2_FILTER5END_FIXMATE_SORT" {
ext.args = ""
ext.args1 = { "-F 0x200 -nt" }
ext.args2 = { "-ax map-ont -I" + Math.ceil(meta.genome_size/1e9) + 'G' }
ext.args3 = "-mpu"
ext.args4 = { "--write-index -l1" }
}

withName: '.*:CONVERT_STATS:SAMTOOLS_CRAM' {
beforeScript = { "export REF_PATH=spoof"}
ext.prefix = { "${fasta.baseName}.${meta.datatype}.${meta.id}" }
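The three new `CRAM_FILTER_MINIMAP2_FILTER5END_FIXMATE_SORT` entries differ mainly in the minimap2 preset passed through `ext.args2` (`map-hifi --cs=short`, `map-pb`, `map-ont`), and all size minimap2's `-I` option (bases loaded per index part) from `meta.genome_size` rounded up to whole gigabases, presumably so the whole reference fits in a single index part. The Python sketch below rebuilds that string; the dictionary keys are hypothetical labels named after the subworkflows, and the function is illustrative rather than part of the pipeline.

```python
import math

# Hypothetical labels for the three subworkflows; presets copied from ext.args2 above.
PRESETS = {
    "ALIGN_HIFI": "map-hifi --cs=short",
    "ALIGN_CLR": "map-pb",
    "ALIGN_ONT": "map-ont",
}


def minimap2_args(subworkflow: str, genome_size: int) -> str:
    """Rebuild the ext.args2 string for a subworkflow and a genome size in bases."""
    # Round up to whole gigabases; minimap2 reads the 'G' suffix as 1e9 bases.
    index_gb = math.ceil(genome_size / 1e9)
    return f"-ax {PRESETS[subworkflow]} -I{index_gb}G"


if __name__ == "__main__":
    # Hypothetical 2.5 Gb genome.
    print(minimap2_args("ALIGN_HIFI", 2_500_000_000))
```

One difference to keep in mind: in the Groovy closure, `Math.ceil` returns a double, so the interpolated value is likely rendered as `3.0G` rather than `3G`; minimap2 should parse either form as a number followed by a size suffix.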
10 changes: 5 additions & 5 deletions modules.json
@@ -5,6 +5,11 @@
"https://github.com/nf-core/modules.git": {
"modules": {
"nf-core": {
"bbmap/filterbyname": {
"branch": "master",
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["modules"]
},
"blast/blastn": {
"branch": "master",
"git_sha": "583edaf97c9373a20df05a3b7be5a6677f9cd719",
@@ -91,11 +96,6 @@
"git_sha": "03fbf6c89e551bd8d77f3b751fb5c955f75b34c5",
"installed_by": ["modules"]
},
"seqtk/subseq": {
"branch": "master",
"git_sha": "730f3aee80d5f8d0b5fc532202ac59361414d006",
"installed_by": ["modules"]
},
"untar": {
"branch": "master",
"git_sha": "4e5f4687318f24ba944a13609d3ea6ebd890737d",
5 changes: 5 additions & 0 deletions modules/nf-core/bbmap/filterbyname/environment.yml

71 changes: 71 additions & 0 deletions modules/nf-core/bbmap/filterbyname/main.nf
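The newly installed `bbmap/filterbyname` module is configured in `conf/modules.config` with `ext.args = 'include=f'`. For BBTools' `filterbyname.sh`, `include=f` means reads whose names appear in the supplied list are excluded rather than kept. The Python sketch below mimics that behaviour on an in-memory FASTQ as an illustration only; it assumes names are compared on the read ID up to the first whitespace, which is a simplification (BBTools has its own name-matching options), and all function names are hypothetical.

```python
from typing import Iterable, Iterator, Set, Tuple

# A FASTQ record as (header, sequence, separator, qualities).
FastqRecord = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Group FASTQ lines into 4-line records (assumes well-formed input, no validation)."""
    it = iter(lines)
    for header in it:
        seq, plus, qual = next(it), next(it), next(it)
        yield header.rstrip("\n"), seq.rstrip("\n"), plus.rstrip("\n"), qual.rstrip("\n")


def read_id(header: str) -> str:
    """Read ID: the header without its leading '@', up to the first whitespace."""
    return header[1:].split()[0]


def filter_by_name(
    records: Iterable[FastqRecord], names: Set[str], include: bool = False
) -> Iterator[FastqRecord]:
    """With include=False (like include=f), drop listed reads; with include=True, keep only them."""
    for record in records:
        listed = read_id(record[0]) in names
        if listed == include:
            yield record


if __name__ == "__main__":
    fastq = ["@r1 sample", "ACGT", "+", "IIII", "@r2", "GGCC", "+", "IIII"]
    for record in filter_by_name(read_fastq(fastq), {"r2"}):
        print(record[0])
```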
