Dsl2 add sharding of fastqs before alignment #1023

shyama-mama · 2023-08-25T10:09:34Z

PR checklist

Added functionality to shard FASTQs before alignment, using SeqKit. The shards are merged back together during the lane merge step.
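As a rough illustration of the idea (this is not the SeqKit implementation and not the pipeline's code), sharding can be thought of as cutting a FASTQ into chunks of a fixed number of reads, handling each chunk independently, and concatenating the results afterwards. The function names and the `reads_per_shard` argument below are hypothetical and chosen only for this sketch.

```python
from itertools import islice
from typing import Iterable, Iterator, List

Record = List[str]  # one FASTQ record: header, sequence, '+', qualities


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records from an iterable of lines."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield [line.rstrip("\n") for line in record]


def shard_records(records: Iterable[Record], reads_per_shard: int) -> Iterator[List[Record]]:
    """Yield consecutive shards holding at most `reads_per_shard` records each."""
    if reads_per_shard < 1:
        raise ValueError("reads_per_shard must be at least 1")
    it = iter(records)
    while True:
        shard = list(islice(it, reads_per_shard))
        if not shard:
            return
        yield shard


def merge_shards(shards: Iterable[List[Record]]) -> List[Record]:
    """Concatenate shards back into one list, preserving the original order."""
    return [record for shard in shards for record in shard]


# Toy input: five reads, split into shards of two reads each.
toy_lines: List[str] = []
for i in range(5):
    toy_lines += [f"@read{i}", "ACGT", "+", "IIII"]

toy_shards = list(shard_records(read_fastq(toy_lines), reads_per_shard=2))
print([len(shard) for shard in toy_shards])  # [2, 2, 1]
```

For paired-end data, both mate files would need to be cut at the same read boundaries so that shard i of one file lines up with shard i of the other.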

shyama-mama · 2023-08-25T10:37:26Z

I'm looking into the bug here. It seems to occur when --skip_preprocessing is used.

TCLamnidis

Good progress! 🚀 Just a few small things:

- Check sharded BAM headers. If they are full of RGs for each shard, we might want to tweak that so the RGs are still merged into one per lane, as if sharding did not happen. My reasoning is that sharding is a computational optimisation, but does not alter the result itself. @jfy133 @shyama-mama your thoughts?
- Rename sharding parameters (can be done once code is mostly done).
- Potentially superfluous module include for SAMTOOLS_MERGE_SHARDS.
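One way to act on the header check in the first point is to list the `@RG` IDs found in a merged BAM's header text (for example, as printed by `samtools view -H`). The sketch below works on plain header text only; the function names and the example read-group IDs are hypothetical, and it assumes standard tab-separated SAM header lines.

```python
from typing import List


def read_group_ids(header_text: str) -> List[str]:
    """Return the ID of every @RG line in a SAM header, in order of appearance."""
    ids: List[str] = []
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs after the record type.
        fields = dict(
            field.split(":", 1) for field in line.split("\t")[1:] if ":" in field
        )
        if "ID" in fields:
            ids.append(fields["ID"])
    return ids


def has_single_read_group(header_text: str) -> bool:
    """True when the header declares exactly one distinct read group ID."""
    return len(set(read_group_ids(header_text))) == 1


# Hypothetical header in which each shard kept its own read group.
example_header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@RG\tID:lane1_shard1\tSM:sample1\n"
    "@RG\tID:lane1_shard2\tSM:sample1\n"
)
print(read_group_ids(example_header))  # ['lane1_shard1', 'lane1_shard2']
print(has_single_read_group(example_header))  # False
```

If a check like this reports more than one ID per lane, that would indicate the per-shard RGs survived the merge.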

conf/modules.config

CHANGELOG.md

nextflow.config

nextflow_schema.json

subworkflows/local/map.nf

github-actions · 2023-08-25T11:15:05Z

`nf-core lint` overall result: Passed ✅ ⚠️

Posted for pipeline commit 6341f02

✅ 159 tests passed
❗ 20 tests had warnings

❗ Test warnings:

pipeline_todos - TODO string in README.md: Include a figure that guides the user through the major workflow steps. Many nf-core
pipeline_todos - TODO string in README.md: Fill in short bullet-pointed list of the default steps in the pipeline
pipeline_todos - TODO string in main.nf: Remove this line if you don't need a FASTA file
pipeline_todos - TODO string in nextflow.config: Specify your pipeline's command line flags
pipeline_todos - TODO string in ci.yml: You can customise CI pipeline run tests as required
pipeline_todos - TODO string in awsfulltest.yml: You can customise AWS full pipeline tests as required
pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
pipeline_todos - TODO string in test_humanbam.config: Specify the paths to your test data on nf-core/test-datasets
pipeline_todos - TODO string in test_humanbam.config: Give any required params for the test so that command line flags are not needed
pipeline_todos - TODO string in test_full.config: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA)
pipeline_todos - TODO string in test_full.config: Give any required params for the test so that command line flags are not needed
pipeline_todos - TODO string in test.config: Specify the paths to your test data on nf-core/test-datasets
pipeline_todos - TODO string in test.config: Give any required params for the test so that command line flags are not needed
pipeline_todos - TODO string in base.config: Check the defaults for all processes
pipeline_todos - TODO string in base.config: Customise requirements for specific processes.
pipeline_todos - TODO string in usage.md: Add documentation about anything specific to running your pipeline. For general topics, please point to (and add to) the main nf-core website.
pipeline_todos - TODO string in WorkflowMain.groovy: Add Zenodo DOI for pipeline after first release
pipeline_todos - TODO string in WorkflowEager.groovy: Optionally add in-text citation tools to this list.
schema_description - No description provided in schema for parameter: skip_qualimap
schema_description - No description provided in schema for parameter: skip_damage_calculation

✅ Tests passed:

files_exist - File found: .gitattributes
files_exist - File found: .gitignore
files_exist - File found: .nf-core.yml
files_exist - File found: .editorconfig
files_exist - File found: .prettierignore
files_exist - File found: .prettierrc.yml
files_exist - File found: CHANGELOG.md
files_exist - File found: CITATIONS.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_exist - File found: nextflow_schema.json
files_exist - File found: nextflow.config
files_exist - File found: README.md
files_exist - File found: .github/.dockstore.yml
files_exist - File found: .github/CONTRIBUTING.md
files_exist - File found: .github/ISSUE_TEMPLATE/bug_report.yml
files_exist - File found: .github/ISSUE_TEMPLATE/config.yml
files_exist - File found: .github/ISSUE_TEMPLATE/feature_request.yml
files_exist - File found: .github/PULL_REQUEST_TEMPLATE.md
files_exist - File found: .github/workflows/branch.yml
files_exist - File found: .github/workflows/ci.yml
files_exist - File found: .github/workflows/linting_comment.yml
files_exist - File found: .github/workflows/linting.yml
files_exist - File found: assets/email_template.html
files_exist - File found: assets/email_template.txt
files_exist - File found: assets/sendmail_template.txt
files_exist - File found: assets/nf-core-eager_logo_light.png
files_exist - File found: conf/modules.config
files_exist - File found: conf/test.config
files_exist - File found: conf/test_full.config
files_exist - File found: docs/images/nf-core-eager_logo_light.png
files_exist - File found: docs/images/nf-core-eager_logo_dark.png
files_exist - File found: docs/output.md
files_exist - File found: docs/README.md
files_exist - File found: docs/README.md
files_exist - File found: docs/usage.md
files_exist - File found: lib/nfcore_external_java_deps.jar
files_exist - File found: lib/NfcoreTemplate.groovy
files_exist - File found: lib/Utils.groovy
files_exist - File found: lib/WorkflowMain.groovy
files_exist - File found: main.nf
files_exist - File found: assets/multiqc_config.yml
files_exist - File found: conf/base.config
files_exist - File found: conf/igenomes.config
files_exist - File found: .github/workflows/awstest.yml
files_exist - File found: .github/workflows/awsfulltest.yml
files_exist - File found: lib/WorkflowEager.groovy
files_exist - File found: modules.json
files_exist - File found: pyproject.toml
files_exist - File not found check: Singularity
files_exist - File not found check: parameters.settings.json
files_exist - File not found check: pipeline_template.yml
files_exist - File not found check: .nf-core.yaml
files_exist - File not found check: bin/markdown_to_html.r
files_exist - File not found check: conf/aws.config
files_exist - File not found check: .github/workflows/push_dockerhub.yml
files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
files_exist - File not found check: docs/images/nf-core-eager_logo.png
files_exist - File not found check: .markdownlint.yml
files_exist - File not found check: .yamllint.yml
files_exist - File not found check: lib/Checks.groovy
files_exist - File not found check: lib/Completion.groovy
files_exist - File not found check: lib/Workflow.groovy
files_exist - File not found check: .travis.yml
nextflow_config - Config variable found: manifest.name
nextflow_config - Config variable found: manifest.nextflowVersion
nextflow_config - Config variable found: manifest.description
nextflow_config - Config variable found: manifest.version
nextflow_config - Config variable found: manifest.homePage
nextflow_config - Config variable found: timeline.enabled
nextflow_config - Config variable found: trace.enabled
nextflow_config - Config variable found: report.enabled
nextflow_config - Config variable found: dag.enabled
nextflow_config - Config variable found: process.cpus
nextflow_config - Config variable found: process.memory
nextflow_config - Config variable found: process.time
nextflow_config - Config variable found: params.outdir
nextflow_config - Config variable found: params.input
nextflow_config - Config variable found: params.validationShowHiddenParams
nextflow_config - Config variable found: params.validationSchemaIgnoreParams
nextflow_config - Config variable found: manifest.mainScript
nextflow_config - Config variable found: timeline.file
nextflow_config - Config variable found: trace.file
nextflow_config - Config variable found: report.file
nextflow_config - Config variable found: dag.file
nextflow_config - Config variable (correctly) not found: params.nf_required_version
nextflow_config - Config variable (correctly) not found: params.container
nextflow_config - Config variable (correctly) not found: params.singleEnd
nextflow_config - Config variable (correctly) not found: params.igenomesIgnore
nextflow_config - Config variable (correctly) not found: params.name
nextflow_config - Config variable (correctly) not found: params.enable_conda
nextflow_config - Config timeline.enabled had correct value: true
nextflow_config - Config report.enabled had correct value: true
nextflow_config - Config trace.enabled had correct value: true
nextflow_config - Config dag.enabled had correct value: true
nextflow_config - Config manifest.name began with nf-core/
nextflow_config - Config variable manifest.homePage began with https://github.com/nf-core/
nextflow_config - Config dag.file ended with .html
nextflow_config - Config variable manifest.nextflowVersion started with >= or !>=
nextflow_config - Config manifest.version ends in dev: 3.0.0dev
nextflow_config - Config params.custom_config_version is set to master
nextflow_config - Config params.custom_config_base is set to https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Lines for loading custom profiles found
files_unchanged - .gitattributes matches the template
files_unchanged - .prettierrc.yml matches the template
files_unchanged - CODE_OF_CONDUCT.md matches the template
files_unchanged - LICENSE matches the template
files_unchanged - .github/.dockstore.yml matches the template
files_unchanged - .github/CONTRIBUTING.md matches the template
files_unchanged - .github/ISSUE_TEMPLATE/bug_report.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/config.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/feature_request.yml matches the template
files_unchanged - .github/PULL_REQUEST_TEMPLATE.md matches the template
files_unchanged - .github/workflows/branch.yml matches the template
files_unchanged - .github/workflows/linting_comment.yml matches the template
files_unchanged - .github/workflows/linting.yml matches the template
files_unchanged - assets/email_template.html matches the template
files_unchanged - assets/email_template.txt matches the template
files_unchanged - assets/sendmail_template.txt matches the template
files_unchanged - assets/nf-core-eager_logo_light.png matches the template
files_unchanged - docs/images/nf-core-eager_logo_light.png matches the template
files_unchanged - docs/images/nf-core-eager_logo_dark.png matches the template
files_unchanged - docs/README.md matches the template
files_unchanged - lib/nfcore_external_java_deps.jar matches the template
files_unchanged - lib/NfcoreTemplate.groovy matches the template
files_unchanged - .gitignore matches the template
files_unchanged - .prettierignore matches the template
files_unchanged - pyproject.toml matches the template
actions_ci - '.github/workflows/ci.yml' is triggered on expected events
actions_ci - '.github/workflows/ci.yml' checks minimum NF version
actions_awstest - '.github/workflows/awstest.yml' is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml does not use -profile test
readme - README Nextflow minimum version badge matched config. Badge: 23.04.0, Config: 23.04.0
readme - README Zenodo placeholder was replaced with DOI.
pipeline_name_conventions - Name adheres to nf-core convention
template_strings - Did not find any Jinja template strings (192 files)
schema_lint - Schema lint passed
schema_lint - Schema title + description lint passed
schema_lint - Input mimetype lint passed: 'text/csv'
schema_params - Schema matched params returned from nextflow config
system_exit - No System.exit calls found
actions_schema_validation - Workflow validation passed: branch.yml
actions_schema_validation - Workflow validation passed: release-announcments.yml
actions_schema_validation - Workflow validation passed: linting_comment.yml
actions_schema_validation - Workflow validation passed: ci.yml
actions_schema_validation - Workflow validation passed: awstest.yml
actions_schema_validation - Workflow validation passed: clean-up.yml
actions_schema_validation - Workflow validation passed: linting.yml
actions_schema_validation - Workflow validation passed: awsfulltest.yml
actions_schema_validation - Workflow validation passed: fix-linting.yml
merge_markers - No merge markers found in pipeline files
modules_json - Only installed modules found in modules.json
multiqc_config - 'assets/multiqc_config.yml' follows the ordering scheme of the minimally required plugins.
multiqc_config - 'assets/multiqc_config.yml' contains a matching 'report_comment'.
multiqc_config - 'assets/multiqc_config.yml' contains 'export_plots: true'.
modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'

Run details

nf-core/tools version 2.10
Run at 2023-11-03 15:04:20

TCLamnidis

Minor changes to parameter docs in schema.
Assigning ch_input_for_mapping without overwriting reads channel.

nextflow_schema.json

subworkflows/local/map.nf

shyama-mama · 2023-09-29T08:50:52Z

I'm not sure what the linting issue is here, and I was unable to fix it automatically:
multiqc_config: 'assets/multiqc_config.yml' does not contain a matching 'report_comment'.

TCLamnidis · 2023-10-27T09:37:28Z

@shyama-mama I think the issue with the linting had to do with the new template that has been merged now. Could you resolve the conflicts? Then I can review again and we can merge this, hopefully 🦾 😄

TCLamnidis

LGTM.
One minor comment about the groupTuple call: I'm not sure if it is needed or not. But if it works it works.
Only outstanding thing is to add tests in the CI for sharding. Could you please activate sharding for one of the CI jobs with a small number of reads per shard (1000)? File is at .github/workflows/ci.yml
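For context on the groupTuple comment above: Nextflow's `groupTuple` operator collects tuples that share a key into a single tuple per key. A plain-Python analogue is sketched below, using hypothetical sample/lane keys and shard file names. It only illustrates the grouping behaviour and does not settle whether the call is needed at that point in the subworkflow.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def group_by_key(items: Iterable[Tuple[Hashable, str]]) -> Dict[Hashable, List[str]]:
    """Collect values that share a key, keeping their arrival order."""
    grouped: Dict[Hashable, List[str]] = defaultdict(list)
    for key, value in items:
        grouped[key].append(value)
    return dict(grouped)


# Hypothetical per-shard BAMs keyed by (sample, lane).
shard_bams = [
    (("sample1", "lane1"), "sample1_L1.part_001.bam"),
    (("sample2", "lane1"), "sample2_L1.part_001.bam"),
    (("sample1", "lane1"), "sample1_L1.part_002.bam"),
]

for key, bams in group_by_key(shard_bams).items():
    # Each group is what a per-lane merge step would receive as input.
    print(key, bams)
```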

subworkflows/local/map.nf

shyama-mama · 2023-11-03T09:33:06Z

Would I have to add the test in this branch and push changes for the test to run here? Should I also add tests for sharding with bowtie2 and bwamem options for mapping?

TCLamnidis · 2023-11-03T14:48:55Z

Add the test here and push, and it will run on this branch, yes.
But please don't add a new test, just add sharding to any single mapper. We're trying to minimise the number of test commands run because things got out of hand in eager 2.*

shyama-mama · 2023-11-03T14:56:10Z

@TCLamnidis Actually I realised I've got sharding enabled with 5000 reads as part of the 'test' profile in the config in this branch. So it has been running for all these tests. Would you rather have it added explicitly in the ci.yml with one of the tests?

TCLamnidis · 2023-11-03T14:59:42Z

Oh right! Skimmed over that >.< Never mind me then. Undo the ci.yml changes! All good.

TCLamnidis

Feel free to merge once the CI changes are amended!

shyama-mama added 4 commits July 3, 2023 11:19

adding seqkit to eager for sharding

f9dde7a

Adding Sharding of FASTQs before alignment using SeqKit

eab4b48

Added Sharding into the test config

6e13057

Adding sharding to changelog

21a246d

shyama-mama requested review from jfy133 and TCLamnidis August 25, 2023 10:09

Merge branch 'dev' into dsl2-add-sharding-of-fastqs-before-alignment

fe408ea

TCLamnidis reviewed Aug 25, 2023

shyama-mama added 2 commits August 25, 2023 20:42

Bug fix for pair end sharding and Formating changes for prettify

d7a711d

Merge branch 'dsl2-add-sharding-of-fastqs-before-alignment' of https://github.com/nf-core/eager into dsl2-add-sharding-of-fastqs-before-alignment

29a3b3a

Changing param name from shard_fastq to run_fastq_sharding

efd07f2

shyama-mama requested a review from TCLamnidis August 25, 2023 14:07

TCLamnidis requested changes Sep 8, 2023

shyama-mama added 2 commits September 29, 2023 18:08

Minor changes to parameter docs in schema and Assigning ch_input_for_mapping without overwriting reads channel.

4810578

Linting fixes

a2310a5

Merge branch 'dev' into dsl2-add-sharding-of-fastqs-before-alignment

5bfb860

Merge branch 'dev' into dsl2-add-sharding-of-fastqs-before-alignment

763ec6f

TCLamnidis requested changes Nov 3, 2023

shyama-mama added 2 commits November 4, 2023 01:08

Adding testing for sharding

e0c6912

Merge branch 'dsl2-add-sharding-of-fastqs-before-alignment' of https://github.com/nf-core/eager into dsl2-add-sharding-of-fastqs-before-alignment

bc6f1a2

TCLamnidis approved these changes Nov 3, 2023

Removing ci test since it is already part of test config

6341f02

shyama-mama merged commit d421158 into dev Nov 11, 2023
18 checks passed

shyama-mama deleted the dsl2-add-sharding-of-fastqs-before-alignment branch November 11, 2023 02:13

jfy133 mentioned this pull request Mar 15, 2024

Chunking and stitching to parallelise alignment #797

Closed
