Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dsl2 add sharding of fastqs before alignment #1023

Merged
merged 15 commits into from
Nov 11, 2023

Conversation

shyama-mama
Copy link

PR checklist

Added functionality to shard fastqs before aligning. This uses SeqKit. The sharded fastqs are merged together during the lane merge step.

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
    • If you've added a new tool - add to the software_versions process and a regex to scrape_software_versions.py
    • If you've added a new tool - have you followed the pipeline conventions in the [contribution docs](https://github.com/nf-core/eager/tree/master/.github/CONTRIBUTING.md)
    • If necessary, also make a PR on the nf-core/eager branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint .).
  • Ensure the test suite passes (nextflow run . -profile test,docker).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@shyama-mama
Copy link
Author

I'm looking into the bug here. Seems to be when --skip_preprocessing is used.

Copy link
Collaborator

@TCLamnidis TCLamnidis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good progress! 🚀 Just a few small things:

  • Check sharded BAM headers. If they are full of RGs for each shard, we might want to tweak that so the RGs are still merged into one per lane as if sharding did not happen. My reasoning is that sharding is a computational optimisation, but does not alter the result itself. @jfy133 @shyama-mama your thoughts?
  • Rename sharding parameters (can be done once code is mostly done).
  • Potentially superfluous module include for SAMTOOLS_MERGE_SHARDS

conf/modules.config Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
nextflow.config Outdated Show resolved Hide resolved
nextflow_schema.json Outdated Show resolved Hide resolved
subworkflows/local/map.nf Outdated Show resolved Hide resolved
@github-actions
Copy link

github-actions bot commented Aug 25, 2023

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 6341f02

+| ✅ 159 tests passed       |+
!| ❗  20 tests had warnings |!

❗ Test warnings:

  • pipeline_todos - TODO string in README.md: Include a figure that guides the user through the major workflow steps. Many nf-core
  • pipeline_todos - TODO string in README.md: Fill in short bullet-pointed list of the default steps in the pipeline
  • pipeline_todos - TODO string in main.nf: Remove this line if you don't need a FASTA file
  • pipeline_todos - TODO string in nextflow.config: Specify your pipeline's command line flags
  • pipeline_todos - TODO string in ci.yml: You can customise CI pipeline run tests as required
  • pipeline_todos - TODO string in awsfulltest.yml: You can customise AWS full pipeline tests as required
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
  • pipeline_todos - TODO string in test_humanbam.config: Specify the paths to your test data on nf-core/test-datasets
  • pipeline_todos - TODO string in test_humanbam.config: Give any required params for the test so that command line flags are not needed
  • pipeline_todos - TODO string in test_full.config: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA)
  • pipeline_todos - TODO string in test_full.config: Give any required params for the test so that command line flags are not needed
  • pipeline_todos - TODO string in test.config: Specify the paths to your test data on nf-core/test-datasets
  • pipeline_todos - TODO string in test.config: Give any required params for the test so that command line flags are not needed
  • pipeline_todos - TODO string in base.config: Check the defaults for all processes
  • pipeline_todos - TODO string in base.config: Customise requirements for specific processes.
  • pipeline_todos - TODO string in usage.md: Add documentation about anything specific to running your pipeline. For general topics, please point to (and add to) the main nf-core website.
  • pipeline_todos - TODO string in WorkflowMain.groovy: Add Zenodo DOI for pipeline after first release
  • pipeline_todos - TODO string in WorkflowEager.groovy: Optionally add in-text citation tools to this list.
  • schema_description - No description provided in schema for parameter: skip_qualimap
  • schema_description - No description provided in schema for parameter: skip_damage_calculation

✅ Tests passed:

Run details

  • nf-core/tools version 2.10
  • Run at 2023-11-03 15:04:20

@shyama-mama shyama-mama requested a review from TCLamnidis August 25, 2023 14:07
Copy link
Collaborator

@TCLamnidis TCLamnidis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Minor changes to parameter docs in schema.
  • Assigning ch_input_for_mapping without overwriting reads channel.

nextflow_schema.json Outdated Show resolved Hide resolved
nextflow_schema.json Outdated Show resolved Hide resolved
nextflow_schema.json Outdated Show resolved Hide resolved
subworkflows/local/map.nf Outdated Show resolved Hide resolved
@shyama-mama
Copy link
Author

Not sure what the issue here is with linting. Unable to fix it automatically.
multiqc_config: 'assets/multiqc_config.yml' does not contain a matching 'report_comment'.

@TCLamnidis
Copy link
Collaborator

@shyama-mama I think the issue with the linting had to do with the new template that has been merged now. Could you resolve the conflicts? Then I can review again and we merge this, hopefully 🦾 😄

Copy link
Collaborator

@TCLamnidis TCLamnidis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
One minor comment about the groupTuple call: I'm not sure if it is needed or not. But if it works it works.
Only outstanding thing is to add tests in the CI for sharding. Could you please activate sharding for one of the CI jobs with a small number of reads per shard (1000)? File is at .github/workflows/ci.yml

subworkflows/local/map.nf Show resolved Hide resolved
@shyama-mama
Copy link
Author

shyama-mama commented Nov 3, 2023

Would I have to add the test in this branch and push changes for the test to run here? Should I also add tests for sharding with bowtie2 and bwamem options for mapping?

@TCLamnidis
Copy link
Collaborator

Add the test here, and push and it will run on this branch, yes.
But please don't add a new test, jut add sharding to any single mapper. We're trying to minimise the number of test commands ran cause things got out of hand in eager 2.*

@shyama-mama
Copy link
Author

@TCLamnidis Actually I realised I've got sharding enabled with 5000 reads as part of the 'test' profile in the config in this branch. So it has been running for all these tests. Would you rather have it added explicitly in the ci.yml with one of the tests?

@TCLamnidis
Copy link
Collaborator

oh right! Skimmed over that >.< nevermind me then. undo the ci.yml changes! all good

Copy link
Collaborator

@TCLamnidis TCLamnidis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to merge once the CI changes are amended!

@shyama-mama shyama-mama merged commit d421158 into dev Nov 11, 2023
18 checks passed
@shyama-mama shyama-mama deleted the dsl2-add-sharding-of-fastqs-before-alignment branch November 11, 2023 02:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants