Dsl2 add sharding of fastqs before alignment #1023
Conversation
I'm looking into the bug here. Seems to be when `--skip_preprocessing` is used.
Good progress! 🚀 Just a few small things:
- Check sharded BAM headers. If they are full of RGs for each shard, we might want to tweak that so the RGs are still merged into one per lane, as if sharding did not happen (see the sketch after this list). My reasoning is that sharding is a computational optimisation, but does not alter the result itself. @jfy133 @shyama-mama your thoughts?
- Rename the sharding parameters (can be done once the code is mostly done).
- Potentially superfluous module include for `SAMTOOLS_MERGE_SHARDS`.
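For illustration, a minimal sketch of how the shard merge could keep a single read group per lane, assuming the module is the `SAMTOOLS_MERGE_SHARDS` mentioned above and that all shards of a lane are aligned with the same RG ID (both assumptions, not necessarily what this PR does):

```nextflow
// Hypothetical sketch only. `samtools merge -c` collapses @RG lines with
// identical IDs (and -p does the same for @PG), so the merged BAM header
// ends up with one RG per lane, as if sharding had never happened.
process SAMTOOLS_MERGE_SHARDS {
    input:
    tuple val(meta), path(bams)                       // all shard-level BAMs of one lane

    output:
    tuple val(meta), path("${meta.id}_merged.bam"), emit: bam

    script:
    """
    samtools merge -c -p ${meta.id}_merged.bam ${bams}
    """
}
```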
- Minor changes to parameter docs in the schema.
- Assigning `ch_input_for_mapping` without overwriting the `reads` channel (see the sketch below).
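As a rough sketch of that pattern (channel, module, and parameter names here are placeholders, not necessarily the ones used in the PR):

```nextflow
// Hypothetical sketch: build the mapping input as a new channel instead of
// reassigning the existing `reads` channel, so later uses of `reads` still
// see the original, unsharded input.
ch_input_for_mapping = params.run_sharding
    ? SEQKIT_SPLIT2.out.reads    // per-shard FASTQs (placeholder module name)
    : ch_reads                   // original FASTQs  (placeholder channel name)
```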
Not sure what the issue with linting is here. I'm unable to fix it automatically.
@shyama-mama I think the issue with the linting had to do with the new template, which has now been merged. Could you resolve the conflicts? Then I can review again and hopefully we merge this 🦾 😄
LGTM.
One minor comment about the `groupTuple` call: I'm not sure if it is needed or not. But if it works, it works.
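For context, a `groupTuple` at this point would typically be there to collect all shard-level BAMs that share a merge key before the merge step; a rough sketch with placeholder channel names:

```nextflow
// Hypothetical sketch: gather the per-shard BAMs belonging to the same
// library/lane into a single tuple before merging them back together.
ch_shard_bams
    .map { meta, bam ->
        def merge_key = meta.findAll { it.key != 'shard' }   // drop the per-shard index from the key
        [ merge_key, bam ]
    }
    .groupTuple()                                            // -> [ merge_key, [ shard1.bam, shard2.bam, ... ] ]
    .set { ch_bams_for_merging }
```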
Only outstanding thing is to add CI tests for sharding. Could you please activate sharding for one of the CI jobs with a small number of reads per shard (1000)? The file is at `.github/workflows/ci.yml`.
Would I have to add the test in this branch and push the changes for the test to run here? Should I also add tests for sharding with the bowtie2 and bwamem mapping options?
Add the test here and push, and it will run on this branch, yes.
@TCLamnidis Actually, I realised I've got sharding enabled with 5000 reads as part of the 'test' profile in the config in this branch, so it has been running for all these tests. Would you rather have it added explicitly in `ci.yml` with one of the tests?
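For reference, a hypothetical sketch of what such a test-profile setting might look like in `conf/test.config` (parameter names are placeholders; as noted above they may still be renamed):

```nextflow
// Hypothetical parameter names, for illustration only.
params {
    run_sharding = true   // enable sharding of FASTQs before alignment
    shard_size   = 5000   // reads per shard, kept small so CI runs quickly
}
```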
Oh right! Skimmed over that >.< Never mind me then. Undo the `ci.yml` changes! All good.
Feel free to merge once the CI changes are amended!
PR checklist
- Added functionality to shard fastqs before aligning. This uses SeqKit. The sharded fastqs are merged together during the lane merge step (a sketch of the approach is shown after this checklist).
- New tools are added to `scrape_software_versions.py`.
- Code lints (`nf-core lint .`).
- The test suite passes (`nextflow run . -profile test,docker`).
- `docs/usage.md` is updated.
- `docs/output.md` is updated.
- `CHANGELOG.md` is updated.
- `README.md` is updated (including new tool citations and authors/contributors).
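As an illustration of the approach described above, a minimal sketch of a sharding process built around `seqkit split2` (process, channel, and parameter names are placeholders, not necessarily what this PR implements):

```nextflow
// Hypothetical sketch only. seqkit split2 splits paired FASTQs into chunks
// of a fixed number of reads while keeping read pairing intact; each chunk
// is then aligned independently and merged back at the lane merge step.
process SEQKIT_SPLIT2 {
    input:
    tuple val(meta), path(reads)          // [ meta, [ R1.fastq.gz, R2.fastq.gz ] ]

    output:
    tuple val(meta), path("shards/*"), emit: reads   // one pair of files per shard

    script:
    """
    seqkit split2 \\
        -1 ${reads[0]} \\
        -2 ${reads[1]} \\
        --by-size ${params.shard_size} \\
        --out-dir shards
    """
}
```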