Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DSL2: Add multi-reference mapping support #952

Merged
merged 55 commits into from
May 26, 2023
Merged

DSL2: Add multi-reference mapping support #952

merged 55 commits into from
May 26, 2023

Conversation

jfy133
Copy link
Member

@jfy133 jfy133 commented Mar 20, 2023

Adds ability to provide a TSV file listing multiple references and their correpsonding indicies etc.

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
    • If you've added a new tool - add to the software_versions process and a regex to scrape_software_versions.py
    • If you've added a new tool - have you followed the pipeline conventions in the [contribution docs](https://github.com/nf-core/eager/tree/master/.github/CONTRIBUTING.md)
    • If necessary, also make a PR on the nf-core/eager branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint .).
  • Ensure the test suite passes (nextflow run . -profile test,docker).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@jfy133 jfy133 marked this pull request as draft March 20, 2023 08:15
@jfy133
Copy link
Member Author

jfy133 commented Mar 24, 2023

Current to do:

  • Add test data and a profile of referencesheet.tsv
  • Run tests cycling through presence of each column

@jfy133
Copy link
Member Author

jfy133 commented Mar 24, 2023

@nf-core-bot fix linting

@jfy133 jfy133 mentioned this pull request Mar 28, 2023
11 tasks
@TCLamnidis
Copy link
Collaborator

Please rename mitchondrion to mitochondrion_name

@jfy133
Copy link
Member Author

jfy133 commented Mar 28, 2023

I vote mitochondrion_header

@jfy133
Copy link
Member Author

jfy133 commented Apr 28, 2023

Next time: Continue updating modules.conf so all output files after mappign include the reference name
Then official samplesheet (see testing/) and then actual testing

@jfy133 jfy133 requested a review from TCLamnidis May 15, 2023 13:31
Copy link
Contributor

@scarlhoff scarlhoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly documentation and spelling 🙇

docs/output.md Outdated Show resolved Hide resolved
docs/output.md Outdated Show resolved Hide resolved
docs/output.md Outdated Show resolved Hide resolved
docs/usage.md Outdated Show resolved Hide resolved
docs/usage.md Outdated Show resolved Hide resolved
nextflow_schema.json Outdated Show resolved Hide resolved
nextflow_schema.json Outdated Show resolved Hide resolved
nextflow_schema.json Outdated Show resolved Hide resolved
subworkflows/local/reference_indexing_multi.nf Outdated Show resolved Hide resolved
conf/modules.config Outdated Show resolved Hide resolved
nextflow_schema.json Outdated Show resolved Hide resolved
@jfy133 jfy133 requested a review from scarlhoff May 24, 2023 08:44
@jfy133
Copy link
Member Author

jfy133 commented May 24, 2023

@nf-core-bot fix linting

Copy link
Collaborator

@TCLamnidis TCLamnidis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there!

  • ext.prefix of (non-published) mapping output files should also include ${meta.lane}, since we map per lane, then merge! (No tests have one sample over multiple lanes as fasta input!!)
    • This should also apply to BAM -> FASTQ conversion, to avoid file name collisions.
    • maybe add test with one-sample over multiple lanes?
    • maybe tag should include library_id and lane also?
  • Column vs entries missing in reference TSV. Are both optional? we should stick to clear nomenclature.
  • replace mitochondrion to mitochondrion_header in the docs. That was agreed as the name for the column.
  • Example reference TSV. If it's meant to be there, re-add it. if not, then rephrase.
  • remove "notes prose" from parameter description.

docs/output.md Outdated Show resolved Hide resolved
docs/output.md Outdated Show resolved Hide resolved
docs/usage.md Outdated Show resolved Hide resolved
docs/usage.md Outdated Show resolved Hide resolved
docs/usage.md Outdated Show resolved Hide resolved
docs/usage.md Outdated Show resolved Hide resolved
docs/usage.md Outdated Show resolved Hide resolved
docs/usage.md Outdated Show resolved Hide resolved
nextflow_schema.json Outdated Show resolved Hide resolved
nextflow_schema.json Outdated Show resolved Hide resolved
docs/usage.md Outdated Show resolved Hide resolved
@jfy133
Copy link
Member Author

jfy133 commented May 24, 2023

I've made all the changes @TCLamnidis , however you and @scarlhoff's stupid HapMap poop-gen files have now broken the tests again...

ERROR ~ Invalid method invocation `call` with arguments: [[id:Mammoth_MT_Krause], /nf-core/test-datasets/eager/reference/Mammoth/Mammoth_MT_Krause.fasta, /home/runner/work/eager/eager/work/68/315f26b95b0aea12ecbd65f4320b86/Mammoth_MT_Krause.fasta.fai, /home/runner/work/eager/eager/work/c8/12f5e1a6486962f43ad873f4bc788a/Mammoth_MT_Krause.dict, /home/runner/work/eager/eager/work/ae/affedff2a287be10a59906aad52ca4/bwa, null, null, /home/runner/work/eager/eager/assets/angsd_resources/HapMapChrX.gz] (java.util.LinkedList) on _closure20 type

Following my previous notes to myself , there is a .map{ definition somewhere, where the HapMap position of a channel element is not given a variable name. However for teh life of me I cannot find where on earht this HapMap file is added to the reference indices.... so you guys need to find that and fix it, as this has come from you 😬 . Otherwise wierdly using the multiref CSV file (Rather than straight fasta), passes teh tests.

@scarlhoff
Copy link
Contributor

I've made all the changes @TCLamnidis , however you and @scarlhoff's stupid HapMap poop-gen files have now broken the tests again...

ERROR ~ Invalid method invocation `call` with arguments: [[id:Mammoth_MT_Krause], /nf-core/test-datasets/eager/reference/Mammoth/Mammoth_MT_Krause.fasta, /home/runner/work/eager/eager/work/68/315f26b95b0aea12ecbd65f4320b86/Mammoth_MT_Krause.fasta.fai, /home/runner/work/eager/eager/work/c8/12f5e1a6486962f43ad873f4bc788a/Mammoth_MT_Krause.dict, /home/runner/work/eager/eager/work/ae/affedff2a287be10a59906aad52ca4/bwa, null, null, /home/runner/work/eager/eager/assets/angsd_resources/HapMapChrX.gz] (java.util.LinkedList) on _closure20 type

Following my previous notes to myself , there is a .map{ definition somewhere, where the HapMap position of a channel element is not given a variable name. However for teh life of me I cannot find where on earht this HapMap file is added to the reference indices.... so you guys need to find that and fix it, as this has come from you 😬 . Otherwise wierdly using the multiref CSV file (Rather than straight fasta), passes teh tests.

Should be fixed now, only needed to add the new reference sheet columns to the channel mapping adding a meta to the hapmap file. I also changed the test profile for angsd and mtnucratio CI tests to test_humanbam, so it will actually produce output.
The hapmap will be properly integrated to the reference sheet in a separate PR

Copy link
Collaborator

@TCLamnidis TCLamnidis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will approve pre-emptively cause the changes are very small now

  • Lane info should also be in the output name for BAM filtering output (for BAM input of the same library that only differs in lane number)
  • I think that for test_humanbam, the mt header should be MT and not NC_007596.2? I believe that bam was mapped to hs37d5. Please check annd adjust accordingly.

Decision: point 1 will be dealt with in a separate PR linked to this issue, because adding the attribute to the meta for BAM input could break downstream channel operations.

conf/modules.config Show resolved Hide resolved
@jfy133 jfy133 merged commit ac6e136 into dev May 26, 2023
@jfy133 jfy133 deleted the dsl2-indexing-multi branch May 26, 2023 08:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants