[Do not merge!] Pseudo PR for first release #9

Closed
wants to merge 195 commits into from
Changes from 142 commits
Commits
e59fb9e
Intall the assemblyscan module with nf-core tools
charles-plessy Apr 11, 2024
5b04820
added module gfastats (like installation)
U13bs1125 Apr 12, 2024
28102f7
Sample sheet for one genome file in compressed FASTA format
charles-plessy Apr 12, 2024
875558f
installed a dotplot last
U13bs1125 Apr 14, 2024
c92a3d7
installed lastal
U13bs1125 Apr 14, 2024
4251a26
installed lastdb lastsplit
U13bs1125 Apr 14, 2024
eaff71c
installed last train mafswap
U13bs1125 Apr 14, 2024
d39dff7
Merge branch 'dev' of github.com:oist/plessy_pairwiseGenomeComparison…
charles-plessy Apr 15, 2024
cf6cac8
Added a 'target' parameter
U13bs1125 Apr 17, 2024
497ee63
Merge pull request #18 from oist/adtag
charles-plessy Apr 17, 2024
bf16f5d
Remove handling of paired-end data.
charles-plessy Apr 17, 2024
45d5d3a
Run assemblyscan instead of fastqc
charles-plessy Apr 17, 2024
5e271af
Merge pull request #19 from oist/runAssemblyScan
U13bs1125 Apr 17, 2024
012a2dc
Added 1st batch of new parameters
U13bs1125 Apr 18, 2024
e2301db
Merge pull request #20 from oist/1stnewparams
charles-plessy Apr 18, 2024
3b78b0f
Remove FastQC
charles-plessy Apr 18, 2024
d830867
Add an input channel for the target genome.
charles-plessy Apr 18, 2024
1be75d1
Run lastdb
charles-plessy Apr 18, 2024
7f7fa15
Merge remote-tracking branch 'origin/addLASTDB' into dev
U13bs1125 Apr 19, 2024
61804e2
Inclusion of targetName parameter
U13bs1125 Apr 19, 2024
6f2ee0b
Inclusion of a new module: Last_train
U13bs1125 Apr 19, 2024
ecb5bb4
Run lastal
charles-plessy Apr 19, 2024
c69c8ac
Added modules: split and dotplot
U13bs1125 Apr 23, 2024
95bc221
Added modules: split and dotplot
U13bs1125 Apr 23, 2024
abffa46
tuple val(meta), path(maf)
U13bs1125 Apr 23, 2024
96f32df
correction
U13bs1125 Apr 23, 2024
7911279
...
U13bs1125 Apr 25, 2024
5321778
REGULARIZATION
U13bs1125 Apr 26, 2024
d845557
Merge branch 'dev' into addedmodules
charles-plessy Apr 26, 2024
cd517dd
Merge pull request #24 from oist/addedmodules
charles-plessy Apr 26, 2024
d7859fe
Merge branch 'dev' of github.com:oist/plessy_pairwiseGenomeComparison…
U13bs1125 Apr 26, 2024
5499a51
Remove unwanted spaces to avoid pre-commit failures in GitHub actions.
charles-plessy Apr 26, 2024
91c9e5e
Make YASS the default seed.
charles-plessy Apr 26, 2024
717ed54
Rename the pipeline to pairgenomealign
charles-plessy Apr 26, 2024
2351f32
Merge branch 'TEMPLATE' into dev
charles-plessy Apr 26, 2024
6a5b792
Also rename files accordingly.
charles-plessy Apr 26, 2024
d7b5449
Merge branch 'TEMPLATE' into dev
charles-plessy Apr 26, 2024
57ac4d6
Refreshed logos with nf-core create-logo
charles-plessy Apr 26, 2024
3da9cf8
Put the refreshed logos in the correct directory.
charles-plessy Apr 26, 2024
cdc4661
Finish overlooked merge.
charles-plessy Apr 26, 2024
fc33d26
Put the files in correct folder.
charles-plessy Apr 26, 2024
6c672df
Added modules for m2m, o2o, o2m modules
U13bs1125 Apr 30, 2024
f1b70ff
REGULARIZATION
U13bs1125 Apr 30, 2024
aa6b978
Reduce test time so that it is easier to run on OISTs short queue
charles-plessy Apr 30, 2024
eb36120
ignore that are difficult to reproducibly generate
U13bs1125 May 2, 2024
b9c35f1
Changed the prefices
U13bs1125 May 2, 2024
8d8b79f
Merge branch 'dev' into m2mdotplotsmodules
charles-plessy May 2, 2024
f53d21c
Fix whitespace issues.
charles-plessy May 2, 2024
9703460
Merge pull request #26 from oist/m2mdotplotsmodules
charles-plessy May 2, 2024
0a0194a
Complete pipeline
U13bs1125 May 2, 2024
acbe435
regularization
U13bs1125 May 2, 2024
e14145c
Draft a tube map representation of the pipeline.
charles-plessy May 7, 2024
eb435e0
Cleared TODOs
U13bs1125 May 7, 2024
17b094f
cleared some TODOs
U13bs1125 May 7, 2024
cb100ed
added svg
U13bs1125 May 7, 2024
3910ef1
..
U13bs1125 May 7, 2024
ff50dd1
..
U13bs1125 May 7, 2024
53422ec
,,
U13bs1125 May 7, 2024
b94028b
mm
U13bs1125 May 7, 2024
04e7d37
m
U13bs1125 May 7, 2024
8006076
Editing the README file
U13bs1125 May 8, 2024
b044e01
Clear TODOs
U13bs1125 May 8, 2024
78befc7
Update of subworkflos utils_nfcore_pipeline
U13bs1125 May 8, 2024
c0b46db
Move the many-to-many and downstream alignments to a subworkflow
charles-plessy May 8, 2024
fd6f601
Use the new PAIRALIGN_M2M subworkflow
charles-plessy May 9, 2024
5c21c99
New m2o subworkflow
U13bs1125 May 9, 2024
5ed0636
Added alternataive statement on subworkflows
U13bs1125 May 9, 2024
c05e9de
Regularize groupinf of m2m param
U13bs1125 May 9, 2024
6016d13
mxc
U13bs1125 May 9, 2024
480d5e8
Merge branch 'TEMPLATE' into dev
charles-plessy May 10, 2024
2a4cbbc
added new params arguments; last_extr_args...
U13bs1125 May 13, 2024
9d05170
Merge branch 'subworkflow2' into dev
U13bs1125 May 13, 2024
8b496ff
Added extra arguments for lastal and split: mismap...
U13bs1125 May 14, 2024
7943e4b
Nf-core lint clearing
U13bs1125 May 15, 2024
ca8f726
Remove whitespace
charles-plessy May 21, 2024
dbe4eb8
Ran `pre-commit run --all-files` by hand.
charles-plessy May 21, 2024
36ecc17
pick file name that does not look like unassembled reads
charles-plessy May 21, 2024
6b1679b
Drop the last_split_options parameter
charles-plessy May 21, 2024
0510c6c
Updated Documentations
U13bs1125 May 22, 2024
d9b967d
Merge branch 'dev' of github.com:oist/plessy_pairwiseGenomeComparison…
U13bs1125 May 23, 2024
dd64e51
implement the datasets folder to tests
U13bs1125 May 23, 2024
c21e3e2
Implement dotplot_options
U13bs1125 May 23, 2024
34266b3
Implemented the issue: lastal params
U13bs1125 May 23, 2024
6cfbe17
cleared docs/Readme.md file
U13bs1125 May 23, 2024
a3fef68
Fix typo
charles-plessy May 24, 2024
aed4dff
removed the hanging "i"
U13bs1125 May 24, 2024
ef7d436
Merge branch 'dev' of github.com:oist/plessy_pairwiseGenomeComparison…
charles-plessy May 24, 2024
0e555c6
chema build done
U13bs1125 May 24, 2024
517bdd4
Merge branch 'dev' of github.com:oist/plessy_pairwiseGenomeComparison…
U13bs1125 May 24, 2024
08a3e57
Ran pre-commit run --all-files
charles-plessy May 24, 2024
21386e2
Remove Windowmasker and adjust borders
May 24, 2024
686a1db
Simplify information already given in docs/output.md
May 24, 2024
3f360d1
Transfer information to usage page
May 24, 2024
e102ddd
Credits
May 24, 2024
124eec1
Format reference.
May 24, 2024
8f595db
pre-commit run --all-files
charles-plessy May 24, 2024
4e5aa3f
Brush up schema
charles-plessy May 24, 2024
cdeaa73
Merge branch 'TEMPLATE' into dev
charles-plessy May 28, 2024
5090a0e
Correct input parameter type.
charles-plessy May 28, 2024
97dd3ec
Ensure last-train is ran with --revsym and the other lastal options
charles-plessy May 28, 2024
dbaf9b8
Add a zero to make nf-core lint happier
charles-plessy May 28, 2024
73bdd22
Show information directly in description.
charles-plessy May 28, 2024
5b9c9ac
Try with uppercase E like in funcscan pipeline
charles-plessy May 28, 2024
9ea4a53
Typo
charles-plessy May 28, 2024
7fbb90f
Prefix output file names with ${params.targetName}___
charles-plessy May 28, 2024
8ea3c9f
Allow for .fna and .fna.gz suffixed.
charles-plessy May 28, 2024
58efb12
Use the nf-core repository
charles-plessy May 28, 2024
1832f81
Whitespace changes by pre-commit
charles-plessy May 28, 2024
8493b3b
Numeric argument with non-scientific notation
charles-plessy May 28, 2024
fc353a4
Update nf-core modules
charles-plessy Jun 11, 2024
b354802
Adjust test config to the small size of the sequence
charles-plessy Jun 11, 2024
76f78c9
Set parameters of lastdb to -R01 -c -u${params.seed} -S2
charles-plessy Jun 11, 2024
7bae94e
Rename LAST_LASTAL LAST_LASTAL_M2M in the M2M workflow for consistency.
charles-plessy Jun 11, 2024
8498a70
installed and run seqtk successfully
U13bs1125 Jun 18, 2024
dfb7e54
Effected the changes of Seqtk workflow to run on each channel individ…
U13bs1125 Jun 18, 2024
bcb46f8
Again, prettier and lint
charles-plessy Jun 18, 2024
5b3fa86
Merge pull request #1 from charles-plessy/minus-s-option
U13bs1125 Jun 18, 2024
57fd146
Update pairgenomealign.nf
U13bs1125 Jun 19, 2024
62518a2
Merge pull request #2 from charles-plessy/addseqtk
charles-plessy Jun 19, 2024
9a4724f
Small-scale test suite with fungal genomes.
charles-plessy Jun 19, 2024
d3fde9e
Merge pull request #3 from charles-plessy/fusarium
U13bs1125 Jun 19, 2024
ae22d57
Fix file names in modules runnign on single genomes.
charles-plessy Jun 19, 2024
b5862bd
Merge pull request #4 from charles-plessy/target__query
U13bs1125 Jun 19, 2024
caba650
Merge branch 'dev' of github.com:charles-plessy/pairgenomealign into dev
U13bs1125 Jun 19, 2024
b11fe3a
Update last/dotplot
charles-plessy Jun 28, 2024
588955f
Pass the seqtk cutN output to last-dotplot
charles-plessy Jun 28, 2024
99eb125
Document the use of seqtk cutN to plot polyN regions.
charles-plessy Jun 28, 2024
0ec1923
Document assemblyscan and seqtk; remove postmask
charles-plessy Jun 28, 2024
6f146d6
Reorganise documentation of dot-plots and other outputs
charles-plessy Jun 28, 2024
fb6fc8d
Merge pull request #5 from charles-plessy/plotContigBoundaries2
U13bs1125 Jun 28, 2024
3a061df
Update modules
charles-plessy Jul 16, 2024
36825d7
Collect and report training and alignment statistics.
charles-plessy Jul 16, 2024
a2e223a
Collect more software version numbers.
charles-plessy Jul 16, 2024
fc92de6
Merge pull request #7 from charles-plessy/lastal-multiqc2
U13bs1125 Jul 16, 2024
9499a67
new branch to resolve multiqc version issue
U13bs1125 Jul 17, 2024
52bcfac
Combine two reports, plus minor changes.
charles-plessy Jul 18, 2024
c62835d
Cite LAST papers.
charles-plessy Jul 19, 2024
f43f129
Fix typo in output file name.
charles-plessy Jul 19, 2024
ae2bd85
Change comments and indentation
charles-plessy Jul 19, 2024
6ea049a
Full tests comparing the human genome to other primates.
charles-plessy Jul 19, 2024
e0cbc98
Name the target
charles-plessy Jul 19, 2024
db6492d
Release 1.0.0
charles-plessy Jul 19, 2024
eb13134
Remove the PSEUDO seed from the schema.
charles-plessy Jul 23, 2024
b58818a
Correct duplicated text
charles-plessy Jul 24, 2024
94545b9
Reduce redundancy with `doc/output.md` as suggested in PR #9.
charles-plessy Jul 24, 2024
18a2ba1
Also cite the paper describing the original implementation.
charles-plessy Jul 24, 2024
c9d2be8
Correct the list of accepted file suffixes
charles-plessy Jul 24, 2024
adda591
Remove mention of unported parameters.
charles-plessy Jul 24, 2024
27b0c0a
Improve wording of docs/usage.md
charles-plessy Jul 24, 2024
a58961b
Mention --input explicitely
charles-plessy Jul 24, 2024
2ed941b
Fix typo
charles-plessy Jul 24, 2024
157e675
Fix markdown formatting.
charles-plessy Jul 24, 2024
5463b6b
modified the pipeline logo, now png formatted
U13bs1125 Jul 24, 2024
6df876d
modified the pipeline logo, now png formatted
U13bs1125 Jul 24, 2024
8eefc01
added the new svg formatted pipeline logo/map
U13bs1125 Jul 24, 2024
cd92f37
Merge pull request #10 from oist/devop
charles-plessy Jul 24, 2024
3476a81
Merge branch 'dev' of github.com:nf-core/pairgenomealign into dev
charles-plessy Jul 24, 2024
2940837
Indent workflows/pairgenomealign.nf
charles-plessy Jul 24, 2024
2baf4b4
Remove un-needed example
charles-plessy Jul 24, 2024
054e436
Remove dangling filename.
charles-plessy Jul 24, 2024
f279c5c
Show the full sample sheet as an example.
charles-plessy Jul 24, 2024
11d1457
Multi-query example
charles-plessy Jul 24, 2024
c5ec40a
Slim nextflow.config
charles-plessy Jul 24, 2024
5d36b75
Move LAST output to `alignment/`
charles-plessy Jul 24, 2024
7fb4f49
Merge branch 'dev' of github.com:nf-core/pairgenomealign into dev
charles-plessy Jul 24, 2024
850605e
Put `seqtk cutN` output in `cutn/` and document it.
charles-plessy Jul 24, 2024
cabd248
Remove mention of FastQC
charles-plessy Jul 24, 2024
d3e2e86
[automated] Fix code linting
nf-core-bot Jul 24, 2024
0bb9ad7
Remove duplicated documentation.
charles-plessy Jul 24, 2024
d2406ac
Merge branch 'dev' of github.com:nf-core/pairgenomealign into dev
charles-plessy Jul 24, 2024
a93536f
Fix typo
charles-plessy Jul 24, 2024
c029ae4
Add a human–monkey alignment as example.
charles-plessy Jul 25, 2024
3e349a9
Update workflows/pairgenomealign.nf
charles-plessy Jul 25, 2024
340d9c6
Rename the custom module and document its output.
charles-plessy Jul 25, 2024
6949ad5
Merge branch 'dev' of github.com:nf-core/pairgenomealign into dev
charles-plessy Jul 25, 2024
ab93bb4
Revert "Update workflows/pairgenomealign.nf"
charles-plessy Jul 25, 2024
420a929
Polish parameter description.
charles-plessy Jul 25, 2024
0b417aa
Move tube map to docs/ hoping it solves display problem.
charles-plessy Jul 25, 2024
e9fb4bd
Add an example dot-plot
charles-plessy Jul 25, 2024
eca7b83
Remove FASTQC examples.
charles-plessy Jul 25, 2024
823bcdc
Add new multiqc examples
charles-plessy Jul 25, 2024
591ee73
Merge branch 'dev' of github.com:nf-core/pairgenomealign into dev
charles-plessy Jul 25, 2024
057a097
Display example MultiQC plots
charles-plessy Jul 25, 2024
123d9dc
prettier
charles-plessy Jul 25, 2024
44fff18
modified the logomap again as advised by nfcore team
U13bs1125 Jul 25, 2024
cc8234a
Merge pull request #13 from oist/devlogo
charles-plessy Jul 26, 2024
dd56788
Add a codename
charles-plessy Jul 26, 2024
d5df279
Fix filename
charles-plessy Jul 26, 2024
3737d98
Merge branch 'dev' of github.com:nf-core/pairgenomealign into dev
charles-plessy Jul 26, 2024
872991d
Use a Markdown link instead of HTML.
charles-plessy Jul 26, 2024
57224ea
pre-commit fixes
charles-plessy Jul 26, 2024
2d4f08c
Rename and document some table columns
charles-plessy Aug 7, 2024
830d557
Thank Martin and teammates
charles-plessy Aug 7, 2024
a298b19
Remove mention of lastdb -P because it does not impact the alignment …
charles-plessy Aug 8, 2024
c493be3
Update release date in CHANGELOG.md
charles-plessy Aug 26, 2024
1 change: 0 additions & 1 deletion .github/workflows/awsfulltest.yml
@@ -15,7 +15,6 @@ jobs:
steps:
- name: Launch workflow via Seqera Platform
uses: seqeralabs/action-tower-launch@v2
# TODO nf-core: You can customise AWS full pipeline tests as required
# Add full size test data (but still relatively small datasets for few samples)
# on the `test_full.config` test runs with only one set of parameters
with:
1 change: 0 additions & 1 deletion .github/workflows/ci.yml
@@ -39,7 +39,6 @@ jobs:
uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1

- name: Run pipeline with test data
# TODO nf-core: You can customise CI pipeline run tests as required
# For example: adding multiple test runs with different parameters
# Remember that you can parallelise this by using strategy.matrix
run: |
5 changes: 5 additions & 0 deletions .nf-core.yml
@@ -1,2 +1,7 @@
repository_type: pipeline
lint:
files_unchanged:
- assets/nf-core-pairgenomealign_logo_light.png
- docs/images/nf-core-pairgenomealign_logo_light.png
- docs/images/nf-core-pairgenomealign_logo_dark.png
nf_core_version: "2.14.1"
10 changes: 1 addition & 9 deletions CHANGELOG.md
@@ -3,14 +3,6 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## v1.0dev - [date]
## v1.0.0 - [July 19th, 2024]

Initial release of nf-core/pairgenomealign, created with the [nf-core](https://nf-co.re/) template.

### `Added`

### `Fixed`

### `Dependencies`

### `Deprecated`
12 changes: 10 additions & 2 deletions CITATIONS.md
@@ -10,9 +10,17 @@

## Pipeline tools

- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
- [LAST](https://gitlab.com/mcfrith/last/)

> Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online].
> Kiełbasa SM, Wan R, Sato K, Horton P, Frith MC. Adaptive seeds tame genomic sequence comparison. Genome Res. 2011 21(3):487-93. doi: 10.1101/gr.113985.110. PubMed PMID: 21209072 (This describes the main algorithms used by LAST.)

> Frith MC, Noé L. Improved search heuristics find 20,000 new alignments between human and mouse genomes. doi: 10.1093/nar/gku104 PubMed PMID: 24493737 (This describes sensitive DNA seeding (MAM8 and MAM4).)

> Frith MC, Kawaguchi R. Split-alignment of genomes finds orthologies more accurately. Genome Biology. 2015 16:106. doi: 10.1186/s13059-015-0670-9 PubMed PMID: 25994148 (Describes the split alignment algorithm, and its application to whole genome alignment.)

> Hamada M, Ono Y, Asai K, Frith MC. Training alignment parameters for arbitrary sequencers with LAST-TRAIN. Bioinformatics. 2017 33(6):926-928. doi: 10.1093/bioinformatics/btw742 PubMed PMID: 28039163 (Describes last-train.)

> Frith MC, Shaw J, Spouge JL. How to optimally sample a sequence for rapid analysis. doi: 10.1093/bioinformatics/btad057 PubMed PMID: 36702468 (Describes the lastdb -u RY sparsity options.)

- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)

50 changes: 24 additions & 26 deletions README.md
@@ -19,49 +19,42 @@

## Introduction

**nf-core/pairgenomealign** is a bioinformatics pipeline that ...
**nf-core/pairgenomealign** is a bioinformatics pipeline that aligns one or more _query_ genomes to a _target_ genome, and plots pairwise representations.

<!-- TODO nf-core:
Complete this sentence with a 2-3 sentence summary of what types of data the pipeline ingests, a brief overview of the
major pipeline sections and the types of output it produces. You're giving an overview to someone new
to nf-core here, in 15-20 seconds. For an example, see https://github.com/nf-core/rnaseq/blob/master/README.md#introduction
-->
<img src= "assets/tube_map.svg">

<!-- TODO nf-core: Include a figure that guides the user through the major workflow steps. Many nf-core
workflows use the "tube map" design for that. See https://nf-co.re/docs/contributing/design_guidelines#examples for examples. -->
<!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline -->
The pipeline can generate four kinds of outputs, depending on whether sequences of one genome can match the other genome multiple times or not.

1. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
2. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))
- _**many-to-many**_ (M2M): All computed alignments between the _target_ and a _query_ genome.
- _**many-to-one**_ (M2O): Alignments where regions of the _target_ genome are matched at most once by a _query_ genome.
- _**one-to-many**_ (O2M): Alignments where regions of a _query_ genome are matched at most once by the _target_ genome.
- _**one-to-one**_ (O2O): Alignments where regions of the _target_ and _query_ genomes are used at most once.

These alignments are output in [MAF](https://genome.ucsc.edu/FAQ/FAQformat.html#format5) format, and optional dot-plot representations are output in PNG format.
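The four kinds of output differ only in which genome's regions may be used more than once. As a toy illustration of those definitions (not of how LAST computes the alignments), the Python sketch below reduces each alignment to a half-open interval on the target and one on the query, and reports which of the many-to-one (M2O), one-to-many (O2M) and one-to-one (O2O) constraints a set of alignments satisfies, following the wording of the README text above. The `Alignment` class and the `classify` function are hypothetical names introduced for this example only.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Alignment:
    """Hypothetical minimal alignment record: half-open [start, end) intervals."""

    target_seq: str
    target_start: int
    target_end: int
    query_seq: str
    query_start: int
    query_end: int


def _overlap(seq_a, start_a, end_a, seq_b, start_b, end_b):
    # Half-open intervals overlap only on the same sequence, when each one
    # starts before the other one ends.
    return seq_a == seq_b and start_a < end_b and start_b < end_a


def target_used_at_most_once(alns):
    return not any(
        _overlap(a.target_seq, a.target_start, a.target_end,
                 b.target_seq, b.target_start, b.target_end)
        for a, b in combinations(alns, 2)
    )


def query_used_at_most_once(alns):
    return not any(
        _overlap(a.query_seq, a.query_start, a.query_end,
                 b.query_seq, b.query_start, b.query_end)
        for a, b in combinations(alns, 2)
    )


def classify(alns):
    """Return the labels whose 'at most once' constraint the set satisfies."""
    target_once = target_used_at_most_once(alns)
    query_once = query_used_at_most_once(alns)
    labels = {"M2M"}  # no constraint: any set of alignments is many-to-many
    if target_once:
        labels.add("M2O")
    if query_once:
        labels.add("O2M")
    if target_once and query_once:
        labels.add("O2O")
    return labels


# One query region (ctgA:0-100) aligned to two different target locations.
alns = [
    Alignment("chr1", 0, 100, "ctgA", 0, 100),
    Alignment("chr2", 500, 600, "ctgA", 0, 100),
]
print(sorted(classify(alns)))  # ['M2M', 'M2O']
```

The pairwise comparison is quadratic in the number of alignments; sorting intervals per sequence would scale better, but the brute-force form keeps the definitions easy to read.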

## Usage

> [!NOTE]
> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.

<!-- TODO nf-core: Describe the minimum required steps to execute the pipeline, e.g. how to prepare samplesheets.
Explain what rows and columns represent. For instance (please edit as appropriate):

First, prepare a samplesheet with your input data that looks as follows:

`samplesheet.csv`:

```csv
sample,fastq_1,fastq_2
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
sample,fasta
query_1,path-to-query-genome-file-one.fasta
query_2,path-to-query-genome-file-two.fasta
```

Each row represents a fastq file (single-end) or a pair of fastq files (paired end).

-->
Each row represents one FASTA file; the samplesheet can contain multiple rows to accommodate multiple query genomes.
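A minimal sketch of reading such a samplesheet with Python's standard `csv` module is shown below. The `read_samplesheet` helper is hypothetical: it only mirrors the `sample,fasta` layout above and the "no spaces in sample names" rule of the input schema, whereas the pipeline itself validates the sheet through its JSON schema. Each data row yields one query genome.

```python
import csv
import io


def read_samplesheet(text):
    """Parse a `sample,fasta` samplesheet into (sample, fasta) pairs.

    Hypothetical helper: raises ValueError on a missing column, an empty
    value, or a sample name containing whitespace.
    """
    reader = csv.DictReader(io.StringIO(text))
    missing = {"sample", "fasta"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing column(s): {sorted(missing)}")
    pairs = []
    for row in reader:  # csv.DictReader skips blank lines
        sample = row["sample"] or ""  # short rows give None for absent fields
        fasta = row["fasta"] or ""
        if not sample or any(ch.isspace() for ch in sample):
            raise ValueError(f"invalid sample name: {sample!r}")
        if not fasta:
            raise ValueError(f"no fasta file given for sample {sample!r}")
        pairs.append((sample, fasta))
    return pairs


sheet = """sample,fasta
query_1,path-to-query-genome-file-one.fasta
query_2,path-to-query-genome-file-two.fasta
"""
for sample, fasta in read_samplesheet(sheet):
    print(sample, fasta)
```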

Now, you can run the pipeline using:

<!-- TODO nf-core: update the following command to include all required parameters for a minimal example -->

```bash
nextflow run nf-core/pairgenomealign \
-profile <docker/singularity/.../institute> \
--target sequencefile.fa \
--input samplesheet.csv \
--outdir <OUTDIR>
```
@@ -80,11 +73,11 @@ For more details about the output files and reports, please refer to the

## Credits

nf-core/pairgenomealign was originally written by charles-plessy.
`nf-core/pairgenomealign` was originally written by [charles-plessy](https://github.com/charles-plessy); the original versions are available at <https://github.com/oist/plessy_pairwiseGenomeComparison>.

We thank the following people for their extensive assistance in the development of this pipeline:

<!-- TODO nf-core: If applicable, make list of people who have also contributed -->
- [Mahdi Mohammed](https://github.com/U13bs1125): ported the original pipeline to the _nf-core_ template version 2.14.x.

## Contributions and Support

@@ -94,10 +87,15 @@ For further information or help, don't hesitate to get in touch on the [Slack `#

## Citations

<!-- TODO nf-core: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file. -->
<!-- If you use nf-core/pairgenomealign for your analysis, please cite it using the following doi: [10.5281/zenodo.XXXXXX](https://doi.org/10.5281/zenodo.XXXXXX) -->
If you use this pipeline, please cite:

> **Extreme genome scrambling in marine planktonic Oikopleura dioica cryptic species.**
> Charles Plessy, Michael J. Mansfield, Aleksandra Bliznina, Aki Masunaga, Charlotte West, Yongkai Tan, Andrew W. Liu, Jan Grašič, María Sara del Río Pisula, Gaspar Sánchez-Serna, Marc Fabrega-Torrus, Alfonso Ferrández-Roldán, Vittoria Roncalli, Pavla Navratilova, Eric M. Thompson, Takeshi Onuma, Hiroki Nishida, Cristian Cañestro, Nicholas M. Luscombe.
> _Genome Res._ 2024. 34: 426-440; doi: [10.1101/gr.278295.123](https://doi.org/10.1101/gr.278295.123). PubMed ID: [38621828](https://pubmed.ncbi.nlm.nih.gov/38621828/)

[OIST research news article](https://www.oist.jp/news-center/news/2024/4/25/oikopleura-who-species-identity-crisis-genome-community)

<!-- TODO nf-core: Add bibliography of tools and data used in your pipeline -->
Please also cite the [LAST papers](https://gitlab.com/mcfrith/last/-/blob/main/doc/last-papers.rst).

An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.

1 change: 0 additions & 1 deletion assets/methods_description_template.yml
@@ -3,7 +3,6 @@ description: "Suggested text and references to use when describing pipeline usag
section_name: "nf-core/pairgenomealign Methods Description"
section_href: "https://github.com/nf-core/pairgenomealign"
plot_type: "html"
## TODO nf-core: Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
## You can inject any metadata in the Nextflow '${workflow}' object
data: |
<h4>Methods</h4>
20 changes: 18 additions & 2 deletions assets/multiqc_config.yml
@@ -1,7 +1,7 @@
report_comment: >
This report has been generated by the <a href="https://github.com/nf-core/pairgenomealign/tree/dev" target="_blank">nf-core/pairgenomealign</a>
This report has been generated by the <a href="https://github.com/nf-core/pairgenomealign/releases/tag/1.0.0" target="_blank">nf-core/pairgenomealign</a>
analysis pipeline. For information about how to interpret these results, please see the
<a href="https://nf-co.re/pairgenomealign/dev/docs/output" target="_blank">documentation</a>.
<a href="https://nf-co.re/pairgenomealign/1.0.0/docs/output" target="_blank">documentation</a>.
report_section_order:
"nf-core-pairgenomealign-methods-description":
order: -1000
@@ -13,3 +13,19 @@ report_section_order:
export_plots: true

disable_version_detection: true

custom_data:
train:
file_format: "tsv"
section_name: "Training parameter statistics"
plot_type: "table"
last_o2o:
file_format: "tsv"
section_name: "Alignment statistics"
plot_type: "table"

sp:
last_o2o:
fn: "*o2o_aln.tsv"
train:
fn: "*train.tsv"
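In this configuration, the `sp` entries tell MultiQC which files feed each `custom_data` section, by matching file names against the `fn` pattern. The sketch below imitates that lookup with Python's `fnmatch.fnmatchcase`, assuming that MultiQC's `fn` search patterns behave like Unix shell-style globs. The dictionaries copy the two ids, patterns and section names from the YAML above; the file names in the example are hypothetical.

```python
from fnmatch import fnmatchcase

# Mirrors the `sp` block: custom-content id -> filename glob (`fn`).
SEARCH_PATTERNS = {
    "last_o2o": "*o2o_aln.tsv",
    "train": "*train.tsv",
}

# Mirrors `custom_data`: custom-content id -> report section name.
SECTION_NAMES = {
    "train": "Training parameter statistics",
    "last_o2o": "Alignment statistics",
}


def sections_for(filename):
    """Return the report sections whose search pattern matches `filename`."""
    return [
        SECTION_NAMES[content_id]
        for content_id, pattern in SEARCH_PATTERNS.items()
        if fnmatchcase(filename, pattern)
    ]


# Hypothetical file names, for illustration only.
for name in ["target___query_1.o2o_aln.tsv",
             "target___query_1.train.tsv",
             "target___query_1.m2m_aln.tsv"]:
    print(name, sections_for(name))
```

Because the patterns only anchor on the end of the file name, any prefix (such as a target or query name) is accepted, while alignment tables from the other output kinds are not picked up.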
35 changes: 35 additions & 0 deletions assets/samplesheet_full.csv
@@ -0,0 +1,35 @@
sample,fasta
Homo_sapiens_GCA_000001405.29,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.29_GRCh38.p14/GCA_000001405.29_GRCh38.p14_genomic.fna.gz
Callithrix_jacchus_GCA_000004665.1,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/004/665/GCA_000004665.1_Callithrix_jacchus-3.2/GCA_000004665.1_Callithrix_jacchus-3.2_genomic.fna.gz
Cercopithecus_mitis_GCA_028627265.1,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/028/627/265/GCA_028627265.1_Cercopithecus_mitis_HiC/GCA_028627265.1_Cercopithecus_mitis_HiC_genomic.fna.gz
Chlorocebus_sabaeus_GCA_000409795.2,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/409/795/GCA_000409795.2_Chlorocebus_sabeus_1.1/GCA_000409795.2_Chlorocebus_sabeus_1.1_genomic.fna.gz
Colobus_guereza_GCA_030247045.1,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/030/247/045/GCA_030247045.1_ASM3024704v1/GCA_030247045.1_ASM3024704v1_genomic.fna.gz
Eulemur_mongoz_GCA_028534055.1,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/028/534/055/GCA_028534055.1_Eulemur_mongoz_HiC/GCA_028534055.1_Eulemur_mongoz_HiC_genomic.fna.gz
Gorilla_gorilla_gorilla_GCA_000151905.3,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/151/905/GCA_000151905.3_gorGor4/GCA_000151905.3_gorGor4_genomic.fna.gz
Hylobates_pileatus_GCA_021498465.1,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/021/498/465/GCA_021498465.1_ASM2149846v1/GCA_021498465.1_ASM2149846v1_genomic.fna.gz
Lemur_catta_GCA_020740605.1,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/020/740/605/GCA_020740605.1_mLemCat1.pri/GCA_020740605.1_mLemCat1.pri_genomic.fna.gz
Leontopithecus_rosalia_GCA_028533165.1,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/028/533/165/GCA_028533165.1_Leontopithecus_rosalia_HiC/GCA_028533165.1_Leontopithecus_rosalia_HiC_genomic.fna.gz
Macaca_cyclopis_GCA_026956025.1,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/026/956/025/GCA_026956025.1_MCyc01/GCA_026956025.1_MCyc01_genomic.fna.gz
Macaca_fascicularis_GCA_011100615.1,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/011/100/615/GCA_011100615.1_Macaca_fascicularis_6.0/GCA_011100615.1_Macaca_fascicularis_6.0_genomic.fna.gz
Macaca_mulatta_GCA_003339765.3,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/003/339/765/GCA_003339765.3_Mmul_10/GCA_003339765.3_Mmul_10_genomic.fna.gz
Macaca_thibetana_thibetana_GCA_024542745.1,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/024/542/745/GCA_024542745.1_ASM2454274v1/GCA_024542745.1_ASM2454274v1_genomic.fna.gz
Microcebus_murinus_GCA_000165445.3,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/165/445/GCA_000165445.3_Mmur_3.0/GCA_000165445.3_Mmur_3.0_genomic.fna.gz
Miopithecus_talapoin_GCA_028551445.1,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/028/551/445/GCA_028551445.1_Miopithecus_talapoin_HiC/GCA_028551445.1_Miopithecus_talapoin_HiC_genomic.fna.gz
Nasalis_larvatus_GCA_000772465.1,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/772/465/GCA_000772465.1_Charlie1.0/GCA_000772465.1_Charlie1.0_genomic.fna.gz
Nomascus_leucogenys_GCA_000146795.3,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/146/795/GCA_000146795.3_Nleu_3.0/GCA_000146795.3_Nleu_3.0_genomic.fna.gz
Nycticebus_bengalensis_GCA_023898255.1,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/023/898/255/GCA_023898255.1_ASM2389825v1/GCA_023898255.1_ASM2389825v1_genomic.fna.gz
Nycticebus_coucang_GCA_027406575.1,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/027/406/575/GCA_027406575.1_mNycCou1.pri/GCA_027406575.1_mNycCou1.pri_genomic.fna.gz
Pan_paniscus_GCA_000258655.2,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/258/655/GCA_000258655.2_panpan1.1/GCA_000258655.2_panpan1.1_genomic.fna.gz
Pan_troglodytes_GCA_000001515.5,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/515/GCA_000001515.5_Pan_tro_3.0/GCA_000001515.5_Pan_tro_3.0_genomic.fna.gz
Papio_anubis_GCA_000264685.2,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/264/685/GCA_000264685.2_Panu_3.0/GCA_000264685.2_Panu_3.0_genomic.fna.gz
Papio_papio_GCA_028645565.1,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/028/645/565/GCA_028645565.1_Papio_papio_HiC/GCA_028645565.1_Papio_papio_HiC_genomic.fna.gz
Piliocolobus_tephrosceles_GCA_002776525.5,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/776/525/GCA_002776525.5_ASM277652v5/GCA_002776525.5_ASM277652v5_genomic.fna.gz
Pithecia_pithecia_GCA_028551515.1,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/028/551/515/GCA_028551515.1_Pithecia_pithecia_HiC/GCA_028551515.1_Pithecia_pithecia_HiC_genomic.fna.gz
Pongo_abelii_GCA_028885655.2,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/028/885/655/GCA_028885655.2_NHGRI_mPonAbe1-v2.0_pri/GCA_028885655.2_NHGRI_mPonAbe1-v2.0_pri_genomic.fna.gz
Pongo_pygmaeus_GCA_028885625.2,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/028/885/625/GCA_028885625.2_NHGRI_mPonPyg2-v2.0_pri/GCA_028885625.2_NHGRI_mPonPyg2-v2.0_pri_genomic.fna.gz
Rhinopithecus_roxellana_GCA_007565055.1,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/007/565/055/GCA_007565055.1_ASM756505v1/GCA_007565055.1_ASM756505v1_genomic.fna.gz
Saguinus_midas_GCA_021498475.1,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/021/498/475/GCA_021498475.1_ASM2149847v1/GCA_021498475.1_ASM2149847v1_genomic.fna.gz
Saguinus_oedipus_GCA_031835075.1,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/031/835/075/GCA_031835075.1_ASM3183507v1/GCA_031835075.1_ASM3183507v1_genomic.fna.gz
Symphalangus_syndactylus_GCA_028878055.3,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/028/878/055/GCA_028878055.3_NHGRI_mSymSyn1-v2.1_pri/GCA_028878055.3_NHGRI_mSymSyn1-v2.1_pri_genomic.fna.gz
Theropithecus_gelada_GCA_003255815.1,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/003/255/815/GCA_003255815.1_Tgel_1.0/GCA_003255815.1_Tgel_1.0_genomic.fna.gz
Varecia_variegata_GCA_028533085.1,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/028/533/085/GCA_028533085.1_Varecia_variegata_HiC/GCA_028533085.1_Varecia_variegata_HiC_genomic.fna.gz
3 changes: 3 additions & 0 deletions assets/samplesheet_small.csv
@@ -0,0 +1,3 @@
sample,fasta
Fusarium_asiaticum_GCA_025258505.1,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/025/258/505/GCA_025258505.1_ASM2525850v1/GCA_025258505.1_ASM2525850v1_genomic.fna.gz
Fusarium_oxysporum_GCA_014857085.1,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/014/857/085/GCA_014857085.1_ASM1485708v1/GCA_014857085.1_ASM1485708v1_genomic.fna.gz
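The URLs in both samplesheets follow a regular pattern: the assembly accession (for example `GCA_000001405.29`) is split into three-digit groups that form the directory path under `genomes/all/`, and the file name starts with the accession itself. The sketch below derives that directory from an accession and uses it to check that a sample name and its URL refer to the same assembly. The helper names are hypothetical, and the layout is inferred from the URLs listed above rather than from NCBI documentation.

```python
import re

BASE = "https://ftp.ncbi.nlm.nih.gov/genomes/all"

# e.g. GCA_000001405.29 -> prefix "GCA", digits "000001405", version "29"
ACCESSION_RE = re.compile(r"(GC[AF])_(\d{9})\.(\d+)")


def assembly_dir_prefix(accession):
    """Return the directory that contains the assembly folder for `accession`."""
    m = ACCESSION_RE.fullmatch(accession)
    if m is None:
        raise ValueError(f"not an assembly accession: {accession}")
    prefix, digits, _version = m.groups()
    groups = [digits[i:i + 3] for i in range(0, 9, 3)]  # "000001405" -> 000/001/405
    return "/".join([BASE, prefix, *groups]) + "/"


def row_is_consistent(sample, url):
    """Check that the accession embedded in `sample` matches the URL path."""
    m = ACCESSION_RE.search(sample)
    if m is None:
        return False
    accession = m.group(0)
    return url.startswith(assembly_dir_prefix(accession) + accession + "_")


rows = [
    ("Fusarium_asiaticum_GCA_025258505.1",
     "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/025/258/505/"
     "GCA_025258505.1_ASM2525850v1/GCA_025258505.1_ASM2525850v1_genomic.fna.gz"),
]
for sample, url in rows:
    print(sample, row_is_consistent(sample, url))
```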
15 changes: 4 additions & 11 deletions assets/schema_input.json
@@ -13,21 +13,14 @@
"errorMessage": "Sample name must be provided and cannot contain spaces",
"meta": ["id"]
},
"fastq_1": {
"fasta": {
"type": "string",
"format": "file-path",
"exists": true,
"pattern": "^\\S+\\.f(ast)?q\\.gz$",
"errorMessage": "FastQ file for reads 1 must be provided, cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'"
},
"fastq_2": {
"type": "string",
"format": "file-path",
"exists": true,
"pattern": "^\\S+\\.f(ast)?q\\.gz$",
"errorMessage": "FastQ file for reads 2 cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'"
"pattern": "^\\S+\\.f(ast|n)?a(\\.gz)?$",
"errorMessage": "Fasta file for genomes must be provided, cannot contain spaces and must have extension '.fa', '.fa.gz', '.fna', '.fna.gz', '.fasta' or '.fasta.gz'"
}
},
"required": ["sample", "fastq_1"]
"required": ["sample", "fasta"]
}
}
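To check file names before launching a run, the new `pattern` can be reproduced outside the pipeline. A minimal Python sketch follows, assuming the JSON-escaped pattern translates directly to Python's `re` syntax (it uses only `\S`, groups, `?`, `+`, `^` and `$`, which behave the same way here). `is_accepted_fasta_name` is a hypothetical helper; the actual check is performed by the pipeline's schema validation, not by this code.

```python
import re

# "pattern" from assets/schema_input.json after JSON unescaping ("\\S" -> \S).
FASTA_PATTERN = re.compile(r"^\S+\.f(ast|n)?a(\.gz)?$")


def is_accepted_fasta_name(path):
    """True if `path` contains no whitespace and ends in an accepted suffix.

    fullmatch() is used because in Python `$` also matches just before a
    trailing newline, which the anchored JSON Schema pattern would reject.
    """
    return FASTA_PATTERN.fullmatch(path) is not None


for name in ["genome.fa", "genome.fna.gz", "genome.fasta.gz",
             "reads.fastq.gz", "my genome.fa", "GENOME.FA"]:
    print(f"{name!r}: {is_accepted_fasta_name(name)}")
```

The pattern is case-sensitive, so upper-case suffixes such as `.FA` are rejected, and FASTQ names no longer pass now that the schema expects genome files.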