Merge pull request #587 from nf-core/new-module-ctat-splicing
New module ctat splicing
nvnieuwk authored Dec 20, 2024
2 parents 34121d1 + a209fea commit 3f4406a
Showing 17 changed files with 536 additions and 51 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -21,6 +21,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Add nf-test to local subworkflow: `TRIM_WORKFLOW` [#572](https://github.com/nf-core/rnafusion/pull/572)
- Add nf-test to local module: `FUSIONREPORT_DETECT`. Improve `FUSIONREPORT_DOWNLOAD` module [#577](https://github.com/nf-core/rnafusion/pull/577)
- Add nf-test to local subworkflow: `ARRIBA_WORKFLOW` [#578](https://github.com/nf-core/rnafusion/pull/578)
- Added a new module `CTATSPLICING_STARTOCANCERINTRONS` and a new parameter `--ctatsplicing`. This option creates reports on cancer splicing aberrations and requires one or both of `--arriba` and `--starfusion` to be given. [#587](https://github.com/nf-core/rnafusion/pull/587)

### Changed

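The requirement stated in the changelog entry (`--ctatsplicing` needs `--arriba` and/or `--starfusion`) can be sketched as a small shell pre-flight check. This is an illustration only: `check_ctatsplicing_flags` is a hypothetical helper, and how the pipeline itself enforces the requirement is not shown in this diff. Treating `--all` as satisfying the requirement is an assumption based on the usage docs, which list `arriba` and `starfusion` among the tools it enables.

```bash
#!/usr/bin/env bash
# Sketch only: check_ctatsplicing_flags is a hypothetical helper, not pipeline code.
# It returns 0 when the requested tool flags satisfy the changelog's requirement.
check_ctatsplicing_flags() {
    local ctat=false arriba=false starfusion=false arg
    for arg in "$@"; do
        case "$arg" in
            --ctatsplicing) ctat=true ;;
            --arriba)       arriba=true ;;
            --starfusion)   starfusion=true ;;
            # Assumption: --all enables every tool, so the prerequisite is met.
            --all)          ctat=true; arriba=true; starfusion=true ;;
        esac
    done
    # The variables hold the command names "true"/"false", so they can be run directly.
    if "$ctat" && ! "$arriba" && ! "$starfusion"; then
        echo "--ctatsplicing requires --arriba and/or --starfusion" >&2
        return 1
    fi
}

check_ctatsplicing_flags --ctatsplicing --starfusion && echo "flags ok"
```

A wrapper script could call the function with the same flags it is about to pass to `nextflow run nf-core/rnafusion` and stop early on a non-zero return code.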
28 changes: 26 additions & 2 deletions conf/modules.config
@@ -49,6 +49,30 @@ process {
        ]
    }

    withName: '.*ARRIBA_WORKFLOW:.*:CTATSPLICING_STARTOCANCERINTRONS' {
        ext.args = {[
            bam ? "--vis" : "",
            "--sample_name ${meta.id}",
        ].join(" ")}
        publishDir = [
            path: { "${params.outdir}/ctatsplicing/arriba" },
            mode: params.publish_dir_mode,
            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
        ]
    }

    withName: '.*STARFUSION_WORKFLOW:.*:CTATSPLICING_STARTOCANCERINTRONS' {
        ext.args = {[
            bam ? "--vis" : "",
            "--sample_name ${meta.id}",
        ].join(" ")}
        publishDir = [
            path: { "${params.outdir}/ctatsplicing/starfusion" },
            mode: params.publish_dir_mode,
            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
        ]
    }

    withName: 'ENSEMBL_DOWNLOAD' {
        publishDir = [
            path: { "${params.genomes_base}/ensembl" },
@@ -259,15 +283,15 @@ process {
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
]
ext.args = '--readFilesCommand zcat \
- --outSAMtype BAM Unsorted \
+ --outSAMtype BAM SortedByCoordinate \
--outSAMunmapped Within \
--outBAMcompression 0 \
--outFilterMultimapNmax 50 \
--peOverlapNbasesMin 10 \
--alignSplicedMateMapLminOverLmate 0.5 \
--alignSJstitchMismatchNmax 5 -1 5 5 \
--chimSegmentMin 10 \
- --chimOutType WithinBAM HardClip \
+ --chimOutType WithinBAM HardClip Junctions \
--chimJunctionOverhangMin 10 \
--chimScoreDropMax 30 \
--chimScoreJunctionNonGTAG 0 \
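The `ext.args` closures added for `CTATSPLICING_STARTOCANCERINTRONS` in `conf/modules.config` build the extra command-line arguments by joining a two-element list with a space. A minimal shell sketch of that string construction follows; `build_ext_args` is a hypothetical stand-in for the Groovy closure and exists only to show the resulting strings.

```bash
#!/usr/bin/env bash
# Sketch only: build_ext_args mimics the Groovy closure
#   {[ bam ? "--vis" : "", "--sample_name ${meta.id}" ].join(" ")}
# $1 = BAM path (may be empty), $2 = sample id.
build_ext_args() {
    local bam=$1 sample_id=$2
    local vis=""
    [ -n "$bam" ] && vis="--vis"
    # join(" ") keeps the empty first element, hence the leading space without a BAM.
    printf '%s %s\n' "$vis" "--sample_name ${sample_id}"
}

build_ext_args "test.bam" "test"   # prints "--vis --sample_name test"
build_ext_args ""         "test"   # prints " --sample_name test"
```

The leading space in the second case is harmless once the string is interpolated into the command line, because the shell splits on whitespace.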
9 changes: 9 additions & 0 deletions conf/test.config
@@ -17,3 +17,12 @@ params {
    // Input data
    input = 'https://raw.githubusercontent.com/nf-core/test-datasets/rnafusion/testdata/human/samplesheet_valid.csv'
}

// Limit and standardize resources for github actions and reproducibility
process {
    resourceLimits = [
        cpus: 4,
        memory: '15.GB',
        time: '1.h'
    ]
}
9 changes: 9 additions & 0 deletions conf/test_build.config
@@ -25,3 +25,12 @@ params {
    fusion_annot_lib = 'https://github.com/STAR-Fusion/STAR-Fusion-Tutorial/raw/master/CTAT_HumanFusionLib.mini.dat.gz'

}

// Limit and standardize resources for github actions and reproducibility
process {
    resourceLimits = [
        cpus: 4,
        memory: '15.GB',
        time: '1.h'
    ]
}
9 changes: 9 additions & 0 deletions conf/test_cosmic.config
@@ -20,3 +20,12 @@ params {
    cosmic_username = secrets.COSMIC_USERNAME
    cosmic_passwd = secrets.COSMIC_PASSWD
}

// Limit and standardize resources for github actions and reproducibility
process {
    resourceLimits = [
        cpus: 4,
        memory: '15.GB',
        time: '1.h'
    ]
}
36 changes: 36 additions & 0 deletions docs/output.md
@@ -17,6 +17,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [STAR-fusion](#starfusion) - STAR-fusion fusion detection
- [StringTie](#stringtie) - StringTie assembly
- [FusionCatcher](#fusioncatcher) - Fusion catcher fusion detection
- [CTAT-SPLICING](#ctat-splicing) - Detection and annotation of cancer splicing aberrations
- [Samtools](#samtools) - SAM/BAM file manipulation
- [Fusion-report](#fusion-report) - Summary of the findings of each tool and comparison to COSMIC, Mitelman, and FusionGDB2 databases
- [FusionInspector](#fusionInspector) - Supervised analysis of fusion predictions from fusion-report, recover and re-score evidence for such predictions
@@ -186,6 +187,41 @@ The FastQC plots displayed in the MultiQC report shows _untrimmed_ reads. They m

[FusionCatcher](https://github.com/ndaniel/fusioncatcher) searches for novel/known somatic fusion genes, translocations, and chimeras in RNA-seq data. Use the `--fusioncatcher_limitSjdbInsertNsj` parameter to modify `limitSjdbInsertNsj`.

### CTAT-SPLICING

<details markdown="1">
<summary>Output files</summary>

- `ctatsplicing`
  - `arriba`
    - `<sample>.cancer_intron_reads.sorted.bam`
    - `<sample>.cancer_intron_reads.sorted.bam.bai`
    - `<sample>.cancer.introns`
    - `<sample>.cancer.introns.prelim`
    - `<sample>.chckpts`
    - `<sample>.ctat-splicing.igv.html`
    - `<sample>.gene_reads.sorted.sifted.bam`
    - `<sample>.gene_reads.sorted.sifted.bam.bai`
    - `<sample>.igv.tracks`
    - `<sample>.introns`
    - `<sample>.introns.for_IGV.bed`
  - `starfusion`
    - `<sample>.cancer_intron_reads.sorted.bam`
    - `<sample>.cancer_intron_reads.sorted.bam.bai`
    - `<sample>.cancer.introns`
    - `<sample>.cancer.introns.prelim`
    - `<sample>.chckpts`
    - `<sample>.ctat-splicing.igv.html`
    - `<sample>.gene_reads.sorted.sifted.bam`
    - `<sample>.gene_reads.sorted.sifted.bam.bai`
    - `<sample>.igv.tracks`
    - `<sample>.introns`
    - `<sample>.introns.for_IGV.bed`

</details>

[CTAT-SPLICING](https://github.com/TrinityCTAT/CTAT-SPLICING/wiki) detects and annotates aberrant splicing isoforms in cancer. It is run on the input files for `arriba` and/or `starfusion`.

### FusionInspector

<details markdown="1">
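The CTAT-SPLICING output layout documented in `docs/output.md` can be verified after a run with a short shell sketch. `list_missing_ctatsplicing_outputs` is a hypothetical helper written for this illustration. The three IGV-related files (`.ctat-splicing.igv.html`, `.igv.tracks`, `.introns.for_IGV.bed`) are deliberately left out of the check, since the module in this commit declares them `optional: true`.

```bash
#!/usr/bin/env bash
# Sketch only: list_missing_ctatsplicing_outputs is a hypothetical helper.
# $1 = directory such as <outdir>/ctatsplicing/arriba, $2 = sample name.
# Prints every always-expected file that is missing; returns 1 if any is.
list_missing_ctatsplicing_outputs() {
    local dir=$1 sample=$2 suffix missing=0
    for suffix in \
        cancer_intron_reads.sorted.bam \
        cancer_intron_reads.sorted.bam.bai \
        cancer.introns \
        cancer.introns.prelim \
        chckpts \
        gene_reads.sorted.sifted.bam \
        gene_reads.sorted.sifted.bam.bai \
        introns
    do
        if [ ! -e "${dir}/${sample}.${suffix}" ]; then
            echo "${sample}.${suffix}"
            missing=1
        fi
    done
    return "$missing"
}
```

It would be called once per publish directory, for example with `<outdir>/ctatsplicing/arriba` and then `<outdir>/ctatsplicing/starfusion`, where `<outdir>` is whatever was passed as `--outdir`.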
4 changes: 2 additions & 2 deletions docs/usage.md
@@ -16,7 +16,7 @@ The pipeline is divided into two parts:

2. Detecting fusions

- - Supported tools: `Arriba`, `FusionCatcher`, `STAR-Fusion`, and `StringTie`
+ - Supported tools: `Arriba`, `FusionCatcher`, `STAR-Fusion`, `StringTie` and `CTAT-SPLICING`
- QC: `Fastqc`, `MultiQC`, and `Picard CollectInsertSize`, `Picard CollectWgsMetrics`, `Picard Markduplicates`
- Fusions visualization: `Arriba`, `fusion-report`, `FusionInspector`, and `vcf_collect`

@@ -136,7 +136,7 @@ As you can see above for multiple runs of the same sample, the `sample` name has

### Starting commands

- The pipeline can either be run using all fusion detection tools or specifying individual tools. Visualisation tools will be run on all fusions detected. To run all tools (`arriba`, `fusioncatcher`, `starfusion`, `stringtie`) use the `--all` parameter:
+ The pipeline can either be run using all fusion detection tools or specifying individual tools. Visualisation tools will be run on all fusions detected. To run all tools (`arriba`, `fusioncatcher`, `starfusion`, `stringtie`, `ctat-splicing`) use the `--all` parameter:

```bash
nextflow run nf-core/rnafusion \
72 changes: 72 additions & 0 deletions modules/local/ctatsplicing/startocancerintrons/main.nf
@@ -0,0 +1,72 @@
process CTATSPLICING_STARTOCANCERINTRONS {
    tag "$meta.id"
    label 'process_single'

    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://data.broadinstitute.org/Trinity/CTAT_SINGULARITY/CTAT-SPLICING/ctat_splicing.v0.0.2.simg' :
        'docker.io/trinityctat/ctat_splicing:0.0.2' }"

    input:
    tuple val(meta), path(split_junction), path(junction), path(bam), path(bai)
    tuple val(meta2), path(genome_lib)

    output:
    tuple val(meta), path("*.cancer_intron_reads.sorted.bam")     , emit: cancer_introns_sorted_bam
    tuple val(meta), path("*.cancer_intron_reads.sorted.bam.bai") , emit: cancer_introns_sorted_bai
    tuple val(meta), path("*.gene_reads.sorted.sifted.bam")       , emit: gene_reads_sorted_bam
    tuple val(meta), path("*.gene_reads.sorted.sifted.bam.bai")   , emit: gene_reads_sorted_bai
    tuple val(meta), path("*.cancer.introns")                     , emit: cancer_introns
    tuple val(meta), path("*.cancer.introns.prelim")              , emit: cancer_introns_prelim
    tuple val(meta), path("*${prefix}.introns")                   , emit: introns
    tuple val(meta), path("*.introns.for_IGV.bed")                , emit: introns_igv_bed, optional: true
    tuple val(meta), path("*.ctat-splicing.igv.html")             , emit: igv_html, optional: true
    tuple val(meta), path("*.igv.tracks")                         , emit: igv_tracks, optional: true
    tuple val(meta), path("*.chckpts")                            , emit: chckpts
    path "versions.yml"                                           , emit: versions

    script:
    def args = task.ext.args ?: ''
    prefix = task.ext.prefix ?: "${meta.id}"
    def bam_arg = bam ? "--bam_file ${bam}" : ""
    def VERSION = '0.0.2' // WARN: Version information not provided by tool on CLI. Please update this string when bumping container versions.
    def create_index = bam && !bai ? "samtools index ${bam}" : ""
    """
    ${create_index}
    /usr/local/src/CTAT-SPLICING/STAR_to_cancer_introns.py \\
        --SJ_tab_file ${split_junction} \\
        --chimJ_file ${junction} \\
        ${bam_arg} \\
        --output_prefix ${prefix} \\
        --ctat_genome_lib ${genome_lib} \\
        ${args}
    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        ctat-splicing: $VERSION
    END_VERSIONS
    """

    stub:
    def args = task.ext.args ?: ''
    prefix = task.ext.prefix ?: "${meta.id}"
    def VERSION = '0.0.2' // WARN: Version information not provided by tool on CLI. Please update this string when bumping container versions.
    def create_igv_files = args.contains("--vis") ? "touch ${prefix}.introns.for_IGV.bed && touch ${prefix}.ctat-splicing.igv.html && touch ${prefix}.igv.tracks" : ""
    """
    ${create_igv_files}
    touch ${prefix}.cancer_intron_reads.sorted.bam
    touch ${prefix}.cancer_intron_reads.sorted.bam.bai
    touch ${prefix}.gene_reads.sorted.sifted.bam
    touch ${prefix}.gene_reads.sorted.sifted.bam.bai
    touch ${prefix}.cancer.introns
    touch ${prefix}.cancer.introns.prelim
    touch ${prefix}.introns
    touch ${prefix}.chckpts
    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        ctat-splicing: $VERSION
    END_VERSIONS
    """
}
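The `script:` block of the module decides two things from its inputs: whether to run `samtools index` first (a BAM is given but no index) and whether to pass `--bam_file` (a BAM is given at all). The shell sketch below mimics that expansion by printing the commands instead of running them. `render_ctatsplicing_script` is a hypothetical function written for this illustration, and the extra arguments from `task.ext.args` are left out for brevity.

```bash
#!/usr/bin/env bash
# Sketch only: render_ctatsplicing_script is hypothetical and prints, rather than
# runs, the commands the module's script block would expand to.
# $1 = prefix, $2 = SJ.out.tab, $3 = chimeric junctions, $4 = genome lib,
# $5 = BAM (optional), $6 = BAI (optional)
render_ctatsplicing_script() {
    local prefix=$1 sj=$2 chim=$3 genome_lib=$4 bam=${5:-} bai=${6:-}
    # Mirrors create_index: index only when a BAM is given without an index.
    if [ -n "$bam" ] && [ -z "$bai" ]; then
        echo "samtools index ${bam}"
    fi
    local cmd="/usr/local/src/CTAT-SPLICING/STAR_to_cancer_introns.py"
    cmd+=" --SJ_tab_file ${sj} --chimJ_file ${chim}"
    # Mirrors bam_arg: --bam_file is added only when a BAM is supplied.
    [ -n "$bam" ] && cmd+=" --bam_file ${bam}"
    cmd+=" --output_prefix ${prefix} --ctat_genome_lib ${genome_lib}"
    echo "$cmd"
}

# BAM without index: an indexing command is printed before the main command.
render_ctatsplicing_script test test.SJ.out.tab test.Chimeric.out.junctions \
    ctat_genome_lib test.Aligned.sortedByCoord.out.bam
```

The three input combinations (no BAM, BAM only, BAM plus index) correspond to the cases the nf-test file in this commit exercises in stub mode for the first two.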
69 changes: 69 additions & 0 deletions modules/local/ctatsplicing/startocancerintrons/tests/main.nf.test
@@ -0,0 +1,69 @@
nextflow_process {

    name "Test Process CTATSPLICING_STARTOCANCERINTRONS"
    script "../main.nf"
    process "CTATSPLICING_STARTOCANCERINTRONS"
    options "-stub"

    test("test without BAM") {

        when {
            params {
                outdir = "tests/results"
            }
            process {
                """
                input[0] = [
                    [id:"test"],
                    file("test.SJ.out.tab"),
                    file("test.Chimeric.out.junctions"),
                    [],
                    []
                ]
                input[1] = [
                    [id:"reference"],
                    file("ctat_genome_lib")
                ]
                """
            }
        }

        then {
            assertAll(
                { assert process.success },
                { assert snapshot(process.out.findAll { key, value -> !key.isNumber() }).match() }
            )
        }
    }

    test("test with BAM") {

        when {
            params {
                outdir = "tests/results"
            }
            process {
                """
                input[0] = [
                    [id:"test"],
                    file("test.SJ.out.tab"),
                    file("test.Chimeric.out.junctions"),
                    file("test.Aligned.sortedByCoord.out.bam"),
                    []
                ]
                input[1] = [
                    [id:"reference"],
                    file("ctat_genome_lib")
                ]
                """
            }
        }

        then {
            assertAll(
                { assert process.success },
                { assert snapshot(process.out.findAll { key, value -> !key.isNumber() }).match() }
            )
        }
    }
}