diff --git a/CHANGELOG.md b/CHANGELOG.md index ad01da0d..6db05e56 100755 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -18,6 +18,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - [#220](https://github.com/nf-core/demultiplex/pull/220) Added kraken2. - [#221](https://github.com/nf-core/demultiplex/pull/221) Added checkqc_config to pipeline schema. - [#225](https://github.com/nf-core/demultiplex/pull/225) Added test profile for multi-lane samples, updated handling of such samples and adapter trimming. +- [#236](https://github.com/nf-core/demultiplex/pull/236) Added samplesheet generation for downstream nf-core pipelines. ### `Changed` diff --git a/docs/output.md b/docs/output.md index 3c168be7..49dbc39a 100755 --- a/docs/output.md +++ b/docs/output.md @@ -21,6 +21,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d - [Falco](#falco) - Raw read QC - [md5sum](#md5sum) - Creates an MD5 (128-bit) checksum of every fastq. - [kraken2](#kraken2) - Kraken2 is a taxonomic sequence classifier that assigns taxonomic labels to sequence reads. +- [samplesheet](#downstream-pipeline-samplesheet) - Samplesheet generation for downstream nf-core pipelines. - [MultiQC](#multiqc) - aggregate report, describing results of the whole pipeline ### bcl-convert @@ -204,6 +205,16 @@ Creates an MD5 (128-bit) checksum of every fastq. [Kraken](https://ccb.jhu.edu/software/kraken2/) is a taxonomic sequence classifier that assigns taxonomic labels to DNA sequences. Kraken examines the k-mers within a query sequence and uses the information within those k-mers to query a database. That database maps k-mers to the lowest common ancestor (LCA) of all genomes known to contain a given k-mer. +### Downstream pipeline samplesheet + +<details markdown="1">
+<summary>Output files</summary> + +- `/samplesheet/` + - `*.csv`: Samplesheet listing the generated FASTQ files, formatted for the downstream nf-core pipeline selected with `--downstream_pipeline` (defaults to the minimal `default` format containing only the sample name and FASTQ paths). + +</details>
+ ### Adapter sequence removal from samplesheet
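For reference, with `--downstream_pipeline rnaseq` the generated samplesheet assembled by the new `FASTQ_TO_SAMPLESHEET` module (added further down in this diff) would look roughly like the sketch below. The sample name and FASTQ paths are illustrative; the quoted fields and the hard-coded `auto` strandedness follow the module and workflow code in this PR:

```csv title="samplesheet/samplesheet.csv"
"sample","fastq_1","fastq_2","strandedness"
"Sample1","/path/to/results/<run_dir>/L001/Sample1_S1_L001_R1_001.fastq.gz","/path/to/results/<run_dir>/L001/Sample1_S1_L001_R2_001.fastq.gz","auto"
```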
diff --git a/docs/usage.md b/docs/usage.md index 8af3d823..ba370802 100755 --- a/docs/usage.md +++ b/docs/usage.md @@ -6,17 +6,23 @@ ## Introduction -## Samplesheet input +> [!IMPORTANT] +> It is important to distinguish between the _pipeline_ samplesheet and the _flowcell_ samplesheet before working with this pipeline. +> +> - The **_pipeline_ samplesheet** is the file provided as input to the nf-core pipeline itself. It contains the overall configuration for your run, specifying the paths to the individual _flowcell_ samplesheets, the flowcell directories, and other metadata required to manage multiple sequencing runs. This is the primary configuration file that tells the pipeline how to process your data. +> - The **_flowcell_ samplesheet** is specific to a particular sequencing run. It is typically created by the sequencing facility and contains the sample information, including barcodes, lane numbers, and indexes; it is usually named `SampleSheet.csv`. Each demultiplexer may require a different format for this file, which must be adhered to for proper data processing. -You will need to create a samplesheet with information about the samples you would like to analyse before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with at least 4 columns, and a header row as shown in the examples below. The input samplesheet is a comma-separated file that contains four columns: `id`, `samplesheet`, `lane`, `flowcell`. +## Pipeline samplesheet input -When using the demultiplexer fqtk, the samplesheet must contain an additional column `per_flowcell_manifest`. The column `per_flowcell_manifest` must contain two headers `fastq` and `read_structure`. As shown in the [example](https://github.com/fulcrumgenomics/nf-core-test-datasets/blob/fqtk/testdata/sim-data/per_flowcell_manifest.csv) provided each row must contain one fastq file name and the correlating read structure. +You will need to create a _pipeline_ samplesheet with information about the samples you would like to analyse before running the pipeline. Use the `--input` parameter to specify its location. It has to be a comma-separated file with a header row and at least four columns: `id`, `samplesheet`, `lane` and `flowcell`, as shown in the examples below. + +When using the demultiplexer fqtk, the _pipeline_ samplesheet must contain an additional column, `per_flowcell_manifest`. The referenced manifest file must contain the two headers `fastq` and `read_structure`, and, as shown in the provided [example](https://github.com/fulcrumgenomics/nf-core-test-datasets/blob/fqtk/testdata/sim-data/per_flowcell_manifest.csv), each row must contain one fastq file name and the corresponding read structure.
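For illustration, a minimal flowcell manifest referenced by `per_flowcell_manifest` could look like the sketch below. The file names and fgbio-style read structures are hypothetical; refer to the linked example for the actual test data:

```csv title="per_flowcell_manifest.csv"
fastq,read_structure
lane1_R1.fastq.gz,150T
lane1_R2.fastq.gz,8B142T
```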
```bash ---input '[path to samplesheet file]' +--input '[path to pipeline samplesheet file]' ``` -### Full samplesheet +#### Example: Pipeline samplesheet ```csv title="samplesheet.csv" id,samplesheet,lane,flowcell @@ -29,17 +35,15 @@ DDMMYY_SERIAL_NUMBER_FC3,/path/to/SampleSheet3.csv,3,/path/to/sequencer/output3 | Column | Description | | ------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- | | `id` | Flowcell id | -| `samplesheet` | Full path to the `SampleSheet.csv` file containing the sample information and indexes | +| `samplesheet` | Full path to the _flowcell_ `SampleSheet.csv` file containing the sample information and indexes | | `lane` | Optional lane number. When a lane number is provided, only the given lane will be demultiplexed | | `flowcell` | Full path to the Illumina sequencer output directory (often referred as run directory) or a `tar.gz` file containing the contents of said directory | -An [example samplesheet](https://raw.githubusercontent.com/nf-core/test-datasets/demultiplex/samplesheet/1.3.0/flowcell_input.csv) has been provided with the pipeline. +An [example _pipeline_ samplesheet](https://raw.githubusercontent.com/nf-core/test-datasets/demultiplex/samplesheet/1.3.0/flowcell_input.csv) has been provided with the pipeline. Note that the run directory in the `flowcell` column must lead to a `tar.gz` for compatibility with the demultiplexers sgdemux and fqtk. -Each demultiplexing software uses a distinct samplesheet format. Below are examples for demultiplexer-specific samplesheets. Please see the following examples to format `SampleSheet.csv` for [sgdemux](https://github.com/nf-core/test-datasets/blob/demultiplex/testdata/sim-data/out.sample_meta.csv), [fqtk](https://github.com/fulcrumgenomics/nf-core-test-datasets/raw/fqtk/testdata/sim-data/fqtk_samplesheet.csv), and [bcl2fastq and bclconvert](https://raw.githubusercontent.com/nf-core/test-datasets/demultiplex/samplesheet/1.3.0/b2fq-samplesheet.csv) - -### Samplesheet for fqtk +#### Example: Pipeline samplesheet for fqtk ```csv title="samplesheet.csv" id,samplesheet,lane,flowcell,per_flowcell_manifest @@ -52,17 +56,30 @@ DDMMYY_SERIAL_NUMBER_FC3,/path/to/SampleSheet3.csv,3,/path/to/sequencer/output3, | Column | Description | | ----------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- | | `id` | Flowcell id | -| `samplesheet` | Full path to the `SampleSheet.csv` file containing the sample information and indexes | +| `samplesheet` | Full path to the _flowcell_ `SampleSheet.csv` file containing the sample information and indexes | | `lane` | Optional lane number. When a lane number is provided, only the given lane will be demultiplexed | | `flowcell` | Full path to the Illumina sequencer output directory (often referred as run directory) or a `tar.gz` file containing the contents of said directory | | `per_flowcell_manifest` | Full path to the flowcell manifest, containing the fastq file names and read structures | +### Flowcell samplesheet + +Each demultiplexing software uses a distinct _flowcell_ samplesheet format. Below are examples for demultiplexer-specific _flowcell_ samplesheets. 
Please see the following examples to format the _flowcell_ `SampleSheet.csv`: + +| Demultiplexer | Example _flowcell_ `SampleSheet.csv` Format | +| ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ | +| **sgdemux** | [sgdemux SampleSheet.csv](https://github.com/nf-core/test-datasets/blob/demultiplex/testdata/sim-data/out.sample_meta.csv) | +| **fqtk** | [fqtk SampleSheet.csv](https://github.com/fulcrumgenomics/nf-core-test-datasets/raw/fqtk/testdata/sim-data/fqtk_samplesheet.csv) | +| **bcl2fastq and bclconvert** | [bcl2fastq and bclconvert SampleSheet.csv](https://raw.githubusercontent.com/nf-core/test-datasets/demultiplex/samplesheet/1.3.0/b2fq-samplesheet.csv) | + ## Running the pipeline The typical command for running the pipeline is as follows: ```bash -nextflow run nf-core/demultiplex --input ./samplesheet.csv --outdir ./results -profile docker +nextflow run nf-core/demultiplex \ + --input pipeline_samplesheet.csv \ + --outdir results \ + -profile docker ``` This will launch the pipeline with the `docker` configuration profile. See below for more information about profiles. diff --git a/modules/local/fastq_to_samplesheet/main.nf b/modules/local/fastq_to_samplesheet/main.nf new file mode 100644 index 00000000..f79cabcf --- /dev/null +++ b/modules/local/fastq_to_samplesheet/main.nf @@ -0,0 +1,45 @@ +process FASTQ_TO_SAMPLESHEET { + tag "$meta.id" + + executor 'local' + memory 100.MB + + input: + val meta + val pipeline + val strandedness + + output: + tuple val(meta), path("*samplesheet.csv"), emit: samplesheet + + exec: + + // Add relevant fields to the map + def pipeline_map = [ + sample : meta.samplename, + fastq_1 : meta.fastq_1 + ] + + // Add fastq_2 if it's a paired-end sample + if (!meta.single_end) { + pipeline_map.fastq_2 = meta.fastq_2 + } + + // Add pipeline-specific entries + if (pipeline == 'rnaseq') { + pipeline_map << [ strandedness: strandedness ] + } else if (pipeline == 'atacseq') { + pipeline_map << [ replicate: 1 ] + } else if (pipeline == 'taxprofiler') { + pipeline_map << [ fasta: '' ] + } + + // Create the samplesheet content + def samplesheet = pipeline_map.keySet().collect { '"' + it + '"' }.join(",") + '\n' + samplesheet += pipeline_map.values().collect { '"' + it + '"' }.join(",") + + // Write samplesheet to file + def samplesheet_file = task.workDir.resolve("${meta.id}.samplesheet.csv") + samplesheet_file.text = samplesheet + +} diff --git a/modules/local/fastq_to_samplesheet/tests/main.nf.test b/modules/local/fastq_to_samplesheet/tests/main.nf.test new file mode 100644 index 00000000..62a57ccf --- /dev/null +++ b/modules/local/fastq_to_samplesheet/tests/main.nf.test @@ -0,0 +1,30 @@ +nextflow_process { + + name "Test Process FASTQ_TO_SAMPLESHEET" + script "../main.nf" + process "FASTQ_TO_SAMPLESHEET" + + tag "modules" + tag "modules_local" + tag "fastq_to_samplesheet" + + test("Should run without failures") { + + when { + process { + """ + input[0] = Channel.of([[id:'Sample1_S1_L001', samplename:'Sample1', fcid:'220422_M11111_0222_000000000-K9H97', lane:'1', empty:false, single_end:true, fastq_1:'Sample1_S1_L001_R1_001.fastq.gz']]) + input[1] = 'rnaseq' + input[2] = 'auto' + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } +} diff --git a/modules/local/fastq_to_samplesheet/tests/main.nf.test.snap b/modules/local/fastq_to_samplesheet/tests/main.nf.test.snap 
new file mode 100644 index 00000000..ebf0bb74 --- /dev/null +++ b/modules/local/fastq_to_samplesheet/tests/main.nf.test.snap @@ -0,0 +1,45 @@ +{ + "Should run without failures": { + "content": [ + { + "0": [ + [ + [ + { + "id": "Sample1_S1_L001", + "samplename": "Sample1", + "fcid": "220422_M11111_0222_000000000-K9H97", + "lane": "1", + "empty": false, + "single_end": true, + "fastq_1": "Sample1_S1_L001_R1_001.fastq.gz" + } + ], + "[Sample1_S1_L001].samplesheet.csv:md5,bc779a8b2302a093cbb04a118bb5c90f" + ] + ], + "samplesheet": [ + [ + [ + { + "id": "Sample1_S1_L001", + "samplename": "Sample1", + "fcid": "220422_M11111_0222_000000000-K9H97", + "lane": "1", + "empty": false, + "single_end": true, + "fastq_1": "Sample1_S1_L001_R1_001.fastq.gz" + } + ], + "[Sample1_S1_L001].samplesheet.csv:md5,bc779a8b2302a093cbb04a118bb5c90f" + ] + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "24.04.4" + }, + "timestamp": "2024-08-09T22:00:18.282617632" + } +} \ No newline at end of file diff --git a/nextflow.config b/nextflow.config index 1e1ab763..81fa6519 100755 --- a/nextflow.config +++ b/nextflow.config @@ -11,7 +11,7 @@ params { // Options: Generic input = null - demultiplexer = "bclconvert" // [bclconvert, bcl2fastq, bases2fastq, fqtk, sgdemux, mkfastq] + demultiplexer = "bclconvert" // enum string [bclconvert, bcl2fastq, bases2fastq, fqtk, sgdemux, mkfastq] // Options: trimming trim_fastq = true // [true, false] @@ -25,6 +25,10 @@ params { // Kraken2 options kraken_db = null // file .tar.gz + + // Downstream Nextflow pipeline + downstream_pipeline = "default" // enum string [rnaseq, atacseq, taxprofiler, default] + // Options: CheckQC checkqc_config = [] // file .yaml diff --git a/nextflow_schema.json b/nextflow_schema.json index 9db9ac30..3d4acd22 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -29,8 +29,13 @@ "kraken_db": { "type": "string", "format": "path", - "default": null, - "description": "path to Kraken2 DB to use for screening" + "description": "Path to Kraken2 DB to use for screening" + }, + "downstream_pipeline": { + "type": "string", + "description": "Name of downstream nf-core pipeline (one of: rnaseq, atacseq, taxprofiler or default). 
Used to produce the input samplesheet for that pipeline.", + "default": "default", + "enum": ["rnaseq", "atacseq", "taxprofiler", "default"] } } }, diff --git a/tests/pipeline/bases2fastq.nf.test b/tests/pipeline/bases2fastq.nf.test index 56df79b0..9802e65b 100644 --- a/tests/pipeline/bases2fastq.nf.test +++ b/tests/pipeline/bases2fastq.nf.test @@ -19,7 +19,7 @@ nextflow_pipeline { assertAll( { assert workflow.success }, { assert snapshot(UTILS.removeNextflowVersion("$outputDir")).match("software_versions") }, - { assert workflow.trace.succeeded().size() == 7 }, + { assert workflow.trace.succeeded().size() == 8 }, { assert snapshot( // FIXME // path("$outputDir/sim-data/DefaultSample_R1.fastq.gz.md5"), diff --git a/tests/pipeline/bcl2fastq.nf.test b/tests/pipeline/bcl2fastq.nf.test index 2bada9be..027784f9 100644 --- a/tests/pipeline/bcl2fastq.nf.test +++ b/tests/pipeline/bcl2fastq.nf.test @@ -20,7 +20,7 @@ nextflow_pipeline { assertAll( { assert workflow.success }, { assert snapshot(UTILS.removeNextflowVersion("$outputDir")).match("software_versions") }, - { assert workflow.trace.succeeded().size() == 5 }, + { assert workflow.trace.succeeded().size() == 6 }, { assert snapshot( path("$outputDir/multiqc/multiqc_data/bcl2fastq_lane_counts.txt"), path("$outputDir/multiqc/multiqc_data/fastp_filtered_reads_plot.txt"), diff --git a/tests/pipeline/bclconvert.nf.test b/tests/pipeline/bclconvert.nf.test index 0c094c3f..4056af8b 100644 --- a/tests/pipeline/bclconvert.nf.test +++ b/tests/pipeline/bclconvert.nf.test @@ -19,7 +19,7 @@ nextflow_pipeline { assertAll( { assert workflow.success }, { assert snapshot(UTILS.removeNextflowVersion("$outputDir")).match("software_versions") }, - { assert workflow.trace.succeeded().size() == 5 }, + { assert workflow.trace.succeeded().size() == 6 }, { assert snapshot( path("$outputDir/multiqc/multiqc_data/bclconvert_lane_counts.txt"), path("$outputDir/multiqc/multiqc_data/fastp_filtered_reads_plot.txt"), diff --git a/tests/pipeline/fqtk.nf.test b/tests/pipeline/fqtk.nf.test index 633b88a1..8576ee9b 100644 --- a/tests/pipeline/fqtk.nf.test +++ b/tests/pipeline/fqtk.nf.test @@ -19,7 +19,7 @@ nextflow_pipeline { assertAll( { assert workflow.success }, { assert snapshot(UTILS.removeNextflowVersion("$outputDir")).match("software_versions") }, - { assert workflow.trace.succeeded().size() == 104 }, + { assert workflow.trace.succeeded().size() == 129 }, { assert snapshot(path("$outputDir/test/demux-metrics.txt")).match("fqtk") }, { assert new File("$outputDir/test/unmatched_1.fastp.fastq.gz").exists() }, { assert new File("$outputDir/test/unmatched_2.fastp.fastq.gz").exists() }, diff --git a/tests/pipeline/kraken.nf.test b/tests/pipeline/kraken.nf.test index 20afde6c..73ccc70c 100644 --- a/tests/pipeline/kraken.nf.test +++ b/tests/pipeline/kraken.nf.test @@ -21,7 +21,7 @@ nextflow_pipeline { assertAll( { assert workflow.success }, { assert snapshot(UTILS.removeNextflowVersion("$outputDir")).match("software_versions") }, - { assert workflow.trace.succeeded().size() == 8 }, + { assert workflow.trace.succeeded().size() == 9 }, { assert snapshot( path("$outputDir/multiqc/multiqc_data/bcl2fastq_lane_counts.txt"), path("$outputDir/multiqc/multiqc_data/fastp_filtered_reads_plot.txt"), diff --git a/tests/pipeline/kraken.nf.test.snap b/tests/pipeline/kraken.nf.test.snap index 74d978b9..af9781bb 100644 --- a/tests/pipeline/kraken.nf.test.snap +++ b/tests/pipeline/kraken.nf.test.snap @@ -57,9 +57,9 @@ ], "meta": { "nf-test": "0.8.4", - "nextflow": "23.10.0" + "nextflow": "24.04.4" }, 
- "timestamp": "2024-08-05T22:49:12.12938394" + "timestamp": "2024-08-09T17:17:23.034777828" }, "software_versions": { "content": [ @@ -67,9 +67,9 @@ ], "meta": { "nf-test": "0.8.4", - "nextflow": "23.10.0" + "nextflow": "24.04.4" }, - "timestamp": "2024-08-01T22:34:15.140488001" + "timestamp": "2024-08-09T17:17:22.999406989" }, "multiqc": { "content": [ @@ -80,8 +80,8 @@ ], "meta": { "nf-test": "0.8.4", - "nextflow": "23.10.0" + "nextflow": "24.04.4" }, - "timestamp": "2024-08-05T22:49:08.601265877" + "timestamp": "2024-08-09T17:17:23.014483899" } } \ No newline at end of file diff --git a/tests/pipeline/mkfastq.nf.test b/tests/pipeline/mkfastq.nf.test index 09b2f13f..3c77944e 100644 --- a/tests/pipeline/mkfastq.nf.test +++ b/tests/pipeline/mkfastq.nf.test @@ -19,9 +19,9 @@ nextflow_pipeline { assertAll( { assert workflow.success }, { assert snapshot(UTILS.removeNextflowVersion("$outputDir")).match("software_versions") }, - { assert workflow.trace.succeeded().size() == 6 }, + { assert workflow.trace.succeeded().size() == 7 }, // How many directories were produced? - {assert path("${outputDir}").list().size() == 4}, + {assert path("${outputDir}").list().size() == 6}, // How many files were produced? {assert path("$outputDir/220422_M11111_0222_000000000-K9H97_mkfastq/").list().size() == 2}, {assert path("$outputDir/multiqc/").list().size() == 3}, diff --git a/tests/pipeline/sgdemux.nf.test b/tests/pipeline/sgdemux.nf.test index a740c4ef..de48242b 100644 --- a/tests/pipeline/sgdemux.nf.test +++ b/tests/pipeline/sgdemux.nf.test @@ -19,7 +19,7 @@ nextflow_pipeline { assertAll( { assert workflow.success }, { assert snapshot(UTILS.removeNextflowVersion("$outputDir")).match("software_versions") }, - { assert workflow.trace.succeeded().size() == 103 }, + { assert workflow.trace.succeeded().size() == 128 }, { assert snapshot( path("$outputDir/sim-data/metrics.tsv"), path("$outputDir/sim-data/per_project_metrics.tsv"), diff --git a/tests/pipeline/skip_tools.nf.test b/tests/pipeline/skip_tools.nf.test index 1de176d0..2f0d1cc7 100644 --- a/tests/pipeline/skip_tools.nf.test +++ b/tests/pipeline/skip_tools.nf.test @@ -21,7 +21,7 @@ nextflow_pipeline { assertAll( { assert workflow.success }, { assert snapshot(UTILS.removeNextflowVersion("$outputDir")).match("software_versions_skip_trimming") }, - { assert workflow.trace.succeeded().size() == 5 }, + { assert workflow.trace.succeeded().size() == 6 }, { assert path("$outputDir/multiqc/multiqc_report.html").exists() }, { assert snapshot( path("$outputDir/220422_M11111_0222_000000000-K9H97/L001/Sample1_S1_L001_R1_001.fastq.gz"), @@ -49,7 +49,7 @@ nextflow_pipeline { assertAll( { assert workflow.success }, { assert snapshot(UTILS.removeNextflowVersion("$outputDir")).match("software_versions_skip_fastp") }, - { assert workflow.trace.succeeded().size() == 4 }, + { assert workflow.trace.succeeded().size() == 5 }, { assert path("$outputDir/multiqc/multiqc_report.html").exists() }, { assert snapshot( path("$outputDir/220422_M11111_0222_000000000-K9H97/L001/Sample1_S1_L001_R1_001.fastq.gz"), @@ -77,7 +77,7 @@ nextflow_pipeline { assertAll( { assert workflow.success }, { assert snapshot(UTILS.removeNextflowVersion("$outputDir")).match("software_versions_skip_fastqc") }, - { assert workflow.trace.succeeded().size() == 5 }, + { assert workflow.trace.succeeded().size() == 6 }, { assert path("$outputDir/multiqc/multiqc_report.html").exists() }, { assert snapshot( path("$outputDir/220422_M11111_0222_000000000-K9H97/L001/Sample1_S1_L001_R1_001.fastq.gz"), @@ -105,7 +105,7 
@@ nextflow_pipeline { assertAll( { assert workflow.success }, { assert snapshot(UTILS.removeNextflowVersion("$outputDir")).match("software_versions_skip_fastp_fastqc") }, - { assert workflow.trace.succeeded().size() == 4 }, + { assert workflow.trace.succeeded().size() == 5 }, { assert path("$outputDir/multiqc/multiqc_report.html").exists() }, { assert snapshot( path("$outputDir/220422_M11111_0222_000000000-K9H97/L001/Sample1_S1_L001_R1_001.fastq.gz"), @@ -133,7 +133,7 @@ nextflow_pipeline { assertAll( { assert workflow.success }, { assert snapshot(UTILS.removeNextflowVersion("$outputDir")).match("software_versions_skip_multiqc") }, - { assert workflow.trace.succeeded().size() == 4 }, + { assert workflow.trace.succeeded().size() == 5 }, { assert !path("$outputDir/multiqc/multiqc_report.html").exists() }, { assert snapshot( path("$outputDir/220422_M11111_0222_000000000-K9H97/L001/Sample1_S1_L001_R1_001.fastq.gz"), diff --git a/tests/pipeline/test_pe.nf.test b/tests/pipeline/test_pe.nf.test index bba8f754..8b9ae237 100644 --- a/tests/pipeline/test_pe.nf.test +++ b/tests/pipeline/test_pe.nf.test @@ -20,7 +20,7 @@ nextflow_pipeline { assertAll( { assert workflow.success }, { assert snapshot(UTILS.removeNextflowVersion("$outputDir")).match("software_versions") }, - { assert workflow.trace.succeeded().size() == 6 }, + { assert workflow.trace.succeeded().size() == 7 }, { assert snapshot( path("$outputDir/multiqc/multiqc_data/bcl2fastq_lane_counts.txt"), path("$outputDir/multiqc/multiqc_data/fastp_filtered_reads_plot.txt"), diff --git a/workflows/demultiplex.nf b/workflows/demultiplex.nf index c48c9940..1f8576a1 100644 --- a/workflows/demultiplex.nf +++ b/workflows/demultiplex.nf @@ -15,6 +15,7 @@ include { FQTK_DEMULTIPLEX } from '../subworkflows/local/fqtk_demultipl include { MKFASTQ_DEMULTIPLEX } from '../subworkflows/local/mkfastq_demultiplex/main' include { SINGULAR_DEMULTIPLEX } from '../subworkflows/local/singular_demultiplex/main' include { RUNDIR_CHECKQC } from '../subworkflows/local/rundir_checkqc/main' +include { FASTQ_TO_SAMPLESHEET } from '../modules/local/fastq_to_samplesheet/main' // @@ -48,11 +49,12 @@ workflow DEMULTIPLEX { main: // Value inputs - demultiplexer = params.demultiplexer // string: bases2fastq, bcl2fastq, bclconvert, fqtk, sgdemux, mkfastq - trim_fastq = params.trim_fastq // boolean: true, false - skip_tools = params.skip_tools ? params.skip_tools.split(',') : [] // list: [falco, fastp, multiqc] - sample_size = params.sample_size // int - kraken_db = params.kraken_db // path + demultiplexer = params.demultiplexer // string: bases2fastq, bcl2fastq, bclconvert, fqtk, sgdemux, mkfastq + trim_fastq = params.trim_fastq // boolean: true, false + skip_tools = params.skip_tools ? params.skip_tools.split(',') : [] // list: [falco, fastp, multiqc] + sample_size = params.sample_size // int + kraken_db = params.kraken_db // path + downstream_pipeline = params.downstream_pipeline // string: rnaseq, atacseq, taxprofiler // Channel inputs @@ -250,6 +252,30 @@ workflow DEMULTIPLEX { ch_versions = ch_versions.mix(FASTQ_CONTAM_SEQTK_KRAKEN.out.versions) ch_multiqc_files = ch_multiqc_files.mix( FASTQ_CONTAM_SEQTK_KRAKEN.out.reports.map { meta, log -> return log }) } + + // Prepare metamap with fastq info + ch_meta_fastq = ch_raw_fastq.map { meta, fastq_files -> + // Determine the publish directory based on the lane information + def publish_dir = meta.lane ? 
"${params.outdir}/${meta.id}/L00${meta.lane}" : "${params.outdir}/${meta.id}" + meta.fastq_1 = "${publish_dir}/${fastq_files[0].getName()}" + + // Add full path for fastq_2 to the metadata if the sample is not single-end + if (!meta.single_end) { + meta.fastq_2 = "${publish_dir}/${fastq_files[1].getName()}" + } + return meta + } + + // Module: FASTQ to samplesheet + FASTQ_TO_SAMPLESHEET(ch_meta_fastq, downstream_pipeline, 'auto') + + FASTQ_TO_SAMPLESHEET.out.samplesheet + .map { it[1] } + .collectFile(name:'tmp_samplesheet.csv', newLine: true, keepHeader: true, sort: { it.baseName }) + .map { it.text.tokenize('\n').join('\n') } + .collectFile(name:'samplesheet.csv', storeDir: "${params.outdir}/samplesheet") + .set { ch_samplesheet } + // // Collate and save software versions //