diff --git a/CHANGELOG.md b/CHANGELOG.md
index ad01da0d..6db05e56 100755
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -18,6 +18,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- [#220](https://github.com/nf-core/demultiplex/pull/220) Added kraken2.
- [#221](https://github.com/nf-core/demultiplex/pull/221) Added checkqc_config to pipeline schema.
- [#225](https://github.com/nf-core/demultiplex/pull/225) Added test profile for multi-lane samples, updated handling of such samples and adapter trimming.
+- [#236](https://github.com/nf-core/demultiplex/pull/236) Add samplesheet generation.
### `Changed`
diff --git a/docs/output.md b/docs/output.md
index 3c168be7..49dbc39a 100755
--- a/docs/output.md
+++ b/docs/output.md
@@ -21,6 +21,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [Falco](#falco) - Raw read QC
- [md5sum](#md5sum) - Creates an MD5 (128-bit) checksum of every fastq.
- [kraken2](#kraken2) - Kraken2 is a taxonomic sequence classifier that assigns taxonomic labels to sequence reads.
+- [samplesheet](#samplesheet) - Samplesheet generation for downstream nf-core pipelines.
- [MultiQC](#multiqc) - aggregate report, describing results of the whole pipeline
### bcl-convert
@@ -204,6 +205,16 @@ Creates an MD5 (128-bit) checksum of every fastq.
[Kraken](https://ccb.jhu.edu/software/kraken2/) is a taxonomic sequence classifier that assigns taxonomic labels to DNA sequences. Kraken examines the k-mers within a query sequence and uses the information within those k-mers to query a database. That database maps k-mers to the lowest common ancestor (LCA) of all genomes known to contain a given k-mer.
+### Downstream pipeline samplesheet
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `/samplesheet/`
+  - `*.csv`: Samplesheet with the generated FASTQ files formatted according to the selected downstream nf-core pipeline. Default: rnaseq format.
+
+</details>
+
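+For example, with `--downstream_pipeline rnaseq` the generated samplesheet follows the nf-core/rnaseq input format. A minimal sketch for one paired-end sample (sample name and file paths are illustrative):
+
+```csv title="samplesheet.csv"
+"sample","fastq_1","fastq_2","strandedness"
+"Sample1","/path/to/results/Sample1_S1_L001_R1_001.fastq.gz","/path/to/results/Sample1_S1_L001_R2_001.fastq.gz","auto"
+```
+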
### Adapter sequence removal from samplesheet
diff --git a/docs/usage.md b/docs/usage.md
index 8af3d823..ba370802 100755
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -6,17 +6,23 @@
## Introduction
-## Samplesheet input
+> [!IMPORTANT]
+> It is important to distinguish between the _pipeline_ samplesheet and the _flowcell_ samplesheet before working with this pipeline.
+>
+> - The **_pipeline_ samplesheet** is a file provided as input to the nf-core pipeline itself. It contains the overall configuration for your run, specifying the paths to individual _flowcell_ samplesheets, flowcell directories, and other metadata required to manage multiple sequencing runs. This is the primary configuration file that directs the pipeline on how to process your data.
+> - The **_flowcell_ samplesheet** is specific to a particular sequencing run. It is typically created by the sequencing facility and contains the sample information, including barcodes, lane numbers, and indexes. The typical name is `SampleSheet.csv`. Each demultiplexer may require a different format for this file, which must be adhered to for proper data processing.
-You will need to create a samplesheet with information about the samples you would like to analyse before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with at least 4 columns, and a header row as shown in the examples below. The input samplesheet is a comma-separated file that contains four columns: `id`, `samplesheet`, `lane`, `flowcell`.
+## Pipeline samplesheet input
-When using the demultiplexer fqtk, the samplesheet must contain an additional column `per_flowcell_manifest`. The column `per_flowcell_manifest` must contain two headers `fastq` and `read_structure`. As shown in the [example](https://github.com/fulcrumgenomics/nf-core-test-datasets/blob/fqtk/testdata/sim-data/per_flowcell_manifest.csv) provided each row must contain one fastq file name and the correlating read structure.
+You will need to create a _pipeline_ samplesheet with information about the samples you would like to analyse before running the pipeline. Use the `--input` parameter to specify its location. It must be a comma-separated file with a header row and at least four columns, `id`, `samplesheet`, `lane` and `flowcell`, as shown in the examples below.
+
+When using the demultiplexer fqtk, the _pipeline_ samplesheet must contain an additional column, `per_flowcell_manifest`, giving the full path to a per-flowcell manifest file with the two headers `fastq` and `read_structure`. As shown in the provided [example](https://github.com/fulcrumgenomics/nf-core-test-datasets/blob/fqtk/testdata/sim-data/per_flowcell_manifest.csv), each row of the manifest must contain one fastq file name and the corresponding read structure.
```bash
---input '[path to samplesheet file]'
+--input '[path to pipeline samplesheet file]'
```
-### Full samplesheet
+#### Example: Pipeline samplesheet
```csv title="samplesheet.csv"
id,samplesheet,lane,flowcell
@@ -29,17 +35,15 @@ DDMMYY_SERIAL_NUMBER_FC3,/path/to/SampleSheet3.csv,3,/path/to/sequencer/output3
| Column | Description |
| ------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- |
| `id` | Flowcell id |
-| `samplesheet` | Full path to the `SampleSheet.csv` file containing the sample information and indexes |
+| `samplesheet` | Full path to the _flowcell_ `SampleSheet.csv` file containing the sample information and indexes |
| `lane` | Optional lane number. When a lane number is provided, only the given lane will be demultiplexed |
| `flowcell`    | Full path to the Illumina sequencer output directory (often referred to as the run directory) or a `tar.gz` file containing the contents of said directory  |
-An [example samplesheet](https://raw.githubusercontent.com/nf-core/test-datasets/demultiplex/samplesheet/1.3.0/flowcell_input.csv) has been provided with the pipeline.
+An [example _pipeline_ samplesheet](https://raw.githubusercontent.com/nf-core/test-datasets/demultiplex/samplesheet/1.3.0/flowcell_input.csv) has been provided with the pipeline.
Note that the run directory in the `flowcell` column must lead to a `tar.gz` for compatibility with the demultiplexers sgdemux and fqtk.
-Each demultiplexing software uses a distinct samplesheet format. Below are examples for demultiplexer-specific samplesheets. Please see the following examples to format `SampleSheet.csv` for [sgdemux](https://github.com/nf-core/test-datasets/blob/demultiplex/testdata/sim-data/out.sample_meta.csv), [fqtk](https://github.com/fulcrumgenomics/nf-core-test-datasets/raw/fqtk/testdata/sim-data/fqtk_samplesheet.csv), and [bcl2fastq and bclconvert](https://raw.githubusercontent.com/nf-core/test-datasets/demultiplex/samplesheet/1.3.0/b2fq-samplesheet.csv)
-
-### Samplesheet for fqtk
+#### Example: Pipeline samplesheet for fqtk
```csv title="samplesheet.csv"
id,samplesheet,lane,flowcell,per_flowcell_manifest
@@ -52,17 +56,30 @@ DDMMYY_SERIAL_NUMBER_FC3,/path/to/SampleSheet3.csv,3,/path/to/sequencer/output3,
| Column | Description |
| ----------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- |
| `id` | Flowcell id |
-| `samplesheet` | Full path to the `SampleSheet.csv` file containing the sample information and indexes |
+| `samplesheet` | Full path to the _flowcell_ `SampleSheet.csv` file containing the sample information and indexes |
| `lane` | Optional lane number. When a lane number is provided, only the given lane will be demultiplexed |
| `flowcell`              | Full path to the Illumina sequencer output directory (often referred to as the run directory) or a `tar.gz` file containing the contents of said directory  |
| `per_flowcell_manifest` | Full path to the flowcell manifest, containing the fastq file names and read structures |
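+
+The per-flowcell manifest itself is a CSV file with the headers `fastq` and `read_structure`. A minimal sketch (the file names and read structures below are hypothetical; see the linked fqtk example above for real values):
+
+```csv title="per_flowcell_manifest.csv"
+fastq,read_structure
+fc1-sample1-R1.fastq.gz,150T
+fc1-sample1-R2.fastq.gz,8B142T
+```
+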
+### Flowcell samplesheet
+
+Each demultiplexing software uses a distinct _flowcell_ samplesheet format. See the following examples to format the _flowcell_ `SampleSheet.csv` for each supported demultiplexer:
+
+| Demultiplexer | Example _flowcell_ `SampleSheet.csv` Format |
+| ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| **sgdemux** | [sgdemux SampleSheet.csv](https://github.com/nf-core/test-datasets/blob/demultiplex/testdata/sim-data/out.sample_meta.csv) |
+| **fqtk** | [fqtk SampleSheet.csv](https://github.com/fulcrumgenomics/nf-core-test-datasets/raw/fqtk/testdata/sim-data/fqtk_samplesheet.csv) |
+| **bcl2fastq and bclconvert** | [bcl2fastq and bclconvert SampleSheet.csv](https://raw.githubusercontent.com/nf-core/test-datasets/demultiplex/samplesheet/1.3.0/b2fq-samplesheet.csv) |
+
## Running the pipeline
The typical command for running the pipeline is as follows:
```bash
-nextflow run nf-core/demultiplex --input ./samplesheet.csv --outdir ./results -profile docker
+nextflow run nf-core/demultiplex \
+ --input pipeline_samplesheet.csv \
+ --outdir results \
+ -profile docker
```
This will launch the pipeline with the `docker` configuration profile. See below for more information about profiles.
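+
+To also generate a samplesheet for a specific downstream nf-core pipeline, set `--downstream_pipeline` to one of `rnaseq`, `atacseq` or `taxprofiler`. An illustrative invocation (adjust input and output paths to your setup):
+
+```bash
+nextflow run nf-core/demultiplex \
+    --input pipeline_samplesheet.csv \
+    --outdir results \
+    --downstream_pipeline rnaseq \
+    -profile docker
+```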
diff --git a/modules/local/fastq_to_samplesheet/main.nf b/modules/local/fastq_to_samplesheet/main.nf
new file mode 100644
index 00000000..f79cabcf
--- /dev/null
+++ b/modules/local/fastq_to_samplesheet/main.nf
@@ -0,0 +1,45 @@
+process FASTQ_TO_SAMPLESHEET {
+ tag "$meta.id"
+
+ executor 'local'
+ memory 100.MB
+
+ input:
+ val meta
+ val pipeline
+ val strandedness
+
+ output:
+ tuple val(meta), path("*samplesheet.csv"), emit: samplesheet
+
+ exec:
+
+ // Add relevant fields to the map
+ def pipeline_map = [
+ sample : meta.samplename,
+ fastq_1 : meta.fastq_1
+ ]
+
+ // Add fastq_2 if it's a paired-end sample
+ if (!meta.single_end) {
+ pipeline_map.fastq_2 = meta.fastq_2
+ }
+
+ // Add pipeline-specific entries
+ if (pipeline == 'rnaseq') {
+ pipeline_map << [ strandedness: strandedness ]
+ } else if (pipeline == 'atacseq') {
+ pipeline_map << [ replicate: 1 ]
+ } else if (pipeline == 'taxprofiler') {
+ pipeline_map << [ fasta: '' ]
+ }
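+    // for 'default', no pipeline-specific columns are added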
+
+ // Create the samplesheet content
+ def samplesheet = pipeline_map.keySet().collect { '"' + it + '"' }.join(",") + '\n'
+ samplesheet += pipeline_map.values().collect { '"' + it + '"' }.join(",")
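+    // e.g. for the rnaseq pipeline this produces:
+    //   "sample","fastq_1","fastq_2","strandedness"
+    //   "<samplename>","<fastq_1 path>","<fastq_2 path>","<strandedness>"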
+
+ // Write samplesheet to file
+ def samplesheet_file = task.workDir.resolve("${meta.id}.samplesheet.csv")
+ samplesheet_file.text = samplesheet
+
+}
diff --git a/modules/local/fastq_to_samplesheet/tests/main.nf.test b/modules/local/fastq_to_samplesheet/tests/main.nf.test
new file mode 100644
index 00000000..62a57ccf
--- /dev/null
+++ b/modules/local/fastq_to_samplesheet/tests/main.nf.test
@@ -0,0 +1,30 @@
+nextflow_process {
+
+ name "Test Process FASTQ_TO_SAMPLESHEET"
+ script "../main.nf"
+ process "FASTQ_TO_SAMPLESHEET"
+
+ tag "modules"
+ tag "modules_local"
+ tag "fastq_to_samplesheet"
+
+ test("Should run without failures") {
+
+ when {
+ process {
+ """
+ input[0] = Channel.of([[id:'Sample1_S1_L001', samplename:'Sample1', fcid:'220422_M11111_0222_000000000-K9H97', lane:'1', empty:false, single_end:true, fastq_1:'Sample1_S1_L001_R1_001.fastq.gz']])
+ input[1] = 'rnaseq'
+ input[2] = 'auto'
+ """
+ }
+ }
+
+ then {
+ assertAll(
+ { assert process.success },
+ { assert snapshot(process.out).match() }
+ )
+ }
+ }
+}
diff --git a/modules/local/fastq_to_samplesheet/tests/main.nf.test.snap b/modules/local/fastq_to_samplesheet/tests/main.nf.test.snap
new file mode 100644
index 00000000..ebf0bb74
--- /dev/null
+++ b/modules/local/fastq_to_samplesheet/tests/main.nf.test.snap
@@ -0,0 +1,45 @@
+{
+ "Should run without failures": {
+ "content": [
+ {
+ "0": [
+ [
+ [
+ {
+ "id": "Sample1_S1_L001",
+ "samplename": "Sample1",
+ "fcid": "220422_M11111_0222_000000000-K9H97",
+ "lane": "1",
+ "empty": false,
+ "single_end": true,
+ "fastq_1": "Sample1_S1_L001_R1_001.fastq.gz"
+ }
+ ],
+ "[Sample1_S1_L001].samplesheet.csv:md5,bc779a8b2302a093cbb04a118bb5c90f"
+ ]
+ ],
+ "samplesheet": [
+ [
+ [
+ {
+ "id": "Sample1_S1_L001",
+ "samplename": "Sample1",
+ "fcid": "220422_M11111_0222_000000000-K9H97",
+ "lane": "1",
+ "empty": false,
+ "single_end": true,
+ "fastq_1": "Sample1_S1_L001_R1_001.fastq.gz"
+ }
+ ],
+ "[Sample1_S1_L001].samplesheet.csv:md5,bc779a8b2302a093cbb04a118bb5c90f"
+ ]
+ ]
+ }
+ ],
+ "meta": {
+ "nf-test": "0.8.4",
+ "nextflow": "24.04.4"
+ },
+ "timestamp": "2024-08-09T22:00:18.282617632"
+ }
+}
\ No newline at end of file
diff --git a/nextflow.config b/nextflow.config
index 1e1ab763..81fa6519 100755
--- a/nextflow.config
+++ b/nextflow.config
@@ -11,7 +11,7 @@ params {
// Options: Generic
input = null
- demultiplexer = "bclconvert" // [bclconvert, bcl2fastq, bases2fastq, fqtk, sgdemux, mkfastq]
+ demultiplexer = "bclconvert" // enum string [bclconvert, bcl2fastq, bases2fastq, fqtk, sgdemux, mkfastq]
// Options: trimming
trim_fastq = true // [true, false]
@@ -25,6 +25,10 @@ params {
// Kraken2 options
kraken_db = null // file .tar.gz
+
+ // Downstream Nextflow pipeline
+ downstream_pipeline = "default" // enum string [rnaseq, atacseq, taxprofiler, default]
+
// Options: CheckQC
checkqc_config = [] // file .yaml
diff --git a/nextflow_schema.json b/nextflow_schema.json
index 9db9ac30..3d4acd22 100644
--- a/nextflow_schema.json
+++ b/nextflow_schema.json
@@ -29,8 +29,13 @@
"kraken_db": {
"type": "string",
"format": "path",
- "default": null,
- "description": "path to Kraken2 DB to use for screening"
+ "description": "Path to Kraken2 DB to use for screening"
+ },
+ "downstream_pipeline": {
+ "type": "string",
+ "description": "Name of downstream nf-core pipeline (one of: rnaseq, atacseq, taxprofiler or default). Used to produce the input samplesheet for that pipeline.",
+ "default": "default",
+ "enum": ["rnaseq", "atacseq", "taxprofiler", "default"]
}
}
},
diff --git a/tests/pipeline/bases2fastq.nf.test b/tests/pipeline/bases2fastq.nf.test
index 56df79b0..9802e65b 100644
--- a/tests/pipeline/bases2fastq.nf.test
+++ b/tests/pipeline/bases2fastq.nf.test
@@ -19,7 +19,7 @@ nextflow_pipeline {
assertAll(
{ assert workflow.success },
{ assert snapshot(UTILS.removeNextflowVersion("$outputDir")).match("software_versions") },
- { assert workflow.trace.succeeded().size() == 7 },
+ { assert workflow.trace.succeeded().size() == 8 },
{ assert snapshot(
// FIXME
// path("$outputDir/sim-data/DefaultSample_R1.fastq.gz.md5"),
diff --git a/tests/pipeline/bcl2fastq.nf.test b/tests/pipeline/bcl2fastq.nf.test
index 2bada9be..027784f9 100644
--- a/tests/pipeline/bcl2fastq.nf.test
+++ b/tests/pipeline/bcl2fastq.nf.test
@@ -20,7 +20,7 @@ nextflow_pipeline {
assertAll(
{ assert workflow.success },
{ assert snapshot(UTILS.removeNextflowVersion("$outputDir")).match("software_versions") },
- { assert workflow.trace.succeeded().size() == 5 },
+ { assert workflow.trace.succeeded().size() == 6 },
{ assert snapshot(
path("$outputDir/multiqc/multiqc_data/bcl2fastq_lane_counts.txt"),
path("$outputDir/multiqc/multiqc_data/fastp_filtered_reads_plot.txt"),
diff --git a/tests/pipeline/bclconvert.nf.test b/tests/pipeline/bclconvert.nf.test
index 0c094c3f..4056af8b 100644
--- a/tests/pipeline/bclconvert.nf.test
+++ b/tests/pipeline/bclconvert.nf.test
@@ -19,7 +19,7 @@ nextflow_pipeline {
assertAll(
{ assert workflow.success },
{ assert snapshot(UTILS.removeNextflowVersion("$outputDir")).match("software_versions") },
- { assert workflow.trace.succeeded().size() == 5 },
+ { assert workflow.trace.succeeded().size() == 6 },
{ assert snapshot(
path("$outputDir/multiqc/multiqc_data/bclconvert_lane_counts.txt"),
path("$outputDir/multiqc/multiqc_data/fastp_filtered_reads_plot.txt"),
diff --git a/tests/pipeline/fqtk.nf.test b/tests/pipeline/fqtk.nf.test
index 633b88a1..8576ee9b 100644
--- a/tests/pipeline/fqtk.nf.test
+++ b/tests/pipeline/fqtk.nf.test
@@ -19,7 +19,7 @@ nextflow_pipeline {
assertAll(
{ assert workflow.success },
{ assert snapshot(UTILS.removeNextflowVersion("$outputDir")).match("software_versions") },
- { assert workflow.trace.succeeded().size() == 104 },
+ { assert workflow.trace.succeeded().size() == 129 },
{ assert snapshot(path("$outputDir/test/demux-metrics.txt")).match("fqtk") },
{ assert new File("$outputDir/test/unmatched_1.fastp.fastq.gz").exists() },
{ assert new File("$outputDir/test/unmatched_2.fastp.fastq.gz").exists() },
diff --git a/tests/pipeline/kraken.nf.test b/tests/pipeline/kraken.nf.test
index 20afde6c..73ccc70c 100644
--- a/tests/pipeline/kraken.nf.test
+++ b/tests/pipeline/kraken.nf.test
@@ -21,7 +21,7 @@ nextflow_pipeline {
assertAll(
{ assert workflow.success },
{ assert snapshot(UTILS.removeNextflowVersion("$outputDir")).match("software_versions") },
- { assert workflow.trace.succeeded().size() == 8 },
+ { assert workflow.trace.succeeded().size() == 9 },
{ assert snapshot(
path("$outputDir/multiqc/multiqc_data/bcl2fastq_lane_counts.txt"),
path("$outputDir/multiqc/multiqc_data/fastp_filtered_reads_plot.txt"),
diff --git a/tests/pipeline/kraken.nf.test.snap b/tests/pipeline/kraken.nf.test.snap
index 74d978b9..af9781bb 100644
--- a/tests/pipeline/kraken.nf.test.snap
+++ b/tests/pipeline/kraken.nf.test.snap
@@ -57,9 +57,9 @@
],
"meta": {
"nf-test": "0.8.4",
- "nextflow": "23.10.0"
+ "nextflow": "24.04.4"
},
- "timestamp": "2024-08-05T22:49:12.12938394"
+ "timestamp": "2024-08-09T17:17:23.034777828"
},
"software_versions": {
"content": [
@@ -67,9 +67,9 @@
],
"meta": {
"nf-test": "0.8.4",
- "nextflow": "23.10.0"
+ "nextflow": "24.04.4"
},
- "timestamp": "2024-08-01T22:34:15.140488001"
+ "timestamp": "2024-08-09T17:17:22.999406989"
},
"multiqc": {
"content": [
@@ -80,8 +80,8 @@
],
"meta": {
"nf-test": "0.8.4",
- "nextflow": "23.10.0"
+ "nextflow": "24.04.4"
},
- "timestamp": "2024-08-05T22:49:08.601265877"
+ "timestamp": "2024-08-09T17:17:23.014483899"
}
}
\ No newline at end of file
diff --git a/tests/pipeline/mkfastq.nf.test b/tests/pipeline/mkfastq.nf.test
index 09b2f13f..3c77944e 100644
--- a/tests/pipeline/mkfastq.nf.test
+++ b/tests/pipeline/mkfastq.nf.test
@@ -19,9 +19,9 @@ nextflow_pipeline {
assertAll(
{ assert workflow.success },
{ assert snapshot(UTILS.removeNextflowVersion("$outputDir")).match("software_versions") },
- { assert workflow.trace.succeeded().size() == 6 },
+ { assert workflow.trace.succeeded().size() == 7 },
// How many directories were produced?
- {assert path("${outputDir}").list().size() == 4},
+ {assert path("${outputDir}").list().size() == 6},
// How many files were produced?
{assert path("$outputDir/220422_M11111_0222_000000000-K9H97_mkfastq/").list().size() == 2},
{assert path("$outputDir/multiqc/").list().size() == 3},
diff --git a/tests/pipeline/sgdemux.nf.test b/tests/pipeline/sgdemux.nf.test
index a740c4ef..de48242b 100644
--- a/tests/pipeline/sgdemux.nf.test
+++ b/tests/pipeline/sgdemux.nf.test
@@ -19,7 +19,7 @@ nextflow_pipeline {
assertAll(
{ assert workflow.success },
{ assert snapshot(UTILS.removeNextflowVersion("$outputDir")).match("software_versions") },
- { assert workflow.trace.succeeded().size() == 103 },
+ { assert workflow.trace.succeeded().size() == 128 },
{ assert snapshot(
path("$outputDir/sim-data/metrics.tsv"),
path("$outputDir/sim-data/per_project_metrics.tsv"),
diff --git a/tests/pipeline/skip_tools.nf.test b/tests/pipeline/skip_tools.nf.test
index 1de176d0..2f0d1cc7 100644
--- a/tests/pipeline/skip_tools.nf.test
+++ b/tests/pipeline/skip_tools.nf.test
@@ -21,7 +21,7 @@ nextflow_pipeline {
assertAll(
{ assert workflow.success },
{ assert snapshot(UTILS.removeNextflowVersion("$outputDir")).match("software_versions_skip_trimming") },
- { assert workflow.trace.succeeded().size() == 5 },
+ { assert workflow.trace.succeeded().size() == 6 },
{ assert path("$outputDir/multiqc/multiqc_report.html").exists() },
{ assert snapshot(
path("$outputDir/220422_M11111_0222_000000000-K9H97/L001/Sample1_S1_L001_R1_001.fastq.gz"),
@@ -49,7 +49,7 @@ nextflow_pipeline {
assertAll(
{ assert workflow.success },
{ assert snapshot(UTILS.removeNextflowVersion("$outputDir")).match("software_versions_skip_fastp") },
- { assert workflow.trace.succeeded().size() == 4 },
+ { assert workflow.trace.succeeded().size() == 5 },
{ assert path("$outputDir/multiqc/multiqc_report.html").exists() },
{ assert snapshot(
path("$outputDir/220422_M11111_0222_000000000-K9H97/L001/Sample1_S1_L001_R1_001.fastq.gz"),
@@ -77,7 +77,7 @@ nextflow_pipeline {
assertAll(
{ assert workflow.success },
{ assert snapshot(UTILS.removeNextflowVersion("$outputDir")).match("software_versions_skip_fastqc") },
- { assert workflow.trace.succeeded().size() == 5 },
+ { assert workflow.trace.succeeded().size() == 6 },
{ assert path("$outputDir/multiqc/multiqc_report.html").exists() },
{ assert snapshot(
path("$outputDir/220422_M11111_0222_000000000-K9H97/L001/Sample1_S1_L001_R1_001.fastq.gz"),
@@ -105,7 +105,7 @@ nextflow_pipeline {
assertAll(
{ assert workflow.success },
{ assert snapshot(UTILS.removeNextflowVersion("$outputDir")).match("software_versions_skip_fastp_fastqc") },
- { assert workflow.trace.succeeded().size() == 4 },
+ { assert workflow.trace.succeeded().size() == 5 },
{ assert path("$outputDir/multiqc/multiqc_report.html").exists() },
{ assert snapshot(
path("$outputDir/220422_M11111_0222_000000000-K9H97/L001/Sample1_S1_L001_R1_001.fastq.gz"),
@@ -133,7 +133,7 @@ nextflow_pipeline {
assertAll(
{ assert workflow.success },
{ assert snapshot(UTILS.removeNextflowVersion("$outputDir")).match("software_versions_skip_multiqc") },
- { assert workflow.trace.succeeded().size() == 4 },
+ { assert workflow.trace.succeeded().size() == 5 },
{ assert !path("$outputDir/multiqc/multiqc_report.html").exists() },
{ assert snapshot(
path("$outputDir/220422_M11111_0222_000000000-K9H97/L001/Sample1_S1_L001_R1_001.fastq.gz"),
diff --git a/tests/pipeline/test_pe.nf.test b/tests/pipeline/test_pe.nf.test
index bba8f754..8b9ae237 100644
--- a/tests/pipeline/test_pe.nf.test
+++ b/tests/pipeline/test_pe.nf.test
@@ -20,7 +20,7 @@ nextflow_pipeline {
assertAll(
{ assert workflow.success },
{ assert snapshot(UTILS.removeNextflowVersion("$outputDir")).match("software_versions") },
- { assert workflow.trace.succeeded().size() == 6 },
+ { assert workflow.trace.succeeded().size() == 7 },
{ assert snapshot(
path("$outputDir/multiqc/multiqc_data/bcl2fastq_lane_counts.txt"),
path("$outputDir/multiqc/multiqc_data/fastp_filtered_reads_plot.txt"),
diff --git a/workflows/demultiplex.nf b/workflows/demultiplex.nf
index c48c9940..1f8576a1 100644
--- a/workflows/demultiplex.nf
+++ b/workflows/demultiplex.nf
@@ -15,6 +15,7 @@ include { FQTK_DEMULTIPLEX } from '../subworkflows/local/fqtk_demultipl
include { MKFASTQ_DEMULTIPLEX } from '../subworkflows/local/mkfastq_demultiplex/main'
include { SINGULAR_DEMULTIPLEX } from '../subworkflows/local/singular_demultiplex/main'
include { RUNDIR_CHECKQC } from '../subworkflows/local/rundir_checkqc/main'
+include { FASTQ_TO_SAMPLESHEET } from '../modules/local/fastq_to_samplesheet/main'
//
@@ -48,11 +49,12 @@ workflow DEMULTIPLEX {
main:
// Value inputs
- demultiplexer = params.demultiplexer // string: bases2fastq, bcl2fastq, bclconvert, fqtk, sgdemux, mkfastq
- trim_fastq = params.trim_fastq // boolean: true, false
- skip_tools = params.skip_tools ? params.skip_tools.split(',') : [] // list: [falco, fastp, multiqc]
- sample_size = params.sample_size // int
- kraken_db = params.kraken_db // path
+ demultiplexer = params.demultiplexer // string: bases2fastq, bcl2fastq, bclconvert, fqtk, sgdemux, mkfastq
+ trim_fastq = params.trim_fastq // boolean: true, false
+ skip_tools = params.skip_tools ? params.skip_tools.split(',') : [] // list: [falco, fastp, multiqc]
+ sample_size = params.sample_size // int
+ kraken_db = params.kraken_db // path
+ downstream_pipeline = params.downstream_pipeline // string: rnaseq, atacseq, taxprofiler
// Channel inputs
@@ -250,6 +252,30 @@ workflow DEMULTIPLEX {
ch_versions = ch_versions.mix(FASTQ_CONTAM_SEQTK_KRAKEN.out.versions)
ch_multiqc_files = ch_multiqc_files.mix( FASTQ_CONTAM_SEQTK_KRAKEN.out.reports.map { meta, log -> return log })
}
+
+ // Prepare metamap with fastq info
+ ch_meta_fastq = ch_raw_fastq.map { meta, fastq_files ->
+ // Determine the publish directory based on the lane information
+ def publish_dir = meta.lane ? "${params.outdir}/${meta.id}/L00${meta.lane}" : "${params.outdir}/${meta.id}"
+ meta.fastq_1 = "${publish_dir}/${fastq_files[0].getName()}"
+
+ // Add full path for fastq_2 to the metadata if the sample is not single-end
+ if (!meta.single_end) {
+ meta.fastq_2 = "${publish_dir}/${fastq_files[1].getName()}"
+ }
+ return meta
+ }
+
+ // Module: FASTQ to samplesheet
+ FASTQ_TO_SAMPLESHEET(ch_meta_fastq, downstream_pipeline, 'auto')
+
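+    // Concatenate the per-sample samplesheets into a single samplesheet.csv,
+    // keeping one header row, and publish it to "${params.outdir}/samplesheet"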
+ FASTQ_TO_SAMPLESHEET.out.samplesheet
+ .map { it[1] }
+ .collectFile(name:'tmp_samplesheet.csv', newLine: true, keepHeader: true, sort: { it.baseName })
+ .map { it.text.tokenize('\n').join('\n') }
+ .collectFile(name:'samplesheet.csv', storeDir: "${params.outdir}/samplesheet")
+ .set { ch_samplesheet }
+
//
// Collate and save software versions
//