diff --git a/CHANGELOG.md b/CHANGELOG.md
index e4d169db..750a7028 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 - Add `--save_align_intermeds` parameter that publishes BAM files to the output directory (for `starsolo`, `cellranger` and `cellranger multi`) ([#384](https://github.com/nf-core/scrnaseq/issues/384))
 - Added support for pre-built indexes in `genomes.config` file for `cellranger`, `cellranger-arc`, `simpleaf` and `simpleaf txp2gene` ([#371](https://github.com/nf-core/scrnaseq/issues/371))
+- Clean up and fix bugs in the matrix conversion code; use anndataR for conversions and cellbender for emptydrops calling ([#369](https://github.com/nf-core/scrnaseq/pull/369))
+- Fix `test_full` not running out of the box: the code tried to overwrite parameters inside the workflow, which is not possible ([#366](https://github.com/nf-core/scrnaseq/issues/366))
 
 ## v2.7.1 - 2024-08-13
 
diff --git a/docs/output.md b/docs/output.md
index a5292336..38dbda10 100644
--- a/docs/output.md
+++ b/docs/output.md
@@ -19,7 +19,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
   - [Cellranger ARC](#cellranger-arc)
   - [Cellranger multi](#cellranger-multi)
   - [UniverSC](#universc)
-  - [Custom emptydrops filter](#custom-emptydrops-filter)
+  - [Cellbender emptydrops filter](#cellbender-emptydrops-filter)
   - [Other output data](#other-output-data)
   - [MultiQC](#multiqc)
   - [Pipeline information](#pipeline-information)
@@ -141,15 +141,15 @@ Battenberg, K., Kelly, S.T., Ras, R.A., Hetherington, N.A., Hayashi, K., and Min
 
 - Contains the mapped BAM files, filtered and unfiltered HDF5 matrices and output metrics created by the open-source implementation of Cell Ranger run via UniverSC
 
-## Custom emptydrops filter
+## Cellbender emptydrops filter
 
-The pipeline also possess a module to perform empty-drops calling and filtering with a custom-made script that uses a library called `bioconductor-dropletutils` that is available in `bioconda`. The process is simple, it takes a raw/unfiltered matrix file, and performs the empty-drops calling and filtering on it, generating another matrix file.
+The pipeline also includes a subworkflow, imported from nf-core/scdownstream, that performs emptydrops calling and filtering using [cellbender](https://github.com/broadinstitute/CellBender). It takes a raw/unfiltered matrix file, performs emptydrops calling and filtering on it, and writes a new matrix file.
 
-> Users can turn it of with `--skip_emptydrops`.
+> Users can turn it off with `--skip_emptydrops`.
 
-**Output directory: `results/${params.aligner}/emptydrops_filtered`**
+**Output directory: `results/${params.aligner}/${meta.id}/emptydrops_filter`**
 
-- Contains the empty-drops filtered matrices results generated by the `bioconductor-dropletutils` custom script
+- Contains the emptydrops-filtered matrices generated by the cellbender subworkflow.
 
 ## Other output data
 
@@ -170,15 +170,15 @@ The pipeline also possess a module to perform empty-drops calling and filtering
   - `.mtx` files converted to R native data format, rds, using the [Seurat package](https://github.com/satijalab/seurat)
   - One per sample
 
-Because the pipeline has both the data directly from the aligners, and from the custom empty-drops filtering module the conversion modules were modified to understand the difference between raw/filtered from the aligners itself and filtered from the custom empty-drops module. So, to try to avoid confusion by the user, we added "suffixes" to the generated converted files so that we have provenance from what input it came from.
+Because the pipeline produces matrices both directly from the aligners and from the cellbender emptydrops filtering subworkflow, the conversion modules distinguish between the raw/filtered matrices produced by the aligners themselves and the matrices filtered by the emptydrops subworkflow. To avoid confusion, a suffix is added to each converted file to record which input it came from.
 
-So, the conversion modules generate data with the following syntax: **`*_{raw,filtered,custom_emptydrops_filter}_matrix.{h5ad,rds}`**. With the following meanings:
+The conversion modules therefore name their outputs **`*_{raw,filtered,emptydrops_filter}_matrix.{h5ad,rds}`**, with the following meanings:
 
 | suffix | meaning |
 | :----------------------- | :--------------------------------------------------------------------------------------------------------------------------------------- |
 | raw | Conversion of the raw/unprocessed matrix generated by the tool. It is also used for tools that generate only one matrix, such as alevin. |
 | filtered | Conversion of the filtered/processed matrix generated by the tool |
-| custom_emptydrops_filter | Conversion of the matrix that was generated by the new custom empty drops filter module |
+| emptydrops_filter | Conversion of the matrix generated by the cellbender emptydrops filter subworkflow |
 
 > Some aligners, like `alevin` do not produce both raw&filtered matrices. When aligners give only one output, they are treated with the `raw` suffix. Some aligners may have an option to give both raw&filtered and only one, like `kallisto`. Be aware when using the tools.
 
diff --git a/nextflow_schema.json b/nextflow_schema.json
index 935d4277..c09875be 100644
--- a/nextflow_schema.json
+++ b/nextflow_schema.json
@@ -90,7 +90,7 @@
                 },
                 "skip_emptydrops": {
                     "type": "boolean",
-                    "description": "Skip custom empty drops filter module"
+                    "description": "Skip the cellbender emptydrops filter subworkflow"
                 }
             }
         },
diff --git a/subworkflows/local/emptydrops_removal.nf b/subworkflows/local/emptydrops_removal.nf
index 2ccacc26..7d63e86f 100644
--- a/subworkflows/local/emptydrops_removal.nf
+++ b/subworkflows/local/emptydrops_removal.nf
@@ -1,6 +1,10 @@
 include { CELLBENDER_REMOVEBACKGROUND } from '../../modules/nf-core/cellbender/removebackground'
 include { ADATA_BARCODES } from '../../modules/local/adata_barcodes'
 
+//
+// TODO: Make this an nf-core subworkflow shared by the scrnaseq and scdownstream pipelines.
+//
+
 workflow EMPTY_DROPLET_REMOVAL {
     take:
     ch_unfiltered