diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md index d2dc65c3..9b48f356 100644 --- a/.github/CONTRIBUTING.md +++ b/.github/CONTRIBUTING.md @@ -19,7 +19,7 @@ If you'd like to write some code for nf-core/scrnaseq, the standard workflow is 1. Check that there isn't already an issue about your idea in the [nf-core/scrnaseq issues](https://github.com/nf-core/scrnaseq/issues) to avoid duplicating work. If there isn't one already, please create one so that others know you're working on this 2. [Fork](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) the [nf-core/scrnaseq repository](https://github.com/nf-core/scrnaseq) to your GitHub account 3. Make the necessary changes / additions within your forked repository following [Pipeline conventions](#pipeline-contribution-conventions) -4. Use `nf-core schema build` and add any new parameters to the pipeline JSON schema (requires [nf-core tools](https://github.com/nf-core/tools) >= 1.10). +4. Use `nf-core pipelines schema build` and add any new parameters to the pipeline JSON schema (requires [nf-core tools](https://github.com/nf-core/tools) >= 1.10). 5. Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged If you're not used to this workflow with git, you can start with some [docs from GitHub](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests) or even their [excellent `git` resources](https://try.github.io/). @@ -40,7 +40,7 @@ There are typically two types of tests that run: ### Lint tests `nf-core` has a [set of guidelines](https://nf-co.re/developers/guidelines) which all pipelines must adhere to. -To enforce these and ensure that all pipelines stay in sync, we have developed a helper tool which runs checks on the pipeline code. This is in the [nf-core/tools repository](https://github.com/nf-core/tools) and once installed can be run locally with the `nf-core lint` command. +To enforce these and ensure that all pipelines stay in sync, we have developed a helper tool which runs checks on the pipeline code. This is in the [nf-core/tools repository](https://github.com/nf-core/tools) and once installed can be run locally with the `nf-core pipelines lint` command. If any failures or warnings are encountered, please follow the listed URL for more documentation. @@ -75,7 +75,7 @@ If you wish to contribute a new step, please use the following coding standards: 2. Write the process block (see below). 3. Define the output channel if needed (see below). 4. Add any new parameters to `nextflow.config` with a default (see below). -5. Add any new parameters to `nextflow_schema.json` with help text (via the `nf-core schema build` tool). +5. Add any new parameters to `nextflow_schema.json` with help text (via the `nf-core pipelines schema build` tool). 6. Add sanity checks and validation for all relevant parameters. 7. Perform local tests to validate that the new code works as expected. 8. If applicable, add a new test command in `.github/workflow/ci.yml`. @@ -86,11 +86,11 @@ If you wish to contribute a new step, please use the following coding standards: Parameters should be initialised / defined with default values in `nextflow.config` under the `params` scope. -Once there, use `nf-core schema build` to add to `nextflow_schema.json`. +Once there, use `nf-core pipelines schema build` to add to `nextflow_schema.json`.
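For reference, a minimal sketch of the renamed nf-core/tools invocations this migration targets — the old top-level `nf-core schema build` / `nf-core lint` forms moved under the `pipelines` subcommand in nf-core/tools 3.x; the `pip` version pin below is illustrative:

```bash
pip install "nf-core>=3.0.2"   # illustrative pin, matching the nf_core_version set in .nf-core.yml

# Interactively add new params from nextflow.config to nextflow_schema.json
nf-core pipelines schema build

# Run the nf-core guideline checks locally from the pipeline root
nf-core pipelines lint --dir .
```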
### Default processes resource requirements -Sensible defaults for process resource requirements (CPUs / memory / time) for a process should be defined in `conf/base.config`. These should generally be specified generic with `withLabel:` selectors so they can be shared across multiple processes/steps of the pipeline. A nf-core standard set of labels that should be followed where possible can be seen in the [nf-core pipeline template](https://github.com/nf-core/tools/blob/master/nf_core/pipeline-template/conf/base.config), which has the default process as a single core-process, and then different levels of multi-core configurations for increasingly large memory requirements defined with standardised labels. +Sensible defaults for process resource requirements (CPUs / memory / time) for a process should be defined in `conf/base.config`. These should generally be specified generic with `withLabel:` selectors so they can be shared across multiple processes/steps of the pipeline. A nf-core standard set of labels that should be followed where possible can be seen in the [nf-core pipeline template](https://github.com/nf-core/tools/blob/main/nf_core/pipeline-template/conf/base.config), which has the default process as a single core-process, and then different levels of multi-core configurations for increasingly large memory requirements defined with standardised labels. The process resources can be passed on to the tool dynamically within the process with the `${task.cpus}` and `${task.memory}` variables in the `script:` block. @@ -103,7 +103,7 @@ Please use the following naming schemes, to make it easy to understand what is g ### Nextflow version bumping -If you are using a new feature from core Nextflow, you may bump the minimum required version of nextflow in the pipeline with: `nf-core bump-version --nextflow . [min-nf-version]` +If you are using a new feature from core Nextflow, you may bump the minimum required version of nextflow in the pipeline with: `nf-core pipelines bump-version --nextflow . [min-nf-version]` ### Images and figures diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index 073b2953..4a1fd411 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -17,7 +17,7 @@ Learn more about contributing: [CONTRIBUTING.md](https://github.com/nf-core/scrn - [ ] If you've fixed a bug or added code that should be tested, add tests! - [ ] If you've added a new tool - have you followed the pipeline conventions in the [contribution docs](https://github.com/nf-core/scrnaseq/tree/master/.github/CONTRIBUTING.md) - [ ] If necessary, also make a PR on the nf-core/scrnaseq _branch_ on the [nf-core/test-datasets](https://github.com/nf-core/test-datasets) repository. -- [ ] Make sure your code lints (`nf-core lint`). +- [ ] Make sure your code lints (`nf-core pipelines lint`). - [ ] Ensure the test suite passes (`nextflow run . -profile test,docker --outdir <OUTDIR>`). - [ ] Check for unexpected warnings in debug mode (`nextflow run . -profile debug,test,docker --outdir <OUTDIR>`). - [ ] Usage Documentation in `docs/usage.md` is updated. diff --git a/.github/workflows/awsfulltest.yml b/.github/workflows/awsfulltest.yml index 57587738..665c7198 100644 --- a/.github/workflows/awsfulltest.yml +++ b/.github/workflows/awsfulltest.yml @@ -1,21 +1,38 @@ name: nf-core AWS full size tests -# This workflow is triggered on published releases. +# This workflow is triggered on PRs opened against the master branch.
# It can be additionally triggered manually with GitHub actions workflow dispatch button. # It runs the -profile 'test_full' on AWS batch on: - release: - types: [published] + pull_request: + branches: + - master workflow_dispatch: + pull_request_review: + types: [submitted] + jobs: run-platform: name: Run AWS full tests - if: github.repository == 'nf-core/scrnaseq' + # run only if the PR is approved by at least 2 reviewers and against the master branch or manually triggered + if: github.repository == 'nf-core/scrnaseq' && github.event.review.state == 'approved' && github.event.pull_request.base.ref == 'master' || github.event_name == 'workflow_dispatch' runs-on: ubuntu-latest strategy: matrix: - aligner: ["alevin", "kallisto", "star", "cellranger", "universc"] + aligner: ["alevin", "kallisto", "star", "cellranger"] steps: + - uses: octokit/request-action@v2.x + id: check_approvals + with: + route: GET /repos/${{ github.repository }}/pulls/${{ github.event.pull_request.number }}/reviews + env: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - id: test_variables + if: github.event_name != 'workflow_dispatch' + run: | + JSON_RESPONSE='${{ steps.check_approvals.outputs.data }}' + CURRENT_APPROVALS_COUNT=$(echo $JSON_RESPONSE | jq -c '[.[] | select(.state | contains("APPROVED")) ] | length') + test $CURRENT_APPROVALS_COUNT -ge 2 || exit 1 # At least 2 approvals are required - name: Launch workflow via Seqera Platform uses: seqeralabs/action-tower-launch@v2 with: diff --git a/.github/workflows/awstest.yml b/.github/workflows/awstest.yml index d04c973b..fccb7fc0 100644 --- a/.github/workflows/awstest.yml +++ b/.github/workflows/awstest.yml @@ -11,7 +11,7 @@ jobs: runs-on: ubuntu-latest strategy: matrix: - aligner: ["alevin", "kallisto", "star", "cellranger", "universc"] + aligner: ["alevin", "kallisto", "star", "cellranger"] steps: # Launch workflow using Seqera Platform CLI tool action - name: Launch workflow via Seqera Platform diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 4b3c26a0..40eab942 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -20,6 +20,8 @@ env: NFT_WORKDIR: "~" NFT_DIFF: "pdiff" NFT_DIFF_ARGS: "--line-numbers --expand-tabs=2" + NXF_SINGULARITY_CACHEDIR: ${{ github.workspace }}/.singularity + NXF_SINGULARITY_LIBRARYDIR: ${{ github.workspace }}/.singularity concurrency: group: "${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}" @@ -34,7 +36,7 @@ jobs: fail-fast: false matrix: NXF_VER: - - "23.04.0" + - "24.04.2" - "latest-everything" profile: ["alevin", "cellranger", "cellrangermulti", "kallisto", "star"] @@ -50,6 +52,11 @@ jobs: python-version: "3.11" architecture: "x64" + - uses: actions/setup-java@8df1039502a15bceb9433410b1a100fbe190c53b # v4 + with: + distribution: "temurin" + java-version: "17" + - name: Install pdiff to see diff between nf-test snapshots run: | python -m pip install --upgrade pip diff --git a/.github/workflows/download_pipeline.yml b/.github/workflows/download_pipeline.yml index 003eadd5..45923af3 100644 --- a/.github/workflows/download_pipeline.yml +++ b/.github/workflows/download_pipeline.yml @@ -1,4 +1,4 @@ -name: Test successful pipeline download with 'nf-core download' +name: Test successful pipeline download with 'nf-core pipelines download' # Run the workflow when: # - dispatched manually @@ -8,7 +8,7 @@ on: workflow_dispatch: inputs: testbranch: - description: "The specific branch you wish to utilize for the test execution of nf-core download." 
+ description: "The specific branch you wish to utilize for the test execution of nf-core pipelines download." required: true default: "dev" pull_request: @@ -42,9 +42,11 @@ jobs: with: python-version: "3.12" architecture: "x64" - - uses: eWaterCycle/setup-singularity@931d4e31109e875b13309ae1d07c70ca8fbc8537 # v7 + + - name: Setup Apptainer + uses: eWaterCycle/setup-apptainer@4bb22c52d4f63406c49e94c804632975787312b3 # v2.0.0 with: - singularity-version: 3.8.3 + apptainer-version: 1.3.4 - name: Install dependencies run: | @@ -57,33 +59,64 @@ jobs: echo "REPOTITLE_LOWERCASE=$(basename ${GITHUB_REPOSITORY,,})" >> ${GITHUB_ENV} echo "REPO_BRANCH=${{ github.event.inputs.testbranch || 'dev' }}" >> ${GITHUB_ENV} + - name: Make a cache directory for the container images + run: | + mkdir -p ./singularity_container_images + - name: Download the pipeline env: - NXF_SINGULARITY_CACHEDIR: ./ + NXF_SINGULARITY_CACHEDIR: ./singularity_container_images run: | - nf-core download ${{ env.REPO_LOWERCASE }} \ + nf-core pipelines download ${{ env.REPO_LOWERCASE }} \ --revision ${{ env.REPO_BRANCH }} \ --outdir ./${{ env.REPOTITLE_LOWERCASE }} \ --compress "none" \ --container-system 'singularity' \ - --container-library "quay.io" -l "docker.io" -l "ghcr.io" \ + --container-library "quay.io" -l "docker.io" -l "community.wave.seqera.io" \ --container-cache-utilisation 'amend' \ - --download-configuration + --download-configuration 'yes' - name: Inspect download run: tree ./${{ env.REPOTITLE_LOWERCASE }} + - name: Count the downloaded number of container images + id: count_initial + run: | + image_count=$(ls -1 ./singularity_container_images | wc -l | xargs) + echo "Initial container image count: $image_count" + echo "IMAGE_COUNT_INITIAL=$image_count" >> ${GITHUB_ENV} + - name: Run the downloaded pipeline (stub) id: stub_run_pipeline continue-on-error: true env: - NXF_SINGULARITY_CACHEDIR: ./ + NXF_SINGULARITY_CACHEDIR: ./singularity_container_images NXF_SINGULARITY_HOME_MOUNT: true run: nextflow run ./${{ env.REPOTITLE_LOWERCASE }}/$( sed 's/\W/_/g' <<< ${{ env.REPO_BRANCH }}) -stub -profile test,singularity --outdir ./results - name: Run the downloaded pipeline (stub run not supported) id: run_pipeline if: ${{ job.steps.stub_run_pipeline.status == failure() }} env: - NXF_SINGULARITY_CACHEDIR: ./ + NXF_SINGULARITY_CACHEDIR: ./singularity_container_images NXF_SINGULARITY_HOME_MOUNT: true run: nextflow run ./${{ env.REPOTITLE_LOWERCASE }}/$( sed 's/\W/_/g' <<< ${{ env.REPO_BRANCH }}) -profile test,singularity --outdir ./results + + - name: Count the downloaded number of container images + id: count_afterwards + run: | + image_count=$(ls -1 ./singularity_container_images | wc -l | xargs) + echo "Post-pipeline run container image count: $image_count" + echo "IMAGE_COUNT_AFTER=$image_count" >> ${GITHUB_ENV} + + - name: Compare container image counts + run: | + if [ "${{ env.IMAGE_COUNT_INITIAL }}" -ne "${{ env.IMAGE_COUNT_AFTER }}" ]; then + initial_count=${{ env.IMAGE_COUNT_INITIAL }} + final_count=${{ env.IMAGE_COUNT_AFTER }} + difference=$((final_count - initial_count)) + echo "$difference additional container images were downloaded at runtime. The pipeline has no support for offline runs!" + tree ./singularity_container_images + exit 1 + else + echo "The pipeline can be downloaded successfully!"
+ fi diff --git a/.github/workflows/linting.yml b/.github/workflows/linting.yml index 1fcafe88..a502573c 100644 --- a/.github/workflows/linting.yml +++ b/.github/workflows/linting.yml @@ -1,6 +1,6 @@ name: nf-core linting # This workflow is triggered on pushes and PRs to the repository. -# It runs the `nf-core lint` and markdown lint tests to ensure +# It runs the `nf-core pipelines lint` and markdown lint tests to ensure # that the code meets the nf-core guidelines. on: push: @@ -41,17 +41,32 @@ jobs: python-version: "3.12" architecture: "x64" + - name: read .nf-core.yml + uses: pietrobolcato/action-read-yaml@1.1.0 + id: read_yml + with: + config: ${{ github.workspace }}/.nf-core.yml + - name: Install dependencies run: | python -m pip install --upgrade pip - pip install nf-core + pip install nf-core==${{ steps.read_yml.outputs['nf_core_version'] }} + + - name: Run nf-core pipelines lint + if: ${{ github.base_ref != 'master' }} + env: + GITHUB_COMMENTS_URL: ${{ github.event.pull_request.comments_url }} + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + GITHUB_PR_COMMIT: ${{ github.event.pull_request.head.sha }} + run: nf-core -l lint_log.txt pipelines lint --dir ${GITHUB_WORKSPACE} --markdown lint_results.md - - name: Run nf-core lint + - name: Run nf-core pipelines lint --release + if: ${{ github.base_ref == 'master' }} env: GITHUB_COMMENTS_URL: ${{ github.event.pull_request.comments_url }} GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} GITHUB_PR_COMMIT: ${{ github.event.pull_request.head.sha }} - run: nf-core -l lint_log.txt lint --dir ${GITHUB_WORKSPACE} --markdown lint_results.md + run: nf-core -l lint_log.txt pipelines lint --release --dir ${GITHUB_WORKSPACE} --markdown lint_results.md - name: Save PR number if: ${{ always() }} diff --git a/.github/workflows/linting_comment.yml b/.github/workflows/linting_comment.yml index 40acc23f..42e519bf 100644 --- a/.github/workflows/linting_comment.yml +++ b/.github/workflows/linting_comment.yml @@ -11,7 +11,7 @@ jobs: runs-on: ubuntu-latest steps: - name: Download lint results - uses: dawidd6/action-download-artifact@09f2f74827fd3a8607589e5ad7f9398816f540fe # v3 + uses: dawidd6/action-download-artifact@bf251b5aa9c2f7eeb574a96ee720e24f801b7c11 # v6 with: workflow: linting.yml workflow_conclusion: completed diff --git a/.github/workflows/release-announcements.yml b/.github/workflows/release-announcements.yml index 03ecfcf7..c6ba35df 100644 --- a/.github/workflows/release-announcements.yml +++ b/.github/workflows/release-announcements.yml @@ -12,7 +12,7 @@ jobs: - name: get topics and convert to hashtags id: get_topics run: | - echo "topics=$(curl -s https://nf-co.re/pipelines.json | jq -r '.remote_workflows[] | select(.full_name == "${{ github.repository }}") | .topics[]' | awk '{print "#"$0}' | tr '\n' ' ')" >> $GITHUB_OUTPUT + echo "topics=$(curl -s https://nf-co.re/pipelines.json | jq -r '.remote_workflows[] | select(.full_name == "${{ github.repository }}") | .topics[]' | awk '{print "#"$0}' | tr '\n' ' ')" | sed 's/-//g' >> $GITHUB_OUTPUT - uses: rzr/fediverse-action@master with: diff --git a/.github/workflows/template_version_comment.yml b/.github/workflows/template_version_comment.yml new file mode 100644 index 00000000..e8aafe44 --- /dev/null +++ b/.github/workflows/template_version_comment.yml @@ -0,0 +1,46 @@ +name: nf-core template version comment +# This workflow is triggered on PRs to check if the pipeline template version matches the latest nf-core version. +# It posts a comment to the PR, even if it comes from a fork. 
+ +on: pull_request_target + +jobs: + template_version: + runs-on: ubuntu-latest + steps: + - name: Check out pipeline code + uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4 + with: + ref: ${{ github.event.pull_request.head.sha }} + + - name: Read template version from .nf-core.yml + uses: nichmor/minimal-read-yaml@v0.0.2 + id: read_yml + with: + config: ${{ github.workspace }}/.nf-core.yml + + - name: Install nf-core + run: | + python -m pip install --upgrade pip + pip install nf-core==${{ steps.read_yml.outputs['nf_core_version'] }} + + - name: Check nf-core outdated + id: nf_core_outdated + run: echo "OUTPUT=$(pip list --outdated | grep nf-core)" >> ${GITHUB_ENV} + + - name: Post nf-core template version comment + uses: mshick/add-pr-comment@b8f338c590a895d50bcbfa6c5859251edc8952fc # v2 + if: | + contains(env.OUTPUT, 'nf-core') + with: + repo-token: ${{ secrets.NF_CORE_BOT_AUTH_TOKEN }} + allow-repeats: false + message: | + > [!WARNING] + > Newer version of the nf-core template is available. + > + > Your pipeline is using an old version of the nf-core template: ${{ steps.read_yml.outputs['nf_core_version'] }}. + > Please update your pipeline to the latest version. + > + > For more documentation on how to update your pipeline, please see the [nf-core documentation](https://github.com/nf-core/tools?tab=readme-ov-file#sync-a-pipeline-with-the-template) and [Synchronisation documentation](https://nf-co.re/docs/contributing/sync). + # diff --git a/.gitignore b/.gitignore index bc675aba..8529bb03 100644 --- a/.gitignore +++ b/.gitignore @@ -6,6 +6,7 @@ results/ testing/ testing* *.pyc +null/ log/ reports/ testme.sh diff --git a/.gitpod.yml b/.gitpod.yml index 105a1821..46118637 100644 --- a/.gitpod.yml +++ b/.gitpod.yml @@ -4,17 +4,14 @@ tasks: command: | pre-commit install --install-hooks nextflow self-update - - name: unset JAVA_TOOL_OPTIONS - command: | - unset JAVA_TOOL_OPTIONS vscode: extensions: # based on nf-core.nf-core-extensionpack - - esbenp.prettier-vscode # Markdown/CommonMark linting and style checking for Visual Studio Code + #- esbenp.prettier-vscode # Markdown/CommonMark linting and style checking for Visual Studio Code - EditorConfig.EditorConfig # override user/workspace settings with settings found in .editorconfig files - Gruntfuggly.todo-tree # Display TODO and FIXME in a tree view in the activity bar - mechatroner.rainbow-csv # Highlight columns in csv files in different colors - # - nextflow.nextflow # Nextflow syntax highlighting + - nextflow.nextflow # Nextflow syntax highlighting - oderwat.indent-rainbow # Highlight indentation level - streetsidesoftware.code-spell-checker # Spelling checker for source code - charliermarsh.ruff # Code linter Ruff diff --git a/.nf-core.yml b/.nf-core.yml index 73506c0f..48d46d3b 100644 --- a/.nf-core.yml +++ b/.nf-core.yml @@ -1,11 +1,22 @@ -repository_type: pipeline -nf_core_version: "2.14.1" +bump_version: null lint: - template_strings: False - files_unchanged: - - .github/ISSUE_TEMPLATE/bug_report.yml files_exist: - lib/Utils.groovy - # TODO This is because of an issue with the monochromeLogs parameter - # See nextflow.config for details - schema_params: False + files_unchanged: + - .github/ISSUE_TEMPLATE/bug_report.yml + schema_params: false + template_strings: false +nf_core_version: 3.0.2 +org_path: null +repository_type: pipeline +template: + author: Bailey PJ, Botvinnik O, Marques de Almeida F, Peltzer A, Sturm G + description: Pipeline for processing 10x Genomics single cell rnaseq data + force: false + 
is_nfcore: true + name: scrnaseq + org: nf-core + outdir: . + skip_features: null + version: 2.8.0dev +update: null diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index 4dc0f1dc..9e9f0e1c 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -7,7 +7,7 @@ repos: - prettier@3.2.5 - repo: https://github.com/editorconfig-checker/editorconfig-checker.python - rev: "2.7.3" + rev: "3.0.3" hooks: - id: editorconfig-checker alias: ec diff --git a/CHANGELOG.md b/CHANGELOG.md index fd1b6e41..a4119ee3 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -3,6 +3,26 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). +## v3.0.0 - 2024-12-09 + +## Backwards-incompatible changes + +- Remove universc workflow from pipeline ([#289](https://github.com/nf-core/scrnaseq/issues/289)). +- Remove emptydrops from the pipeline, in favor of cellbender ([#369](https://github.com/nf-core/scrnaseq/pull/369)). + +## Additions + +- Add `--save_align_intermeds` parameter that publishes BAM files to the output directory (for `starsolo`, `cellranger` and `cellranger multi`) ([#384](https://github.com/nf-core/scrnaseq/issues/384)). + +## Fixes + +- Add support for pre-built indexes in `genomes.config` file for `cellranger`, `cellranger-arc`, `simpleaf` and `simpleaf txp2gene` ([#371](https://github.com/nf-core/scrnaseq/issues/371)). +- Refactor matrix conversion code. Output from all aligners is initially converted to AnnData h5ad that is used for + downstream code such as cellbender. H5ad objects are converted to Seurat and SingleCellExperiment at the end + using anndataR. This reduced the pipeline complexity and resolved various issues relating to output format conversion + ([#369](https://github.com/nf-core/scrnaseq/pull/369)). +- Fix problem with `test_full` that was not running out of the box, since code was trying to overwrite parameters in the workflow, which is not possible ([#366](https://github.com/nf-core/scrnaseq/issues/366)). + ## v2.7.1 - 2024-08-13 - Fix that tests have not been executed with nf-test v0.9 ([#359](https://github.com/nf-core/scrnaseq/pull/359)) diff --git a/CITATIONS.md b/CITATIONS.md index 867bde34..7281a5d0 100644 --- a/CITATIONS.md +++ b/CITATIONS.md @@ -12,11 +12,19 @@ - [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) - > Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online]. +> Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online]. - [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/) - > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924. +> Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924. + +- [Simpleaf](https://doi.org/10.1093/bioinformatics/btad614) + + > He, D., Patro, R. simpleaf: a simple, flexible, and scalable framework for single-cell data processing using alevin-fry, Bioinformatics 39, 10 (2023). 
+ +* [Alevin-fry](https://doi.org/10.1038/s41592-022-01408-3) + + > He, D., Zakeri, M., Sarkar, H. et al. Alevin-fry unlocks rapid, accurate and memory-frugal quantification of single-cell RNA-seq data. Nat Methods 19, 316–322 (2022). * [Alevin](https://doi.org/10.1186/s13059-019-1670-y) diff --git a/README.md b/README.md index 477e77b9..8823b6cb 100644 --- a/README.md +++ b/README.md @@ -11,7 +11,7 @@ [![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.3568187-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.3568187) [![nf-test](https://img.shields.io/badge/unit_tests-nf--test-337ab7.svg)](https://www.nf-test.com) -[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A523.04.0-23aa62.svg)](https://www.nextflow.io/) +[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A524.04.2-23aa62.svg)](https://www.nextflow.io/) [![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/) [![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/) [![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/) @@ -29,13 +29,15 @@ This is a community effort in building a pipeline capable to support: - STARSolo - Kallisto + BUStools - Cellranger -- UniverSC + +> [!IMPORTANT] +> Cellranger is a commercial tool from 10X Genomics Inc. and falls under the EULA from 10X Genomics Inc. The container provided for the CellRanger functionality in this pipeline has been built by the nf-core community and is therefore _not supported by 10X genomics_ directly. We are in discussions with 10X on how to improve the user experience and licence situation for both us as a community as well as 10X and end users and will update this statement here accordingly. ## Documentation The nf-core/scrnaseq pipeline comes with documentation about the pipeline [usage](https://nf-co.re/scrnaseq/usage), [parameters](https://nf-co.re/scrnaseq/parameters) and [output](https://nf-co.re/scrnaseq/output). -![scrnaseq workflow](docs/images/scrnaseq_pipeline_v1.0_metro_clean.png) +![scrnaseq workflow](docs/images/scrnaseq_pipeline_V3.0-metro_clean.png) ## Usage @@ -63,13 +65,12 @@ nextflow run nf-core/scrnaseq \ --genome_fasta GRCm38.p6.genome.chr19.fa \ --gtf gencode.vM19.annotation.chr19.gtf \ --protocol 10XV2 \ - --aligner <alevin/kallisto/star/cellranger/universc> \ + --aligner <alevin/kallisto/star/cellranger> \ --outdir <OUTDIR> ``` > [!WARNING] -> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; -> see [docs](https://nf-co.re/usage/configuration#custom-configuration-files). +> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; see [docs](https://nf-co.re/docs/usage/getting_started/configuration#custom-configuration-files). For more details and further functionality, please refer to the [usage documentation](https://nf-co.re/scrnaseq/usage) and the [parameter documentation](https://nf-co.re/scrnaseq/parameters).
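For orientation, a concrete invocation under the new version floor might look like the following — illustrative only: `samplesheet.csv` and `results` are placeholder names, the reference files are the ones from the README snippet above, and with UniverSC removed `--aligner` takes one of `alevin`, `kallisto`, `star` or `cellranger`:

```bash
# Hypothetical end-to-end run; requires Nextflow >= 24.04.2 per the updated badge
nextflow run nf-core/scrnaseq \
    -profile docker \
    --input samplesheet.csv \
    --genome_fasta GRCm38.p6.genome.chr19.fa \
    --gtf gencode.vM19.annotation.chr19.gtf \
    --protocol 10XV2 \
    --aligner star \
    --outdir results
```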
@@ -83,7 +84,6 @@ graph TD A[sc RNA] -->|CellRanger| B(h5ad/seurat/mtx matrices) A[sc RNA] -->|kbpython| B(h5ad/seurat/mtx matrices) A[sc RNA] -->|STARsolo| B(h5ad/seurat/mtx matrices) - A[sc RNA] -->|Universc| B(h5ad/seurat/mtx matrices) ``` Options for the respective alignment method can be found [here](https://github.com/nf-core/scrnaseq/blob/dev/docs/usage.md#aligning-options) to choose between methods. diff --git a/assets/multiqc_config.yml b/assets/multiqc_config.yml index d5fcc74a..03605fbc 100644 --- a/assets/multiqc_config.yml +++ b/assets/multiqc_config.yml @@ -1,7 +1,7 @@ report_comment: > - This report has been generated by the nf-core/scrnaseq + This report has been generated by the nf-core/scrnaseq analysis pipeline. For information about how to interpret these results, please see the - documentation. + documentation. report_section_order: "nf-core-scrnaseq-methods-description": order: -1000 diff --git a/assets/protocols.json b/assets/protocols.json index 0552f8d5..613e9773 100644 --- a/assets/protocols.json +++ b/assets/protocols.json @@ -89,25 +89,5 @@ "smartseq": { "protocol": "SMARTSEQ" } - }, - "universc": { - "auto": { - "protocol": "10x" - }, - "10XV1": { - "protocol": "10x-v1" - }, - "10XV2": { - "protocol": "10x-v2" - }, - "10XV3": { - "protocol": "10x-v3" - }, - "10XV4": { - "protocol": "10x-v4" - }, - "dropseq": { - "protocol": "dropseq" - } } } diff --git a/assets/schema_input.json b/assets/schema_input.json index 38b3611a..3954dbca 100644 --- a/assets/schema_input.json +++ b/assets/schema_input.json @@ -1,5 +1,5 @@ { - "$schema": "http://json-schema.org/draft-07/schema", + "$schema": "https://json-schema.org/draft/2020-12/schema", "$id": "https://raw.githubusercontent.com/nf-core/scrnaseq/master/assets/schema_input.json", "title": "nf-core/scrnaseq pipeline - params.input schema", "description": "Schema for the file provided with params.input", diff --git a/bin/concat_h5ad.py b/bin/concat_h5ad.py deleted file mode 100755 index 43ea071a..00000000 --- a/bin/concat_h5ad.py +++ /dev/null @@ -1,52 +0,0 @@ -#!/usr/bin/env python - -# Set numba chache dir to current working directory (which is a writable mount also in containers) -import os - -os.environ["NUMBA_CACHE_DIR"] = "." 
- -import scanpy as sc, anndata as ad, pandas as pd -from pathlib import Path -import argparse - - -def read_samplesheet(samplesheet): - df = pd.read_csv(samplesheet) - df.set_index("sample") - - # samplesheet may contain replicates, when it has, - # group information from replicates and collapse with commas - # only keep unique values using set() - df = df.groupby(["sample"]).agg(lambda column: ",".join(set(column))) - - return df - - -if __name__ == "__main__": - parser = argparse.ArgumentParser(description="Concatenates h5ad files and merge metadata from samplesheet") - - parser.add_argument("-i", "--input", dest="input", help="Path to samplesheet.csv") - parser.add_argument("-o", "--out", dest="out", help="Output path.") - parser.add_argument( - "-s", - "--suffix", - dest="suffix", - help="Suffix of matrices to remove and get sample name", - ) - - args = vars(parser.parse_args()) - - # Open samplesheet as dataframe - df_samplesheet = read_samplesheet(args["input"]) - - # find all h5ad and append to dict - dict_of_h5ad = {str(path).replace(args["suffix"], ""): sc.read_h5ad(path) for path in Path(".").rglob("*.h5ad")} - - # concat h5ad files - adata = ad.concat(dict_of_h5ad, label="sample", merge="unique", index_unique="_") - - # merge with data.frame, on sample information - adata.obs = adata.obs.join(df_samplesheet, on="sample") - adata.write_h5ad(args["out"], compression="gzip") - - print("Wrote h5ad file to {}".format(args["out"])) diff --git a/bin/emptydrops_cell_calling.R b/bin/emptydrops_cell_calling.R deleted file mode 100755 index 23a45267..00000000 --- a/bin/emptydrops_cell_calling.R +++ /dev/null @@ -1,52 +0,0 @@ -#!/usr/bin/env Rscript -library("DropletUtils") -library("Matrix") - -args <- commandArgs(trailingOnly=TRUE) - -fn_mtx <- args[1] -fn_barcodes <- args[2] -fn_genes <- args[3] -outdir <- args[4] -aligner <- args[5] - -# Read matrix/barcodes/genes -genes <- read.table(fn_genes,sep='\t') -barcodes <- read.table(fn_barcodes,sep='\t') -mtx <- readMM(fn_mtx) - -get_name <- function(file) { - name <- as.character(basename(file)) - name <- gsub('\\.gz$', '', name) - return(name) -} - -# transpose matrices when required -# based on code of 'mtx_to_seurat.R', only the data from kallisto and alevin would require transposition -print("Only kallisto and alevin have transposed matrices.") -if (aligner %in% c( "kallisto", "alevin" )) { - is_transposed <- TRUE - mtx<-t(mtx) -} else { - is_transposed <- FALSE -} - - -# Call empty drops -e.out <- emptyDrops(mtx) -is.cell <- e.out$FDR <= 0.01 - -# Slice matrix and barcodes -mtx_filtered <-mtx[,which(is.cell),drop=FALSE] -barcodes_filtered<-barcodes[which(is.cell),] - -# If matrix was transposed early, need to transpose back -if (is_transposed){ - mtx_filtered<-t(mtx_filtered) - print('Transposing back matrix.') -} - -# Write output -writeMM(mtx_filtered,file.path(outdir,get_name(fn_mtx))) -write.table(barcodes_filtered,file=file.path(outdir,get_name(fn_barcodes)),col.names=FALSE,row.names=FALSE,sep='\t',quote=FALSE) -write.table(genes,file=file.path(outdir,get_name(fn_genes)),col.names=FALSE,row.names=FALSE,sep='\t',quote=FALSE) diff --git a/bin/mtx_to_h5ad.py b/bin/mtx_to_h5ad.py deleted file mode 100755 index 2190245d..00000000 --- a/bin/mtx_to_h5ad.py +++ /dev/null @@ -1,160 +0,0 @@ -#!/usr/bin/env python - -# Set numba chache dir to current working directory (which is a writable mount also in containers) -import os - -os.environ["NUMBA_CACHE_DIR"] = "." 
- -import scanpy as sc -import pandas as pd -import argparse -from scipy import io -from anndata import AnnData - - -def _10x_h5_to_adata(mtx_h5: str, sample: str): - adata = sc.read_10x_h5(mtx_h5) - adata.var["gene_symbols"] = adata.var_names - adata.var.set_index("gene_ids", inplace=True) - adata.obs["sample"] = sample - - # reorder columns for 10x mtx files - adata.var = adata.var[["gene_symbols", "feature_types", "genome"]] - - return adata - - -def _mtx_to_adata( - mtx_file: str, - barcode_file: str, - feature_file: str, - sample: str, - aligner: str, -): - adata = sc.read_mtx(mtx_file) - # for some reason star matrix comes transposed and doesn't fit when values are appended directly - # also true for cellranger files ( this is only used when running with the custom emptydrops_filtered files ) - # otherwise, it uses the cellranger .h5 files - if aligner in [ - "cellranger", - "cellrangermulti", - "star", - ]: - adata = adata.transpose() - - adata.obs_names = pd.read_csv(barcode_file, header=None, sep="\t")[0].values - adata.var_names = pd.read_csv(feature_file, header=None, sep="\t")[0].values - adata.obs["sample"] = sample - - return adata - - -def input_to_adata( - input_data: str, - barcode_file: str, - feature_file: str, - sample: str, - aligner: str, - txp2gene: str, - star_index: str, - verbose: bool = True, -): - if verbose and (txp2gene or star_index): - print("Reading in {}".format(input_data)) - - # - # open main data - # - if aligner == "cellranger" and input_data.lower().endswith('.h5'): - adata = _10x_h5_to_adata(input_data, sample) - else: - adata = _mtx_to_adata(input_data, barcode_file, feature_file, sample, aligner) - - # - # open gene information - # - if verbose and (txp2gene or star_index): - print("Reading in {}".format(txp2gene)) - - if aligner == "cellranger" and not input_data.lower().endswith('.h5'): - # - # for cellranger workflow, we do not have a txp2gene file, so, when using this normal/manual function for empty drops - # we need to provide this information coming directly from the features.tsv file - # by not using the .h5 file for conversion, we loose the two col information: feature_types and genome - # - t2g = pd.read_table(feature_file, header=None, names=["gene_id", "gene_symbol", "feature_types"], usecols=[0, 1, 2]) - else: - if txp2gene: - t2g = pd.read_table(txp2gene, header=None, names=["gene_id", "gene_symbol"], usecols=[1, 2]) - elif star_index: - t2g = pd.read_table( - f"{star_index}/geneInfo.tab", header=None, skiprows=1, names=["gene_id", "gene_symbol"], usecols=[0, 1] - ) - - if txp2gene or star_index or (aligner == "cellranger" and not input_data.lower().endswith('.h5')): - t2g = t2g.drop_duplicates(subset="gene_id").set_index("gene_id") - adata.var["gene_symbol"] = t2g["gene_symbol"] - - return adata - - -def write_counts( - adata: AnnData, - out: str, - verbose: bool = False, -): - pd.DataFrame(adata.obs.index).to_csv(os.path.join(out, "barcodes.tsv"), sep="\t", index=False, header=None) - pd.DataFrame(adata.var).to_csv(os.path.join(out, "features.tsv"), sep="\t", index=True, header=None) - io.mmwrite(os.path.join(out, "matrix.mtx"), adata.X.T, field="integer") - - if verbose: - print("Wrote features.tsv, barcodes.tsv, and matrix.mtx files to {}".format(args["out"])) - - -def dump_versions(task_process): - import pkg_resources - - with open("versions.yml", "w") as f: - f.write(f"{task_process}:\n\t") - f.write("\n\t".join([f"{pkg.key}: {pkg.version}" for pkg in pkg_resources.working_set])) - - -if __name__ == "__main__": - parser = 
argparse.ArgumentParser(description="Converts mtx output to h5ad.") - - parser.add_argument("-i", "--input_data", dest="input_data", help="Path to either mtx or mtx h5 file.") - parser.add_argument("-v", "--verbose", dest="verbose", help="Toggle verbose messages", default=False) - parser.add_argument("-f", "--feature", dest="feature", help="Path to feature file.", nargs="?", const="") - parser.add_argument("-b", "--barcode", dest="barcode", help="Path to barcode file.", nargs="?", const="") - parser.add_argument("-s", "--sample", dest="sample", help="Sample name") - parser.add_argument("-o", "--out", dest="out", help="Output path.") - parser.add_argument("-a", "--aligner", dest="aligner", help="Which aligner has been used?") - parser.add_argument("--task_process", dest="task_process", help="Task process name.") - parser.add_argument("--txp2gene", dest="txp2gene", help="Transcript to gene (t2g) file.", nargs="?", const="") - parser.add_argument( - "--star_index", dest="star_index", help="Star index folder containing geneInfo.tab.", nargs="?", const="" - ) - - args = vars(parser.parse_args()) - - # create the directory with the sample name - os.makedirs(os.path.dirname(args["out"]), exist_ok=True) - - adata = input_to_adata( - input_data=args["input_data"], - barcode_file=args["barcode"], - feature_file=args["feature"], - sample=args["sample"], - aligner=args["aligner"], - txp2gene=args["txp2gene"], - star_index=args["star_index"], - verbose=args["verbose"], - ) - - write_counts(adata=adata, out=args["sample"], verbose=args["verbose"]) - - adata.write_h5ad(args["out"], compression="gzip") - - print("Wrote h5ad file to {}".format(args["out"])) - - dump_versions(task_process=args["task_process"]) diff --git a/bin/mtx_to_seurat.R b/bin/mtx_to_seurat.R deleted file mode 100755 index 7cacccf7..00000000 --- a/bin/mtx_to_seurat.R +++ /dev/null @@ -1,54 +0,0 @@ -#!/usr/bin/env Rscript -library(Seurat) - -args <- commandArgs(trailingOnly=TRUE) - -mtx_file <- args[1] -barcode_file <- args[2] -feature_file <- args[3] -out.file <- args[4] -aligner <- args[5] -is_emptydrops <- args[6] - -if (is_emptydrops == "--is_emptydrops") { - is_emptydrops <- TRUE -} else{ - is_emptydrops <- FALSE -} - -if (aligner %in% c( "kallisto", "alevin" )) { - print("1") - # for kallisto and alevin, the features file contains only one column and matrix needs to be transposed - expression.matrix <- ReadMtx( - mtx = mtx_file, features = feature_file, cells = barcode_file, feature.column = 1, mtx.transpose = TRUE - ) -} else { - if (aligner %in% c( "cellranger", "cellrangermulti", "star" ) && is_emptydrops) { - print("2") - expression.matrix <- ReadMtx( - mtx = mtx_file, features = feature_file, cells = barcode_file, feature.column = 1 - ) - } else{ - print("3") - expression.matrix <- ReadMtx( - mtx = mtx_file, features = feature_file, cells = barcode_file - ) - } -} - - -seurat.object <- CreateSeuratObject(counts = expression.matrix) - -dir.create(basename(dirname(out.file)), showWarnings = FALSE) - -saveRDS(seurat.object, file = out.file) - - -yaml::write_yaml( -list( - 'MTX_TO_SEURAT'=list( - 'Seurat' = paste(packageVersion('Seurat'), collapse='.') - ) -), -"versions.yml" -) diff --git a/conf/base.config b/conf/base.config index c7aeb64b..c85bf6cf 100644 --- a/conf/base.config +++ b/conf/base.config @@ -10,9 +10,9 @@ process { - cpus = { check_max( 1 * task.attempt, 'cpus' ) } - memory = { check_max( 6.GB * task.attempt, 'memory' ) } - time = { check_max( 4.h * task.attempt, 'time' ) } + cpus = { 1 * task.attempt } + memory = 
{ 6.GB * task.attempt } + time = { 4.h * task.attempt } errorStrategy = { task.exitStatus in ((130..145) + 104) ? 'retry' : 'finish' } maxRetries = 1 @@ -25,30 +25,30 @@ process { // adding in your local modules too. // See https://www.nextflow.io/docs/latest/config.html#config-process-selectors withLabel:process_single { - cpus = { check_max( 1 , 'cpus' ) } - memory = { check_max( 6.GB * task.attempt, 'memory' ) } - time = { check_max( 4.h * task.attempt, 'time' ) } + cpus = { 1 } + memory = { 6.GB * task.attempt } + time = { 4.h * task.attempt } } withLabel:process_low { - cpus = { check_max( 2 * task.attempt, 'cpus' ) } - memory = { check_max( 12.GB * task.attempt, 'memory' ) } - time = { check_max( 4.h * task.attempt, 'time' ) } + cpus = { 2 * task.attempt } + memory = { 12.GB * task.attempt } + time = { 4.h * task.attempt } } withLabel:process_medium { - cpus = { check_max( 6 * task.attempt, 'cpus' ) } - memory = { check_max( 36.GB * task.attempt, 'memory' ) } - time = { check_max( 8.h * task.attempt, 'time' ) } + cpus = { 6 * task.attempt } + memory = { 36.GB * task.attempt } + time = { 8.h * task.attempt } } withLabel:process_high { - cpus = { check_max( 12 * task.attempt, 'cpus' ) } - memory = { check_max( 72.GB * task.attempt, 'memory' ) } - time = { check_max( 16.h * task.attempt, 'time' ) } + cpus = { 12 * task.attempt } + memory = { 72.GB * task.attempt } + time = { 16.h * task.attempt } } withLabel:process_long { - time = { check_max( 20.h * task.attempt, 'time' ) } + time = { 20.h * task.attempt } } withLabel:process_high_memory { - memory = { check_max( 200.GB * task.attempt, 'memory' ) } + memory = { 200.GB * task.attempt } } withLabel:error_ignore { errorStrategy = 'ignore' diff --git a/conf/igenomes_ignored.config b/conf/igenomes_ignored.config new file mode 100644 index 00000000..b4034d82 --- /dev/null +++ b/conf/igenomes_ignored.config @@ -0,0 +1,9 @@ +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Nextflow config file for iGenomes paths +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Empty genomes dictionary to use when igenomes is ignored. +---------------------------------------------------------------------------------------- +*/ + +params.genomes = [:] diff --git a/conf/modules.config b/conf/modules.config index 81395a1d..a1bfb4af 100644 --- a/conf/modules.config +++ b/conf/modules.config @@ -19,9 +19,8 @@ process { withName: FASTQC { ext.args = '--quiet' - time = { check_max( 120.h * task.attempt, 'time' ) } + time = { 120.h * task.attempt } } - withName: 'MULTIQC' { ext.args = { params.multiqc_title ? "--title \"$params.multiqc_title\"" : '' } publishDir = [ @@ -32,25 +31,35 @@ process { } if (!params.skip_emptydrops) { - withName: EMPTYDROPS_CELL_CALLING { + withName: 'CELLBENDER_REMOVEBACKGROUND' { publishDir = [ - path: { "${params.outdir}/${params.aligner}" }, + path: { "${params.outdir}/${params.aligner}/${meta.id}/emptydrops_filter" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? 
null : filename } + ] + } + withName: 'ADATA_BARCODES' { + ext.prefix = { "${meta.id}_${meta.input_type}_matrix" } + publishDir = [ + path: { "${params.outdir}/${params.aligner}/mtx_conversions/${meta.id}" }, mode: params.publish_dir_mode, - saveAs: { filename -> - if ( params.aligner == 'cellranger' ) "count/${meta.id}/${filename}" - else if ( params.aligner == 'kallisto' ) "${meta.id}.count/${filename}" - else "${meta.id}/${filename}" - } + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } ] } } - withName: 'MTX_TO_H5AD|CONCAT_H5AD|MTX_TO_SEURAT' { + withName: 'MTX_TO_H5AD|CONCAT_H5AD|ANNDATAR_CONVERT' { publishDir = [ path: { "${params.outdir}/${params.aligner}/mtx_conversions" }, - mode: params.publish_dir_mode + mode: params.publish_dir_mode, + saveAs: { filename -> + if (filename.equals('versions.yml')) { null } + else if (!filename.contains('combined_')) { "${meta.id}/${filename}" } + else filename + } ] } + withName: 'GTF_GENE_FILTER' { publishDir = [ path: { "${params.outdir}/gtf_filter" }, @@ -74,16 +83,18 @@ if(params.aligner == "cellranger") { withName: CELLRANGER_MKREF { publishDir = [ path: "${params.outdir}/${params.aligner}/mkref", - mode: params.publish_dir_mode + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } ] } withName: CELLRANGER_COUNT { publishDir = [ path: "${params.outdir}/${params.aligner}/count", - mode: params.publish_dir_mode + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } ] - ext.args = {"--chemistry ${meta.chemistry} --create-bam true " + (meta.expected_cells ? "--expect-cells ${meta.expected_cells}" : '')} - time = { check_max( 240.h * task.attempt, 'time' ) } + ext.args = {"--chemistry ${meta.chemistry} --create-bam ${params.save_align_intermeds}" + " " + (meta.expected_cells ? "--expect-cells ${meta.expected_cells}" : '')} + time = { 240.h * task.attempt } } } } @@ -110,37 +121,7 @@ if(params.aligner == "cellrangerarc") { mode: params.publish_dir_mode ] ext.args = {meta.expected_cells ? "--expect-cells ${meta.expected_cells}" : ''} - time = { check_max( 240.h * task.attempt, 'time' ) } - } - } -} - -if(params.aligner == "universc") { - process { - publishDir = { "${params.outdir}/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" } - - withName: CELLRANGER_MKGTF { - publishDir = [ - path: "${params.outdir}/cellranger/mkgtf", - mode: params.publish_dir_mode, - saveAs: { filename -> filename.equals('versions.yml') ? null : filename } - ] - ext.args = "--attribute=gene_biotype:protein_coding --attribute=gene_biotype:lncRNA --attribute=gene_biotype:pseudogene" - container = "nf-core/universc:1.2.5.1" - } - withName: CELLRANGER_MKREF { - publishDir = [ - path: "${params.outdir}/cellranger/mkref", - mode: params.publish_dir_mode - ] - container = "nf-core/universc:1.2.5.1" - } - withName: UNIVERSC { - publishDir = [ - path: "${params.outdir}/universc", - mode: params.publish_dir_mode - ] - time = { check_max( 240.h * task.attempt, 'time' ) } + time = { 240.h * task.attempt } } } } @@ -161,15 +142,16 @@ if (params.aligner == "alevin") { } withName: 'SIMPLEAF_QUANT' { publishDir = [ - path: { "${params.outdir}/${params.aligner}" }, - mode: params.publish_dir_mode + path: { "${params.outdir}/${params.aligner}/${meta.id}" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? 
null : filename } ] ext.args = "-r cr-like" } // Fix for issue 196 // Modified for issue 334 withName: 'ALEVINQC' { - time = { check_max( 120.h, 'time' ) } + time = { 120.h } } } } @@ -183,14 +165,28 @@ if (params.aligner == "star") { publishDir = [ path: { "${params.outdir}/${params.aligner}/genome_generate" }, mode: params.publish_dir_mode, - enabled: params.save_reference + enabled: params.save_reference, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } ] } - withName: STAR_ALIGN { - publishDir = [ - path: { "${params.outdir}/${params.aligner}/${meta.id}" }, - mode: params.publish_dir_mode - ] + + if(params.save_align_intermeds) { + withName: 'STAR_ALIGN' { + publishDir = [ + path: { "${params.outdir}/${params.aligner}/${meta.id}" }, + mode: params.publish_dir_mode + ] + } + } + else { + withName: 'STAR_ALIGN' { + publishDir = [ + path: { "${params.outdir}/${params.aligner}/${meta.id}" }, + mode: params.publish_dir_mode, + pattern: '*', + saveAs: { it.endsWith('.bam') ? null : it } + ] + } } } } @@ -201,14 +197,16 @@ if (params.aligner == 'kallisto') { publishDir = [ path: { "${params.outdir}/${params.aligner}" }, mode: params.publish_dir_mode, - enabled: params.save_reference + enabled: params.save_reference, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } ] } withName: KALLISTOBUSTOOLS_COUNT { def kb_filter = (params.kb_filter) ? '--filter' : '' publishDir = [ path: { "${params.outdir}/${params.aligner}" }, - mode: params.publish_dir_mode + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } ] ext.args = "--workflow ${params.kb_workflow} ${kb_filter}" } @@ -247,7 +245,8 @@ if (params.aligner == 'cellrangermulti') { withName: CELLRANGER_MKVDJREF { publishDir = [ path: "${params.outdir}/${params.aligner}/mkvdjref", - mode: params.publish_dir_mode + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? 
null : filename } ] } } diff --git a/conf/test.config b/conf/test.config index 08ab1b69..a94d51d1 100644 --- a/conf/test.config +++ b/conf/test.config @@ -10,15 +10,18 @@ ---------------------------------------------------------------------------------------- */ +process { + resourceLimits = [ + cpus: 4, + memory: '15.GB', + time: '1.h' + ] +} + params { config_profile_name = 'Test profile' config_profile_description = 'Minimal test dataset to check pipeline function' - // Limit resources so that this can run on GitHub Actions - max_cpus = 2 - max_memory = '6.GB' - max_time = '6.h' - // Input data input = 'https://github.com/nf-core/test-datasets/raw/scrnaseq/samplesheet-2-0.csv' skip_emptydrops = true // module does not work on small dataset diff --git a/conf/test_cellranger_multi.config b/conf/test_cellranger_multi.config index f10550ae..0a229f9c 100644 --- a/conf/test_cellranger_multi.config +++ b/conf/test_cellranger_multi.config @@ -10,6 +10,14 @@ ---------------------------------------------------------------------------------------- */ +process { + resourceLimits = [ + cpus: 4, + memory: '15.GB', + time: '1.h' + ] +} + // shared across profiles params { config_profile_name = 'Test profile (Cellranger Multi)' diff --git a/docs/images/mqc_fastqc_adapter.png b/docs/images/mqc_fastqc_adapter.png deleted file mode 100755 index 361d0e47..00000000 Binary files a/docs/images/mqc_fastqc_adapter.png and /dev/null differ diff --git a/docs/images/mqc_fastqc_counts.png b/docs/images/mqc_fastqc_counts.png deleted file mode 100755 index cb39ebb8..00000000 Binary files a/docs/images/mqc_fastqc_counts.png and /dev/null differ diff --git a/docs/images/mqc_fastqc_quality.png b/docs/images/mqc_fastqc_quality.png deleted file mode 100755 index a4b89bf5..00000000 Binary files a/docs/images/mqc_fastqc_quality.png and /dev/null differ diff --git a/docs/images/scrnaseq_pipeline_V3.0-metro_clean.png b/docs/images/scrnaseq_pipeline_V3.0-metro_clean.png new file mode 100644 index 00000000..42c542c5 Binary files /dev/null and b/docs/images/scrnaseq_pipeline_V3.0-metro_clean.png differ diff --git a/docs/images/rnaseq_pipeline_V1.0-metro_clean.svg b/docs/images/scrnaseq_pipeline_V3.0-metro_clean.svg similarity index 93% rename from docs/images/rnaseq_pipeline_V1.0-metro_clean.svg rename to docs/images/scrnaseq_pipeline_V3.0-metro_clean.svg index a831236f..9333be6d 100644 --- a/docs/images/rnaseq_pipeline_V1.0-metro_clean.svg +++ b/docs/images/scrnaseq_pipeline_V3.0-metro_clean.svg @@ -1,12 +1,12 @@ [SVG metro-map text diff omitted — recoverable labels: Samplesheet / scRNA-Seq input, Reference + Annotation, Gene BioType Filtering (cellranger mkgtf, cellranger-arc mkgtf), Generate Reference Index (cellranger mkref, cellranger-arc mkref), Generate Count Matrix (cellranger-arc count), Salmon-Alevin, Cell Ranger ATAC / Cell Ranger Arc, Cell Bender Remove Background, Demultiplexed Count Matrix, Count Matrix outputs (h5ad / rds), MTX to H5AD Conversion, Multiome legend; the UniverSC / universc nodes are dropped from the diagram.]
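As a closing illustration of the AnnData-first conversion flow referenced in the changelog and the metro map, a downstream check might look like this — the file name and the `results/star` path are assumptions based on the `mtx_conversions` publishing rules above, not guaranteed outputs:

```python
# Minimal sketch: inspect a converted matrix under <outdir>/<aligner>/mtx_conversions/.
# Per the modules.config rules above, combined matrices keep a 'combined_' prefix
# while per-sample files are published into <sample>/ subdirectories.
import anndata as ad

adata = ad.read_h5ad("results/star/mtx_conversions/combined_filtered_matrix.h5ad")  # hypothetical file name
print(adata)                                # n_obs x n_vars overview
print(adata.obs["sample"].value_counts())   # cells per sample, merged from the samplesheet
```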