diff --git a/.editorconfig b/.editorconfig index dd9ffa53..72dda289 100644 --- a/.editorconfig +++ b/.editorconfig @@ -28,10 +28,6 @@ indent_style = unset [/assets/email*] indent_size = unset -# ignore Readme -[README.md] -indent_style = unset - -# ignore python +# ignore python and markdown [*.{py,md}] indent_style = unset diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md index f5305291..23cadba9 100644 --- a/.github/CONTRIBUTING.md +++ b/.github/CONTRIBUTING.md @@ -19,7 +19,7 @@ If you'd like to write some code for nf-core/differentialabundance, the standard 1. Check that there isn't already an issue about your idea in the [nf-core/differentialabundance issues](https://github.com/nf-core/differentialabundance/issues) to avoid duplicating work. If there isn't one already, please create one so that others know you're working on this 2. [Fork](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) the [nf-core/differentialabundance repository](https://github.com/nf-core/differentialabundance) to your GitHub account 3. Make the necessary changes / additions within your forked repository following [Pipeline conventions](#pipeline-contribution-conventions) -4. Use `nf-core schema build` and add any new parameters to the pipeline JSON schema (requires [nf-core tools](https://github.com/nf-core/tools) >= 1.10). +4. Use `nf-core pipelines schema build` and add any new parameters to the pipeline JSON schema (requires [nf-core tools](https://github.com/nf-core/tools) >= 1.10). 5. Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged If you're not used to this workflow with git, you can start with some [docs from GitHub](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests) or even their [excellent `git` resources](https://try.github.io/). @@ -40,7 +40,7 @@ There are typically two types of tests that run: ### Lint tests `nf-core` has a [set of guidelines](https://nf-co.re/developers/guidelines) which all pipelines must adhere to. -To enforce these and ensure that all pipelines stay in sync, we have developed a helper tool which runs checks on the pipeline code. This is in the [nf-core/tools repository](https://github.com/nf-core/tools) and once installed can be run locally with the `nf-core lint <pipeline-directory>` command. +To enforce these and ensure that all pipelines stay in sync, we have developed a helper tool which runs checks on the pipeline code. This is in the [nf-core/tools repository](https://github.com/nf-core/tools) and once installed can be run locally with the `nf-core pipelines lint <pipeline-directory>` command. If any failures or warnings are encountered, please follow the listed URL for more documentation. @@ -75,7 +75,7 @@ If you wish to contribute a new step, please use the following coding standards: 2. Write the process block (see below). 3. Define the output channel if needed (see below). 4. Add any new parameters to `nextflow.config` with a default (see below). -5. Add any new parameters to `nextflow_schema.json` with help text (via the `nf-core schema build` tool). +5. Add any new parameters to `nextflow_schema.json` with help text (via the `nf-core pipelines schema build` tool). 6. Add sanity checks and validation for all relevant parameters. 7. Perform local tests to validate that the new code works as expected. 8. If applicable, add a new test command in `.github/workflow/ci.yml`.
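For orientation, the contribution steps above boil down to a short command sequence. A minimal sketch, assuming nf-core/tools (>= 3.0, which uses the `pipelines` subcommand as in this diff) and Nextflow are installed, run from the root of your fork; `<OUTDIR>` is a placeholder:

```bash
# After adding a new parameter with a default under the params scope in nextflow.config:
nf-core pipelines schema build                          # add the parameter to nextflow_schema.json with help text
nf-core pipelines lint                                  # check the pipeline against the nf-core guidelines
nextflow run . -profile test,docker --outdir <OUTDIR>   # run the small test profile locally
```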
@@ -86,11 +86,11 @@ If you wish to contribute a new step, please use the following coding standards: Parameters should be initialised / defined with default values in `nextflow.config` under the `params` scope. -Once there, use `nf-core schema build` to add to `nextflow_schema.json`. +Once there, use `nf-core pipelines schema build` to add to `nextflow_schema.json`. ### Default processes resource requirements -Sensible defaults for process resource requirements (CPUs / memory / time) for a process should be defined in `conf/base.config`. These should generally be specified generic with `withLabel:` selectors so they can be shared across multiple processes/steps of the pipeline. A nf-core standard set of labels that should be followed where possible can be seen in the [nf-core pipeline template](https://github.com/nf-core/tools/blob/master/nf_core/pipeline-template/conf/base.config), which has the default process as a single core-process, and then different levels of multi-core configurations for increasingly large memory requirements defined with standardised labels. +Sensible defaults for process resource requirements (CPUs / memory / time) for a process should be defined in `conf/base.config`. These should generally be specified generic with `withLabel:` selectors so they can be shared across multiple processes/steps of the pipeline. A nf-core standard set of labels that should be followed where possible can be seen in the [nf-core pipeline template](https://github.com/nf-core/tools/blob/main/nf_core/pipeline-template/conf/base.config), which has the default process as a single core-process, and then different levels of multi-core configurations for increasingly large memory requirements defined with standardised labels. The process resources can be passed on to the tool dynamically within the process with the `${task.cpus}` and `${task.memory}` variables in the `script:` block. @@ -103,7 +103,7 @@ Please use the following naming schemes, to make it easy to understand what is g ### Nextflow version bumping -If you are using a new feature from core Nextflow, you may bump the minimum required version of nextflow in the pipeline with: `nf-core bump-version --nextflow . [min-nf-version]` +If you are using a new feature from core Nextflow, you may bump the minimum required version of nextflow in the pipeline with: `nf-core pipelines bump-version --nextflow . [min-nf-version]` ### Images and figures diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index 70d62a73..37e30a84 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -17,8 +17,8 @@ Learn more about contributing: [CONTRIBUTING.md](https://github.com/nf-core/diff - [ ] If you've fixed a bug or added code that should be tested, add tests! - [ ] If you've added a new tool - have you followed the pipeline conventions in the [contribution docs](https://github.com/nf-core/differentialabundance/tree/master/.github/CONTRIBUTING.md) - [ ] If necessary, also make a PR on the nf-core/differentialabundance _branch_ on the [nf-core/test-datasets](https://github.com/nf-core/test-datasets) repository. -- [ ] Make sure your code lints (`nf-core lint`). -- [ ] Ensure the test suite passes (`nf-test test main.nf.test -profile test,docker`). +- [ ] Make sure your code lints (`nf-core pipelines lint`). +- [ ] Ensure the test suite passes (`nextflow run . -profile test,docker --outdir <OUTDIR>`). +- [ ] Check for unexpected warnings in debug mode (`nextflow run . 
-profile debug,test,docker --outdir <OUTDIR>`). - [ ] Usage Documentation in `docs/usage.md` is updated. - [ ] Output Documentation in `docs/output.md` is updated. diff --git a/.github/workflows/awsfulltest.yml b/.github/workflows/awsfulltest.yml index dd68bc7f..0e295170 100644 --- a/.github/workflows/awsfulltest.yml +++ b/.github/workflows/awsfulltest.yml @@ -1,19 +1,36 @@ name: nf-core AWS full size tests -# This workflow is triggered on published releases. +# This workflow is triggered on PRs opened against the master branch. # It can be additionally triggered manually with GitHub actions workflow dispatch button. # It runs the -profile 'test_full' on AWS batch on: - release: - types: [published] + pull_request: + branches: + - master workflow_dispatch: + pull_request_review: + types: [submitted] + jobs: - run-tower: + run-platform: name: Run AWS full tests - if: github.repository == 'nf-core/differentialabundance' + # run only if the PR is approved by at least 2 reviewers and against the master branch or manually triggered + if: github.repository == 'nf-core/differentialabundance' && github.event.review.state == 'approved' && github.event.pull_request.base.ref == 'master' || github.event_name == 'workflow_dispatch' runs-on: ubuntu-latest steps: - - name: Launch workflow via tower + - uses: octokit/request-action@v2.x + id: check_approvals + with: + route: GET /repos/${{ github.repository }}/pulls/${{ github.event.pull_request.number }}/reviews + env: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - id: test_variables + if: github.event_name != 'workflow_dispatch' + run: | + JSON_RESPONSE='${{ steps.check_approvals.outputs.data }}' + CURRENT_APPROVALS_COUNT=$(echo $JSON_RESPONSE | jq -c '[.[] | select(.state | contains("APPROVED")) ] | length') + test $CURRENT_APPROVALS_COUNT -ge 2 || exit 1 # At least 2 approvals are required + - name: Launch workflow via Seqera Platform uses: seqeralabs/action-tower-launch@v2 with: workspace_id: ${{ secrets.TOWER_WORKSPACE_ID }} @@ -30,7 +47,7 @@ jobs: - uses: actions/upload-artifact@v4 with: - name: Tower debug log file + name: Seqera Platform debug log file path: | - tower_action_*.log - tower_action_*.json + seqera_platform_action_*.log + seqera_platform_action_*.json diff --git a/.github/workflows/awstest.yml b/.github/workflows/awstest.yml index ed9b65e7..b3d06d05 100644 --- a/.github/workflows/awstest.yml +++ b/.github/workflows/awstest.yml @@ -5,13 +5,13 @@ name: nf-core AWS test on: workflow_dispatch: jobs: - run-tower: + run-platform: name: Run AWS tests if: github.repository == 'nf-core/differentialabundance' runs-on: ubuntu-latest steps: - # Launch workflow using Tower CLI tool action - - name: Launch workflow via tower + # Launch workflow using Seqera Platform CLI tool action + - name: Launch workflow via Seqera Platform uses: seqeralabs/action-tower-launch@v2 with: workspace_id: ${{ secrets.TOWER_WORKSPACE_ID }} @@ -27,7 +27,7 @@ jobs: - uses: actions/upload-artifact@v4 with: - name: Tower debug log file + name: Seqera Platform debug log file path: | - tower_action_*.log - tower_action_*.json + seqera_platform_action_*.log + seqera_platform_action_*.json diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 876a22b1..e4673f07 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -7,9 +7,12 @@ on: pull_request: release: types: [published] + workflow_dispatch: env: NXF_ANSI_LOG: false + NXF_SINGULARITY_CACHEDIR: ${{ github.workspace }}/.singularity + NXF_SINGULARITY_LIBRARYDIR: ${{ github.workspace }}/.singularity 
concurrency: group: "${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}" @@ -17,36 +20,73 @@ concurrency: jobs: test: - name: Test Nextflow versions and profiles + name: "Run pipeline with test data (${{ matrix.NXF_VER }} | ${{ matrix.test_profile }} | ${{ matrix.compute_profile }})" # Only run on push if this is the nf-core dev branch (merged PRs) if: "${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/differentialabundance') }}" runs-on: ubuntu-latest strategy: matrix: NXF_VER: - - "23.10.0" + - "24.04.2" - "latest-everything" - profile: + test_profile: - "test" - "test_nogtf" - "test_affy" - "test_maxquant" - "test_soft" + - "test_experimental" + compute_profile: + - "conda" + - "docker" + - "singularity" + test_name: + - "test" + isMaster: + - ${{ github.base_ref == 'master' }} + # Exclude conda and singularity on dev + exclude: + - isMaster: false + compute_profile: "conda" + - isMaster: false + compute_profile: "singularity" steps: - name: Check out pipeline code - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4 + uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4 - - name: Install Nextflow - uses: nf-core/setup-nextflow@v1 + - name: Set up Nextflow + uses: nf-core/setup-nextflow@v2 with: version: "${{ matrix.NXF_VER }}" - - name: Disk space cleanup + - name: Set up Apptainer + if: matrix.compute_profile == 'singularity' + uses: eWaterCycle/setup-apptainer@main + + - name: Set up Singularity + if: matrix.compute_profile == 'singularity' + run: | + mkdir -p $NXF_SINGULARITY_CACHEDIR + mkdir -p $NXF_SINGULARITY_LIBRARYDIR + + - name: Set up Miniconda + if: matrix.compute_profile == 'conda' + uses: conda-incubator/setup-miniconda@a4260408e20b96e80095f42ff7f1a15b27dd94ca # v3 + with: + miniconda-version: "latest" + auto-update-conda: true + conda-solver: libmamba + channels: conda-forge,bioconda + + - name: Set up Conda + if: matrix.compute_profile == 'conda' + run: | + echo $(realpath $CONDA)/condabin >> $GITHUB_PATH + echo $(realpath python) >> $GITHUB_PATH + + - name: Clean up Disk space uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1 - - name: Run pipeline with test data - # You can customise CI pipeline run tests as required - # For example: adding multiple test runs with different parameters - # Remember that you can parallelise this by using strategy.matrix + - name: "Run pipeline with test data ${{ matrix.NXF_VER }} | ${{ matrix.test_name }} | ${{ matrix.compute_profile }}" run: | - nextflow run ${GITHUB_WORKSPACE} -profile docker,${{ matrix.profile }} --outdir ./results + nextflow run ${GITHUB_WORKSPACE} -profile ${{ matrix.test_profile }},${{ matrix.compute_profile }} --outdir ./results diff --git a/.github/workflows/download_pipeline.yml b/.github/workflows/download_pipeline.yml index 08622fd5..713dc3e7 100644 --- a/.github/workflows/download_pipeline.yml +++ b/.github/workflows/download_pipeline.yml @@ -1,4 +1,4 @@ -name: Test successful pipeline download with 'nf-core download' +name: Test successful pipeline download with 'nf-core pipelines download' # Run the workflow when: # - dispatched manually @@ -8,12 +8,14 @@ on: workflow_dispatch: inputs: testbranch: - description: "The specific branch you wish to utilize for the test execution of nf-core download." + description: "The specific branch you wish to utilize for the test execution of nf-core pipelines download." 
required: true default: "dev" pull_request: types: - opened + - edited + - synchronize branches: - master pull_request_target: @@ -28,15 +30,20 @@ jobs: runs-on: ubuntu-latest steps: - name: Install Nextflow - uses: nf-core/setup-nextflow@v1 + uses: nf-core/setup-nextflow@v2 - - uses: actions/setup-python@0a5c61591373683505ea898e09a3ea4f39ef2b9c # v5 + - name: Disk space cleanup + uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1 + + - uses: actions/setup-python@82c7e631bb3cdc910f68e0081d67478d79c6982d # v5 with: - python-version: "3.11" + python-version: "3.12" architecture: "x64" - - uses: eWaterCycle/setup-singularity@931d4e31109e875b13309ae1d07c70ca8fbc8537 # v7 + + - name: Setup Apptainer + uses: eWaterCycle/setup-apptainer@4bb22c52d4f63406c49e94c804632975787312b3 # v2.0.0 with: - singularity-version: 3.8.3 + apptainer-version: 1.3.4 - name: Install dependencies run: | @@ -49,24 +56,64 @@ echo "REPOTITLE_LOWERCASE=$(basename ${GITHUB_REPOSITORY,,})" >> ${GITHUB_ENV} echo "REPO_BRANCH=${{ github.event.inputs.testbranch || 'dev' }}" >> ${GITHUB_ENV} + - name: Make a cache directory for the container images + run: | + mkdir -p ./singularity_container_images + - name: Download the pipeline env: - NXF_SINGULARITY_CACHEDIR: ./ + NXF_SINGULARITY_CACHEDIR: ./singularity_container_images run: | - nf-core download ${{ env.REPO_LOWERCASE }} \ + nf-core pipelines download ${{ env.REPO_LOWERCASE }} \ --revision ${{ env.REPO_BRANCH }} \ --outdir ./${{ env.REPOTITLE_LOWERCASE }} \ --compress "none" \ --container-system 'singularity' \ - --container-library "quay.io" -l "docker.io" -l "ghcr.io" \ + --container-library "quay.io" -l "docker.io" -l "community.wave.seqera.io" \ --container-cache-utilisation 'amend' \ - --download-configuration + --download-configuration 'yes' - name: Inspect download run: tree ./${{ env.REPOTITLE_LOWERCASE }} - - name: Run the downloaded pipeline + - name: Count the downloaded number of container images + id: count_initial + run: | + image_count=$(ls -1 ./singularity_container_images | wc -l | xargs) + echo "Initial container image count: $image_count" + echo "IMAGE_COUNT_INITIAL=$image_count" >> ${GITHUB_ENV} + + - name: Run the downloaded pipeline (stub) + id: stub_run_pipeline + continue-on-error: true env: - NXF_SINGULARITY_CACHEDIR: ./ + NXF_SINGULARITY_CACHEDIR: ./singularity_container_images NXF_SINGULARITY_HOME_MOUNT: true run: nextflow run ./${{ env.REPOTITLE_LOWERCASE }}/$( sed 's/\W/_/g' <<< ${{ env.REPO_BRANCH }}) -stub -profile test,singularity --outdir ./results + - name: Run the downloaded pipeline (stub run not supported) + id: run_pipeline + if: ${{ steps.stub_run_pipeline.outcome == 'failure' }} + env: + NXF_SINGULARITY_CACHEDIR: ./singularity_container_images + NXF_SINGULARITY_HOME_MOUNT: true + run: nextflow run ./${{ env.REPOTITLE_LOWERCASE }}/$( sed 's/\W/_/g' <<< ${{ env.REPO_BRANCH }}) -profile test,singularity --outdir ./results + + - name: Count the downloaded number of container images + id: count_afterwards + run: | + image_count=$(ls -1 ./singularity_container_images | wc -l | xargs) + echo "Post-pipeline run container image count: $image_count" + echo "IMAGE_COUNT_AFTER=$image_count" >> ${GITHUB_ENV} + + - name: Compare container image counts + run: | + if [ "${{ env.IMAGE_COUNT_INITIAL }}" -ne "${{ env.IMAGE_COUNT_AFTER }}" ]; then + initial_count=${{ env.IMAGE_COUNT_INITIAL }} + final_count=${{ env.IMAGE_COUNT_AFTER }} + difference=$((final_count - initial_count)) + echo "$difference additional 
container images were downloaded at runtime. The pipeline has no support for offline runs!" + tree ./singularity_container_images + exit 1 + else + echo "The pipeline can be downloaded successfully!" + fi diff --git a/.github/workflows/fix-linting.yml b/.github/workflows/fix-linting.yml index 07013265..01392a8f 100644 --- a/.github/workflows/fix-linting.yml +++ b/.github/workflows/fix-linting.yml @@ -13,7 +13,7 @@ jobs: runs-on: ubuntu-latest steps: # Use the @nf-core-bot token to check out so we can push later - - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4 + - uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4 with: token: ${{ secrets.nf_core_bot_auth_token }} @@ -32,9 +32,9 @@ jobs: GITHUB_TOKEN: ${{ secrets.nf_core_bot_auth_token }} # Install and run pre-commit - - uses: actions/setup-python@0a5c61591373683505ea898e09a3ea4f39ef2b9c # v5 + - uses: actions/setup-python@82c7e631bb3cdc910f68e0081d67478d79c6982d # v5 with: - python-version: 3.11 + python-version: "3.12" - name: Install pre-commit run: pip install pre-commit diff --git a/.github/workflows/linting.yml b/.github/workflows/linting.yml index 073e1876..a502573c 100644 --- a/.github/workflows/linting.yml +++ b/.github/workflows/linting.yml @@ -1,6 +1,6 @@ name: nf-core linting # This workflow is triggered on pushes and PRs to the repository. -# It runs the `nf-core lint` and markdown lint tests to ensure +# It runs the `nf-core pipelines lint` and markdown lint tests to ensure # that the code meets the nf-core guidelines. on: push: @@ -14,13 +14,12 @@ jobs: pre-commit: runs-on: ubuntu-latest steps: - - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4 + - uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4 - - name: Set up Python 3.11 - uses: actions/setup-python@0a5c61591373683505ea898e09a3ea4f39ef2b9c # v5 + - name: Set up Python 3.12 + uses: actions/setup-python@82c7e631bb3cdc910f68e0081d67478d79c6982d # v5 with: - python-version: 3.11 - cache: "pip" + python-version: "3.12" - name: Install pre-commit run: pip install pre-commit @@ -32,27 +31,42 @@ jobs: runs-on: ubuntu-latest steps: - name: Check out pipeline code - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4 + uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4 - name: Install Nextflow - uses: nf-core/setup-nextflow@v1 + uses: nf-core/setup-nextflow@v2 - - uses: actions/setup-python@0a5c61591373683505ea898e09a3ea4f39ef2b9c # v5 + - uses: actions/setup-python@82c7e631bb3cdc910f68e0081d67478d79c6982d # v5 with: - python-version: "3.11" + python-version: "3.12" architecture: "x64" + - name: read .nf-core.yml + uses: pietrobolcato/action-read-yaml@1.1.0 + id: read_yml + with: + config: ${{ github.workspace }}/.nf-core.yml + - name: Install dependencies run: | python -m pip install --upgrade pip - pip install nf-core + pip install nf-core==${{ steps.read_yml.outputs['nf_core_version'] }} + + - name: Run nf-core pipelines lint + if: ${{ github.base_ref != 'master' }} + env: + GITHUB_COMMENTS_URL: ${{ github.event.pull_request.comments_url }} + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + GITHUB_PR_COMMIT: ${{ github.event.pull_request.head.sha }} + run: nf-core -l lint_log.txt pipelines lint --dir ${GITHUB_WORKSPACE} --markdown lint_results.md - - name: Run nf-core lint + - name: Run nf-core pipelines lint --release + if: ${{ github.base_ref == 'master' }} env: + GITHUB_COMMENTS_URL: ${{ github.event.pull_request.comments_url }} GITHUB_TOKEN: ${{ 
secrets.GITHUB_TOKEN }} GITHUB_PR_COMMIT: ${{ github.event.pull_request.head.sha }} - run: nf-core -l lint_log.txt lint --dir ${GITHUB_WORKSPACE} --markdown lint_results.md + run: nf-core -l lint_log.txt pipelines lint --release --dir ${GITHUB_WORKSPACE} --markdown lint_results.md - name: Save PR number if: ${{ always() }} @@ -60,7 +74,7 @@ jobs: - name: Upload linting log file artifact if: ${{ always() }} - uses: actions/upload-artifact@5d5d22a31266ced268874388b861e4b58bb5c2f3 # v4 + uses: actions/upload-artifact@65462800fd760344b1a7b4382951275a0abb4808 # v4 with: name: linting-logs path: | diff --git a/.github/workflows/linting_comment.yml b/.github/workflows/linting_comment.yml index b706875f..42e519bf 100644 --- a/.github/workflows/linting_comment.yml +++ b/.github/workflows/linting_comment.yml @@ -11,7 +11,7 @@ jobs: runs-on: ubuntu-latest steps: - name: Download lint results - uses: dawidd6/action-download-artifact@f6b0bace624032e30a85a8fd9c1a7f8f611f5737 # v3 + uses: dawidd6/action-download-artifact@bf251b5aa9c2f7eeb574a96ee720e24f801b7c11 # v6 with: workflow: linting.yml workflow_conclusion: completed diff --git a/.github/workflows/release-announcements.yml b/.github/workflows/release-announcements.yml index d468aeaa..c6ba35df 100644 --- a/.github/workflows/release-announcements.yml +++ b/.github/workflows/release-announcements.yml @@ -12,7 +12,7 @@ jobs: - name: get topics and convert to hashtags id: get_topics run: | - curl -s https://nf-co.re/pipelines.json | jq -r '.remote_workflows[] | select(.full_name == "${{ github.repository }}") | .topics[]' | awk '{print "#"$0}' | tr '\n' ' ' >> $GITHUB_OUTPUT + echo "topics=$(curl -s https://nf-co.re/pipelines.json | jq -r '.remote_workflows[] | select(.full_name == "${{ github.repository }}") | .topics[]' | awk '{print "#"$0}' | tr '\n' ' ')" | sed 's/-//g' >> $GITHUB_OUTPUT - uses: rzr/fediverse-action@master with: @@ -25,13 +25,13 @@ jobs: Please see the changelog: ${{ github.event.release.html_url }} - ${{ steps.get_topics.outputs.GITHUB_OUTPUT }} #nfcore #openscience #nextflow #bioinformatics + ${{ steps.get_topics.outputs.topics }} #nfcore #openscience #nextflow #bioinformatics send-tweet: runs-on: ubuntu-latest steps: - - uses: actions/setup-python@0a5c61591373683505ea898e09a3ea4f39ef2b9c # v5 + - uses: actions/setup-python@82c7e631bb3cdc910f68e0081d67478d79c6982d # v5 with: python-version: "3.10" - name: Install dependencies diff --git a/.github/workflows/template_version_comment.yml b/.github/workflows/template_version_comment.yml new file mode 100644 index 00000000..e8aafe44 --- /dev/null +++ b/.github/workflows/template_version_comment.yml @@ -0,0 +1,46 @@ +name: nf-core template version comment +# This workflow is triggered on PRs to check if the pipeline template version matches the latest nf-core version. +# It posts a comment to the PR, even if it comes from a fork. 
+ +on: pull_request_target + +jobs: + template_version: + runs-on: ubuntu-latest + steps: + - name: Check out pipeline code + uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4 + with: + ref: ${{ github.event.pull_request.head.sha }} + + - name: Read template version from .nf-core.yml + uses: nichmor/minimal-read-yaml@v0.0.2 + id: read_yml + with: + config: ${{ github.workspace }}/.nf-core.yml + + - name: Install nf-core + run: | + python -m pip install --upgrade pip + pip install nf-core==${{ steps.read_yml.outputs['nf_core_version'] }} + + - name: Check nf-core outdated + id: nf_core_outdated + run: echo "OUTPUT=$(pip list --outdated | grep nf-core)" >> ${GITHUB_ENV} + + - name: Post nf-core template version comment + uses: mshick/add-pr-comment@b8f338c590a895d50bcbfa6c5859251edc8952fc # v2 + if: | + contains(env.OUTPUT, 'nf-core') + with: + repo-token: ${{ secrets.NF_CORE_BOT_AUTH_TOKEN }} + allow-repeats: false + message: | + > [!WARNING] + > Newer version of the nf-core template is available. + > + > Your pipeline is using an old version of the nf-core template: ${{ steps.read_yml.outputs['nf_core_version'] }}. + > Please update your pipeline to the latest version. + > + > For more documentation on how to update your pipeline, please see the [nf-core documentation](https://github.com/nf-core/tools?tab=readme-ov-file#sync-a-pipeline-with-the-template) and [Synchronisation documentation](https://nf-co.re/docs/contributing/sync). + # diff --git a/.gitignore b/.gitignore index 5124c9ac..a42ce016 100644 --- a/.gitignore +++ b/.gitignore @@ -6,3 +6,4 @@ results/ testing/ testing* *.pyc +null/ diff --git a/.gitpod.yml b/.gitpod.yml index 105a1821..46118637 100644 --- a/.gitpod.yml +++ b/.gitpod.yml @@ -4,17 +4,14 @@ tasks: command: | pre-commit install --install-hooks nextflow self-update - - name: unset JAVA_TOOL_OPTIONS - command: | - unset JAVA_TOOL_OPTIONS vscode: extensions: # based on nf-core.nf-core-extensionpack - - esbenp.prettier-vscode # Markdown/CommonMark linting and style checking for Visual Studio Code + #- esbenp.prettier-vscode # Markdown/CommonMark linting and style checking for Visual Studio Code - EditorConfig.EditorConfig # override user/workspace settings with settings found in .editorconfig files - Gruntfuggly.todo-tree # Display TODO and FIXME in a tree view in the activity bar - mechatroner.rainbow-csv # Highlight columns in csv files in different colors - # - nextflow.nextflow # Nextflow syntax highlighting + - nextflow.nextflow # Nextflow syntax highlighting - oderwat.indent-rainbow # Highlight indentation level - streetsidesoftware.code-spell-checker # Spelling checker for source code - charliermarsh.ruff # Code linter Ruff diff --git a/.nf-core.yml b/.nf-core.yml index d2cda970..29a057fb 100644 --- a/.nf-core.yml +++ b/.nf-core.yml @@ -1,4 +1,3 @@ -repository_type: pipeline lint: nextflow_config: - config_defaults: @@ -6,3 +5,21 @@ lint: - params.css_file - params.citations_file - params.report_file + - params.tools + multiqc_config: False + files_exist: + - assets/multiqc_config.yml +nf_core_version: 3.0.2 +repository_type: pipeline +template: + author: Oskar Wacker, Jonathan Manning + description: Differential abundance analysis + force: false + is_nfcore: true + name: differentialabundance + org: nf-core + outdir: . 
+ skip_features: + - fastqc + - multiqc + version: 1.6.0dev diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index af57081f..9e9f0e1c 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -3,8 +3,11 @@ repos: rev: "v3.1.0" hooks: - id: prettier + additional_dependencies: + - prettier@3.2.5 + - repo: https://github.com/editorconfig-checker/editorconfig-checker.python - rev: "2.7.3" + rev: "3.0.3" hooks: - id: editorconfig-checker alias: ec diff --git a/CHANGELOG.md b/CHANGELOG.md index 3d844a1e..c3e2904f 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -3,10 +3,25 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). -## v1.5.0 +## v1.6.0dev - xxxx-xx-xx + +### Added + +### Fixed + +- [[#299](https://github.com/nf-core/differentialabundance/pull/299)] - Add exclusions for 3.0.1 template update ([@pinin4fjords](https://github.com/pinin4fjords)) +- [[#280](https://github.com/nf-core/differentialabundance/pull/280)] - Bump shinyngs, fix contrasts passed to app creation ([@pinin4fjords](https://github.com/pinin4fjords), review by [@WackerO](https://github.com/WackerO)) +- [[#274](https://github.com/nf-core/differentialabundance/pull/274)] - Fix pagination on samples table ([@pinin4fjords](https://github.com/pinin4fjords), review by [@WackerO](https://github.com/WackerO)) +- [[#272](https://github.com/nf-core/differentialabundance/pull/272)] - Show >10 contrasts in report ([@pinin4fjords](https://github.com/pinin4fjords), review by [@WackerO](https://github.com/WackerO)) +- [[#282](https://github.com/nf-core/differentialabundance/pull/282)] - In order to improve resumability, create a copy of the matrix as an annotation file only if necessary; add original matrix file name to copy name ([@bjlang](https://github.com/bjlang), review by [@WackerO](https://github.com/WackerO)) + +### Changed + +## v1.5.0 - 2024-05-08 ### `Added` +- [[#273](https://github.com/nf-core/differentialabundance/pull/273)] - Template update for nf-core/tools v2.14.1 ([@WackerO](https://github.com/WackerO), review by [@pinin4fjords](https://github.com/pinin4fjords)) - [[#266](https://github.com/nf-core/differentialabundance/pull/266)] - Fix logging by specifying assays to log ([@pinin4fjords](https://github.com/pinin4fjords), review by [@WackerO](https://github.com/WackerO)) - [[#259](https://github.com/nf-core/differentialabundance/pull/259)] - Bump gtf2featureannotation to fix GTF handling error ([@pinin4fjords](https://github.com/pinin4fjords), review by [@WackerO](https://github.com/WackerO)) - [[#257](https://github.com/nf-core/differentialabundance/pull/257)] - Added maxquant profile to nextflow.config to make it available ([@WackerO](https://github.com/WackerO), review by [@pinin4fjords](https://github.com/pinin4fjords)) @@ -21,6 +36,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### `Fixed` +- [[#278](https://github.com/nf-core/differentialabundance/pull/278)] - Fix missing ch_gene_sets when running gprofiler without gene sets ([@WackerO](https://github.com/WackerO), review by [@pinin4fjords](https://github.com/pinin4fjords)) - [[#267](https://github.com/nf-core/differentialabundance/pull/267)] - Whitespace fix, remove TODO, also update changelog for release release 1.5.0 ([@WackerO](https://github.com/WackerO), review by [@pinin4fjords](https://github.com/pinin4fjords)) - 
[[#265](https://github.com/nf-core/differentialabundance/pull/265)] - GSEA- pngs and htmls in same place ([@pinin4fjords](https://github.com/pinin4fjords), review by [@WackerO](https://github.com/WackerO)) - [[#257](https://github.com/nf-core/differentialabundance/pull/257)] - Fixed FILTER_DIFFTABLE module, updated PROTEUS module to better handle whitespace in prefix param, made docs clearer ([@WackerO](https://github.com/WackerO), review by [@pinin4fjords](https://github.com/pinin4fjords)) diff --git a/README.md b/README.md index adaee862..458464fe 100644 --- a/README.md +++ b/README.md @@ -9,11 +9,11 @@ [![GitHub Actions Linting Status](https://github.com/nf-core/differentialabundance/actions/workflows/linting.yml/badge.svg)](https://github.com/nf-core/differentialabundance/actions/workflows/linting.yml)[![AWS CI](https://img.shields.io/badge/CI%20tests-full%20size-FF9900?labelColor=000000&logo=Amazon%20AWS)](https://nf-co.re/differentialabundance/results)[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.7568000-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.7568000) [![nf-test](https://img.shields.io/badge/unit_tests-nf--test-337ab7.svg)](https://www.nf-test.com) -[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A523.10.0-23aa62.svg)](https://www.nextflow.io/) +[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A524.04.2-23aa62.svg)](https://www.nextflow.io/) [![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/) [![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/) [![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/) -[![Launch on Seqera Platform](https://img.shields.io/badge/Launch%20%F0%9F%9A%80-Seqera%20Platform-%234256e7)](https://tower.nf/launch?pipeline=https://github.com/nf-core/differentialabundance) +[![Launch on Seqera Platform](https://img.shields.io/badge/Launch%20%F0%9F%9A%80-Seqera%20Platform-%234256e7)](https://cloud.seqera.io/launch?pipeline=https://github.com/nf-core/differentialabundance) [![Get help on Slack](http://img.shields.io/badge/slack-nf--core%20%23differentialabundance-4A154B?labelColor=000000&logo=slack)](https://nfcore.slack.com/channels/differentialabundance)[![Follow on Twitter](http://img.shields.io/badge/twitter-%40nf__core-1DA1F2?labelColor=000000&logo=twitter)](https://twitter.com/nf_core)[![Follow on Mastodon](https://img.shields.io/badge/mastodon-nf__core-6364ff?labelColor=FFFFFF&logo=mastodon)](https://mstdn.science/@nf_core)[![Watch on YouTube](http://img.shields.io/badge/youtube-nf--core-FF0000?labelColor=000000&logo=youtube)](https://www.youtube.com/c/nf-core) @@ -75,8 +75,7 @@ Affymetrix microarray: ``` > [!WARNING] -> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; -> see [docs](https://nf-co.re/usage/configuration#custom-configuration-files). +> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; see [docs](https://nf-co.re/docs/usage/getting_started/configuration#custom-configuration-files). 
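To make the distinction in the warning above concrete, here is a hypothetical invocation (the file name `institutional.config` is invented for illustration): parameters travel via `--` flags or `-params-file`, while `-c` carries only non-parameter configuration.

```bash
# Parameters: CLI flags or -params-file (YAML/JSON).
# Non-parameter configuration (executor, resources, containers): -c.
nextflow run nf-core/differentialabundance \
    -profile docker \
    -params-file params.yaml \
    -c institutional.config
```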
For more details and further functionality, please refer to the [usage documentation](https://nf-co.re/differentialabundance/usage) and the [parameter documentation](https://nf-co.re/differentialabundance/parameters). diff --git a/assets/differentialabundance_report.Rmd b/assets/differentialabundance_report.Rmd index 7bf41c12..b13d321c 100644 --- a/assets/differentialabundance_report.Rmd +++ b/assets/differentialabundance_report.Rmd @@ -487,7 +487,7 @@ display_columns <- head(union(display_columns, additional_useful_cols), 5) display_columns <- unique(c(display_columns, informative_variables)) observations_to_print <- observations[,unique(display_columns)] colnames(observations_to_print) <- prettifyVariablename(colnames(observations_to_print)) -print( htmltools::tagList(datatable(observations_to_print, caption = paste(ucfirst(params$observations_type), 'metadata'), rownames = FALSE, options = list(dom = 'tb')) )) +print( htmltools::tagList(datatable(observations_to_print, caption = paste(ucfirst(params$observations_type), 'metadata'), rownames = FALSE, options = list(dom = 'tp')) )) ``` ## Contrasts @@ -510,7 +510,7 @@ contrasts_to_print$model <- sapply(contrasts_to_print$Id, function(id) { } }) -print( htmltools::tagList(datatable(contrasts_to_print, caption = paste0("Table of contrasts"), rownames = FALSE, options = list(dom = 't')) )) +print( htmltools::tagList(datatable(contrasts_to_print, caption = paste0("Table of contrasts"), rownames = FALSE, options = list(dom = ifelse(nrow(contrasts_to_print) > 10, 'tp', 't'))) )) ``` # Results diff --git a/assets/methods_description_template.yml b/assets/methods_description_template.yml deleted file mode 100644 index 198410ef..00000000 --- a/assets/methods_description_template.yml +++ /dev/null @@ -1,28 +0,0 @@ -id: "nf-core-differentialabundance-methods-description" -description: "Suggested text and references to use when describing pipeline usage within the methods section of a publication." -section_name: "nf-core/differentialabundance Methods Description" -section_href: "https://github.com/nf-core/differentialabundance" -plot_type: "html" -## You inject any metadata in the Nextflow '${workflow}' object -data: | -

-  <h4>Methods</h4>
-  <p>Data was processed using nf-core/differentialabundance v${workflow.manifest.version} ${doi_text} of the nf-core collection of workflows (<a href="https://doi.org/10.1038/s41587-020-0439-x">Ewels <em>et al.</em>, 2020</a>), utilising reproducible software environments from the Bioconda (<a href="https://doi.org/10.1038/s41592-018-0046-7">Grüning <em>et al.</em>, 2018</a>) and Biocontainers (<a href="https://doi.org/10.1093/bioinformatics/btx192">da Veiga Leprevost <em>et al.</em>, 2017</a>) projects.</p>
-  <p>The pipeline was executed with Nextflow v${workflow.nextflow.version} (<a href="https://doi.org/10.1038/nbt.3820">Di Tommaso <em>et al.</em>, 2017</a>) with the following command:</p>
-  <pre><code>${workflow.commandLine}</code></pre>
-  <p>${tool_citations}</p>
-  <h4>References</h4>
-  <ul>
-    <li>Di Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., & Notredame, C. (2017). Nextflow enables reproducible computational workflows. Nature Biotechnology, 35(4), 316-319. doi: <a href="https://doi.org/10.1038/nbt.3820">10.1038/nbt.3820</a></li>
-    <li>Ewels, P. A., Peltzer, A., Fillinger, S., Patel, H., Alneberg, J., Wilm, A., Garcia, M. U., Di Tommaso, P., & Nahnsen, S. (2020). The nf-core framework for community-curated bioinformatics pipelines. Nature Biotechnology, 38(3), 276-278. doi: <a href="https://doi.org/10.1038/s41587-020-0439-x">10.1038/s41587-020-0439-x</a></li>
-    <li>Grüning, B., Dale, R., Sjödin, A., Chapman, B. A., Rowe, J., Tomkins-Tinch, C. H., Valieris, R., Köster, J., & Bioconda Team. (2018). Bioconda: sustainable and comprehensive software distribution for the life sciences. Nature Methods, 15(7), 475–476. doi: <a href="https://doi.org/10.1038/s41592-018-0046-7">10.1038/s41592-018-0046-7</a></li>
-    <li>da Veiga Leprevost, F., Grüning, B. A., Alves Aflitos, S., Röst, H. L., Uszkoreit, J., Barsnes, H., Vaudel, M., Moreno, P., Gatto, L., Weber, J., Bai, M., Jimenez, R. C., Sachsenberg, T., Pfeuffer, J., Vera Alvarez, R., Griss, J., Nesvizhskii, A. I., & Perez-Riverol, Y. (2017). BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics (Oxford, England), 33(16), 2580–2582. doi: <a href="https://doi.org/10.1093/bioinformatics/btx192">10.1093/bioinformatics/btx192</a></li>
-    ${tool_bibliography}
-  </ul>
-  <div class="alert alert-info">
-    <h5>Notes:</h5>
-    <ul>
-      ${nodoi_text}
-      <li>The command above does not include parameters contained in any configs or profiles that may have been used. Ensure the config file is also uploaded with your publication!</li>
-      <li>You should also cite all software used within this run. Check the "Software Versions" of this report to get version information.</li>
-    </ul>
-  </div>
diff --git a/assets/multiqc_config.yml b/assets/multiqc_config.yml deleted file mode 100644 index bc8bed92..00000000 --- a/assets/multiqc_config.yml +++ /dev/null @@ -1,13 +0,0 @@ -report_comment: > - This report has been generated by the nf-core/differentialabundance analysis pipeline. For information about how to interpret these results, please see the documentation. -report_section_order: - "nf-core-differentialabundance-methods-description": - order: -1000 - software_versions: - order: -1001 - "nf-core-differentialabundance-summary": - order: -1002 - -export_plots: true - -disable_version_detection: true diff --git a/assets/schema_input.json b/assets/schema_input.json index 6a90aaeb..7ecfbaea 100644 --- a/assets/schema_input.json +++ b/assets/schema_input.json @@ -1,5 +1,5 @@ { - "$schema": "http://json-schema.org/draft-07/schema", + "$schema": "https://json-schema.org/draft/2020-12/schema", "$id": "https://raw.githubusercontent.com/nf-core/differentialabundance/master/assets/schema_input.json", "title": "nf-core/differentialabundance pipeline - params.input schema", "description": "Schema for the file provided with params.input", diff --git a/assets/schema_tools.json b/assets/schema_tools.json new file mode 100644 index 00000000..dcbbe9f8 --- /dev/null +++ b/assets/schema_tools.json @@ -0,0 +1,45 @@ +{ + "$schema": "https://json-schema.org/draft/2020-12/schema", + "title": "nf-core/differentialabundance - params.tools schema", + "description": "Schema for the file provided with params.tools", + "type": "array", + "items": { + "type": "object", + "properties": { + "pathway_name": { + "type": "string", + "meta": ["pathway_name"] + }, + "diff_method": { + "type": "string", + "errorMessage": "choose a differential analysis method (eg. deseq2, limma, propd) or none", + "meta": ["diff_method"], + "enum": ["propd", "deseq2", "limma", "none"] + }, + "args_diff": { + "type": "string", + "meta": ["args_diff"] + }, + "cor_method": { + "type": "string", + "meta": ["cor_method"], + "errorMessage": "choose a correlation method (eg. propr) or none", + "enum": ["propr", "none"] + }, + "args_cor": { + "type": "string", + "meta": ["args_cor"] + }, + "enr_method": { + "type": "string", + "meta": ["enr_method"], + "errorMessage": "choose a functional enrichment analysis method (eg. 
gsea, grea, gprofiler, etc) or none" + }, + "args_enr": { + "type": "string", + "meta": ["args_enr"] + } + }, + "required": [] + } +} diff --git a/assets/tools_samplesheet.csv b/assets/tools_samplesheet.csv new file mode 100644 index 00000000..ad8dfab4 --- /dev/null +++ b/assets/tools_samplesheet.csv @@ -0,0 +1,13 @@ +pathway_name,diff_method,args_diff,cor_method,args_cor,enr_method,args_enr +deseq2,deseq2,,,,, +deseq2_gsea,deseq2,,,,gsea, +deseq2_ora,deseq2,,,,gprofiler2, +limma,limma,,,,, +limma_ora,limma,,,,gprofiler2, +propd,propd,,,,, +propd_fdr,propd,--permutation 100,,,, +propd_grea,propd,,,,grea, +propd_ora,propd,,,,gprofiler2, +pcorbshrink,,,propr,--metric pcor.bshrink,, +propr,,,propr,--metric rho,, +cor,,,propr,--metric cor,, diff --git a/conf/base.config b/conf/base.config index 99e804ea..6bfd4e27 100644 --- a/conf/base.config +++ b/conf/base.config @@ -10,9 +10,10 @@ process { - cpus = { check_max( 1 * task.attempt, 'cpus' ) } - memory = { check_max( 6.GB * task.attempt, 'memory' ) } - time = { check_max( 4.h * task.attempt, 'time' ) } + // TODO nf-core: Check the defaults for all processes + cpus = { 1 * task.attempt } + memory = { 6.GB * task.attempt } + time = { 4.h * task.attempt } errorStrategy = { task.exitStatus in ((130..145) + 104) ? 'retry' : 'finish' } maxRetries = 1 @@ -25,30 +26,30 @@ process { // adding in your local modules too. // See https://www.nextflow.io/docs/latest/config.html#config-process-selectors withLabel:process_single { - cpus = { check_max( 1 , 'cpus' ) } - memory = { check_max( 6.GB * task.attempt, 'memory' ) } - time = { check_max( 4.h * task.attempt, 'time' ) } + cpus = { 1 } + memory = { 6.GB * task.attempt } + time = { 4.h * task.attempt } } withLabel:process_low { - cpus = { check_max( 2 * task.attempt, 'cpus' ) } - memory = { check_max( 12.GB * task.attempt, 'memory' ) } - time = { check_max( 4.h * task.attempt, 'time' ) } + cpus = { 2 * task.attempt } + memory = { 12.GB * task.attempt } + time = { 4.h * task.attempt } } withLabel:process_medium { - cpus = { check_max( 6 * task.attempt, 'cpus' ) } - memory = { check_max( 36.GB * task.attempt, 'memory' ) } - time = { check_max( 8.h * task.attempt, 'time' ) } + cpus = { 6 * task.attempt } + memory = { 36.GB * task.attempt } + time = { 8.h * task.attempt } } withLabel:process_high { - cpus = { check_max( 12 * task.attempt, 'cpus' ) } - memory = { check_max( 72.GB * task.attempt, 'memory' ) } - time = { check_max( 16.h * task.attempt, 'time' ) } + cpus = { 12 * task.attempt } + memory = { 72.GB * task.attempt } + time = { 16.h * task.attempt } } withLabel:process_long { - time = { check_max( 20.h * task.attempt, 'time' ) } + time = { 20.h * task.attempt } } withLabel:process_high_memory { - memory = { check_max( 200.GB * task.attempt, 'memory' ) } + memory = { 200.GB * task.attempt } } withLabel:error_ignore { errorStrategy = 'ignore' diff --git a/conf/igenomes_ignored.config b/conf/igenomes_ignored.config new file mode 100644 index 00000000..b4034d82 --- /dev/null +++ b/conf/igenomes_ignored.config @@ -0,0 +1,9 @@ +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Nextflow config file for iGenomes paths +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Empty genomes dictionary to use when igenomes is ignored. 
+---------------------------------------------------------------------------------------- +*/ + +params.genomes = [:] diff --git a/conf/modules.config b/conf/modules.config index 9313457d..1bb43755 100644 --- a/conf/modules.config +++ b/conf/modules.config @@ -367,7 +367,7 @@ process { path: { "${params.outdir}/plots/exploratory" }, mode: params.publish_dir_mode, ] - memory = { check_max( 12.GB * task.attempt, 'memory' ) } + memory = { 12.GB * task.attempt } ext.args = { [ "--sample_id_col \"${params.observations_id_col}\"", "--feature_id_col \"${params.features_id_col}\"", @@ -384,7 +384,7 @@ process { path: { "${params.outdir}/plots/differential" }, mode: params.publish_dir_mode, ] - memory = { check_max( 12.GB * task.attempt, 'memory' ) } + memory = { 12.GB * task.attempt } ext.args = { [ "--feature_id_col \"${params.features_id_col}\"", "--reference_level \"$meta.reference\"", @@ -406,7 +406,7 @@ process { path: { "${params.outdir}/shinyngs_app" }, mode: params.publish_dir_mode, ] - memory = { check_max( 12.GB * task.attempt, 'memory' ) } + memory = { 12.GB * task.attempt } ext.args = { [ "--assay_names \"${params.exploratory_assay_names}\"", "--sample_id_col \"${params.observations_id_col}\"", @@ -420,7 +420,7 @@ process { ((params.report_title == null) ? '' : "--title \"$params.report_title\""), ((params.report_author == null) ? '' : "--author \"$params.report_author\""), ((params.report_description == null) ? '' : "--description \"$params.report_description\""), - ((params.shinyngs_guess_unlog_matrices) ? "--guess_unlog_matrices" : ''), + ( (params.study_type == 'maxquant') ? "--log2_assays ''" : (((params.exploratory_log2_assays == null) ? '' : "--log2_assays \"$params.exploratory_log2_assays\"".replace('[', '').replace(']', ''))) ), ((params.shinyngs_deploy_to_shinyapps_io) ? "--deploy_app" : ''), ((params.shinyngs_shinyapps_account == null) ? '' : "--shinyapps_account \"$params.shinyngs_shinyapps_account\""), ((params.shinyngs_shinyapps_app_name == null) ? '' : "--shinyapps_name \"$params.shinyngs_shinyapps_app_name\"") @@ -481,4 +481,88 @@ process { enabled: false ] } + + // TODO + // for the moment, the parameters from toolsheet are parsed here. + // For the repeated arguments, since the R argument parser will only take the first one, + // here I add the toolsheet arguments at the beginning of ext.args to make sure they are + // taken into account instead of the default options. + // However, later we need to better handle this, maybe by a bit of groovy scripting to + // overwrite the repeated parameters (?) + + withName: "PROPR"{ + tag = { [ + "$meta.id", + (meta.args_cor ? "args_cor = $meta.args_cor" : '') + ].join(',').trim() } + ext.args = { [ + (meta.args_cor ? "${meta.args_cor}" : ''), + "--features_id_col ${params.features_id_col}", + "--metric ${params.propr_metric}", + "--ivar ${params.propr_ivar}", + (params.propr_alpha ? "--alpha ${params.propr_alpha}" : ''), + "--fdr ${params.propr_fdr}", + "--tails ${params.propr_tails}", + "--permutation ${params.propr_permutation}", + "--number_of_cutoffs ${params.propr_ncutoffs}" + ].join(' ').trim() } + publishDir = [ + path: { "${params.outdir}/correlation_analysis/propr-${meta.pathway_name ?: params.pathway}" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + } + + withName: "PROPD"{ + tag = { [ + "$meta.id", + "$meta.contrast", + (meta.args_diff ? "args_diff = $meta.args_diff" : '') + ].join(',').trim() } + ext.args = { [ + (meta.args_diff ? 
"${meta.args_diff}" : ''), + "--features_id_col ${params.features_id_col}", + "--obs_id_col ${params.observations_id_col}", + (params.propd_alpha ? "--alpha ${params.propd_alpha}" : ''), + "--moderated ${params.propd_moderated}", + "--fdr ${params.propd_fdr}", + "--permutation ${params.propd_permutation}", + "--number_of_cutoffs ${params.propd_ncutoffs}", + "--weighted_degree ${params.propd_weighted_degree}" + ].join(' ').trim() } + publishDir = [ + path: { "${params.outdir}/differential_analysis/propr-${meta.pathway_name ?: params.pathway}" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + } + + withName: "GREA"{ + tag = { [ + "$meta.id", + (meta.args_enr ? "args_enr = $meta.args_enr" : '') + ].join(',').trim() } + ext.args = { [ + (meta.args_enr ? "${meta.args_enr}" : ''), + "--set_min ${params.grea_set_min}", + "--set_max ${params.grea_set_max}", + "--permutation ${params.grea_permutation}" + ].join(' ').trim() } + publishDir = [ + path: { "${params.outdir}/enrichment_analysis/propr-${meta.pathway_name ?: params.pathway}" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + } + withName: "MYGENE" { + ext.args = [ + "--columname ${params.features_id_col}" + ].join(' ').trim() + publishDir = [ + path: "${params.outdir}/enrichment_analysis/mygene", + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + } + } diff --git a/conf/test.config b/conf/test.config index 4d4e9454..4ce34bf3 100644 --- a/conf/test.config +++ b/conf/test.config @@ -10,18 +10,19 @@ ---------------------------------------------------------------------------------------- */ -includeConfig 'rnaseq.config' +process { + resourceLimits = [ + cpus: 4, + memory: '15.GB', + time: '1.h' + ] +} params { study_name = 'SRP254919' config_profile_name = 'Test profile' config_profile_description = 'Minimal test dataset to check pipeline function' - // Limit resources so that this can run on GitHub Actions - max_cpus = 2 - max_memory = '6.GB' - max_time = '6.h' - // Input data input = 'https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/mus_musculus/rnaseq_expression/SRP254919.samplesheet.csv' diff --git a/conf/test_experimental.config b/conf/test_experimental.config new file mode 100644 index 00000000..d18e2627 --- /dev/null +++ b/conf/test_experimental.config @@ -0,0 +1,55 @@ +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Nextflow config file for running minimal tests without a GTF +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Defines input files and everything required to run a fast and simple + pipeline test of the CoDA experimental mode. 
+ + Use as follows: + nextflow run nf-core/differentialabundance -profile test_experimental,<docker/singularity> --outdir <OUTDIR> + +---------------------------------------------------------------------------------------- +*/ + +process { + resourceLimits = [ + cpus: 4, + memory: '15.GB', + time: '1.h' + ] +} + +params { + study_name = 'SRP254919' + study_type = 'experimental' + config_profile_name = 'Test profile' + config_profile_description = 'Minimal test dataset to check pipeline function' + + // Input data + input = 'https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/mus_musculus/rnaseq_expression/SRP254919.samplesheet.csv' + matrix = 'https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/mus_musculus/rnaseq_expression/SRP254919.salmon.merged.gene_counts.top1000cov.tsv' + contrasts = 'https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/mus_musculus/rnaseq_expression/SRP254919.contrasts.csv' + + // tool combinations to test + pathway = "deseq2,deseq2_ora,limma,limma_ora,propd,propd_grea,propd_ora,propr,cor" + + // some pathways are not included in the test because they don't make sense for this dataset: + // for example, propd_fdr doesn't give significant results on this dataset, so the permutation step would be skipped, + // and pcor.bshrink takes too long for this test (~15 min); it should be tested on matrices with n > p, or n close to p (e.g. specific pathways, or on DE genes only) + + //Features + features_metadata_cols = 'gene_id,gene_name' + + // Observations + observations_id_col = 'sample' + observations_name_col = 'sample' + + // Apply a higher filter to check that the filtering works + filtering_min_abundance = 10 + + // Exploratory + exploratory_main_variable = 'contrasts' + + // deseq2 specific + deseq2_vst_nsub = 900 // only 982 features in this dataset, so nsub=1000 not feasible +} diff --git a/docs/usage.md b/docs/usage.md index d474d611..59510554 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -368,9 +368,9 @@ The above pipeline run specified with a params file in yaml format: nextflow run nf-core/differentialabundance -profile docker -params-file params.yaml ``` -with `params.yaml` containing: +with: -```yaml +```yaml title="params.yaml" input: './samplesheet.csv' outdir: './results/' genome: 'GRCh37' <...> ``` @@ -393,7 +393,7 @@ It is a good idea to specify a pipeline version when running the pipeline on you First, go to the [nf-core/differentialabundance releases page](https://github.com/nf-core/differentialabundance/releases) and find the latest pipeline version - numeric only (eg. `1.3.1`). Then specify this when running the pipeline with `-r` (one hyphen) - eg. `-r 1.3.1`. Of course, you can switch to another version by changing the number after the `-r` flag. -This version number will be logged in reports when you run the pipeline, so that you'll know what you used when you look back in the future. For example, at the bottom of the MultiQC reports. +This version number will be logged in reports when you run the pipeline, so that you'll know what you used when you look back in the future. To further assist in reproducibility, you can share and re-use [parameter files](#running-the-pipeline) to repeat pipeline runs with the same settings without having to write out a command with every single parameter. 
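Putting the two reproducibility aids together, a sketch of a pinned, parameter-file-driven run (`1.3.1` is the example version used above, not necessarily the latest release):

```bash
# Pin the pipeline revision with -r and re-use a shared parameter file, so the
# same run can be reproduced later from the version number and the YAML alone.
nextflow run nf-core/differentialabundance -r 1.3.1 -profile docker -params-file params.yaml
```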
@@ -439,6 +439,8 @@ If `-profile` is not specified, the pipeline will run locally and expect all sof - A generic configuration profile to be used with [Charliecloud](https://hpc.github.io/charliecloud/) - `apptainer` - A generic configuration profile to be used with [Apptainer](https://apptainer.org/) +- `wave` + - A generic configuration profile to enable [Wave](https://seqera.io/wave/) containers. Use together with one of the above (requires Nextflow `24.03.0-edge` or later). - `conda` - A generic configuration profile to be used with [Conda](https://conda.io/docs/). Please only use Conda as a last resort i.e. when it's not possible to run the pipeline with Docker, Singularity, Podman, Shifter, Charliecloud, or Apptainer. @@ -506,14 +508,6 @@ See the main [Nextflow documentation](https://www.nextflow.io/docs/latest/config If you have any questions or issues please send us a message on [Slack](https://nf-co.re/join/slack) on the [`#configs` channel](https://nfcore.slack.com/channels/configs). -## Azure Resource Requests - -To be used with the `azurebatch` profile by specifying the `-profile azurebatch`. -We recommend providing a compute `params.vm_type` of `Standard_D16_v3` VMs by default but these options can be changed if required. - -Note that the choice of VM size depends on your quota and the overall workload during the analysis. -For a thorough list, please refer the [Azure Sizes for virtual machines in Azure](https://docs.microsoft.com/en-us/azure/virtual-machines/sizes). - ## Running in the background Nextflow handles job submissions and supervises the running jobs. The Nextflow process must run until the pipeline is finished. diff --git a/main.nf b/main.nf index 8d350dad..7f48b900 100644 --- a/main.nf +++ b/main.nf @@ -9,8 +9,6 @@ ---------------------------------------------------------------------------------------- */ -nextflow.enable.dsl = 2 - /* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ IMPORT FUNCTIONS / MODULES / SUBWORKFLOWS / WORKFLOWS @@ -20,7 +18,6 @@ nextflow.enable.dsl = 2 include { DIFFERENTIALABUNDANCE } from './workflows/differentialabundance' include { PIPELINE_INITIALISATION } from './subworkflows/local/utils_nfcore_differentialabundance_pipeline' include { PIPELINE_COMPLETION } from './subworkflows/local/utils_nfcore_differentialabundance_pipeline' - include { getGenomeAttribute } from './subworkflows/local/utils_nfcore_differentialabundance_pipeline' /* @@ -41,6 +38,12 @@ params.gtf = getGenomeAttribute('gtf') // WORKFLOW: Run main analysis pipeline depending on type of input // workflow NFCORE_DIFFERENTIALABUNDANCE { + + main: + + // + // WORKFLOW: Run pipeline + // DIFFERENTIALABUNDANCE () } /* @@ -51,12 +54,13 @@ workflow NFCORE_DIFFERENTIALABUNDANCE { workflow { + main: + // // SUBWORKFLOW: Run initialisation tasks // PIPELINE_INITIALISATION ( params.version, - params.help, params.validate_params, params.monochrome_logs, args, @@ -67,8 +71,20 @@ workflow { // // WORKFLOW: Run main workflow // + NFCORE_DIFFERENTIALABUNDANCE () + // + // SUBWORKFLOW: Run completion tasks + // + PIPELINE_COMPLETION ( + params.email, + params.email_on_fail, + params.plaintext_email, + params.outdir, + params.monochrome_logs, + params.hook_url, + ) } /* diff --git a/modules.json b/modules.json index 6362693e..131ea190 100644 --- a/modules.json +++ b/modules.json @@ -17,7 +17,7 @@ }, "custom/matrixfilter": { "branch": "master", - "git_sha": "f4ae1d942bd50c5c0b9bd2de1393ce38315ba57c", + "git_sha": 
"285a50500f9e02578d90b3ce6382ea3c30216acd", "installed_by": ["modules"] }, "custom/tabulartogseacls": { @@ -60,6 +60,11 @@ "git_sha": "9326d73af3fbe2ee90d9ce0a737461a727c5118e", "installed_by": ["modules"] }, + "mygene": { + "branch": "master", + "git_sha": "82024cf6325d2ee194e7f056d841ecad2f6856e9", + "installed_by": ["modules"] + }, "proteus/readproteingroups": { "branch": "master", "git_sha": "a069b29783583c219c1f23ed3dcf64a5aee1340b", @@ -72,22 +77,22 @@ }, "shinyngs/app": { "branch": "master", - "git_sha": "85519fe9deccf2c5f7ff1f3b5d3494c61a794643", + "git_sha": "91fc36585a50f9bae98cb5b3dff36ce64c83a6b4", "installed_by": ["modules"] }, "shinyngs/staticdifferential": { "branch": "master", - "git_sha": "85519fe9deccf2c5f7ff1f3b5d3494c61a794643", + "git_sha": "91fc36585a50f9bae98cb5b3dff36ce64c83a6b4", "installed_by": ["modules"] }, "shinyngs/staticexploratory": { "branch": "master", - "git_sha": "85519fe9deccf2c5f7ff1f3b5d3494c61a794643", + "git_sha": "91fc36585a50f9bae98cb5b3dff36ce64c83a6b4", "installed_by": ["modules"] }, "shinyngs/validatefomcomponents": { "branch": "master", - "git_sha": "85519fe9deccf2c5f7ff1f3b5d3494c61a794643", + "git_sha": "91fc36585a50f9bae98cb5b3dff36ce64c83a6b4", "installed_by": ["modules"] }, "untar": { @@ -106,17 +111,17 @@ "nf-core": { "utils_nextflow_pipeline": { "branch": "master", - "git_sha": "5caf7640a9ef1d18d765d55339be751bb0969dfa", + "git_sha": "3aa0aec1d52d492fe241919f0c6100ebf0074082", "installed_by": ["subworkflows"] }, "utils_nfcore_pipeline": { "branch": "master", - "git_sha": "92de218a329bfc9a9033116eb5f65fd270e72ba3", + "git_sha": "1b6b9a3338d011367137808b49b923515080e3ba", "installed_by": ["subworkflows"] }, - "utils_nfvalidation_plugin": { + "utils_nfschema_plugin": { "branch": "master", - "git_sha": "5caf7640a9ef1d18d765d55339be751bb0969dfa", + "git_sha": "bbd5a41f4535a8defafe6080e00ea74c45f4f96c", "installed_by": ["subworkflows"] } } diff --git a/modules/local/propr/grea/main.nf b/modules/local/propr/grea/main.nf new file mode 100644 index 00000000..2b503133 --- /dev/null +++ b/modules/local/propr/grea/main.nf @@ -0,0 +1,24 @@ +process PROPR_GREA { + tag "$meta.id" + label 'process_high' + + // conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+        'oras://community.wave.seqera.io/library/bioconductor-limma_r-ggplot2_r-propr:209490acb0e524e3' :
+        'community.wave.seqera.io/library/bioconductor-limma_r-ggplot2_r-propr:17abd3f137436739' }"
+
+    input:
+    tuple val(meta), path(adj)
+    tuple val(meta2), path(gmt)
+
+    output:
+    tuple val(meta), path("*.grea.tsv"), emit: results
+    path "versions.yml", emit: versions
+    path "*.R_sessionInfo.log", emit: session_info
+
+    when:
+    task.ext.when == null || task.ext.when
+
+    script:
+    template 'grea.R'
+}
diff --git a/modules/local/propr/grea/templates/grea.R b/modules/local/propr/grea/templates/grea.R
new file mode 100644
index 00000000..f5d50374
--- /dev/null
+++ b/modules/local/propr/grea/templates/grea.R
@@ -0,0 +1,279 @@
+#!/usr/bin/env Rscript

+################################################
+################################################
+## Functions                                  ##
+################################################
+################################################
+
+#' Parse out options from a string without recourse to optparse
+#'
+#' @param x Long-form argument list like --opt1 val1 --opt2 val2
+#'
+#' @return named list of options and values similar to optparse
+parse_args <- function(x){
+    args_list <- unlist(strsplit(x, ' ?--')[[1]])[-1]
+    args_vals <- lapply(args_list, function(x) scan(text=x, what='character', quiet = TRUE))
+
+    # Ensure the option vectors are length 2 (key/ value) to catch empty ones
+    args_vals <- lapply(args_vals, function(z){ length(z) <- 2; z})
+
+    parsed_args <- structure(lapply(args_vals, function(x) x[2]), names = lapply(args_vals, function(x) x[1]))
+    parsed_args[! is.na(parsed_args)]
+}
+
+#' Flexibly read CSV or TSV files (determined by file extension)
+#'
+#' @param file Input file
+#' @param header Boolean. TRUE if the first row is a header, FALSE if there is no header.
+#' @param row.names The first column is used as row names by default.
+#' Otherwise, give another number. Or use NULL when no row.names are present.
+#'
+#' @return output Data frame
+read_delim_flexible <- function(file, header = TRUE, row.names = 1, check.names = TRUE){
+
+    ext <- tolower(tail(strsplit(basename(file), split = "\\\\.")[[1]], 1)) # Get the file extension
+
+    if (ext == "tsv" || ext == "txt") { # If the file is a tsv or txt file
+        separator <- "\\t" # Set the separator variable to tab
+    } else if (ext == "csv") { # If the file is a csv file
+        separator <- ","
+    } else {
+        stop(paste("Unknown separator for", ext))
+    }
+
+    mat <- read.delim( # Read the file
+        file,
+        sep = separator, # Set the separator defined above
+        header = header,
+        row.names = row.names,
+        check.names = check.names
+    )
+}
+
+#' Loads the .gmt file and converts it into a knowledge database
+#'
+#' @param filename path of the .gmt file
+#' @param nodes vector of gene names. Note that this set should be as complete as possible,
+#' so it should contain not only the target genes but also the background genes.
+#' @return output a list with: `db` A knowledge database where each row is a graph node (eg. gene)
+#' and each column is a concept (eg.
GO term, pathway, etc) and `description` A list of descriptions +#' for each concept +load_gmt <- function(filename, nodes) { + + # read gmt file + gmt <- readLines(filename) + gmt <- strsplit(gmt, "\\t") + + # initialize database matrix + db <- matrix(0, nrow = length(nodes), ncol = length(gmt)) + rownames(db) <- nodes + colnames(db) <- sapply(gmt, function(entry) entry[[1]]) + + # description of the concepts + description <- list() + + # for concept in gmt + for (i in 1:length(gmt)) { + + # get concept and description + concept <- gmt[[i]][[1]] + description[[concept]] <- gmt[[i]][[2]] + + # fill 1 if gene is in concept + nodes_in_concept <- gmt[[i]][-c(1, 2)] + nodes_in_concept <- nodes_in_concept[nodes_in_concept %in% nodes] + db[nodes_in_concept, i] <- 1 + } + + return(list(db = db, description = description)) +} + +################################################ +################################################ +## Parse arguments ## +################################################ +################################################ + +# Set defaults and classes + +opt <- list( + prefix = ifelse('$task.ext.prefix' == 'null', '$meta.id', '$task.ext.prefix'), + + # input data + adj = '$adj', # adjacency matrix + gmt = '$gmt', # knowledge database .gmt file + + # parameters for gene sets + set_min = 15, # minimum number of genes in a set + set_max = 500, # maximum number of genes in a set + + # parameters for permutation test + permutation = 100, + + # other options + seed = NA, + ncores = as.integer('$task.cpus') +) + +opt_types <- list( + prefix = 'character', + adj = 'character', + gmt = 'character', + set_min = 'numeric', + set_max = 'numeric', + permutation = 'numeric', + seed = 'numeric', + ncores = 'numeric' +) + +# Apply parameter overrides + +args_opt <- parse_args('$task.ext.args') +for ( ao in names(args_opt)){ + if (! ao %in% names(opt)){ + stop(paste("Invalid option:", ao)) + } else { + + # Preserve classes from defaults where possible + args_opt[[ao]] <- as(args_opt[[ao]], opt_types[[ao]]) + + # handle NA, and avoid errors when NA is provided by user as character + if (args_opt[[ao]] %in% c('NA', NA)) args_opt[[ao]] <- NA + + # replace values + opt[[ao]] <- args_opt[[ao]] + } +} + +# Check if required parameters have been provided + +required_opts <- c('adj', 'gmt') # defines a vector required_opts containing the names of the required parameters. +missing <- required_opts[unlist(lapply(opt[required_opts], is.null)) | ! required_opts %in% names(opt)] +if (length(missing) > 0){ + stop(paste("Missing required options:", paste(missing, collapse=', '))) +} + +# Check file inputs are valid +for (file_input in c('adj', 'gmt')){ + if (is.null(opt[[file_input]])) { + stop(paste("Please provide", file_input), call. = FALSE) + } + if (! file.exists(opt[[file_input]])){ + stop(paste0('Value of ', file_input, ': ', opt[[file_input]], ' is not a valid file')) + } +} + +# TODO maybe add a function to pretty print the arguments? 
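+# Illustration of the override mechanism above (hypothetical values): if the
+# module is configured with ext.args = '--permutation 500 --set_min 10',
+# parse_args() yields list(permutation = '500', set_min = '10'); each value is
+# then cast with as() according to opt_types and overwrites the default in opt.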
+print(opt) + +################################################ +################################################ +## Finish loading libraries ## +################################################ +################################################ + +library(propr) + +################################################ +################################################ +## Enrichment analysis ## +################################################ +################################################ + +# set seed when required + +if (!is.na(opt\$seed)) { + warning('Setting seed ', opt\$seed, ' for reproducibility') + set.seed(opt\$seed) +} + +# load adjacency matrix +# this matrix should have gene x gene dimensions + +adj <- as.matrix(read_delim_flexible( + opt\$adj, + header = TRUE, + row.names = 1, + check.names = TRUE +)) +if (nrow(adj) != ncol(adj)) { + stop('Adjacency matrix is not square') +} +if (!all(rownames(adj) == colnames(adj))) { + stop('Adjacency matrix row names are not equal to column names') +} + +# load and process knowledge database + +gmt <- load_gmt( + opt\$gmt, + rownames(adj) +) + +# filter gene sets +# gene sets with less than set_min or more than set_max genes are removed + +idx <- which(colSums(gmt\$db) > opt\$set_min & colSums(gmt\$db) < opt\$set_max) +gmt\$db <- gmt\$db[, idx] +gmt\$description <- gmt\$description[idx] + +# run GREA +# Basically, it calculates the odds ratio of the graph being enriched in each concept, +# and the FDR of the odds ratio through permutation tests + +odds <- runGraflex( + adj, + gmt\$db, + p=opt\$permutation, + ncores=opt\$ncores +) +odds\$Description <- sapply(odds\$Concept, function(concept) + gmt\$description[[concept]] +) + +################################################ +################################################ +## Generate outputs ## +################################################ +################################################ + +write.table( + odds, + file = paste0(opt\$prefix, '.grea.tsv'), + col.names = TRUE, + row.names = FALSE, + sep = '\\t', + quote = FALSE + +) + +################################################ +################################################ +## R SESSION INFO ## +################################################ +################################################ + +sink(paste0(opt\$prefix, ".R_sessionInfo.log")) +print(sessionInfo()) +sink() + +################################################ +################################################ +## VERSIONS FILE ## +################################################ +################################################ + +propr.version <- as.character(packageVersion('propr')) + +writeLines( + c( + '"${task.process}":', + paste(' r-propr:', propr.version) + ), +'versions.yml') + +################################################ +################################################ +################################################ +################################################ diff --git a/modules/local/propr/propd/main.nf b/modules/local/propr/propd/main.nf new file mode 100644 index 00000000..8bfd9e3a --- /dev/null +++ b/modules/local/propr/propd/main.nf @@ -0,0 +1,33 @@ +process PROPR_PROPD { + tag "$meta.id" + label 'process_medium' + + // conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+ 'oras://community.wave.seqera.io/library/bioconductor-limma_r-ggplot2_r-propr:209490acb0e524e3' : + 'community.wave.seqera.io/library/bioconductor-limma_r-ggplot2_r-propr:17abd3f137436739' }" + + input: + tuple val(meta), path(count), path(samplesheet), val(contrast_variable), val(reference), val(target) + + output: + tuple val(meta), path("*.propd.rds") , emit: rds + tuple val(meta), path("*.propd.results.tsv") , emit: results + tuple val(meta), path("*.propd.results_filtered.tsv"), emit: results_filtered, optional: true + tuple val(meta), path("*.propd.adjacency.csv") , emit: adjacency , optional: true + tuple val(meta), path("*.propd.connectivity.tsv") , emit: connectivity , optional: true + tuple val(meta), path("*.propd.hub_genes.tsv") , emit: hub_genes , optional: true + tuple val(meta), path("*.propd.fdr.tsv") , emit: fdr , optional: true + tuple val(meta), path("*.propd.red_pairs.pdf") , emit: red_pairs , optional: true + tuple val(meta), path("*.propd.yellow_pairs.pdf") , emit: yellow_pairs , optional: true + tuple val(meta), path("*.propd.green_pairs.pdf") , emit: green_pairs , optional: true + path "*.warnings.log" , emit: warnings + path "*.R_sessionInfo.log" , emit: session_info + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + template 'propd.R' +} diff --git a/modules/local/propr/propd/templates/propd.R b/modules/local/propr/propd/templates/propd.R new file mode 100644 index 00000000..c4aae21a --- /dev/null +++ b/modules/local/propr/propd/templates/propd.R @@ -0,0 +1,713 @@ +#!/usr/bin/env Rscript + +################################################ +################################################ +## Functions ## +################################################ +################################################ + +#' Parse out options from a string without recourse to optparse +#' +#' @param x Long-form argument list like --opt1 val1 --opt2 val2 +#' +#' @return named list of options and values similar to optparse +parse_args <- function(x){ + args_list <- unlist(strsplit(x, ' ?--')[[1]])[-1] + args_vals <- lapply(args_list, function(x) scan(text=x, what='character', quiet = TRUE)) + + # Ensure the option vectors are length 2 (key/ value) to catch empty ones + args_vals <- lapply(args_vals, function(z){ length(z) <- 2; z}) + + parsed_args <- structure(lapply(args_vals, function(x) x[2]), names = lapply(args_vals, function(x) x[1])) + parsed_args[! is.na(parsed_args)] +} + +#' Flexibly read CSV or TSV files +#' +#' @param file Input file +#' @param header Boolean. TRUE if first row is header. False without header. +#' @param row.names The first column is used as row names by default. +#' Otherwise, give another number. Or use NULL when no row.names are present. +#' +#' @return output Data frame +read_delim_flexible <- function(file, header = TRUE, row.names = 1, check.names = TRUE){ + + ext <- tolower(tail(strsplit(basename(file), split = "\\\\.")[[1]], 1)) + + if (ext == "tsv" || ext == "txt") { + separator <- "\\t" + } else if (ext == "csv") { + separator <- "," + } else { + stop(paste("Unknown separator for", ext)) + } + + mat <- read.delim( + file, + sep = separator, + header = header, + row.names = row.names, + check.names = check.names + ) +} + +#' Get connectivity of genes from adjacency matrix +#' +#' The connectivity of a gene is the number of connections it has with other genes. +#' In other words, the degree of a gene. This connectivity can be weighted. 
In this
+#' way, the strength of the theta value can be taken into account.
+#' This information can be used to summarize the network and identify genes that are
+#' potentially changing between groups.
+#'
+#' @param pd propd object
+#' @param adj Adjacency matrix
+#' @param de Differential proportionality values for each gene with respect to the
+#' normalization reference
+#' @param cutoff Cutoff value for de values to be considered significant
+#'
+#' @return data frame with the following columns: gene_id, degree, weighted_degree,
+#' genewise_theta, average_theta, classification
+get_connectivity <- function(pd, adj, de, cutoff, features_id_col='gene_id'){
+
+    # initialize empty data frame
+    connectivity <- data.frame(matrix(NA, nrow=ncol(pd@counts), ncol=6))
+    colnames(connectivity) <- c(
+        features_id_col,
+        'degree',
+        'weighted_degree',
+        'genewise_theta',
+        'average_theta',
+        'classification'
+    )
+
+    # add features ids
+    connectivity[,1] <- colnames(pd@counts)
+
+    # add degree
+    # degree is the number of connections a gene has with other genes
+    connectivity[,2] <- colSums(adj)
+
+    # add weighted degree
+    # each connection is weighted by the theta value,
+    # so lower theta values (higher between-group than within-group variance) get a higher weight
+    # NOTE this is a placeholder for the proper weighted degree; we may change how it is computed
+    mat <- getMatrix(pd)
+    diag(mat) <- NA
+    connectivity[,3] <- colSums((1 - mat) * adj, na.rm=TRUE)
+
+    # add genewise theta
+    # a theta value can be associated with each gene by calculating the between-group vs within-group variance
+    # of the gene normalized with respect to a reference (in this case the geometric mean of the sample)
+    connectivity[,4] <- de
+
+    # add average theta of the connections
+    connectivity[,5] <- colSums(mat * adj, na.rm=TRUE) / colSums(adj)
+
+    # classification
+    # green for DE genes, and red for non-DE genes
+    connectivity[,6] <- 'green'
+    connectivity[which(de > cutoff), 6] <- 'red'
+
+    return(connectivity)
+}
+
+#' Determine hub genes based on connectivity
+#'
+#' Here hub genes are those that have a degree higher than the expected degree.
+#' The expected degree is the number of connections that each gene would have
+#' if the connections were distributed uniformly. In other words, the average
+#' degree by node.
+#'
+#' @param connectivity Data frame with connectivity
+#' @param cutoff Theta value for which DP pairs are considered significant.
+#' @param weighted Boolean. If TRUE, use weighted degree to determine hub genes.
+#' Otherwise, use degree.
+#' +#' @return filtered and sorted connectivity data frame with hub genes +get_hub_genes <- function(connectivity, cutoff, weighted=FALSE){ + + # get the expected degree + total_degree <- if (weighted) sum(connectivity\$weighted_degree) else sum(connectivity\$degree) + n_nodes <- sum(connectivity\$degree > 0) + expected_degree <- total_degree / n_nodes + + # get hub genes + hub_genes <- connectivity[which(connectivity\$degree > expected_degree),] + + # sort hub genes + if (weighted) { + hub_genes <- hub_genes[order(hub_genes\$weighted_degree, decreasing=TRUE),] + } else { + hub_genes <- hub_genes[order(hub_genes\$degree, decreasing=TRUE),] + } + + return(hub_genes) +} + +#' Plot pairs of genes +#' +#' This function plots the following pairs of genes: +#' - xy vs sample +#' - x vs y +#' - x vs sample +#' - y vs sample +#' - xr vs yr +#' - xr vs sample +#' - yr vs sample +#' The pairs are colored according to the group they belong to. +#' +#' @param df Data frame with the following columns: xy, x, y, xr, yr, sample, group, color +#' @param x Name of the gene x +#' @param y Name of the gene y +#' @param title Title of the plot +plot_pairs <- function(df, x, y, title){ + + # Define the layout + layout_matrix <- matrix(c( + 0, 1, 8, # First row + 2, 3, 4, # Second row + 5, 6, 7 # Third row + ), nrow = 3, ncol = 3, byrow = TRUE) + layout(layout_matrix, widths = c(1, 1, 1), heights = c(1, 1, 1)) + + # Adjust margins and text sizes + par(mar = c(4, 4, 1, 1), cex = 1.5, lwd = 1.5) + + # plot xy vs sample + plot(x=df\$sample, y=df\$xy, xlab='sample', ylab=paste0(x, '/', y), col=df\$color) + + # plot x vs y + plot(x=df\$x, y=df\$y, xlab=x, ylab=y, col=df\$color) + + # plot x vs sample + plot(x=df\$sample, y=df\$x, xlab='sample', ylab=x, col=df\$color) + + # plot y vs sample + plot(x=df\$sample, y=df\$y, xlab='sample', ylab=y, col=df\$color) + + # plot xr vs yr + plot(x=df\$xr, y=df\$yr, xlab=paste0(x, '/ref'), ylab=paste0(y, '/ref'), col=df\$color) + + # plot xr vs sample + plot(x=df\$sample, y=df\$xr, xlab='sample', ylab=paste0(x, '/ref'), col=df\$color) + + # plot yr vs sample + plot(x=df\$sample, y=df\$yr, xlab='sample', ylab=paste0(y, '/ref'), col=df\$color) + + # add legend + plot.new() + par(mar = c(0, 0, 0, 0)) + legend( + "center", + legend = unique(df\$group), + col = unique(df\$color), + pch = 19, + cex = 1.5, + bty = "n") + + # TODO the title does not appear somehow, fix it + # Add main title + mtext( + title, + side = 3, + outer = TRUE, + line = 1, + cex = 2, + font = 2) +} + +################################################ +################################################ +## Parse arguments ## +################################################ +################################################ + +# Set defaults and classes + +opt <- list( + prefix = ifelse('$task.ext.prefix' == 'null', '$meta.id', '$task.ext.prefix'), + + # input count matrix + count = '$count', + features_id_col = 'gene_id', # column name of feature ids + + # comparison groups + samplesheet = '$samplesheet', + obs_id_col = 'sample', # column name of observation ids + contrast_variable = "$contrast_variable", # column name of contrast variable + reference_group = "$reference", # reference group for contrast variable + target_group = "$target", # target group for contrast variable + + # parameters for computing differential proportionality + alpha = NA, # alpha for boxcox transformation + moderated = TRUE, # use moderated theta + + # parameters for getting the significant differentially proportional pairs + fdr = 0.05, # 
FDR threshold + permutation = 0, # if permutation > 0, use permutation test to compute FDR + number_of_cutoffs = 100, # number of cutoffs for permutation test + + # parameters for getting the hub genes + weighted_degree = FALSE, # use weighted degree for hub genes or not + + # other parameters + seed = NA, # seed for reproducibility + ncores = as.integer('$task.cpus') +) + +opt_types <- list( + prefix = 'character', + count = 'character', + samplesheet = 'character', + features_id_col = 'character', + obs_id_col = 'character', + contrast_variable = 'character', + reference_group = 'character', + target_group = 'character', + alpha = 'numeric', + moderated = 'logical', + fdr = 'numeric', + permutation = 'numeric', + number_of_cutoffs = 'numeric', + weighted_degree = 'logical', + seed = 'numeric', + ncores = 'numeric' +) + +# Apply parameter overrides + +args_opt <- parse_args('$task.ext.args') +for ( ao in names(args_opt)){ + if (! ao %in% names(opt)){ + stop(paste("Invalid option:", ao)) + } else { + + # Preserve classes from defaults where possible + args_opt[[ao]] <- as(args_opt[[ao]], opt_types[[ao]]) + + # handle NA, and avoid errors when NA is provided by user as character + if (args_opt[[ao]] %in% c('NA', NA)) args_opt[[ao]] <- NA + + # replace values + opt[[ao]] <- args_opt[[ao]] + } +} + +# Check if required parameters have been provided + +required_opts <- c('count','samplesheet','contrast_variable','reference_group','target_group') +missing <- required_opts[unlist(lapply(opt[required_opts], is.null)) | ! required_opts %in% names(opt)] +if (length(missing) > 0){ + stop(paste("Missing required options:", paste(missing, collapse=', '))) +} + +# Check file inputs are valid + +for (file_input in c('count','samplesheet')){ + if (is.null(opt[[file_input]])) { + stop(paste("Please provide", file_input), call. = FALSE) + } + if (! file.exists(opt[[file_input]])){ + stop(paste0('Value of ', file_input, ': ', opt[[file_input]], ' is not a valid file')) + } +} + +# check parameters are valid + +if (opt\$permutation < 0) { + stop('permutation should be a positive integer') +} + +# TODO maybe add a function to pretty print the arguments? +print(opt) + +################################################ +################################################ +## Finish loading libraries ## +################################################ +################################################ + +library(propr) + +################################################ +################################################ +## Perform differential proportionality ## +################################################ +################################################ + +# set seed when required + +if (!is.na(opt\$seed)) { + warning('Setting seed ', opt\$seed, ' for reproducibility') + set.seed(opt\$seed) +} + +# read matrix + +mat <- read_delim_flexible( + opt\$count, + header = TRUE, + row.names = opt\$features_id_col, + check.names = FALSE +) +mat <- t(mat) # transpose matrix to have features (genes) as columns + +# parse group +# and filter matrix and group values, so that only the contrasted groups are kept +# TODO propd can also handle more than two groups +# but that dont work properly with the contrast format +# Should we provide an alternative way to do that? 
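+# Illustration (hypothetical columns): with contrast_variable = 'treatment',
+# reference_group = 'control' and target_group = 'treated', only the samples
+# whose 'treatment' value is 'control' or 'treated' are kept, and `group`
+# becomes the matching two-level vector aligned with the rows of `mat`.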
+ +samplesheet <- read_delim_flexible( + opt\$samplesheet, + header = TRUE, + row.names = NULL, + check.names = FALSE +) +samplesheet <- samplesheet[,c(opt\$obs_id_col, opt\$contrast_variable)] +idx <- which(samplesheet[,2] %in% c(opt\$reference_group, opt\$target_group)) +mat <- mat[idx,] +samplesheet <- samplesheet[idx,] +group <- as.vector(samplesheet[,2]) +if (length(group) != nrow(mat)) stop('Error when parsing group') +if (length(unique(group)) != 2) stop('Only two groups are allowed for contrast') + +# compute differential proportionality + +pd <- propd( + mat, + group = group, + alpha = opt\$alpha, + weighted = FALSE, + p = opt\$permutation +) + +# compute DE theta values +# this is the theta value for each gene with respect to the normalization reference +# in this case, the reference is the geometric mean of the sample +# These DE values are only for interpreting purposes. +# TODO if we want to use the outcome from other DE analysis, at some point we should +# divide the below part into maybe a separate module that can take the DE and DP values as input +# and coordinate them through the pipeline + +ref <- exp(rowMeans(log(pd@counts))) +de <- runNormalization(pd, ref) + +# use F-stat FDR-adjusted p-values to get significant pairs, if permutation == 0 +# otherwise, get FDR values using permutation tests (more computationally expensive but likely more conservative FDRs) + +if (opt\$permutation == 0) { + + warning('FDR-adjusted p-values are used to get significant pairs.') + + # update FDR-adjusted p-values + + pd <- updateF( + pd, + moderated = opt\$moderated + ) + if (opt\$moderated) pd <- setActive(pd, what='theta_mod') + + # get theta value for which FDR is below desired threshold + # theta_cutoff is FALSE when no theta value has FDR below desired threshold + # otherwise it is the theta value for which FDR is below desired threshold + # Only when there is a meaningful theta, we can compute the next steps + # that involve extracting the significant pairs. 
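+    # For example (illustrative numbers): with fdr = 0.05, getCutoffFstat()
+    # might return a theta of 0.7, in which case the pairs with theta below
+    # that cutoff are extracted as significant further down; if no theta
+    # reaches FDR < 0.05 it returns FALSE and the extraction block is skipped.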
+ + theta_cutoff <- getCutoffFstat( + pd, + pval=opt\$fdr, + fdr_adjusted=TRUE + ) + if (theta_cutoff) { + + warning('Significant theta value found: ', theta_cutoff) + + # get adjacency matrix + # this matrix will have 1s for significant pairs and 0s for the rest + # diagonals are set to 0 + + adj <- getAdjacencyFstat( + pd, + pval=opt\$fdr, + fdr_adjusted=TRUE + ) + + # calculate gene connectivity and get hub genes + + connectivity <- get_connectivity( + pd, + adj, + de, + theta_cutoff, + features_id_col=opt\$features_id_col + ) + hub_genes <- get_hub_genes(connectivity, weighted=opt\$weighted_degree) + + # get significant pairs and classify them into red/yellow/green pairs + + results <- getSignificantResultsFstat( + pd, + pval=opt\$fdr, + fdr_adjusted=TRUE + ) + results <- results[,c("Partner", "Pair", "theta")] + results\$classification <- "red" + results\$classification[which(de[results\$Pair] < theta_cutoff | de[results\$Partner] < theta_cutoff)] <- "yellow" + results\$classification[which(de[results\$Pair] < theta_cutoff & de[results\$Partner] < theta_cutoff)] <- "green" + + # sort significant pairs + + results <- results[order(results\$theta),] + } + +} else { + + warning('Permutation tests are used to compute FDR values.') + + # calculate FDR values using permutation tests + # this part will call the updateCutoffs function iteratively + # as far as it does not find a meaningful theta value + # and does not reach the maximum number of iterations + + theta_cutoff <- FALSE + max_cutoff <- 1 + ntry <- 0 + while (!theta_cutoff & max_cutoff > 0 & ntry < 10) { + ntry <- ntry + 1 + + # get theta cutoffs to test the FDR + + if (ntry > 1) { + part <- pd@fdr[which(pd@fdr\$truecounts > 0),] + if (nrow(part) > 1) { + max_cutoff <- min(part\$cutoff) + } else { + break + } + } + + cutoffs <- as.numeric(quantile( + pd@results[pd@results\$theta < max_cutoff, 'theta'], + seq(0, 1, length.out = opt\$number_of_cutoffs) + )) + + # update FDR values + + pd <- updateCutoffs( + pd, + custom_cutoffs = cutoffs, + ncores = opt\$ncores + ) + + # check if any theta value has FDR below desired threshold + + theta_cutoff <- getCutoffFDR( + pd, + fdr=opt\$fdr, + window_size=1 + ) + } + + if (theta_cutoff) { + + warning('Significant theta value found: ', theta_cutoff) + + # get adjacency matrix + + adj <- getAdjacencyFDR( + pd, + fdr=opt\$fdr, + window_size=1 + ) + + # calculate gene connectivity and get hub genes + + connectivity <- get_connectivity( + pd, + adj, + de, + theta_cutoff, + features_id_col=opt\$features_id_col + ) + hub_genes <- get_hub_genes(connectivity, weighted=opt\$weighted_degree) + + # get significant pairs and classify them into red/yellow/green pairs + + results <- getSignificantResultsFDR( + pd, + fdr=opt\$fdr, + window_size=1 + ) + results <- results[,c("Partner", "Pair", "theta")] + results\$classification <- "red" + results\$classification[which(de[results\$Pair] < theta_cutoff | de[results\$Partner] < theta_cutoff)] <- "yellow" + results\$classification[which(de[results\$Pair] < theta_cutoff & de[results\$Partner] < theta_cutoff)] <- "green" + + # sort significant pairs + + results <- results[order(results\$theta),] + } +} + +# deal with the situation when no significant thetas are found +# For the moment, we just print a warning and set adj, hub_genes and results to NULL +# TODO take top n pairs when no cutoff has FDR below desired threshold + +if (!theta_cutoff) { + warning('No theta value has FDR below desired threshold.') + adj <- NULL + connectivity <- NULL + hub_genes <- NULL + 
results <- NULL +} + +################################################ +################################################ +## Generate outputs ## +################################################ +################################################ + +saveRDS( + pd, + file = paste0(opt\$prefix, '.propd.rds') +) + +write.table( + getResults(pd), + file = paste0(opt\$prefix, '.propd.results.tsv'), + col.names = TRUE, + row.names = FALSE, + sep = '\\t', + quote = FALSE +) + +if (theta_cutoff) { + write.table( + results, + file = paste0(opt\$prefix, '.propd.results_filtered.tsv'), + col.names = TRUE, + row.names = FALSE, + sep = '\\t', + quote = FALSE + ) + write.table( + adj, + file = paste0(opt\$prefix, '.propd.adjacency.csv'), + col.names = TRUE, + row.names = TRUE, + sep = ',', + quote = FALSE + ) + write.table( + connectivity, + file = paste0(opt\$prefix, '.propd.connectivity.tsv'), + col.names = TRUE, + row.names = FALSE, + sep = '\\t', + quote = FALSE + ) + write.table( + hub_genes, + file = paste0(opt\$prefix, '.propd.hub_genes.tsv'), + col.names = TRUE, + row.names = FALSE, + sep = '\\t', + quote = FALSE + ) +} + +if (opt\$permutation > 0) { + write.table( + pd@fdr, + file = paste0(opt\$prefix, '.propd.fdr.tsv'), + col.names = TRUE, + sep = '\\t', + quote = FALSE + ) +} + +################################################ +################################################ +## Plot red pairs ## +################################################ +################################################ + +if (theta_cutoff){ + + # get ratios between each gene and the normalization reference + ratios <- exp(logratio(pd@counts, 'clr', NA)) + + # plot for each pair type + for (pair_type in c("red", "yellow", "green")){ + pdf(paste0(opt\$prefix, '.propd.', pair_type, '_pairs.pdf'), width = 18, height = 18) + + # get pairs + pairs <- results[results\$classification == pair_type,] + + # for the top pairs + for (idx in c(1:3)){ + + # get x and y genes + x <- pairs[idx,'Partner'] + y <- pairs[idx,'Pair'] + + # create data frame + df <- data.frame( + xy=pd@counts[,x]/pd@counts[,y], + x=pd@counts[,x], + y=pd@counts[,y], + xr=ratios[,x], + yr=ratios[,y], + sample=c(1:nrow(pd@counts)), + group=group, + color=ifelse(group == opt\$target_group, 'red', 'blue')) + df <- df[order(df\$group, df\$sample),] + + # plot + title <- paste0("top ", idx, " ", pair_type, " pair with theta=", round(results[idx, 'theta'], 6)) + plot_pairs(df, x, y, title) + } + + } +} + +################################################ +################################################ +## WARNINGS ## +################################################ +################################################ + +sink(paste0(opt\$prefix, ".warnings.log")) +print(warnings()) +sink() + +################################################ +################################################ +## R SESSION INFO ## +################################################ +################################################ + +sink(paste0(opt\$prefix, ".R_sessionInfo.log")) +print(sessionInfo()) +sink() + +################################################ +################################################ +## VERSIONS FILE ## +################################################ +################################################ + +propr.version <- as.character(packageVersion('propr')) + +writeLines( + c( + '"${task.process}":', + paste(' r-propr:', propr.version) + ), +'versions.yml') + +################################################ +################################################ 
+################################################ +################################################ diff --git a/modules/local/propr/propr/main.nf b/modules/local/propr/propr/main.nf new file mode 100644 index 00000000..99e7770e --- /dev/null +++ b/modules/local/propr/propr/main.nf @@ -0,0 +1,27 @@ +process PROPR_PROPR { + tag "$meta.id" + label 'process_medium' + + // conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'oras://community.wave.seqera.io/library/bioconductor-limma_r-ggplot2_r-propr:209490acb0e524e3' : + 'community.wave.seqera.io/library/bioconductor-limma_r-ggplot2_r-propr:17abd3f137436739' }" + + input: + tuple val(meta), path(count) + + output: + tuple val(meta), path("*.propr.rds") , emit: propr + tuple val(meta), path("*.propr.matrix.csv") , emit: matrix + tuple val(meta), path("*.propr.fdr.tsv") , emit: fdr , optional:true + tuple val(meta), path("*.propr.adjacency.csv"), emit: adjacency , optional:true + path "*.warnings.log" , emit: warnings + path "*.R_sessionInfo.log" , emit: session_info + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + template 'propr.R' +} diff --git a/modules/local/propr/propr/templates/propr.R b/modules/local/propr/propr/templates/propr.R new file mode 100644 index 00000000..afbc35e6 --- /dev/null +++ b/modules/local/propr/propr/templates/propr.R @@ -0,0 +1,318 @@ +#!/usr/bin/env Rscript + +################################################ +################################################ +## Functions ## +################################################ +################################################ + +#' Parse out options from a string without recourse to optparse +#' +#' @param x Long-form argument list like --opt1 val1 --opt2 val2 +#' +#' @return named list of options and values similar to optparse + +parse_args <- function(x){ + args_list <- unlist(strsplit(x, ' ?--')[[1]])[-1] + args_vals <- lapply(args_list, function(x) scan(text=x, what='character', quiet = TRUE)) + + # Ensure the option vectors are length 2 (key/ value) to catch empty ones + args_vals <- lapply(args_vals, function(z){ length(z) <- 2; z}) + + parsed_args <- structure(lapply(args_vals, function(x) x[2]), names = lapply(args_vals, function(x) x[1])) + parsed_args[! is.na(parsed_args)] +} + +#' Flexibly read CSV or TSV files +#' +#' @param file Input file +#' @param header Boolean. TRUE if first row is header. False without header. +#' @param row.names The first column is used as row names by default. +#' Otherwise, give another number. Or use NULL when no row.names are present. 
+#' +#' @return output Data frame +read_delim_flexible <- function(file, header = TRUE, row.names = 1, check.names = TRUE){ + + ext <- tolower(tail(strsplit(basename(file), split = "\\\\.")[[1]], 1)) + + if (ext == "tsv" || ext == "txt") { + separator <- "\\t" + } else if (ext == "csv") { + separator <- "," + } else { + stop(paste("Unknown separator for", ext)) + } + + mat <- read.delim( + file, + sep = separator, + header = header, + row.names = row.names, + check.names = check.names + ) + + return(mat) +} + +################################################ +################################################ +## Parse arguments ## +################################################ +################################################ + +# Set defaults and classes + +opt <- list( + prefix = ifelse('$task.ext.prefix' == 'null', '$meta.id', '$task.ext.prefix'), + + # input count matrix + count = '$count', + features_id_col = 'gene_id', # column name for features (genes) + + # parameters for computing correlation coefficients + metric = 'rho', # correlation metric: rho, phi, phs, cor, vlr, pcor, pcor.shrink, pcor.bshrink + ivar = 'clr', # transformation: clr, alr, or the name(s) or index(es) of the variable(s) to be used as reference(s) + alpha = NA, # alpha value for Box-Cox transformation + + # parameters for getting the significant coefficients + fdr = 0.05, # FDR threshold + tails = "right", # FDR tail: right or both + permutation = 100, # number of permutations + number_of_cutoffs = 100, # number of cutoffs for which compute the FDRs + + # other parameters + seed = NA, # seed for reproducibility + ncores = as.integer('$task.cpus') +) + +opt_types <- list( + prefix = 'character', + count = 'character', + features_id_col = 'character', + metric = 'character', + ivar = 'character', + alpha = 'numeric', + fdr = 'numeric', + tails = 'character', + permutation = 'numeric', + number_of_cutoffs = 'numeric', + seed = 'numeric', + ncores = 'numeric' +) + +# Apply parameter overrides + +args_opt <- parse_args('$task.ext.args') + +for ( ao in names(args_opt)){ + if (! ao %in% names(opt)){ + stop(paste("Invalid option:", ao)) + } else { + + # Preserve classes from defaults + args_opt[[ao]] <- as(args_opt[[ao]], opt_types[[ao]]) + + # handle NA, and avoid errors when NA is provided by user as character + if (args_opt[[ao]] %in% c('NA', NA)) args_opt[[ao]] <- NA + + # replace values + opt[[ao]] <- args_opt[[ao]] + } +} + +# Check if required parameters have been provided + +required_opts <- c('count') # only count data is strictly required, other parameters have defaults +missing <- required_opts[unlist(lapply(opt[required_opts], is.null)) | ! required_opts %in% names(opt)] +if (length(missing) > 0){ + stop(paste("Missing required options:", paste(missing, collapse=', '))) +} + +# Check file inputs are valid + +for (file_input in c('count')){ + if (is.null(opt[[file_input]])) { + stop(paste("Please provide", file_input), call. = FALSE) + } + if (! 
file.exists(opt[[file_input]])){
+        stop(paste0('Value of ', file_input, ': ', opt[[file_input]], ' is not a valid file'))
+    }
+}
+
+# check parameters
+
+if (!opt\$metric %in% c('rho', 'phi', 'phs', 'cor', 'vlr', 'pcor', 'pcor.shrink', 'pcor.bshrink')) {
+    stop('Please make sure you provided the correct metric')
+}
+
+if (opt\$metric == 'pcor.bshrink'){
+    if (!is.na(opt\$alpha)) stop('Box-Cox transformation is not implemented for pcor.bshrink yet.')
+    if (!opt\$ivar %in% c('clr', 'alr')) stop('Please make sure you provided the correct transformation: clr or alr')
+
+} else {
+    if (is.na(opt\$ivar)) warning('No transformation was requested; assuming the input count data has already been properly transformed.')
+}
+
+# TODO maybe add a function to pretty print the arguments?
+print(opt)
+
+################################################
+################################################
+## Finish loading libraries                   ##
+################################################
+################################################
+
+library(propr)
+
+################################################
+################################################
+## Perform correlation analysis               ##
+################################################
+################################################
+
+# set seed when required
+
+if (!is.na(opt\$seed)) {
+    warning('Setting seed ', opt\$seed, ' for reproducibility')
+    set.seed(opt\$seed)
+}
+
+# load count matrix
+
+mat <- read_delim_flexible(
+    opt\$count,
+    header = TRUE,
+    row.names = opt\$features_id_col,
+    check.names = FALSE
+)
+mat <- t(mat) # transpose matrix to have features (genes) as columns
+
+# Compute correlation coefficients
+
+pr <- propr(
+    mat,
+    metric = opt\$metric,
+    ivar = opt\$ivar,
+    alpha = opt\$alpha,
+    p = opt\$permutation
+)
+
+# adj stays NULL unless the permutation tests below find significant edges,
+# so the output section can safely check is.null(adj) in either case
+adj <- NULL
+
+if (opt\$permutation > 0) {
+
+    # update FDRs for each coefficient cutoff
+
+    pr <- updateCutoffs(
+        pr,
+        number_of_cutoffs=opt\$number_of_cutoffs,
+        tails=opt\$tails,
+        ncores=opt\$ncores
+    )
+
+    # get cutoff at given FDR threshold
+
+    cutoff <- getCutoffFDR(
+        pr,
+        fdr=opt\$fdr,
+        window_size=1
+    )
+
+    if (cutoff) {
+
+        # get adjacency matrix with the significant edges
+
+        adj <- getAdjacencyFDR(
+            pr,
+            fdr=opt\$fdr,
+            window_size=1
+        )
+
+    } else {
+        # TODO take top n pairs when no cutoff has FDR below desired threshold
+        # For the moment, we just print a warning and leave adj as NULL
+        warning('No significant results found at FDR threshold ', opt\$fdr)
+    }
+}
+
+################################################
+################################################
+## Generate outputs                           ##
+################################################
+################################################
+
+saveRDS(
+    pr,
+    file = paste0(opt\$prefix, '.propr.rds')
+)
+
+write.table(
+    round(pr@matrix, 8), # round matrix decimals to avoid floating point inconsistencies
+    file = paste0(opt\$prefix, '.propr.matrix.csv'),
+    col.names = TRUE,
+    row.names = TRUE,
+    sep = ',',
+    quote = FALSE
+)
+
+if (!is.null(adj)) {
+    write.table(
+        adj,
+        file = paste0(opt\$prefix, '.propr.adjacency.csv'),
+        col.names = TRUE,
+        row.names = TRUE, # keep gene ids as row names so the square matrix stays labelled
+        sep = ',', # comma-separated to match the .csv extension
+        quote = FALSE
+    )
+}
+
+if (opt\$permutation > 0) {
+    write.table(
+        pr@fdr,
+        file = paste0(opt\$prefix, '.propr.fdr.tsv'),
+        col.names = TRUE,
+        row.names = FALSE,
+        sep = '\\t',
+        quote = FALSE
+    )
+}
+
+################################################
+################################################
+## WARNINGS                                   ##
+################################################
+################################################ + +sink(paste0(opt\$prefix, ".warnings.log")) +print(warnings()) +sink() + +################################################ +################################################ +## R SESSION INFO ## +################################################ +################################################ + +sink(paste0(opt\$prefix, ".R_sessionInfo.log")) +print(sessionInfo()) +sink() + +################################################ +################################################ +## VERSIONS FILE ## +################################################ +################################################ + +propr.version <- as.character(packageVersion('propr')) + +writeLines( + c( + '"${task.process}":', + paste(' r-propr:', propr.version) + ), +'versions.yml') + +################################################ +################################################ +################################################ +################################################ diff --git a/modules/nf-core/mygene/environment.yml b/modules/nf-core/mygene/environment.yml new file mode 100644 index 00000000..45442c49 --- /dev/null +++ b/modules/nf-core/mygene/environment.yml @@ -0,0 +1,7 @@ +name: mygene +channels: + - conda-forge + - bioconda + - defaults +dependencies: + - bioconda::mygene=3.2.2 diff --git a/modules/nf-core/mygene/main.nf b/modules/nf-core/mygene/main.nf new file mode 100644 index 00000000..25a21d8f --- /dev/null +++ b/modules/nf-core/mygene/main.nf @@ -0,0 +1,23 @@ +process MYGENE { + tag "$meta.id" + label 'process_low' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/mygene:3.2.2--pyh5e36f6f_0': + 'biocontainers/mygene:3.2.2--pyh5e36f6f_0' }" + + input: + tuple val(meta), path(gene_list) + + output: + tuple val(meta), path("*.gmt"), emit: gmt + tuple val(meta), path("*.tsv"), emit: tsv , optional: true + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + template "mygene.py" +} diff --git a/modules/nf-core/mygene/meta.yml b/modules/nf-core/mygene/meta.yml new file mode 100644 index 00000000..f7aaa455 --- /dev/null +++ b/modules/nf-core/mygene/meta.yml @@ -0,0 +1,54 @@ +name: "mygene" +description: Fetch the GO concepts for a list of genes +keywords: + - mygene + - go + - annotation +tools: + - "mygene": + description: "A python wrapper to query/retrieve gene annotation data from Mygene.info." + homepage: "https://mygene.info/" + documentation: "https://docs.mygene.info/projects/mygene-py/en/latest/" + tool_dev_url: "https://github.com/biothings/mygene.py" + doi: "10.1093/nar/gks1114" + licence: ["Apache-2.0"] + +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - gene_list: + type: file + description: A tsv/csv file that contains a list of gene ids in one of the columns. + By default, the column name should be "gene_id", but this can be changed + by using "--columname gene_id" in ext.args. + pattern: "*.{csv,tsv}" + +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - gmt: + type: file + description: | + Each row contains the GO id, a description, and a list of gene ids. 
+ pattern: "*.gmt" + - tsv: + type: file + description: | + (optional) A tsv file with the following columns: + query, mygene_id, go_id, go_term, go_evidence, go_category, symbol, name, taxid + pattern: "*.tsv" + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" + +authors: + - "@suzannejin" +maintainers: + - "@suzannejin" diff --git a/modules/nf-core/mygene/templates/mygene.py b/modules/nf-core/mygene/templates/mygene.py new file mode 100644 index 00000000..2717c467 --- /dev/null +++ b/modules/nf-core/mygene/templates/mygene.py @@ -0,0 +1,347 @@ +#!/usr/bin/env python3 +import argparse +import mygene +import shlex + + +""" +This python script uses the mygene module to query the MyGene.info API and +retrieve the go terms associated with a list of gene ids. The gene ids should +ideally be Ensembl or Entrez ids. The script generates two outputs: + 1. A tsv file containing information related to each query. The columns + include query, mygene_id, go_id, go_term, go_evidence, go_category, + symbol, name, and taxid. + 2. A gmt file containing information related to each go term. Each row + includes the go id, go term, and the genes associated with that go term. + +Author: Suzanne Jin +License: Apache 2.0 (same as the mygene library) +""" + + +class Arguments: + """ + Parses the argments, including the ones coming from $task.ext.args. + """ + + def __init__(self) -> None: + self.input = "$gene_list" + self.prefix = "$task.ext.prefix" if "$task.ext.prefix" != "null" else "$meta.id" + self.output_gmt = self.prefix + ".gmt" + self.output_tsv = self.prefix + ".tsv" + self.parse_ext_args("$task.ext.args") + + def parse_ext_args(self, args_string: str) -> None: + """ + It parses the extended arguments. + """ + # skip when there are no extended arguments + if args_string == "null": + args_string = "" + + # Parse the extended arguments + args_list = shlex.split(args_string) # Split the string into a list of arguments + parser = argparse.ArgumentParser() + # input parameters + parser.add_argument( + "--columname", + default="gene_id", + help="Name of the column where the gene ids are stored in the input file. Default: gene_id", + ) + # filtering parameters + parser.add_argument( + "--species", + default=None, + help="Comma separated of common name of the species or taxon ids", + ) + parser.add_argument( + "--go_category", + default=None, + help="Comma separated list of GO categories to keep. Default: all", + ) + parser.add_argument( + "--go_evidence", + default=None, + help="Comma separated list of GO evidence codes to keep. Default: all", + ) + # additional parameters for querymany + parser.add_argument( + "--scopes", + default=None, + help="Comma separated list of scopes to search for.", + ) + parser.add_argument( + "--entrezonly", + default=False, + help="When true, the query returns only the hits with valid Entrez gene ids. Default: false.", + ) + parser.add_argument( + "--ensemblonly", + default=False, + help="When true, the query returns only the hits with valid Ensembl gene ids. Default: False", + ) + # output parameters + parser.add_argument( + "--generate_tsv", + default=False, + help="Also generate a tsv file with the gene based information. 
Default: False",
+        )
+        args = parser.parse_args(args_list)
+
+        # Convert "null" values to default values
+        # and convert "true"/"false" strings to booleans
+        for attr in vars(args):
+            value = getattr(args, attr)
+            if value == "null":
+                setattr(args, attr, parser.get_default(attr))
+            elif value == "true":
+                setattr(args, attr, True)
+            elif value == "false":
+                setattr(args, attr, False)
+
+        # check if the arguments are valid
+        if args.go_category:
+            args.go_category = args.go_category.upper()
+            for category in args.go_category.split(","):
+                if category not in ["BP", "MF", "CC"]:
+                    raise ValueError("The GO category should be one of BP, MF, or CC.")
+        if args.go_evidence:
+            args.go_evidence = args.go_evidence.upper()
+
+        # Assign args attributes to self attributes
+        for attr in vars(args):
+            setattr(self, attr, getattr(args, attr))
+
+    def print_args(self) -> None:
+        """
+        Print the arguments.
+        """
+        for attr in vars(self):
+            print(f"{attr}: {getattr(self, attr)}")
+
+
+class Version:
+    """
+    Parse the versions of the modules used in the script.
+    """
+
+    @staticmethod
+    def get_versions(modules: list) -> dict:
+        """
+        This function takes a list of modules and returns a dictionary with the
+        versions of each module.
+        """
+        return {module.__name__: module.__version__ for module in modules}
+
+    @staticmethod
+    def format_yaml_like(data: dict, indent: int = 0) -> str:
+        """
+        Formats a dictionary to a YAML-like string.
+
+        Args:
+            data (dict): The dictionary to format.
+            indent (int): The current indentation level.
+
+        Returns:
+            yaml_str: A string formatted as YAML.
+        """
+        yaml_str = ""
+        for key, value in data.items():
+            spaces = " " * indent
+            if isinstance(value, dict):
+                yaml_str += f"{spaces}{key}:\\n{Version.format_yaml_like(value, indent + 1)}"
+            else:
+                yaml_str += f"{spaces}{key}: {value}\\n"
+        return yaml_str
+
+
+class MyGene:
+    """
+    This class will query the MyGene.info API and retrieve the GO terms
+    associated with a list of gene ids.
+
+    Concretely, it first queries the mygene API to get the mygene ids for each
+    of the query genes. Then, it queries for the annotations, and parses the GO
+    terms together with all the other information.
+    """
+
+    def __init__(
+        self,
+        query: list,
+        species: str,
+        scopes: str,
+        entrezonly: bool,
+        ensemblonly: bool,
+        go_category: str = None,
+        go_evidence: str = None,
+    ) -> None:
+        self.query = query
+        self.fields = "go,symbol,name,taxid"
+        self.species = species
+        self.scopes = scopes
+        self.entrezonly = entrezonly
+        self.ensemblonly = ensemblonly
+        self.go_category = go_category
+        self.go_evidence = go_evidence
+        self.mg = mygene.MyGeneInfo()
+        self.idmap = self.query2idmap()
+        print(f"fetched {len(self.idmap)} ids from {len(self.query)} queries")
+
+    def query2idmap(self) -> dict:
+        """
+        It returns a dictionary with the mygene ids as keys and the query ids as values.
+        """
+        q = self.mg.querymany(
+            self.query,
+            scopes=self.scopes,
+            species=self.species,
+            entrezonly=self.entrezonly,
+            ensemblonly=self.ensemblonly,
+            returnall=True,
+        )
+        return {dic["_id"]: dic["query"] for dic in q["out"] if "_id" in dic}
+
+    def id2info(self) -> list:
+        """
+        It returns a list of dictionaries with the info returned from getgenes for all the query ids.
+        Each dictionary contains the annotations for the corresponding query gene.
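+        Illustrative shape (the exact fields depend on self.fields): [{'_id': '1017',
+        'go': {...}, 'symbol': 'CDK2', 'name': '...', 'taxid': 9606}, ...]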
+        """
+        return self.mg.getgenes(list(set(self.idmap)), fields=self.fields, species=self.species)
+
+    def parse_go_based_info(self) -> dict:
+        """
+        It queries the annotations for all query ids and then parses a GO-centric
+        dictionary. It is a dictionary of lists with the
+        following format: {go_id1: [go_term, gene1, gene2, ...], ...}
+        """
+        info = {}
+        for dic in self.id2info():
+            if "go" not in dic:
+                continue
+            if self.go_category:
+                dic["go"] = {
+                    category: dic["go"][category] for category in self.go_category.split(",") if category in dic["go"]
+                }
+            for category, go_list in dic["go"].items():
+                if not isinstance(go_list, list):
+                    go_list = [go_list]
+                for go in go_list:
+                    if (self.go_evidence) and (go["evidence"] not in self.go_evidence.split(",")):
+                        continue
+
+                    if go["id"] not in info:
+                        info[go["id"]] = [go["term"], self.idmap[dic["_id"]]]
+                    else:
+                        info[go["id"]].append(self.idmap[dic["_id"]])
+        return info
+
+    def parse_gene_based_info(self) -> dict:
+        """
+        It queries the annotations for all query ids and then parses a
+        gene-centric dictionary.
+
+        At the end it returns a dictionary {query gene: {...}} of dictionaries
+        with the following keys: query, mygene_id, go_id, go_term, go_evidence,
+        go_category, symbol, name, taxid.
+        """
+        info = {}
+        for dic in self.id2info():
+            if "go" not in dic:
+                continue
+            if self.go_category:
+                dic["go"] = {
+                    category: dic["go"][category] for category in self.go_category.split(",") if category in dic["go"]
+                }
+            for category, go_list in dic["go"].items():
+                if not isinstance(go_list, list):
+                    go_list = [go_list]
+                for go in go_list:
+                    if (self.go_evidence) and (go["evidence"] not in self.go_evidence.split(",")):
+                        continue
+
+                    current_info = {
+                        "query": self.idmap[dic["_id"]],
+                        "mygene_id": dic["_id"],
+                        "go_id": go["id"],
+                        "go_term": go["term"],
+                        "go_evidence": go["evidence"],
+                        "go_category": category,
+                        "symbol": dic["symbol"],
+                        "name": dic["name"],
+                        "taxid": dic["taxid"],
+                    }
+                    info[self.idmap[dic["_id"]]] = current_info
+        return info
+
+    def parse_and_save_to_gmt(self, filename: str) -> None:
+        """
+        It parses and saves GO-centric information to a gmt file.
+        The final gmt output will be sorted following the go id order.
+        """
+        info = self.parse_go_based_info()
+        info = dict(sorted(info.items(), key=lambda x: x[0]))
+        with open(filename, "w") as f:
+            for go_id, go_list in info.items():
+                tmp = sorted(go_list[1:])
+                f.write(go_id + "\\t" + go_list[0] + "\\t" + "\\t".join(tmp) + "\\n")
+        print(f"saved {len(info)} go terms to {filename}")
+
+    def parse_and_save_to_tsv(self, filename: str) -> None:
+        """
+        It parses and saves gene-centric information in a tsv file.
+        The final tsv output will be sorted following the input query gene list order.
+        """
+        info = self.parse_gene_based_info()
+        with open(filename, "w") as f:
+            f.write("\\t".join(next(iter(info.values())).keys()) + "\\n")  # header from any annotated gene, as the first query gene may lack annotations
+            for gene in self.query:  # sorted by query gene list
+                if gene in info:
+                    f.write("\\t".join([str(val) for val in info[gene].values()]) + "\\n")
+        print(f"saved {len(info)} gene centric info to {filename}")
+
+
+def load_list(filename: str, columname: str) -> list:
+    """
+    It loads the list of gene ids from a file.
+    The columname is the name of the column where the gene ids are stored.
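+    For example (hypothetical file): load_list('genes.tsv', 'gene_id') reads the
+    'gene_id' column of genes.tsv and returns something like
+    ['ENSMUSG00000000001', 'ENSMUSG00000000028', ...].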
+ """ + if filename.split(".")[-1] == "tsv": + sep = "\\t" + elif filename.split(".")[-1] == "csv": + sep = "," + else: + raise ValueError("The input file extension should be either tsv or csv.") + with open(filename, "r") as f: + idx = f.readline().strip().split(sep).index(columname) + return [line.strip().split(sep)[idx] for line in f] + + +if __name__ == "__main__": + # parse and print arguments + args = Arguments() + args.print_args() + + # load gene list + gene_list = load_list(args.input, args.columname) + + # run mygene api + mg = MyGene( + gene_list, + species=args.species, + scopes=args.scopes, + entrezonly=args.entrezonly, + ensemblonly=args.ensemblonly, + go_category=args.go_category, + go_evidence=args.go_evidence, + ) + + # parse annotations and save output files + mg.parse_and_save_to_gmt(args.output_gmt) + if args.generate_tsv: + mg.parse_and_save_to_tsv(args.output_tsv) + + # write versions to file + versions_this_module = {} + versions_this_module["${task.process}"] = Version.get_versions([argparse, mygene]) + with open("versions.yml", "w") as f: + f.write(Version.format_yaml_like(versions_this_module)) diff --git a/modules/nf-core/mygene/tests/default_tsv.config b/modules/nf-core/mygene/tests/default_tsv.config new file mode 100644 index 00000000..08bd6fac --- /dev/null +++ b/modules/nf-core/mygene/tests/default_tsv.config @@ -0,0 +1,3 @@ +process{ + ext.args = "--generate_tsv true" +} \ No newline at end of file diff --git a/modules/nf-core/mygene/tests/go_category.config b/modules/nf-core/mygene/tests/go_category.config new file mode 100644 index 00000000..771f8f4d --- /dev/null +++ b/modules/nf-core/mygene/tests/go_category.config @@ -0,0 +1,3 @@ +process { + ext.args = "--go_category bp,mf" +} \ No newline at end of file diff --git a/modules/nf-core/mygene/tests/go_evidence.config b/modules/nf-core/mygene/tests/go_evidence.config new file mode 100644 index 00000000..b19de214 --- /dev/null +++ b/modules/nf-core/mygene/tests/go_evidence.config @@ -0,0 +1,3 @@ +process { + ext.args = "--go_evidence EXP,IDA" +} \ No newline at end of file diff --git a/modules/nf-core/mygene/tests/main.nf.test b/modules/nf-core/mygene/tests/main.nf.test new file mode 100644 index 00000000..e5ba64ca --- /dev/null +++ b/modules/nf-core/mygene/tests/main.nf.test @@ -0,0 +1,106 @@ +nextflow_process { + + name "Test Process MYGENE" + script "../main.nf" + process "MYGENE" + + tag "modules" + tag "modules_nfcore" + tag "mygene" + + test("mygene - default options") { + + tag "default" + + when { + process { + """ + input[0] = [ + [id : 'test'], + file("https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/mus_musculus/rnaseq_expression/SRP254919.gene_meta.tsv") + ] + """ + } + } + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out.gmt).match("mygene - default options - gmt") }, + { assert snapshot(process.out.versions).match("mygene - default options - versions") } + ) + } + } + + test("mygene - default with tsv file") { + + tag "default_with_tsv" + config "./default_tsv.config" + + when { + process { + """ + input[0] = [ + [id : 'test'], + file("https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/mus_musculus/rnaseq_expression/SRP254919.gene_meta.tsv") + ] + """ + } + } + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out.gmt).match("mygene - default with tsv file - gmt") }, + { assert snapshot(process.out.tsv).match("mygene - default with tsv file - tsv") }, + { assert 
snapshot(process.out.versions).match("mygene - default with tsv file - versions") } + ) + } + } + + test("mygene - filter by go category") { + + tag "filter_by_go_category" + config "./go_category.config" + + when { + process { + """ + input[0] = [ + [id : 'test'], + file("https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/mus_musculus/rnaseq_expression/SRP254919.gene_meta.tsv") + ] + """ + } + } + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out.gmt).match("mygene - filter by go category - gmt") }, + { assert snapshot(process.out.versions).match("mygene - filter by go category - versions") } + ) + } + } + + test("mygene - filter by go evidence") { + + tag "filter_by_go_evidence" + config "./go_evidence.config" + + when { + process { + """ + input[0] = [ + [id : 'test'], + file("https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/mus_musculus/rnaseq_expression/SRP254919.gene_meta.tsv") + ] + """ + } + } + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out.gmt).match("mygene - filter by go evidence - gmt") }, + { assert snapshot(process.out.versions).match("mygene - filter by go evidence - versions") } + ) + } + } +} diff --git a/modules/nf-core/mygene/tests/main.nf.test.snap b/modules/nf-core/mygene/tests/main.nf.test.snap new file mode 100644 index 00000000..d6a334c7 --- /dev/null +++ b/modules/nf-core/mygene/tests/main.nf.test.snap @@ -0,0 +1,135 @@ +{ + "mygene - filter by go evidence - versions": { + "content": [ + [ + "versions.yml:md5,09d72645c3ae7e886af6e8bd2876c72b" + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-03-20T17:20:31.854823" + }, + "mygene - default options - versions": { + "content": [ + [ + "versions.yml:md5,09d72645c3ae7e886af6e8bd2876c72b" + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-03-20T17:19:43.081388" + }, + "mygene - default with tsv file - versions": { + "content": [ + [ + "versions.yml:md5,09d72645c3ae7e886af6e8bd2876c72b" + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-03-20T17:20:01.837699" + }, + "mygene - default options - gmt": { + "content": [ + [ + [ + { + "id": "test" + }, + "test.gmt:md5,d76d4d06dad199c5e3ecef7060876834" + ] + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-03-20T17:19:43.060437" + }, + "mygene - filter by go category - versions": { + "content": [ + [ + "versions.yml:md5,09d72645c3ae7e886af6e8bd2876c72b" + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-03-20T17:20:17.233994" + }, + "mygene - filter by go evidence - gmt": { + "content": [ + [ + [ + { + "id": "test" + }, + "test.gmt:md5,da6b31a5f889e3aedb16b4154f9652af" + ] + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-03-20T17:20:31.827798" + }, + "mygene - default with tsv file - tsv": { + "content": [ + [ + [ + { + "id": "test" + }, + "test.tsv:md5,018e23173b224cbf328751006593900e" + ] + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-03-20T17:20:01.81872" + }, + "mygene - default with tsv file - gmt": { + "content": [ + [ + [ + { + "id": "test" + }, + "test.gmt:md5,d76d4d06dad199c5e3ecef7060876834" + ] + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-03-20T17:20:01.79811" + }, + "mygene - filter by go category - 
gmt": { + "content": [ + [ + [ + { + "id": "test" + }, + "test.gmt:md5,213c1d1d2345df8ea51d67cb1670f4f7" + ] + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-03-20T17:20:17.208509" + } +} \ No newline at end of file diff --git a/modules/nf-core/mygene/tests/tags.yml b/modules/nf-core/mygene/tests/tags.yml new file mode 100644 index 00000000..c867c978 --- /dev/null +++ b/modules/nf-core/mygene/tests/tags.yml @@ -0,0 +1,2 @@ +mygene: + - "modules/nf-core/mygene/**" diff --git a/modules/nf-core/shinyngs/app/environment.yml b/modules/nf-core/shinyngs/app/environment.yml index 0e6de401..43a09fff 100644 --- a/modules/nf-core/shinyngs/app/environment.yml +++ b/modules/nf-core/shinyngs/app/environment.yml @@ -4,4 +4,4 @@ channels: - bioconda - defaults dependencies: - - bioconda::r-shinyngs=1.8.8 + - bioconda::r-shinyngs=2.0.0 diff --git a/modules/nf-core/shinyngs/app/main.nf b/modules/nf-core/shinyngs/app/main.nf index ef05a863..39b2db46 100644 --- a/modules/nf-core/shinyngs/app/main.nf +++ b/modules/nf-core/shinyngs/app/main.nf @@ -15,8 +15,8 @@ process SHINYNGS_APP { conda "${moduleDir}/environment.yml" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? - 'https://depot.galaxyproject.org/singularity/r-shinyngs:1.8.8--r43hdfd78af_0' : - 'biocontainers/r-shinyngs:1.8.8--r43hdfd78af_0' }" + 'https://depot.galaxyproject.org/singularity/r-shinyngs:2.0.0--r43hdfd78af_0' : + 'biocontainers/r-shinyngs:2.0.0--r43hdfd78af_0' }" input: tuple val(meta), path(sample), path(feature_meta), path(assay_files) // Experiment-level info diff --git a/modules/nf-core/shinyngs/app/tests/main.nf.test.snap b/modules/nf-core/shinyngs/app/tests/main.nf.test.snap index f87c17d7..434d540c 100644 --- a/modules/nf-core/shinyngs/app/tests/main.nf.test.snap +++ b/modules/nf-core/shinyngs/app/tests/main.nf.test.snap @@ -4,41 +4,41 @@ "data.rds", "app.R:md5,d41d8cd98f00b204e9800998ecf8427e", [ - "versions.yml:md5,9a3135ae8ff362a9671b280dcc5781da" + "versions.yml:md5,a6c3af4b2fd261b4049c92449ea6bb4d" ] ], "meta": { "nf-test": "0.8.4", "nextflow": "23.10.1" }, - "timestamp": "2024-05-03T08:47:11.758494" + "timestamp": "2024-06-25T09:43:49.880332" }, "mouse - multi matrix": { "content": [ "data.rds", "app.R:md5,bedcfc45b6cdcc2b8fe3627987e2b17a", [ - "versions.yml:md5,9a3135ae8ff362a9671b280dcc5781da" + "versions.yml:md5,a6c3af4b2fd261b4049c92449ea6bb4d" ] ], "meta": { "nf-test": "0.8.4", "nextflow": "23.10.1" }, - "timestamp": "2024-05-03T08:46:37.144273" + "timestamp": "2024-06-25T09:43:15.455356" }, "mouse - single matrix": { "content": [ "data.rds", "app.R:md5,bedcfc45b6cdcc2b8fe3627987e2b17a", [ - "versions.yml:md5,9a3135ae8ff362a9671b280dcc5781da" + "versions.yml:md5,a6c3af4b2fd261b4049c92449ea6bb4d" ] ], "meta": { "nf-test": "0.8.4", "nextflow": "23.10.1" }, - "timestamp": "2024-05-03T08:46:57.227288" + "timestamp": "2024-06-25T09:43:35.309081" } } \ No newline at end of file diff --git a/modules/nf-core/shinyngs/staticdifferential/environment.yml b/modules/nf-core/shinyngs/staticdifferential/environment.yml index bec57084..f352c61e 100644 --- a/modules/nf-core/shinyngs/staticdifferential/environment.yml +++ b/modules/nf-core/shinyngs/staticdifferential/environment.yml @@ -4,4 +4,4 @@ channels: - bioconda - defaults dependencies: - - bioconda::r-shinyngs=1.8.8 + - bioconda::r-shinyngs=2.0.0 diff --git a/modules/nf-core/shinyngs/staticdifferential/main.nf b/modules/nf-core/shinyngs/staticdifferential/main.nf index 
c61ccb4a..40582d66 100644 --- a/modules/nf-core/shinyngs/staticdifferential/main.nf +++ b/modules/nf-core/shinyngs/staticdifferential/main.nf @@ -4,8 +4,8 @@ process SHINYNGS_STATICDIFFERENTIAL { conda "${moduleDir}/environment.yml" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? - 'https://depot.galaxyproject.org/singularity/r-shinyngs:1.8.8--r43hdfd78af_0' : - 'biocontainers/r-shinyngs:1.8.8--r43hdfd78af_0' }" + 'https://depot.galaxyproject.org/singularity/r-shinyngs:2.0.0--r43hdfd78af_0' : + 'biocontainers/r-shinyngs:2.0.0--r43hdfd78af_0' }" input: tuple val(meta), path(differential_result) // Differential info: contrast and differential stats diff --git a/modules/nf-core/shinyngs/staticexploratory/environment.yml b/modules/nf-core/shinyngs/staticexploratory/environment.yml index 1c923f1b..74f9d52f 100644 --- a/modules/nf-core/shinyngs/staticexploratory/environment.yml +++ b/modules/nf-core/shinyngs/staticexploratory/environment.yml @@ -4,4 +4,4 @@ channels: - bioconda - defaults dependencies: - - bioconda::r-shinyngs=1.8.8 + - bioconda::r-shinyngs=2.0.0 diff --git a/modules/nf-core/shinyngs/staticexploratory/main.nf b/modules/nf-core/shinyngs/staticexploratory/main.nf index 1a3104b3..379a2b11 100644 --- a/modules/nf-core/shinyngs/staticexploratory/main.nf +++ b/modules/nf-core/shinyngs/staticexploratory/main.nf @@ -4,8 +4,8 @@ process SHINYNGS_STATICEXPLORATORY { conda "${moduleDir}/environment.yml" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? - 'https://depot.galaxyproject.org/singularity/r-shinyngs:1.8.8--r43hdfd78af_0' : - 'biocontainers/r-shinyngs:1.8.8--r43hdfd78af_0' }" + 'https://depot.galaxyproject.org/singularity/r-shinyngs:2.0.0--r43hdfd78af_0' : + 'biocontainers/r-shinyngs:2.0.0--r43hdfd78af_0' }" input: tuple val(meta), path(sample), path(feature_meta), path(assay_files) diff --git a/modules/nf-core/shinyngs/staticexploratory/tests/main.nf.test.snap b/modules/nf-core/shinyngs/staticexploratory/tests/main.nf.test.snap index b95d031c..289428ef 100644 --- a/modules/nf-core/shinyngs/staticexploratory/tests/main.nf.test.snap +++ b/modules/nf-core/shinyngs/staticexploratory/tests/main.nf.test.snap @@ -8,14 +8,14 @@ "pca3d.png", "sample_dendrogram.png", [ - "versions.yml:md5,526fbe61b95ad3a722d7470ca1874ca3" + "versions.yml:md5,e04025d7790ddfa09ba5bd719cfba8c7" ] ], "meta": { "nf-test": "0.8.4", "nextflow": "23.10.1" }, - "timestamp": "2024-05-03T08:48:20.908769" + "timestamp": "2024-06-25T10:24:53.456056" }, "mouse - defaults": { "content": [ @@ -26,14 +26,14 @@ "pca3d.png", "sample_dendrogram.png", [ - "versions.yml:md5,526fbe61b95ad3a722d7470ca1874ca3" + "versions.yml:md5,e04025d7790ddfa09ba5bd719cfba8c7" ] ], "meta": { "nf-test": "0.8.4", "nextflow": "23.10.1" }, - "timestamp": "2024-05-03T08:48:06.589763" + "timestamp": "2024-06-25T10:24:39.111271" }, "mouse - specify log": { "content": [ @@ -44,14 +44,14 @@ "pca3d.png", "sample_dendrogram.png", [ - "versions.yml:md5,526fbe61b95ad3a722d7470ca1874ca3" + "versions.yml:md5,e04025d7790ddfa09ba5bd719cfba8c7" ] ], "meta": { "nf-test": "0.8.4", "nextflow": "23.10.1" }, - "timestamp": "2024-05-03T08:48:41.352789" + "timestamp": "2024-06-25T10:25:14.646472" }, "mouse - html": { "content": [ @@ -67,13 +67,13 @@ false, false, [ - "versions.yml:md5,526fbe61b95ad3a722d7470ca1874ca3" + "versions.yml:md5,e04025d7790ddfa09ba5bd719cfba8c7" ] ], "meta": { "nf-test": "0.8.4", "nextflow": "23.10.1" }, - "timestamp": 
"2024-05-03T08:49:04.969108" + "timestamp": "2024-06-25T10:25:38.256352" } } \ No newline at end of file diff --git a/modules/nf-core/shinyngs/validatefomcomponents/environment.yml b/modules/nf-core/shinyngs/validatefomcomponents/environment.yml index 4f3067bc..07485298 100644 --- a/modules/nf-core/shinyngs/validatefomcomponents/environment.yml +++ b/modules/nf-core/shinyngs/validatefomcomponents/environment.yml @@ -4,4 +4,4 @@ channels: - bioconda - defaults dependencies: - - bioconda::r-shinyngs=1.8.8 + - bioconda::r-shinyngs=2.0.0 diff --git a/modules/nf-core/shinyngs/validatefomcomponents/main.nf b/modules/nf-core/shinyngs/validatefomcomponents/main.nf index fad3948a..bedab3e6 100644 --- a/modules/nf-core/shinyngs/validatefomcomponents/main.nf +++ b/modules/nf-core/shinyngs/validatefomcomponents/main.nf @@ -4,8 +4,8 @@ process SHINYNGS_VALIDATEFOMCOMPONENTS { conda "${moduleDir}/environment.yml" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? - 'https://depot.galaxyproject.org/singularity/r-shinyngs:1.8.8--r43hdfd78af_0' : - 'biocontainers/r-shinyngs:1.8.8--r43hdfd78af_0' }" + 'https://depot.galaxyproject.org/singularity/r-shinyngs:2.0.0--r43hdfd78af_0' : + 'biocontainers/r-shinyngs:2.0.0--r43hdfd78af_0' }" input: tuple val(meta), path(sample), path(assay_files) diff --git a/nextflow.config b/nextflow.config index 6390c9bf..e608354d 100644 --- a/nextflow.config +++ b/nextflow.config @@ -176,7 +176,6 @@ params { // ShinyNGS shinyngs_build_app = true - shinyngs_guess_unlog_matrices = true // Note: for shinyapps deployment, in addition to setting these values, // SHINYAPPS_TOKEN and SHINYAPPS_SECRET must be available to the @@ -194,15 +193,18 @@ params { igenomes_ignore = false // Boilerplate options - outdir = null - publish_dir_mode = 'copy' - email = null - email_on_fail = null - plaintext_email = false - monochrome_logs = false - hook_url = null - help = false - version = false + outdir = null + publish_dir_mode = 'copy' + email = null + email_on_fail = null + plaintext_email = false + monochrome_logs = false + hook_url = null + help = false + help_full = false + show_hidden = false + version = false + pipelines_testdata_base_path = 'https://raw.githubusercontent.com/nf-core/test-datasets/' // Config options config_profile_name = null @@ -212,131 +214,141 @@ params { config_profile_contact = null config_profile_url = null - // Max resource options - // Defaults only, expecting to be overwritten - max_memory = '128.GB' - max_cpus = 16 - max_time = '240.h' - // Schema validation default options - validationFailUnrecognisedParams = false - validationLenientMode = false - validationSchemaIgnoreParams = 'genomes,igenomes_base' - validationShowHiddenParams = false - validate_params = true + validate_params = true + + // ---------------------------------------------- // + // Experimental analysis options // + // ---------------------------------------------- // + + tools = "${projectDir}/assets/tools_samplesheet.csv" + pathway = null + + // propd options + propd_alpha = null + propd_moderated = true + propd_fdr = 0.05 + propd_permutation = 0 + propd_ncutoffs = 100 + propd_weighted_degree = false + + // propr options + propr_metric = 'rho' + propr_ivar = 'clr' + propr_alpha = null + propr_fdr = 0.05 + propr_permutation = 100 + propr_ncutoffs = 100 + propr_tails = "right" + + // grea options + grea_set_min = 15 + grea_set_max = 500 + grea_permutation = 100 } // Load base.config by default for all pipelines includeConfig 
'conf/base.config' -// Load nf-core custom profiles from different Institutions -try { - includeConfig "${params.custom_config_base}/nfcore_custom.config" -} catch (Exception e) { - System.err.println("WARNING: Could not load nf-core/config profiles: ${params.custom_config_base}/nfcore_custom.config") -} - -// Load nf-core/differentialabundance custom profiles from different institutions. -// Warning: Uncomment only if a pipeline-specific institutional config already exists on nf-core/configs! -// try { -// includeConfig "${params.custom_config_base}/pipeline/differentialabundance.config" -// } catch (Exception e) { -// System.err.println("WARNING: Could not load nf-core/config/differentialabundance profiles: ${params.custom_config_base}/pipeline/differentialabundance.config") -// } profiles { debug { - dumpHashes = true - process.beforeScript = 'echo $HOSTNAME' - cleanup = false + dumpHashes = true + process.beforeScript = 'echo $HOSTNAME' + cleanup = false nextflow.enable.configProcessNamesValidation = true } conda { - conda.enabled = true - docker.enabled = false - conda.enabled = true - singularity.enabled = false - podman.enabled = false - shifter.enabled = false - charliecloud.enabled = false - channels = ['conda-forge', 'bioconda', 'defaults'] - apptainer.enabled = false + conda.enabled = true + docker.enabled = false + singularity.enabled = false + podman.enabled = false + shifter.enabled = false + charliecloud.enabled = false + conda.channels = ['conda-forge', 'bioconda'] + apptainer.enabled = false } mamba { - conda.enabled = true - conda.useMamba = true - docker.enabled = false - conda.enabled = true - singularity.enabled = false - podman.enabled = false - shifter.enabled = false - charliecloud.enabled = false - apptainer.enabled = false + conda.enabled = true + conda.useMamba = true + docker.enabled = false + singularity.enabled = false + podman.enabled = false + shifter.enabled = false + charliecloud.enabled = false + apptainer.enabled = false } docker { - docker.enabled = true - conda.enabled = false - singularity.enabled = false - podman.enabled = false - shifter.enabled = false - charliecloud.enabled = false - apptainer.enabled = false - docker.runOptions = '-u $(id -u):$(id -g)' + docker.enabled = true + conda.enabled = false + singularity.enabled = false + podman.enabled = false + shifter.enabled = false + charliecloud.enabled = false + apptainer.enabled = false + docker.runOptions = '-u $(id -u):$(id -g)' } arm { - docker.runOptions = '-u $(id -u):$(id -g) --platform=linux/amd64' + docker.runOptions = '-u $(id -u):$(id -g) --platform=linux/amd64' } singularity { - singularity.enabled = true - singularity.autoMounts = true - conda.enabled = false - docker.enabled = false - podman.enabled = false - shifter.enabled = false - charliecloud.enabled = false - apptainer.enabled = false + singularity.enabled = true + singularity.autoMounts = true + conda.enabled = false + docker.enabled = false + podman.enabled = false + shifter.enabled = false + charliecloud.enabled = false + apptainer.enabled = false } podman { - podman.enabled = true - conda.enabled = false - docker.enabled = false - singularity.enabled = false - shifter.enabled = false - charliecloud.enabled = false - apptainer.enabled = false + podman.enabled = true + conda.enabled = false + docker.enabled = false + singularity.enabled = false + shifter.enabled = false + charliecloud.enabled = false + apptainer.enabled = false } shifter { - shifter.enabled = true - conda.enabled = false - docker.enabled = false - 
singularity.enabled = false - podman.enabled = false - charliecloud.enabled = false - apptainer.enabled = false + shifter.enabled = true + conda.enabled = false + docker.enabled = false + singularity.enabled = false + podman.enabled = false + charliecloud.enabled = false + apptainer.enabled = false } charliecloud { - charliecloud.enabled = true - conda.enabled = false - docker.enabled = false - singularity.enabled = false - podman.enabled = false - shifter.enabled = false - apptainer.enabled = false + charliecloud.enabled = true + conda.enabled = false + docker.enabled = false + singularity.enabled = false + podman.enabled = false + shifter.enabled = false + apptainer.enabled = false } apptainer { - apptainer.enabled = true - apptainer.autoMounts = true - conda.enabled = false - docker.enabled = false - singularity.enabled = false - podman.enabled = false - shifter.enabled = false - charliecloud.enabled = false + apptainer.enabled = true + apptainer.autoMounts = true + conda.enabled = false + docker.enabled = false + singularity.enabled = false + podman.enabled = false + shifter.enabled = false + charliecloud.enabled = false + } + wave { + apptainer.ociAutoPull = true + singularity.ociAutoPull = true + wave.enabled = true + wave.freeze = true + wave.strategy = 'conda,container' } gitpod { - executor.name = 'local' - executor.cpus = 4 - executor.memory = 8.GB + executor.name = 'local' + executor.cpus = 4 + executor.memory = 8.GB } test { includeConfig 'conf/test.config' } test_nogtf { includeConfig 'conf/test_nogtf.config' } @@ -348,27 +360,24 @@ profiles { test_affy { includeConfig 'conf/test_affy.config' } test_maxquant { includeConfig 'conf/test_maxquant.config' } test_soft {includeConfig 'conf/test_soft.config' } + test_experimental {includeConfig 'conf/test_experimental.config' } } -// Set default registry for Apptainer, Docker, Podman and Singularity independent of -profile -// Will not be used unless Apptainer / Docker / Podman / Singularity are enabled -// Set to your registry if you have a mirror of containers -apptainer.registry = 'quay.io' -docker.registry = 'quay.io' -podman.registry = 'quay.io' -singularity.registry = 'quay.io' +// Load nf-core custom profiles from different Institutions +includeConfig !System.getenv('NXF_OFFLINE') && params.custom_config_base ? "${params.custom_config_base}/nfcore_custom.config" : "/dev/null" -// Nextflow plugins -plugins { - id 'nf-validation@1.1.3' // Validation of pipeline parameters and creation of an input channel from a sample sheet -} +// Set default registry for Apptainer, Docker, Podman, Charliecloud and Singularity independent of -profile +// Will not be used unless Apptainer / Docker / Podman / Charliecloud / Singularity are enabled +// Set to your registry if you have a mirror of containers +apptainer.registry = 'quay.io' +docker.registry = 'quay.io' +podman.registry = 'quay.io' +singularity.registry = 'quay.io' +charliecloud.registry = 'quay.io' // Load igenomes.config if required -if (!params.igenomes_ignore) { - includeConfig 'conf/igenomes.config' -} else { - params.genomes = [:] -} +includeConfig !params.igenomes_ignore ? 'conf/igenomes.config' : 'conf/igenomes_ignored.config' + // Export these variables to prevent local Python/R libraries from conflicting with those in the container // The JULIA depot path has been adjusted to a fixed path `/usr/local/share/julia` that needs to be used for packages in the container. // See https://apeltzer.github.io/post/03-julia-lang-nextflow/ for details on that. 
Once we have a common agreement on where to keep Julia packages, this is adjustable. @@ -380,8 +389,15 @@ env { JULIA_DEPOT_PATH = "/usr/local/share/julia" } -// Capture exit codes from upstream processes when piping -process.shell = ['/bin/bash', '-euo', 'pipefail'] +// Set bash options +process.shell = """\ +bash + +set -e # Exit if a tool returns a non-zero status/exit code +set -u # Treat unset variables and parameters as an error +set -o pipefail # Returns the status of the last command to exit with a non-zero status or zero if all successfully execute +set -C # No clobber - prevent output redirection from overwriting files. +""" // Disable process selector warnings by default. Use debug profile to enable warnings. nextflow.enable.configProcessNamesValidation = false @@ -410,43 +426,46 @@ manifest { homePage = 'https://github.com/nf-core/differentialabundance' description = 'Differential abundance analysis' mainScript = 'main.nf' - nextflowVersion = '!>=23.10.0' - version = '1.5.0' + nextflowVersion = '!>=24.04.2' + version = '1.6.0dev' doi = '10.5281/zenodo.7568000' } -// Load modules.config for DSL2 module specific options -includeConfig 'conf/modules.config' +// Nextflow plugins +plugins { + id 'nf-schema@2.1.1' // Validation of pipeline parameters and creation of an input channel from a sample sheet +} -// Function to ensure that resource requirements don't go beyond -// a maximum limit -def check_max(obj, type) { - if (type == 'memory') { - try { - if (obj.compareTo(params.max_memory as nextflow.util.MemoryUnit) == 1) - return params.max_memory as nextflow.util.MemoryUnit - else - return obj - } catch (all) { - println " ### ERROR ### Max memory '${params.max_memory}' is not valid! Using default value: $obj" - return obj - } - } else if (type == 'time') { - try { - if (obj.compareTo(params.max_time as nextflow.util.Duration) == 1) - return params.max_time as nextflow.util.Duration - else - return obj - } catch (all) { - println " ### ERROR ### Max time '${params.max_time}' is not valid! Using default value: $obj" - return obj - } - } else if (type == 'cpus') { - try { - return Math.min( obj, params.max_cpus as int ) - } catch (all) { - println " ### ERROR ### Max cpus '${params.max_cpus}' is not valid! Using default value: $obj" - return obj - } +validation { + defaultIgnoreParams = ["genomes"] + help { + enabled = true + command = "nextflow run $manifest.name -profile --input samplesheet.csv --outdir " + fullParameter = "help_full" + showHiddenParameter = "show_hidden" + beforeText = """ +-\033[2m----------------------------------------------------\033[0m- + \033[0;32m,--.\033[0;30m/\033[0;32m,-.\033[0m +\033[0;34m ___ __ __ __ ___ \033[0;32m/,-._.--~\'\033[0m +\033[0;34m |\\ | |__ __ / ` / \\ |__) |__ \033[0;33m} {\033[0m +\033[0;34m | \\| | \\__, \\__/ | \\ |___ \033[0;32m\\`-._,-`-,\033[0m + \033[0;32m`._,._,\'\033[0m +\033[0;35m ${manifest.name} ${manifest.version}\033[0m +-\033[2m----------------------------------------------------\033[0m- +""" + afterText = """${manifest.doi ? "* The pipeline\n" : ""}${manifest.doi.tokenize(",").collect { " https://doi.org/${it.trim().replace('https://doi.org/','')}"}.join("\n")}${manifest.doi ? 
"\n" : ""} +* The nf-core framework + https://doi.org/10.1038/s41587-020-0439-x + +* Software dependencies + https://github.com/${manifest.name}/blob/master/CITATIONS.md +""" + } + summary { + beforeText = validation.help.beforeText + afterText = validation.help.afterText } } + +// Load modules.config for DSL2 module specific options +includeConfig 'conf/modules.config' diff --git a/nextflow_schema.json b/nextflow_schema.json index 7e329647..c2695f71 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -1,10 +1,10 @@ { - "$schema": "http://json-schema.org/draft-07/schema", + "$schema": "https://json-schema.org/draft/2020-12/schema", "$id": "https://raw.githubusercontent.com/nf-core/differentialabundance/master/nextflow_schema.json", "title": "nf-core/differentialabundance pipeline parameters", "description": "Differential abundance analysis", "type": "object", - "definitions": { + "$defs": { "input_output_options": { "title": "Input/output options", "type": "object", @@ -24,7 +24,7 @@ "default": "rnaseq", "description": "A string identifying the technology used to produce the data", "help_text": "Currently 'rnaseq' or 'affy_array' may be specified.", - "enum": ["rnaseq", "affy_array", "maxquant", "geo_soft_file"], + "enum": ["rnaseq", "affy_array", "maxquant", "geo_soft_file", "experimental"], "fa_icon": "far fa-keyboard" }, "input": { @@ -61,6 +61,31 @@ } } }, + "run_experimental_branch_through_the_toolsheet_config": { + "title": "Run experimental branch through the toolsheet config", + "type": "object", + "fa_icon": "fas fa-terminal", + "description": "Values required for experimental analysis", + "properties": { + "tools": { + "type": "string", + "format": "file-path", + "exists": true, + "mimetype": "text/csv", + "schema": "assets/schema_tools.json", + "pattern": "^\\S+\\.(csv|tsv|yaml)$", + "description": "Path to comma-separated file containing samplesheet", + "help_text": "This file defines possible combinations of tools, which are to be run by the pipeline", + "default": "${projectDir}/assets/tools_samplesheet.csv" + }, + "pathway": { + "type": "string", + "fa_icon": "fas fa-border-all", + "description": "(experimantal only): Choose a (list of) pathway from those predefined in the tool sheet", + "help_text": "Choose the a subset of pathways to run. Pathways are defined in the tool sheet." + } + } + }, "abundance_values": { "title": "Abundance values", "type": "object", @@ -84,14 +109,12 @@ }, "affy_cel_files_archive": { "type": "string", - "default": "null", "description": "Alternative to matrix: a compressed CEL files archive such as often found in GEO", "fa_icon": "fas fa-file-archive", "help_text": "Use this option to provide a raw archive of CEL files from Affymetrix arrays. Will be ignored if a matrix is specified." }, "querygse": { "type": "string", - "default": "null", "description": "Use SOFT files from GEO by providing the GSE study identifier", "fa_icon": "fas fa-keyboard", "help_text": "Use this option to provide a GSE study identifier." @@ -727,6 +750,89 @@ }, "fa_icon": "fas fa-border-all" }, + "propd_specific_options": { + "title": "propd specific options", + "type": "object", + "description": "Parameters to run differential proportionality", + "default": "", + "properties": { + "propd_alpha": { + "type": "string", + "default": "null", + "description": "Alpha parameter for Box-Cox transformation. Default is skipped, and log transformation is used." 
+ }, + "propd_moderated": { + "type": "boolean", + "default": true, + "description": "Use moderated version of differential proportionality coefficients. This is usually needed when p>>n." + }, + "propd_fdr": { + "type": "number", + "default": 0.05, + "description": "FDR threshold for filtering the significant pairs." + }, + "propd_permutation": { + "type": "integer", + "default": 0, + "description": "When this parameter is 0, use the default statistical test to determine the FDR. When this number is > 0, run permutation test to get the FDR with as many permutations as defined by this parameter. Usually a minimum number of 100 permutations is needed." + }, + "propd_ncutoffs": { + "type": "integer", + "default": 100, + "description": "Because it is expensive to calculate an associated p-value for all the pairs through all the permutation tests, this number is used to define on how many values to calculate the FDR. The higher this value is, the higher the granularity." + }, + "propd_weighted_degree": { + "type": "boolean", + "default": false, + "description": "If true, use weighted degree to filter the hub genes. Otherwie, use the degree." + } + } + }, + "propr_specific_options": { + "title": "propr specific options", + "type": "object", + "description": "Parameters to run correlation analysis through propr package", + "default": "", + "properties": { + "propr_metric": { + "type": "string", + "default": "rho", + "description": "The correlation metric to use.", + "enum": ["rho", "cor", "pcor", "pcor.shrink", "pcor.bshrink", "vlr", "phi", "phs"] + }, + "propr_ivar": { + "type": "string", + "default": "clr", + "description": "Which logratio transformation to use. 'clr' means centered logratio transformation, which uses the sample geometric mean as reference. 'alr' means additive logratio transformation, and it can use any gene or genes as reference(s). One can give the gene name(s) or index(es) as reference(s)." + }, + "propr_alpha": { + "type": "string", + "default": "null", + "description": "Alpha parameter for Box-Cox transformation. Default is skipped, and log transformation is used." + }, + "propr_fdr": { + "type": "number", + "default": 0.05, + "description": "FDR threshold for filtering the significant pairs." + }, + "propr_permutation": { + "type": "integer", + "default": 100, + "description": "Number of permutations for the permutation test that is used to compute the FDR of observing a given correlation coefficient value by chance." + }, + "propr_ncutoffs": { + "type": "integer", + "default": 100, + "description": "Because it is expensive to calculate an associated p-value for all the pairs through all the permutation tests, this number is used to define on how many values to calculate the FDR. The higher this value is, the higher the granularity." + }, + "propr_tails": { + "type": "string", + "default": "right", + "enum": ["right", "both"], + "description": "Use one-sided 'right'-tailed test or 'both' for two-sided test with symmetric cutpoints." 
+ } + } + }, "gsea": { "title": "GSEA", "type": "object", @@ -950,6 +1056,29 @@ }, "fa_icon": "fas fa-layer-group" }, + "grea_specific_options": { + "title": "GREA specific options", + "type": "object", + "description": "Parameters to run GREA (Gene Ratio Enrichment Analysis)", + "default": "", + "properties": { + "grea_set_min": { + "type": "integer", + "default": 15, + "description": "Min size: exclude smaller sets" + }, + "grea_set_max": { + "type": "integer", + "default": 500, + "description": "Max size: exclude larger sets" + }, + "grea_permutation": { + "type": "integer", + "default": 100, + "description": "Number of permutations for the permutation test that is used to compute the FDR of observing a given odds ratio by chance." + } + } + }, "shiny_app_settings": { "title": "Shiny app settings", "type": "object", @@ -979,12 +1108,6 @@ "default": "null", "description": "The name of the app to push to in your shinyapps.io account", "fa_icon": "fas fa-file-signature" - }, - "shinyngs_guess_unlog_matrices": { - "type": "boolean", - "default": true, - "description": "Should we guess the log status of matrices and unlog for the app?", - "help_text": "In the app context, it's usually helpful if things are not in log scale, so that e.g. fold changes make some sense with respect to observed values. This flag will cause the shinyngs app-building script to make a guess based on observed values as to the log status of input matrices, and adjust the loading accordingly." } }, "fa_icon": "fab fa-app-store-ios" @@ -1108,6 +1231,14 @@ "fa_icon": "fas fa-ban", "hidden": true, "help_text": "Do not load `igenomes.config` when running the pipeline. You may choose this option if you observe clashes between custom parameters and those supplied in `igenomes.config`." + }, + "igenomes_base": { + "type": "string", + "format": "directory-path", + "description": "The base path to the igenomes reference files", + "fa_icon": "fas fa-ban", + "hidden": true, + "default": "s3://ngi-igenomes/igenomes/" } } }, @@ -1159,41 +1290,6 @@ } } }, - "max_job_request_options": { - "title": "Max job request options", - "type": "object", - "fa_icon": "fab fa-acquisitions-incorporated", - "description": "Set the top limit for requested resources for any single job.", - "help_text": "If you are running on a smaller system, a pipeline step requesting more resources than are available may cause the Nextflow to stop the run with an error. These options allow you to cap the maximum resources requested by any single job so that the pipeline will run on your system.\n\nNote that you can not _increase_ the resources requested by any job using these options. For that you will need your own configuration file. See [the nf-core website](https://nf-co.re/usage/configuration) for details.", - "properties": { - "max_cpus": { - "type": "integer", - "description": "Maximum number of CPUs that can be requested for any single job.", - "default": 16, - "fa_icon": "fas fa-microchip", - "hidden": true, - "help_text": "Use to set an upper-limit for the CPU requirement for each process. Should be an integer e.g. `--max_cpus 1`" - }, - "max_memory": { - "type": "string", - "description": "Maximum amount of memory that can be requested for any single job.", - "default": "128.GB", - "fa_icon": "fas fa-memory", - "pattern": "^\\d+(\\.\\d+)?\\.?\\s*(K|M|G|T)?B$", - "hidden": true, - "help_text": "Use to set an upper-limit for the memory requirement for each process. Should be a string in the format integer-unit e.g. 
`--max_memory '8.GB'`" - }, - "max_time": { - "type": "string", - "description": "Maximum amount of time that can be requested for any single job.", - "default": "240.h", - "fa_icon": "far fa-clock", - "pattern": "^(\\d+\\.?\\s*(s|m|h|d|day)\\s*)+$", - "hidden": true, - "help_text": "Use to set an upper-limit for the time requirement for each process. Should be a string in the format integer-unit e.g. `--max_time '2.h'`" - } - } - }, "generic_options": { "title": "Generic options", "type": "object", @@ -1201,12 +1297,6 @@ "description": "Less common options for the pipeline, typically set in a config file.", "help_text": "These options are common to all nf-core pipelines and allow you to customise some of the core preferences for how the pipeline runs.\n\nTypically these options would be set in a Nextflow config file loaded for all pipeline runs, such as `~/.nextflow/config`.", "properties": { - "help": { - "type": "boolean", - "description": "Display help text.", - "fa_icon": "fas fa-question-circle", - "hidden": true - }, "version": { "type": "boolean", "description": "Display version and exit.", @@ -1253,90 +1343,85 @@ "default": true, "fa_icon": "fas fa-check-square" }, - "validationShowHiddenParams": { - "type": "boolean", - "fa_icon": "far fa-eye-slash", - "description": "Show all params when using `--help`", - "hidden": true, - "help_text": "By default, parameters set as _hidden_ in the schema are not shown on the command line when a user runs with `--help`. Specifying this option will tell the pipeline to show all parameters." - }, - "validationFailUnrecognisedParams": { - "type": "boolean", - "fa_icon": "far fa-check-circle", - "description": "Validation of parameters fails when an unrecognised parameter is found.", - "hidden": true, - "help_text": "By default, when an unrecognised parameter is found, it returns a warinig." - }, - "validationLenientMode": { - "type": "boolean", + "pipelines_testdata_base_path": { + "type": "string", "fa_icon": "far fa-check-circle", - "description": "Validation of parameters in lenient more.", - "hidden": true, - "help_text": "Allows string values that are parseable as numbers or booleans. For further information see [JSONSchema docs](https://github.com/everit-org/json-schema#lenient-mode)." 
+ "description": "Base URL or local path to location of pipeline test dataset files", + "default": "https://raw.githubusercontent.com/nf-core/test-datasets/", + "hidden": true } } } }, "allOf": [ { - "$ref": "#/definitions/input_output_options" + "$ref": "#/$defs/input_output_options" + }, + { + "$ref": "#/$defs/run_experimental_branch_through_the_toolsheet_config" + }, + { + "$ref": "#/$defs/abundance_values" + }, + { + "$ref": "#/$defs/observations_options" }, { - "$ref": "#/definitions/abundance_values" + "$ref": "#/$defs/features_options" }, { - "$ref": "#/definitions/observations_options" + "$ref": "#/$defs/affy_input_options" }, { - "$ref": "#/definitions/features_options" + "$ref": "#/$defs/proteus_input_options" }, { - "$ref": "#/definitions/affy_input_options" + "$ref": "#/$defs/filtering" }, { - "$ref": "#/definitions/proteus_input_options" + "$ref": "#/$defs/exploratory_analysis" }, { - "$ref": "#/definitions/filtering" + "$ref": "#/$defs/differential_analysis" }, { - "$ref": "#/definitions/exploratory_analysis" + "$ref": "#/$defs/deseq2_specific_options_rna_seq_only" }, { - "$ref": "#/definitions/differential_analysis" + "$ref": "#/$defs/limma_specific_options_microarray_only" }, { - "$ref": "#/definitions/deseq2_specific_options_rna_seq_only" + "$ref": "#/$defs/propd_specific_options" }, { - "$ref": "#/definitions/limma_specific_options_microarray_only" + "$ref": "#/$defs/propr_specific_options" }, { - "$ref": "#/definitions/gsea" + "$ref": "#/$defs/gsea" }, { - "$ref": "#/definitions/gprofiler2" + "$ref": "#/$defs/gprofiler2" }, { - "$ref": "#/definitions/shiny_app_settings" + "$ref": "#/$defs/grea_specific_options" }, { - "$ref": "#/definitions/gene_set_options" + "$ref": "#/$defs/shiny_app_settings" }, { - "$ref": "#/definitions/reporting_options" + "$ref": "#/$defs/gene_set_options" }, { - "$ref": "#/definitions/reference_genome_options" + "$ref": "#/$defs/reporting_options" }, { - "$ref": "#/definitions/institutional_config_options" + "$ref": "#/$defs/reference_genome_options" }, { - "$ref": "#/definitions/max_job_request_options" + "$ref": "#/$defs/institutional_config_options" }, { - "$ref": "#/definitions/generic_options" + "$ref": "#/$defs/generic_options" } ] } diff --git a/pyproject.toml b/pyproject.toml deleted file mode 100644 index 56110621..00000000 --- a/pyproject.toml +++ /dev/null @@ -1,15 +0,0 @@ -# Config file for Python. Mostly used to configure linting of bin/*.py with Ruff. -# Should be kept the same as nf-core/tools to avoid fighting with template synchronisation. 
-[tool.ruff] -line-length = 120 -target-version = "py38" -cache-dir = "~/.cache/ruff" - -[tool.ruff.lint] -select = ["I", "E1", "E4", "E7", "E9", "F", "UP", "N"] - -[tool.ruff.lint.isort] -known-first-party = ["nf_core"] - -[tool.ruff.lint.per-file-ignores] -"__init__.py" = ["E402", "F401"] diff --git a/subworkflows/local/correlation/main.nf b/subworkflows/local/correlation/main.nf new file mode 100644 index 00000000..d62f435f --- /dev/null +++ b/subworkflows/local/correlation/main.nf @@ -0,0 +1,35 @@ +// +// Perform correlation analysis +// +include {PROPR_PROPR as PROPR} from "../../../modules/local/propr/propr/main.nf" + +workflow CORRELATION { + take: + ch_counts // [ meta, counts] with meta keys: method, args_cor + + main: + + // initialize empty results channels + ch_matrix = Channel.empty() + ch_adjacency = Channel.empty() + + // branch tools to select the correct correlation analysis method + ch_counts + .branch { + propr: it[0]["method"] == "propr" + } + .set { ch_counts } + + // ---------------------------------------------------- + // Perform correlation analysis with propr + // ---------------------------------------------------- + PROPR(ch_counts.propr.unique()) + ch_matrix = PROPR.out.matrix.mix(ch_matrix) + ch_adjacency = PROPR.out.adjacency.mix(ch_adjacency) + + // TODO: divide propr module into cor, propr, pcor, pcorbshrink, etc. + + emit: + matrix = ch_matrix + adjacency = ch_adjacency +} diff --git a/subworkflows/local/differential/main.nf b/subworkflows/local/differential/main.nf new file mode 100644 index 00000000..680e0c91 --- /dev/null +++ b/subworkflows/local/differential/main.nf @@ -0,0 +1,183 @@ +// +// Perform differential analysis +// +include { PROPR_PROPD as PROPD } from "../../../modules/local/propr/propd/main.nf" +include { DESEQ2_DIFFERENTIAL as DESEQ2_NORM } from "../../../modules/nf-core/deseq2/differential/main" +include { DESEQ2_DIFFERENTIAL } from '../../../modules/nf-core/deseq2/differential/main' +include { LIMMA_DIFFERENTIAL } from '../../../modules/nf-core/limma/differential/main' +include { FILTER_DIFFTABLE as FILTER_DIFFTABLE_DESEQ2 } from '../../../modules/local/filter_difftable' +include { FILTER_DIFFTABLE as FILTER_DIFFTABLE_LIMMA } from '../../../modules/local/filter_difftable' + +workflow DIFFERENTIAL { + take: + ch_counts // [ meta_exp, counts ] with meta keys: method, args_diff + ch_samplesheet // [ meta_exp, samplesheet ] + ch_contrasts // [ meta_contrast, contrast_variable, reference, target ] + + main: + + // initialize empty results channels + // NOTE that ch_results pairwise and adjacency are a special case of results, which stores pairwise DE results (see propd) + // whereas ch_results stores gene-wise DE results (traditional methods like deseq2 and limma only provide gene-wise results) + + ch_results_genewise = Channel.empty() + ch_results_genewise_filtered = Channel.empty() + ch_results_pairwise = Channel.empty() + ch_results_pairwise_filtered = Channel.empty() + ch_adjacency = Channel.empty() + ch_norm = Channel.empty() // channel to store the normalized data + ch_norm_for_plotting = Channel.empty() // channel to store the normalized data for plotting + + // branch the data channel to the correct differential analysis method, based on the method key specified in the meta data + ch_counts + .branch { + propd: it[0]["method"] == "propd" + deseq2: it[0]["method"] == "deseq2" + limma: it[0]["method"] == "limma" + } + .set { ch_counts } + + // ---------------------------------------------------- + // Perform differential analysis with 
PROPD + // ---------------------------------------------------- + + // TODO propd currently don't support blocking, so we should not run propd with same contrast_variable, reference and target, + // but different blocking variable, since it will simply run the same analysis again. + + ch_counts.propd + .combine(ch_samplesheet) + .filter{ meta_counts, counts, meta_samplesheet, samplesheet -> meta_counts.subMap(meta_samplesheet.keySet()) == meta_samplesheet } + .combine(ch_contrasts) + .map { + meta_data, counts, meta_samplesheet, samplesheet, meta_contrast, contrast_variable, reference, target -> + def meta = meta_data.clone() + ['contrast': meta_contrast.id] + return [ meta, counts, samplesheet, contrast_variable, reference, target ] + } + .unique() + .set { ch_propd } + + PROPD(ch_propd) + + ch_results_genewise = PROPD.out.connectivity.mix(ch_results_genewise) + ch_results_genewise_filtered = PROPD.out.hub_genes.mix(ch_results_genewise_filtered) + ch_results_pairwise = PROPD.out.results.mix(ch_results_pairwise) + ch_results_pairwise_filtered = PROPD.out.results_filtered.mix(ch_results_pairwise_filtered) + ch_adjacency = PROPD.out.adjacency.mix(ch_adjacency) + + // ---------------------------------------------------- + // Perform differential analysis with DESEQ2 + // ---------------------------------------------------- + + // parse input channels for deseq2 modules + + if (params.transcript_length_matrix) { ch_transcript_lengths = Channel.of([ exp_meta, file(params.transcript_length_matrix, checkIfExists: true)]).first() } else { ch_transcript_lengths = Channel.of([[],[]]) } + if (params.control_features) { ch_control_features = Channel.of([ exp_meta, file(params.control_features, checkIfExists: true)]).first() } else { ch_control_features = Channel.of([[],[]]) } + ch_counts.deseq2 + .combine(ch_samplesheet) + .combine(ch_contrasts) + .combine(ch_transcript_lengths) + .combine(ch_control_features) + .unique() + .multiMap { + meta_counts, counts, meta_samplesheet, samplesheet, meta_contrast, contrast_variable, reference, target, meta_lengths, lengths, meta_control, control -> + def meta = meta_counts.clone() + ['contrast': meta_contrast.id] + contrast: [ meta, contrast_variable, reference, target ] + samples_and_matrix: [ meta, samplesheet, counts ] + control_features: [ meta, control ] + transcript_lengths: [ meta, lengths ] + } + .set { ch_deseq2 } + + // normalize the data using deseq2 + // NOTE that for the moment the output of this module is only needed for plot_exploratory, + // and as input for GSEA (for the moment GSEA cannot take the preranked DE results from DESEQ2_DIFFERENTIAL) + + DESEQ2_NORM ( + ch_deseq2.contrast.first(), + ch_deseq2.samples_and_matrix, + ch_deseq2.control_features, + ch_deseq2.transcript_lengths + ) + + ch_norm = DESEQ2_NORM.out.normalised_counts.mix(ch_norm) + ch_norm_for_plotting = DESEQ2_NORM.out.normalised_counts + .join(DESEQ2_NORM.out.rlog_counts) + .join(DESEQ2_NORM.out.vst_counts) // CHECK if this is correct, otherwise add only if these outputs are present + .map{ it.tail() } + .mix(ch_norm_for_plotting) + + // run DE analysis using deseq2 + + DESEQ2_DIFFERENTIAL ( + ch_deseq2.contrast, + ch_deseq2.samples_and_matrix, + ch_deseq2.control_features, + ch_deseq2.transcript_lengths + ) + + ch_results_genewise = DESEQ2_DIFFERENTIAL.out.results.mix(ch_results_genewise) + + // filter results + + // TODO modify the module to accept these parameters as meta/ext.args in the same way how propd does + ch_logfc_deseq2 = Channel.value([ "log2FoldChange", 
params.differential_min_fold_change ]) + ch_padj_deseq2 = Channel.value([ "padj", params.differential_max_qval ]) + + FILTER_DIFFTABLE_DESEQ2( + DESEQ2_DIFFERENTIAL.out.results, + ch_logfc_deseq2, + ch_padj_deseq2 + ) + + ch_results_genewise_filtered = FILTER_DIFFTABLE_DESEQ2.out.filtered.mix(ch_results_genewise_filtered) + + // ---------------------------------------------------- + // Perform differential analysis with limma + // ---------------------------------------------------- + + // parse input channels for limma + // TODO provide the normalized data to limma + + ch_counts.limma + .combine(ch_samplesheet) + .filter{ meta_counts, counts, meta_samplesheet, samplesheet -> meta_counts.subMap(meta_samplesheet.keySet()) == meta_samplesheet } + .combine(ch_contrasts) + .unique() + .multiMap { + meta_counts, counts, meta_samplesheet, samplesheet, meta_contrast, contrast_variable, reference, target -> + def meta = meta_counts.clone() + meta_contrast.clone() + input1: [ meta, contrast_variable, reference, target ] + input2: [ meta, samplesheet, counts ] + } + .set { ch_limma } + + // run limma + + LIMMA_DIFFERENTIAL(ch_limma.input1, ch_limma.input2) + + ch_results_genewise = LIMMA_DIFFERENTIAL.out.results.mix(ch_results_genewise) + + // filter results + + // note that these are column names specific for limma output table + // TODO modify the module to accept these parameters as meta/ext.args in the same way how propd does + ch_logfc_limma = Channel.value([ "logFC", params.differential_min_fold_change ]) + ch_padj_limma = Channel.value([ "adj.P.Val", params.differential_max_qval ]) + + FILTER_DIFFTABLE_LIMMA( + LIMMA_DIFFERENTIAL.out.results, + ch_logfc_limma, + ch_padj_limma + ) + + ch_results_genewise_filtered = FILTER_DIFFTABLE_LIMMA.out.filtered.mix(ch_results_genewise_filtered) + + emit: + results_genewise = ch_results_genewise + results_genewise_filtered = ch_results_genewise_filtered + results_pairwise = ch_results_pairwise + results_pairwise_filtered = ch_results_pairwise_filtered + adjacency = ch_adjacency + ch_norm = ch_norm + ch_norm_for_plotting = ch_norm_for_plotting +} diff --git a/subworkflows/local/enrichment/main.nf b/subworkflows/local/enrichment/main.nf new file mode 100644 index 00000000..7508cef3 --- /dev/null +++ b/subworkflows/local/enrichment/main.nf @@ -0,0 +1,85 @@ +// +// Perform enrichment analysis +// +include { MYGENE } from "../../../modules/nf-core/mygene/main.nf" +include { PROPR_GREA as GREA } from "../../../modules/local/propr/grea/main.nf" +include { GPROFILER2_GOST } from "../../../modules/nf-core/gprofiler2/gost/main.nf" + +workflow ENRICHMENT { + take: + ch_counts // [ meta, counts] with meta keys: method, args_cor + ch_results_genewise // [ meta, results] with meta keys: method, args_cor + ch_results_genewise_filtered // [ meta, results] with meta keys: method, args_cor + ch_adjacency // [ meta, adj_matrix] with meta keys: method, args_cor + // TODO: add ch_gm when provided by user, etc. 
+ + main: + + // initialize empty results channels + ch_enriched = Channel.empty() + + // ---------------------------------------------------- + // Construct gene set database + // ---------------------------------------------------- + + // TODO this should be optional, only run when there is no gene set data provided by user + + MYGENE(ch_counts.take(1)) // only one data is provided to this pipeline + ch_gmt = MYGENE.out.gmt + + // ---------------------------------------------------- + // Perform enrichment analysis with GREA + // ---------------------------------------------------- + + ch_adjacency + .filter { it[0]["method"] == "grea" } + .unique() + .set { ch_adjacency_to_grea } + + GREA(ch_adjacency_to_grea, ch_gmt.collect()) + ch_enriched = ch_enriched.mix(GREA.out.results) + + // ---------------------------------------------------- + // Perform enrichment analysis with GSEA + // ---------------------------------------------------- + + // todo: add gsea here + + // ---------------------------------------------------- + // Perform enrichment analysis with gprofiler2 + // ---------------------------------------------------- + + // parse input channels + // TODO we need to find a way to combine these information with also args coming from toolsheet and modules.config + + if (!params.gprofiler2_background_file) { // If deactivated, use empty list as "background" + ch_background = [] + } else if (params.gprofiler2_background_file == "auto") { // If auto, use input matrix as background + ch_background = ch_counts.map { meta, counts -> counts } + } else { + ch_background = Channel.from(file(params.gprofiler2_background_file, checkIfExists: true)) + } + + ch_gmt = ch_gmt.map { meta, gmt -> gmt } + + ch_results_genewise_filtered + .filter { it[0]["method"] == "gprofiler2" } + .unique() + .set { ch_for_gprofiler2 } + + // run gprofiler2 + + GPROFILER2_GOST( + ch_for_gprofiler2, + ch_gmt, + ch_background + ) + + // collect results + + ch_enriched = ch_enriched.mix(GPROFILER2_GOST.out.all_enrich) + + // TODO also collect the files for report purposes + emit: + enriched = ch_enriched +} diff --git a/subworkflows/local/experimental/main.nf b/subworkflows/local/experimental/main.nf new file mode 100644 index 00000000..143a8305 --- /dev/null +++ b/subworkflows/local/experimental/main.nf @@ -0,0 +1,173 @@ +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + IMPORT LOCAL MODULES/SUBWORKFLOWS +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +*/ + +include { DIFFERENTIAL } from '../differential/main.nf' +include { CORRELATION } from '../correlation/main.nf' +include { ENRICHMENT } from '../enrichment/main.nf' + +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + DEFINE AUXILIARY FUNCTIONS +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +*/ + +// function used to preprocess the input of a subworkflow +// Basically, given ch_input and ch_tools_args, +// it updates the meta data of ch_input with the required method value and args + +def preprocess_subworkflow_input( ch_input, ch_tools_args, method_field_name) { + return ch_input + .combine(ch_tools_args) + // filter the tools_args to match the pathway_name + // if no pathway_name is provided, then it will match all + // NOTE that no pathway_name is provided when ch_tools only have one element. 
By doing this, it avoids the recomputation of processes that use the same tool/args but belong to different pathways + .filter{ meta, input, pathway, arg_maps -> meta["pathway_name"] ? meta["pathway_name"] == pathway["pathway_name"] : true } + // update meta with method value and args + .map{ meta, input, pathway, arg_map -> + def meta_clone = meta.clone() + pathway + arg_map.clone() + meta_clone["method"] = meta_clone.remove(method_field_name) + return [meta_clone, input] + } + .filter{ meta, input -> meta["method"] != []} +} + +// function used to postprocess the output of a subworkflow +// Basically, given ch_results and field_name, +// it removes the field_name from the meta data + +def postprocess_subworkflow_output( ch_results, field_name ) { + return ch_results + .map{ meta, data -> + def meta_clone = meta.clone() + meta_clone.removeAll{it.key in field_name} + return [meta_clone, data] + } +} + +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + RUN WORKFLOW +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +*/ + +workflow EXPERIMENTAL { + take: + ch_contrasts // [ meta, contrast_variable, reference, target ] + ch_samplesheet // [ meta, samplesheet ] + ch_counts // [ meta, counts] + ch_tools // [ pathway_name, differential_map, correlation_map, enrichment_map ] + + main: + + // split toolsheet into channels: diff, corr, enr + // NOTE that when only one tools pathway is provided, the pathway_name is not needed, and hence removed + // by doing so, it avoids the recomputation of processes that use the same tool/args but belong to different pathways + + ch_tools.view() + ch_tools.count() + .combine(ch_tools) + .multiMap{ + n, pathway, differential_map, correlation_map, enrichment_map -> + def pathway_name = n == 1 ? ["pathway_name":""] : pathway + diff: [ pathway_name, differential_map ] + corr: [ pathway_name, correlation_map ] + enr: [ pathway_name, enrichment_map ] + } + .set{ ch_tools } + + // initialize empty results channels + + ch_results_genewise = Channel.empty() // differential results for genewise analysis - it should be a table + ch_results_genewise_filtered = Channel.empty() // differential results for genewise analysis - filtered - it should be a table + ch_results_pairwise = Channel.empty() // differential results for pairwise analysis - it should be a table + ch_results_pairwise_filtered = Channel.empty() // differential results for pairwise analysis - filtered - it should be a table + ch_adjacency = Channel.empty() // adjacency matrix showing the connections between the genes, with values 1|0 + ch_correlation = Channel.empty() // correlation matrix + ch_enriched = Channel.empty() // output table from enrichment analysis + + // ---------------------------------------------------- + // DIFFERENTIAL ANALYSIS BLOCK + // ---------------------------------------------------- + + // preprocess the input of the subworkflow, by adding 'method' and 'args_diff' + // NOTE that here we only preprocess ch_counts, to use it as the carrier of the method information + // since the rest of data channels would be combined with the ch_counts correspondingly, we don't need to preprocess them all + // This MUST be changed, if this is not the case (eg. 
if one wants to use a specific ch_contrasts element for a specific method, etc) + + preprocess_subworkflow_input(ch_counts, ch_tools.diff, "diff_method") + .set{ ch_counts_diff } + + // run differential subworkflow + + DIFFERENTIAL( + ch_counts_diff, + ch_samplesheet, + ch_contrasts + ) + + // collect and postprocess the output of the subworkflow, by removing 'method' and 'args_diff' + + ch_results_genewise = postprocess_subworkflow_output(DIFFERENTIAL.out.results_genewise,["method", "args_diff"]).mix(ch_results_genewise) + ch_results_genewise_filtered = postprocess_subworkflow_output(DIFFERENTIAL.out.results_genewise_filtered,["method", "args_diff"]).mix(ch_results_genewise_filtered) + ch_results_pairwise = postprocess_subworkflow_output(DIFFERENTIAL.out.results_pairwise,["method", "args_diff"]).mix(ch_results_pairwise) + ch_results_pairwise_filtered = postprocess_subworkflow_output(DIFFERENTIAL.out.results_pairwise_filtered,["method", "args_diff"]).mix(ch_results_pairwise_filtered) + ch_adjacency = postprocess_subworkflow_output(DIFFERENTIAL.out.adjacency,["method", "args_diff"]).mix(ch_adjacency) + + // ---------------------------------------------------- + // CORRELATION ANALYSIS BLOCK + // ---------------------------------------------------- + + // preprocess the input of the subworkflow, by adding 'method' and 'args_cor' + + preprocess_subworkflow_input(ch_counts, ch_tools.corr, "cor_method") + .set{ ch_counts_corr } + + // run correlation subworkflow + + CORRELATION( + ch_counts_corr + ) + + // collect and postprocess the output of the subworkflow, by removing 'method' and 'args_cor' + + ch_correlation = postprocess_subworkflow_output(CORRELATION.out.matrix,["method", "args_cor"]).mix(ch_correlation) + ch_adjacency = postprocess_subworkflow_output(CORRELATION.out.adjacency,["method", "args_cor"]).mix(ch_adjacency) + + // ---------------------------------------------------- + // FUNCTIONAL ENRICHMENT BLOCK + // ---------------------------------------------------- + + // preprocess the input of the subworkflow, by adding 'method' and 'args_enr' + // this is done by matching the 'pathway_name' in ch_tools + + preprocess_subworkflow_input(ch_counts, ch_tools.enr, "enr_method") + .set{ ch_counts_enr } + preprocess_subworkflow_input(ch_results_genewise, ch_tools.enr, "enr_method") + .set{ ch_results_genewise_enr } + preprocess_subworkflow_input(ch_results_genewise_filtered, ch_tools.enr, "enr_method") + .set{ ch_results_genewise_filtered_enr } + preprocess_subworkflow_input(ch_adjacency, ch_tools.enr, "enr_method") + .set{ ch_adjacency_enr } + + // run enrichment subworkflow + + ENRICHMENT( + ch_counts_enr, + ch_results_genewise_enr, + ch_results_genewise_filtered_enr, + ch_adjacency_enr + ) + + // collect the output of the subworkflow + + ch_enriched = ch_enriched.mix(ENRICHMENT.out.enriched) + + // ---------------------------------------------------- + // VISUALIZATION BLOCK + // ---------------------------------------------------- + + // TODO: call visualization stuff here +} diff --git a/subworkflows/local/utils_nfcore_differentialabundance_pipeline/main.nf b/subworkflows/local/utils_nfcore_differentialabundance_pipeline/main.nf index 07b557ed..81ca7d0e 100644 --- a/subworkflows/local/utils_nfcore_differentialabundance_pipeline/main.nf +++ b/subworkflows/local/utils_nfcore_differentialabundance_pipeline/main.nf @@ -8,29 +8,25 @@ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ -include { UTILS_NFVALIDATION_PLUGIN } from 
'../../nf-core/utils_nfvalidation_plugin' -include { paramsSummaryMap } from 'plugin/nf-validation' -include { fromSamplesheet } from 'plugin/nf-validation' -include { UTILS_NEXTFLOW_PIPELINE } from '../../nf-core/utils_nextflow_pipeline' +include { UTILS_NFSCHEMA_PLUGIN } from '../../nf-core/utils_nfschema_plugin' +include { paramsSummaryMap } from 'plugin/nf-schema' +include { samplesheetToList } from 'plugin/nf-schema' include { completionEmail } from '../../nf-core/utils_nfcore_pipeline' include { completionSummary } from '../../nf-core/utils_nfcore_pipeline' -include { dashedLine } from '../../nf-core/utils_nfcore_pipeline' -include { nfCoreLogo } from '../../nf-core/utils_nfcore_pipeline' include { imNotification } from '../../nf-core/utils_nfcore_pipeline' include { UTILS_NFCORE_PIPELINE } from '../../nf-core/utils_nfcore_pipeline' -include { workflowCitation } from '../../nf-core/utils_nfcore_pipeline' +include { UTILS_NEXTFLOW_PIPELINE } from '../../nf-core/utils_nextflow_pipeline' /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SUBWORKFLOW TO INITIALISE PIPELINE -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ workflow PIPELINE_INITIALISATION { take: version // boolean: Display version and exit - help // boolean: Display help text validate_params // boolean: Boolean whether to validate parameters against the schema at runtime monochrome_logs // boolean: Do not use coloured log outputs nextflow_cli_args // array: List of positional nextflow CLI args @@ -54,16 +50,10 @@ workflow PIPELINE_INITIALISATION { // // Validate parameters and generate parameter summary to stdout // - pre_help_text = nfCoreLogo(monochrome_logs) - post_help_text = '\n' + workflowCitation() + '\n' + dashedLine(monochrome_logs) - def String workflow_command = "nextflow run ${workflow.manifest.name} -profile --input samplesheet.csv --outdir " - UTILS_NFVALIDATION_PLUGIN ( - help, - workflow_command, - pre_help_text, - post_help_text, + UTILS_NFSCHEMA_PLUGIN ( + workflow, validate_params, - "nextflow_schema.json" + null ) // @@ -72,6 +62,7 @@ workflow PIPELINE_INITIALISATION { UTILS_NFCORE_PIPELINE ( nextflow_cli_args ) + // // Custom validation for pipeline parameters // @@ -83,9 +74,9 @@ workflow PIPELINE_INITIALISATION { } /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SUBWORKFLOW FOR PIPELINE COMPLETION -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ workflow PIPELINE_COMPLETION { @@ -97,10 +88,8 @@ workflow PIPELINE_COMPLETION { outdir // path: Path to output directory where results will be published monochrome_logs // boolean: Disable ANSI colour codes in log output hook_url // string: hook URL for notifications - multiqc_report // string: Path to MultiQC report main: - summary_params = paramsSummaryMap(workflow, parameters_schema: "nextflow_schema.json") // @@ -108,21 +97,32 @@ workflow PIPELINE_COMPLETION { // workflow.onComplete { if (email || email_on_fail) { - completionEmail(summary_params, email, email_on_fail, plaintext_email, outdir, monochrome_logs, 
multiqc_report.toList()) + completionEmail( + summary_params, + email, + email_on_fail, + plaintext_email, + outdir, + monochrome_logs, + [] + ) } completionSummary(monochrome_logs) - if (hook_url) { imNotification(summary_params, hook_url) } } + + workflow.onError { + log.error "Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting" + } } /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ FUNCTIONS -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ // // Check and validate pipeline parameters @@ -138,7 +138,7 @@ def validateInputSamplesheet(input) { def (metas, fastqs) = input[1..2] // Check that multiple runs of the same sample are of the same datatype i.e. single-end / paired-end - def endedness_ok = metas.collect{ it.single_end }.unique().size == 1 + def endedness_ok = metas.collect{ meta -> meta.single_end }.unique().size == 1 if (!endedness_ok) { error("Please check input samplesheet -> Multiple runs of a sample must be of the same datatype i.e. single-end or paired-end: ${metas[0].id}") } @@ -170,7 +170,6 @@ def genomeExistsError() { error(error_string) } } - // // Generate methods description for MultiQC // @@ -180,8 +179,7 @@ def toolCitationText() { // Uncomment function in methodsDescriptionText to render in MultiQC report def citation_text = [ "Tools used in the workflow included:", - "FastQC (Andrews 2010),", - "MultiQC (Ewels et al. 2016)", + "." ].join(' ').trim() @@ -193,8 +191,7 @@ def toolBibliographyText() { // Can use ternary operators to dynamically construct based conditions, e.g. params["run_xyz"] ? "
<li>Author (2023) Pub name, Journal, DOI</li>" : "", // Uncomment function in methodsDescriptionText to render in MultiQC report def reference_text = [ - "<li>Andrews S, (2010) FastQC, URL: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/).</li>", - "<li>Ewels, P., Magnusson, M., Lundin, S., & Käller, M. (2016). MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics , 32(19), 3047–3048. doi: /10.1093/bioinformatics/btw354</li>" + ].join(' ').trim() return reference_text @@ -207,8 +204,18 @@ def methodsDescriptionText(mqc_methods_yaml) { meta["manifest_map"] = workflow.manifest.toMap() // Pipeline DOI - meta["doi_text"] = meta.manifest_map.doi ? "(doi: ${meta.manifest_map.doi})" : "" - meta["nodoi_text"] = meta.manifest_map.doi ? "": "<li>If available, make sure to update the text to include the Zenodo DOI of version of the pipeline used. </li>" + if (meta.manifest_map.doi) { + // Using a loop to handle multiple DOIs + // Removing `https://doi.org/` to handle pipelines using DOIs vs DOI resolvers + // Removing ` ` since the manifest.doi is a string and not a proper list + def temp_doi_ref = "" + def manifest_doi = meta.manifest_map.doi.tokenize(",") + manifest_doi.each { doi_ref -> + temp_doi_ref += "(doi: ${doi_ref.replace("https://doi.org/", "").replace(" ", "")}), " + } + meta["doi_text"] = temp_doi_ref.substring(0, temp_doi_ref.length() - 2) + } else meta["doi_text"] = "" + meta["nodoi_text"] = meta.manifest_map.doi ? "" : "<li>If available, make sure to update the text to include the Zenodo DOI of version of the pipeline used. </li>"
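+ // Illustration only (hypothetical DOI values, not from this pipeline): with + // manifest.doi = "https://doi.org/10.1000/abc, 10.1000/xyz" the loop above tokenizes two + // references and yields meta["doi_text"] == "(doi: 10.1000/abc), (doi: 10.1000/xyz)"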
// Tool references meta["tool_citations"] = "" @@ -226,3 +233,4 @@ def methodsDescriptionText(mqc_methods_yaml) { return description_html.toString() } + diff --git a/subworkflows/nf-core/utils_nextflow_pipeline/main.nf b/subworkflows/nf-core/utils_nextflow_pipeline/main.nf index ac31f28f..0fcbf7b3 100644 --- a/subworkflows/nf-core/utils_nextflow_pipeline/main.nf +++ b/subworkflows/nf-core/utils_nextflow_pipeline/main.nf @@ -2,18 +2,13 @@ // Subworkflow with functionality that may be useful for any Nextflow pipeline // -import org.yaml.snakeyaml.Yaml -import groovy.json.JsonOutput -import nextflow.extension.FilesEx - /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SUBWORKFLOW DEFINITION -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ workflow UTILS_NEXTFLOW_PIPELINE { - take: print_version // boolean: print version dump_parameters // boolean: dump parameters @@ -26,7 +21,7 @@ workflow UTILS_NEXTFLOW_PIPELINE { // Print workflow version and exit on --version // if (print_version) { - log.info "${workflow.manifest.name} ${getWorkflowVersion()}" + log.info("${workflow.manifest.name} ${getWorkflowVersion()}") System.exit(0) } @@ -49,16 +44,16 @@ workflow UTILS_NEXTFLOW_PIPELINE { } /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ FUNCTIONS -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ // // Generate version string // def getWorkflowVersion() { - String version_string = "" + def version_string = "" as String if (workflow.manifest.version) { def prefix_v = workflow.manifest.version[0] != 'v' ? 'v' : '' version_string += "${prefix_v}${workflow.manifest.version}" @@ -76,13 +71,13 @@ def getWorkflowVersion() { // Dump pipeline parameters to a JSON file // def dumpParametersToJSON(outdir) { - def timestamp = new java.util.Date().format( 'yyyy-MM-dd_HH-mm-ss') - def filename = "params_${timestamp}.json" - def temp_pf = new File(workflow.launchDir.toString(), ".${filename}") - def jsonStr = JsonOutput.toJson(params) - temp_pf.text = JsonOutput.prettyPrint(jsonStr) + def timestamp = new java.util.Date().format('yyyy-MM-dd_HH-mm-ss') + def filename = "params_${timestamp}.json" + def temp_pf = new File(workflow.launchDir.toString(), ".${filename}") + def jsonStr = groovy.json.JsonOutput.toJson(params) + temp_pf.text = groovy.json.JsonOutput.prettyPrint(jsonStr) - FilesEx.copyTo(temp_pf.toPath(), "${outdir}/pipeline_info/params_${timestamp}.json") + nextflow.extension.FilesEx.copyTo(temp_pf.toPath(), "${outdir}/pipeline_info/params_${timestamp}.json") temp_pf.delete() } @@ -90,37 +85,40 @@ // When running with -profile conda, warn if channels have not been set-up appropriately // def checkCondaChannels() { - Yaml parser = new Yaml() + def parser = new org.yaml.snakeyaml.Yaml() def channels = [] try { def config = parser.load("conda config --show channels".execute().text) channels = config.channels - } catch(NullPointerException | IOException e) { - log.warn "Could not verify conda channel configuration."
- return + } + catch (NullPointerException e) { + log.warn("Could not verify conda channel configuration.") + return null + } + catch (IOException e) { + log.warn("Could not verify conda channel configuration.") + return null } // Check that all channels are present // This channel list is ordered by required channel priority. - def required_channels_in_order = ['conda-forge', 'bioconda', 'defaults'] + def required_channels_in_order = ['conda-forge', 'bioconda'] def channels_missing = ((required_channels_in_order as Set) - (channels as Set)) as Boolean // Check that they are in the right order - def channel_priority_violation = false - def n = required_channels_in_order.size() - for (int i = 0; i < n - 1; i++) { - channel_priority_violation |= !(channels.indexOf(required_channels_in_order[i]) < channels.indexOf(required_channels_in_order[i+1])) - } + def channel_priority_violation = required_channels_in_order != channels.findAll { ch -> ch in required_channels_in_order } if (channels_missing | channel_priority_violation) { - log.warn "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n" + - " There is a problem with your Conda configuration!\n\n" + - " You will need to set-up the conda-forge and bioconda channels correctly.\n" + - " Please refer to https://bioconda.github.io/\n" + - " The observed channel order is \n" + - " ${channels}\n" + - " but the following channel order is required:\n" + - " ${required_channels_in_order}\n" + - "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~" + log.warn """\ + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + There is a problem with your Conda configuration! + You will need to set-up the conda-forge and bioconda channels correctly. 
+ Please refer to https://bioconda.github.io/ + The observed channel order is + ${channels} + but the following channel order is required: + ${required_channels_in_order} + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + """.stripIndent(true) } } diff --git a/subworkflows/nf-core/utils_nextflow_pipeline/tests/nextflow.config b/subworkflows/nf-core/utils_nextflow_pipeline/tests/nextflow.config index d0a926bf..a09572e5 100644 --- a/subworkflows/nf-core/utils_nextflow_pipeline/tests/nextflow.config +++ b/subworkflows/nf-core/utils_nextflow_pipeline/tests/nextflow.config @@ -3,7 +3,7 @@ manifest { author = """nf-core""" homePage = 'https://127.0.0.1' description = """Dummy pipeline""" - nextflowVersion = '!>=23.04.0' + nextflowVersion = '!>=23.04.0' version = '9.9.9' doi = 'https://doi.org/10.5281/zenodo.5070524' } diff --git a/subworkflows/nf-core/utils_nfcore_pipeline/main.nf b/subworkflows/nf-core/utils_nfcore_pipeline/main.nf index 14558c39..5cb7bafe 100644 --- a/subworkflows/nf-core/utils_nfcore_pipeline/main.nf +++ b/subworkflows/nf-core/utils_nfcore_pipeline/main.nf @@ -2,17 +2,13 @@ // Subworkflow with utility functions specific to the nf-core pipeline template // -import org.yaml.snakeyaml.Yaml -import nextflow.extension.FilesEx - /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SUBWORKFLOW DEFINITION -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ workflow UTILS_NFCORE_PIPELINE { - take: nextflow_cli_args @@ -25,23 +21,20 @@ workflow UTILS_NFCORE_PIPELINE { } /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ FUNCTIONS -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ // // Warn if a -profile or Nextflow config has not been provided to run the pipeline // def checkConfigProvided() { - valid_config = true + def valid_config = true as Boolean if (workflow.profile == 'standard' && workflow.configFiles.size() <= 1) { - log.warn "[$workflow.manifest.name] You are attempting to run the pipeline without any custom configuration!\n\n" + - "This will be dependent on your local compute environment but can be achieved via one or more of the following:\n" + - " (1) Using an existing pipeline profile e.g. `-profile docker` or `-profile singularity`\n" + - " (2) Using an existing nf-core/configs for your Institution e.g. `-profile crick` or `-profile uppmax`\n" + - " (3) Using your own local custom config e.g. `-c /path/to/your/custom.config`\n\n" + - "Please refer to the quick start section and usage docs for the pipeline.\n " + log.warn( + "[${workflow.manifest.name}] You are attempting to run the pipeline without any custom configuration!\n\n" + "This will be dependent on your local compute environment but can be achieved via one or more of the following:\n" + " (1) Using an existing pipeline profile e.g. `-profile docker` or `-profile singularity`\n" + " (2) Using an existing nf-core/configs for your Institution e.g. `-profile crick` or `-profile uppmax`\n" + " (3) Using your own local custom config e.g.
`-c /path/to/your/custom.config`\n\n" + "Please refer to the quick start section and usage docs for the pipeline.\n " + ) valid_config = false } return valid_config @@ -52,12 +45,14 @@ def checkConfigProvided() { // def checkProfileProvided(nextflow_cli_args) { if (workflow.profile.endsWith(',')) { - error "The `-profile` option cannot end with a trailing comma, please remove it and re-run the pipeline!\n" + - "HINT: A common mistake is to provide multiple values separated by spaces e.g. `-profile test, docker`.\n" + error( + "The `-profile` option cannot end with a trailing comma, please remove it and re-run the pipeline!\n" + "HINT: A common mistake is to provide multiple values separated by spaces e.g. `-profile test, docker`.\n" + ) } if (nextflow_cli_args[0]) { - log.warn "nf-core pipelines do not accept positional arguments. The positional argument `${nextflow_cli_args[0]}` has been detected.\n" + - "HINT: A common mistake is to provide multiple values separated by spaces e.g. `-profile test, docker`.\n" + log.warn( + "nf-core pipelines do not accept positional arguments. The positional argument `${nextflow_cli_args[0]}` has been detected.\n" + "HINT: A common mistake is to provide multiple values separated by spaces e.g. `-profile test, docker`.\n" + ) } } @@ -66,25 +61,21 @@ def checkProfileProvided(nextflow_cli_args) { // def workflowCitation() { def temp_doi_ref = "" - String[] manifest_doi = workflow.manifest.doi.tokenize(",") - // Using a loop to handle multiple DOIs + def manifest_doi = workflow.manifest.doi.tokenize(",") + // Handling multiple DOIs // Removing `https://doi.org/` to handle pipelines using DOIs vs DOI resolvers // Removing ` ` since the manifest.doi is a string and not a proper list - for (String doi_ref: manifest_doi) temp_doi_ref += " https://doi.org/${doi_ref.replace('https://doi.org/', '').replace(' ', '')}\n" - return "If you use ${workflow.manifest.name} for your analysis please cite:\n\n" + - "* The pipeline\n" + - temp_doi_ref + "\n" + - "* The nf-core framework\n" + - " https://doi.org/10.1038/s41587-020-0439-x\n\n" + - "* Software dependencies\n" + - " https://github.com/${workflow.manifest.name}/blob/master/CITATIONS.md" + manifest_doi.each { doi_ref -> + temp_doi_ref += " https://doi.org/${doi_ref.replace('https://doi.org/', '').replace(' ', '')}\n" + } + return "If you use ${workflow.manifest.name} for your analysis please cite:\n\n" + "* The pipeline\n" + temp_doi_ref + "\n" + "* The nf-core framework\n" + " https://doi.org/10.1038/s41587-020-0439-x\n\n" + "* Software dependencies\n" + " https://github.com/${workflow.manifest.name}/blob/master/CITATIONS.md" } // // Generate workflow version string // def getWorkflowVersion() { - String version_string = "" + def version_string = "" as String if (workflow.manifest.version) { def prefix_v = workflow.manifest.version[0] != 'v' ? 
'v' : '' version_string += "${prefix_v}${workflow.manifest.version}" @@ -102,8 +93,8 @@ def getWorkflowVersion() { // Get software versions for pipeline // def processVersionsFromYAML(yaml_file) { - Yaml yaml = new Yaml() - versions = yaml.load(yaml_file).collectEntries { k, v -> [ k.tokenize(':')[-1], v ] } + def yaml = new org.yaml.snakeyaml.Yaml() + def versions = yaml.load(yaml_file).collectEntries { k, v -> [k.tokenize(':')[-1], v] } return yaml.dumpAsMap(versions).trim() } @@ -113,8 +104,8 @@ def processVersionsFromYAML(yaml_file) { def workflowVersionToYAML() { return """ Workflow: - $workflow.manifest.name: ${getWorkflowVersion()} - Nextflow: $workflow.nextflow.version + ${workflow.manifest.name}: ${getWorkflowVersion()} + Nextflow: ${workflow.nextflow.version} """.stripIndent().trim() } @@ -122,11 +113,7 @@ def workflowVersionToYAML() { // Get channel of software versions used in pipeline in YAML format // def softwareVersionsToYAML(ch_versions) { - return ch_versions - .unique() - .map { processVersionsFromYAML(it) } - .unique() - .mix(Channel.of(workflowVersionToYAML())) + return ch_versions.unique().map { version -> processVersionsFromYAML(version) }.unique().mix(Channel.of(workflowVersionToYAML())) } // @@ -134,25 +121,31 @@ def softwareVersionsToYAML(ch_versions) { // def paramsSummaryMultiqc(summary_params) { def summary_section = '' - for (group in summary_params.keySet()) { - def group_params = summary_params.get(group) // This gets the parameters of that particular group - if (group_params) { - summary_section += "

    <p style=\"font-size:110%\"><b>$group</b></p>\n" - summary_section += "    <dl class=\"dl-horizontal\">\n" - for (param in group_params.keySet()) { - summary_section += "        <dt>$param</dt><dd><samp>${group_params.get(param) ?: '<span style=\"color:#999999;\">N/A</a>'}</samp></dd>\n" + summary_params + .keySet() + .each { group -> + def group_params = summary_params.get(group) + // This gets the parameters of that particular group + if (group_params) { + summary_section += "    <p style=\"font-size:110%\"><b>${group}</b></p>\n" + summary_section += "    <dl class=\"dl-horizontal\">\n" + group_params + .keySet() + .sort() + .each { param -> + summary_section += "        <dt>${param}</dt><dd><samp>${group_params.get(param) ?: '<span style=\"color:#999999;\">N/A</a>'}</samp></dd>\n" + } + summary_section += "    </dl>\n" } - summary_section += "    </dl>
    \n" } - } - String yaml_file_text = "id: '${workflow.manifest.name.replace('/','-')}-summary'\n" - yaml_file_text += "description: ' - this information is collected when the pipeline is started.'\n" - yaml_file_text += "section_name: '${workflow.manifest.name} Workflow Summary'\n" - yaml_file_text += "section_href: 'https://github.com/${workflow.manifest.name}'\n" - yaml_file_text += "plot_type: 'html'\n" - yaml_file_text += "data: |\n" - yaml_file_text += "${summary_section}" + def yaml_file_text = "id: '${workflow.manifest.name.replace('/', '-')}-summary'\n" as String + yaml_file_text += "description: ' - this information is collected when the pipeline is started.'\n" + yaml_file_text += "section_name: '${workflow.manifest.name} Workflow Summary'\n" + yaml_file_text += "section_href: 'https://github.com/${workflow.manifest.name}'\n" + yaml_file_text += "plot_type: 'html'\n" + yaml_file_text += "data: |\n" + yaml_file_text += "${summary_section}" return yaml_file_text } @@ -161,7 +154,7 @@ def paramsSummaryMultiqc(summary_params) { // nf-core logo // def nfCoreLogo(monochrome_logs=true) { - Map colors = logColours(monochrome_logs) + def colors = logColours(monochrome_logs) as Map String.format( """\n ${dashedLine(monochrome_logs)} @@ -180,7 +173,7 @@ def nfCoreLogo(monochrome_logs=true) { // Return dashed line // def dashedLine(monochrome_logs=true) { - Map colors = logColours(monochrome_logs) + def colors = logColours(monochrome_logs) as Map return "-${colors.dim}----------------------------------------------------${colors.reset}-" } @@ -188,7 +181,7 @@ def dashedLine(monochrome_logs=true) { // ANSII colours used for terminal logging // def logColours(monochrome_logs=true) { - Map colorcodes = [:] + def colorcodes = [:] as Map // Reset / Meta colorcodes['reset'] = monochrome_logs ? '' : "\033[0m" @@ -200,54 +193,54 @@ def logColours(monochrome_logs=true) { colorcodes['hidden'] = monochrome_logs ? '' : "\033[8m" // Regular Colors - colorcodes['black'] = monochrome_logs ? '' : "\033[0;30m" - colorcodes['red'] = monochrome_logs ? '' : "\033[0;31m" - colorcodes['green'] = monochrome_logs ? '' : "\033[0;32m" - colorcodes['yellow'] = monochrome_logs ? '' : "\033[0;33m" - colorcodes['blue'] = monochrome_logs ? '' : "\033[0;34m" - colorcodes['purple'] = monochrome_logs ? '' : "\033[0;35m" - colorcodes['cyan'] = monochrome_logs ? '' : "\033[0;36m" - colorcodes['white'] = monochrome_logs ? '' : "\033[0;37m" + colorcodes['black'] = monochrome_logs ? '' : "\033[0;30m" + colorcodes['red'] = monochrome_logs ? '' : "\033[0;31m" + colorcodes['green'] = monochrome_logs ? '' : "\033[0;32m" + colorcodes['yellow'] = monochrome_logs ? '' : "\033[0;33m" + colorcodes['blue'] = monochrome_logs ? '' : "\033[0;34m" + colorcodes['purple'] = monochrome_logs ? '' : "\033[0;35m" + colorcodes['cyan'] = monochrome_logs ? '' : "\033[0;36m" + colorcodes['white'] = monochrome_logs ? '' : "\033[0;37m" // Bold - colorcodes['bblack'] = monochrome_logs ? '' : "\033[1;30m" - colorcodes['bred'] = monochrome_logs ? '' : "\033[1;31m" - colorcodes['bgreen'] = monochrome_logs ? '' : "\033[1;32m" - colorcodes['byellow'] = monochrome_logs ? '' : "\033[1;33m" - colorcodes['bblue'] = monochrome_logs ? '' : "\033[1;34m" - colorcodes['bpurple'] = monochrome_logs ? '' : "\033[1;35m" - colorcodes['bcyan'] = monochrome_logs ? '' : "\033[1;36m" - colorcodes['bwhite'] = monochrome_logs ? '' : "\033[1;37m" + colorcodes['bblack'] = monochrome_logs ? '' : "\033[1;30m" + colorcodes['bred'] = monochrome_logs ? 
'' : "\033[1;31m" + colorcodes['bgreen'] = monochrome_logs ? '' : "\033[1;32m" + colorcodes['byellow'] = monochrome_logs ? '' : "\033[1;33m" + colorcodes['bblue'] = monochrome_logs ? '' : "\033[1;34m" + colorcodes['bpurple'] = monochrome_logs ? '' : "\033[1;35m" + colorcodes['bcyan'] = monochrome_logs ? '' : "\033[1;36m" + colorcodes['bwhite'] = monochrome_logs ? '' : "\033[1;37m" // Underline - colorcodes['ublack'] = monochrome_logs ? '' : "\033[4;30m" - colorcodes['ured'] = monochrome_logs ? '' : "\033[4;31m" - colorcodes['ugreen'] = monochrome_logs ? '' : "\033[4;32m" - colorcodes['uyellow'] = monochrome_logs ? '' : "\033[4;33m" - colorcodes['ublue'] = monochrome_logs ? '' : "\033[4;34m" - colorcodes['upurple'] = monochrome_logs ? '' : "\033[4;35m" - colorcodes['ucyan'] = monochrome_logs ? '' : "\033[4;36m" - colorcodes['uwhite'] = monochrome_logs ? '' : "\033[4;37m" + colorcodes['ublack'] = monochrome_logs ? '' : "\033[4;30m" + colorcodes['ured'] = monochrome_logs ? '' : "\033[4;31m" + colorcodes['ugreen'] = monochrome_logs ? '' : "\033[4;32m" + colorcodes['uyellow'] = monochrome_logs ? '' : "\033[4;33m" + colorcodes['ublue'] = monochrome_logs ? '' : "\033[4;34m" + colorcodes['upurple'] = monochrome_logs ? '' : "\033[4;35m" + colorcodes['ucyan'] = monochrome_logs ? '' : "\033[4;36m" + colorcodes['uwhite'] = monochrome_logs ? '' : "\033[4;37m" // High Intensity - colorcodes['iblack'] = monochrome_logs ? '' : "\033[0;90m" - colorcodes['ired'] = monochrome_logs ? '' : "\033[0;91m" - colorcodes['igreen'] = monochrome_logs ? '' : "\033[0;92m" - colorcodes['iyellow'] = monochrome_logs ? '' : "\033[0;93m" - colorcodes['iblue'] = monochrome_logs ? '' : "\033[0;94m" - colorcodes['ipurple'] = monochrome_logs ? '' : "\033[0;95m" - colorcodes['icyan'] = monochrome_logs ? '' : "\033[0;96m" - colorcodes['iwhite'] = monochrome_logs ? '' : "\033[0;97m" + colorcodes['iblack'] = monochrome_logs ? '' : "\033[0;90m" + colorcodes['ired'] = monochrome_logs ? '' : "\033[0;91m" + colorcodes['igreen'] = monochrome_logs ? '' : "\033[0;92m" + colorcodes['iyellow'] = monochrome_logs ? '' : "\033[0;93m" + colorcodes['iblue'] = monochrome_logs ? '' : "\033[0;94m" + colorcodes['ipurple'] = monochrome_logs ? '' : "\033[0;95m" + colorcodes['icyan'] = monochrome_logs ? '' : "\033[0;96m" + colorcodes['iwhite'] = monochrome_logs ? '' : "\033[0;97m" // Bold High Intensity - colorcodes['biblack'] = monochrome_logs ? '' : "\033[1;90m" - colorcodes['bired'] = monochrome_logs ? '' : "\033[1;91m" - colorcodes['bigreen'] = monochrome_logs ? '' : "\033[1;92m" - colorcodes['biyellow'] = monochrome_logs ? '' : "\033[1;93m" - colorcodes['biblue'] = monochrome_logs ? '' : "\033[1;94m" - colorcodes['bipurple'] = monochrome_logs ? '' : "\033[1;95m" - colorcodes['bicyan'] = monochrome_logs ? '' : "\033[1;96m" - colorcodes['biwhite'] = monochrome_logs ? '' : "\033[1;97m" + colorcodes['biblack'] = monochrome_logs ? '' : "\033[1;90m" + colorcodes['bired'] = monochrome_logs ? '' : "\033[1;91m" + colorcodes['bigreen'] = monochrome_logs ? '' : "\033[1;92m" + colorcodes['biyellow'] = monochrome_logs ? '' : "\033[1;93m" + colorcodes['biblue'] = monochrome_logs ? '' : "\033[1;94m" + colorcodes['bipurple'] = monochrome_logs ? '' : "\033[1;95m" + colorcodes['bicyan'] = monochrome_logs ? '' : "\033[1;96m" + colorcodes['biwhite'] = monochrome_logs ? 
'' : "\033[1;97m" return colorcodes } @@ -262,14 +255,15 @@ def attachMultiqcReport(multiqc_report) { mqc_report = multiqc_report.getVal() if (mqc_report.getClass() == ArrayList && mqc_report.size() >= 1) { if (mqc_report.size() > 1) { - log.warn "[$workflow.manifest.name] Found multiple reports from process 'MULTIQC', will use only one" + log.warn("[${workflow.manifest.name}] Found multiple reports from process 'MULTIQC', will use only one") } mqc_report = mqc_report[0] } } - } catch (all) { + } + catch (Exception all) { if (multiqc_report) { - log.warn "[$workflow.manifest.name] Could not attach MultiQC report to summary email" + log.warn("[${workflow.manifest.name}] Could not attach MultiQC report to summary email") } } return mqc_report @@ -281,26 +275,35 @@ def attachMultiqcReport(multiqc_report) { def completionEmail(summary_params, email, email_on_fail, plaintext_email, outdir, monochrome_logs=true, multiqc_report=null) { // Set up the e-mail variables - def subject = "[$workflow.manifest.name] Successful: $workflow.runName" + def subject = "[${workflow.manifest.name}] Successful: ${workflow.runName}" if (!workflow.success) { - subject = "[$workflow.manifest.name] FAILED: $workflow.runName" + subject = "[${workflow.manifest.name}] FAILED: ${workflow.runName}" } def summary = [:] - for (group in summary_params.keySet()) { - summary << summary_params[group] - } + summary_params + .keySet() + .sort() + .each { group -> + summary << summary_params[group] + } def misc_fields = [:] misc_fields['Date Started'] = workflow.start misc_fields['Date Completed'] = workflow.complete misc_fields['Pipeline script file path'] = workflow.scriptFile misc_fields['Pipeline script hash ID'] = workflow.scriptId - if (workflow.repository) misc_fields['Pipeline repository Git URL'] = workflow.repository - if (workflow.commitId) misc_fields['Pipeline repository Git Commit'] = workflow.commitId - if (workflow.revision) misc_fields['Pipeline Git branch/tag'] = workflow.revision - misc_fields['Nextflow Version'] = workflow.nextflow.version - misc_fields['Nextflow Build'] = workflow.nextflow.build + if (workflow.repository) { + misc_fields['Pipeline repository Git URL'] = workflow.repository + } + if (workflow.commitId) { + misc_fields['Pipeline repository Git Commit'] = workflow.commitId + } + if (workflow.revision) { + misc_fields['Pipeline Git branch/tag'] = workflow.revision + } + misc_fields['Nextflow Version'] = workflow.nextflow.version + misc_fields['Nextflow Build'] = workflow.nextflow.build misc_fields['Nextflow Compile Timestamp'] = workflow.nextflow.timestamp def email_fields = [:] @@ -338,39 +341,41 @@ def completionEmail(summary_params, email, email_on_fail, plaintext_email, outdi // Render the sendmail template def max_multiqc_email_size = (params.containsKey('max_multiqc_email_size') ? 
params.max_multiqc_email_size : 0) as nextflow.util.MemoryUnit - def smail_fields = [ email: email_address, subject: subject, email_txt: email_txt, email_html: email_html, projectDir: "${workflow.projectDir}", mqcFile: mqc_report, mqcMaxSize: max_multiqc_email_size.toBytes() ] + def smail_fields = [email: email_address, subject: subject, email_txt: email_txt, email_html: email_html, projectDir: "${workflow.projectDir}", mqcFile: mqc_report, mqcMaxSize: max_multiqc_email_size.toBytes()] def sf = new File("${workflow.projectDir}/assets/sendmail_template.txt") def sendmail_template = engine.createTemplate(sf).make(smail_fields) def sendmail_html = sendmail_template.toString() // Send the HTML e-mail - Map colors = logColours(monochrome_logs) + def colors = logColours(monochrome_logs) as Map if (email_address) { try { - if (plaintext_email) { throw GroovyException('Send plaintext e-mail, not HTML') } + if (plaintext_email) { + throw new org.codehaus.groovy.GroovyException('Send plaintext e-mail, not HTML') + } // Try to send HTML e-mail using sendmail def sendmail_tf = new File(workflow.launchDir.toString(), ".sendmail_tmp.html") sendmail_tf.withWriter { w -> w << sendmail_html } - [ 'sendmail', '-t' ].execute() << sendmail_html - log.info "-${colors.purple}[$workflow.manifest.name]${colors.green} Sent summary e-mail to $email_address (sendmail)-" - } catch (all) { + ['sendmail', '-t'].execute() << sendmail_html + log.info("-${colors.purple}[${workflow.manifest.name}]${colors.green} Sent summary e-mail to ${email_address} (sendmail)-") + } + catch (Exception all) { // Catch failures and try with plaintext - def mail_cmd = [ 'mail', '-s', subject, '--content-type=text/html', email_address ] + def mail_cmd = ['mail', '-s', subject, '--content-type=text/html', email_address] mail_cmd.execute() << email_html - log.info "-${colors.purple}[$workflow.manifest.name]${colors.green} Sent summary e-mail to $email_address (mail)-" + log.info("-${colors.purple}[${workflow.manifest.name}]${colors.green} Sent summary e-mail to ${email_address} (mail)-") } } // Write summary e-mail HTML to a file def output_hf = new File(workflow.launchDir.toString(), ".pipeline_report.html") output_hf.withWriter { w -> w << email_html } - FilesEx.copyTo(output_hf.toPath(), "${outdir}/pipeline_info/pipeline_report.html"); + nextflow.extension.FilesEx.copyTo(output_hf.toPath(), "${outdir}/pipeline_info/pipeline_report.html") output_hf.delete() // Write summary e-mail TXT to a file def output_tf = new File(workflow.launchDir.toString(), ".pipeline_report.txt") output_tf.withWriter { w -> w << email_txt } - FilesEx.copyTo(output_tf.toPath(), "${outdir}/pipeline_info/pipeline_report.txt"); + nextflow.extension.FilesEx.copyTo(output_tf.toPath(), "${outdir}/pipeline_info/pipeline_report.txt") output_tf.delete() } @@ -378,15 +383,17 @@ // Print pipeline summary on completion // def completionSummary(monochrome_logs=true) { - Map colors = logColours(monochrome_logs) + def colors = logColours(monochrome_logs) as Map if (workflow.success) { if (workflow.stats.ignoredCount == 0) { - log.info "-${colors.purple}[$workflow.manifest.name]${colors.green} Pipeline completed successfully${colors.reset}-" - } else { - log.info "-${colors.purple}[$workflow.manifest.name]${colors.yellow} Pipeline completed successfully, but with errored process(es) ${colors.reset}-" + log.info("-${colors.purple}[${workflow.manifest.name}]${colors.green} Pipeline completed
successfully${colors.reset}-") + } + else { + log.info("-${colors.purple}[${workflow.manifest.name}]${colors.yellow} Pipeline completed successfully, but with errored process(es) ${colors.reset}-") } - } else { - log.info "-${colors.purple}[$workflow.manifest.name]${colors.red} Pipeline completed with errors${colors.reset}-" + } + else { + log.info("-${colors.purple}[${workflow.manifest.name}]${colors.red} Pipeline completed with errors${colors.reset}-") } } @@ -395,21 +402,30 @@ def completionSummary(monochrome_logs=true) { // def imNotification(summary_params, hook_url) { def summary = [:] - for (group in summary_params.keySet()) { - summary << summary_params[group] - } + summary_params + .keySet() + .sort() + .each { group -> + summary << summary_params[group] + } def misc_fields = [:] - misc_fields['start'] = workflow.start - misc_fields['complete'] = workflow.complete - misc_fields['scriptfile'] = workflow.scriptFile - misc_fields['scriptid'] = workflow.scriptId - if (workflow.repository) misc_fields['repository'] = workflow.repository - if (workflow.commitId) misc_fields['commitid'] = workflow.commitId - if (workflow.revision) misc_fields['revision'] = workflow.revision - misc_fields['nxf_version'] = workflow.nextflow.version - misc_fields['nxf_build'] = workflow.nextflow.build - misc_fields['nxf_timestamp'] = workflow.nextflow.timestamp + misc_fields['start'] = workflow.start + misc_fields['complete'] = workflow.complete + misc_fields['scriptfile'] = workflow.scriptFile + misc_fields['scriptid'] = workflow.scriptId + if (workflow.repository) { + misc_fields['repository'] = workflow.repository + } + if (workflow.commitId) { + misc_fields['commitid'] = workflow.commitId + } + if (workflow.revision) { + misc_fields['revision'] = workflow.revision + } + misc_fields['nxf_version'] = workflow.nextflow.version + misc_fields['nxf_build'] = workflow.nextflow.build + misc_fields['nxf_timestamp'] = workflow.nextflow.timestamp def msg_fields = [:] msg_fields['version'] = getWorkflowVersion() @@ -434,13 +450,13 @@ def imNotification(summary_params, hook_url) { def json_message = json_template.toString() // POST - def post = new URL(hook_url).openConnection(); + def post = new URL(hook_url).openConnection() post.setRequestMethod("POST") post.setDoOutput(true) post.setRequestProperty("Content-Type", "application/json") - post.getOutputStream().write(json_message.getBytes("UTF-8")); - def postRC = post.getResponseCode(); - if (! postRC.equals(200)) { - log.warn(post.getErrorStream().getText()); + post.getOutputStream().write(json_message.getBytes("UTF-8")) + def postRC = post.getResponseCode() + if (!postRC.equals(200)) { + log.warn(post.getErrorStream().getText()) } } diff --git a/subworkflows/nf-core/utils_nfschema_plugin/main.nf b/subworkflows/nf-core/utils_nfschema_plugin/main.nf new file mode 100644 index 00000000..4994303e --- /dev/null +++ b/subworkflows/nf-core/utils_nfschema_plugin/main.nf @@ -0,0 +1,46 @@ +// +// Subworkflow that uses the nf-schema plugin to validate parameters and render the parameter summary +// + +include { paramsSummaryLog } from 'plugin/nf-schema' +include { validateParameters } from 'plugin/nf-schema' + +workflow UTILS_NFSCHEMA_PLUGIN { + + take: + input_workflow // workflow: the workflow object used by nf-schema to get metadata from the workflow + validate_params // boolean: validate the parameters + parameters_schema // string: path to the parameters JSON schema. 
+ // this has to be the same as the schema given to `validation.parametersSchema` + // when this input is empty it will automatically use the configured schema or + // "${projectDir}/nextflow_schema.json" as default. This input should not be empty + // for meta pipelines + + main: + + // + // Print parameter summary to stdout. This will display the parameters + // that differ from the default given in the JSON schema + // + if(parameters_schema) { + log.info paramsSummaryLog(input_workflow, parameters_schema:parameters_schema) + } else { + log.info paramsSummaryLog(input_workflow) + } + + // + // Validate the parameters using nextflow_schema.json or the schema + // given via the validation.parametersSchema configuration option + // + if(validate_params) { + if(parameters_schema) { + validateParameters(parameters_schema:parameters_schema) + } else { + validateParameters() + } + } + + emit: + dummy_emit = true +} + diff --git a/subworkflows/nf-core/utils_nfschema_plugin/meta.yml b/subworkflows/nf-core/utils_nfschema_plugin/meta.yml new file mode 100644 index 00000000..f7d9f028 --- /dev/null +++ b/subworkflows/nf-core/utils_nfschema_plugin/meta.yml @@ -0,0 +1,35 @@ +# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/subworkflows/yaml-schema.json +name: "utils_nfschema_plugin" +description: Run nf-schema to validate parameters and create a summary of changed parameters +keywords: + - validation + - JSON schema + - plugin + - parameters + - summary +components: [] +input: + - input_workflow: + type: object + description: | + The workflow object of the used pipeline. + This object contains meta data used to create the params summary log + - validate_params: + type: boolean + description: Validate the parameters and error if invalid. + - parameters_schema: + type: string + description: | + Path to the parameters JSON schema. + This has to be the same as the schema given to the `validation.parametersSchema` config + option. When this input is empty it will automatically use the configured schema or + "${projectDir}/nextflow_schema.json" as default. The schema should not be given in this way + for meta pipelines. 
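+ # Illustration only (hypothetical path): a meta pipeline would pass an explicit value such as + # "${projectDir}/assets/custom_nextflow_schema.json" here, while ordinary pipelines pass null + # and fall back to the configured schema or "${projectDir}/nextflow_schema.json", as the tests below do.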
+output: + - dummy_emit: + type: boolean + description: Dummy emit to make nf-core subworkflows lint happy +authors: + - "@nvnieuwk" +maintainers: + - "@nvnieuwk" diff --git a/subworkflows/nf-core/utils_nfschema_plugin/tests/main.nf.test b/subworkflows/nf-core/utils_nfschema_plugin/tests/main.nf.test new file mode 100644 index 00000000..842dc432 --- /dev/null +++ b/subworkflows/nf-core/utils_nfschema_plugin/tests/main.nf.test @@ -0,0 +1,117 @@ +nextflow_workflow { + + name "Test Subworkflow UTILS_NFSCHEMA_PLUGIN" + script "../main.nf" + workflow "UTILS_NFSCHEMA_PLUGIN" + + tag "subworkflows" + tag "subworkflows_nfcore" + tag "subworkflows/utils_nfschema_plugin" + tag "plugin/nf-schema" + + config "./nextflow.config" + + test("Should run nothing") { + + when { + + params { + test_data = '' + } + + workflow { + """ + validate_params = false + input[0] = workflow + input[1] = validate_params + input[2] = "" + """ + } + } + + then { + assertAll( + { assert workflow.success } + ) + } + } + + test("Should validate params") { + + when { + + params { + test_data = '' + outdir = 1 + } + + workflow { + """ + validate_params = true + input[0] = workflow + input[1] = validate_params + input[2] = "" + """ + } + } + + then { + assertAll( + { assert workflow.failed }, + { assert workflow.stdout.any { it.contains('ERROR ~ Validation of pipeline parameters failed!') } } + ) + } + } + + test("Should run nothing - custom schema") { + + when { + + params { + test_data = '' + } + + workflow { + """ + validate_params = false + input[0] = workflow + input[1] = validate_params + input[2] = "${projectDir}/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json" + """ + } + } + + then { + assertAll( + { assert workflow.success } + ) + } + } + + test("Should validate params - custom schema") { + + when { + + params { + test_data = '' + outdir = 1 + } + + workflow { + """ + validate_params = true + input[0] = workflow + input[1] = validate_params + input[2] = "${projectDir}/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json" + """ + } + } + + then { + assertAll( + { assert workflow.failed }, + { assert workflow.stdout.any { it.contains('ERROR ~ Validation of pipeline parameters failed!') } } + ) + } + } +} diff --git a/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow.config b/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow.config new file mode 100644 index 00000000..0907ac58 --- /dev/null +++ b/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow.config @@ -0,0 +1,8 @@ +plugins { + id "nf-schema@2.1.0" +} + +validation { + parametersSchema = "${projectDir}/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json" + monochromeLogs = true +} \ No newline at end of file diff --git a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/nextflow_schema.json b/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json similarity index 95% rename from subworkflows/nf-core/utils_nfvalidation_plugin/tests/nextflow_schema.json rename to subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json index 7626c1c9..331e0d2f 100644 --- a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/nextflow_schema.json +++ b/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json @@ -1,10 +1,10 @@ { - "$schema": "http://json-schema.org/draft-07/schema", + "$schema": "https://json-schema.org/draft/2020-12/schema", "$id": "https://raw.githubusercontent.com/./master/nextflow_schema.json", "title": ". 
pipeline parameters", "description": "", "type": "object", - "definitions": { + "$defs": { "input_output_options": { "title": "Input/output options", "type": "object", @@ -87,10 +87,10 @@ }, "allOf": [ { - "$ref": "#/definitions/input_output_options" + "$ref": "#/$defs/input_output_options" }, { - "$ref": "#/definitions/generic_options" + "$ref": "#/$defs/generic_options" } ] } diff --git a/subworkflows/nf-core/utils_nfvalidation_plugin/main.nf b/subworkflows/nf-core/utils_nfvalidation_plugin/main.nf deleted file mode 100644 index 2585b65d..00000000 --- a/subworkflows/nf-core/utils_nfvalidation_plugin/main.nf +++ /dev/null @@ -1,62 +0,0 @@ -// -// Subworkflow that uses the nf-validation plugin to render help text and parameter summary -// - -/* -======================================================================================== - IMPORT NF-VALIDATION PLUGIN -======================================================================================== -*/ - -include { paramsHelp } from 'plugin/nf-validation' -include { paramsSummaryLog } from 'plugin/nf-validation' -include { validateParameters } from 'plugin/nf-validation' - -/* -======================================================================================== - SUBWORKFLOW DEFINITION -======================================================================================== -*/ - -workflow UTILS_NFVALIDATION_PLUGIN { - - take: - print_help // boolean: print help - workflow_command // string: default commmand used to run pipeline - pre_help_text // string: string to be printed before help text and summary log - post_help_text // string: string to be printed after help text and summary log - validate_params // boolean: validate parameters - schema_filename // path: JSON schema file, null to use default value - - main: - - log.debug "Using schema file: ${schema_filename}" - - // Default values for strings - pre_help_text = pre_help_text ?: '' - post_help_text = post_help_text ?: '' - workflow_command = workflow_command ?: '' - - // - // Print help message if needed - // - if (print_help) { - log.info pre_help_text + paramsHelp(workflow_command, parameters_schema: schema_filename) + post_help_text - System.exit(0) - } - - // - // Print parameter summary to stdout - // - log.info pre_help_text + paramsSummaryLog(workflow, parameters_schema: schema_filename) + post_help_text - - // - // Validate parameters relative to the parameter JSON schema - // - if (validate_params){ - validateParameters(parameters_schema: schema_filename) - } - - emit: - dummy_emit = true -} diff --git a/subworkflows/nf-core/utils_nfvalidation_plugin/meta.yml b/subworkflows/nf-core/utils_nfvalidation_plugin/meta.yml deleted file mode 100644 index 3d4a6b04..00000000 --- a/subworkflows/nf-core/utils_nfvalidation_plugin/meta.yml +++ /dev/null @@ -1,44 +0,0 @@ -# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/subworkflows/yaml-schema.json -name: "UTILS_NFVALIDATION_PLUGIN" -description: Use nf-validation to initiate and validate a pipeline -keywords: - - utility - - pipeline - - initialise - - validation -components: [] -input: - - print_help: - type: boolean - description: | - Print help message and exit - - workflow_command: - type: string - description: | - The command to run the workflow e.g. 
"nextflow run main.nf" - - pre_help_text: - type: string - description: | - Text to print before the help message - - post_help_text: - type: string - description: | - Text to print after the help message - - validate_params: - type: boolean - description: | - Validate the parameters and error if invalid. - - schema_filename: - type: string - description: | - The filename of the schema to validate against. -output: - - dummy_emit: - type: boolean - description: | - Dummy emit to make nf-core subworkflows lint happy -authors: - - "@adamrtalbot" -maintainers: - - "@adamrtalbot" - - "@maxulysse" diff --git a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/main.nf.test b/subworkflows/nf-core/utils_nfvalidation_plugin/tests/main.nf.test deleted file mode 100644 index 5784a33f..00000000 --- a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/main.nf.test +++ /dev/null @@ -1,200 +0,0 @@ -nextflow_workflow { - - name "Test Workflow UTILS_NFVALIDATION_PLUGIN" - script "../main.nf" - workflow "UTILS_NFVALIDATION_PLUGIN" - tag "subworkflows" - tag "subworkflows_nfcore" - tag "plugin/nf-validation" - tag "'plugin/nf-validation'" - tag "utils_nfvalidation_plugin" - tag "subworkflows/utils_nfvalidation_plugin" - - test("Should run nothing") { - - when { - - params { - monochrome_logs = true - test_data = '' - } - - workflow { - """ - help = false - workflow_command = null - pre_help_text = null - post_help_text = null - validate_params = false - schema_filename = "$moduleTestDir/nextflow_schema.json" - - input[0] = help - input[1] = workflow_command - input[2] = pre_help_text - input[3] = post_help_text - input[4] = validate_params - input[5] = schema_filename - """ - } - } - - then { - assertAll( - { assert workflow.success } - ) - } - } - - test("Should run help") { - - - when { - - params { - monochrome_logs = true - test_data = '' - } - workflow { - """ - help = true - workflow_command = null - pre_help_text = null - post_help_text = null - validate_params = false - schema_filename = "$moduleTestDir/nextflow_schema.json" - - input[0] = help - input[1] = workflow_command - input[2] = pre_help_text - input[3] = post_help_text - input[4] = validate_params - input[5] = schema_filename - """ - } - } - - then { - assertAll( - { assert workflow.success }, - { assert workflow.exitStatus == 0 }, - { assert workflow.stdout.any { it.contains('Input/output options') } }, - { assert workflow.stdout.any { it.contains('--outdir') } } - ) - } - } - - test("Should run help with command") { - - when { - - params { - monochrome_logs = true - test_data = '' - } - workflow { - """ - help = true - workflow_command = "nextflow run noorg/doesntexist" - pre_help_text = null - post_help_text = null - validate_params = false - schema_filename = "$moduleTestDir/nextflow_schema.json" - - input[0] = help - input[1] = workflow_command - input[2] = pre_help_text - input[3] = post_help_text - input[4] = validate_params - input[5] = schema_filename - """ - } - } - - then { - assertAll( - { assert workflow.success }, - { assert workflow.exitStatus == 0 }, - { assert workflow.stdout.any { it.contains('nextflow run noorg/doesntexist') } }, - { assert workflow.stdout.any { it.contains('Input/output options') } }, - { assert workflow.stdout.any { it.contains('--outdir') } } - ) - } - } - - test("Should run help with extra text") { - - - when { - - params { - monochrome_logs = true - test_data = '' - } - workflow { - """ - help = true - workflow_command = "nextflow run noorg/doesntexist" - pre_help_text = "pre-help-text" - 
post_help_text = "post-help-text" - validate_params = false - schema_filename = "$moduleTestDir/nextflow_schema.json" - - input[0] = help - input[1] = workflow_command - input[2] = pre_help_text - input[3] = post_help_text - input[4] = validate_params - input[5] = schema_filename - """ - } - } - - then { - assertAll( - { assert workflow.success }, - { assert workflow.exitStatus == 0 }, - { assert workflow.stdout.any { it.contains('pre-help-text') } }, - { assert workflow.stdout.any { it.contains('nextflow run noorg/doesntexist') } }, - { assert workflow.stdout.any { it.contains('Input/output options') } }, - { assert workflow.stdout.any { it.contains('--outdir') } }, - { assert workflow.stdout.any { it.contains('post-help-text') } } - ) - } - } - - test("Should validate params") { - - when { - - params { - monochrome_logs = true - test_data = '' - outdir = 1 - } - workflow { - """ - help = false - workflow_command = null - pre_help_text = null - post_help_text = null - validate_params = true - schema_filename = "$moduleTestDir/nextflow_schema.json" - - input[0] = help - input[1] = workflow_command - input[2] = pre_help_text - input[3] = post_help_text - input[4] = validate_params - input[5] = schema_filename - """ - } - } - - then { - assertAll( - { assert workflow.failed }, - { assert workflow.stdout.any { it.contains('ERROR ~ ERROR: Validation of pipeline parameters failed!') } } - ) - } - } -} diff --git a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/tags.yml b/subworkflows/nf-core/utils_nfvalidation_plugin/tests/tags.yml deleted file mode 100644 index 60b1cfff..00000000 --- a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/tags.yml +++ /dev/null @@ -1,2 +0,0 @@ -subworkflows/utils_nfvalidation_plugin: - - subworkflows/nf-core/utils_nfvalidation_plugin/** diff --git a/workflows/differentialabundance.nf b/workflows/differentialabundance.nf index 020c0747..365ba850 100644 --- a/workflows/differentialabundance.nf +++ b/workflows/differentialabundance.nf @@ -43,10 +43,10 @@ if (params.study_type == 'affy_array'){ error("Query GSE not specified or features metadata columns not specified") } } else { - // If this is not microarray data or maxquant output, and this an RNA-seq dataset, + // If this is not microarray data or maxquant output, and this an RNA-seq dataset or experimental analysis, // then assume we're reading from a matrix - if (params.study_type == "rnaseq" && params.matrix) { + if (params.study_type in ["rnaseq", "experimental"] && params.matrix) { matrix_file = file(params.matrix, checkIfExists: true) ch_in_raw = Channel.of([ exp_meta, matrix_file]) } else { @@ -62,6 +62,7 @@ if (params.control_features) { ch_control_features = Channel.of([ exp_meta, file def run_gene_set_analysis = params.gsea_run || params.gprofiler2_run if (run_gene_set_analysis) { + ch_gene_sets = Channel.of([]) // For methods that can run without gene sets if (params.gene_sets_files) { gene_sets_files = params.gene_sets_files.split(",") ch_gene_sets = Channel.of(gene_sets_files).map { file(it, checkIfExists: true) } @@ -74,8 +75,6 @@ if (run_gene_set_analysis) { if (!params.gprofiler2_token && !params.gprofiler2_organism) { error("To run gprofiler2, please provide a run token, GMT file or organism!") } - } else { - ch_gene_sets = [] // For methods that can run without gene sets } } @@ -98,7 +97,9 @@ citations_file = file(params.citations_file, checkIfExists: true) */ include { TABULAR_TO_GSEA_CHIP } from '../modules/local/tabular_to_gsea_chip' -include { FILTER_DIFFTABLE } from 
'../modules/local/filter_difftable' +include { FILTER_DIFFTABLE } from '../modules/local/filter_difftable' +include { EXPERIMENTAL } from '../subworkflows/local/experimental/main.nf' + /* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -132,6 +133,8 @@ include { GEOQUERY_GETGEO } from '../modules/n include { ZIP as MAKE_REPORT_BUNDLE } from '../modules/nf-core/zip/main' include { softwareVersionsToYAML } from '../subworkflows/nf-core/utils_nfcore_pipeline' +include { samplesheetToList } from 'plugin/nf-schema' + /* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ RUN MAIN WORKFLOW @@ -140,7 +143,8 @@ include { softwareVersionsToYAML } from '../subworkfl workflow DIFFERENTIALABUNDANCE { - // Set up some basic variables + main: + ch_versions = Channel.empty() // Channel for the contrasts file ch_contrasts_file = Channel.from([[exp_meta, file(params.contrasts)]]) @@ -254,7 +258,7 @@ workflow DIFFERENTIALABUNDANCE { // Otherwise we can just use the matrix input; save it to the workdir so that it does not // just appear wherever the user runs the pipeline - matrix_as_anno_filename = "${workflow.workDir}/matrix_as_anno.${matrix_file.getExtension()}" + matrix_as_anno_filename = "${workflow.workDir}/${matrix_file.getBaseName()}_as_anno.${matrix_file.getExtension()}" if (params.study_type == 'maxquant'){ ch_features_matrix = ch_in_norm } else { @@ -262,7 +266,8 @@ workflow DIFFERENTIALABUNDANCE { } ch_features = ch_features_matrix .map{ meta, matrix -> - matrix.copyTo(matrix_as_anno_filename) + matrix_copy = file(matrix_as_anno_filename) + matrix_copy.exists() && matrix.getText().md5().equals(matrix_copy.getText().md5()) ?: matrix.copyTo(matrix_as_anno_filename) [ meta, file(matrix_as_anno_filename) ] } } @@ -303,7 +308,7 @@ workflow DIFFERENTIALABUNDANCE { ch_norm = VALIDATOR.out.assays } - if(params.study_type != 'rnaseq') { + if(params.study_type !in ['rnaseq', 'experimental']) { ch_matrix_for_differential = ch_norm } else{ @@ -354,8 +359,45 @@ workflow DIFFERENTIALABUNDANCE { ch_processed_matrices = ch_norm .map{ it.tail() } .first() - } - else{ + } else if (params.study_type == 'experimental') { + + // Convert the toolsheet.csv in a channel with the proper format + ch_tools = Channel.fromList(samplesheetToList(params.tools, './assets/schema_tools.json')) + .map { + it -> + def pathway_name = it[0].subMap(["pathway_name"]) + def differential_map = it[0].subMap(["diff_method","args_diff"]) + def correlation_map = it[0].subMap(["cor_method","args_cor"]) + def enrichment_map = it[0].subMap(["enr_method","args_enr"]) + [ pathway_name, differential_map, correlation_map, enrichment_map ] + }.unique() + + // Filter the tools to the pathway(s) of interest, or run everything if requested + if (params.pathway == "all") { + ch_tools + .set{ ch_tools } + } else { + ch_tools + .filter{ + it[0]["pathway_name"] in params.pathway.tokenize(',') + } + .set{ ch_tools } + } + + EXPERIMENTAL( + ch_contrasts, + VALIDATOR.out.sample_meta, + CUSTOM_MATRIXFILTER.out.filtered, + ch_tools + ) + + // TODO for the moment, these channels are allocated to not breaking the next part. 
+ // they have to be properly handled afterwards + ch_norm = Channel.empty() + ch_differential = Channel.empty() + ch_processed_matrices = Channel.empty() + ch_model = Channel.empty() + } else { DESEQ2_NORM ( ch_contrasts.first(), @@ -483,7 +525,6 @@ workflow DIFFERENTIALABUNDANCE { } // For gprofiler2, token and organism have priority and will override a gene_sets file - GPROFILER2_GOST( ch_filtered_diff, ch_gene_sets.first(), @@ -585,9 +626,9 @@ workflow DIFFERENTIALABUNDANCE { // Make a new contrasts file from the differential metas to guarantee the // same order as the differential results - ch_app_differential = ch_differential.first().map{it[0].keySet().join(',')} + ch_app_differential = ch_differential.first().map{it[0].keySet().tail().join(',')} .concat( - ch_differential.map{it[0].values().join(',')} + ch_differential.map{it[0].values().tail().join(',')} ) .collectFile(name: 'contrasts.csv', newLine: true, sort: false) .map{