
[Do not merge!] Pseudo PR for first release #8

Open · wants to merge 65 commits into base: TEMPLATE
Conversation


@mashehu mashehu commented Aug 5, 2024

Do not merge! This is a PR of dev compared to first release for whole-pipeline reviewing purposes. Changes should be made to dev and this PR should not be merged into first-commit-for-pseudo-pr!


github-actions bot commented Aug 5, 2024

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit caee766

✅ 202 tests passed
❔   2 tests were ignored
❗   4 tests had warnings

❗ Test warnings:

  • nextflow_config - Config manifest.version should end in dev: 1.0.0
  • readme - README contains the placeholder zenodo.XXXXXXX. This should be replaced with the zenodo doi (after the first release).
  • pipeline_todos - TODO string in README.md: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file.
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline

❔ Tests ignored:

  • files_exist - File is ignored: conf/igenomes.config
  • files_exist - File is ignored: conf/igenomes_ignored.config

✅ Tests passed:

Run details

  • nf-core/tools version 3.0.2
  • Run at 2024-11-26 15:12:05

@mashehu (Author) left a comment:

Very nice work! Almost there 🤏🏻

I think an option to resolve symlinks and copy the actual files would be good for reproducibility.
I have a feeling the modules could rely more on the strengths of Nextflow; e.g., many have for loops over files, and these should be separate Nextflow jobs imo (see the sketch below).
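A minimal sketch of that fan-out pattern, with hypothetical process and script names (this is not the pipeline's actual code):

```groovy
// Instead of looping over N files inside one task, emit one task per file
// and merge the per-file results in a single reduce step at the end.
workflow FAN_OUT_EXAMPLE {
    files_ch = Channel.fromPath(params.inputs)   // hypothetical input glob

    PROCESS_ONE(files_ch)                        // one Nextflow job per file
    MERGE(PROCESS_ONE.out.collect())             // single merge job
}

process PROCESS_ONE {
    input:
    path f

    output:
    path "${f.baseName}.partial"

    script:
    """
    process_one_file.sh ${f} > ${f.baseName}.partial   # placeholder command
    """
}

process MERGE {
    input:
    path partials

    output:
    path "merged.out"

    script:
    """
    cat ${partials} > merged.out
    """
}
```

This lets the scheduler run the per-file tasks in parallel and retry them independently, which a shell loop inside a single task cannot do.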

README.md (Outdated)

```bash
nextflow run nf-core/rangeland/main.nf \
```

@mashehu (Author), suggested change:

```diff
- nextflow run nf-core/rangeland/main.nf \
+ nextflow run nf-core/rangeland \
```

bin/merge_boa.r (Outdated), comment on lines 2 to 14

```r
args = commandArgs(trailingOnly=TRUE)

if (length(args) < 3) {
  stop("\nthis program needs at least 3 inputs\n1: output filename\n2-*: input files", call.=FALSE)
}

fout <- args[1]
finp <- args[2:length(args)]
nf <- length(finp)

require(raster)
```

@mashehu (Author), suggested change (load the library first):

```r
require(raster)

args = commandArgs(trailingOnly=TRUE)

if (length(args) < 3) {
  stop("\nthis program needs at least 3 inputs\n1: output filename\n2-*: input files", call.=FALSE)
}

fout <- args[1]
finp <- args[2:length(args)]
nf <- length(finp)
```

At least in genomics it is standard to load the libraries at the beginning of an R script.

bin/merge_boa.r (Outdated), comment on lines 25 to 34

```r
for (i in 1:nf){

  data <- brick(finp[i])[]

  num <- num + !is.na(data)

  data[is.na(data)] <- 0
  sum <- sum + data

}
```

@mashehu (Author):

How large is `nf` here usually? For larger `nf`, try to use `apply` instead of a for-loop to improve performance.

Collaborator:

This highly depends on the type, size, and overlap of the pipeline's input data. It may become >100 in some extreme cases, but for our currently used data it's usually between 5 and 20. The merge scripts are mostly untouched from the previous (non-nf-core) installation of this pipeline. I will rework them and also include the other changes you suggested.

bin/merge_qai.r (Outdated), comment on lines 2 to 14

```r
args = commandArgs(trailingOnly=TRUE)

if (length(args) < 3) {
  stop("\nthis program needs at least 3 inputs\n1: output filename\n2-*: input files", call.=FALSE)
}

fout <- args[1]
finp <- args[2:length(args)]
nf <- length(finp)

require(raster)
```

@mashehu (Author), suggested change (as in `bin/merge_boa.r`, load the library first):

```r
require(raster)

args = commandArgs(trailingOnly=TRUE)

if (length(args) < 3) {
  stop("\nthis program needs at least 3 inputs\n1: output filename\n2-*: input files", call.=FALSE)
}

fout <- args[1]
finp <- args[2:length(args)]
nf <- length(finp)
```

@@ -0,0 +1,44 @@
#!/usr/bin/env Rscript

@mashehu (Author):

Maybe add a short comment block at the beginning of your custom scripts explaining what they do.

docs/usage.md (Outdated)

```bash
--resolution '[integer]'
```

The default value is 30, as most Landsat satellites natively provide this resolution.

@mashehu (Author), suggested change:

```diff
- The default value is 30, as most Landsat satellites natively provide this resolution.
+ The default value is `30`, as most Landsat satellites natively provide this resolution.
```

docs/usage.md (Outdated)

```bash
--end_date '[YYYY-MM-DD]'
```

Default values are `'1984-01-01'` for the start date and `'2006-12-31'` for the end date.

@mashehu (Author), suggested change (remove the line):

```diff
- Default values are `'1984-01-01'` for the start date and `'2006-12-31'` for the end date.
```

We show the default values on the parameters page, so it is easier to keep the docs in sync by having them in only one place (if no further explanation of the choice of default values is given).

docs/usage.md (Outdated)

### Group size

The `group_size` parameter can be ignored in most cases. It defines how many satellite scenes are processed together and is used to balance the tradeoff between I/O and computational capacities on individual compute nodes. By default, `group_size` is set to 100.

@mashehu (Author), suggested change:

```diff
- By default, `group_size` is set to 100.
+ By default, `group_size` is set to `100`.
```

docs/usage.md (Outdated)

### Visualization

The workflow provides two types of results visualization and aggregation. The fine-grained mosaic visualization contains all time series analysis results for all tiles at the original resolution. Pyramid visualizations present a broad overview of the same data at a lower resolution. Both visualizations can be enabled or disabled using the parameters `mosaic_visualization` and `pyramid_visualization`. By default, both visualization methods are enabled. Note that the mosaic visualization must be enabled when using the `test` and `test_full` profiles to allow the pipeline to check the correctness of its results (this is the default behavior; make sure not to disable mosaic when using test profiles).

@mashehu (Author), suggested change (drop the redundant parenthetical):

```diff
- ... check the correctness of its results (this is the default behavior, make sure to not disable mosaic when using test profiles) .
+ ... check the correctness of its results.
```

docs/usage.md (Outdated)

```bash
pyramid_visualization = '[boolean]'
```

### FORCE configuration

@mashehu (Author):

This section can be removed if `task.cpus` is used instead (sketch below).
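A hedged sketch of the `task.cpus` approach; the process name and the FORCE parameter-file key are illustrative, so check the FORCE docs for the exact keys:

```groovy
process FORCE_HIGHER_LEVEL {
    // hypothetical module: thread count comes from Nextflow's resource config
    label 'process_high'

    input:
    path param_file

    script:
    """
    # Assumption: the FORCE parameter file exposes a thread-count key.
    # Wiring it to task.cpus removes the need for a user-facing CPU parameter.
    sed "s/^NTHREAD_COMPUTE .*/NTHREAD_COMPUTE ${task.cpus}/" ${param_file} > local.prm
    force-higher-level local.prm
    """
}
```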

@nictru left a comment:

In addition to what @mashehu already said:

  • Adding FORCE to bioconda would not only allow for more versatile environment definitions in the pipeline, but would also let users install your tool without having to compile it. If you need assistance with that, feel free to reach out to me or the #bioconda channel on Slack.
  • The pipeline encodes the information that we usually handle via the meta map as directory and file names. This works, but it is less extensible and harder to debug than the meta map, which can store an arbitrary number of named meta fields.

But looks already pretty good!


```groovy
label 'process_single'

container "docker.io/davidfrantz/force:3.7.10"
```
@nictru:
Would it be possible for you to add FORCE to bioconda? This is easier than one would think; I added a module with a similar installation process recently via this PR. This way, we could have all installation modalities (Conda, Singularity, Docker) easily available, as bioconda packages are automatically added to BioContainers.

Member:
FORCE is not bioinformatics, so it is out of scope for bioconda. We are relaxing this requirement for now for non-biology pipelines.

nextflow.config (Outdated)

```groovy
apptainer.enabled = false
docker.runOptions = '-u $(id -u):$(id -g)'
docker.enabled = true
docker.userEmulation = true
```
`docker.userEmulation` is not supported anymore in the latest versions of Nextflow; I think it is already unsupported in 23.04.0, which is the oldest Nextflow version that this pipeline is supposed to run on.
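A minimal sketch of the fix, assuming the explicit UID mapping shown above is kept and the unsupported option is simply dropped:

```groovy
// nextflow.config sketch: explicit runOptions replace docker.userEmulation
docker.enabled    = true
docker.runOptions = '-u $(id -u):$(id -g)'
// docker.userEmulation removed: no longer supported by recent Nextflow releases
```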

@@ -0,0 +1,41 @@
```groovy
nextflow.enable.dsl = 2

process CHECK_RESULTS {
```
This process is missing a `tag` directive.
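For illustration, a hedged sketch of the missing directive; the tag value is a placeholder, so use whatever best identifies a task instance:

```groovy
process CHECK_RESULTS {
    tag "${woody_change_ref.baseName}"   // placeholder tag value
    label 'process_low'
    // ...
}
```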


```groovy
label 'process_low'

container 'docker.io/rocker/geospatial:4.3.1'
```
As far as I can see, the only package used from the geospatial image is `terra`. The corresponding R package is already on conda-forge, so I guess adding it to bioconda would be redundant. But we can create images using Seqera Containers, which gave the following (a wiring sketch follows the list):

  • Docker: community.wave.seqera.io/library/r-terra:1.7-71--57cecb7a052577e0
  • Singularity: oras://community.wave.seqera.io/library/r-terra:1.7-71--bbada5308a9d09c7
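For reference, a sketch of how those images could be wired in, following the usual nf-core container-selection pattern (the module body is elided):

```groovy
process CHECK_RESULTS {
    label 'process_low'

    // pick the Seqera-built image that matches the active container engine
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'oras://community.wave.seqera.io/library/r-terra:1.7-71--bbada5308a9d09c7' :
        'community.wave.seqera.io/library/r-terra:1.7-71--57cecb7a052577e0' }"

    // ...
}
```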


```groovy
script:
"""
force-tile-extent $aoi tmp/ tile_allow.txt
```
It will be created by `path 'tmp/datacube-definition.prj'` (line 11).

```groovy
ch_versions = ch_versions.mix(FORCE_PREPROCESS.out.versions.first())

// Group by tile, date and sensor
boa_tiles = FORCE_PREPROCESS.out.boa_tiles.flatten().map{ [ "${extractDirectory(it)}_${it.simpleName}", it ] }.groupTuple()
```
In nf-core we usually don't encode information in directory/file names, but instead use a meta map (sketch below).
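A hedged sketch of that convention applied to the channel shown above; the meta field names are hypothetical:

```groovy
// carry tile and scene identifiers as named meta fields instead of a
// composite string key encoded in the file name
boa_tiles = FORCE_PREPROCESS.out.boa_tiles
    .flatten()
    .map { f ->
        def meta = [ tile: extractDirectory(f), id: f.simpleName ]   // hypothetical fields
        [ meta, f ]
    }
    .groupTuple()
```

`groupTuple` groups on the whole meta map, so tuples with identical `tile` and `id` still end up together, and downstream processes can read `meta.tile` instead of parsing paths.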

Collaborator:
I'm aware of that. The reason we decided to keep the name-encoded information is that it is the common approach in remote sensing and is somewhat expected by FORCE. I will look into switching to meta maps.

Member:
I think it's fine to encode it in file names if it's common/the standard in the field - I would say it's not a blocker for this release.

But if that's the case, I think it would be important to add validation checks to ensure that the file name structure is exactly as expected for the pipeline.

But of course it wouldn't hurt to copy information like that into a meta map to accompany the files through the pipeline.

@jfy133 (Member) left a comment:
Overall really good! You've done a great job of sticking to nf-core structure/guidelines despite coming from a different field!

A few things I also noticed:

  • Try to stick to nf-core guidelines for things such as module structure, even when the modules are local.

  • I would highly recommend adding more validation checks to your input Nextflow schema:

    • `pattern` entries in the `nextflow_schema.json` for better user validation (e.g., file-suffix checks, or strings with delimiters; regex is your friend)
    • `exists` for all required files
  • Missing a CHANGELOG update, even if it just says 'first release'.

  • For the modules with loops inside, I strongly recommend, as @mashehu pointed out, parallelising these where you can using Nextflow (or at least bash); otherwise the pipeline is not maximising the benefits of the language.

    P.S. I vaguely remember commenting about removing MultiQC somewhere; please ignore it, I just remembered we need it for software version reporting :)

Comment on lines +24 to +29
1. Read satellite imagery, digital elevation model, endmember definition, water vapor database and area of interest definition
2. Generate allow list and analysis mask to determine which pixels from the satellite data can be used
3. Preprocess data to obtain atmospherically corrected images alongside quality assurance information
4. Classify pixels by applying linear spectral unmixing
5. Time series analyses to obtain trends in vegetation dynamics
6. Create mosaic and pyramid visualizations of the results
Member:
Not a requirement, but a diagram would be nice here :) (it also helps non-expert reviewers follow what they are meant to be assessing :)

@@ -0,0 +1,44 @@
#!/usr/bin/env Rscript
Member:
Please add a license and author to all of these scripts.

If you have no preference, just put MIT as the license and point to the pipeline's repo for the file itself.

Example (although style ripped off from @edmundmiller): https://github.com/nf-core/funcscan/blob/2a5c32a4dd72ed18b53a797af7bd8e11694af9e1/bin/ampcombi_download.py#L3-L10

simpler example: https://github.com/nf-core/mag/blob/master/bin/combine_tables.py#L3-L4

Comment on lines 38 to 39

```groovy
errorStrategy = 'retry'
maxRetries = 5
```
Member:
Generally, stuff like process execution information goes in `base.config`. @mashehu (as the other reviewer), what do you think here?

Member:
The reason I say this is that `modules.config` can be more easily overwritten due to config loading order (and this is OK, because file naming/locations are more often customised by a user), whereas defaults like retrying or `maxRetries` you probably want secured as the fall-back behaviour (see the sketch below).
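A minimal sketch of that placement, reusing the values from the diff above:

```groovy
// conf/base.config sketch: retry behaviour as the pipeline-wide fall-back
process {
    errorStrategy = 'retry'
    maxRetries    = 5
}
```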

```groovy
publishDir = [
    [
        path: { "${params.outdir}/trend/pyramid/" },
        saveAs: { "${it.substring(12,it.indexOf("."))}/trend/${it.substring(0,11)}/$it" },
```
Member:
This seems like it has the potential to be a bit brittle (coming from bioinformatics, where file naming can be a wild west). What is the full `it` string? Is the output file name hardcoded, or could it be dynamic?

Collaborator:
The file names are standardized by FORCE up to the exact digit positions (e.g. https://force-eo.readthedocs.io/en/latest/components/lower-level/level2/format.html#naming-convention). The name won't dynamically change. This is another example of path/name-encoded information in earth observation.

conf/test.config (Outdated), comment on lines 39 to 40

```groovy
sensors_level1 = 'LT04,LT05'
sensors_level2 = 'LND04 LND05'
```
Member:

Is it correct that these have different delimiters?

Collaborator:
Thanks for mentioning that. We actually don't need the first parameter anymore; it's a remnant of a prior version of the workflow where `sensors_level1` was used in a CLI command to download some input data, hence the different delimiter. I will remove the first parameter.

```groovy
    .toSortedList{ a,b -> a[1][0].simpleName <=> b[1][0].simpleName }
    .flatMap{it}
    .groupTuple( remainder : true, size : params.group_size ).map{ [ it[0], it[1].flatten() ] }
qai_tiles_to_merge = qai_tiles.filter{ x -> x[1].size() > 1 }
```

Member, suggested change (apparently an indentation-only change):

```diff
- qai_tiles_to_merge = qai_tiles.filter{ x -> x[1].size() > 1 }
+ qai_tiles_to_merge = qai_tiles.filter{ x -> x[1].size() > 1 }
```

```groovy
*/

// check wether provided input is within provided time range
```

Member, suggested change:

```diff
- // check wether provided input is within provided time range
+ // check whether provided input is within provided time range
```

Comment on lines 78 to 80

```groovy
cube_file = file( "$params.data_cube" )
aoi_file = file( "$params.aoi" )
endmember_file = file( "$params.endmember" )
```

Member, suggested change (the string interpolation is unnecessary):

```diff
- cube_file = file( "$params.data_cube" )
- aoi_file = file( "$params.aoi" )
- endmember_file = file( "$params.endmember" )
+ cube_file = file( params.data_cube )
+ aoi_file = file( params.aoi )
+ endmember_file = file( params.endmember )
```

Comment on lines 90 to 91

```groovy
data = base_path.map(it -> file("$it/*/*", type: 'dir')).flatten()
data = data.flatten().filter{ inRegion(it) }
```

Member:

Is `flatten` necessary on both lines?

Collaborator:

No, I'll remove the redundancy.

Comment on lines 135 to 154

```groovy
if (params.config_profile_name == 'Test profile') {
    woody_change_ref = file("$params.woody_change_ref")
    woody_yoc_ref = file("$params.woody_yoc_ref")
    herbaceous_change_ref = file("$params.herbaceous_change_ref")
    herbaceous_yoc_ref = file("$params.herbaceous_yoc_ref")
    peak_change_ref = file("$params.peak_change_ref")
    peak_yoc_ref = file("$params.peak_yoc_ref")

    CHECK_RESULTS(grouped_trend_data, woody_change_ref, woody_yoc_ref, herbaceous_change_ref, herbaceous_yoc_ref, peak_change_ref, peak_yoc_ref)
    ch_versions = ch_versions.mix(CHECK_RESULTS.out.versions)
}

if (params.config_profile_name == 'Full test profile') {
    UNTAR_REF([[:], params.reference])
    ref_path = UNTAR_REF.out.untar.map(it -> it[1])
    tar_versions.mix(UNTAR_REF.out.versions)

    CHECK_RESULTS_FULL(grouped_trend_data, ref_path)
    ch_versions = ch_versions.mix(CHECK_RESULTS_FULL.out.versions)
}
```
Member:
You should not embed test-specific code within the pipeline itself (it's not particularly realistic); instead, add nf-test to the pipeline and use it for a more structured/standardised approach (sketch below).

A few pipelines already have this (ampliseq, rnaseq, etc.); if you need pointers, let me know.
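A hedged nf-test sketch of what replacing the embedded checks could look like; the file layout, params, and assertions are illustrative:

```groovy
// tests/main.nf.test (hypothetical): pipeline-level test instead of in-pipeline checks
nextflow_pipeline {

    name "rangeland small-data test"
    script "../main.nf"

    test("test profile produces expected trend outputs") {

        when {
            params {
                outdir = "$outputDir"
            }
        }

        then {
            assert workflow.success
            // compare produced mosaics against the reference data here,
            // e.g. with md5-based assertions on files under "$outputDir/trend"
        }
    }
}
```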

Felix-Kummer and others added 30 commits September 18, 2024 16:25
Important! Template update for nf-core/tools v3.0.2
Co-authored-by: Matthias Hörtenhuber <[email protected]>
Adressing review suggestions for Version 1.0.0