Prepare for release (#307)

* increase version and add changelog
* update pre-commit
* add more docs
* Update docs/source/common-configuration-patterns.md

Co-authored-by: Matic Lubej <[email protected]>

---------

Co-authored-by: Matic Lubej <[email protected]>
sentinel-hub · Nov 22, 2023 · 1d1b9ff
1 parent 4df3570
Showing 4 changed files with 58 additions and 16 deletions.
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -26,7 +26,7 @@ repos:
         language_version: python3
 
   - repo: https://github.com/charliermarsh/ruff-pre-commit
-    rev: "v0.1.5"
+    rev: "v0.1.6"
     hooks:
       - id: ruff
 

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,3 +1,14 @@
+## [Version 1.7.0] - 2023-11-22
+With this release we push `eo-grow` towards a more `ray`-centered execution model.
+
+- The local EOExecutor models with multiprocessing/multithreading have been removed. (Most) pipelines no longer have the `use_ray` and `workers` parameters. In order to run instances locally one has to set up a local cluster (via `ray start --head`). We included a `debug` parameter that uses `EOExecutor` instead of `RayExecutor` so that IDE breakpoints work in most pipelines.
+- Pipeline chain configs have been adjusted. The user can now specify what kind of resources the main pipeline process would require. This also allows one to run pipelines entirely on worker instances.
+- The `ray_worker_type` field was replaced with `worker_resources` that allows for precise resource request specifications.
+- Fixed a bug where CLI variables were not applied for config chains.
+- Removed `TestPipeline` and the `eogrow-test` command.
+- Some `ValueError` exceptions were changed to `TypeError`.
+
+
 ## [Version 1.6.3] - 2023-11-07
 
 - Pipelines can request a specific type of worker when run on a ray cluster with the `ray_worker_type` field.
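
The adjusted chain format mentioned in the 1.7.0 entry wraps each pipeline's config under a `pipeline_config` key, as the docs change in this commit shows. A minimal sketch of how an existing chain could be converted, assuming the chain has already been loaded into plain Python dictionaries. The helper `wrap_legacy_chain` is hypothetical and not part of `eo-grow`:

```python
from typing import Any, Dict, List


def wrap_legacy_chain(chain: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Wrap pre-1.7.0 chain entries under the `pipeline_config` key.

    Hypothetical helper, not part of eo-grow. Entries that already contain
    a `pipeline_config` key are passed through unchanged.
    """
    migrated = []
    for entry in chain:
        if "pipeline_config" in entry:
            migrated.append(entry)
        else:
            migrated.append({"pipeline_config": entry})
    return migrated


# Entries in the old layout, taken from the docs example in this commit.
legacy_chain = [
    {"**download": "${config_path}/01_download.json"},
    {"**preprocess": "${config_path}/02_preprocess_data.json"},
]
new_chain = wrap_legacy_chain(legacy_chain)
```

Because entries that are already wrapped are left alone, running the helper twice gives the same result as running it once.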

diff --git a/docs/source/common-configuration-patterns.md b/docs/source/common-configuration-patterns.md
@@ -102,11 +102,11 @@ In certain use cases we have multiple pipelines that are meant to be run in a ce
 But the user still needs to run them in the correct order and by hand. This we can automate with a simple pipeline chain that links them together:
 ```
 [ // end_to_end_run.json
-  {"**download": "${config_path}/01_download.json"},
-  {"**preprocess": "${config_path}/02_preprocess_data.json"},
-  {"**predict": "${config_path}/03_use_model.json"},
-  {"**export": "${config_path}/04_export_maps.json"},
-  {"**ingest": "${config_path}/05_ingest_byoc.json"},
+  {"pipeline_config": {"**download": "${config_path}/01_download.json"}},
+  {"pipeline_config": {"**preprocess": "${config_path}/02_preprocess_data.json"}},
+  {"pipeline_config": {"**predict": "${config_path}/03_use_model.json"}},
+  {"pipeline_config": {"**export": "${config_path}/04_export_maps.json"}},
+  {"pipeline_config": {"**ingest": "${config_path}/05_ingest_byoc.json"}},
 ]
 ```
 
@@ -119,28 +119,59 @@ In experimentation we often want to run the same pipeline for multiple parameter
 ```
 [ // run_threshold_experiments.json
   {
-    "variables": {"threshold": 0.1},
-    "**pipeline": "${config_path}/extract_trees.json"
+    "pipeline_config": {
+      "variables": {"threshold": 0.1},
+      "**pipeline": "${config_path}/extract_trees.json"
+    }
   },
   {
-    "variables": {"threshold": 0.2},
-    "**pipeline": "${config_path}/extract_trees.json"
+    "pipeline_config": {
+      "variables": {"threshold": 0.2},
+      "**pipeline": "${config_path}/extract_trees.json"
+    }
   },
   {
-    "variables": {"threshold": 0.3},
-    "**pipeline": "${config_path}/extract_trees.json"
+    "pipeline_config": {
+      "variables": {"threshold": 0.3},
+      "**pipeline": "${config_path}/extract_trees.json"
+    }
   },
   {
-    "variables": {"threshold": 0.4},
-    "**pipeline": "${config_path}/extract_trees.json"
+    "pipeline_config": {
+      "variables": {"threshold": 0.4},
+      "**pipeline": "${config_path}/extract_trees.json"
+    }
   }
 ]
 ```
 
-### Using variables with pipelines
+### Using variables with pipeline chains
 
 While there is no syntactic sugar for specifying pipeline-chain-wide variables in JSON files, one can do that through the CLI. Running `eogrow end_to_end_run.json -v "year:2019"` will set the variable `year` to 2019 for all pipelines in the chain.
 
+### Specifying resources for pipeline execution
+
+Pipeline chains also allow the user to specify the resources needed by the main process of each pipeline, similarly to how a pipeline config specifies the resources needed by its workers.
+
+```
+[ // end_to_end_run.json
+  {
+    "pipeline_config": {"**download": "${config_path}/01_download.json"}
+  },
+  {
+    "pipeline_config": {"**predict": "${config_path}/03_use_model.json"},
+    "pipeline_resources": {"memory": 2e9} // ~ 2GB RAM reserved for the main process
+  },
+  {
+    "pipeline_config": {"**export": "${config_path}/04_export_maps.json"}
+  }
+]
+```
+
+This also allows us to run certain pipelines on specially tagged workers. When setting up the cluster, one can tag workers with custom resources, for instance an `r5.4xlarge` worker with `big_RAM_worker: 1`. If we set `"pipeline_resources": {"resources": {"big_RAM_worker": 1}}` then the pipeline will run ONLY on such workers, and the whole worker instance will be assigned to it. This is useful for pipelines whose main process carries a large workload.
+
+A pipeline chain can consist of a single pipeline, so this also works when running just one pipeline.
+
 ## Path modification via variables
 
 In some cases one wants fine-grained control over path specifications. The following is a simplified example of how one can provide separate download paths for a large number of batch pipelines.

diff --git a/eogrow/__init__.py b/eogrow/__init__.py
@@ -1,3 +1,3 @@
 """The main module of the eo-grow package."""
 
-__version__ = "1.6.3"
+__version__ = "1.7.0"
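
The threshold-experiment chain in the docs change above repeats the same entry with only one variable changed, so it could also be generated by a script. A minimal sketch, assuming the chain file is plain JSON with the `pipeline_config` and optional `pipeline_resources` keys shown in this commit. The function `build_threshold_chain` and its parameters are hypothetical and not part of `eo-grow`:

```python
import json
from typing import Any, Dict, List, Optional


def build_threshold_chain(
    thresholds: List[float],
    pipeline_path: str = "${config_path}/extract_trees.json",
    pipeline_resources: Optional[Dict[str, Any]] = None,
) -> List[Dict[str, Any]]:
    """Build one chain entry per threshold (hypothetical helper)."""
    chain = []
    for threshold in thresholds:
        entry: Dict[str, Any] = {
            "pipeline_config": {
                "variables": {"threshold": threshold},
                "**pipeline": pipeline_path,
            }
        }
        # Resources for the main pipeline process are optional per entry.
        if pipeline_resources is not None:
            entry["pipeline_resources"] = dict(pipeline_resources)
        chain.append(entry)
    return chain


# Same thresholds as the docs example; memory request as in the resources example.
chain = build_threshold_chain([0.1, 0.2, 0.3, 0.4], pipeline_resources={"memory": 2e9})
chain_json = json.dumps(chain, indent=2)
```

The serialized output is strict JSON, so it lacks the `//` comments used in the docs snippets. Writing `chain_json` to a file such as `run_threshold_experiments.json` is left to the caller.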