Prepare for release (#307)
* increase version and add changelog

* update pre-commit

* add more docs

* Update docs/source/common-configuration-patterns.md

Co-authored-by: Matic Lubej <[email protected]>

---------

Co-authored-by: Matic Lubej <[email protected]>
zigaLuksic and mlubej authored Nov 22, 2023
1 parent 4df3570 commit 1d1b9ff
Showing 4 changed files with 58 additions and 16 deletions.
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -26,7 +26,7 @@ repos:
language_version: python3

- repo: https://github.com/charliermarsh/ruff-pre-commit
rev: "v0.1.5"
rev: "v0.1.6"
hooks:
- id: ruff

11 changes: 11 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,14 @@
## [Version 1.7.0] - 2023-11-22
With this release we push `eo-grow` towards a more `ray`-centered execution model.

- The local EOExecutor models with multiprocessing/multithreading have been removed. (Most) pipelines no longer have the `use_ray` and `workers` parameters. In order to run instances locally one has to set up a local cluster (via `ray start --head`). We included a `debug` parameter that uses `EOExecutor` instead of `RayExecutor` so that IDE breakpoints work in most pipelines.
- Pipeline chain configs have been adjusted. The user can now specify what kind of resources the main pipeline process would require. This also allows one to run pipelines entirely on worker instances.
- The `ray_worker_type` field was replaced with `worker_resources` that allows for precise resource request specifications.
- Fixed a bug where CLI variables were not applied for config chains.
- Removed `TestPipeline` and the `eogrow-test` command.
- Some `ValueError` exceptions were changed to `TypeError`.


## [Version 1.6.3] - 2023-11-07

- Pipelines can request a specific type of worker when run on a ray cluster with the `ray_worker_type` field.
59 changes: 45 additions & 14 deletions docs/source/common-configuration-patterns.md
@@ -102,11 +102,11 @@ In certain use cases we have multiple pipelines that are meant to be run in a ce
But the user still needs to run them by hand and in the correct order. We can automate this with a simple pipeline chain that links them together:
```
[ // end_to_end_run.json
{"**download": "${config_path}/01_download.json"},
{"**preprocess": "${config_path}/02_preprocess_data.json"},
{"**predict": "${config_path}/03_use_model.json"},
{"**export": "${config_path}/04_export_maps.json"},
{"**ingest": "${config_path}/05_ingest_byoc.json"},
{"pipeline_config": {"**download": "${config_path}/01_download.json"}},
{"pipeline_config": {"**preprocess": "${config_path}/02_preprocess_data.json"}},
{"pipeline_config": {"**predict": "${config_path}/03_use_model.json"}},
{"pipeline_config": {"**export": "${config_path}/04_export_maps.json"}},
{"pipeline_config": {"**ingest": "${config_path}/05_ingest_byoc.json"}},
]
```
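
Chain files written in the older flat form can be converted mechanically. The following is a minimal sketch, assuming the only change needed is to wrap each entry in a `pipeline_config` key. `wrap_chain_entries` is a hypothetical helper and not part of `eo-grow`.

```python
import json


def wrap_chain_entries(old_chain):
    """Wrap each flat chain entry in a `pipeline_config` key.

    Hypothetical helper: entries that already have a `pipeline_config`
    key are returned unchanged, so the function is safe to run twice.
    """
    return [
        entry if "pipeline_config" in entry else {"pipeline_config": entry}
        for entry in old_chain
    ]


old_chain = [
    {"**download": "${config_path}/01_download.json"},
    {"**preprocess": "${config_path}/02_preprocess_data.json"},
]
new_chain = wrap_chain_entries(old_chain)
print(json.dumps(new_chain, indent=2))
```

The output carries no comments, so a header comment such as `// end_to_end_run.json` would have to be added back by hand.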

@@ -119,28 +119,59 @@ In experimentation we often want to run the same pipeline for multiple parameter
```
[ // run_threshold_experiments.json
{
"variables": {"threshold": 0.1},
"**pipeline": "${config_path}/extract_trees.json"
"pipeline_config": {
"variables": {"threshold": 0.1},
"**pipeline": "${config_path}/extract_trees.json"
},
},
{
"variables": {"threshold": 0.2},
"**pipeline": "${config_path}/extract_trees.json"
"pipeline_config": {
"variables": {"threshold": 0.2},
"**pipeline": "${config_path}/extract_trees.json"
},
},
{
"variables": {"threshold": 0.3},
"**pipeline": "${config_path}/extract_trees.json"
"pipeline_config": {
"variables": {"threshold": 0.3},
"**pipeline": "${config_path}/extract_trees.json"
},
},
{
"variables": {"threshold": 0.4},
"**pipeline": "${config_path}/extract_trees.json"
"pipeline_config": {
"variables": {"threshold": 0.4},
"**pipeline": "${config_path}/extract_trees.json"
}
}
]
```
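
Since the entries differ only in the `threshold` value, such a chain can also be generated rather than written by hand. This is a minimal sketch, assuming the chain file is plain JSON once comments are left out. `threshold_sweep` is a hypothetical helper, not an `eo-grow` function.

```python
import json


def threshold_sweep(thresholds, pipeline_path="${config_path}/extract_trees.json"):
    """Build one chain entry per threshold value (hypothetical helper)."""
    return [
        {
            "pipeline_config": {
                "variables": {"threshold": threshold},
                "**pipeline": pipeline_path,
            }
        }
        for threshold in thresholds
    ]


sweep_chain = threshold_sweep([0.1, 0.2, 0.3, 0.4])

# Write the result to a file such as run_threshold_experiments.json if needed.
print(json.dumps(sweep_chain, indent=2))
```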

### Using variables with pipelines
### Using variables with pipeline chains

While there is no syntactic sugar for specifying pipeline-chain-wide variables in JSON files, one can do that through the CLI. Running `eogrow end_to_end_run.json -v "year:2019"` will set the variable `year` to 2019 for all pipelines in the chain.
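
Conceptually, the effect is the same as adding the variable to every entry of the chain. The sketch below illustrates that idea only; it is not how `eo-grow` implements it. `parse_cli_variable` and `apply_cli_variables` are hypothetical names, and the rule that CLI values override variables already present in a config is an assumption.

```python
from __future__ import annotations


def parse_cli_variable(spec: str) -> tuple[str, str]:
    """Split a `name:value` spec on the first colon only."""
    name, sep, value = spec.partition(":")
    if not sep or not name:
        raise ValueError(f"Expected 'name:value', got {spec!r}")
    return name, value


def apply_cli_variables(chain: list[dict], specs: list[str]) -> list[dict]:
    """Return a copy of the chain with the variables merged into each entry."""
    cli_variables = dict(parse_cli_variable(spec) for spec in specs)
    updated = []
    for entry in chain:
        config = dict(entry["pipeline_config"])
        # Assumption: CLI values win over variables already set in the config.
        config["variables"] = {**config.get("variables", {}), **cli_variables}
        updated.append({**entry, "pipeline_config": config})
    return updated


chain = [
    {"pipeline_config": {"**download": "${config_path}/01_download.json"}},
    {"pipeline_config": {"**predict": "${config_path}/03_use_model.json"}},
]
# Values stay strings in this sketch; type handling is out of scope here.
updated_chain = apply_cli_variables(chain, ["year:2019"])
```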

### Specifying resources for pipeline execution

Pipeline chains also allow the user to specify the resources needed by the main process of each pipeline, much as a pipeline config can specify the resources needed by its workers.

```
[ // end_to_end_run.json
{
"pipeline_config": {"**download": "${config_path}/01_download.json"}
},
{
"pipeline_config": {"**predict": "${config_path}/03_use_model.json"},
"pipeline_resources": {"memory": 2e9} // ~ 2GB RAM reserved for the main process
},
{
"pipeline_config": {"**export": "${config_path}/04_export_maps.json"}
}
]
```

This also allows us to run certain pipelines on specially tagged workers. When setting up the cluster, one can tag workers with custom resources, for instance an `r5.4xlarge` worker with `big_RAM_worker: 1`. If we set `"pipeline_resources": {"resources": {"big_RAM_worker": 1}}`, then the pipeline will run ONLY on such workers, and the whole worker instance will be assigned to it. This is great for pipelines which have a large workload in the main process.

Pipeline chains can be 1 pipeline long, so this can also be used with a single pipeline.
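
A one-entry chain that requests a tagged worker could be produced as follows. This is a minimal sketch: `single_pipeline_chain` is a hypothetical helper, and the `big_RAM_worker` tag only has an effect if the cluster was set up with that custom resource.

```python
import json


def single_pipeline_chain(pipeline_key, config_file, pipeline_resources=None):
    """Build a chain with exactly one pipeline (hypothetical helper)."""
    entry = {"pipeline_config": {f"**{pipeline_key}": config_file}}
    if pipeline_resources:
        # Custom worker tags are nested under "resources";
        # "memory" would sit at the top level, as in the example above.
        entry["pipeline_resources"] = pipeline_resources
    return [entry]


tagged_chain = single_pipeline_chain(
    "predict",
    "${config_path}/03_use_model.json",
    pipeline_resources={"resources": {"big_RAM_worker": 1}},
)
print(json.dumps(tagged_chain, indent=2))
```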

## Path modification via variables

In some cases one wants fine-grained control over path specifications. The following is a simplified example of how one can provide separate download paths for a large number of batch pipelines.
2 changes: 1 addition & 1 deletion eogrow/__init__.py
@@ -1,3 +1,3 @@
"""The main module of the eo-grow package."""

__version__ = "1.6.3"
__version__ = "1.7.0"
