Subsequent interaction (starting the jobs, polling their status, requesting the result assets, ...)
can be done through the "master" `job` object created above, in the same way as with normal batch jobs.

### Validity of signed URLs in batch job results

Batch job results are accessible to the user through signed URLs stored in the result assets. Within the platform,
these URLs are valid for 7 days (their expiry time); during these 7 days, anyone with the URL can access
the results of the batch job. Each time a user requests the results from the job endpoint (`GET /jobs/{job_id}/results`),
freshly signed URLs (again valid for 7 days) are created for the result assets.
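
As an illustration, fetching results with the openeo Python client goes through this endpoint, so every fetch yields freshly signed URLs. A minimal sketch, assuming an existing finished batch job (the job id below is a placeholder):

```python
import openeo

connection = openeo.connect("openeo.cloud").authenticate_oidc()
job = connection.job("j-1234")  # placeholder id of a finished batch job

# get_results() requests `GET /jobs/{job_id}/results` under the hood, so the
# asset hrefs printed here are freshly signed URLs, valid for 7 days.
for asset in job.get_results().get_assets():
    print(asset.name, asset.href)
```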

### Customizing batch job resources on Terrascope

Jobs running on the (Terrascope) cluster are assigned a default amount of CPU and memory resources. This
may not always be enough for your job, for instance when using UDFs. For very large jobs, you may also want
to tune your resource settings to optimize for cost.

The example below shows how to start a job with all options set to their default values. Note that these
defaults are subject to change by the back-end whenever needed.

```python
job_options = {
    "executor-memory": "2G",
    "executor-memoryOverhead": "3G",
    "executor-cores": 2,
    "task-cpus": 1,
    "executor-request-cores": "400m",
    "max-executors": "100",
    "driver-memory": "8G",
    "driver-memoryOverhead": "2G",
    "driver-cores": 5,
}
cube.execute_batch(job_options=job_options)
```

This is a short overview of the various options:

- `executor-memory`: memory assigned to your workers (executors), for the JVM that executes most predefined processes
- `executor-memoryOverhead`: memory assigned on top of the JVM, for instance to run UDFs
- `executor-cores`: number of CPUs per worker (executor). The number of parallel tasks per worker is `executor-cores/task-cpus`
- `task-cpus`: CPUs assigned to a single task. UDFs using libraries like TensorFlow can benefit from further parallelization at the level of individual tasks.
- `executor-request-cores`: this setting is only relevant for Kubernetes-based back-ends; it allows overcommitting CPU
- `max-executors`: the maximum number of workers assigned to your job. The maximum number of parallel tasks is `max-executors*executor-cores/task-cpus` (see the sketch below this list). Increasing this can inflate your costs, while not necessarily improving performance!
- `driver-memory`: memory assigned to the Spark 'driver' JVM that controls the execution of your batch job
- `driver-memoryOverhead`: memory assigned to the Spark 'driver' on top of the JVM memory, for Python processes.
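
A quick back-of-the-envelope sketch of that upper bound on parallelism, using the default values from the example above:

```python
# Maximum number of parallel tasks = max-executors * executor-cores / task-cpus,
# evaluated here with the default job options shown earlier.
max_executors = 100
executor_cores = 2
task_cpus = 1

max_parallel_tasks = max_executors * executor_cores // task_cpus
print(max_parallel_tasks)  # 200
```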

#### Learning more

Resource optimization is a complex topic, and we only give a short summary here. The goal of openEO is to hide most of these
details from the user, but we realize that advanced users sometimes want a bit more insight, so in the spirit of being open, we give some hints.

To learn more about these options, we point to the piece of code that handles this:

https://github.com/Open-EO/openeo-geopyspark-driver/blob/faf5d5364a82e870e42efd2a8aee9742f305da9f/openeogeotrellis/backend.py#L1213

Most memory-related options are translated to Apache Spark configuration settings, which are documented here:

https://spark.apache.org/docs/3.3.1/configuration.html#application-properties
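
As a rough illustration only (the linked `backend.py` is the authoritative translation, and the exact mapping is an assumption here), the job options roughly correspond to Spark application properties along these lines:

```python
# Hypothetical sketch of how the job options may map onto Spark application
# properties; consult the linked backend.py for the actual translation.
SPARK_PROPERTY_FOR_JOB_OPTION = {
    "executor-memory": "spark.executor.memory",
    "executor-memoryOverhead": "spark.executor.memoryOverhead",
    "executor-cores": "spark.executor.cores",
    "task-cpus": "spark.task.cpus",
    "max-executors": "spark.dynamicAllocation.maxExecutors",
    "driver-memory": "spark.driver.memory",
    "driver-memoryOverhead": "spark.driver.memoryOverhead",
    "driver-cores": "spark.driver.cores",
}
```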

### Batch job results on Sentinel Hub

If you are processing data and the underlying back-end is Sentinel Hub, the output extent of your batch job results is currently larger than your input extent because Sentinel Hub processes whole tiles (this may change in the future and the data will be cropped to your input extent).
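
If the extra margin matters for your use case, a possible workaround is to crop the downloaded result back to your requested extent locally, for example with the `rioxarray` package; the file name and bounding box below are placeholders:

```python
# Hypothetical post-processing: clip a downloaded GeoTIFF back to the
# originally requested bounding box (requires the rioxarray package).
import rioxarray

data = rioxarray.open_rasterio("result.tif")
cropped = data.rio.clip_box(minx=5.0, miny=51.0, maxx=5.1, maxy=51.1)
cropped.rio.to_raster("result_cropped.tif")
```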
