GitHub - NASA-IMPACT/veda-pforge-job-runner: Apache Beam + EMR Serverless Job Runner for Pangeo Forge Recipes

veda-pforge-job-runner

EMR Serverless + Apache Beam Job Runner

Getting Started

Create a personal access token (PAT) in GitHub with the "workflow" scope.

To kick off jobs on GitHub Actions you need to provide inputs. The file .github/workflows/job-runner.yaml in this repository defines the allowed inputs and their defaults. Currently, the only required inputs without defaults are repo and job_name:

on:
  workflow_dispatch:
    inputs:
      repo:
        description: 'The https github url for the recipe feedstock'
        required: true
      ref:
        description: 'The tag or branch to target in your recipe repo'
        required: true
        default: 'main'
      feedstock_subdir:
        description: 'The subdir of the feedstock directory in the repo'
        required: true
        default: 'feedstock'
      spark_params:
        description: 'space delimited --conf values: https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/jobs-spark.html'
        required: true
        default: '--conf spark.executor.cores=16 --conf spark.executor.memory=60G --conf spark.executor.memoryOverhead=60G --conf spark.driver.memory=10G --conf spark.driver.memoryOverhead=4G --conf spark.shuffle.file.buffer=64k --conf spark.default.parallelism=1280 --conf spark.emr-serverless.executor.disk=200G'
      job_name:
        description: 'Name the EMR job'
        required: true

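The spark_params input is a single string of space-delimited --conf flags. As a minimal sketch (build_spark_params is a hypothetical helper, not part of this repository), that string can be assembled from key=value pairs in POSIX-compatible shell:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): turn Spark property
# assignments such as spark.executor.cores=16 into the space-delimited
# "--conf key=value" string expected by the spark_params input.
build_spark_params() {
  local out="" kv
  for kv in "$@"; do
    # ${out:+ } adds a separating space only once out is non-empty.
    out="${out}${out:+ }--conf ${kv}"
  done
  printf '%s\n' "$out"
}
```

For example, build_spark_params spark.executor.cores=8 spark.executor.memory=30G prints --conf spark.executor.cores=8 --conf spark.executor.memory=30G. Supplying spark_params when you dispatch replaces the workflow's default string as a whole, so include every setting you still want.
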
Manual Trigger Option:

Go to this repository's Actions tab on GitHub. In the left-hand navigation under "Actions", select the workflow you want to run; it is currently named "dispatch job". Because the "dispatch job" workflow has a workflow_dispatch trigger, you can click "Run workflow" and fill in the form with suitable inputs.

Curl Trigger Option:

Alternatively, you can trigger a job by constructing a JSON snippet that describes the recipe inputs you want to run, like the example below (which describes the integration tests). The curl examples further down POST this snippet to GitHub Actions.

# NOTE: any arguments for your recipe run go in the `inputs` hash.
# The top-level `ref` is the branch of this job-runner repository to run the workflow from;
# the `ref` inside `inputs` is the tag or branch of your recipe repo.
'{"ref":"main", "inputs":{"repo":"https://github.com/pforgetest/gpcp-from-gcs-feedstock.git","ref":"0.10.3"}}'
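Because job-runner.yaml marks job_name as required with no default, you may want to include it in the inputs as well. The following is a minimal sketch, not part of this repository: build_dispatch_payload is a hypothetical helper, and it assumes the values contain no double quotes or backslashes, since nothing is JSON-escaped.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): print a workflow_dispatch
# request body for job-runner.yaml.
#   $1  repo        https URL of the recipe feedstock (required)
#   $2  job_name    name for the EMR job (required, no default in the workflow)
#   $3  recipe_ref  tag or branch of the feedstock repo (defaults to main)
#   $4  runner_ref  branch of this job-runner repo to run from (defaults to main)
# Assumption: values contain no double quotes or backslashes, because they are
# interpolated into the JSON without escaping.
build_dispatch_payload() {
  if [ -z "${1:-}" ] || [ -z "${2:-}" ]; then
    echo "usage: build_dispatch_payload REPO JOB_NAME [RECIPE_REF] [RUNNER_REF]" >&2
    return 1
  fi
  printf '{"ref":"%s", "inputs":{"repo":"%s","ref":"%s","job_name":"%s"}}\n' \
    "${4:-main}" "$1" "${3:-main}" "$2"
}
```

For example, build_dispatch_payload https://github.com/pforgetest/gpcp-from-gcs-feedstock.git gpcp-test 0.10.3 (gpcp-test is a made-up job name) prints the integration-test snippet above with "job_name":"gpcp-test" added to its inputs.
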

Send the request to GitHub with curl. Replace <your-PAT-here> with the personal access token you created earlier, and replace <your-JSON-snippet-here> with the JSON snippet from the previous step (keeping its surrounding single quotes):

curl -X POST \
  -H "Accept: application/vnd.github+json" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  -H "Authorization: token <your-PAT-here>" \
  https://api.github.com/repos/NASA-IMPACT/veda-pforge-job-runner/actions/workflows/job-runner.yaml/dispatches \
  -d <your-JSON-snippet-here>

# INTEGRATION TEST EXAMPLE
curl -X POST \
  -H "Accept: application/vnd.github+json" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  -H "Authorization: token <your-PAT-here>" \
  https://api.github.com/repos/NASA-IMPACT/veda-pforge-job-runner/actions/workflows/job-runner.yaml/dispatches \
  -d '{"ref":"main", "inputs":{"repo":"https://github.com/pforgetest/gpcp-from-gcs-feedstock.git","ref":"0.10.3"}}'
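If you dispatch jobs often, the curl call can be wrapped in a small shell function that also checks the response. This is a sketch, not part of this repository: dispatch_job is a hypothetical name, and it assumes your PAT is exported as GITHUB_PAT (an assumed variable name) so it does not have to be pasted into each command.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of this repository) around the dispatch call.
# Assumption: the PAT is exported as GITHUB_PAT rather than pasted inline.
dispatch_job() {
  local payload="$1" response status
  if [ -z "${GITHUB_PAT:-}" ]; then
    echo "dispatch_job: export GITHUB_PAT (a PAT with the workflow scope) first" >&2
    return 1
  fi
  # -w appends the HTTP status code on its own line after the response body.
  response="$(curl -sS -X POST \
    -H "Accept: application/vnd.github+json" \
    -H "X-GitHub-Api-Version: 2022-11-28" \
    -H "Authorization: token ${GITHUB_PAT}" \
    -w '\n%{http_code}' \
    https://api.github.com/repos/NASA-IMPACT/veda-pforge-job-runner/actions/workflows/job-runner.yaml/dispatches \
    -d "$payload")"
  status="$(printf '%s\n' "$response" | tail -n 1)"
  # GitHub documents 204 No Content for a successful dispatch; accept any 2xx.
  case "$status" in
    2??) return 0 ;;
  esac
  echo "dispatch_job: GitHub returned HTTP ${status}:" >&2
  # Print the response body (everything except the status line) for debugging.
  printf '%s\n' "$response" | sed '$d' >&2
  return 1
}
```

For example, run export GITHUB_PAT=<your-PAT-here> once, then dispatch_job '<your-JSON-snippet-here>'. A non-2xx status prints GitHub's error response to stderr and returns a non-zero exit code.
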

To monitor your job, go to this repository's Actions tab on GitHub.
If multiple jobs are running, use the "Actor" filter to find yours.

Each workflow run has two GitHub jobs: A) name the job, and B) kick it off (send it to the EMR Serverless cluster).

When multiple jobs are running, each GitHub job gets a unique name that includes the <repo>@<ref> being run.
The last step of the second job, titled "echo job metadata", prints all relevant information, including AWS console links to EMR.
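The same runs can also be listed from the command line through the GitHub REST API's "list workflow runs" endpoint, which accepts actor and status query parameters. A minimal sketch, not part of this repository: job_runs_url and list_my_runs are hypothetical helper names, and the PAT is assumed to be exported as GITHUB_PAT (an assumed variable name).

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of this repository) for finding your runs
# without the web UI, analogous to the "Actor" filter in the Actions tab.

# Print the REST URL listing job-runner.yaml runs dispatched by one user.
#   $1  GitHub login of the actor (required)
#   $2  optional status filter, e.g. in_progress or completed
job_runs_url() {
  if [ -z "${1:-}" ]; then
    echo "usage: job_runs_url ACTOR [STATUS]" >&2
    return 1
  fi
  local url="https://api.github.com/repos/NASA-IMPACT/veda-pforge-job-runner/actions/workflows/job-runner.yaml/runs?actor=$1"
  if [ -n "${2:-}" ]; then
    url="${url}&status=$2"
  fi
  printf '%s\n' "$url"
}

# Fetch that list as JSON; each entry in its workflow_runs array has an
# html_url pointing at the run page. Assumes the PAT is exported as GITHUB_PAT.
list_my_runs() {
  local url
  url="$(job_runs_url "$@")" || return 1
  if [ -z "${GITHUB_PAT:-}" ]; then
    echo "list_my_runs: export GITHUB_PAT first" >&2
    return 1
  fi
  curl -sS \
    -H "Accept: application/vnd.github+json" \
    -H "X-GitHub-Api-Version: 2022-11-28" \
    -H "Authorization: token ${GITHUB_PAT}" \
    "$url"
}
```

For example, list_my_runs <your-GitHub-login> in_progress returns the JSON for your runs that are still in progress.
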
