Control plane environment where all model simulations occur.
This repository is meant to be used as a template to create similar repositories for use as a control plane when executing simulations.
The web-ui initiates model runs by sending a `repository_dispatch` event to the control plane repo.
The configuration for the model run is passed as JSON in the `client_payload` portion of this event when it is triggered from the UI.
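As a rough sketch of what such an event looks like, the snippet below builds the request body for GitHub's `POST /repos/{owner}/{repo}/dispatches` endpoint. The event type `run-simulation` and the fields inside `client_payload` (`id`, `models`, `configuration`) are hypothetical placeholders, not the names the web-ui actually uses.

```python
import json


def build_dispatch_body(event_type: str, client_payload: dict) -> str:
    """Serialize the body of a repository_dispatch request.

    GitHub expects a JSON object with an `event_type` string and an
    optional `client_payload` object carrying arbitrary data.
    """
    return json.dumps({"event_type": event_type, "client_payload": client_payload})


# Hypothetical run configuration; the real schema is defined by the web-ui.
example_payload = {
    "id": "example-run-1",
    "models": ["model-a", "model-b"],
    "configuration": {"region": "US"},
}

body = build_dispatch_body("run-simulation", example_payload)
print(body)
```

Inside the workflow, the values sent this way are available under `github.event.client_payload`.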
The Actions build matrix feature is used to run all of the models in parallel as independent jobs, provided there is free runner capacity. Each model run can be tracked as a separate job within the workflow, and each one reports back to the UI and becomes available there independently.
There is a small but important templating step in the workflow that uses `jq` to transform the list of models to run into a single model to run.
This is required for the matrix execution to work properly.
Note that the full field value is first extracted from the input payload in the matrix configuration.
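The effect of that templating step can be illustrated in Python rather than `jq`. This is only a sketch of the idea: the key name `models` and the shape of the payload are assumptions, and the actual workflow may structure its configuration differently.

```python
import copy
import json


def single_model_config(payload: dict, model: str) -> dict:
    """Return a copy of the payload narrowed to one model.

    Assumes (hypothetically) that the payload lists the models to run
    under a "models" key.
    """
    if model not in payload["models"]:
        raise ValueError(f"{model!r} is not in the requested model list")
    narrowed = copy.deepcopy(payload)
    narrowed["models"] = [model]
    return narrowed


payload = {"id": "example-run-1", "models": ["model-a", "model-b"]}

# One configuration per matrix job, each naming exactly one model.
per_job = [single_model_config(payload, m) for m in payload["models"]]
for config in per_job:
    print(json.dumps(config))
```

Each matrix job then receives a configuration that names only its own model, which is what lets the jobs run and report independently.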
There are a number of job steps that can upload artifacts back to GitHub Actions to be stored with the job run. Given that the data is copied by the model runner into an Azure Storage container, these steps are purely optional. These are not run by default, but can be enabled through the use of secrets (below).
The `docker-compose.yaml` file is used as an easy way to store the complete configuration that is required to execute the model-runner container.
The most important parts are the volume mounts for the input and output container as well as the Docker socket, because the runner needs to be able to use docker-in-docker.
While it is possible to use the standard GitHub Actions runners to execute the models, some models may require more resources (CPU and memory) than those runners provide. As a result, the example job is configured to use self-hosted runners, so that enough resources are available to run all the models.
Because we are mounting local storage into the containers, it is very important that the `Cleanup` step of the workflow is present and always executed when using self-hosted runners.
This ensures that data from previous runs cannot pollute the current run.
The control-plane is also responsible for starting scheduled tasks on the infrastructure.
At present, this is limited to running the daily fetch of case and intervention data.
This is initiated by the `fetch-recorded-data` workflow.
If you are setting up a control-plane for a development environment (i.e. without a dedicated web-ui), you should disable this workflow.
The workflow uses secrets as a mechanism to inject both credentials and configuration information that would be burdensome or risky to store in the workflow file itself. The infrastructure-template contains configuration for setting these values, but you can set them manually if you choose not to use it.
This is more of an environment variable than a secret, but allows for easily updating the version of the model runner that will be used during job execution.
RUNNER_VERSION
- Go to Settings > Secrets.
- Set the value of `RUNNER_VERSION` to the version of the `model-runner` package that you want this control plane to use when running simulations.
  - The available package versions can be found here.
  - Note that package versions are taken from the last segment of branch names in the `model-runner` repo (e.g. `master` corresponds to the `master` branch, `0.3.0` corresponds to the tagged version `v0.3.0`, and `my-branch` corresponds to any branch named `some-prefix/my-branch`).
  - The `model-runner` repo is configured to automatically publish Docker images on updates to the different model packages it contains. For any additional models, you may need to set up similar automation, or manually publish their Docker images to a registry.
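The naming rule described above can be sketched as a small function. This is an illustration inferred from the three examples given, not the code the `model-runner` repo uses to name its packages; in particular, stripping the leading `v` only when it is followed by a digit is an assumption.

```python
def package_version(ref: str) -> str:
    """Map a Git ref to the package version name described above."""
    if ref.startswith("refs/tags/"):
        tag = ref[len("refs/tags/"):]
        # Assumption: a tag such as v0.3.0 is published as 0.3.0.
        if len(tag) > 1 and tag[0] == "v" and tag[1].isdigit():
            return tag[1:]
        return tag
    # Branches keep only the last path segment of their name.
    return ref.rsplit("/", 1)[-1]


for ref in (
    "refs/heads/master",
    "refs/tags/v0.3.0",
    "refs/heads/some-prefix/my-branch",
):
    print(ref, "->", package_version(ref))
```

One consequence of keeping only the last segment is that two branches with different prefixes but the same final segment would map to the same version name.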
The shared secret for the instance of the web-ui that you want to coordinate with.
API_SHARED_SECRET
For the scheduled tasks, you also need to specify the endpoint of the web-ui (unless you have disabled the relevant workflows).
API_URL
All of the Docker images that we have used to date are stored in GitHub Packages and require credentials with appropriate (read-only) permissions for all the images.
More about how to authenticate can be found in the documentation.
Note that the built-in Actions token (`secrets.GITHUB_TOKEN`) can access packages stored on this same repository; to access packages stored in other repositories, such as the `model-runner` repo, you may need to create a GitHub bot user with access to the repo and obtain a Personal Access Token for it.
It is possible to use additional or alternative Docker container registries. If they require credentials, the appropriate login commands will need to be added to the "Log into registry" step of the "run" job.
GPR_USER
GPR_PAT
Azure storage credentials for storing the results of each model run.
AZURE_STORAGE_ACCOUNT
AZURE_STORAGE_CONTAINER
To enable artifact storage within GitHub Actions, set the following to a comma-separated list containing the types of artifact you want to retain.
KEEP_ARTIFACTS=input,output,log
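To make the format of this value concrete, here is a minimal sketch of how a comma-separated list like this could be interpreted. How the workflow itself checks the secret is not shown here, so the helper name and the tolerance for whitespace and empty entries are assumptions.

```python
def kept_artifact_types(keep_artifacts: str) -> set:
    """Split a comma-separated KEEP_ARTIFACTS value into a set of types."""
    return {item.strip() for item in keep_artifacts.split(",") if item.strip()}


kept = kept_artifact_types("input,output,log")
print(sorted(kept))

# An unset or empty secret retains nothing, matching the default behaviour.
print(kept_artifact_types(""))
```

Listing only a subset, such as `KEEP_ARTIFACTS=log`, would retain just that artifact type.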
We have used three environments (`dev`, `staging`, and `prod`), each with a corresponding control plane set up for testing new models and code.
You may wish to use a different set of environments, but we expect having more than one will be quite common.
Each environment will need to have its own distinct control plane repo within your organization.
The majority of common changes (upgrading model and runner versions) can be made by simply changing the corresponding secret in each environment once you have deemed the change ready for promotion.
Any changes made to the workflow or the control plane repo itself can be migrated between environments using a normal Git workflow.
Check out the three control plane repositories. Keeping them in sibling directories is convenient.
Merging dev into staging:
- Navigate to your checkout of `control-plane-dev`.
- Add a remote for `staging`:

      git remote add staging git@github.com:<your_org>/control-plane-staging.git

- Update `master` and push to a branch on `staging`:

      git checkout master
      git pull origin master
      git push staging master:merge/dev-staging

- Open a PR, get it approved, and merge it.
Merging staging into prod:
- Navigate to your checkout of `control-plane-staging`.
- Add a remote for `prod`:

      git remote add prod git@github.com:<your_org>/control-plane-prod.git

- Update `master` and push to a branch on `prod`:

      git checkout master
      git pull origin master
      git push prod master:merge/staging-prod

- Open a PR, get it approved, and merge it.