generated from ecohealthalliance/container-template
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Updated from template to actual project description
1 parent e72efd6, commit 2c5d0c7
Showing 1 changed file with 32 additions and 337 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,360 +1,55 @@ | ||
--- | ||
editor_options: | ||
markdown: | ||
wrap: 72 | ||
--- | ||
|
||
<!-- README.md is generated from README.Rmd. Please edit that file --> | ||
|
||
# Containerised R workflow template | ||
# South Caucasus Drivers of Zoonotic Disease | ||
|
||
<!-- badges: start --> | ||
|
||
[![Project Status: WIP – Initial development is in progress, but there | ||
[![Project Status: WIP -- Initial development is in progress, but there | ||
has not yet been a stable, usable release suitable for the | ||
public.](https://www.repostatus.org/badges/latest/wip.svg)](https://www.repostatus.org/#wip) | ||
[![Lifecycle: | ||
experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental) | ||
[![container-workflow-template](https://github.com/ecohealthalliance/container-template/actions/workflows/container-workflow-template.yml/badge.svg)](https://github.com/ecohealthalliance/container-template/actions/workflows/container-workflow-template.yml) | ||
<!-- badges: end --> | ||
|
||
This is a template repository of a containerised R workflow built on the | ||
`targets` framework, made portable using `renv`, and run manually or | ||
automatically using `GitHub Actions`. To use this template click on the | ||
“use this template button” and then select create a new repository. | ||
|
||
Check out the | ||
[`containerTemplateUtils`](https://github.com/ecohealthalliance/containerTemplateUtils) | ||
package for handling common tasks related to this repo (sending emails, | ||
uploading files to AWS, etc. ) | ||
|
||
Note that `git-crypt` is not part of the template repo. See the [EHA M&A | ||
handbook](https://ecohealthalliance.github.io/eha-ma-handbook/16-encryption.html#set-up-encryption-for-a-repo-that-did-not-previously-use-git-crypt) | ||
for how to add git-crypt. | ||
|
||
Follow the links for more information about: | ||
|
||
- [`targets`](https://ecohealthalliance.github.io/eha-ma-handbook/3-projects.html#targets) | ||
- [`renv`](https://ecohealthalliance.github.io/eha-ma-handbook/3-projects.html#package-management-with-renv) | ||
- [git-crypt](https://ecohealthalliance.github.io/eha-ma-handbook/16-encryption.html) | ||
- [Reproducible | ||
workflows](https://github.com/ecohealthalliance/building-blocks-of-reproducibility) | ||
|
||
Recommendations: | ||
- One function per file in R/ | ||
- Non-function R scripts in another directory like `scripts/` | ||
- Use the same names for targets and function arguments for those | ||
targets unless a function | ||
- Nouns for targets, verbs for functions | ||
- Use common suffixes for target types: `_file` for files, `_raw` for | ||
read-in but unprocessed data | ||
- Use `fnmate` and `tflow` RStudio Add-Ins to make this easy, create | ||
shortcuts for these add-ins | ||
([talk](https://www.youtube.com/watch?v=jU1Zv21GvT4)), or the `usethis` | ||
package | ||
|
||
## Quick start | ||
|
||
- Create repo from template | ||
- rename .Rproj file | ||
- streamline packages in `packages.R` | ||
- modify `.gitattributes` to include any files that may need encryption | ||
- initialize `git-crypt` for repo | ||
- add relevant environment variables to `.env` file | ||
- rename github actions workflows | ||
|
||
## GitHub Actions | ||
|
||
[GitHub Actions](https://docs.github.com/en/actions) allows automation, | ||
customisation, and execution of your research project workflows right in | ||
your GitHub repository. | ||
|
||
In gist, [GitHub Actions](https://docs.github.com/en/actions) is a | ||
*workflow* composed of a *job* or a number of *jobs*. The *job/s* are | ||
then composed of *steps* that control the order in which *actions* are | ||
run in order to complete a *job/s*. This *workflow* is scheduled or | ||
triggered by a specific *event* and runs on what is called a *runner* - | ||
a server that has the [GitHub | ||
Actions](https://docs.github.com/en/actions) runner application | ||
installed - that is either hosted by GitHub, or self-hosted on your own | ||
machines. | ||
|
||
This whole **workflow** including the **event** trigger and the | ||
**runner** on which the **workflow** will run in are specified and | ||
detailed using a workflow `.yml` file that is saved inside a directory | ||
named `.github` within your GitHub repository in which you want to use | ||
[GitHub Actions](https://docs.github.com/en/actions) on. | ||
|
||
<img src=https://miro.medium.com/max/2617/1*8mUtip6z_oydfLi4P86KUw.png /> | ||
|
||
This repository contains a template [GitHub | ||
Actions](https://docs.github.com/en/actions) workflow with its | ||
corresponding `.yml` file that illustrates how [GitHub | ||
Actions](https://docs.github.com/en/actions) can be used to run and | ||
maintain an R workflow that uses `targets` and `renv`. | ||
|
||
## Using containers in GitHub Actions workflow | ||
|
||
A **container** is a standard unit of software that packages up code and | ||
all its dependencies so the application runs quickly and reliably from | ||
one computing environment to another. | ||
|
||
**Containers** can be used within a [GitHub | ||
Actions](https://docs.github.com/en/actions) workflow and can be | ||
specified either at the **job** level or at the **step** level. If | ||
specified at the **job** level, all the **steps** within that **job** | ||
will be run inside that container. When specified at the **steps** | ||
level, different containers can be used for each **step**. | ||
|
||
The example/template workflow can be found inside the `.github` folder | ||
and is shown below: | ||
|
||
``` yaml | ||
name: container-workflow-template | ||
|
||
on: | ||
push: | ||
branches: | ||
- main | ||
- master | ||
pull_request: | ||
branches: | ||
- main | ||
- master | ||
workflow_dispatch: | ||
branches: | ||
- '*' | ||
#schedule: | ||
# - cron: "0 8 * * *" | ||
|
||
jobs: | ||
container-workflow-template: | ||
runs-on: ubuntu-latest # Run on GitHub Actions runner | ||
#runs-on: [self-hosted, linux, x64, onprem-aegypti] # Run the workflow on EHA aegypti runner | ||
#runs-on: [self-hosted, linux, x64, onprem-prospero] # Run the workflow on EHA prospero runner | ||
container: | ||
image: rocker/verse:4.1.2 | ||
|
||
steps: | ||
- uses: actions/checkout@v2 | ||
|
||
- name: Install system dependencies | ||
run: | | ||
apt-get update && apt-get install -y --no-install-recommends \ | ||
libcurl4-openssl-dev \ | ||
libssl-dev | ||
- name: Restore R packages | ||
run: | | ||
renv::restore() | ||
shell: Rscript {0} | ||
|
||
- name: Run targets workflow | ||
run: | | ||
targets::tar_make() | ||
shell: Rscript {0} | ||
``` | ||
|
||
In this example, we show a data quality check workflow report for a | ||
nutrition survey of children 6-59 months old. | ||
### The trigger | ||
The trigger for GitHub Actions is specified in these lines in the | ||
workflow YAML file: | ||
``` yaml | ||
on: | ||
push: | ||
branches: | ||
- main | ||
- master | ||
pull_request: | ||
branches: | ||
- main | ||
- master | ||
workflow_dispatch: | ||
branches: | ||
- '*' | ||
#schedule: | ||
# - cron: "0 8 * * *" | ||
``` | ||
|
||
This workflow runs automatically when there is a **push** or **pull | ||
request** event on the `main` or `master` branch of the repository. It | ||
can also be run manually from the GitHub Actions page for any branch of | ||
the repository through the `workflow_dispatch` specification in the | ||
workflow YAML file. | ||
|
||
GitHub Actions can also be scheduled to run at specific times and | ||
frequency using the `schedule` specification in the workflow YAML file | ||
using [POSIX cron syntax](https://en.wikipedia.org/wiki/Cron). Scheduled | ||
workflows run on the latest commit on the default or base branch. The | ||
shortest interval you can run scheduled workflows is once every 5 | ||
minutes. In the example workflow, the `schedule` specification has been | ||
set to run at 8 am every day, but this has been commented out. If you | ||
would like to schedule your workflow runs, remove the hash and then set | ||
the POSIX cron syntax to the frequency that you require. *Note that | ||
while GitHub Actions is highly reliable, GitHub does not guarantee that | ||
a scheduled job will run on GitHub-hosted runners, and jobs are less | ||
likely to run if you choose a popular run time (generally on the hour).* | ||
|
||
### The job | ||
|
||
The job for GitHub Actions is specified in these lines in the workflow | ||
YAML file: | ||
|
||
``` yaml | ||
jobs: | ||
container-workflow-template: | ||
runs-on: ubuntu-latest # Run on GitHub Actions runner | ||
#runs-on: [self-hosted, linux, x64, onprem-aegypti] # Run the workflow on EHA aegypti runner | ||
#runs-on: [self-hosted, linux, x64, onprem-prospero] # Run the workflow on EHA prospero runner | ||
container: | ||
image: rocker/verse:4.1.2 | ||
``` | ||
The job named `container-workflow-template` is specified to run on | ||
runners hosted by GitHub Actions. These runners can be identified | ||
through a tag that specifies the operating software followed by the | ||
version. In the example workflow, the line specifying | ||
`runs-on: ubuntu-latest` runs the workflow on a machine hosted by GitHub | ||
Actions with the latest Ubuntu operating software. | ||
|
||
The job can also be run on a self-hosted GitHub Actions runner that is | ||
installed on EHA’s high performance computing machines using the | ||
`runs-on` workflow YAML specification. Tags unique to this GitHub runner | ||
are used to identify the specific machine to use. The syntax for | ||
specifying these runners is shown but commented out. | ||
|
||
To further make the GitHub Actions workflow more robust and | ||
reproducible, we set up a container at the **job** level. The container | ||
specified is a versioned R image that has `tidyverse` and other R | ||
publishing tools installed. This container image would generally be | ||
adequate for most workflows that require data wrangling and manipulation | ||
using the `tidyverse` tools and reporting using `rmarkdown`. Some | ||
projects/workflows (like those using spatial packages such as `sf`) may | ||
benefit from using a different R image so change the container | ||
specification accordingly. To read more about available R images, see | ||
<https://www.rocker-project.org/images/>. | ||
|
||
## Using this GitHub Actions workflow template | ||
|
||
This repository has been set as a private template repository. This | ||
means that this can be used by EHA staff for creating new repositories | ||
with the same filesystem. | ||
|
||
This can be done as follows: | ||
|
||
1. In your GitHub account, go to the EcoHealth Alliance organisation | ||
(<https://github.com/ecohealthalliance>) then click on the green | ||
button labeled `New`. | ||
|
||
2. You will now be directed to the `Create new repository` page. Here, | ||
right at the top, you will see the `Repository template` heading. | ||
Click on the drop down button right below this that says | ||
`No template`. You will then see all the available templates within | ||
EHA. Select the template named | ||
`ecohealthalliance/container-template`. | ||
|
||
3. Give your new repository a name, set the appropriate repository | ||
visibility, and then click on `Create repository`. | ||
|
||
4. You will now have a new repository the contents of which are the | ||
same files and structure as this template repository. | ||
|
||
5. You can now make the necessary changes and additions that are | ||
specific to your workflow. | ||
|
||
## Using `git-crypt` to encrypt files in your workflow | ||
<!-- badges: end --> | ||
|
||
Your project may contain a mix of public and private content. Being able | ||
to encrypt the private contents of your project is very useful. It is | ||
recommended that you use PGP (Pretty Good Privacy) encryption, | ||
implemented by the program | ||
[`git-crypt`](https://github.com/AGWA/git-crypt). It takes a bit to set | ||
up but once activated makes sharing secure and seamless. To setup PGP | ||
and `git-crypt` on your project that is based on this template, see the | ||
[*Encryption* chapter of the EHA Modeling and Analytics | ||
Handbook](https://ecohealthalliance.github.io/eha-ma-handbook/14-encryption.html). | ||
This repository is used to map global data sets on drivers of zoonotic | ||
disease spillover, emergence, and spread downscaled to the South | ||
Caucasus region for presentation and comparison across countries. | ||
|
||
Once you have enabled `git-crypt` on your project, you will need to make | ||
the following edits to the `container-workflow-template.yml` file to be | ||
able to perform symmetric key decryption described | ||
[here](https://ecohealthalliance.github.io/eha-ma-handbook/14-encryption.html#extra-use-a-symmetric-key-for-automated-processes). | ||
Here is the `container-workflow-template.yml` file updated to allow and | ||
perform symmetric key decryption: | ||
The repo uses a containerised R workflow built on the `targets` | ||
framework, made portable using `renv`. | ||
|
||
``` yaml | ||
name: container-workflow-encrypted-template | ||
## Data | ||
|
||
on: | ||
push: | ||
branches: | ||
- main | ||
- master | ||
pull_request: | ||
branches: | ||
- main | ||
- master | ||
workflow_dispatch: | ||
branches: | ||
- '*' | ||
#schedule: | ||
# - cron: "0 8 * * *" | ||
All data is publicly available and can be directly downloaded online. | ||
Data sources and direct download links are in the '\_targets.R' file. | ||
|
||
env: | ||
GIT_CRYPT_KEY64: ${{ secrets.GIT_CRYPT_KEY64 }} | ||
jobs: | ||
container-workflow-encrypted-template: | ||
runs-on: ubuntu-latest # Run on GitHub Actions runner | ||
#runs-on: [self-hosted, linux, x64, onprem-aegypti] # Run the workflow on EHA aegypti runner | ||
#runs-on: [self-hosted, linux, x64, onprem-prospero] # Run the workflow on EHA prospero runner | ||
container: | ||
image: rocker/verse:4.1.2 | ||
steps: | ||
- uses: actions/checkout@v2 | ||
- name: Install system dependencies | ||
run: | | ||
apt-get update && apt-get install -y --no-install-recommends \ | ||
git-crypt \ | ||
libcurl4-openssl-dev \ | ||
libssl-dev | ||
- name: Decrypt repository using symmetric key | ||
run: | | ||
echo $GIT_CRYPT_KEY64 > git_crypt_key.key64 && base64 -di git_crypt_key.key64 > git_crypt_key.key && git-crypt unlock git_crypt_key.key | ||
rm git_crypt_key.key git_crypt_key.key64 | ||
- name: Restore R packages | ||
run: | | ||
renv::restore() | ||
shell: Rscript {0} | ||
- name: Run targets workflow | ||
run: | | ||
targets::tar_make() | ||
shell: Rscript {0} | ||
``` | ||
## Functions | ||
|
||
Once you have edited your workflow YAML file and before you push the | ||
changes to GitHub, you will then have to add the symmetric key to your | ||
GitHub repository as a secret. | ||
Individual functions for manipulating the data are available in the 'R' | ||
folder. | ||
|
||
First, generate a symmetric key by running this in your project | ||
directory. | ||
## Related Outputs | ||
|
||
``` bash | ||
git-crypt export-key git_crypt_key.key | ||
``` | ||
Maps generated through this repo are embedded in in-depth country | ||
reports that take stock of and assess One Health operations in the South | ||
Caucasus region. These reports highlight the progress that Armenia, | ||
Azerbaijan, and Georgia have made in implementing the One Health | ||
approach, while identifying continued needs for One Health system | ||
strengthening. | ||
|
||
`git_crypt_key.key` can now be used to decrypt the repository, and you | ||
can provide it to GitHub Actions as a secret environment variable (see | ||
<https://docs.github.com/en/actions/security-guides/encrypted-secrets>). | ||
However, since it is binary data, you’ll need to convert it to base64 | ||
first. So run something like: | ||
- Armenia report (English and | ||
Armenian): <https://zenodo.org/doi/10.5281/zenodo.10094792> | ||
|
||
``` bash | ||
cat git_crypt_key.key | base64 | pbcopy | ||
``` | ||
- Azerbaijan report (English and | ||
Azerbaijani): <https://zenodo.org/doi/10.5281/zenodo.10048711> | ||
|
||
to convert this file to base64 data, then paste it in GitHub’s secret | ||
environment variable field as `GIT_CRYPT_KEY64`. | ||
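
The base64 encode and decode steps described above can be checked locally before the secret is stored. The following is a minimal sketch, not part of the template: `encode_key` and `decode_key` are hypothetical helper names, the demo file is random bytes standing in for a real `git-crypt` key, and it assumes GNU coreutils `base64` (as in the container image; some macOS versions spell the decode flag `-D`).

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print a binary file as one base64 line (no newlines),
# the form that would be pasted into a secret such as GIT_CRYPT_KEY64.
encode_key() {
  base64 < "$1" | tr -d '\n'
}

# Hypothetical helper: decode a base64 string ($1) back into a binary file ($2),
# mirroring what the decrypt step in the workflow does before `git-crypt unlock`.
decode_key() {
  printf '%s' "$1" | base64 -d > "$2"
}

demo_dir="$(mktemp -d)"

# Random bytes as a stand-in for the output of `git-crypt export-key`.
head -c 64 /dev/urandom > "$demo_dir/demo.key"

encoded="$(encode_key "$demo_dir/demo.key")"
decode_key "$encoded" "$demo_dir/demo_restored.key"

# The restored file should be byte-identical to the original.
if cmp -s "$demo_dir/demo.key" "$demo_dir/demo_restored.key"; then
  echo "round trip ok"
else
  echo "round trip FAILED" >&2
fi
```

If the two files differ, the most likely causes are stray whitespace introduced when copying the encoded string or a `base64` variant with different flags.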
- Georgia report (English and | ||
Georgian): <https://zenodo.org/doi/10.5281/zenodo.10048349> |