---
title: "A non-destructive method to quantify starch content in red clover"
author: Lea Frey, Philipp Baumann, Helge Aasen, Bruno Studer, Roland Kölliker
output: rmarkdown::github_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# The science behind
# Technical description
This is the code repository that produces the outputs of the manuscript with the above title.

The directory is self-contained and designed to run reproducibly, either on your host operating system (local or remote) or in a Docker container (local or remote; relying on the kernel of the host). Practical instructions for deploying the Docker image and running all analyses in this project are given below. Credit goes to Thomas Knecht (aka Mr. Propper), who encouraged me to dockerize the project and to proceed with the workflow orchestration.
# Project structure
:notebook_with_decorative_cover: `Dockerfile`: Docker recipe that pulls the operating system, installs system dependencies, installs R v3.6.0, and installs all required R packages.

:notebook_with_decorative_cover: `R/`: Custom R functions required for the analysis.

:notebook_with_decorative_cover: `_convert-images.R`: Produces `.eps` outputs for the manuscript submission.

:notebook_with_decorative_cover: `_drake.R`: Loads packages and functions and defines the {drake} plan. This script is executed by `drake::r_make()`, which is called from `_make.R`.

:notebook_with_decorative_cover: `_make.R`: Invokes {drake} make in a separate, clean R session (via {callr}) using `drake::r_make()`.

:notebook_with_decorative_cover: `code/`: R scripts for the analysis, run in sequential order. `drake::code_to_plan()` in `_drake.R` converts them into the {drake} plan.
# Rerun all analyses (1. or 2.)
## 1. Reproduce the analysis within the host operating system
First, download this repository or clone it with git ([Git](https://git-scm.com/) is a popular free and open-source version control system).
```{bash, eval=FALSE}
git clone https://github.com/philipp-baumann/leaf-starch-spc
cd leaf-starch-spc
```
Windows users will probably want to download the version of [Rtools](https://cran.r-project.org/bin/windows/Rtools/history.html) for R 3.6.3 or older to build packages from source. macOS users will need [Xcode](https://developer.apple.com/xcode/) for the compiler toolchain. To restore all required packages at the versions defined in [`renv.lock`](https://github.com/philipp-baumann/leaf-starch-spc/blob/master/renv.lock) with the [renv](https://github.com/rstudio/renv) R package, execute the following in the project directory. Unless you work in a terminal, you might first want to set up the project directory as an RStudio project (see [here](https://r4ds.had.co.nz/workflow-projects.html)).
```{r, eval=FALSE}
install.packages("remotes")
remotes::install_github("rstudio/[email protected]")
# Install packages from CRAN and GitHub
# at the versions recorded in `renv.lock`
renv::restore()
```
You can run the scripts manually in sequential order, but we recommend running the entire workflow automatically with the [drake](https://books.ropensci.org/drake/) R package. This gives you tangible evidence of reproducibility.
```{r, eval=FALSE}
# Build the drake plan (targets and expressions are defined in the scripts in ./code/).
# Starts a separate R process, so the build does not depend on the state of your interactive session.
source("_make.R")
```
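
If you work in a terminal rather than in RStudio, it can help to check that you are in the project root before starting the workflow. The following is a minimal bash sketch under that assumption; the function name `check_project_dir` is hypothetical and not part of this repository, and the checked files are taken from the project structure listed above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): verify that a
# directory looks like the project root before running the workflow.
check_project_dir() {
  local dir="${1:-.}"
  local missing=0
  local f d
  for f in _make.R _drake.R renv.lock; do
    [ -f "${dir}/${f}" ] || { echo "missing file: ${f}" >&2; missing=1; }
  done
  for d in R code; do
    [ -d "${dir}/${d}" ] || { echo "missing directory: ${d}/" >&2; missing=1; }
  done
  return "$missing"
}

# Usage from a shell inside the cloned repository:
# check_project_dir . && Rscript -e 'source("_make.R")'
```

Returning a non-zero status instead of exiting lets the check be chained with `&&`, so `Rscript` only starts when all expected files are present.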
## 2. Reproduce the analysis within a Docker container (remote server or local machine)
Docker provides an open-source solution to create an isolated software environment that captures the entire computational environment. This makes the data analysis scalable and reproducible, independent of the host operating system. To get started with Docker, the [rOpenSci R Docker tutorial](https://ropenscilabs.github.io/r-docker-tutorial/) explains the motivation and basics of using Docker for reproducible research. However, you can also just follow the steps outlined below. The only caveat at the moment is that the Docker image is large: make sure you have roughly 150 GB of free disk space. I currently don't know the reason for this.
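
Because of the large image size, it may be worth checking free disk space before building. The following bash sketch is a hypothetical helper (not part of this repository); it assumes a POSIX `df` and that images are stored on the filesystem of the directory you pass, which on a typical Linux install is `/var/lib/docker` (Docker Desktop on macOS or Windows stores images inside its own virtual disk, so the check is less meaningful there).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): check that the
# filesystem holding a directory has at least N GB free.
check_disk_space() {
  local required_gb="${1:-150}"   # ~150 GB, as suggested in this README
  local dir="${2:-.}"
  local avail_kb avail_gb
  # -P: POSIX output (one line per filesystem), -k: 1024-byte blocks;
  # field 4 of the second line is the available space
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 {print $4}')
  avail_gb=$(( avail_kb / 1024 / 1024 ))
  if [ "$avail_gb" -ge "$required_gb" ]; then
    echo "OK: ${avail_gb} GB available in ${dir}"
  else
    echo "Only ${avail_gb} GB available in ${dir}; about ${required_gb} GB recommended" >&2
    return 1
  fi
}

# Usage (Linux, default Docker data directory):
# check_disk_space 150 /var/lib/docker && docker build -t leaf-starch-spc .
```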
A `Dockerfile` is a text file that contains a recipe to build an image with a layered approach. A Docker container is a running instance of an image. [This `Dockerfile`](https://github.com/philipp-baumann/leaf-starch-spc/blob/master/Dockerfile) is based on the [`rocker/rstudio:3.6.0`](https://hub.docker.com/r/rocker/rstudio/) image, which builds on [`rocker/r-ver`](https://hub.docker.com/r/rocker/r-ver/) with Debian 9 (stretch), version-stable base R (v3.6.0), and the source build tools.

The RStudio image provides RStudio Server, which you can access via your browser. Basic instructions are given here; for getting started, you can additionally consult [this resource](https://github.com/rocker-org/rocker/wiki/Using-the-RStudio-image). The image is version-stable and uses the MRAN snapshot of the last day on which R 3.6.0 was the most recent release.
The workflow deployed here is fueled by the [`{renv}`](https://rstudio.github.io/renv/articles/renv.html) package, which manages the installation of specific package versions and sources, and the [`{drake}`](https://docs.ropensci.org/drake/) package to keep track of R code and data that produce the results.
The [drake manual lists two examples](https://ropenscilabs.github.io/drake-manual/index.html#with-docker) in section 1.5 that combine `{drake}` workflows with Docker. These give more detail on how everything works under the hood.
### Docker recipe
The following Docker commands, run in a bash shell, generate the computational environment, run all computations, and let you retrieve the results of the entire analysis done in R.
1. Build the docker image with instructions from the [`Dockerfile`](https://github.com/philipp-baumann/leaf-starch-spc/blob/master/Dockerfile).
```{bash, eval=FALSE}
# Cache configuration: https://github.com/rstudio/renv/issues/362
# https://github.com/rstudio/renv/issues/400
docker build -t leaf-starch-spc .
```
2. Check whether the image is built.
```{bash, eval=FALSE}
docker images
```
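
`docker images` lists all local images. To check for this particular image in a script, one option is to print one `repository:tag` per line with `docker images --format '{{.Repository}}:{{.Tag}}'` and search for an exact match. Since `docker build -t leaf-starch-spc .` gives no tag, the image is tagged `leaf-starch-spc:latest`. The function `image_built` below is a hypothetical sketch, not part of this repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): read "repository:tag"
# lines on stdin and succeed if the requested image is among them.
image_built() {
  local image="${1:-leaf-starch-spc:latest}"
  # -q: no output, -x: match the whole line, -F: fixed string (no regex)
  grep -qxF "$image"
}

# Usage:
# docker images --format '{{.Repository}}:{{.Tag}}' | image_built leaf-starch-spc:latest \
#   && echo "image is built"
```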
3. Launch the container from the built image. Two host directories, `out/` and `pub/`, are shared with the container as volumes, so that the output files (figures) written by the analysis workflow orchestrated by {drake}, which are explained in the accompanying manuscript, are also available on the host.
```{bash, eval=FALSE}
# https://www.rocker-project.org/use/managing_users/
# https://github.com/rocker-org/rocker/wiki/Sharing-files-with-host-machine
docker run --rm -d -p 8787:8787 \
-e PASSWORD=spcclover \
-v "$(pwd)/out:/home/rstudio/out" \
-v "$(pwd)/pub:/home/rstudio/pub" \
-e USERID=$(id -u) -e GROUPID=$(id -g) leaf-starch-spc
```
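
If `out/` or `pub/` does not exist yet, Docker creates the missing host directory for a bind mount, typically owned by root, which can lead to permission problems when the container writes results. A minimal sketch, assuming you run it from the project root, is to create both directories yourself first; `id -u` and `id -g` print the numeric user and group IDs that are passed to the rocker image as `USERID` and `GROUPID`. The function name `prepare_host_dirs` is hypothetical and not part of this repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): create the host
# directories mounted into the container, owned by the current user,
# and print the IDs passed to the rocker image.
prepare_host_dirs() {
  local project_dir="${1:-.}"
  mkdir -p "${project_dir}/out" "${project_dir}/pub"
  echo "USERID=$(id -u)"
  echo "GROUPID=$(id -g)"
}

# Usage, before `docker run`:
# prepare_host_dirs .
```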
4. Open RStudio Server and kick off the workflow. There are two suggested deployments: one with Docker running on your computer (4.i.), and the other with Docker running on a virtual machine, tunnelled via ssh (4.ii.).
```{bash}
cat _make.R
```
`drake::r_make()` runs `_drake.R` and calls `drake::make()` in a separate R process, so that the build is not affected by the state of your interactive session.
```{r, eval=FALSE}
# Run in the R console in RStudio Server
source("_make.R")
```
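4.i. **Docker on your own computer**: Because `docker run` publishes port 8787 (`-p 8787:8787`), RStudio Server is directly reachable from the browser on your machine.

4.ii. **Local port-forwarding via ssh**: The RStudio Server service running within the Docker container on the remote VM can be tunnelled into your local browser session using ssh port forwarding. This is extremely convenient because you can do interactive data analysis with a "local feel".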
```{bash, eval=FALSE}
ssh -f -N -L 8787:localhost:8787 <your_user>@<host_ip_address>
```
In both cases, simply open RStudio Server in your browser at [localhost:8787](http://localhost:8787). Then, log in with user `rstudio` and password `spcclover`.
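
In the ssh command, `-L 8787:localhost:8787` forwards local port 8787 to port 8787 on the remote machine (as seen from that machine), `-N` tells ssh not to run a remote command, and `-f` sends ssh to the background. If port 8787 is already in use on your computer, you can pick another local port. The following bash sketch is a hypothetical helper (not part of this repository) that prints the corresponding command; `<your_user>@<host_ip_address>` remains a placeholder you fill in.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): print the ssh command
# that forwards a local port to RStudio Server on a remote Docker host.
rstudio_tunnel_cmd() {
  local remote="$1"               # <your_user>@<host_ip_address>
  local local_port="${2:-8787}"   # port you open in your local browser
  local remote_port="${3:-8787}"  # host port published by `docker run -p 8787:8787`
  # -f: go to background, -N: no remote command, -L: local port forwarding
  printf 'ssh -f -N -L %s:localhost:%s %s\n' "$local_port" "$remote_port" "$remote"
}

# Usage: open the tunnel on local port 8888, then browse to localhost:8888
# eval "$(rstudio_tunnel_cmd <your_user>@<host_ip_address> 8888)"
# Close the backgrounded tunnel later:
# pkill -f 'ssh -f -N -L 8888:localhost:8787'
```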
# File overview
The files in this project are organized as follows (only 2 folder levels are shown):
```{r, echo=FALSE}
fs::dir_tree(recurse = 1)
```