-
Notifications
You must be signed in to change notification settings - Fork 40
/
14-reproducible_research_3.Rmd
445 lines (329 loc) · 21.2 KB
/
14-reproducible_research_3.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
# Reproducible research #3
[Download](https://github.com/geanders/RProgrammingForResearch/raw/master/slides/CourseNotes_Week14.pdf) a pdf of the lecture slides covering this topic.
```{r echo = FALSE, message = FALSE, warning = FALSE}
library(dplyr)
```
## Overview of R packages
### What is an R package?
- From [Writing R Extensions](https://cran.r-project.org/doc/manuals/r-release/R-exts.html): "A directory of files which extend R".
- Files bundled together using `tar` and compressed using `gzip`. The file extension is `.tar.gz`. These are the *source files* for the package, which then must be installed from this source code locally prior to use.
- Sometimes also called an *extension* of R.
Example R package:
```{r echo = FALSE, fig.align = "center", out.width = "\\textwidth"}
knitr::include_graphics("figures/weathermetrics_package.png")
```
You can also have "binary packages" for a certain operating system. From [Writing R Extensions](https://cran.r-project.org/doc/manuals/r-release/R-exts.html):
> A binary package is "a zip file or tarball containing the files of an installed package which can be unpacked rather than installing from sources."
Consider developing software when:
- You have developed a new method you want to share
- You have data you'd like to make publicly available
- You find yourself doing the same task repeatedly
Why create an R package?
- Share some functions broadly
- Share some functions with a small group
- Create a version of code for yourself that's more organized and easier to use
+ Includes documentation (vignettes, help files)
+ Function names linked to package namespace
+ Once library is installed, can load easily
### Example: NMMAPS package
```{r echo = FALSE, fig.align = "center"}
knitr::include_graphics("figures/NMMAPS.png")
```
Source: www.ihapss.jhsph.edu
Contents of NMMAPS package:
```{r echo = FALSE, fig.align = "center"}
knitr::include_graphics("figures/NMMAPSPackageStructure.jpg")
```
Research impacts of NMMAPS package (*Source: Barnett, Huang, and Turner, "Benefits of Publicly Available Data", Epidemiology 2012*):
- As of November 2011, 67 publications had been published using this
data, with 1,781 citations to these papers
- Research using NMMAPS has been used by the US EPA in creating
regulatory impact statements for air pollution (particulates and
ozone)
- "Thanks to NMMAPS, there is probably no other country in the
world with a greater understanding of the health effects of air
pollution and heat waves in its population."
### Sharing an R package
If you want to share your R package, there are a number of ways you can do that:
- CRAN
- GitHub
- [Bioconductor](https://www.bioconductor.org): "Bioconductor provides tools for the analysis and comprehension of high-throughput genomic data." (from the Bioconductor website.)
- Other repositories
+ Private(-ish) repositories: e.g., ROpenSci's repository (for more, see https://ropensci.org/blog/blog/2015/08/04/a-drat-repository-for-ropensci)
+ `drat` repository: Make your own R package repository, including through GitHub pages.
- Compressed file: You can save a source tarball or binary package file with others without posting to a repository.
Sharing on CRAN:
- Traditional way to share an R package widely
- Easiest way for others to get your package (`install.packages`)
- Some barriers:
+ Size constraint on packages (5 MB)
+ Must follow CRAN policies
+ All packages must pass a submission process. This is not a guarantee that a package does what it says, just a check that required files are where they should be and that the package more or less doesn't break things.
Sharing on GitHub:
GitHub is becoming more and more common as a place to share R packages, both development packages that eventually are posted to CRAN and packages that are never submitted to CRAN.
- No restrictions / submission requirements
- GitHub repository size restrictions (1 GB, no files over 100 MB) much larger than CRAN package size restrictions (5 MB)
- GitHub packages can be installed using `install_github` from the `devtools` package
+ Requires `devtools` package, which has some set-up requirements (`XCode` for Mac, `Rtools` for Windows)
- Packages on CRAN cannot depend on packages available only on GitHub
### Package names
The format requirements for a package name are, based on [Writing R Extensions](https://cran.r-project.org/doc/manuals/r-release/R-exts.html):
> "This should contain only (ASCII) letters, numbers and dot, have at least two characters and start with a letter and not end in a dot."
Hadley Wickham's [additional guidelines](http://r-pkgs.had.co.nz/package.html):
- Make it easy to Google.
- Make it all uppercase or all lower case
- Base it on a word that's easy to remember, but then tweak the spelling to make it unique (and easier to Google).
- Abbreviate.
- Add an "r".
### Package maintainer
A package can have many authors, but only one maintainer. The maintainer is in charge of fixing any problems that come up with CRAN checks over time to keep the package on CRAN. The maintainer is also the person who will be emailed about bugs, etc., by other users.
The package can have other authors, as well as people in other roles (e.g., contributor). See the helpfile for the `person` function for more on the codes used for different roles.
### Find out more
To find out more about writing R packages, useful sources are:
- [Writing R Extensions](https://cran.r-project.org/doc/manuals/r-release/R-exts.html): Guidelines for R packages from the R Core Team.
- [R Packages](http://r-pkgs.had.co.nz) by Hadley Wickham
- [R package development cheatsheet](https://www.rstudio.com/wp-content/uploads/2015/06/devtools-cheatsheet.pdf)
## Basic example package: `weathermetrics`
```{r echo = FALSE, fig.align = "center"}
knitr::include_graphics("figures/weathermetrics_cran.png")
```
The key functions in this package are:
- `convert_temperature`: Convert between temperature metrics
- `convert_precip`: Convert between precipitation metrics
- `convert_wind_speed`: Convert between wind speed metrics
- `heat.index`: Calculates heat index from air temperature and a measure of air moisture (dew point temperature or relative humidity)
### Contents of weathermetrics package
```{r echo = FALSE, fig.align = "center"}
knitr::include_graphics("figures/WeathermetricsPackageStructure.pdf")
```
Equation to convert from Celsius to Fahrenheit:
$$
T_F = \frac{9}{5} T_C + 32
$$
Here is the associated R function:
```{r echo = 2}
library(weathermetrics)
celsius.to.fahrenheit
```
Here is part of the help file for this function:
```{r echo = FALSE, fig.align = "center"}
knitr::include_graphics("figures/convert_temperature_helpfile.png")
```
Here is the start of the function to calculate the heat index:
```{r}
head(heat.index.algorithm, 10)
```
Here is an example of using this function:
```{r}
data(suffolk)
suffolk %>%
mutate(heat_index = heat.index(t = TemperatureF,
rh = Relative.Humidity)) %>%
slice(1:5)
```
## Elements of an R package
### Basic elements
Things you edit directly:
- DESCRIPTION file: The package's "Title page". Metadata on the package, including names and contacts of authors, package name, and description. This file also lists all the package *dependencies* (other packages with functions this package uses).
- `R` folder: R code defining functions in the package. All code is included in one or more R scripts. If you use Roxygen for help with documentation, all of that is also included in these files.
Things that are automatically written:
- `man` folder: Help documentation for each function. This files are automatically rendered if you use Roxygen.
- NAMESPACE file: Helps R find functions in your package you want others to use.
### DESCRIPTION file
Required elements:
- Package: Name of the package
- Version: Number of the current version of the package (e.g., 0.1.0)
- Title: Short title for the package, in title case and in 65 characters or less.
- Author and Maintainer (these two sections can be replaced with Authors@R section that uses the `person` function)
- Description: Paragraph describing the package
- License: Name of the license the package is under. If necessary, you can also refer to a LICENSE file included as another file in the package. Only some licenses are easily accepted by CRAN.
Other elements that are common but not required:
- Date: Release date of this version of the package.
- Imports: A list of the packages on which this package depends: other packages with functions used by the code in this package.
- URL: If there is a webpage associated with the package, the address for it. Often, this is the web address of the package's GitHub repository.
- BugReports: Where users can submit problems they've had. Often, the web address of the "Issues" page of the GitHub repository for the package.
```
Package: weathermetrics
Type: Package
Title: Functions to Convert Between Weather Metrics
Version: 1.2.2
Date: 2016-05-19
Authors@R: c(person("Brooke", "Anderson",
email = "[email protected]",
role = c("aut", "cre")),
person("Roger", "Peng",
email = "[email protected]", role = c("aut")),
person("Joshua", "Ferreri",
email = "[email protected]", role = c("aut")))
Description: Functions to convert between weather metrics,
including conversions for metrics of temperature, air
moisture, wind speed, and precipitation. This package also
includes functions to calculate the heat index from
air temperature and air moisture.
URL: https://github.com/geanders/weathermetrics/
BugReports: https://github.com/geanders/weathermetrics/issues
License: GPL-2
LazyData: true
RoxygenNote: 5.0.1
Depends:
R (>= 2.10)
Suggests: knitr,
rmarkdown
VignetteBuilder: knitr
```
### R folder
The `R` folder of the package includes:
- R scripts with code defining all functions for the package
- Help documentation for each function (if using Roxygen)
- Help documentation for the package data in "data.R"
```{r echo = FALSE, fig.align = "center"}
knitr::include_graphics("figures/weathermetrics_package.png")
```
You define functions in the R scripts just as you would anytime you want to define a function in R. For example, "temperature_conversions.R" includes the following code to define converting from Celsius to Fahrenheit:
```{r eval = FALSE}
celsius.to.fahrenheit <- function (T.celsius, round = 2) {
T.fahrenheit <- (9/5) * T.celsius + 32
T.fahrenheit <- round(T.fahrenheit, digits = round)
return(T.fahrenheit)
}
```
Only exception: use `package::function` syntax to call functions from other packages (e.g., `dplyr::mutate()`).
Using `roxygen2`, you put all information for the help files directly into a special type of code comments right before defining the function.
- Start each line with `#'`.
- To render into help files, use the `document` function from the `devtools` package.
- This will write out help files in the `man` folder of the package.
- Use these comments to specify which functions should be *exported* from the package using the `@export` tag. This information will be used to render the NAMESPACE file for the package.
```
#' Convert from Celsius to Fahrenheit.
#'
#' \code{celsius.to.fahrenheit} creates a numeric vector of
#' temperatures in Fahrenheit from a numeric vector of
#' temperatures in Celsius.
#'
#' @param T.celsius Numeric vector of temperatures in Celsius.
#' @inheritParams convert_temperature
#'
#' @return A numeric vector of temperature values in Fahrenheit.
#'
#' @note Equations are from the source code for the US National
#' Weather Service's
#' \href{http://www.wpc.ncep.noaa.gov/html/heatindex.shtml}
#' {online heat index calculator}.
#' @author
#' Brooke Anderson \email{brooke.anderson@@colostate.edu},
#' Roger Peng \email{rdpeng@@gmail.com}
#'
#' @seealso \code{\link{fahrenheit.to.celsius}}
#'
#' @examples # Convert from Celsius to Fahrenheit.
#' data(lyon)
#' lyon$TemperatureF <- celsius.to.fahrenheit(lyon$TemperatureC)
#' lyon
#'
#' @export
```
Some of the most common tags you'll use for `roxygen2` are:
- `@param`: Use to explain parameters for the function.
- `@inheritParam`: If you have already explained a parameter for the help file
for a different function, you can use this tag to use the same
definition for this function.
- `@return`: Explanation of the object returned by the function.
- `@examples`: One or more examples of using the function.
- `@export`: Export the function, so it's available when users load the package.
By default, the first line in the `roxygen2` comments is the function title and the next section is the function description. For more on `roxygen2`, see: https://cran.r-project.org/web/packages/roxygen2/vignettes/roxygen2.html
Once you run `document`, this is all rendered as a help file. Now, when you run `?celsius.to.fahrenheit`, you'll get:
```{r echo = FALSE, fig.align = "center"}
knitr::include_graphics("figures/c_to_f_helpfile.png")
```
The start of the NAMESPACE file will be automatically written when you run `document` and will look like:
```
# Generated by roxygen2: do not edit by hand
export(celsius.to.fahrenheit)
export(celsius.to.kelvin)
export(convert_precip)
export(convert_temperature)
export(convert_wind_speed)
export(dewpoint.to.humidity)
```
If you are automating helpfile documentation, you must also include an R script with the doumentation for each data set that comes with the package.
This file will include `roxygen2` documentation for each data set, followed by the name of the dataset in quotation marks.
As an example, the next slide has the documentation in the "data.R" file for the "lyon" data set.
```
#' Weather in Lyon, France
#'
#' Daily values of mean temperature (Celsius) and mean dew
#' point temperature (Celsius) for the week of June 18, 2000,
#' in Lyon, France.
#'
#' @source \href{http://www.wunderground.com/}
#' {Weather Underground}
#'
#' @format A data frame with columns:
#' \describe{
#' \item{Date}{Date of weather observation}
#' \item{TemperatureC}{Daily mean temperature in Celsius}
#' \item{DewpointC}{Daily mean dewpoint temperature in
#' Celsius}
#' }
"lyon"
```
### Other common elements
Some other elements, while not required, are common in many R packages:
- `data` folder: R objects with data that goes with the package. Often, these are small-ish data files for examples of how to use package functions. However, more "scientific" packages may include more substantive data in this folder. Some packages are created solely to deliver data.
- `vignettes` folder: One or more tutorials on why the package was created and how to use it. These can be written in RMarkdown.
- NEWS file: Information about changes in later versions of the package.
- .Rbuildignore file: Lists files and directories that should not be included in the package build
- LICENSE file: With certain licenses (MIT is a common example), you need a separate LICENSE file, to supplement the license information in the DESCRIPTION file.
### Less common elements
- `src` folder: Sources and headers for compiled code (e.g., C++).
- `demo` folder: R scripts that give demonstrations of using the package.
- `tests` folder: Test code for the package. Currently, the best way to create tests for a package are with the `testthat` package.
- `inst` folder: Various and sundries, including a CITATION file to tell others how to cite your package and executable scripts not in R (e.g., shell scripts, Perl or Python code).
## Creating an R package
Invaluable tools when creating an R package:
- The `devtools` package: Various utility functions that help you develop an R package.
- *R Packages* by Hadley Wickham. Available from O'Reilly or free online at http://r-pkgs.had.co.nz
- GitHub: When in doubt of how to structure something, look for examples in code for other R packages. GitHub is currently the easiest way to browse through the code for many R packages.
### Initializing an R package
The easiest way to start a new R package project is through R Studio. Go to "File" -> "New Project" -> "Empty Directory". One of the options is "R Package".
```{r echo = FALSE, fig.align = "center"}
knitr::include_graphics("figures/initialize_r_package.png")
```
You'll need to specify where you want to save the directory and the package name. You can also select if you'd like to use git (you'll still need to set-up and sync with GitHub if you want to post the package to GitHub).
```{r echo = FALSE, out.width = "0.6\\textwidth", fig.align = "center"}
knitr::include_graphics("figures/package_setup.png")
```
Once you choose this, R Studio will create a new "skeleton" directory for you, with some of the default files and directories you need (kind of like how it starts with a template for RMarkdown documents). You can add and edit files within this structure to create your package.
```{r echo = FALSE, fig.align = "center"}
knitr::include_graphics("figures/example_package_skeleton.png")
```
### Working on an R package
Once you set-up the package, most of your work will be in writing the code for the package's functions and creating documentation. The `devtools` package has some functions that are very useful for this process:
- `load_all`: Loads the last saved version of all functions in the package. You can use this to change and check functions without rebuilding the whole package and restarting R each time.
- `document`: Parse all `roxygen2` comments to create the helpfiles in the `man` directory and the NAMESPACE file. As soon as you've loaded an documented the last saved version of your package, you can access the help file for each function using `?`, as with other R functions.
- Control-. : This is a keyboard shortcut rather than a function, but it allows you to search the package for the code where a certain function is defined. As a package grows larger, this functionality is very useful for navigating the R code in the package.
The `devtools` package also has some functions that set up useful infrastructure for the package. For example, if you want to include a vignette written in RMarkdown, you need to do a few things:
1. Add a new folder called `vignettes`.
2. Add `inst/doc` to the `.gitignore` file. (The built pdf is written into this folder, but typically you don't want to include rendered files in git, just the code with which they were generated.)
3. Make a few changes to the DESCRIPTION file.
Rather than having to remember how to do all this yourself, you can use the `use_vignette` function, which adds this infrastructure to the package at once.
Typically, you will only use these infrastructure calls once per package. Other useful infrastructure functions are:
- `use_cran_comments`: Add a text file with comments for the people who check the package when it's submitted to CRAN.
- `use_readme_rmd`: Create an RMarkdown "README" file that you can use to provide information on the package (similar to the vignette, but this will show up on the first page of the GitHub repo if you have one for the package).
- `use_news_md`: Create a text file to provide details of changes in later package versions.
- `use_travis`: Add the infrastructure needed to check the package on Travis when you push to GitHub.
- `use_rcpp`: Add an `src` directory and other infrastructure needed to use C++ code within the package.
- `use_testthat`: Add infrastructure for using package tests based on `testthat`.
There are a few infrastructure-type functions you might use more often:
- `use_data`: Save data currently in an R object in your working session to use as data within the package. This function saves that data as an `.rda` file in the `data` folder.
- `use_build_ignore`: Add a file or files to the ".Rbuildignore" file, so they won't cause an error with CRAN checks (one of the checks is that there aren't any unrecognized files or directories in the top level of the package).
- `use_package`: Add a package that your package depends on to the DESCRIPTION file.
### Finalizing an R package
Once you have included all the functions and documentation for a package, there are a few more steps before that version is ready to be shared:
- Create a vignette and / or README file to explain how others can use the package.
- Run the package through CRAN checks and resolve all ERRORS, WARNINGS, and NOTES. This is required if you are submitting to CRAN. It's usually a good idea and improves the package even if you're not.
- Change the version number to a stable version (typically, development versions end in .9000, like 0.0.0.9000). When you have a stable version of the package, you'll change this to a three-part number (e.g., 0.1.0).
- Build the package locally. For this, you can use the "Build" tab in the upper right RStudio tab (it will show up once you have a package project open).
- Build the package on other systems. You can use Travis (Unix / Linux) and `build_win` (Windows) to do this for those systems.
- Create a pdf of all help files to proofread. To do this, open a bash shell in the parent directory of the package and run `R CMD Rd2pdf <packagename>` (for example, if the package were `csupackage`, you'd run `R CMD Rd2pdf csupackage`). This will create a pdf in that directory with all helpfiles for all functions.
- If you want the package to be on CRAN, submit to CRAN. There is a function called `submit_cran` that will build the package and submit it to CRAN.