---
layout: default
title: Introduction
output: oldbookdown::html_chapter
---
# Introduction {#intro-why}
In R, the fundamental unit of shareable code is the package. A package bundles together code, data, documentation, and tests, and is easy to share with others. As of January 2015, there were over 6,000 packages available on the **C**omprehensive **R** **A**rchive **N**etwork, or CRAN, the public clearing house for R packages. This huge variety of packages is one of the reasons that R is so successful: the chances are that someone has already solved a problem that you're working on, and you can benefit from their work by downloading their package.
If you're reading this book, you already know how to use packages:
* You install them from CRAN with `install.packages("x")`.
* You use them in R with `library("x")`.
* You get help on them with `package?x` and `help(package = "x")`.
The goal of this book is to teach you how to develop packages so that you can write your own, not just use other people's. Why write a package? One compelling reason is that you have code that you want to share with others. Bundling your code into a package makes it easy for other people to use it, because, like you, they already know how to use packages. If your code is in a package, any R user can easily download it, install it and learn how to use it.
But packages are useful even if you never share your code. As Hilary Parker says in her [introduction to packages](http://hilaryparker.com/2014/04/29/writing-an-r-package-from-scratch/): "Seriously, it doesn't have to be about sharing your code (although that is an added benefit!). It is about saving yourself time." Organising code in a package makes your life easier because packages come with conventions. For example, you put R code in `R/`, you put tests in `tests/` and you put data in `data/`. These conventions are helpful because:
* They save you time --- you don't need to think about the best way to organise
a project, you can just follow a template.
* Standardised conventions lead to standardised tools --- if you buy into
R's package conventions, you get many tools for free.
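
As a rough illustration of these conventions, the sketch below lays out the directories named above in a temporary folder using only base R. The package name `mypkg` and the `hello()` function are hypothetical placeholders, and this is only the directory layout, not a complete, installable package.

```{r, eval = FALSE}
# Hypothetical package root inside the session's temporary directory
pkg <- file.path(tempdir(), "mypkg")

# Create the conventional directories: R code, tests and data
for (d in c("R", "tests", "data")) {
  dir.create(file.path(pkg, d), recursive = TRUE, showWarnings = FALSE)
}

# R code lives in R/; hello() is a made-up example function
writeLines('hello <- function() "Hello, world!"', file.path(pkg, "R", "hello.R"))

# Show the resulting layout
list.files(pkg, recursive = TRUE, include.dirs = TRUE)
```

Because every package follows the same layout, a tool only needs to be told where the package root is to find the code, the tests and the data.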
It's even possible to use packages to structure your data analyses, as Robert M Flight discusses in a [series of blog posts](http://rmflight.github.io/posts/2014/07/analyses_as_packages.html).
## Philosophy {#intro-phil}
This book espouses my philosophy of package development: anything that can be automated, should be automated. Do as little as possible by hand. Do as much as possible with functions. The goal is to spend your time thinking about what you want your package to do rather than thinking about the minutiae of package structure.
This philosophy is realised primarily through the devtools package, a suite of R functions that I wrote to automate common development tasks. The goal of devtools is to make package development as painless as possible. It does this by encapsulating all of the best practices of package development that I've learned over the years. Devtools protects you from many potential mistakes, so you can focus on the problem you're interested in, not on developing a package.
Devtools works hand-in-hand with RStudio, which I believe is the best development environment for most R users. The only real competitor is [ESS](http://ess.r-project.org/), Emacs Speaks Statistics, which is a rewarding environment if you're willing to put in the time to learn Emacs and customise it to your needs. The history of ESS stretches back over 20 years (predating R!), but it's still actively developed and many of the workflows described in this book are also available there.
Together, devtools and RStudio insulate you from the low-level details of how packages are built. As you start to develop more packages, I highly recommend that you learn more about those details. The definitive resource for the details of package development is the official [Writing R Extensions][r-ext] manual. However, this manual can be hard to understand if you're not already familiar with the basics of packages. It's also exhaustive, covering every possible package component, rather than focussing on the most common and useful components, as this book does. Writing R Extensions is a useful resource once you've mastered the basics and want to learn what's going on under the hood.
## In this book {#intro-outline}
You'll start by learning about the basic structure of a package, and the forms it can take, in [R packages](#package-structure). Then each of the next ten chapters of the book goes into more detail about one component. They're roughly organised in order of importance:
* [R code](#r): the most important directory is `R/`, where your R code lives.
A package with just this directory is still a useful package. (And indeed,
if you stop reading the book after this chapter, you'll have still learned
some useful new skills.)
* [Package metadata](#description): the `DESCRIPTION` lets you describe
what your package needs to work. If you're sharing your package, you'll also
use the `DESCRIPTION` to describe what it does, who can use it
(the license) and who to contact if things go wrong.
* [Documentation](#man): if you want other people (including future-you!) to
understand how to use the functions in your package, you'll need to document
them. I'll show you how to use roxygen2 to document your functions. I
recommend roxygen2 because it lets you write code and documentation together
while continuing to produce R's standard documentation format.
* [Vignettes](#vignettes): function documentation describes the nit-picky
details of every function in your package. Vignettes give the big picture.
They're long-form documents that show how to combine multiple parts of your
package to solve real problems. I'll show you how to use R Markdown and knitr
to create vignettes with a minimum of fuss.
* [Tests](#tests): to ensure your package works as designed (and continues to
work as you make changes), it's essential to write unit tests which define
correct behaviour, and alert you when functions break. In this chapter, I'll
teach you how to use the testthat package to convert the informal
interactive tests that you're already doing to formal, automated tests.
* [Namespace](#namespace): to play nicely with others, your package needs
to define what functions it makes available to other packages and what
functions it requires from other packages. This is the job of the `NAMESPACE`
file and I'll show you how to use roxygen2 to generate it for you.
The `NAMESPACE` is one of the more challenging parts of developing an R
package but it's critical to master if you want your package to work reliably.
* [External data](#data): the `data/` directory allows you to include
data with your package. You might do this to bundle data
in a way that's easy for R users to access, or just to provide compelling
examples in your documentation.
* [Compiled code](#src): R code is designed for human efficiency, not computer
efficiency, so it's useful to have a tool in your back pocket that allows you
to write fast code. The `src/` directory allows you to include speedy
compiled C and C++ code to solve performance bottlenecks in your package.
* [Other components](#misc): this chapter documents the handful of other
components that are rarely needed: `demo/`, `exec/`, `po/` and `tools/`.
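
To make some of these components concrete, the following sketch inspects a package that is already installed. It uses the base package stats, which ships with R, purely as a convenient example. Note that an installed package is laid out somewhat differently from the source form you work on during development, so the file listing will not match the directories above one-for-one.

```{r, eval = FALSE}
# Fields from the package's DESCRIPTION file
desc <- utils::packageDescription("stats")
desc$Package
#> [1] "stats"
desc$License

# Functions the package makes available to others, as declared in NAMESPACE
exports <- getNamespaceExports("stats")
"median" %in% exports
#> [1] TRUE

# Top-level files and directories of the installed package
list.files(system.file(package = "stats"))
```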
The final three chapters describe general best practices not specifically tied to one directory:
* [Git and GitHub](#git): mastering a version control system is vital to easily
collaborate with others, and is useful even for solo work because it allows
you to easily undo mistakes. In this chapter, you'll learn how to use the
popular git and GitHub combo with RStudio.
* [Automated checking](#check): R provides very useful automated quality checks
in the form of `R CMD check`. Running them regularly is a great way to avoid
many common mistakes. The results can sometimes be a bit cryptic, so I
provide a comprehensive cheatsheet to help you convert warnings to
actionable insight.
* [Release](#release): the life-cycle of a package culminates with release
to the public. This chapter compares the two main options (CRAN and GitHub)
and offers general advice on managing the process.
This is a lot to learn, but don't feel overwhelmed. Start with a minimal subset of useful features (e.g. just an `R/` directory!) and build up over time. To paraphrase the Zen monk Shunryu Suzuki: "Each package is perfect the way it is --- and it can use a little improvement".
## Getting started {#intro-get}
To get started, make sure you have the latest version of R (at least `r paste0(version$major, ".", version$minor)`, which is the version that the code in this book uses), then run the following code to get the packages you'll need:
```{r, eval = FALSE}
install.packages(c("devtools", "roxygen2", "testthat", "knitr"))
```
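
If you want to confirm that the installation worked, one possible check (a minimal sketch using only base R, with the variable names chosen here for illustration) is to ask R whether each package's namespace can be loaded:

```{r, eval = FALSE}
needed <- c("devtools", "roxygen2", "testthat", "knitr")

# requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
is_installed

# Report anything that still needs to be installed
not_installed <- needed[!is_installed]
if (length(not_installed) > 0) {
  message("Still missing: ", paste(not_installed, collapse = ", "))
}
```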
Make sure you have a recent version of RStudio. You can check you have the right version by running:
```{r, eval = FALSE}
install.packages("rstudioapi")
rstudioapi::isAvailable("0.99.149")
```
If not, you may need to install the preview version from <http://www.rstudio.com/products/rstudio/download/preview/>. This gives you access to the latest and greatest features, and only slightly increases your chances of finding a bug.
Use the following code to access new devtools functions as I develop them. This is particularly important during the development of the book.
```{r, eval = FALSE}
devtools::install_github("hadley/devtools")
```
You'll also need a C compiler and a few other command line tools. If you're on Windows or Mac and you don't already have them, RStudio will install them for you. Otherwise:
* On Windows, download and install [Rtools](http://cran.r-project.org/bin/windows/Rtools/).
NB: this is not an R package!
* On Mac, make sure you have either Xcode (available for free in the App Store)
or the ["Command Line Tools for Xcode"](http://developer.apple.com/downloads).
You'll need to have a (free) Apple ID.
* On Linux, make sure you've installed not only R, but also the R development
tools. For example, on Ubuntu (and Debian) you need to install the
`r-base-dev` package.
You can check that you have everything installed and working by running the following code:
```{r, eval = FALSE}
library(devtools)
has_devel()
#> '/Library/Frameworks/R.framework/Resources/bin/R' --vanilla CMD SHLIB foo.c
#>
#> clang -I/Library/Frameworks/R.framework/Resources/include -DNDEBUG
#> -I/usr/local/include -I/usr/local/include/freetype2 -I/opt/X11/include
#> -fPIC -Wall -mtune=core2 -g -O2 -c foo.c -o foo.o
#> clang -dynamiclib -Wl,-headerpad_max_install_names -undefined dynamic_lookup
#> -single_module -multiply_defined suppress -L/usr/local/lib -o foo.so foo.o
#> -F/Library/Frameworks/R.framework/.. -framework R -Wl,-framework
#> -Wl,CoreFoundation
#> [1] TRUE
```
This will print out some code which I use to help diagnose problems. If everything is OK, it will return `TRUE`. Otherwise, it will throw an error and you'll need to investigate what the problem is.
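
Judging from the output above, the check boils down to compiling a tiny C file with `R CMD SHLIB`. The sketch below reproduces that idea in base R. It is not the devtools implementation: the function name `can_compile()` and the contents of `foo.c` are assumptions made for illustration, and it returns `FALSE` rather than throwing an error.

```{r, eval = FALSE}
can_compile <- function() {
  # Work in a throwaway directory and restore the working directory afterwards
  build_dir <- tempfile("devel-check-")
  dir.create(build_dir)
  old_wd <- setwd(build_dir)
  on.exit({
    setwd(old_wd)
    unlink(build_dir, recursive = TRUE)
  }, add = TRUE)

  # A trivial C source file (assumed contents, for illustration only)
  writeLines("void foo(int *bar) { *bar = 1; }", "foo.c")

  # Ask R to build a shared library; an exit status of 0 means success
  r_bin <- file.path(R.home("bin"), "R")
  status <- tryCatch(
    system2(r_bin, c("--vanilla", "CMD", "SHLIB", "foo.c"),
            stdout = FALSE, stderr = FALSE),
    error = function(e) 1L,
    warning = function(w) 1L
  )
  identical(as.integer(status), 0L)
}

can_compile()
```

If this returns `FALSE`, rerunning the `system2()` call without `stdout = FALSE, stderr = FALSE` shows the compiler messages, which is usually the quickest way to see which tool is missing.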
## Acknowledgments {#intro-ack}
The tools in this book wouldn't be possible without many open source contributors. [Winston Chang](https://github.com/wch/), my co-author on devtools, spent hours debugging painful S4 and compiled code problems so that devtools can quickly reload code for the vast majority of packages. [Kirill Müller](https://github.com/krlmlr) contributed great patches to many of my package development packages including devtools, testthat, and roxygen2. [Kevin Ushey](http://github.com/kevinushey), [JJ Allaire](https://github.com/jjallaire) and [Dirk Eddelbuettel](http://dirk.eddelbuettel.com) tirelessly answered all my basic C, C++ and Rcpp questions. [Peter Danenberg](https://github.com/klutometis) and [Manuel Eugster](http://www.statistik.lmu.de/~eugster/) wrote the first version of roxygen2 during a Google Summer of Code. [Craig Citro](https://github.com/craigcitro) wrote much of the code to allow Travis to work with R packages.
Often the only way I learn how to do something the right way is by doing it the wrong way first. For suffering through many of my package development errors, I'd like to thank all the CRAN maintainers, especially Brian Ripley, Uwe Ligges and Kurt Hornik.
This book was [written in the open](https://github.com/hadley/r-pkgs/) and it is truly a community effort: many people read drafts, fixed typos, suggested improvements, and contributed content. Without those contributors, the book wouldn't be nearly as good as it is, and I'm deeply grateful for their help. Special thanks go to Peter Li, who read the book from cover to cover and provided many fixes. I also deeply appreciate the time the reviewers ([Duncan Murdoch](http://www.stats.uwo.ca/faculty/murdoch/), [Karthik Ram](http://karthik.io), [Vitalie Spinu](http://vitalie.spinu.info) and [Ramnath Vaidyanathan](https://ramnathv.github.io)) spent reading the book and giving me thorough feedback.
```{r, results = "asis", echo = FALSE, eval = TRUE}
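# Generate the input with: git --no-pager shortlog -ns > contribs.txt
# Each line holds a commit count and an author name, separated by a tab.
contribs <- read.delim("contribs.txt", header = FALSE,
  stringsAsFactors = FALSE)[-1, ]  # drop the first row (the author with the most commits)
names(contribs) <- c("n", "name")
contribs <- contribs[order(contribs$name), ]
# Names without a space are treated as GitHub usernames and prefixed with "@"
contribs$uname <- ifelse(!grepl(" ", contribs$name),
  paste0("@", contribs$name), contribs$name)
cat("Thanks go to all contributors who submitted improvements via GitHub (in alphabetical order): ")
cat(paste0(contribs$uname, collapse = ", "))
cat(".\n")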
```
## Conventions {#intro-conventions}
Throughout this book, I write `foo()` to refer to functions, `bar` to refer to variables and function parameters, and `baz/` to paths.
Larger code blocks intermingle input and output. Output is commented so that if you have an electronic version of the book, e.g., <http://r-pkgs.had.co.nz>, you can easily copy and paste examples into R. Output comments look like `#>` to distinguish them from regular comments.
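
For example, a block in this style looks like the following (the vector `x` is just an illustration):

```{r, eval = FALSE}
x <- c(1, 2, 3)
mean(x)
#> [1] 2
```

Pasting the whole block into R runs the two input lines and ignores the output line, because `#>` starts a comment.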
## Colophon {#intro-colophon}
This book was written in [R Markdown](http://rmarkdown.rstudio.com/) inside [RStudio](http://www.rstudio.com/ide/). [knitr](http://yihui.name/knitr/) and [pandoc](http://johnmacfarlane.net/pandoc/) converted the raw R Markdown to HTML and PDF. The [website](http://r-pkgs.had.co.nz) was made with [jekyll](http://jekyllrb.com/), styled with [bootstrap](http://getbootstrap.com/), and automatically published to Amazon's [S3](http://aws.amazon.com/s3/) by [travis-ci](https://travis-ci.org/). The complete source is available from [GitHub](https://github.com/hadley/r-pkgs).
This version of the book was built with:
```{r}
library(roxygen2)
library(testthat)
devtools::session_info()
```
[r-ext]:http://cran.r-project.org/doc/manuals/R-exts.html#Creating-R-packages