forked from hadley/adv-r
-
Notifications
You must be signed in to change notification settings - Fork 1
/
Reproducibility.rmd
69 lines (47 loc) · 2.64 KB
/
Reproducibility.rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
---
title: Reproducibility
layout: default
---
# How to write a reproducible example.
You are most likely to get good help with your R problem if you provide a reproducible example. A reproducible example allows someone else to recreate your problem by just copying and pasting R code.
There are four things you need to include to make your example reproducible: required packages, data, code, and a description of your R environment.
* **Packages** should be loaded at the top of the script, so it's easy to
see which ones the example needs.
* The easiest way to include **data** in an email is to use dput() to generate
the R code to recreate it. For example, to recreate the mtcars dataset in R,
I'd perform the following steps:
1. Run `dput(mtcars)` in R
2. Copy the output
3. In my reproducible script, type `mtcars <- ` then paste.
* Spend a little bit of time ensuring that your **code** is easy for others to
read:
* make sure you've used spaces and your variable names are concise, but
informative
* use comments to indicate where your problem lies
* do your best to remove everything that is not related to the problem.
The shorter your code is, the easier it is to understand.
* Include the output of sessionInfo() as a comment. This summarises your **R
environment** and makes it easy to check if you're using an out-of-date
package.
You can check you have actually made a reproducible example by starting up a fresh R session and pasting your script in.
Before putting all of your code in an email, consider putting it on http://gist.github.com/. It will give your code nice syntax highlighting, and you don't have to worry about anything getting mangled by the email system.
## Example
Here's an illustration of how to create a reproducible example. First, have R print out your data in a format that can be copy-pasted:
```R
# For this example, use the built-in BOD data set. Replace this with your data.
dput(BOD)
# structure(list(Time = c(1, 2, 3, 4, 5, 7), demand = c(8.3, 10.3,
# 19, 16, 15.6, 19.8)), .Names = c("Time", "demand"), row.names = c(NA,
# -6L), class = "data.frame", reference = "A1.4, p. 270")
```
Then you can use that output to create a reproducible example:
```R
library(ggplot2)
# Save the data structure in variable BOD
BOD <- structure(list(Time = c(1, 2, 3, 4, 5, 7), demand = c(8.3, 10.3,
19, 16, 15.6, 19.8)), .Names = c("Time", "demand"), row.names = c(NA,
-6L), class = "data.frame", reference = "A1.4, p. 270")
# Some example code that uses the data
ggplot(BOD, aes(x=Time, y=demand)) + geom_line()
```
Check that others can run this code by simply copying and pasting it in a **new** R sesion.