forked from batpigandme/tidynomicon
-
Notifications
You must be signed in to change notification settings - Fork 1
/
python.Rmd
149 lines (113 loc) · 3.59 KB
/
python.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
# Using Python {#python}
```{r setup, include=FALSE}
source("etc/common.R")
```
## Questions
```{r, child="questions/reticulate.md"}
```
## Learning Objectives
```{r, child="objectives/reticulate.md"}
```
As the previous lessons have shown,
you can do a lot with R,
but sometimes you might feel a cold, serpentine tug on your soul
pulling you back to Python.
You can put Python code in RMarkdown documents:
```{python}
print("Hello R")
```
but how can those chunks interact with your R and vice versa?
The answer is a package called [reticulate][reticulate] that provides two-way communication between Python and R.
To use it,
run `install.packages("reticulate")`.
By default,
it uses the system-default Python:
```{r}
Sys.which("python")
```
but you can configure it to use different versions,
or to use `virtualenv` or a Conda environment---see [the document][reticulate-configure] for details.
## How can I access data across languages?
The most common way to use reticulate is to do some calculations in Python and then use the results in R
or vice versa.
To show how this works,
let's read our infant HIV data into a Pandas data frame:
```{python}
import pandas
data = pandas.read_csv('results/infant_hiv.csv')
print(data.head())
```
All of our Python variables are available in our R session as part of the `py` object,
so `py$data` is our data frame inside a chunk of R code:
```{r}
library(reticulate)
head(py$data)
```
reticulate handles type conversions automatically,
though there are a few tricky cases:
for example,
the number `9` is a float in R,
so if you want an integer in Python,
you have to add the trailing `L` (for "long") and write it `9L`.
On the other hand,
reticulate translates between 0-based and 1-based indexing.
Suppose we create a character vector in R:
```{r}
elements = c('hydrogen', 'helium', 'lithium', 'beryllium')
```
Hydrogen is in position 1 in R:
```{r}
elements[1]
```
but position 0 in Python:
```{python}
print(r.elements[0])
```
Note our use of the object `r` in our Python code:
just `py$whatever` gives us access to Python objects in R,
`r.whatever` gives us access to R objects in Python.
## How can I call functions across languages?
We don't have to run Python code,
store values in a variable,
and then access that variable from R:
we can call the Python directly (or vice versa).
For example,
we can use Python's random number generator in R as follows:
```{r}
pyrand <- import("random")
pyrand$gauss(0, 1)
```
(There's no reason to do this---R's random number generator is just as strong---but it illustrates the point.)
We can also source Python scripts.
For example,
suppose that `countries.py` contains this function:
```{python code=readLines('countries.py'), eval=FALSE}
```
We can run that script using `source_python`:
```{r}
source_python('countries.py')
```
There is no output because all the script did was define a function.
By default,
that function and all other top-level variables defined in the script are now available in R:
```{r}
get_countries('results/infant_hiv.csv')
```
There is one small pothole in this.
When the script is run,
the special Python variable `__name__` is set to `'__main__'"'`,
i.e.,
the script thinks it is being called from the command line.
If it includes a conditional block to handle command-line arguments like this:
```{python eval=FALSE}
if __name__ == '__main__':
input_file, output_files = sys.argv[1], sys.argv[2:]
main(input_file, output_files)
```
then that block will be executed,
but will fail because `sys.argv` won't include anything.
## Key Points
```{r, child="keypoints/reticulate.md"}
```
```{r, child="etc/links.md"}
```