-
Notifications
You must be signed in to change notification settings - Fork 2
/
03_piping.Rmd
84 lines (54 loc) · 2.88 KB
/
03_piping.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
---
title: "Piping"
output: github_document
---
```{r, results='hide', echo = FALSE, warning=FALSE, message=FALSE}
library(tidyverse)
```
```{r, results='hide', echo = FALSE, warning=FALSE, message=FALSE}
spotify <- read_csv("data/spotify.csv")
```
[<<< Previous](02_isolating-data.md) | [Next >>>](04_summarizing-data.md)
-----
## %>%
### Steps
Notice how each dplyr function takes a data frame as input and returns a data frame as output. This makes the functions easy to use in a step by step fashion. For example, you could:
1. Filter `spotify` to just Rap songs, then
2. Select the `loudness` and `energy` columns from the result
```{r echo = TRUE}
rap <- filter(spotify, genre == "Rap")
rap <- select(rap, loudness, energy)
rap
```
### Redundancy
The result shows us the loudest rap songs and their energy levels in our data set. But take a look at the code. Do you notice how we re-create `rap` at each step so we will have something to pass to the next step? This is an inefficient way to write R code.
You could avoid creating `rap` by nesting your functions inside of each other, but this creates code that is hard to read:
```{r echo = TRUE, eval = FALSE}
select(filter(spotify, genre == "Rap"), loudness, energy)
```
The dplyr package provides a third way to write sequences of functions: the pipe.
### %>%
![Source: dplyr cheatsheet][5]
[5]: images/pipes.png
The pipe operator `%>%` performs an extremely simple task: it passes the result on its left into the first argument of the function on its right. Or put another way, `x %>% f(y)` is the same as `f(x, y)`. This piece of code punctuation makes it easy to write and read series of functions that are applied in a step by step way. For example, we can use the pipe to rewrite our code above:
```{r echo = TRUE}
spotify %>%
filter(genre == "Rap") %>%
select(loudness, energy)
```
As you read the code, pronounce `%>%` as "then". You'll notice that dplyr makes it easy to read pipes. Each function name is a verb, so our code resembles the statement, "Take spotify, _then_ filter it by genre, _then_ select the loudness and energy."
dplyr also makes it easy to write pipes. Each dplyr function returns a data frame that can be piped into another dplyr function, which will accept the data frame as its first argument. In fact, dplyr functions are written with pipes in mind: each function does one simple task. dplyr expects you to use pipes to combine these simple tasks to produce sophisticated results.
**Exercise 1**
We'll use pipes for the remainder of the tutorial. Let's practice a little by writing a new pipe. The pipe should:
1. Filter spotify to just the songs that are above 0.50 danceability
2. Select the `tempo` and `energy` columns
-----
## Answers
**Exercise 1**
```{r}
spotify %>%
filter(danceability > 0.50) %>%
select(tempo, energy)
```
-----
[<<< Previous](02_isolating-data.md) | [Next >>>](04_summarizing-data.md)