---
title: "ACOM - ACCORD Fire Workshop 2017"
author: "Nicholas Good - [email protected]"
output:
  rmarkdown::html_document:
    toc: true
    toc_float: true
    theme: yeti
---
```{r global_options, include=FALSE}
knitr::opts_chunk$set(echo=TRUE, warning=FALSE, message=FALSE, results = 'hide', eval = FALSE)
```
---
# Getting started
## Exercises
Download the .html [exercises](https://tinyurl.com/R-for-fire-data) file.
## R projects
You can create an **R Studio** project to organize your work. An **R Studio** project has its own working directory, workspace, history, and source files.
* To create an **R Studio** project, navigate to `File -> New Project`.
* Select either `New Directory -> Empty Project` (giving the new folder a name via the dialog) or `Existing Directory` (selecting an existing folder via the dialog), then press `Create Project`.
## R packages
R's base functionality can be extended by installing additional packages. In this section we'll install all the packages we're going to use today. You can install packages using the `install.packages()` function or via the **R Studio** user interface (`Tools -> Install Packages`).
* The [rmarkdown](http://rmarkdown.rstudio.com) package allows you to combine code, its output, and text into a single document in **R Studio** (like the one you are reading now). **R Markdown** supports R and other languages (e.g. Python and SQL). You can output to numerous formats such as *html*, *word* and *pdf*. We'll output to **html**.
```{r,}
install.packages("rmarkdown")
install.packages("knitr")
```
* The [tidyverse](http://tidyverse.org) is a collection of [data science](http://r4ds.had.co.nz) focused packages underpinned by common (and evolving) approaches to data manipulation.
* The `install.packages("tidyverse")` command will install the core *tidyverse* packages
```{r,}
install.packages("tidyverse")
```
* Some additional tidyverse packages we'll use are `magrittr`, which adds operators (such as the pipe `%>%`) that make code more readable, and `lubridate`, for date-time manipulation.
```{r,}
install.packages("magrittr")
install.packages("lubridate")
```
* The `tidyverse` includes the powerful `ggplot2` library. For spatial data we'll install the `ggmap` package, and for interactive graphics the `plotly` package. We'll also install the `ggthemes` package, which provides additional plot themes.
```{r,}
install.packages("ggmap")
install.packages("plotly")
install.packages("ggthemes")
```
---
## Data
We're going to look at some data from the [ARCTAS](https://www-air.larc.nasa.gov/cgi-bin/arcstat-c) project. You'll need to download the [dataset](https://tinyurl.com/R-for-fire-data): navigate to `data -> arctas`, download the data, and save it to a **folder inside your project folder** (`project folder/arctas/`).
---
## Source file
To speed things up we'll use some code I've already written. Download the [scripts folder](https://tinyurl.com/R-for-fire-data) and save it in your **project folder**.
---
# R Markdown
## Resources
* R Studio's [r markdown website](http://rmarkdown.rstudio.com/lesson-1.html)
* The [r markdown cheatsheet](https://www.rstudio.com/wp-content/uploads/2016/03/rmarkdown-cheatsheet-2.0.pdf)
* The [r markdown reference](https://www.rstudio.com/wp-content/uploads/2015/03/rmarkdown-reference.pdf)
---
## Creating a markdown document
* Navigate to `File -> New File -> R Markdown -> OK`, making sure the `Document` and `HTML` options are selected. You'll also want to give your document a title.
* Save your **R Markdown** file in your project folder (`File -> Save`).
* Press the **Knit** button, what happens?
* Take a moment to relate the text in your markdown document to the output.
* Remove all the code from the **R Markdown** heading down.
* Use the `#` symbol to create a new section header **Load Data**
* Press the **Knit** button again.
You'll use this file to document your code in the next steps.
---
## Code chunks
You can add a *chunk* of code to your *markdown* document using the *insert button*.
* Add several carriage returns below your title and press `insert -> R`.
* Copy the code below into the body of the *chunk*.
* Update the file path to work with your `working directory`. The `getwd()` function will be useful.
* Use the green arrow in the top right of the chunk box to run the code.
```{r, cache=TRUE, eval=TRUE}
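library(tidyverse)
library(magrittr)
# load the workshop helper functions, including read_ict_file()
source("scripts/r_functions.R")
# read every *_R14.ict file in the folder and stack them into one data frame
arctas_data <- lapply(list.files("../data/arctas",
                                 pattern = "_R14.ict$",
                                 full.names = TRUE),
                      read_ict_file) %>%
  bind_rows() %>%
  # replace the flag values -999999999 and -888888888 with NA
  mutate_all(~ na_if(.x, -999999999)) %>%
  mutate_all(~ na_if(.x, -888888888))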
```
* Take a look at your **Environment** tab, what do you see?
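The two `mutate_all()` steps in the chunk above turn the flag values stored in the data files into `NA`. A minimal sketch on a small made-up table (`toy` and its columns are hypothetical, not taken from the ARCTAS files) shows the effect:

```r
library(dplyr)

# Hypothetical toy data using the same flag values as the .ict files
toy <- tibble(o3 = c(45.2, -999999999, 51.0),
              no = c(0.02, 0.03, -888888888))

# Replace each flag value with NA in every column
toy_clean <- toy %>%
  mutate_all(~ na_if(.x, -999999999)) %>%
  mutate_all(~ na_if(.x, -888888888))

toy_clean
```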
---
## Inline code
```{r, echo=FALSE, eval=TRUE}
sourcecode <- "\\`r code here\\`"
sourcecode2 <- "\\`r length(unique(arctas_data$flight))\\`"
```
You can write code that is evaluated inline with your text using `r sourcecode`. For example, we can count the number of flights in the dataset using `r sourcecode2`, which prints: ***There are `r length(unique(arctas_data$flight))` flights in the dataset***.
* Add a sentence to your markdown document that includes the number of flights in the data set.
---
## Appearance
The appearance of your **R Markdown** document can be formatted.
```
Surround text with:
*  to make it italic, e.g. *italic*
** to make it bold, e.g. **bold**
$  to add a LaTeX-style equation, e.g. $y = mx + b$
Use [text](url) to add a web link to your markdown document
```
* add some bold and italic text to your markdown document
* add a link to the class github page
---
## Tables
There are multiple packages for adding tables to *R Markdown* documents, for example **knitr**, whose `kable()` function formats a data frame as a table.
* Add the code chunk below to your document and change the caption to something meaningful.
```{r}
library(knitr)
kable(arctas_data[1:5,1:4], caption = "Table caption")
```
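`kable()` also accepts arguments such as `digits` (rounding of numeric columns) and `col.names` (header labels). A minimal sketch on a made-up table (`toy` and its values are hypothetical):

```r
library(knitr)

# Hypothetical toy data standing in for arctas_data
toy <- data.frame(flight = c(3, 3, 4),
                  utc = c(61200, 61210, 58000),
                  o3 = c(45.234, 47.118, 51.002))

# digits rounds numeric columns; col.names relabels the header row
tbl <- kable(toy, digits = 1,
             col.names = c("Flight", "UTC", "Ozone"),
             caption = "First rows of a toy flight table")
tbl
```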
---
# Data manipulation
---
## Resources
* The [tidyverse website](http://tidyverse.org)
* The [R for data science book](http://r4ds.had.co.nz)
* The [Advanced R book](http://adv-r.had.co.nz)
---
## Packages
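We'll use some of the **tidyverse** packages to manipulate our data.
You can load the core **tidyverse** packages with `library(tidyverse)`. The `magrittr` package is not attached by `library(tidyverse)`, so load it separately: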
```{r}
library(tidyverse)
library(magrittr)
```
Or you can load individual packages explicitly:
```{r}
library(dplyr)
library(tidyr)
library(purrr)
```
---
## Viewing data
* You can view data in **R Studio** using the `View()` function.
* You can use the `glimpse()` function to print a condensed summary of your data.
* Objects of class `tbl` (tibbles) print in a compact, readable form when you type their name in the console.
```
* Use the function class(arctas_data) to determine the class of the arctas_data object
* Use View(arctas_data) to display the data in R Studio
* Use glimpse(arctas_data) to look at your data in the console
```
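To see what a `tbl` looks like before working with the full dataset, here is a minimal sketch on a made-up tibble (`toy` is hypothetical):

```r
library(dplyr)

# Hypothetical toy data; tibble() creates an object of class tbl_df
toy <- tibble(flight = c(3, 3, 4), o3 = c(45.2, 47.1, 51.0))

class(toy)    # "tbl_df" "tbl" "data.frame"
glimpse(toy)  # one line per column: name, type and first values
toy           # tibbles print the first rows together with column types
```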
---
## Selecting data
Often you'll want to select a subset of your data. If your data is in a table you'll likely want to select by rows or columns. The `dplyr` package contains the tools you'll need to do this:
* the `filter()` function allows you to select rows based on logical criteria applied to the values they contain
For example, we can filter by column values:
```{r}
data_filtered <- arctas_data %>%
filter(!is.na(no)) %>%
filter(flight == 4) %>%
filter(no > quantile(no, 0.05), no < quantile(no, 0.95))
```
```
* what occurs at each filter step?
```
The `|` character is the logical `OR` operator e.g. `a == b | a == c`.
```
Use the OR operator to select flight numbers 7, 11 and 21 from the dataset
```
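As a hint, here is the OR operator on a made-up table (`toy` is hypothetical). The `%in%` operator is a shorter way to write a chain of `==` comparisons joined by `|`:

```r
library(dplyr)

toy <- tibble(flight = c(1, 2, 3, 4, 5), o3 = c(40, 42, 44, 46, 48))

# Keep rows where flight is 2 OR flight is 4
with_or <- toy %>% filter(flight == 2 | flight == 4)

# Equivalent: keep rows whose flight appears in the vector c(2, 4)
with_in <- toy %>% filter(flight %in% c(2, 4))
```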
* the `select()` function allows you to select columns by name, or by properties of their names (e.g. `starts_with()`)
For example:
```{r}
data_select <- arctas_data %>%
select(flight, utc, no, noy, o3, no2_ncar, no2_ucb)
```
```
* try selecting a different set of columns
```
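`select()` understands helper functions that match column names. A minimal sketch with hypothetical columns loosely modelled on the ARCTAS names:

```r
library(dplyr)

# Hypothetical single-row table with ARCTAS-like column names
toy <- tibble(flight = 1, utc = 100, no = 0.1, noy = 0.5,
              no2_ncar = 0.2, no2_ucb = 0.3, o3 = 45)

toy %>% select(starts_with("no2"))   # no2_ncar, no2_ucb
toy %>% select(contains("_"))        # columns whose names contain "_"
toy %>% select(-utc)                 # every column except utc
toy %>% select(o3, everything())     # move o3 to the front
```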
---
## Reshape data
Inevitably you will encounter data in inconvenient layouts. The `tidyr` package contains tools for reshaping your data.
* the `gather()` function converts data to a longer format
```
* Add the code chunk below to your markdown
```
```{r}
data_longer <- arctas_data %>%
select(flight, utc, no, noy, o3, no2_ncar, no2_ucb) %>%
gather("var", "val", 3:7)
```
```
* What is the %>% doing? Try typing ?"%>%" into the console.
* Use head(data_longer, 5) to look at the first 5 rows of the data object
```
* the `spread()` function converts data to a wider format
```
* Add the code chunk below to your markdown
```
```{r}
data_wider <- data_longer %>%
spread(var, val)
```
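A round trip on a tiny made-up table (`toy_wide` is hypothetical) makes the two reshaping steps easier to follow. Newer versions of tidyr (1.0.0 and later) recommend `pivot_longer()` and `pivot_wider()` over `gather()` and `spread()`; the last step shows the `pivot_longer()` equivalent:

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: one row per time, one column per species
toy_wide <- tibble(utc = c(1, 2), o3 = c(45, 46), no = c(0.1, 0.2))

# gather(): column names go into "var", values into "val"
toy_long <- toy_wide %>% gather("var", "val", o3, no)

# spread(): the reverse operation, back to one column per species
toy_back <- toy_long %>% spread(var, val)

# Newer tidyr equivalent of the gather() step
toy_long2 <- toy_wide %>%
  pivot_longer(c(o3, no), names_to = "var", values_to = "val")
```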
---
## Combine data
Often data sets need to be combined. The `dplyr` package contains several types of functions for combining data.
* Perhaps the simplest case is adding rows or columns to a data set. You can use `bind_rows()` to add new rows and `bind_cols()` to add columns.
For example, to combine data with the same column names:
```{r}
flight_3 <- arctas_data %>%
filter(flight == 3)
flight_4 <- arctas_data %>%
filter(flight == 4)
flight_3_4 <- bind_rows(flight_3, flight_4)
```
For example, to combine data with the same number of rows:
```{r}
data_n <- arctas_data %>%
select(starts_with("n"))
data_a <- arctas_data %>%
select(starts_with("a"))
data_n_a <- bind_cols(data_n, data_a)
```
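`bind_rows()` matches columns by name, so the inputs do not need identical columns. A minimal sketch with two hypothetical one-row flights, also showing the `.id` argument, which records where each row came from:

```r
library(dplyr)

# Hypothetical flights; flight b has an extra column (no)
a <- tibble(flight = 3, o3 = 45)
b <- tibble(flight = 4, o3 = 50, no = 0.1)

# Columns are matched by name; missing values are filled with NA
ab <- bind_rows(a, b)

# .id adds a column holding the list names of the inputs
ab_id <- bind_rows(list(f3 = a, f4 = b), .id = "source")
```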
* To join data by a common variable or variables use a `_join` function.
For example, to attach a note to each flight:
```{r}
notes <- tibble(flight = unique(arctas_data$flight)) %>%
  # one random letter per flight as a stand-in note
  mutate(note = sample(letters, n(), replace = TRUE))
data_with_note <- left_join(arctas_data, notes, by = "flight")
```
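The choice of join decides what happens to rows without a match. A minimal sketch on made-up tables (`flights` and `notes` here are hypothetical, separate from the objects above):

```r
library(dplyr)

flights <- tibble(flight = c(3, 3, 4, 5), o3 = c(45, 46, 50, 52))
notes <- tibble(flight = c(3, 4), note = c("smoke plume", "clean air"))

# left_join() keeps every row of flights; flight 5 has no note, so NA
joined <- left_join(flights, notes, by = "flight")

# inner_join() keeps only flights present in both tables
inner <- inner_join(flights, notes, by = "flight")
```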
---
## New variables
The `mutate` functions in the `dplyr` package can be used to create or modify columns in a data frame.
* The `mutate()` function creates a new variable (or modifies an existing one) from existing columns.
* The `mutate_all()` function can be used to operate on all columns in a data frame.
* The `mutate_if()` function can be used to operate on columns that meet certain criteria.
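The three `mutate` variants listed above can be compared on a small made-up table (`toy` and the `ratio` column are hypothetical). `mutate_all()` and `mutate_if()` accept a formula such as `~ .x * 2`, where `.x` stands for each column in turn:

```r
library(dplyr)

toy <- tibble(flight = c(3L, 4L), o3 = c(45.26, 50.74), no = c(0.12, 0.38))

# mutate(): add a new column computed from existing ones
toy1 <- toy %>% mutate(ratio = no / o3)

# mutate_all(): apply one function to every column
toy2 <- toy %>% mutate_all(~ .x * 2)

# mutate_if(): apply a function only to columns passing a test
# (here the double columns; the integer flight column is left alone)
toy3 <- toy %>% mutate_if(is.double, ~ round(.x, 1))
```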
---
## Grouping and summarizing data
The `group_by()` and `summarize()` functions are useful for data reduction. Often you'll want to calculate statistics by groups within your data.
* The `group_by()` function groups data by one or more variables. The data will look the same, but the results of subsequent operations on the data will be applied by groups.
* We can use `summarize()` to calculate statistics by group for a given column (variable)
* We can use `summarize_all()` to calculate a statistic for every non-grouping column in a data frame.
* The data can be ungrouped using the `ungroup()` function.
```
* group the data by flight and calculate the mean and standard deviation of the ozone mixing ratio
```
```{r}
data_summary_o3 <- arctas_data %>%
group_by(flight) %>%
summarise(mean_o3 = mean(o3, na.rm = TRUE),
sd_o3 = sd(o3, na.rm = TRUE))
```
```
* You can operate on all the columns
```
```{r}
data_summary_all <- arctas_data %>%
group_by(flight) %>%
  summarise_all(~ mean(.x, na.rm = TRUE))
```
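A small made-up example (`toy` is hypothetical) makes the grouped results easy to check by hand, and shows how `na.rm = TRUE` and `n()` treat missing values differently:

```r
library(dplyr)

toy <- tibble(flight = c(3, 3, 4, 4, 4),
              o3 = c(40, 50, 60, NA, 80))

toy_summary <- toy %>%
  group_by(flight) %>%
  summarise(n = n(),                           # rows per flight, NAs included
            mean_o3 = mean(o3, na.rm = TRUE),  # NAs dropped before averaging
            sd_o3 = sd(o3, na.rm = TRUE)) %>%
  ungroup()
```

Flight 3 gives a mean of 45 from two rows; flight 4 has three rows but its mean (70) uses only the two non-missing values.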
---
# Data visualization
We'll look at three related packages for visualizing data. The `ggplot2` package can be used to produce customized plots. The `ggmap` package extends `ggplot2` for use with spatial data. The `plotly` package can be used to create interactive graphics.
* include the data visualization libraries in your markdown document:
```{r}
library(ggplot2)
library(ggmap)
library(plotly)
library(tidyverse)
library(ggthemes)
```
---
## ggplot2
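Each `ggplot` graphic requires at least three elements:
* data
* aesthetic mappings (`aes()`), which link variables to visual properties such as position and colour
* at least one layer (a `geom_*()` function)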
Let's start with a time series plot.
```{r basics,}
ggplot(data = filter(arctas_data,
flight == 10,
!is.na(o3)),
mapping = aes(x = utc, y = o3)) +
geom_point()
```
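The three required elements map directly onto code. A minimal sketch with a made-up ozone series (`toy` is hypothetical); layers are added with `+` and drawn in the order they are added:

```r
library(ggplot2)

# Hypothetical time series standing in for one flight
toy <- data.frame(utc = 1:10,
                  o3 = c(40, 42, 45, 47, 50, 49, 48, 52, 55, 53))

p <- ggplot(data = toy,                        # 1. data
            mapping = aes(x = utc, y = o3)) +  # 2. aesthetic mappings
  geom_line() +                                # 3. layers (geoms)
  geom_point()

p
```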
Let's plot multiple pollutants:
* First convert the data to long format
```{r}
data_plot <- arctas_data %>%
filter(flight == 10,
!is.na(o3)) %>%
select(flight, utc, o3, no, no2_ncar, no3) %>%
gather("var", "val", 3:6)
```
* Now we can plot the data
```{r,}
ggplot(data_plot, aes(utc, val, color = var)) +
geom_point() +
scale_y_log10()
```
* You can control detailed aspects of your graphic's appearance using the `theme()` function. A quicker way to change the overall look is to add a complete theme:
```
* try applying the following themes to your plot:
```
```{r, eval = FALSE}
p <- ggplot(data_plot, aes(utc, val, color = var)) +
  geom_point() +
  scale_y_log10()
p + theme_economist()
p + theme_excel() # as long as you promise never to use it again
p + theme_tufte()
```
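Complete themes and `theme()` can be combined: start from a complete theme, then adjust individual elements. A minimal sketch on made-up data (`toy` is hypothetical); the order matters, because a complete theme added after `theme()` would reset those adjustments:

```r
library(ggplot2)

toy <- data.frame(utc = rep(1:5, 2),
                  val = c(1, 2, 3, 4, 5, 2, 3, 4, 5, 6),
                  var = rep(c("o3", "no"), each = 5))

p <- ggplot(toy, aes(utc, val, color = var)) +
  geom_point() +
  theme_minimal() +                            # start from a complete theme
  theme(legend.position = "bottom",            # then adjust single elements
        axis.title = element_text(size = 14),
        panel.grid.minor = element_blank())

p
```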
* It can be useful to split the data into small multiples (facets), here one panel per flight with `facet_wrap()`:
```{r, fig.width= 12, fig.height=20}
data_plot <- arctas_data %>%
select(flight, utc, o3) %>%
filter(!is.na(o3))
ggplot(data_plot, aes(utc, o3)) +
geom_point() +
facet_wrap(~flight, ncol = 3)
```
---
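## Spatial data (ggmap)
The `ggmap` package builds on `ggplot2` to map spatial data. Note that recent versions of `ggmap` require a Google Maps API key, registered with `register_google()`, before map tiles can be downloaded from Google.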
```{r}
library(ggmap)
library(ggplot2)
```
```
* start by plotting a map of your favorite location:
```
```{r, eval = FALSE}
qmap('favorite location', zoom = 13)
```
```
* now plot the arctas flight tracks on a world map
* prepare the data
* extract world map information
* use ggplot to create a plot object
* view the plot
```
* wrap the longitudes into the -180 to 180 degree range used by the world map data:
```{r}
data <- mutate(arctas_data,
longitude = ((longitude + 180) %% 360) - 180)
```
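The wrapping expression above relies on the modulo operator `%%`. Putting it in a small function (the name `wrap_longitude()` is ours, not part of any package) makes it easy to check on a few values:

```r
# Hypothetical helper wrapping the expression used in the chunk above:
# shift by 180, wrap into [0, 360) with %%, then shift back
wrap_longitude <- function(lon) {
  ((lon + 180) %% 360) - 180
}

wrap_longitude(c(0, 190, 360, -170, 180))
# -> 0 -170 0 -170 -180
```

Values already in range are unchanged, and 180 maps to -180, which is the same meridian.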
* a quick way to get world map outlines is `ggplot2`'s `borders()` function (it requires the `maps` package to be installed):
```{r}
map_world <- borders("world", colour="gray50", fill= "gray50")
```
* add flight track and plot
```{r,}
map <- ggplot() +
map_world +
geom_point(aes(x = longitude, y = latitude, colour = o3),
size = 1,
data = data) +
scale_colour_gradientn(colours = terrain.colors(10))
map
```
* The `ggmap` package contains tools for plotting spatial maps with greater control:
```
* extract a single flight
```
```{r}
data <- filter(arctas_data, flight == 8) %>%
mutate(longitude = ((longitude + 180) %% 360) - 180)
```
```
* calculate the required zoom
```
```{r}
zoom <- calc_zoom(longitude, latitude, data = data) - 1
```
```
* download map data
```
```{r}
map <- get_map(location = c(lon = mean(data$longitude, na.rm = TRUE),
                            lat = mean(data$latitude, na.rm = TRUE)),
zoom = zoom,
maptype = "satellite",
source = "google")
```
```
* combine the map with the flight track
```
```{r}
map <- ggmap(map) +
geom_path(aes(x = longitude, y = latitude, colour = o3),
size = 2,
data = data) +
  scale_colour_gradientn(colours = terrain.colors(10))
map
```
---
## Interactive maps
The [plotly](https://plot.ly/r/) package allows you to create interactive graphics.
```{r}
library(plotly)
```
It works directly with `ggplot` objects:
* use the `ggplotly()` function to convert a `ggplot` object to an interactive `plotly` graphic.
```{r}
ggplotly(map)
```