Commit

chapter 2
b-rodrigues committed Oct 25, 2023
1 parent b786928 commit f5fc51f
Showing 2 changed files with 98 additions and 0 deletions.
97 changes: 97 additions & 0 deletions 02-intro_R.qmd
@@ -0,0 +1,97 @@
# Introduction to R

<div style="text-align:center;">
```{r, echo = FALSE}
knitr::include_graphics("img/abacus.png")
```
</div>

By the end of this chapter, you will know how to read, write and explore (and
optionally visualise) data.

## Reading in data with R

Your first job is to actually get the following datasets into an R session.

First install the `{rio}` package (if you don't have it already), then download
the following datasets:

- [mtcars.csv](https://raw.githubusercontent.com/b-rodrigues/modern_R/master/datasets/mtcars.csv)
- [mtcars.dta](https://github.com/b-rodrigues/modern_R/raw/master/datasets/mtcars.dta)
- [mtcars.sas7bdat](https://github.com/b-rodrigues/modern_R/raw/master/datasets/mtcars.sas7bdat)
- [multi.xlsx](https://github.com/b-rodrigues/modern_R/raw/master/datasets/multi.xlsx)

Also download the following 4 `csv` files and put them in a directory called
`unemployment`:

- [unemp_2013.csv](https://raw.githubusercontent.com/b-rodrigues/modern_R/master/datasets/unemployment/unemp_2013.csv)
- [unemp_2014.csv](https://raw.githubusercontent.com/b-rodrigues/modern_R/master/datasets/unemployment/unemp_2014.csv)
- [unemp_2015.csv](https://raw.githubusercontent.com/b-rodrigues/modern_R/master/datasets/unemployment/unemp_2015.csv)
- [unemp_2016.csv](https://raw.githubusercontent.com/b-rodrigues/modern_R/master/datasets/unemployment/unemp_2016.csv)

Finally, download this one as well, but put it in a folder called `problem`:

- [mtcars.csv](https://raw.githubusercontent.com/b-rodrigues/modern_R/master/datasets/problems/mtcars.csv)

Then take a look at chapter 3 of my other book, [Modern R with the
{tidyverse}](https://b-rodrigues.github.io/modern_R/reading-and-writing-data.html),
and follow along. It will teach you how to import and export data.
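
The import and export workflow can be sketched as follows. This is only a
minimal illustration, not the code from the other book: it round-trips the
built-in `mtcars` data through temporary files instead of the downloaded
datasets, and the `unemployment_demo` folder, its file names and the toy
`id`/`value` columns are placeholders made up for the example (they are not
real unemployment figures).

```{r}
library(rio)

# Round-trip the built-in mtcars data through a temporary csv file.
csv_path <- file.path(tempdir(), "mtcars_roundtrip.csv")
export(mtcars, csv_path)        # the format is inferred from the extension
mtcars_csv <- import(csv_path)  # returns a data frame

# Simulate a directory holding one csv file per year (toy placeholder rows).
demo_dir <- file.path(tempdir(), "unemployment_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (year in 2013:2014) {
  toy <- data.frame(year = year, id = c("a", "b"), value = c(1, 2))
  export(toy, file.path(demo_dir, paste0("toy_", year, ".csv")))
}

# List every csv file in the directory and stack them into one data frame.
csv_files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
unemp <- import_list(csv_files, rbind = TRUE)
```

With the real files, you would point `import()` at the path of a downloaded
dataset and `list.files()` at your `unemployment` directory instead.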

`{rio}` is a wrapper around many other packages. You can keep using `{rio}`,
but it is also a good idea to know which packages it uses under the hood. For
this, you can take a look at this
[vignette](https://cran.r-project.org/web/packages/rio/vignettes/rio.html).
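
To make the "under the hood" idea concrete, here is a small sketch that reads
the same csv file once through `{rio}` and once through `data.table::fread()`,
the function `{rio}` relies on for csv files. The temporary file is an
assumption made for the example; the vignette remains the reference for which
package handles which format.

```{r}
library(rio)

# Write a small csv file to a temporary location.
csv_path <- tempfile(fileext = ".csv")
export(head(mtcars), csv_path)

# Read it back through the {rio} wrapper...
via_rio <- import(csv_path)

# ...and directly with the underlying {data.table} reader.
via_fread <- data.table::fread(csv_path, data.table = FALSE)

# Both objects should hold the same rows and columns.
dim(via_rio)
dim(via_fread)
```

Calling the underlying package directly is mostly useful when you need an
option that `{rio}` does not expose.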

If you need to import very large datasets (potentially several GBs), you might
want to look at packages like `{vroom}` ([this
benchmark](https://vroom.r-lib.org/articles/benchmarks.html#reading-delimited-files)
shows a 1.5 GB csv file getting imported in seconds by `{vroom}`). For even
larger files, take a look at [`{arrow}`](https://arrow.apache.org/docs/r/). This
package can efficiently read very large files in the `csv`, `json`, `parquet`
and `feather` formats.
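
As a rough sketch of what using these two packages looks like, the chunk below
reads a csv file with `{vroom}` and round-trips a parquet file with `{arrow}`.
It assumes both packages are installed and uses the small built-in `mtcars`
data written to temporary files, so it says nothing about speed. The chunk is
marked `eval = FALSE` so that the book still renders without these optional
packages.

```{r, eval = FALSE}
library(vroom)
library(arrow)

# Write the built-in mtcars data to a temporary csv file, then read it back.
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, row.names = FALSE)
cars_vroom <- vroom(csv_path, delim = ",")

# Same idea with the parquet format, written and read by {arrow}.
parquet_path <- tempfile(fileext = ".parquet")
write_parquet(mtcars, parquet_path)
cars_parquet <- read_parquet(parquet_path)
```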

## A little aside on pipes

Since R version 4.1, a forward pipe `|>` has been part of base R, so no package
is needed to use it. It lets you write this:

```{r}
4 |>
  sqrt()
```

Before R version 4.1, there was already a forward pipe, introduced with the
`{magrittr}` package (and automatically loaded by many other packages from the
*tidyverse*, like `{dplyr}`):

```{r}
library(dplyr) # attaching {dplyr} also makes `%>%` available
4 %>%
  sqrt()
```

Both expressions above are equivalent to `sqrt(4)`. You will see why this is
useful very soon. For now, just know this exists and try to get used to it.
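
A slightly longer chain hints at why this helps. The sketch below (the numbers
are arbitrary, chosen only for illustration) computes the same value twice,
once with nested calls and once with the native pipe:

```{r}
# Nested calls have to be read from the inside out.
nested <- round(mean(sqrt(c(1, 4, 9, 16))), 1)

# The piped version reads from top to bottom, one step per line.
piped <- c(1, 4, 9, 16) |>
  sqrt() |>
  mean() |>
  round(1)

identical(nested, piped)
```

Each step receives the result of the previous one as its first argument, so
`round(1)` here means "round the incoming value to one decimal place".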

## Exploring and cleaning data with R

Take a look at [chapter
4](https://b-rodrigues.github.io/modern_R/descriptive-statistics-and-data-manipulation.html#a-first-taste-of-data-manipulation-with-dplyr)
of my other book. Ideally you should study the entire chapter, but for our
purposes you should really focus on sections 4.3, 4.4, 4.5.3, 4.5.4,
(optionally 4.7) and 4.8.
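
To give a first idea of the kind of `{dplyr}` code that chapter works through,
here is a minimal sketch on the built-in `mtcars` data. The choice of verbs and
columns is mine, made for illustration only; it is not taken from the sections
listed above.

```{r}
library(dplyr)

mpg_by_cyl <- mtcars |>
  filter(am == 1) |>               # keep cars with a manual transmission
  group_by(cyl) |>                 # one group per number of cylinders
  summarise(mean_mpg = mean(mpg),  # average fuel efficiency per group
            n_cars = n())          # number of cars per group

mpg_by_cyl
```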


## Data visualization

We're not going to focus on visualization due to lack of time. If you need to
create graphs, read [chapter
5](https://b-rodrigues.github.io/modern_R/graphs.html).

## Further reading

[R for Data Science](https://r4ds.had.co.nz/)
1 change: 1 addition & 0 deletions _quarto.yml
@@ -13,6 +13,7 @@ book:
  downloads: [pdf, epub]
  chapters:
    - index.qmd
    - 02-intro_R.qmd
  page-navigation: true

bibliography: references.bib