---
title: "Data cleaning and raw data management"
output:
  html_document:
    toc: yes
    toc_float: yes
  html_notebook:
    toc: yes
    toc_float: yes
---
[<<BACK](https://remi-daigle.github.io/2017-CHONe-Data/)
# The reproducible workflow
My unofficial mantra (again):
!["Adam Savage via Pintrest"](https://s-media-cache-ak0.pinimg.com/564x/e9/f2/19/e9f219dce30f13670158832310b0e42c.jpg)
In [Data organization with spreadsheets](https://remi-daigle.github.io/2017-CHONe-Data/organization.nb.html) we learned how to write down your data, avoid common errors, and save your data so that you and others can read and understand it later.
Now let's learn how to write down **ALL** the steps needed to take you from your raw data to publication quality figures. That's the key to reproducibility: if every step is written down, then others can reproduce your findings! Below, you will learn how to clean your data in a reproducible way using OpenRefine and R.
> **Pro-tip**
>
> After spending all that time entering your raw data, **NEVER** change it again.
>
> - Do not edit the data
> - Do not edit the column headers
> - Do not remove 'outliers'
> - Do not do calculations directly on the raw data
>
> Store your data in a `raw_data` directory (folder) in your project directory, and never save/write over it!
I advise you to archive your data immediately upon collection to reduce the risk of data loss. [Zenodo](https://zenodo.org/) and [Figshare](https://figshare.com/) both offer different **free** options for archiving your raw data **permanently** and have some awesome options for embargoes, restricted access, or even private storage to suit your privacy needs. More detail can be found in the [Data Archiving & version controls](https://remi-daigle.github.io/2017-CHONe-Data/versioncontrol.nb.html) lesson.
[![](pictures/Branch.png)](https://twitter.com/TrevorABranch/status/466780827985002496)
By now I've probably said the words reproducible and reproducibility so often that they're starting to lose meaning. Trust me, this is important! (Or don't trust me and read ["A manifesto for reproducible science"](https://www.nature.com/articles/s41562-016-0021) and ["Reproducible Data Science with R"](https://www.r-bloggers.com/reproducible-data-science-with-r/).) By not 'hiding' your workflow, you help:
- **yourself** in the future since:
    - you can easily re-run your analysis after 'reviewer 3' makes you add one data point
    - you will be able to easily adapt your methods
    - you will receive more citations since someone may cite you for your methods/data
    - you will appear to be an honest/better scientist
- **other scientists** since:
    - they will be able to reuse/adapt your methods/data
    - they will be better able to judge your methods as appropriate (for your data during review, or their data in the future)
- **'Science'** since:
    - discoveries are published more efficiently
    - findings are 'correct' more often
    - there is increased trust in 'Science', which we badly need right now
# OpenRefine
OpenRefine (formerly Google Refine) is a powerful tool for working with messy data: cleaning it and transforming it from one format into another. It's effectively a reproducible way to work in a spreadsheet: no coding is required on your part, since it generates a script that details what you did to the data, step by step.
First, let's open OpenRefine. You'll notice it opens in the browser, but it's running locally (it does not require an internet connection).
![](pictures/OR.png)
The first step is to load your data and create a project. If you haven't done so already, please download the [zip file](https://github.com/remi-daigle/2017-CHONe-Data/archive/gh-pages.zip) of the entire GitHub [project repository](https://github.com/remi-daigle/2017-CHONe-Data/) (it contains the data files and all the R scripts used to make this very website!) and extract it somewhere convenient.
In OpenRefine, click `Choose Files` and find `larval abundance.csv`, which is in the `rawdata` directory of the project folder. Then click on the `Create Project` button (you may want to rename the project).
## Data Cleaning
You are now working on a copy of the raw data, and changes you make in OpenRefine will not 'break' your original raw data. In here you can do all the regular 'spreadsheet-y' things: you can edit specific cells, sort, undo/redo, view subsets of your data (facet), etc. But the more powerful functions of OpenRefine are:
- **Cluster** (click on column header arrow, then `Edit cells > Cluster and Edit...`) which means “finding groups of different values that might be alternative representations of the same thing”. For example, the two strings “New York” and “new york” are very likely to refer to the same concept and just have capitalization differences.
- **Whitespace management** (click on the column header arrow, then `Edit cells > Common transforms > Trim leading and trailing whitespace.` and `Edit cells > Common transforms > Collapse consecutive whitespace.`) Strings with spaces at the beginning or end are particularly hard for us humans to tell apart from strings without, but the blank characters will make a difference to the computer. We usually want to remove these (a quick R preview follows the screenshot below).
![](pictures/ORwhitespace.png)
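For comparison, the same whitespace cleanup can be done in R, which we cover below. This is just a minimal sketch on a made-up vector, using the base R functions `trimws()` and `gsub()`; it is not part of the OpenRefine workflow itself.
```{r}
# a made-up character vector with messy whitespace (illustration only)
sites <- c("  St. Margarets Bay", "St. Margarets  Bay ")
# trim leading and trailing whitespace
sites <- trimws(sites)
# collapse consecutive whitespace into a single space
sites <- gsub(pattern = " +", replacement = " ", x = sites)
sites
```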
## Reproducibility
OpenRefine saves every change, every edit you make to the dataset, in a file you can save on your machine. If you had 20 files to clean, and they all had the same type of errors and the same columns, you could save the script, open a new file to clean, paste in the script, and run it. Voilà, clean data.
- In the `Undo / Redo` section, click `Extract`, then select the steps you want to keep using the check boxes.
- Copy the code and paste it into a text editor. Save it as a `.txt` file.
- To run these steps on a new dataset, import it into OpenRefine, open the `Extract / Apply` section, paste in the contents of the `.txt` file, and click `Apply`.
For more information and tutorials on OpenRefine, please see [Data Carpentry](http://www.datacarpentry.org/OpenRefine-ecology-lesson/).
# R and RStudio
You can do everything mentioned above in [R](http://cran.utstat.utoronto.ca/). It may at first appear more difficult to do in R, but in my opinion you will save time by streamlining your workflow using just one tool. I'm not at all discouraging the use of OpenRefine; it is open source and reproducible, so much kudos is due.
For this and future lessons, we will focus on achieving reproducibility by using [R](http://cran.utstat.utoronto.ca/), which is an open source language and environment for statistical computing and graphics. There are many other such languages (e.g. [Python](https://www.python.org/), [Julia](https://julialang.org/), [MATLAB](https://www.mathworks.com/products/matlab.html), etc.) used by conservationists, biologists, and oceanographers; however, we believe R is currently the most widely adopted among our colleagues and also has the most convenient set of statistical tools developed for our field.
## Intro to R
In "the olden days" we ~~had to walk to school uphill both ways~~ used R in the terminal or using the built in graphical use interface (GUI). Yes, before 2011, RStudio did not exist and yes, R and RStudio are not the same thing!
R is accessible in the terminal (that thing that looks like DOS, and in case my 'old' is showing, the thing that is usually a black screen with a blinking cursor where you can only type in commands) by typing `R.exe` on Windows, or just `R` on Mac or Linux. In this way, you can type commands in one by one, or, similar to what we just saw with OpenRefine, you can save your steps/instructions/commands in a plain text file (with a `.R` extension instead of `.txt`) and run those in the terminal by typing `Rscript.exe scriptname.R` on Windows, or just `Rscript scriptname.R` on Mac or Linux. I don't often work in this way anymore, but it is the only option when using [Compute Canada](https://www.computecanada.ca/research-portal/national-services/compute/)'s awesome resources. Using the commands above and a little server-specific magic, you can run your scripts on hundreds of processors instead of the one lonely processor on your computer! I've used hundreds of years of computer time in a matter of weeks, all for free! If you are affiliated with any Canadian university, you can do this too!
!["This is what 'plain vanilla' R looks like"](pictures/Rterm.png)
R also comes with its own GUI, in which you have a script editor (essentially a plain text editor) to write and develop your script, and an interactive R console where you can actually execute commands. The advantage of the GUI is that you can execute the entire script ('source') or run it line by line, all while recording your commands in the script file.
![](pictures/Rgui.png)
## Intro to RStudio
[RStudio](https://www.rstudio.com/products/RStudio/#Desktop) takes this GUI concept a bit further and provides you with several extra support windows. If the idea of having windows for your environment, your files, your plots, as well as packages and a help tab all at hand does not excite you, hold on tight, you'll get there.
![](pictures/RStudio.png)
There's also a lot more information about RStudio on their [cheatsheet](https://www.rstudio.com/wp-content/uploads/2016/01/rstudio-IDE-cheatsheet.pdf). On the subject of cheatsheets, RStudio has developed several **super useful** [cheatsheets](https://www.rstudio.com/resources/cheatsheets/); seriously, you probably will want to print most of these and put them on the wall in your office.
## Enough talk! Let's get coding! The Fundamentals
You read my mind! However, before we get to cleaning the data, we need to cover a few R fundamentals so that what we do in later steps makes sense.
Go back to the project folder you downloaded during the [OpenRefine lesson](https://remi-daigle.github.io/2017-CHONe-Data/cleaning.html#openrefine) and open the `2017-CHONe-Data.Rproj` file. This is an R project file that allows you to set a number of options for the project (see [here](https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects)), but for our purposes just know that the project file sets the 'working directory'; it tells R where all your files are. We'll get back to that later.
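If you want to confirm where the project file has set the working directory, you can ask R directly; the path printed will of course be different on your machine.
```{r, eval=FALSE}
# print the current working directory; with the .Rproj file open,
# this should be the project folder that contains the 'rawdata' directory
getwd()
# list the files and folders R can see in the working directory
list.files()
```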
In R you can do math; type the command below into your R console and hit Enter:
```{r}
1+1
```
You can also assign values to variables, or in R parlance 'objects', using the `<-` symbol (keyboard shortcut `Alt` + `-`).
```{r}
a <- 2
b <- 1+2
```
You'll notice that there was no output this time. That's because the value on the right side of the `<-` symbol is assigned to an object; it goes to your environment (top right window) instead of being output to the console. In the `1+1` example above, there was no object to assign to, so it defaulted to printing in the console.
You can see the contents of an object in the environment window, or by typing the object's name into the console. You can also use these objects like in algebra:
```{r}
a
b
a/b
```
Up to now we've been dealing with numbers, but R can also deal with character strings if they are surrounded by single or double quotation marks. According to the R help (I learned this today!): "Single and double quotes delimit character constants. They can be used interchangeably but double quotes are preferred (and character constants are printed using double quotes), so single quotes are normally only used to delimit character constants containing double quotes."
```{r}
f <- "This is a character string, you can tell because of the quotation marks"
```
An object can also contain multiple values; this is called a vector. The `:` symbol essentially means 'to'.
```{r}
x <- 1:3
```
Another way to do that, with more flexibility, is to use `c()`; the `c` is short for concatenate, and the round brackets indicate that it's a function. This concatenate function will combine all the 'arguments' (the things inside the round brackets), which are separated by commas. You can also combine these strategies:
```{r}
x <- c(1,3,5)
y <- c(1:4,6,8)
```
There are many functions, but they all follow the format `functionName(argument1,argument2,argument3,...)` where the 'arguments' are the input to the function. Some are fairly straightforward:
```{r}
mean(y)
```
But even then there are some surprises; let's look at the help file for `mean()`. To do that you can:
- if you are on the active line in the console or anywhere in a script, put your cursor on the function and press `F1`
- in the console, type `?mean` (or `??mean` if you're not so sure `mean` is the name of the function)
- find the help window (one of the tabs for the bottom right window) and use the search bar
- also, when all else fails, Google is your friend!
Any method should get you to something like this:
![](pictures/help.png)
In most cases in R, you could use `=` instead of the `<-` symbol with no problems when assigning a value to an object. However, it is best practice to use `<-` when assigning environment objects and `=` when defining function arguments. Oh, and `NA` in R means 'Not Available' / missing value. Like so:
```{r}
x <- c(1,2,5,7,88,3,4,2,4,6,7,NA)
mean(x)
mean(x, na.rm = TRUE)
```
I also snuck a `TRUE` in there; `TRUE` and `FALSE` are called logicals and are distinct from numerics or character strings. They are sometimes used as argument values, but they can also be used to test things. The `==` asks if both sides are equal (since the single `=` is already used for other things), and the `!=` asks if both sides are not equal.
```{r}
2==1
2==2
2!=1
```
## Creating your own scripts
Up until now, we've been playing in the console, which means the 'instructions' we need to save to reproduce our science are lost (well, not really; they can be retrieved from the History tab in the top right, or from the console if it hasn't rolled off the screen). It is a good idea to develop your analysis using a script file (those simple text files with the `.R` extension I was talking about earlier) because you can save your code easily.
To create a new script, you can click on the little paper with the plus symbol (see below), or you can hit `Ctrl`+`Shift`+`N` (Windows) or `Command`+`Shift`+`N` (Mac), and if that's not enough options, you can click `File > New File > R Script`.
![](pictures/newscript.png)
These scripts are designed to be read by R from top to bottom when you hit the ![](pictures/source.png) button, or `Ctrl`+`Shift`+`S` (Windows), or `Command`+`Shift`+`S` (Mac). Alternatively, you can run portions of your code with `Ctrl`+`Enter` (Windows), or `Command`+`Enter` (Mac), by either putting your cursor on a line to run the entire line, or highlighting a subsection of code to run just that portion. This will allow us to build multi-step data processing and analysis scripts.
>**Pro-tip**
>If you don't want R to read something, use the `#`. Anything that is preceded by a `#` is regarded as a 'comment' by R, and it does not try to execute those lines (i.e. R ignores anything after a `#`).
> This is also useful if you want to avoid running a few lines of code when you are developing your script. Instead of typing a `#` in front of each line of code, you can highlight the lines you want commented out and hit `Ctrl`+`Shift`+`C` (Windows), or `Command`+`Shift`+`C` (Mac). Magic!
Commenting is super useful for including human-readable instructions/documentation in your code. Let's give this a try: write this chunk of code into your script, then run it line by line.
```{r}
x <- 1
x <- 2
# x <- 3
```
What is the value of `x` after running all the lines and why?
## Indexing and dimensions
We already mentioned that we can have multiple values in an object, but so far only in one dimension. Using `matrix()` (2D) and `array()` (>2D) we can store numbers in multiple dimensions.
```{r}
# make a matrix
x <- matrix(data = c(1,1,2,5,3,4), nrow = 2, ncol = 3)
x
# make an array
y <- array(data = c(1:8), dim = c(2,2,2))
y
```
>**Exercise**
>
> Try storing numeric and character strings in a matrix (or an array). What happens to the numerics?
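Once you have given the exercise a try, here is a minimal sketch you can run to check your answer: because a matrix can only hold a single type, the numerics are coerced to character strings.
```{r}
# mixing numerics and character strings in a matrix
m <- matrix(data = c(1, 2, "three", "four"), nrow = 2, ncol = 2)
m
# note the quotation marks: the numbers are now character strings
str(m)
```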
It's great that we can store this data in multiple dimensions, but how do we get it back? That's what `[]` is for!
```{r}
# in 1D
x <- c(1,2,3,4,6,34,2,1,5,6,7)
# if we want the 7th value
x[7]
# if we want the 2nd and 7th value
x[c(2,7)]
# if we want only values greater than 10
x[x>10]
# whoa, that blew my mind! How did that work?
# well indexing works by giving the numeric index, or a logical vector the length of the vector we're working with
# so x>10 produces a logical vector the length of x
x>10
# in 2D
x <- matrix(data = c(1,1,2,5,3,4), nrow = 2, ncol = 3)
# this works similarly, but you need to provide 2 numbers (or 2 vectors of numbers): one for the row number(s) and one for the column number(s)
x[1,3]
x[c(1,2),1]
# if you leave one dimension blank, the whole row or column will be returned
x[,1]
x[2,]
# in 3D, add another dimension!
x <- array(data = c(1:24), dim = c(2,3,4))
x[1,2,3]
x[,,3]
```
Data frames (`data.frame()`) are a special type of matrix that can hold numerics and character strings (and logicals, factors, geometries, etc.) without having to convert them all to a single type. A data frame is effectively a collection of vectors of the same length.
```{r}
# make a data frame
x <- data.frame(nums = c(1,2,3),
chars = c("one","two","three"),
logis = c(TRUE, FALSE, TRUE))
# print the whole thing to the console
x
# or for a very big data frame, you may want to use head to see the first 6 rows
head(x)
# use the str function to see its structure
str(x)
```
>**Pro-tip**
>
> Did you notice that the structure of chars was `Factor` and not `chr`? Factors are a special type of character string vector that conserves information about the vector's 'levels' (and you can also order those levels). Many functions, such as `data.frame()`, have an argument called `stringsAsFactors`, which is usually set to `TRUE` by default; you may want to set this to `FALSE`. Often these are interchangeable, but I have had **many frustrating errors** because I mistakenly had a character string where I needed a factor, and vice versa. Be aware that they are different, and that not knowing which one you have can lead to errors. You can easily convert back and forth with `as.factor()` (or `as.ordered()`) and `as.character()`.
```{r}
# let's try having both strings and factors in the data frame
# make a data frame
x <- data.frame(nums = c(1,2,3),
chars = c("one","two","three"),
facts = as.factor(c("one","two","three")),
logis = c(TRUE, FALSE, TRUE),
stringsAsFactors = FALSE)
# print the whole thing to the console
x
# or for a very big data frame, you may want to use head to see the first 6 rows
head(x)
# use the str function to see its structure
str(x)
```
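As mentioned in the pro-tip above, you can convert back and forth between factors and character strings; here is a minimal sketch using `as.character()` and `as.factor()` on the data frame `x` we just made.
```{r}
# convert the factor column to a character string vector
as.character(x$facts)
# convert the character string column to a factor (note the 'Levels' in the output)
as.factor(x$chars)
```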
Indexing in data frames can be done the same way as for a 2D matrix, or you can use `$` to access columns by name:
```{r}
# make a data frame
x <- data.frame(nums = c(1,2,3),
chars = c("one","two","three"),
facts = as.factor(c("one","two","three")),
logis = c(TRUE, FALSE, TRUE),
stringsAsFactors = FALSE)
x[1,2]
x[,2]
x$chars
# and you can also treat these columns as 1D vectors
x$chars[1]
```
## Reading in data
That's all great, but our data is stored in a `.csv` file; how do we get it into R? The simplest way is to use the `read.csv()` function. But we need to know where on the computer the file is stored.
If you haven't done so already, download the [zip file](https://github.com/remi-daigle/2017-CHONe-Data/archive/gh-pages.zip) of the entire GitHub [project repository](https://github.com/remi-daigle/2017-CHONe-Data/), extract it somewhere convenient, and open the `2017-CHONe-Data.Rproj` file (as in the [OpenRefine lesson](https://remi-daigle.github.io/2017-CHONe-Data/cleaning.html#openrefine)). Remember, the project file sets the 'working directory', which tells R where all your files are.
```{r}
# you can hard code the whole file path, but try to never do that!
# your computer may not have a C drive, and your name almost certainly is not Remi, so this will not work for you
# larvalAbundance <- read.csv("C:/Users/Remi-Work/Desktop/2017-CHONe-Data/rawdata/larval abundance.csv")
# The above is unsurprisingly not reproducible! Always use relative paths!
# Relative paths are a shortened version of the above, you only need to type what comes after the project directory
# REMINDER: The project directory is where the .Rproj file is stored
larvalAbundance <- read.csv("rawdata/larval abundance.csv", stringsAsFactors = FALSE)
# writing it this way (relative path) means that your project is reproducible since you can move this whole directory to another location on your computer OR ANY OTHER COMPUTER!!!
```
This data comes from one of Remi's PhD thesis chapters[^1] that was part of the first CHONe. I modified it from the original version archived on [Dryad](http://datadryad.org/handle/10255/dryad.59482) so we would have cleaning to do!
[^1]: Daigle RM, Metaxas A, deYoung B (2014) Bay-scale patterns in the distribution, aggregation and spatial variability of larvae of benthic invertebrates. Marine Ecology Progress Series 503:139-156. http://dx.doi.org/10.3354/meps10734
## Cleaning data in R
Common errors in data are:
- trailing or leading white spaces in character strings
- data entered in different formats (e.g. numbers and character strings in one column)
- data entered with wrong units
The first step I always take is to make sure that the data loaded in as expected. Head over to the Environment window and click on `larvalAbundance`. You can do (temporary) sorting and filtering of the data in the data viewer. I also use the `str()` function to make sure all the variables (columns) are of the correct type.
```{r}
str(larvalAbundance)
```
If we wanted to make a correction manually, we can use indexing:
```{r}
# say for example, I knew that the second observation was in fact taken at 12 m depth
larvalAbundance$depth[2]
# the value in the data frame is indeed 3, which is WRONG! Let's correct it
larvalAbundance$depth[2] <- 12
head(larvalAbundance)
# The other issue you may notice is that my month values are mostly numbers, but there's at least one "August" in there
# We could correct just the one we see in row 4
larvalAbundance$month[4] <- 8
# Or we could get rid of all the "August"'s in one pass
larvalAbundance$month[larvalAbundance$month=="August"] <- 8
# but that column is still a character string, let's convert it to numeric
larvalAbundance$month <- as.numeric(larvalAbundance$month)
str(larvalAbundance)
```
If you were wondering how many years I had sampled, you could do:
```{r}
unique(larvalAbundance$year)
# or for time
unique(larvalAbundance$time)
# as you can see, there are a few trailing white spaces; to do text substitutions, let's use gsub()
# but first let's see how this works; it matches a pattern in x and replaces it.
gsub(pattern = "ABC",replacement = "XYZ",x = "TUVWABC")
gsub(pattern = "doesn't work",replacement = "works",x = "If this sentence no longer contains the pattern, then gsub doesn't work")
# So now, let's use gsub to remove those spaces. The pattern we are matching is just a space and the replacement is nothing, so that will remove the white spaces
larvalAbundance$time <- gsub(pattern = " ",replacement = "",x = larvalAbundance$time)
```
Feel the power! The `gsub()` function is very powerful, and the pattern matching works based on 'regular expressions' (a nearly universal pattern matching language/protocol). For example, if you had spaces you wanted to keep and only wanted to remove white spaces at the end, you could use `pattern = " +$"`, since the dollar sign in regex means 'ends with' and the plus means 'one or more', so gsub would match one or more spaces at the end of a character string. You can practice your 'regex' with [regexpal](http://www.regexpal.com/) and this [cheatsheet](http://www.rexegg.com/regex-quickstart.html).
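To illustrate, here is a minimal sketch of that trailing-whitespace pattern on a made-up string; only the spaces at the end are removed, and the spaces between words are kept.
```{r}
# " +$" matches one or more spaces at the end of the string
gsub(pattern = " +$", replacement = "", x = "keep the spaces between words, drop the ones at the end   ")
```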
Other useful things you should check are the minimum and maximum values for each column, to make sure things were all entered in the same order of magnitude, or even a quick histogram:
```{r}
min(larvalAbundance$Margarites.spp.)
max(larvalAbundance$Margarites.spp.)
hist(larvalAbundance$Margarites.spp.)
```
Lastly, there are some column names that are not 'up to code', so to avoid a stern talking to from CHONe's data manager, let's fix that now! (Also, you may have noticed that latitude and longitude were swapped!)
```{r}
names(larvalAbundance)
names(larvalAbundance)[names(larvalAbundance)=="long"] <- "decimalLatitude"
names(larvalAbundance)[names(larvalAbundance)=="lat"] <- "decimalLongitude"
names(larvalAbundance)[names(larvalAbundance)=="site"] <- "locationID"
```
Everything now seems reasonable to me, but it is your responsibility to check that each column of your data 'makes sense'. For the sake of time, let's move on!
Now we have a choice: we can either run a data cleaning script every time we load the raw data, or we can save a 'clean' data product. Let's do the latter.
```{r}
# let's create a new folder for intermediate data products
dir.create("data")
# Then let's save the cleaned data in that folder
write.csv(larvalAbundance, file = "data/larvalAbundanceClean.csv", row.names = FALSE)
```
>**Pro-tip**
>
> Notice that:
> - we are not writing over the raw data
> - we are not writing in the same folder as the raw data
> - we are naming our new data file informatively
## Making your own functions and packages
Part of R's awesomeness is that it already comes with a lot of functions that are very useful for everyday science. Additionally, there are many `packages`, which are essentially collections of new functions and help files generated by users like you and me that add to the already broad functionality of R.
*Warning: shameless self promotion below!*
Here is a package I created called [BESTMPA](https://github.com/remi-daigle/BESTMPA) and the peer-reviewed paper describing it: [An adaptable toolkit to assess commercial fishery costs and benefits related to marine protected area network design](https://f1000research.com/articles/4-1234/v2)
Making your own function follows the format below; let's make one called `custommean()`:
```{r}
custommean <- function(x){
m <- sum(x)/length(x)
return(m)
}
# so we defined custommean as a function with one argument, 'x', and it does what is inside the curly brackets
# it will return m, which is the mean of x
x <- c(1,2,3,6)
# does it work?
custommean(x = x)
```
If you make a few of those and write some help files to go along with them, you can make your own package. If you're interested, see the "[R packages](http://r-pkgs.had.co.nz/)" book by Hadley Wickham (Chief Scientist at RStudio, and not the last time I will mention him), but making packages is beyond the scope of what I can cover in this workshop.
Anyway, there is a central organization called the 'Comprehensive R Archive Network' or CRAN which houses all the official packages, but there are also other packages (like mine), as well as the development versions of many of the official packages, on GitHub that are worth taking a look at.
## Install packages for tomorrow
To install packages, you can either use the Packages window at the bottom right, or you can do it with written commands (my preference). Here are a few we will use tomorrow; try installing them now and let us know if you get any errors.
```{r, eval=FALSE}
install.packages('tidyverse') # The tidyverse is a collection of R packages that share common philosophies and are designed to work together. (e.g. ggplot2, dplyr, tidyr)
install.packages('marmap') # Import xyz data from the NOAA (National Oceanic and Atmospheric Administration, <http://www.noaa.gov>), GEBCO (General Bathymetric Chart of the Oceans, <http://www.gebco.net>) and other sources, plot xyz data to prepare publication-ready figures
install.packages('raster') # Reading, writing, manipulating, analyzing and modeling of gridded spatial data (and also getting access to GADM basemaps)
install.packages('devtools') # Allows you to install packages from github
# the 'robis' package on CRAN does not work with the latest version of R (yet), so we need to get the latest version from github
devtools::install_github("iobis/robis")
# you already have the CRAN version of ggplot2, but that version is not compatible with the sf package yet
# So, we need the github version of ggplot2 as well!
devtools::install_github("tidyverse/ggplot2")
install.packages('gridExtra') # to be able to arrange multiple ggplot plots
install.packages('taxize') # To extract and validate species taxonomy
install.packages('rfishbase') # To access resources available on Fishbase and SeaLifeBase
install.packages('rglobi') # To access interactions data
devtools::install_github("ropensci/rnoaa") # To access environmental data from the NOAA databases
install.packages('knitr')
install.packages('biomod2') # to perform species distribution models
install.packages('igraph') # to produce network plots
install.packages('networkD3') # to produce html (a.k.a. interactive) network plots!
devtools::install_github("guiblanchet/HMSC") # package to perform hierarchical modeling of species communities
install.packages('coda') # package to summarize and plot outputs from Markov Chains
install.packages('corrplot') # visualization of correlation matrices
install.packages('circlize') # circular visualization of data
install.packages('ModelMetrics') # collection of metrics coded for efficiency in C++ using Rcpp
install.packages("pdftools") # to extract pdf content
install.packages('stringr') # Simple, Consistent Wrappers for Common String Operations
install.packages('tidytext') # text analysis package
install.packages('viridis') # Port of the new 'matplotlib' color maps
install.packages('tibble') # Simple Data Frames
devtools::install_github("dgrtwo/widyr") # Widen, process, and re-tidy a dataset
install.packages('ggraph') # An Implementation of Grammar of Graphics for Graphs and Networks
install.packages('wordcloud2') # wordle generator
install.packages('leaflet') # Create and customize interactive maps
install.packages('mapview') # Interactive Viewing of Spatial Objects in R
install.packages('scales') # Graphical scales map data to aesthetics
# to gain access to the functions in a package, you can use library(), eg:
library(tidyverse)
```
## Keeping R up to date
Every so often, R releases a new update. Many of you had install issues last night that were resolved by installing the newest version of R. To keep things running smoothly in the future, here is a great trick:
```{r, eval=FALSE}
# install the package called installr
install.packages("installr")
# load the library
library(installr)
# run the updater function (this is best done outside RStudio, in the R GUI)
updater()
```
This will prompt you to install the latest version of R and copy over all of your packages if you choose to do so (the alternative is installing them all by hand again), and it can also update your packages for you. It's really a great time saver!
[<<BACK](https://remi-daigle.github.io/2017-CHONe-Data/)