This repository has been archived by the owner on Aug 4, 2020. It is now read-only.

Typos and comments #87

Open
wants to merge 1 commit into
base: gh-pages
15 changes: 5 additions & 10 deletions 01-intro-to-R.Rmd
Original file line number Diff line number Diff line change
@@ -15,7 +15,7 @@ knitr::opts_chunk$set(results='hide', fig.path='img/r-lesson-')
>
> * Assign names to objects in R with <- and =.
> * Solve mathematical operations in R.
-> * Describe what a function is in R.
+> * Describe what a function in R is.
> * Describe what vectors are and how they can be manipulated in R.
> * Inspect the content of vectors in R and describe their content with class and str.

@@ -66,10 +66,8 @@ doesn't work. Now we're stuck over in the console. The
`+` sign means that it's still waiting for input, so we
can't type in a new command. To get out of this press the `Esc` key. This will work whenever you're stuck with that `+` sign.

-It's great that R is a glorified caluculator, but obviously
-we want to do more interesting things.
-
-To do useful and interesting things, we need to assign _values_ to
+It's great that R is a glorified calculator, but obviously
+we want to do more interesting things, e.g. we can assign _values_ to
_objects_. To create objects, we need to give it a name followed by the
assignment operator `<-` and the value we want to give it.

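As a minimal sketch of the assignment described in this hunk (the object names `x` and `y` are hypothetical, chosen only for illustration), creating an object and reusing it might look like this:

```r
# Assign the value 3 to an object named x (hypothetical name)
x <- 3

# Typing the object's name prints its value
x

# Objects can be used in arithmetic, and the result stored in a new object
y <- x * 2
y
```

The name goes on the left of `<-` and the value on the right; nothing is printed by the assignment itself, which is why the object name is typed again on its own line.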
@@ -97,10 +95,8 @@ be read as 3 **goes into** `x`. You can also use `=` or `->`for assignments but
all contexts so it is good practice to use `<-` for assignments. `=` should only
be used to specify the values of arguments in functions, see below.

-In RStudio, typing <kbd>Alt</kbd> + <kbd>-</kbd> (push <kbd>Alt</kbd> at the
--same time as the <kbd>-</kbd> key) will write ` <- ` in a single keystroke
-in a PC, while typing <kbd>Option</kbd> + <kbd>-</kbd> (push <kbd>Option</kbd> at the
-+same time as the <kbd>-</kbd> key) does the same in a Mac.
+In RStudio, typing <kbd>Alt</kbd> + <kbd>-</kbd> (push <kbd>Alt</kbd> at the same time as the <kbd>-</kbd> key) will write ` <- ` in a single keystroke
+in a PC, while typing <kbd>Option</kbd> + <kbd>-</kbd> (push <kbd>Option</kbd> at the same time as the <kbd>-</kbd> key) does the same in a Mac.

### Exercise

@@ -510,4 +506,3 @@ sessionInfo()
mailing lists.
* [How to ask for R help](http://blog.revolutionanalytics.com/2014/01/how-to-ask-for-r-help.html)
useful guidelines
-
5 changes: 3 additions & 2 deletions 02-starting-with-data.Rmd
@@ -73,7 +73,8 @@ Wow... that was a lot of output. At least it means the data loaded properly. Let
head(metadata)
```

-We've just done two very useful things.
+We've just done two very useful things:
+
1. We've read our data in to R, so now we can work with it in R
2. We've created a data frame (with the read.csv command) the
standard way R works with data.
@@ -172,7 +173,7 @@ When we read in a file, any column that contains text is automatically
assumed to be a factor. Once created, factors can only contain a pre-defined set values, known as
*levels*. By default, R always sorts *levels* in alphabetical order.

-For instance, we see that `cit` is a Factor w/ 3 levels, `minus`, `plus` and `unknown`.
+For instance, we see that `cit` is a Factor w/ 3 levels, `minus`, `plus` and `unknown`. The vector `3 3 3 3 3 3 ... ` indicates that the first samples have `unknown` (third level) citrate status.

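The relationship between levels and the integer codes shown by `str` can be sketched as follows. The values below are hypothetical stand-ins for the `cit` column, not the real data:

```r
# Hypothetical citrate-status values standing in for the `cit` column
cit <- factor(c("unknown", "unknown", "plus", "minus", "unknown"))

# Levels are sorted alphabetically by default
levels(cit)

# Each element is stored as an integer index into the levels,
# so "unknown" (the third level) is stored as 3
as.integer(cit)

# str() shows the number of levels, the levels, and the leading integer codes
str(cit)
```

Here `as.integer(cit)` returns `3 3 2 1 3`, because `minus`, `plus` and `unknown` are levels 1, 2 and 3 respectively.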
<!--

5 changes: 1 addition & 4 deletions 04-dplyr.Rmd
@@ -22,10 +22,7 @@ metadata <- read.csv("data/Ecoli_metadata.csv")

Bracket subsetting is handy, but it can be cumbersome and difficult to read, especially for complicated operations.

-Enter `dplyr`.
-
-`dplyr` is a package for
-making data manipulation easier.
+`dplyr` is a package for making data manipulation easier.

Packages in R are basically sets of additional functions that let you do more
stuff in R. The functions we've been using, like `str()`, come built into R;
11 changes: 5 additions & 6 deletions 05-data-visualization.Rmd
@@ -20,7 +20,7 @@ metadata <- read.csv('./data/Ecoli_metadata.csv')

The mathematician Richard Hamming once said, "The purpose of computing is insight, not numbers", and the best way to develop insight is often to visualize data. Visualization deserves an entire lecture (or course) of its own, but we can explore a few features of R's plotting packages.

-When we are working with large sets of numbers it can be useful to display that information graphically. R has a number of built-in tools for basic graph types such as hisotgrams, scatter plots, bar charts, boxplots and much [more](http://www.statmethods.net/graphs/). We'll test a few of these out here on the `genome_size` vector from our metadata.
+When we are working with large sets of numbers it can be useful to display that information graphically. R has a number of built-in tools for basic graph types such as histograms, scatter plots, bar charts, boxplots and much [more](https://www.statmethods.net/graphs/index.html). We'll test a few of these out here on the `genome_size` vector from our metadata.


```{r simplestats}
@@ -35,8 +35,7 @@ Let's start with a **scatterplot**. A scatter plot provides a graphical view of
plot(genome_size)
```

-Each point represents a clone and the value on the x-axis is the clone index in the file, where the values on the y-axis correspond to the genome size for the clone. For any plot you can customize many features of your graphs (fonts, colors, axes, titles) through [graphic options](http://www.statmethods.net/advgraphs/parameters.html)
-For example, we can change the shape of the data point using `pch`.
+Each point represents a clone and the value on the x-axis is the clone index in the file, where the values on the y-axis correspond to the genome size for the clone. For any plot you can customize many features of your graphs (fonts, colors, axes, titles) through [graphic options](http://www.statmethods.net/advgraphs/parameters.html). For example, we can change the shape of the data point using `pch`.

```{r, fig.align='center'}
plot(genome_size, pch=8)
@@ -68,7 +67,7 @@ Similar to the scatterplots above, we can pass in arguments to add in extras lik

```{r, fig.align='center'}
boxplot(genome_size ~ cit, metadata, col=c("pink","purple", "darkgrey"),
-main="Average expression differences between celltypes", ylab="Expression")
+main="Average expression differences between cell types", ylab="Expression")
```


@@ -93,8 +92,8 @@ ggplot(metadata)
Geometric objects are the actual marks we put on a plot. Examples include:

* points (`geom_point`, for scatter plots, dot plots, etc)
-* lines (`geom_line`, for time series, trend lines, etc)
* boxplot (`geom_boxplot`, for, well, boxplots!)
+* lines (`geom_line`, for time series, trend lines, etc)

A plot **must have at least one geom**; there is no upper limit. You can add a geom to a plot using the + operator

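Adding a geom with `+` can be sketched as below. This is an illustration only: the `toy` data frame and its values are made up to stand in for `metadata`, and the sketch assumes the `ggplot2` package is installed.

```r
library(ggplot2)

# Toy data frame standing in for `metadata`; the values are made up
toy <- data.frame(
  sample = 1:6,
  genome_size = c(4.62, 4.63, 4.60, 4.59, 4.66, 4.63),
  cit = c("plus", "plus", "minus", "minus", "unknown", "unknown")
)

# ggplot() sets up the data and the aesthetic mapping; `+` adds a geom layer
p_points <- ggplot(toy, aes(x = sample, y = genome_size)) + geom_point()

# A different geom applied to the same data
p_box <- ggplot(toy, aes(x = cit, y = genome_size)) + geom_boxplot()

print(p_points)
print(p_box)
```

Without a geom, `ggplot(toy, aes(...))` draws only an empty set of axes; the geom is what puts marks on the plot.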
@@ -199,4 +198,4 @@ dev.off()

Resources:
---------
-We have only scratched the surface here. To learn more, see the [ggplot2 reference site](http://docs.ggplot2.org/), and Winston Chang's excellent [Cookbook for R](http://wiki.stdout.org/rcookbook/Graphs/) site. Though slightly out of date, [ggplot2: Elegant Graphics for Data Anaysis](http://www.amazon.com/ggplot2-Elegant-Graphics-Data-Analysis/dp/0387981403) is still the definative book on this subject. Much of the material here was adpapted from [Introduction to R graphics with ggplot2 Tutorial at IQSS](http://tutorials.iq.harvard.edu/R/Rgraphics/Rgraphics.html).
+We have only scratched the surface here. To learn more, see the [ggplot2 reference site](http://ggplot2.tidyverse.org/), and Winston Chang's excellent [Cookbook for R](http://www.cookbook-r.com/Graphs/) site. Though slightly out of date, [ggplot2: Elegant Graphics for Data Anaysis](http://www.amazon.com/ggplot2-Elegant-Graphics-Data-Analysis/dp/0387981403) is still the definitive book on this subject. Much of the material here was adpapted from [Introduction to R graphics with ggplot2 Tutorial at IQSS](http://tutorials.iq.harvard.edu/R/Rgraphics/Rgraphics.html).
6 changes: 2 additions & 4 deletions index.Rmd
@@ -22,9 +22,7 @@ experience. These lessons can be taught in a day (~ 6 hours). They start with
some basic information about R syntax, the RStudio interface, and move through
how to import CSV files, the structure of data frames, how to deal with factors,
how to add/remove rows and columns, how to calculate summary statistics from a
-data frame, and a brief introduction to plotting. The last lesson demonstrates
-how to work with databases directly from R.
-
+data frame, and a brief introduction to plotting.

## Chapters

@@ -47,7 +45,7 @@ the data and install everything *before* working through this lesson.

### Data

-Data for the lesson is available [here](https://raw.githubusercontent.com/datacarpentry/R-genomics/gh-pages/data/Ecoli_metadata.csv).
+Data for the lesson are available [here](https://raw.githubusercontent.com/datacarpentry/R-genomics/gh-pages/data/Ecoli_metadata.csv).

We will download this file directly from R during the lessons when we need
it.