Skip to content

Commit

Permalink
day 30
Browse files Browse the repository at this point in the history
  • Loading branch information
Nics-Github committed Nov 11, 2024
1 parent de6efdc commit 3f25435
Show file tree
Hide file tree
Showing 26 changed files with 3,976 additions and 943 deletions.
7 changes: 2 additions & 5 deletions _freeze/site_libs/revealjs/dist/theme/quarto.css

Large diffs are not rendered by default.

13 changes: 7 additions & 6 deletions chapter 7 examples.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -45,12 +45,13 @@ library(tidyverse)
library(mosaicData)
my_func <- function(data,variable2){
my_plot <- data |>
ggplot(aes(x = cesd))+
aes_string(y = variable2)+
data |>
ggplot(aes(x = cesd, y = .data[[variable2]]))+
geom_point()
}
my_func(HELPrct, age)
numeric_names_only <- HELPrct |>
select("age":"d1")|>
names()
Expand All @@ -69,12 +70,12 @@ library(tidyverse)
library(mosaicData)
my_func <- function(data,variable2){
my_plot <- data |>
ggplot(aes(x = cesd, y = {{variable2}}))+
data |>
ggplot(aes(x = cesd, y = variable2 ))+
geom_point()
}
my_func(HELPrct, age)
```

Expand Down
16 changes: 16 additions & 0 deletions chapter 7 examples_files/execute-results/html.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
{
"hash": "037e0265372063b7a6475c0e653169e0",
"result": {
"markdown": "---\ntitle: \"map() practice\"\nauthor: \"Schwab\"\neditor: visual\n---\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──\n✔ dplyr 1.1.4 ✔ readr 2.1.5\n✔ forcats 1.0.0 ✔ stringr 1.5.1\n✔ ggplot2 3.5.1 ✔ tibble 3.2.1\n✔ lubridate 1.9.3 ✔ tidyr 1.3.1\n✔ purrr 1.0.2 \n── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──\n✖ dplyr::filter() masks stats::filter()\n✖ dplyr::lag() masks stats::lag()\nℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors\n```\n:::\n\n```{.r .cell-code}\nlibrary(mosaicData)\n```\n:::\n\n\nDoing problems 1 and 3 from [here](https://mdsr-book.github.io/mdsr3e/07-iteration.html#iteration-exercises)\n\n**Problem 1 (Easy)**: Use the `HELPrct` data from the `mosaicData` to calculate the mean of all numeric variables (be sure to exclude missing values).\\\n\n\n::: {.cell}\n\n```{.r .cell-code}\nHELPrct |>\n summarise(\n across(.cols = where(is.numeric),\n .fns = ~mean(. , na.rm = TRUE))\n )\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n age anysubstatus cesd d1 daysanysub dayslink drugrisk e2b\n1 35.65342 0.7723577 32.84768 3.059603 75.30738 255.6056 1.887168 2.504673\n female i1 i2 id indtot linkstatus mcs pcs\n1 0.2362031 17.90728 24.54746 233.4018 35.72848 0.3781903 31.67668 48.04854\n pss_fr sexrisk avg_drinks max_drinks hospitalizations\n1 6.706402 4.642384 17.90728 24.54746 3.059603\n```\n:::\n:::\n\n\n**Problem 3 (Medium)**: Use routines from the `purrr` package and the `HELPrct` data frame from the `mosaicData` package to fit a regression model predicting `cesd` as a function of `age` separately for each of the levels of the `substance` variable. Generate a formatted table (with suitable caption) of results\n(estimates and confidence intervals) for the slope parameter for each level of the grouping variable.\n\n(Hint: Use `group_by()` and `group_modify()` to fit the regression model on each part. Note that `broom::tidy()` is useful in having the output of the model be a data frame.)\n\n\n::: {.cell}\n\n```{.r .cell-code}\nHELPrct |> select(substance, cesd, age) |>\ngroup_by(substance)|>\n group_modify(.f = ~broom::tidy( lm(cesd~age, data = .))) |> \n filter(term ==\"age\") |>\n mutate(CI_low = estimate - 2*std.error, CI_high = estimate + 2*std.error) |>\n select(-statistic, - p.value, - std.error) |>\n kableExtra::kable(caption = \"The likelyhood someone uses a substance increates with age.\", digits = 1)\n```\n\n::: {.cell-output-display}\nTable: The likelyhood someone uses a substance increates with age.\n\n|substance |term | estimate| CI_low| CI_high|\n|:---------|:----|--------:|------:|-------:|\n|alcohol |age | 0.3| 0.1| 0.6|\n|cocaine |age | -0.3| -0.6| 0.0|\n|heroin |age | -0.2| -0.5| 0.0|\n:::\n:::\n\n\nActivity 2\n\nMake a function that takes a df and a variable then produces a scatter plot of that variable vs cesd. Use it to plot multiple graphs of the first 6 numeric variables HELPrct. [Hint: check out the extended example from 7.7](https://mdsr-book.github.io/mdsr3e/07-iteration.html#extended-example-factors-associated-with-bmi)\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse)\nlibrary(mosaicData)\n\nmy_func <- function(data,variable2){\n my_plot <- data |>\n ggplot(aes(x = cesd))+\n aes_string(y = variable2)+\n geom_point()\n}\n\nnumeric_names_only <- HELPrct |>\n select(\"age\":\"d1\")|>\n names()\n\n\nmap(.x = numeric_names_only,\n .f = my_func,\n data= HELPrct\n )|>\n patchwork::wrap_plots(ncol = 2)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning: `aes_string()` was deprecated in ggplot2 3.0.0.\nℹ Please use tidy evaluation idioms with `aes()`.\nℹ See also `vignette(\"ggplot2-in-packages\")` for more information.\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning: Removed 207 rows containing missing values or values outside the scale range\n(`geom_point()`).\n```\n:::\n\n::: {.cell-output-display}\n![](chapter-7-examples_files/figure-html/unnamed-chunk-4-1.png){width=672}\n:::\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse)\nlibrary(mosaicData)\n\nmy_func <- function(data,variable2){\n my_plot <- data |>\n ggplot(aes(x = cesd, y = {{variable2}}))+\n geom_point()\n}\n\nmy_func(HELPrct, age)\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(NHANES)\n\nbmi_plot <- function(.data, x_var) {\n ggplot(.data, aes(y = BMI)) +\n aes_string(x = x_var) + \n geom_jitter(alpha = 0.3) + \n geom_smooth() + \n labs(\n title = paste(\"BMI by\", x_var),\n subtitle = \"NHANES\",\n caption = \"US National Center for Health Statistics (NCHS)\"\n )\n}\nbmi_plot(NHANES, \"Age\")\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = \"cs\")'\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning: Removed 366 rows containing non-finite outside the scale range\n(`stat_smooth()`).\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning: Removed 366 rows containing missing values or values outside the scale range\n(`geom_point()`).\n```\n:::\n\n::: {.cell-output-display}\n![](chapter-7-examples_files/figure-html/unnamed-chunk-6-1.png){width=672}\n:::\n\n```{.r .cell-code}\nc(\"Age\", \"HHIncomeMid\", \"PhysActiveDays\", \n \"TVHrsDay\", \"AlcoholDay\", \"Pulse\") |>\n map(bmi_plot, .data = NHANES) |>\n patchwork::wrap_plots(ncol = 2)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = \"cs\")'\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning: Removed 366 rows containing non-finite outside the scale range\n(`stat_smooth()`).\nRemoved 366 rows containing missing values or values outside the scale range\n(`geom_point()`).\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\n`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = \"cs\")'\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning: Removed 1148 rows containing non-finite outside the scale range\n(`stat_smooth()`).\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning: Removed 1148 rows containing missing values or values outside the scale range\n(`geom_point()`).\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\n`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = \"cs\")'\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning: Removed 5442 rows containing non-finite outside the scale range\n(`stat_smooth()`).\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning: Failed to fit group -1.\nCaused by error in `smooth.construct.cr.smooth.spec()`:\n! x has insufficient unique values to support 10 knots: reduce k.\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning: Removed 5442 rows containing missing values or values outside the scale range\n(`geom_point()`).\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\n`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = \"cs\")'\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning: Removed 366 rows containing non-finite outside the scale range\n(`stat_smooth()`).\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning: Removed 366 rows containing missing values or values outside the scale range\n(`geom_point()`).\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\n`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = \"cs\")'\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning: Removed 5113 rows containing non-finite outside the scale range\n(`stat_smooth()`).\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning: Removed 5113 rows containing missing values or values outside the scale range\n(`geom_point()`).\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\n`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = \"cs\")'\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning: Removed 1502 rows containing non-finite outside the scale range\n(`stat_smooth()`).\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning: Removed 1502 rows containing missing values or values outside the scale range\n(`geom_point()`).\n```\n:::\n\n::: {.cell-output-display}\n![](chapter-7-examples_files/figure-html/unnamed-chunk-6-2.png){width=672}\n:::\n:::\n",
"supporting": [
"chapter-7-examples_files"
],
"filters": [
"rmarkdown/pagebreak.lua"
],
"includes": {},
"engineDependencies": {},
"preserve": {},
"postProcess": true
}
}
177 changes: 177 additions & 0 deletions course-materials/lectures/24_data_masking.qmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,177 @@
---
title: "data masking practice"
format:
revealjs:
theme: beige

editor: visual
execute:
echo: true
---

```{r}
library(tidyverse)
```

# Scope

Local vs. Global variables

```{r}
x <- 10
df <- tribble(
~x, ~y, ~ z ,
5, 0 , 0
)
```

x and df are global. They are referred to as environment variables.

df#x and df\$y are local to the dataframe. They are often referred to as stats-variables.

## The dollar sign operator

We don't use the dollar sign operator often, because we use the tidyverse which makes it unnecessary.

`$` selects a variable.

```{r}
df$x
# or
df[1:3]
```

We can select a variable with tidyr from the tidyverse.

```{r}
df |>
select(x,y,z)
```

This makes things easier to read and less redundant.

## Make a function

```{r}
my_printer <- function(variable){
print(variable)
}
```

To print out 5 we need to tell R where to find the variable 5.

```{r}
my_printer(x)
my_printer(df$x)
```

my_printer() looks through the environment first.

So my_printer(x)=10 and not 5.

## tidyverse problems

The tidyverse uses a process called masking to blur the lines between environment and data variables.

This is why we like it so much.

However it makes programming with the tidyverse functions more challenging.

## Use summarize to calculate a mean.

The code below works as expected because tidyr is working under the hood to select x from df.

```{r}
df |>
summarise(mean(x))
```

## A mean calculating function

We get into problems when using tidyverse functions within homemade functions:

```{r}
my_mean_maker <- function(data, variable){
data |>
summarise(mean(variable))
}
```

```{r}
my_mean_maker(data= df, variable = x)
```

We're still getting 10 in our output! It should be 5, because that is the value from the data frame.

my_mean_maker() goes to the global environment because it doesn't recognize that it should be looking in the data frame.

## When is tidyr working?

We can check to see if tidyr is working under the hood by checking the arguments of a function.

We are looking for the words "data-masking" or "tidy-select".

If these words are not present we will not need to inject the data (aka unmask the data).

```{r}
?summarise
```

## Solution

We need to inject the data into the sumarize function. We can do this by **unmasking** the data with {{}}.

This tells R to look for the variable within the dataframe, not the global environment.

```{r}
my_mean_maker <- function(data, variable){
data |>
summarise(mean({{variable}}))
}
```

```{r}
my_mean_maker(data= df, variable = x)
```

## Problem 1

Make a function that will calculate the mean, median, and standard deviation of any variable from a data frame using tidyverse functions.

Use `filter( !is.na() )` or `na.rm = TRUE` to remove missing variables.

Test your function on the `starwars` data `height` variable.

## Solution 1

```{r}
centers_of_data <- function(data, variable){
data|>
filter(!is.na({{variable}})) |>
summarise(mean = mean({{variable}}), median = median({{variable}}), sd= sd({{variable}}))
}
centers_of_data(starwars, height)
```

## Problem 2

Make a function that makes a bargraph of a categorical variable.

Test your function on the `starwars` `eye_color` variable.

## Solution 2

```{r}
my_bar_grapher <- function(data, variable){
data |>
ggplot(aes({{variable}}))+
geom_bar() +
theme_minimal()
}
my_bar_grapher(starwars, eye_color)
```
12 changes: 1 addition & 11 deletions course-materials/lectures/26_maps-ggplot.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -78,7 +78,7 @@ Coordinate Reference Systems (CRS) - Is the type of map projection employed To g
Which one will coord_sf default to?

```{r}
# ?coord_sf
?coord_sf
```

[This is the crs commonly used in gps.](https://epsg.io/4326)
Expand Down Expand Up @@ -132,13 +132,3 @@ ma_counties |> ggplot(aes(long,lat))+
Often times map data is saved as a shape file.

The US government has shape files [here](https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html).

```{r}
map_url <- "https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_region_500k.zip"
us_region <- download.file(url = map_url,
destfile = basename(map_url))
#regions <- read_sf("us_region_unzipped") |>
# sf::st_transform('+proj=longlat +datum=WGS84')
```
129 changes: 129 additions & 0 deletions course-materials/lectures/30_anonymous_functions.qmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,129 @@
---
title: "Anonymous Functions"
author: "Schwab"
editor: visual
format:
revealjs:
theme: beige
execute:
echo: true
---

## Digression

Anonymous functions are functions without a name.

## A named function

```{r}
library(stringr)
#This function has a name, upper_case
upper_case <- function(word){
str_to_upper(word)
}
upper_case("hi")
```

## Anonymous functions

This is the same function without a name

```{r}
function(word) str_to_upper(word)
```

Its harder to use by itself, we often use them in combination with other functions.

## Using the anonymous function

The function after .f will only be used once, we don't need to name it.

```{r}
words <- c("I", "Like", "to", "eat", "cheese", "!")
purrr::map_chr(
.x = words,
.f = function(word) str_to_upper(word)
)
```

## Name an anonymous function

You can name an anonymous function.

```{r}
# This works with map_chhr() because str_to_upper is vectorized
upper_case_anon <- function(word) str_to_upper(word)
upper_case_anon(words)
```

## Does this work?

```{r}
# This works because str_to_upper is vectorized
upper_case_anon <- function(word)
str_to_upper(word)
upper_case_anon(words)
```

Yes, but it can get confusing.

## This does not work

```{r, eval=FALSE}
library(tidyverse)
confusing_function <- function(word)
upper_case_words<-
str_to_upper(word)
## This should lower the word's case, but it doesn't because its not part of the function.
str_to_lower(upper_case_words)
confusing_function(words)
```

## Last example fixed with

```{r}
confusing_function <- function(word){
upper_case_words<-
str_to_upper(word)
## Now it is part of the function.
str_to_lower(upper_case_words)
}
confusing_function(words)
```

## Summary

Use an anonymous function to do one thing.

If it get complicated give some {}

## More anonymous?

Since R version 4.1, May 2021.

Just use a `\(argument) operations`

```{r, eval=FALSE}
# This is also an anonymous function.
\(word) str_to_upper(word)
map_chr(.x = words,
.f = \(word) str_to_upper(word))
```

## Deep dive

There are actually lots of ways of making anonymous functions.

[Here are more.](https://stackoverflow.com/questions/52492462/define-an-anonymous-function-without-using-the-function-keyword) I like the idea of the lambda package.

## SDS 270 for more on functions in R
Loading

0 comments on commit 3f25435

Please sign in to comment.