day 30

Nics-Github · Nov 11, 2024 · 3f25435 · 3f25435
1 parent de6efdc
commit 3f25435
Show file tree

Hide file tree

Showing 26 changed files with 3,976 additions and 943 deletions.
diff --git a/_freeze/site_libs/revealjs/dist/theme/quarto.css b/_freeze/site_libs/revealjs/dist/theme/quarto.css
diff --git a/chapter 7 examples.qmd b/chapter 7 examples.qmd
@@ -45,12 +45,13 @@ library(tidyverse)
 library(mosaicData)
 
 my_func <- function(data,variable2){
-  my_plot <- data |>
-    ggplot(aes(x = cesd))+
-    aes_string(y = variable2)+
+  data |>
+    ggplot(aes(x = cesd, y = .data[[variable2]]))+
     geom_point()
 }
 
+my_func(HELPrct, age)
+
 numeric_names_only <- HELPrct |>
   select("age":"d1")|>
   names()
@@ -69,12 +70,12 @@ library(tidyverse)
 library(mosaicData)
 
 my_func <- function(data,variable2){
-  my_plot <- data |>
-    ggplot(aes(x = cesd, y = {{variable2}}))+
+  data |>
+    ggplot(aes(x = cesd, y = variable2 ))+
     geom_point()
 }
 
-my_func(HELPrct, age)
+
 
 ```
 

diff --git a/chapter 7 examples_files/execute-results/html.json b/chapter 7 examples_files/execute-results/html.json
@@ -0,0 +1,16 @@
+{
+  "hash": "037e0265372063b7a6475c0e653169e0",
+  "result": {
+    "markdown": "---\ntitle: \"map() practice\"\nauthor: \"Schwab\"\neditor: visual\n---\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──\n✔ dplyr     1.1.4     ✔ readr     2.1.5\n✔ forcats   1.0.0     ✔ stringr   1.5.1\n✔ ggplot2   3.5.1     ✔ tibble    3.2.1\n✔ lubridate 1.9.3     ✔ tidyr     1.3.1\n✔ purrr     1.0.2     \n── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──\n✖ dplyr::filter() masks stats::filter()\n✖ dplyr::lag()    masks stats::lag()\nℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors\n```\n:::\n\n```{.r .cell-code}\nlibrary(mosaicData)\n```\n:::\n\n\nDoing problems 1 and 3 from [here](https://mdsr-book.github.io/mdsr3e/07-iteration.html#iteration-exercises)\n\n**Problem 1 (Easy)**: Use the `HELPrct` data from the `mosaicData` to calculate the mean of all numeric variables (be sure to exclude missing values).\\\n\n\n::: {.cell}\n\n```{.r .cell-code}\nHELPrct |>\n  summarise(\n    across(.cols = where(is.numeric),\n           .fns = ~mean(. , na.rm = TRUE))\n  )\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n       age anysubstatus     cesd       d1 daysanysub dayslink drugrisk      e2b\n1 35.65342    0.7723577 32.84768 3.059603   75.30738 255.6056 1.887168 2.504673\n     female       i1       i2       id   indtot linkstatus      mcs      pcs\n1 0.2362031 17.90728 24.54746 233.4018 35.72848  0.3781903 31.67668 48.04854\n    pss_fr  sexrisk avg_drinks max_drinks hospitalizations\n1 6.706402 4.642384   17.90728   24.54746         3.059603\n```\n:::\n:::\n\n\n**Problem 3 (Medium)**: Use routines from the `purrr` package and the `HELPrct` data frame from the `mosaicData` package to fit a regression model predicting `cesd` as a function of `age` separately for each of the levels of the `substance` variable. Generate a formatted table (with suitable caption) of results\n(estimates and confidence intervals) for the slope parameter for each level of the grouping variable.\n\n(Hint: Use `group_by()` and `group_modify()` to fit the regression model on each part. Note that `broom::tidy()` is useful in having the output of the model be a data frame.)\n\n\n::: {.cell}\n\n```{.r .cell-code}\nHELPrct |> select(substance, cesd, age) |>\ngroup_by(substance)|>\n  group_modify(.f = ~broom::tidy( lm(cesd~age, data = .))) |> \n  filter(term ==\"age\") |>\n  mutate(CI_low = estimate - 2*std.error, CI_high = estimate + 2*std.error) |>\n  select(-statistic, - p.value, - std.error) |>\n  kableExtra::kable(caption = \"The likelyhood someone uses a substance increates with age.\", digits = 1)\n```\n\n::: {.cell-output-display}\nTable: The likelyhood someone uses a substance increates with age.\n\n|substance |term | estimate| CI_low| CI_high|\n|:---------|:----|--------:|------:|-------:|\n|alcohol   |age  |      0.3|    0.1|     0.6|\n|cocaine   |age  |     -0.3|   -0.6|     0.0|\n|heroin    |age  |     -0.2|   -0.5|     0.0|\n:::\n:::\n\n\nActivity 2\n\nMake a function that takes a df and a variable then produces a scatter plot of that variable vs cesd. Use it to plot multiple graphs of the first 6 numeric variables HELPrct. [Hint: check out the extended example from 7.7](https://mdsr-book.github.io/mdsr3e/07-iteration.html#extended-example-factors-associated-with-bmi)\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse)\nlibrary(mosaicData)\n\nmy_func <- function(data,variable2){\n  my_plot <- data |>\n    ggplot(aes(x = cesd))+\n    aes_string(y = variable2)+\n    geom_point()\n}\n\nnumeric_names_only <- HELPrct |>\n  select(\"age\":\"d1\")|>\n  names()\n\n\nmap(.x = numeric_names_only,\n    .f = my_func,\n    data= HELPrct\n      )|>\n  patchwork::wrap_plots(ncol = 2)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning: `aes_string()` was deprecated in ggplot2 3.0.0.\nℹ Please use tidy evaluation idioms with `aes()`.\nℹ See also `vignette(\"ggplot2-in-packages\")` for more information.\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning: Removed 207 rows containing missing values or values outside the scale range\n(`geom_point()`).\n```\n:::\n\n::: {.cell-output-display}\n![](chapter-7-examples_files/figure-html/unnamed-chunk-4-1.png){width=672}\n:::\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse)\nlibrary(mosaicData)\n\nmy_func <- function(data,variable2){\n  my_plot <- data |>\n    ggplot(aes(x = cesd, y = {{variable2}}))+\n    geom_point()\n}\n\nmy_func(HELPrct, age)\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(NHANES)\n\nbmi_plot <- function(.data, x_var) {\n  ggplot(.data, aes(y = BMI)) +\n    aes_string(x = x_var) + \n    geom_jitter(alpha = 0.3) + \n    geom_smooth() + \n    labs(\n      title = paste(\"BMI by\", x_var),\n      subtitle = \"NHANES\",\n      caption = \"US National Center for Health Statistics (NCHS)\"\n    )\n}\nbmi_plot(NHANES, \"Age\")\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = \"cs\")'\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning: Removed 366 rows containing non-finite outside the scale range\n(`stat_smooth()`).\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning: Removed 366 rows containing missing values or values outside the scale range\n(`geom_point()`).\n```\n:::\n\n::: {.cell-output-display}\n![](chapter-7-examples_files/figure-html/unnamed-chunk-6-1.png){width=672}\n:::\n\n```{.r .cell-code}\nc(\"Age\", \"HHIncomeMid\", \"PhysActiveDays\", \n  \"TVHrsDay\", \"AlcoholDay\", \"Pulse\") |>\n  map(bmi_plot, .data = NHANES) |>\n  patchwork::wrap_plots(ncol = 2)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = \"cs\")'\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning: Removed 366 rows containing non-finite outside the scale range\n(`stat_smooth()`).\nRemoved 366 rows containing missing values or values outside the scale range\n(`geom_point()`).\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\n`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = \"cs\")'\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning: Removed 1148 rows containing non-finite outside the scale range\n(`stat_smooth()`).\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning: Removed 1148 rows containing missing values or values outside the scale range\n(`geom_point()`).\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\n`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = \"cs\")'\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning: Removed 5442 rows containing non-finite outside the scale range\n(`stat_smooth()`).\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning: Failed to fit group -1.\nCaused by error in `smooth.construct.cr.smooth.spec()`:\n! x has insufficient unique values to support 10 knots: reduce k.\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning: Removed 5442 rows containing missing values or values outside the scale range\n(`geom_point()`).\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\n`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = \"cs\")'\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning: Removed 366 rows containing non-finite outside the scale range\n(`stat_smooth()`).\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning: Removed 366 rows containing missing values or values outside the scale range\n(`geom_point()`).\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\n`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = \"cs\")'\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning: Removed 5113 rows containing non-finite outside the scale range\n(`stat_smooth()`).\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning: Removed 5113 rows containing missing values or values outside the scale range\n(`geom_point()`).\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\n`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = \"cs\")'\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning: Removed 1502 rows containing non-finite outside the scale range\n(`stat_smooth()`).\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning: Removed 1502 rows containing missing values or values outside the scale range\n(`geom_point()`).\n```\n:::\n\n::: {.cell-output-display}\n![](chapter-7-examples_files/figure-html/unnamed-chunk-6-2.png){width=672}\n:::\n:::\n",
+    "supporting": [
+      "chapter-7-examples_files"
+    ],
+    "filters": [
+      "rmarkdown/pagebreak.lua"
+    ],
+    "includes": {},
+    "engineDependencies": {},
+    "preserve": {},
+    "postProcess": true
+  }
+}
diff --git a/course-materials/lectures/24_data_masking.qmd b/course-materials/lectures/24_data_masking.qmd
@@ -0,0 +1,177 @@
+---
+title: "data masking practice"
+format: 
+  revealjs:
+    theme: beige
+
+editor: visual
+execute: 
+  echo: true
+---
+
+```{r}
+library(tidyverse)
+```
+
+# Scope
+
+Local vs. Global variables
+
+```{r}
+x <- 10
+
+df <- tribble(
+  ~x, ~y, ~ z ,
+  5, 0 , 0
+)
+```
+
+x and df are global. They are referred to as environment variables.
+
+df#x and df\$y are local to the dataframe. They are often referred to as stats-variables.
+
+## The dollar sign operator
+
+We don't use the dollar sign operator often, because we use the tidyverse which makes it unnecessary.
+
+`$` selects a variable.
+
+```{r}
+df$x
+# or 
+df[1:3]
+```
+
+We can select a variable with tidyr from the tidyverse.
+
+```{r}
+df |>
+  select(x,y,z)
+```
+
+This makes things easier to read and less redundant.
+
+## Make a function
+
+```{r}
+my_printer <- function(variable){
+  print(variable)
+}
+```
+
+To print out 5 we need to tell R where to find the variable 5.
+
+```{r}
+my_printer(x)
+my_printer(df$x)
+```
+
+my_printer() looks through the environment first.
+
+So my_printer(x)=10 and not 5.
+
+## tidyverse problems
+
+The tidyverse uses a process called masking to blur the lines between environment and data variables.
+
+This is why we like it so much.
+
+However it makes programming with the tidyverse functions more challenging.
+
+## Use summarize to calculate a mean.
+
+The code below works as expected because tidyr is working under the hood to select x from df.
+
+```{r}
+df |>
+    summarise(mean(x))
+```
+
+## A mean calculating function
+
+We get into problems when using tidyverse functions within homemade functions:
+
+```{r}
+my_mean_maker <- function(data, variable){
+  data |>
+    summarise(mean(variable))
+}
+```
+
+```{r}
+my_mean_maker(data= df, variable = x)
+```
+
+We're still getting 10 in our output! It should be 5, because that is the value from the data frame.
+
+my_mean_maker() goes to the global environment because it doesn't recognize that it should be looking in the data frame.
+
+## When is tidyr working?
+
+We can check to see if tidyr is working under the hood by checking the arguments of a function.
+
+We are looking for the words "data-masking" or "tidy-select".
+
+If these words are not present we will not need to inject the data (aka unmask the data).
+
+```{r}
+?summarise
+```
+
+## Solution
+
+We need to inject the data into the sumarize function. We can do this by **unmasking** the data with {{}}.
+
+This tells R to look for the variable within the dataframe, not the global environment.
+
+```{r}
+my_mean_maker <- function(data, variable){
+  data |>
+    summarise(mean({{variable}}))
+}
+```
+
+```{r}
+my_mean_maker(data= df, variable = x)
+```
+
+## Problem 1
+
+Make a function that will calculate the mean, median, and standard deviation of any variable from a data frame using tidyverse functions.
+
+Use `filter( !is.na() )` or `na.rm = TRUE` to remove missing variables.
+
+Test your function on the `starwars` data `height` variable.
+
+## Solution 1
+
+```{r}
+centers_of_data <- function(data, variable){
+  data|>
+    filter(!is.na({{variable}})) |>
+    summarise(mean = mean({{variable}}), median = median({{variable}}), sd= sd({{variable}}))
+}
+
+
+centers_of_data(starwars, height)
+```
+
+## Problem 2
+
+Make a function that makes a bargraph of a categorical variable.
+
+Test your function on the `starwars` `eye_color` variable.
+
+## Solution 2
+
+```{r}
+my_bar_grapher <- function(data, variable){
+  data |>
+    ggplot(aes({{variable}}))+
+    geom_bar() +
+    theme_minimal()
+}
+
+my_bar_grapher(starwars, eye_color)
+
+```
diff --git a/course-materials/lectures/26_maps-ggplot.qmd b/course-materials/lectures/26_maps-ggplot.qmd
@@ -78,7 +78,7 @@ Coordinate Reference Systems (CRS) - Is the type of map projection employed To g
 Which one will coord_sf default to?
 
 ```{r}
-# ?coord_sf
+?coord_sf
 ```
 
 [This is the crs commonly used in gps.](https://epsg.io/4326)
@@ -132,13 +132,3 @@ ma_counties |> ggplot(aes(long,lat))+
 Often times map data is saved as a shape file.
 
 The US government has shape files [here](https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html).
-
-```{r}
-map_url <- "https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_region_500k.zip"
-us_region <- download.file(url = map_url,
-                           destfile = basename(map_url))
-
-#regions <- read_sf("us_region_unzipped") |>
-#  sf::st_transform('+proj=longlat +datum=WGS84')
-
-```
diff --git a/course-materials/lectures/30_anonymous_functions.qmd b/course-materials/lectures/30_anonymous_functions.qmd
@@ -0,0 +1,129 @@
+---
+title: "Anonymous Functions"
+author: "Schwab"
+editor: visual
+format: 
+  revealjs:
+    theme: beige
+execute: 
+  echo: true
+---
+
+## Digression
+
+Anonymous functions are functions without a name.
+
+## A named function
+
+```{r}
+library(stringr)
+
+#This function has a name, upper_case
+
+upper_case <-  function(word){
+  str_to_upper(word)
+}
+
+upper_case("hi")
+```
+
+## Anonymous functions
+
+This is the same function without a name
+
+```{r}
+function(word) str_to_upper(word)
+```
+
+Its harder to use by itself, we often use them in combination with other functions.
+
+## Using the anonymous function
+
+The function after .f will only be used once, we don't need to name it.
+
+```{r}
+words <- c("I", "Like", "to", "eat", "cheese", "!")
+purrr::map_chr(
+  .x = words,
+  .f = function(word) str_to_upper(word)
+)
+```
+
+## Name an anonymous function
+
+You can name an anonymous function.
+
+```{r}
+# This works with map_chhr() because str_to_upper is vectorized
+
+upper_case_anon <- function(word) str_to_upper(word)
+upper_case_anon(words)
+```
+
+## Does this work?
+
+```{r}
+# This works because str_to_upper is vectorized
+
+upper_case_anon <- function(word) 
+  str_to_upper(word)
+
+upper_case_anon(words)
+```
+
+Yes, but it can get confusing.
+
+## This does not work
+
+```{r, eval=FALSE}
+library(tidyverse)
+
+confusing_function <- function(word) 
+  upper_case_words<- 
+    str_to_upper(word)
+## This should lower the word's case, but it doesn't because its not part of the function.  
+  str_to_lower(upper_case_words)
+
+confusing_function(words)
+```
+
+## Last example fixed with
+
+```{r}
+confusing_function <- function(word){ 
+  upper_case_words<- 
+    str_to_upper(word)
+  ## Now it is part of the function.  
+  str_to_lower(upper_case_words)
+}
+  
+confusing_function(words)
+```
+
+## Summary
+
+Use an anonymous function to do one thing.
+
+If it get complicated give some {}
+
+## More anonymous?
+
+Since R version 4.1, May 2021.
+
+Just use a `\(argument) operations`
+
+```{r, eval=FALSE}
+# This is also an anonymous function. 
+\(word) str_to_upper(word)
+
+map_chr(.x = words,
+    .f = \(word) str_to_upper(word))
+```
+
+## Deep dive
+
+There are actually lots of ways of making anonymous functions.
+
+[Here are more.](https://stackoverflow.com/questions/52492462/define-an-anonymous-function-without-using-the-function-keyword) I like the idea of the lambda package.
+
+## SDS 270 for more on functions in R