From 9981d79cd44ad9123b81a270b5af7afe1d3b977b Mon Sep 17 00:00:00 2001
From: Jenna Landy
Date: Sun, 10 Dec 2023 00:11:42 -0500
Subject: [PATCH] compiling book

---
 docs/R/data-table.html | 49 ++++++----
 docs/dataviz/dataviz-in-practice.html | 87 +++++++++---------
 .../figure-html/bland-altman-1.png | Bin 83516 -> 83830 bytes
 .../figure-html/correct-transformation-1.png | Bin 91714 -> 91793 bytes
 .../installing-r-and-rstudio.html | 26 +++---
 docs/sitemap.xml | 56 +++++------
 docs/wrangling/data-table-wrangling.html | 36 ++++----
 docs/wrangling/text-analysis.html | 49 +++++-----
 docs/wrangling/web-scraping.html | 2 +-
 9 files changed, 161 insertions(+), 144 deletions(-)

diff --git a/docs/R/data-table.html b/docs/R/data-table.html
index 65bdda5..6b2ee04 100644
--- a/docs/R/data-table.html
+++ b/docs/R/data-table.html
@@ -2,7 +2,7 @@
-
+
Introduction to Data Science - 5  data.table
@@ -88,7 +88,8 @@
 "search-more-matches-text": "more matches in this document",
 "search-clear-button-title": "Clear",
 "search-detached-cancel-button-title": "Cancel",
- "search-submit-button-title": "Submit"
+ "search-submit-button-title": "Submit",
+ "search-label": "Search"
 }
 }
@@ -103,7 +104,7 @@
-
@@ -113,7 +114,7 @@
-

y is actually referencing x, it is not an new opject: y just another name for x. Until you change y, a new object will not be made. However, the := function changes by reference so if you change x, a new object is not made and y continues to be just another name for x:

+

y is actually referencing x; it is not a new object: y is just another name for x. Until you change y, a new object will not be made. However, the := function changes by reference, so if you change x, a new object is not made and y continues to be just another name for x:

x[,a := 2]
 y
@@ -462,11 +463,11 @@ 

filter(murders, rate <= 0.7)

-

With data.table, we again use an approach similar to subsetting matrices, except data.table knows that rate refers to a column name and not an object in the R environment:

+

With data.table, we again use an approach similar to subsetting matrices, except like dplyr, data.table knows that rate refers to a column name and not an object in the R environment:

murders_dt[rate <= 0.7]
-

Notice that we can combine the filter and select into one succint command. Here are the state names and rates for those with rates below 0.7.

+

Notice that we can combine the filter and select into one succinct command. Here are the state names and rates for those with rates below 0.7.

murders_dt[rate <= 0.7, .(state, rate)]
 #>            state  rate
@@ -496,7 +497,7 @@ 

heights_dt <- as.data.table(heights)

-

In data.table, we can call functions inside .() and they will be applied to rows. So the equivalent of:

+

In data.table, we can call functions inside .() and they will be applied to columns. So the equivalent of:

s <- heights |> summarize(avg = mean(height), sd = sd(height))
@@ -516,7 +517,7 @@

5.2.1 Multiple summaries

-

In Chapter 4, we defined the follwing function to permit multiple column summaries in dplyer:

+

In Chapter 4, we defined the following function to permit multiple column summaries in dplyr:

median_min_max <- function(x){
   qs <- quantile(x, c(0.5, 0, 1))
@@ -542,16 +543,26 @@ 

murders_dt[order(population)]

-

N To sort the table in descending order, we can order by the negative of population or use the decreasing argument:

+

To sort the table in descending order, we can order by the negative of population or use the decreasing argument:

murders_dt[order(population, decreasing = TRUE)] 

5.3.1 Nested sorting

-

Similarly, we can perform nested ordering by including more than one variable in order

+

Similarly, we can perform nested ordering by including more than one variable in order:

murders_dt[order(region, rate)] 
+
+
+
+ +
+
+

You are ready to do exercises 8-12.

+
+
+

5.4 Exercises

1. Load the data.table package and the murders dataset and convert it to a data.table object:

@@ -576,7 +587,7 @@

murders_dt[state == "New York"]

You can use other logical vectors to filter rows.

-

Show the top 5 states with the highest murder rates. After we add murder rate and rank, do not change the murders dataset, just show the result. Remember that you can filter based on the rank column.

+

Show the top 5 states with the highest murder rates. From here on, do not change the murders dataset, just show the result. Remember that you can filter based on the rank column.

5. We can remove rows using the != operator. For example, to remove Florida, we would do this:

no_florida <- murders_dt[state != "Florida"]
@@ -596,16 +607,16 @@

library(NHANES)

-

8. We will provide some basic facts about blood pressure. First let’s select a group to set the standard. We will use 20-to-29-year-old females. AgeDecade is a categorical variable with these ages. Note that the category is coded like ” 20-29”, with a space in front! Use the data.table package to compute the average and standard deviation of systolic blood pressure as saved in the BPSysAve variable. Save it to a variable called ref.

+

8. We will provide some basic facts about blood pressure. First let’s select a group to set the standard. We will use 20-to-29-year-old females. AgeDecade is a categorical variable with these ages. Note that the category is coded like " 20-29", with a space in front! Use the data.table package to compute the average and standard deviation of systolic blood pressure as saved in the BPSysAve variable. Save it to a variable called ref.

9. Report the min and max values for the same group.

-

10. Compute the average and standard deviation for females, but for each age group separately rather than a selected decade as in question 1. Note that the age groups are defined by AgeDecade.

-

11. Repeat exercise 3 for males.

+

10. Compute the average and standard deviation for females, but for each age group separately rather than a selected decade as in exercise 8. Note that the age groups are defined by AgeDecade.

+

11. Repeat exercise 10 for males.

12. For males between the ages of 40-49, compare systolic blood pressure across race as reported in the Race1 variable. Order the resulting table from lowest to highest average systolic blood pressure.


-

1. https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html↩︎

+

1. https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html↩︎

@@ -104,7 +105,7 @@
-
@@ -114,7 +115,7 @@
-

This is just a simple example of the many analyses one can perform with tidytext. To learn more, we again recommend the Tidy Text Mining book4.

+

This is just a simple example of the many analyses one can perform with tidytext. To learn more, we again recommend the Tidy Text Mining book5.

17.4 Exercises

Project Gutenberg is a digital archive of public domain books. The R package gutenbergr facilitates the importation of these texts into R.

@@ -713,7 +713,7 @@

gutenberg_metadata

1. Use str_detect to find the ID of the novel Pride and Prejudice.

-

2. We notice that there are several versions. The gutenberg_works() function filters this table to remove replicates and include only English language works. Read the help file and use this function to find the ID for Pride and Prejudice.

+

2. We notice that there are several versions. The gutenberg_works() function filters this table to remove replicates and include only English language works. Read the help file and use this function to find the ID for Pride and Prejudice.

3. Use the gutenberg_download function to download the text for Pride and Prejudice. Save it to an object called book.

4. Use the tidytext package to create a tidy table with all the words in the text. Save the table in an object called words.

5. We will later make a plot of sentiment versus location in the book. For this, it will be useful to add a column with the word number to the table.

@@ -725,10 +725,11 @@


-

1. https://twitter.com/tvaziri/status/762005541388378112/photo/1↩︎

-

2. http://varianceexplained.org/r/trump-tweets/↩︎

-

3. https://www.tidytextmining.com/↩︎

-

4. https://www.tidytextmining.com/↩︎

+

1. https://twitter.com/tvaziri/status/762005541388378112/photo/1↩︎

+

2. http://varianceexplained.org/r/trump-tweets/↩︎

+

3. https://www.tidytextmining.com/↩︎

+

4. https://www.thetrumparchive.com/↩︎

+

5. https://www.tidytextmining.com/↩︎