diff --git a/.nojekyll b/.nojekyll index 522715c..f5c7d95 100644 --- a/.nojekyll +++ b/.nojekyll @@ -1 +1 @@ -92c45738 \ No newline at end of file +ea187853 \ No newline at end of file diff --git a/search.json b/search.json index 4d1c257..a4f484d 100644 --- a/search.json +++ b/search.json @@ -46,42 +46,56 @@ "href": "slides/slide4.html#current-learning-objective", "title": "Joining two datasets", "section": " Current learning objective", - "text": "Current learning objective\n\n\n-Recognize the characteristics of tidy data\n\n\n-Differentiate between the Base and Tidyverse paradigms\n\n\n-Acquire the skills to add/modify columns, subset data by rows and columns, rename column names, and perform group operations using dplyr\n\n\n-Pivot data into longer or wider format using tidyr\n\n\nJoin datasets using dplyr" + "text": "Current learning objective\n\nPlease enable JavaScript to experience the dynamic code cell content on this page.\n\n\n-Recognize the characteristics of tidy data\n\n\n-Differentiate between the Base and Tidyverse paradigms\n\n\n-Acquire the skills to add/modify columns, subset data by rows and columns, rename column names, and perform group operations using dplyr\n\n\n-Pivot data into longer or wider format using tidyr\n\n\nJoin datasets using dplyr" }, { - "objectID": "slides/slide4.html#relational-data-models", - "href": "slides/slide4.html#relational-data-models", + "objectID": "slides/slide4.html#relational-model", + "href": "slides/slide4.html#relational-model", "title": "Joining two datasets", - "section": "Relational data models", - "text": "Relational data models\n\nRelational database consists of multiple, linked tables.\nIn tidy data, we considered each:\n\nvariable is a column, and\nobservation is a row.\n\nIn relational data model,\n\ncolumn is often referred to as attribute or field, and\nrow is often referred to as tuple or record." 
- }, - { - "objectID": "slides/slide4.html#portals-database", - "href": "slides/slide4.html#portals-database", - "title": "Joining two datasets", - "section": "Portals database", - "text": "Portals database\n\n\n\n\n\n\n\nErnest, Morgan; Brown, James; Valone, Thomas; White, Ethan P. (2018). Portal Project Teaching Database. figshare. Dataset. https://doi.org/10.6084/m9.figshare.1314459.v10" + "section": "Relational model", + "text": "Relational model\n\n\n\n\n\n\n\n\n\n\nA relational model organizes data into one or more tables of columns and rows, with a unique key identifying each row based on Codd (1969, 1970).\nGenerally, each table represents one entity type.\nRelational database is a database based on the relational model of data.\n\n\n\n\n\nType\nTidy data\nRelational model\n\n\n\n\nvariable\ncolumn\nattribute/field\n\n\nobservation\nrow\ntuple/record\n\n\n\n\n\n\n\n\nCodd, E.F (1969), Derivability, Redundancy, and Consistency of Relations Stored in Large Data Banks, Research Report, IBM.\nCodd, E.F (1970). “A Relational Model of Data for Large Shared Data Banks”. Communications of the ACM. Classics. 13 (6): 377–87. doi:10.1145/362384.362685. S2CID 207549016. Archived from the original on 2007-06-12." }, { "objectID": "slides/slide4.html#primary-key", "href": "slides/slide4.html#primary-key", "title": "Joining two datasets", "section": "Primary key", - "text": "Primary key" + "text": "Primary key\n\n\n\nA primary key (sometimes called candidate key) is the smallest subset of columns that uniquely identifies each row in a table.\n\n\nPlease enable JavaScript to experience the dynamic code cell content on this page.\n\n\n\n\n\n\n\n\n\n\n\nData from Ernest, Morgan; Brown, James; Valone, Thomas; White, Ethan P. (2018). Portal Project Teaching Database. figshare. Dataset. 
https://doi.org/10.6084/m9.figshare.1314459.v10" + }, + { + "objectID": "slides/slide4.html#simple-key-compound-key", + "href": "slides/slide4.html#simple-key-compound-key", + "title": "Joining two datasets", + "section": "Simple key & Compound key", + "text": "Simple key & Compound key\n\n\n\nIf a key consists of only a single column, it is called a simple key.\nIf a key consists of more than one column, it is called a compound key.\nA table can also have no key (violating the relational model).\n\n\nPlease enable JavaScript to experience the dynamic code cell content on this page." }, { "objectID": "slides/slide4.html#foreign-key", "href": "slides/slide4.html#foreign-key", "title": "Joining two datasets", "section": "Foreign key", - "text": "Foreign key" + "text": "Foreign key\n\n\n\nA foreign key is a column (or set of columns) in one table whose values match the primary key of another table, so that each value identifies a row in that other table.\n\n\nPlease enable JavaScript to experience the dynamic code cell content on this page." + }, + { + "objectID": "slides/slide4.html#joining-tables", + "href": "slides/slide4.html#joining-tables", + "title": "Joining two datasets", + "section": "Joining tables", + "text": "Joining tables\n\nTo join tables, we use the primary key in one table and the foreign key in another table.\nIn a relational model, a database has referential integrity if all relations between tables are valid. For example:\n\nAll primary key values must be unique and not missing.\nEach foreign key value must have a corresponding primary key value.\n\nIn a relational model, normalization aims to keep data organization as clean and simple as possible by avoiding redundant data entries." 
}, { "objectID": "slides/slide4.html#relationships", "href": "slides/slide4.html#relationships", "title": "Joining two datasets", "section": "Relationships", - "text": "Relationships\n\nOne-to-one\nOne-to-many\nMany-to-many" + "text": "Relationships\n\nOne-to-one\nOne-to-many\nMany-to-one\nMany-to-many" + }, + { + "objectID": "slides/slide4.html#joins-from-dplyr", + "href": "slides/slide4.html#joins-from-dplyr", + "title": "Joining two datasets", + "section": "Joins from dplyr", + "text": "Joins from dplyr\n\nPlease enable JavaScript to experience the dynamic code cell content on this page." }, { "objectID": "slides/slide4.html#summary", diff --git a/sitemap.xml b/sitemap.xml index 2225b84..048dc95 100644 --- a/sitemap.xml +++ b/sitemap.xml @@ -10,7 +10,7 @@ https://anu-bdsi.github.io/workshop-data-wrangling-R1/slides/slide4.html - 2024-04-01T04:44:24.990Z + 2024-04-01T08:13:15.615Z https://anu-bdsi.github.io/workshop-data-wrangling-R1/slides/slide2.html diff --git a/slides/images/chickweight.svg b/slides/images/chickweight.svg new file mode 100644 index 0000000..057eda2 --- /dev/null +++ b/slides/images/chickweight.svg @@ -0,0 +1,33 @@ + + + + + + +%0 + + + + + +ChickWeight + +ChickWeight + +weight + +Time + +Chick + +Diet + +Time, Chick + + + + + diff --git a/slides/images/chickwts.svg b/slides/images/chickwts.svg new file mode 100644 index 0000000..a556adb --- /dev/null +++ b/slides/images/chickwts.svg @@ -0,0 +1,27 @@ + + + + + + +%0 + + + + + +chickwts + +chickwts + +weight + +feed + + + + + diff --git a/slides/images/unnamed-chunk-3.svg b/slides/images/unnamed-chunk-3.svg index fba0f58..057eda2 100644 --- a/slides/images/unnamed-chunk-3.svg +++ b/slides/images/unnamed-chunk-3.svg @@ -4,33 +4,29 @@ - - + + %0 - - -plots - -plots - - - - -species - -species - - - - -surveys - -surveys - + + +ChickWeight + +ChickWeight + +weight + +Time + +Chick + +Diet + +Time, Chick + diff --git a/slides/images/unnamed-chunk-4.svg b/slides/images/unnamed-chunk-4.svg new file 
mode 100644 index 0000000..a556adb --- /dev/null +++ b/slides/images/unnamed-chunk-4.svg @@ -0,0 +1,27 @@ + + + + + + +%0 + + + + + +chickwts + +chickwts + +weight + +feed + + + + + diff --git a/slides/slide1.html b/slides/slide1.html index 20f3d2e..f21e4c8 100644 --- a/slides/slide1.html +++ b/slides/slide1.html @@ -6067,7 +6067,7 @@

Summary

Exercise time

-

20:00

+

20:00

diff --git a/slides/slide2.html b/slides/slide2.html index 9533fa7..da7188a 100644 --- a/slides/slide2.html +++ b/slides/slide2.html @@ -7528,7 +7528,7 @@ }; // Store cell data - globalThis.qwebrCellDetails = [{"code":"mtcars","id":1,"options":{"context":"interactive","message":"true","read-only":"false","label":"unnamed-chunk-1","fig-cap":"","results":"markup","warning":"true","output":"true","autorun":"false","out-width":"700px","comment":"","out-height":"","classes":"","fig-width":7,"dpi":72,"fig-height":5}},{"code":"mtcars[c(\"mpg\", \"cyl\")] # by column names\nmtcars[1:2] # by column names","id":2,"options":{"context":"interactive","message":"true","read-only":"false","label":"unnamed-chunk-2","fig-cap":"","results":"markup","warning":"true","output":"true","autorun":"false","out-width":"700px","comment":"","out-height":"","classes":"","fig-width":7,"dpi":72,"fig-height":5}},{"code":"mtcars[, c(\"mpg\", \"cyl\")] # by column names\nmtcars[, 1:2] # by index","id":3,"options":{"context":"interactive","message":"true","read-only":"false","label":"unnamed-chunk-3","fig-cap":"","results":"markup","warning":"true","output":"true","autorun":"false","out-width":"700px","comment":"","out-height":"","classes":"","fig-width":7,"dpi":72,"fig-height":5}},{"code":"mtcars[, \"mpg\"]\nmtcars[, \"mpg\", drop = FALSE] # to preserve the output as a `data.frame`","id":4,"options":{"context":"interactive","message":"true","read-only":"false","label":"unnamed-chunk-4","fig-cap":"","results":"markup","warning":"true","output":"true","autorun":"false","out-width":"700px","comment":"","out-height":"","classes":"","fig-width":7,"dpi":72,"fig-height":5}},{"code":"mtcars[3:1, ] # using index\nmtcars[c(\"Datsun 710\", \"Mazda RX4\"), ] # using row names (if it has row names)\nmtcars[mtcars$mpg > 31, ] # using a logical vector\nsubset(mtacars, mpg > 31) # using \"non-standard 
evaluation\"","id":5,"options":{"context":"interactive","message":"true","read-only":"false","label":"unnamed-chunk-5","fig-cap":"","results":"markup","warning":"true","output":"true","autorun":"false","out-width":"700px","comment":"","out-height":"","classes":"","fig-width":7,"dpi":72,"fig-height":5}},{"code":"df1 <- cbind(mtcars, gpm = 1 / mtcars$mpg)\ndf1$gpm <- 1 / df1$mpg\ndf1[[\"gpm\"]] <- 1 / df1$mpg\ndf1$wt[df1$cyl==6] <- 10 # modify only a part of it","id":6,"options":{"context":"interactive","message":"true","read-only":"false","label":"unnamed-chunk-6","fig-cap":"","results":"markup","warning":"true","output":"true","autorun":"false","out-width":"700px","comment":"","out-height":"","classes":"","fig-width":7,"dpi":72,"fig-height":5}},{"code":"df2 <- rbind(cars, data.frame(dist = 10, speed = 3))\ntail(df2, 3)\n\ndf2 <- rbind(cars, c(10, 3))\ntail(df2, 3)","id":7,"options":{"context":"interactive","message":"true","read-only":"false","label":"unnamed-chunk-7","fig-cap":"","results":"markup","warning":"true","output":"true","autorun":"false","out-width":"700px","comment":"","out-height":"","classes":"","fig-width":7,"dpi":72,"fig-height":5}},{"code":"mtcars[, sort(names(mtcars))]","id":8,"options":{"context":"interactive","message":"true","read-only":"false","label":"unnamed-chunk-8","fig-cap":"","results":"markup","warning":"true","output":"true","autorun":"false","out-width":"700px","comment":"","out-height":"","classes":"","fig-width":7,"dpi":72,"fig-height":5}},{"code":"order(mtcars$mpg)\nmtcars[order(mtcars$mpg),]","id":9,"options":{"context":"interactive","message":"true","read-only":"false","label":"unnamed-chunk-9","fig-cap":"","results":"markup","warning":"true","output":"true","autorun":"false","out-width":"700px","comment":"","out-height":"","classes":"","fig-width":7,"dpi":72,"fig-height":5}},{"code":"tapply(mtcars$wt, mtcars$gear, 
mean)","id":10,"options":{"context":"interactive","message":"true","read-only":"false","label":"unnamed-chunk-10","fig-cap":"","results":"markup","warning":"true","output":"true","autorun":"false","out-width":"700px","comment":"","out-height":"","classes":"","fig-width":7,"dpi":72,"fig-height":5}},{"code":"tapply(mtcars$wt, list(mtcars$gear, mtcars$vs), median)","id":11,"options":{"context":"interactive","message":"true","read-only":"false","label":"unnamed-chunk-11","fig-cap":"","results":"markup","warning":"true","output":"true","autorun":"false","out-width":"700px","comment":"","out-height":"","classes":"","fig-width":7,"dpi":72,"fig-height":5}},{"code":"rename(mtcars, miles_per_gallon = mpg)\narrange(mtcars, wt)","id":12,"options":{"context":"interactive","message":"true","read-only":"false","label":"unnamed-chunk-12","fig-cap":"","results":"markup","warning":"true","output":"true","autorun":"false","out-width":"700px","comment":"","out-height":"","classes":"","fig-width":7,"dpi":72,"fig-height":5}},{"code":"mtcars |> # take mtcars data, and then\n rename(miles_per_gallon = mpg) |> # rename mpg as miles_per_gallon, and then\n arrange(wt) # arrange row by wt","id":13,"options":{"context":"interactive","message":"true","read-only":"false","label":"unnamed-chunk-13","fig-cap":"","results":"markup","warning":"true","output":"true","autorun":"false","out-width":"700px","comment":"","out-height":"","classes":"","fig-width":7,"dpi":72,"fig-height":5}},{"code":"mtcars |> select(1:3) # by index\nmtcars |> select(mpg, cyl, disp) # by name\nmtcars |> select(mpg:disp) # by contiguous columns\nmtcars |> select(-mpg) # exclude mpg","id":14,"options":{"context":"interactive","message":"true","read-only":"false","label":"unnamed-chunk-14","fig-cap":"","results":"markup","warning":"true","output":"true","autorun":"false","out-width":"700px","comment":"","out-height":"","classes":"","fig-width":7,"dpi":72,"fig-height":5}},{"code":"help(language, package = 
\"tidyselect\")","id":15,"options":{"context":"interactive","message":"true","read-only":"false","label":"unnamed-chunk-15","fig-cap":"","results":"markup","warning":"true","output":"true","autorun":"false","out-width":"700px","comment":"","out-height":"","classes":"","fig-width":7,"dpi":72,"fig-height":5}},{"code":"as_tibble(mtcars)","id":16,"options":{"context":"interactive","message":"true","read-only":"false","label":"unnamed-chunk-16","fig-cap":"","results":"markup","warning":"true","output":"true","autorun":"false","out-width":"700px","comment":"","out-height":"","classes":"","fig-width":7,"dpi":72,"fig-height":5}},{"code":"mtcars |> select(mpg, cyl)\nmtcars |> select(\"mpg\", \"cyl\")\nmtcars |> select(mpg)\nmtcars |> pull(mpg)","id":17,"options":{"context":"interactive","message":"true","read-only":"false","label":"unnamed-chunk-17","fig-cap":"","results":"markup","warning":"true","output":"true","autorun":"false","out-width":"700px","comment":"","out-height":"","classes":"","fig-width":7,"dpi":72,"fig-height":5}},{"code":"mtcars |> slice(1:3)\nmtcars |> filter(mpg > 20)","id":18,"options":{"context":"interactive","message":"true","read-only":"false","label":"unnamed-chunk-18","fig-cap":"","results":"markup","warning":"true","output":"true","autorun":"false","out-width":"700px","comment":"","out-height":"","classes":"","fig-width":7,"dpi":72,"fig-height":5}},{"code":"mtcars |> \n bind_rows(data.frame(dist = 10, speed = 3))","id":19,"options":{"context":"interactive","message":"true","read-only":"false","label":"unnamed-chunk-19","fig-cap":"","results":"markup","warning":"true","output":"true","autorun":"false","out-width":"700px","comment":"","out-height":"","classes":"","fig-width":7,"dpi":72,"fig-height":5}},{"code":"mtcars |> select(sort(colnames(mtcars)))\nmtcars |> select(wt, gears, everything())\nmtcars |> relocate(am, carb, .before = 
cyl)","id":20,"options":{"context":"interactive","message":"true","read-only":"false","label":"unnamed-chunk-20","fig-cap":"","results":"markup","warning":"true","output":"true","autorun":"false","out-width":"700px","comment":"","out-height":"","classes":"","fig-width":7,"dpi":72,"fig-height":5}},{"code":"mtcars |> arrange(desc(mpg)) # sort by mpg in descending order","id":21,"options":{"context":"interactive","message":"true","read-only":"false","label":"unnamed-chunk-21","fig-cap":"","results":"markup","warning":"true","output":"true","autorun":"false","out-width":"700px","comment":"","out-height":"","classes":"","fig-width":7,"dpi":72,"fig-height":5}},{"code":"mtcars |> \n group_by(gear) |> \n summarise(avg_wt = mean(wt))\n\nmtcars |> summarise(avg_wt = mean(wt), .by = gear)\nmtcars |> summarise(avg_wt = mean(wt), .by = c(gear, vs))","id":22,"options":{"context":"interactive","message":"true","read-only":"false","label":"unnamed-chunk-22","fig-cap":"","results":"markup","warning":"true","output":"true","autorun":"false","out-width":"700px","comment":"","out-height":"","classes":"","fig-width":7,"dpi":72,"fig-height":5}},{"code":"mtcars |> summarise(across(everything(), mean), .by = gear)\nmtcars |> summarise(across(where(~n_distinct(.x) > 10), mean), .by = gear)\nmtcars |> rowwise() |> summarise(score = sum(c_across(disp:wt)))","id":23,"options":{"context":"interactive","message":"true","read-only":"false","label":"unnamed-chunk-23","fig-cap":"","results":"markup","warning":"true","output":"true","autorun":"false","out-width":"700px","comment":"","out-height":"","classes":"","fig-width":7,"dpi":72,"fig-height":5}}]; + globalThis.qwebrCellDetails = 
[{"id":1,"code":"mtcars","options":{"results":"markup","label":"unnamed-chunk-1","comment":"","message":"true","autorun":"false","classes":"","dpi":72,"output":"true","warning":"true","out-width":"700px","out-height":"","fig-cap":"","context":"interactive","read-only":"false","fig-width":7,"fig-height":5}},{"id":2,"code":"mtcars[c(\"mpg\", \"cyl\")] # by column names\nmtcars[1:2] # by column names","options":{"results":"markup","label":"unnamed-chunk-2","comment":"","message":"true","autorun":"false","classes":"","dpi":72,"output":"true","warning":"true","out-width":"700px","out-height":"","fig-cap":"","context":"interactive","read-only":"false","fig-width":7,"fig-height":5}},{"id":3,"code":"mtcars[, c(\"mpg\", \"cyl\")] # by column names\nmtcars[, 1:2] # by index","options":{"results":"markup","label":"unnamed-chunk-3","comment":"","message":"true","autorun":"false","classes":"","dpi":72,"output":"true","warning":"true","out-width":"700px","out-height":"","fig-cap":"","context":"interactive","read-only":"false","fig-width":7,"fig-height":5}},{"id":4,"code":"mtcars[, \"mpg\"]\nmtcars[, \"mpg\", drop = FALSE] # to preserve the output as a `data.frame`","options":{"results":"markup","label":"unnamed-chunk-4","comment":"","message":"true","autorun":"false","classes":"","dpi":72,"output":"true","warning":"true","out-width":"700px","out-height":"","fig-cap":"","context":"interactive","read-only":"false","fig-width":7,"fig-height":5}},{"id":5,"code":"mtcars[3:1, ] # using index\nmtcars[c(\"Datsun 710\", \"Mazda RX4\"), ] # using row names (if it has row names)\nmtcars[mtcars$mpg > 31, ] # using a logical vector\nsubset(mtacars, mpg > 31) # using \"non-standard evaluation\"","options":{"results":"markup","label":"unnamed-chunk-5","comment":"","message":"true","autorun":"false","classes":"","dpi":72,"output":"true","warning":"true","out-width":"700px","out-height":"","fig-cap":"","context":"interactive","read-only":"false","fig-width":7,"fig-height":5}},{"id":6,"code":"df1 
<- cbind(mtcars, gpm = 1 / mtcars$mpg)\ndf1$gpm <- 1 / df1$mpg\ndf1[[\"gpm\"]] <- 1 / df1$mpg\ndf1$wt[df1$cyl==6] <- 10 # modify only a part of it","options":{"results":"markup","label":"unnamed-chunk-6","comment":"","message":"true","autorun":"false","classes":"","dpi":72,"output":"true","warning":"true","out-width":"700px","out-height":"","fig-cap":"","context":"interactive","read-only":"false","fig-width":7,"fig-height":5}},{"id":7,"code":"df2 <- rbind(cars, data.frame(dist = 10, speed = 3))\ntail(df2, 3)\n\ndf2 <- rbind(cars, c(10, 3))\ntail(df2, 3)","options":{"results":"markup","label":"unnamed-chunk-7","comment":"","message":"true","autorun":"false","classes":"","dpi":72,"output":"true","warning":"true","out-width":"700px","out-height":"","fig-cap":"","context":"interactive","read-only":"false","fig-width":7,"fig-height":5}},{"id":8,"code":"mtcars[, sort(names(mtcars))]","options":{"results":"markup","label":"unnamed-chunk-8","comment":"","message":"true","autorun":"false","classes":"","dpi":72,"output":"true","warning":"true","out-width":"700px","out-height":"","fig-cap":"","context":"interactive","read-only":"false","fig-width":7,"fig-height":5}},{"id":9,"code":"order(mtcars$mpg)\nmtcars[order(mtcars$mpg),]","options":{"results":"markup","label":"unnamed-chunk-9","comment":"","message":"true","autorun":"false","classes":"","dpi":72,"output":"true","warning":"true","out-width":"700px","out-height":"","fig-cap":"","context":"interactive","read-only":"false","fig-width":7,"fig-height":5}},{"id":10,"code":"tapply(mtcars$wt, mtcars$gear, mean)","options":{"results":"markup","label":"unnamed-chunk-10","comment":"","message":"true","autorun":"false","classes":"","dpi":72,"output":"true","warning":"true","out-width":"700px","out-height":"","fig-cap":"","context":"interactive","read-only":"false","fig-width":7,"fig-height":5}},{"id":11,"code":"tapply(mtcars$wt, list(mtcars$gear, mtcars$vs), 
median)","options":{"results":"markup","label":"unnamed-chunk-11","comment":"","message":"true","autorun":"false","classes":"","dpi":72,"output":"true","warning":"true","out-width":"700px","out-height":"","fig-cap":"","context":"interactive","read-only":"false","fig-width":7,"fig-height":5}},{"id":12,"code":"rename(mtcars, miles_per_gallon = mpg)\narrange(mtcars, wt)","options":{"results":"markup","label":"unnamed-chunk-12","comment":"","message":"true","autorun":"false","classes":"","dpi":72,"output":"true","warning":"true","out-width":"700px","out-height":"","fig-cap":"","context":"interactive","read-only":"false","fig-width":7,"fig-height":5}},{"id":13,"code":"mtcars |> # take mtcars data, and then\n rename(miles_per_gallon = mpg) |> # rename mpg as miles_per_gallon, and then\n arrange(wt) # arrange row by wt","options":{"results":"markup","label":"unnamed-chunk-13","comment":"","message":"true","autorun":"false","classes":"","dpi":72,"output":"true","warning":"true","out-width":"700px","out-height":"","fig-cap":"","context":"interactive","read-only":"false","fig-width":7,"fig-height":5}},{"id":14,"code":"mtcars |> select(1:3) # by index\nmtcars |> select(mpg, cyl, disp) # by name\nmtcars |> select(mpg:disp) # by contiguous columns\nmtcars |> select(-mpg) # exclude mpg","options":{"results":"markup","label":"unnamed-chunk-14","comment":"","message":"true","autorun":"false","classes":"","dpi":72,"output":"true","warning":"true","out-width":"700px","out-height":"","fig-cap":"","context":"interactive","read-only":"false","fig-width":7,"fig-height":5}},{"id":15,"code":"help(language, package = 
\"tidyselect\")","options":{"results":"markup","label":"unnamed-chunk-15","comment":"","message":"true","autorun":"false","classes":"","dpi":72,"output":"true","warning":"true","out-width":"700px","out-height":"","fig-cap":"","context":"interactive","read-only":"false","fig-width":7,"fig-height":5}},{"id":16,"code":"as_tibble(mtcars)","options":{"results":"markup","label":"unnamed-chunk-16","comment":"","message":"true","autorun":"false","classes":"","dpi":72,"output":"true","warning":"true","out-width":"700px","out-height":"","fig-cap":"","context":"interactive","read-only":"false","fig-width":7,"fig-height":5}},{"id":17,"code":"mtcars |> select(mpg, cyl)\nmtcars |> select(\"mpg\", \"cyl\")\nmtcars |> select(mpg)\nmtcars |> pull(mpg)","options":{"results":"markup","label":"unnamed-chunk-17","comment":"","message":"true","autorun":"false","classes":"","dpi":72,"output":"true","warning":"true","out-width":"700px","out-height":"","fig-cap":"","context":"interactive","read-only":"false","fig-width":7,"fig-height":5}},{"id":18,"code":"mtcars |> slice(1:3)\nmtcars |> filter(mpg > 20)","options":{"results":"markup","label":"unnamed-chunk-18","comment":"","message":"true","autorun":"false","classes":"","dpi":72,"output":"true","warning":"true","out-width":"700px","out-height":"","fig-cap":"","context":"interactive","read-only":"false","fig-width":7,"fig-height":5}},{"id":19,"code":"mtcars |> \n bind_rows(data.frame(dist = 10, speed = 3))","options":{"results":"markup","label":"unnamed-chunk-19","comment":"","message":"true","autorun":"false","classes":"","dpi":72,"output":"true","warning":"true","out-width":"700px","out-height":"","fig-cap":"","context":"interactive","read-only":"false","fig-width":7,"fig-height":5}},{"id":20,"code":"mtcars |> select(sort(colnames(mtcars)))\nmtcars |> select(wt, gears, everything())\nmtcars |> relocate(am, carb, .before = 
cyl)","options":{"results":"markup","label":"unnamed-chunk-20","comment":"","message":"true","autorun":"false","classes":"","dpi":72,"output":"true","warning":"true","out-width":"700px","out-height":"","fig-cap":"","context":"interactive","read-only":"false","fig-width":7,"fig-height":5}},{"id":21,"code":"mtcars |> arrange(desc(mpg)) # sort by mpg in descending order","options":{"results":"markup","label":"unnamed-chunk-21","comment":"","message":"true","autorun":"false","classes":"","dpi":72,"output":"true","warning":"true","out-width":"700px","out-height":"","fig-cap":"","context":"interactive","read-only":"false","fig-width":7,"fig-height":5}},{"id":22,"code":"mtcars |> \n group_by(gear) |> \n summarise(avg_wt = mean(wt))\n\nmtcars |> summarise(avg_wt = mean(wt), .by = gear)\nmtcars |> summarise(avg_wt = mean(wt), .by = c(gear, vs))","options":{"results":"markup","label":"unnamed-chunk-22","comment":"","message":"true","autorun":"false","classes":"","dpi":72,"output":"true","warning":"true","out-width":"700px","out-height":"","fig-cap":"","context":"interactive","read-only":"false","fig-width":7,"fig-height":5}},{"id":23,"code":"mtcars |> summarise(across(everything(), mean), .by = gear)\nmtcars |> summarise(across(where(~n_distinct(.x) > 10), mean), .by = gear)\nmtcars |> rowwise() |> summarise(score = sum(c_across(disp:wt)))","options":{"results":"markup","label":"unnamed-chunk-23","comment":"","message":"true","autorun":"false","classes":"","dpi":72,"output":"true","warning":"true","out-width":"700px","out-height":"","fig-cap":"","context":"interactive","read-only":"false","fig-width":7,"fig-height":5}}]; + + + + + + + + + + + +
@@ -349,6 +1405,8 @@

Joining two datasets

Current learning objective

+
+
-
-

Relational data models

- +
+ + + + + + + + + + + + + + + + + + + + +
TypeTidy dataRelational model
variablecolumnattribute/field
observationrowtuple/record
+
+ + + +
+
+

Primary key

+
+
    -
  • column is often referred to as attribute or field, and
  • -
  • row is often referred to as tuple or record.
  • -
+
  • A primary key is a minimal set of columns that uniquely identifies each row in a table. A table may have several such sets (candidate keys); the primary key is the one chosen to identify rows.
  • -
    -
    -

    Portals database

    +
    + + +
    -

    +

    +
    +
    -
    -

    Primary key

    +
    +

    Simple key & Compound key

    +
    +
    +
      +
    • If a key consists of only a single column, it is called a simple key.
    • +
    • If a key consists of more than one column then it is called a compound key.
    • +
    • A table can also have no key (violating the relational model).
    • +
    +
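The key definitions above can be checked directly in R. The following is a minimal sketch, assuming dplyr is installed; `count_duplicates()` is a hypothetical helper written for this illustration and is not part of dplyr. It counts how many combinations of the chosen columns occur more than once, so a result of 0 means the columns form a key.

```r
library(dplyr)

# ChickWeight ships with R; drop its extra classes to get a plain data frame.
chick <- as.data.frame(ChickWeight)

# Hypothetical helper (not a dplyr function): number of value combinations
# of the given columns that appear in more than one row.
count_duplicates <- function(data, ...) {
  data |>
    count(...) |>
    filter(n > 1) |>
    nrow()
}

# Chick alone repeats across measurement times, so it is not a key.
chick_only <- count_duplicates(chick, Chick)

# Time and Chick together identify each row: a compound key.
time_chick <- count_duplicates(chick, Time, Chick)

chick_only > 0   # TRUE
time_chick == 0  # TRUE
```

The same helper can be applied to `chickwts` with both of its columns to see whether that table has any key at all.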
    + +
    +
    +
    +
    +
    +

    +
    +
    +
    +
    +

    +
    +
    +
    +
    +

    Foreign key

    +
    +
    +
      +
    • A foreign key is a column (or set of columns) in one table whose values match the primary key of another table, so that each value identifies a row in that other table.
    • +
    +
    + +
    +
    +
    +
    +

    +
    +
    +
    +
    +
    +
    +

    Joining tables

    +
      +
    • To join tables, we use the primary key in one table and the foreign key in another table.
    • +
    • In a relational model, a database has referential integrity if all relations between tables are valid. E.g., +
        +
      • All primary key values must be unique and not missing.
      • +
      • Each foreign key value must have a corresponding primary key value.
      • +
    • +
    • In a relational model, normalization aims to keep data organization as clean and simple as possible by avoiding redundant data entries.
    • +
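The two referential integrity conditions listed above can be sketched as checks in R. The tables below are hypothetical toy data invented for this illustration (they are not the Portal tables), and the variable names `pk_ok` and `orphans` are likewise assumptions.

```r
library(dplyr)

# Hypothetical toy tables, invented for illustration only.
species <- data.frame(
  species_id = c("DM", "DO", "PF"),            # primary key of species
  genus      = c("Dipodomys", "Dipodomys", "Perognathus")
)
surveys <- data.frame(
  record_id  = 1:4,                            # primary key of surveys
  species_id = c("DM", "PF", "DM", "XX")       # foreign key; "XX" has no match
)

# Condition 1: primary key values are unique and not missing.
pk_ok <- !anyNA(species$species_id) &&
  anyDuplicated(species$species_id) == 0

# Condition 2: every foreign key value has a matching primary key value.
# anti_join() keeps the rows of surveys that have no match in species.
orphans <- anti_join(surveys, species, by = "species_id")

pk_ok          # TRUE
nrow(orphans)  # 1: the record with species_id "XX" breaks referential integrity
```

An empty `orphans` table together with `pk_ok` being `TRUE` would indicate that this pair of tables satisfies both conditions.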

    Relationships

    • One-to-one
    • One-to-many
    • +
    • Many-to-one
    • Many-to-many
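An expected relationship can be stated when joining, so that a violation is reported instead of silently changing the number of rows. This is a minimal sketch, assuming dplyr 1.1.0 or later, where the join functions accept a `relationship` argument; the tables are hypothetical toy data invented for this illustration.

```r
library(dplyr)

# Hypothetical toy tables, invented for illustration only.
plots <- data.frame(
  plot_id   = 1:2,
  plot_type = c("Control", "Exclosure")
)
surveys <- data.frame(
  record_id = 1:3,
  plot_id   = c(1L, 1L, 2L)   # plot 1 is surveyed twice
)

# Many survey rows point to one plot row: a many-to-one relationship.
joined <- left_join(surveys, plots, by = "plot_id",
                    relationship = "many-to-one")

# Declaring one-to-one instead should fail, because plot 1 in plots
# matches two rows in surveys.
attempt <- try(
  left_join(surveys, plots, by = "plot_id", relationship = "one-to-one"),
  silent = TRUE
)

nrow(joined)                    # 3
inherits(attempt, "try-error")  # TRUE
```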

    Mutating join

    - +

    from dplyr

    +
    +
    +

    Joins from dplyr

    +
    +
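As a rough illustration of how the mutating joins in dplyr differ, the sketch below joins two hypothetical toy tables (invented for this example, not the workshop data) on a shared `species_id` column and compares the number of rows each join returns.

```r
library(dplyr)

# Hypothetical toy tables, invented for illustration only.
species <- data.frame(
  species_id = c("DM", "DO", "PF"),
  genus      = c("Dipodomys", "Dipodomys", "Perognathus")
)
surveys <- data.frame(
  record_id  = 1:4,
  species_id = c("DM", "PF", "DM", "XX")  # "XX" is absent from species
)

# left_join(): keep every row of surveys; unmatched rows get NA for genus.
left <- left_join(surveys, species, by = "species_id")

# inner_join(): keep only the rows that match in both tables.
inner <- inner_join(surveys, species, by = "species_id")

# full_join(): keep all rows from both tables, matched or not.
full <- full_join(surveys, species, by = "species_id")

nrow(left)   # 4
nrow(inner)  # 3
nrow(full)   # 5: four survey rows plus the unmatched species "DO"
```

Comparing row counts before and after a join is a quick way to notice unmatched or duplicated keys.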

    Summary

    @@ -477,7 +1635,7 @@

    Summary

    Exercise time

    -

    20:00

    +

    20:00

    @@ -887,6 +2045,118 @@

    Exercise time

    } }); + \ No newline at end of file