Merge branch 'master' into master

b-rodrigues · Oct 2, 2023 · bca569e
2 parents ab40a39 + e9ac3d7

Showing 15 changed files with 1,125 additions and 219 deletions.
diff --git a/fprog.qmd b/fprog.qmd
@@ -484,10 +484,10 @@ sqrt(-5)
 This only raises a warning and returns `NaN` (Not a Number). This can be quite
 dangerous, especially when working non-interactively, which is what we will be
 doing a lot later on. It is much better if a pipeline fails early due to an
-error, than dragging a `NaN` value. This also happens with `sqrt()`:
+error, than dragging a `NaN` value. This also happens with `log10()`:
 
 ```{r}
-sqrt(-10)
+log10(-10)
 ```
 
 So it could be useful to redefine these functions to raise an error instead, for
@@ -705,7 +705,6 @@ fact_iter <- function(n){
   result = 1
   for(i in 1:n){
     result = result * i
-    i = i + 1
   }
   result
 }

diff --git a/intro.qmd b/intro.qmd
diff --git a/lit_prog.qmd b/lit_prog.qmd
@@ -753,7 +753,7 @@ create this. This is this function:
 ```{r}
 return_section <- function(dataset, var){
   a <- knitr::knit_expand(text = c(
-               ## Frequency table for variable: {{variable}}",
+               "## Frequency table for variable: {{variable}}",
                 create_table(dataset, var)),
                 variable = var)
   cat(a, sep = "\n")
@@ -984,7 +984,7 @@ that I recommend tick the following two important boxes:
 - Work the same way regardless of output format (Word, PDF or Html);
 - Work for any type of table: summary tables, regression tables, two-way tables, etc.
 
-Let's start with the simplest type of table, which would is a table that simply
+Let's start with the simplest type of table, which would be a table that simply
 shows some rows of data. `{knitr}` comes with the `kable()` function, but this
 function generates a very plain looking output. For something
 publication-worthy, we recommend the `{flextable}` package, developed by

diff --git a/packages.qmd b/packages.qmd
@@ -39,7 +39,7 @@ But you could just as well start directly with an empty `{fusen}` package
 template, and then start your analysis from there. Package development with
 `{fusen}` is simply writing RMarkdown code.
 
-## Benefits of 
+## Benefits of packages
 
 Let’s first go over the benefits of turning your analysis into a package once
 again, as this is crucial.
@@ -466,7 +466,7 @@ to load the needed functions and data.
 Next step, move the functions `get_laspeyeres()` and `make_plot()` from
 `analyse_data.Rmd` to `save_data.Rmd`. Simply cut and paste these functions from
 one `.Rmd` to the other. Make sure `save_data.Rmd` looks something like
-[this](https://gist.githubusercontent.com/b-rodrigues/16952727d35355bf3b9cbd5f37843c20/raw/7e5dc76560f6cbe7281862dc7b6a2f00f79b485b/save_data.Rmd)^[https://is.gd/SpzL88],
+[this](https://raw.githubusercontent.com/b-rodrigues/rap4all/master/rmds/save_data_fusen.Rmd)^[https://is.gd/fusen_save_data],
 take a look at the end of the script to find the functions we’ve moved over. The
 `analyse_data.Rmd` script is exactly the same, minus the functions that we’ve
 just moved over.
@@ -478,8 +478,9 @@ like](https://is.gd/anRjt4)^[https://is.gd/anRjt4] (no worries, I’m going to
 explain how I got there). For consistency with your future use of {fusen}, you could also rename the `save_data.Rmd` to `flat_save_data.Rmd`, although this won't avoid {fusen} to work properly.
 
 Let’s start with the first function, `get_raw_data()`. If you compare the
-[before](https://gist.githubusercontent.com/b-rodrigues/16952727d35355bf3b9cbd5f37843c20/raw/7e5dc76560f6cbe7281862dc7b6a2f00f79b485b/save_data.Rmd)^[https://is.gd/n3m6In]
-and [after](https://is.gd/anRjt4)^[https://is.gd/anRjt4], the differences are
+[before](https://raw.githubusercontent.com/b-rodrigues/rap4all/master/rmds/save_data_fusen.Rmd)^[https://is.gd/fusen_save_data],
+and [after](https://raw.githubusercontent.com/b-rodrigues/rap4all/master/rmds/flat_save_data.Rmd
+QR code)^[https://is.gd/inflate_ready_save_data], the differences are
 that we have named the chunk containing the function, `function-get_raw_data`
 and added documentation in the form of `{roxygen2}` comments. Naming the chunks
 is essential: this is how `{fusen}` knows that this chunk contains a function
@@ -690,6 +691,7 @@ is that I’ve added examples:
 ````{verbatim}
 ```{r examples-get_laspeyeres, eval = FALSE}
 #' \dontrun{
+#' country_level_data_laspeyeres <- get_laspeyeres_index(country_level_data)
 #' commune_level_data_laspeyeres <- get_laspeyeres(commune_level_data)
 #' }
 ```
@@ -729,7 +731,7 @@ Something important to notice as well: my fusen-ready `.Rmd` file is simply
 called `save_data.Rmd`, while the generated, inflated file, that will be part of
 the package under the `vignettes/` folder is called `dev-save_data.Rmd`. 
 
-When you inflate you a flat file into a package, the R console will be verbose.
+When you inflate a flat file into a package, the R console will be verbose.
 This lists all files that are created or modified, but there is also a long list
 of checks that run automatically. This is the output of `devtools::check()` that
 is included inside `fusen::inflate()`. This function verifies that your package,
@@ -847,7 +849,7 @@ It is also possible to install the package from a specific branch:
 
 ```{r, eval = F}
 remotes::install_github(
-  "github_username/repository_name@repo_name"
+  "github_username/repository_name@branch_name"
 )
 ```
 
@@ -856,7 +858,7 @@ commit:
 
 ```{r, eval = F}
 remotes::install_github(
-  "github_username/repository_name@repo_name",
+  "github_username/repository_name@branch_name",
   ref = "commit_hash"
 )
 ```

diff --git a/project_rewrite.qmd b/project_rewrite.qmd
@@ -220,6 +220,7 @@ read_clean <- function(..., sheet){
    ) |>
     mutate(locality = str_trim(locality)) |>
     select(year, locality, n_offers, starts_with("average"))
+}
 
 ```
 
@@ -329,7 +330,7 @@ We’re now scraping data from Wikipedia of former Luxembourguish communes:
 
 ```{r}
 get_former_communes <- function(
-            url = "https://w.wiki/_wFe7",
+            url = "https://is.gd/lux_former_communes",
             min_year = 2009,
             table_position = 3
             ){
@@ -351,14 +352,18 @@ We can scrape current communes:
 
 ```{r}
 get_current_communes <- function(
-                 url = "https://w.wiki/6nPu",
-                 table_position = 1
+                 url = "https://is.gd/lux_communes",
+                 table_position = 2
                  ){
 
-  read_html(url) %>%
-    html_table() %>%
-    pluck(table_position) %>%
-    clean_names()
+  read_html(url) |>
+    html_table() |>
+    pluck(table_position) |>
+    clean_names() |>
+    filter(name_2 != "Name") |>
+    rename(commune = name_2) |>
+    mutate(commune = str_remove(commune, " .$"))
+
 }
 
 ```
@@ -388,7 +393,7 @@ get_test_communes <- function(former_communes, current_communes){
   communes[which(communes == "Clemency")] <- "Clémency"
   communes[which(communes == "Redange")] <- "Redange-sur-Attert"
   communes[which(communes == "Erpeldange-sur-Sûre")] <- "Erpeldange"
-  communes[which(communes == "Luxembourg-City")] <- "Luxembourg"
+  communes[which(communes == "Luxembourg City")] <- "Luxembourg"
   communes[which(communes == "Käerjeng")] <- "Kaerjeng"
   communes[which(communes == "Petange")] <- "Pétange"
 

diff --git a/project_start.qmd b/project_start.qmd
@@ -383,23 +383,35 @@ there. For this, we need a list of communes from Luxembourg. [Thankfully,
 Wikipedia has such a
 list](https://en.wikipedia.org/wiki/List_of_communes_of_Luxembourg)^[https://w.wiki/6nPu].
 
-Let's scrape and save this list:
+An issue with scraping tables off the web is that they might change in the
+future. It is therefore a good idea to save the page by right clicking on it and
+then selecting save as, and then re-hosting it. I use Github pages to re-host
+the Wikipedia page above
+[here](https://b-rodrigues.github.io/list_communes/)^[https://is.gd/lux_communes].
+I now have full control of this page, and won't get any bad surprises if someone
+decides to eventually update it. Instead of re-hosting it, you could simply save
+it as any other file of your project.
+
+So let's scrape and save this list:
 
 ```{r}
-current_communes <- "https://w.wiki/6nPu" |>
+current_communes <- "https://is.gd/lux_communes" |>
   rvest::read_html() |>
   rvest::html_table() |>
-  purrr::pluck(1) |>
-  janitor::clean_names()
+  purrr::pluck(2) |>
+  janitor::clean_names() |>
+  dplyr::filter(name_2 != "Name") |>
+  dplyr::rename(commune = name_2)
 ```
 
-We scrape the table from the Wikipedia page using `{rvest}`.
+We scrape the table from the re-hosted Wikipedia page using `{rvest}`.
 `rvest::html_table()` returns a list of tables from the Wikipedia table, and
-then we use `purrr::pluck()` to keep the first table from the website, which is
+then we use `purrr::pluck()` to keep the second table from the website, which is
 what we need (I made the calls to the packages explicit, because you might not
 be familiar with these packages). `janitor::clean_names()` transforms column
 names written for human eyes into machine-friendly names (for example `Growth
-rate in %` would be transformed to `growth_rate_in_percent`).
+rate in %` would be transformed to `growth_rate_in_percent`) and then I use
+the `{dplyr}` package for some further cleaning and renaming.
 
 Let’s see if we have all the communes in our data:
 
@@ -415,10 +427,11 @@ there’s also a less obvious reason; since 2010, several communes have merged
 into new ones. So there are communes that are in our data in 2010 and
 2011, but disappear from 2012 onwards. So we need to do several things: first,
 get a list of all existing communes from 2010 onwards, and then, harmonise
-spelling. Here again, we can use a list from Wikipedia:
+spelling. Here again, we can use a list from Wikipedia, and here again, I decide
+to re-host it on Github pages to avoid problems in the future:
 
 ```{r}
-former_communes <- "https://w.wiki/_wFe7" |>
+former_communes <- "https://is.gd/lux_former_communes" |>
   rvest::read_html() |>
   rvest::html_table() |>
   purrr::pluck(3) |>

diff --git a/repro_cont.qmd b/repro_cont.qmd
@@ -82,7 +82,7 @@ architecture with their Apple silicon CPUs (as of writing, the Mac Pro is the
 only computer manufactured by Apple that doesn't use an Apple silicon CPU and
 only because it was released in 2019) and it wouldn't surprise me if other
 manufacturers follow suit and develop their own ARM cpus. This means that
-projects written today may not run anymore in the future, because of this
+projects written today may not run anymore in the future, because of these
 architecture changes. Libraries compiled for current architectures would need to
 be recompiled for ARM, and that may be difficult.
 
@@ -534,7 +534,7 @@ Google search (but I'm giving it to you, dear reader, for free).
 
 Then come `RUN` statements. The first one uses Ubuntu's package manager to first
 refresh the repositories (this ensures that our local Ubuntu installation
-repositories are in synch with the latest software updates that were pushed to
+repositories are in sync with the latest software updates that were pushed to
 the central Ubuntu repos). Then we use Ubuntu's package manager to install
 `r-base`. `r-base` is the package that installs R. We then finish this
 Dockerfile by running `CMD ["R"]`. This is the command that will be executed
@@ -587,7 +587,7 @@ What is going on here? When you run a container, the command specified by `CMD`
 gets executed, and then the container quits. So here, the container ran the
 command `R`, which started the R interpreter, but then quit immediately. When
 quitting R, users should specify if they want to save or not save the workspace.
-This is what the message above is telling us. So, how can be use this? Is there
+This is what the message above is telling us. So, how can we use this? Is there
 a way to use this R version interactively?
 
 Yes, there is a way to use this R version boxed inside our Docker image
@@ -694,7 +694,7 @@ as a file. I’ll explain how later.
 The Rocker project offers many different images, which are described
 [here](https://rocker-project.org/images/)^[https://rocker-project.org/images/].
 We are going to be using the *versioned* images. These are images that ship
-specific versions of R. This way, it doesn't matter when the image gets build,
+specific versions of R. This way, it doesn't matter when the image gets built,
 the same version of R will be installed by getting built from source. Let me
 explain why building R from source is important. When we build the image from
 the Dockerfile we wrote before, R gets installed from the Ubuntu repositories.
@@ -882,7 +882,7 @@ and final step:
 
 This runs the `R` program from the Linux command line with the option `-e`. This
 option allows you to pass an `R` expression to the command line, which needs to
-be written between `""`. Using `R -e` will quickly become an habit, because this
+be written between `""`. Using `R -e` will quickly become a habit, because this
 is how you can run R non-interactively, from the command line. The expression we
 pass sets the working directory to `/home/housing`, and then we use
 `renv::init()` and `renv::restore()` to restore the packages from the
@@ -1086,7 +1086,7 @@ the R session in the right directory. So we move to the right directory, then we
 run the pipeline using `R -e "targets::tar_make()"`. Notice that we do both
 operations within a `RUN` statement. This means that the pipeline will run at
 build-time (remember, `RUN` statements run at build-time, `CMD` statements at
-run-time). In order words, the image will contain the outputs. This way, if the
+run-time). In other words, the image will contain the outputs. This way, if the
 build process and the pipeline take a long time to run, you can simply leave
 them running overnight for example. In the morning, while sipping on your
 coffee, you can then simply run the container to instantly get the outputs. This
@@ -1320,7 +1320,7 @@ By following these two rules, you should keep any issues to a minimum. When or
 if you need to update R and/or the package library on your machine, simply
 create a new Docker image that reflects these changes.
 
-However, if work in a field where operating system versions matter, then yes,
+However, if you work in a field where operating system versions matter, then yes,
 you should find a way to either use the dockerized environment for development,
 or you should install Ubuntu on your computer (the same version as in Docker of
 course).
@@ -1636,7 +1636,7 @@ needs mitigation, and thus a plan B. This plan B could be to host the images
 yourself, by saving them using `docker save`. Or you could even self-host an
 image registry (or lobby your employer/institution/etc to host a registry for
 its developers/data scientists/researchers). In any case, it's good to have
-options and now what potential risks using this technology entail.
+options and know what potential risks using this technology entail.
 
 ### Is Docker enough?
 

diff --git a/repro_intro.qmd b/repro_intro.qmd
@@ -216,7 +216,7 @@ can find in the `renv` folder. Let’s take a look at the contents of this folde
 
 ::: {.content-hidden when-format="pdf"}
 ```bash
-owner@localhost ➤ ls renv
+owner@localhost ➤ ls -la renv
 ```
 :::
 
@@ -575,7 +575,7 @@ The first problem, and I’m repeating myself here, is that `{renv}` only record
 the R version used for the project, but does not restore it when calling
 `renv::restore()`. You need to install the right R version yourself. On Windows
 this should be fairly easy to do, but then you need to start juggling R versions
-and know which scrips need which R version, which can get confusing.
+and know which scripts need which R version, which can get confusing.
 
 There is the `{rig}` package that makes it easy to install and switch between R
 versions that you could check