Merge branch 'master' into master
b-rodrigues authored Oct 2, 2023
2 parents ab40a39 + e9ac3d7 commit bca569e
Showing 15 changed files with 1,125 additions and 219 deletions.
5 changes: 2 additions & 3 deletions fprog.qmd
@@ -484,10 +484,10 @@ sqrt(-5)
This only raises a warning and returns `NaN` (Not a Number). This can be quite
dangerous, especially when working non-interactively, which is what we will be
doing a lot later on. It is much better if a pipeline fails early due to an
error, than dragging a `NaN` value. This also happens with `sqrt()`:
error, than dragging a `NaN` value. This also happens with `log10()`:

```{r}
sqrt(-10)
log10(-10)
```
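
As a minimal sketch of what failing early could look like (the wrapper name `strict_log10()` is hypothetical and not taken from the book), the warning can be promoted to an error with `tryCatch()`:

```{r}
# Hypothetical wrapper: turn the warning emitted by log10() into an error
strict_log10 <- function(x){
  tryCatch(
    log10(x),
    warning = function(w) stop("log10() failed: ", conditionMessage(w), call. = FALSE)
  )
}

# Valid input behaves like log10()
strict_log10(100)

# strict_log10(-10) would now stop with an error instead of returning NaN
```

With such a wrapper, a non-interactive pipeline stops at the offending call instead of carrying the `NaN` forward.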

So it could be useful to redefine these functions to raise an error instead, for
@@ -705,7 +705,6 @@ fact_iter <- function(n){
result = 1
for(i in 1:n){
result = result * i
i = i + 1
}
result
}
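
The removed line `i = i + 1` had no effect on the result: a `for` loop in R assigns the next element of `1:n` to `i` at the start of every iteration, so any change made to `i` inside the body is overwritten. A related edge case, sketched below under the assumption that the function should return 1 for `n = 0`, is that `1:0` iterates over `c(1, 0)`. The hypothetical variant `fact_iter_safe()` (not part of the book) uses `seq_len()` instead:

```{r}
# Hypothetical variant of fact_iter(): seq_len(0) is empty, so the loop is skipped for n = 0
fact_iter_safe <- function(n){
  result <- 1
  for(i in seq_len(n)){
    result <- result * i
  }
  result
}

fact_iter_safe(5)
fact_iter_safe(0)
```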
126 changes: 63 additions & 63 deletions intro.qmd

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions lit_prog.qmd
@@ -753,7 +753,7 @@ create this. This is this function:
```{r}
return_section <- function(dataset, var){
a <- knitr::knit_expand(text = c(
## Frequency table for variable: {{variable}}",
"## Frequency table for variable: {{variable}}",
create_table(dataset, var)),
variable = var)
cat(a, sep = "\n")
@@ -984,7 +984,7 @@ that I recommend tick the following two important boxes:
- Work the same way regardless of output format (Word, PDF or HTML);
- Work for any type of table: summary tables, regression tables, two-way tables, etc.

Let's start with the simplest type of table, which would is a table that simply
Let's start with the simplest type of table, which would be a table that simply
shows some rows of data. `{knitr}` comes with the `kable()` function, but this
function generates a very plain looking output. For something
publication-worthy, we recommend the `{flextable}` package, developed by
16 changes: 9 additions & 7 deletions packages.qmd
@@ -39,7 +39,7 @@ But you could just as well start directly with an empty `{fusen}` package
template, and then start your analysis from there. Package development with
`{fusen}` is simply writing RMarkdown code.

## Benefits of
## Benefits of packages

Let’s first go over the benefits of turning your analysis into a package once
again, as this is crucial.
@@ -466,7 +466,7 @@ to load the needed functions and data.
Next step, move the functions `get_laspeyeres()` and `make_plot()` from
`analyse_data.Rmd` to `save_data.Rmd`. Simply cut and paste these functions from
one `.Rmd` to the other. Make sure `save_data.Rmd` looks something like
[this](https://gist.githubusercontent.com/b-rodrigues/16952727d35355bf3b9cbd5f37843c20/raw/7e5dc76560f6cbe7281862dc7b6a2f00f79b485b/save_data.Rmd)^[https://is.gd/SpzL88],
[this](https://raw.githubusercontent.com/b-rodrigues/rap4all/master/rmds/save_data_fusen.Rmd)^[https://is.gd/fusen_save_data];
take a look at the end of the script to find the functions we’ve moved over. The
`analyse_data.Rmd` script is exactly the same, minus the functions that we’ve
just moved over.
@@ -478,8 +478,9 @@ like](https://is.gd/anRjt4)^[https://is.gd/anRjt4] (no worries, I’m going to
explain how I got there). For consistency with your future use of {fusen}, you could also rename the `save_data.Rmd` to `flat_save_data.Rmd`, although this won't prevent {fusen} from working properly.

Let’s start with the first function, `get_raw_data()`. If you compare the
[before](https://gist.githubusercontent.com/b-rodrigues/16952727d35355bf3b9cbd5f37843c20/raw/7e5dc76560f6cbe7281862dc7b6a2f00f79b485b/save_data.Rmd)^[https://is.gd/n3m6In]
and [after](https://is.gd/anRjt4)^[https://is.gd/anRjt4], the differences are
[before](https://raw.githubusercontent.com/b-rodrigues/rap4all/master/rmds/save_data_fusen.Rmd)^[https://is.gd/fusen_save_data]
and [after](https://raw.githubusercontent.com/b-rodrigues/rap4all/master/rmds/flat_save_data.Rmd)^[https://is.gd/inflate_ready_save_data], the differences are
that we have named the chunk containing the function, `function-get_raw_data`
and added documentation in the form of `{roxygen2}` comments. Naming the chunks
is essential: this is how `{fusen}` knows that this chunk contains a function
@@ -690,6 +691,7 @@ is that I’ve added examples:
````{verbatim}
```{r examples-get_laspeyeres, eval = FALSE}
#' \dontrun{
#' country_level_data_laspeyeres <- get_laspeyeres_index(country_level_data)
#' commune_level_data_laspeyeres <- get_laspeyeres(commune_level_data)
#' }
```
@@ -729,7 +731,7 @@ Something important to notice as well: my fusen-ready `.Rmd` file is simply
called `save_data.Rmd`, while the generated, inflated file, that will be part of
the package under the `vignettes/` folder is called `dev-save_data.Rmd`.

When you inflate you a flat file into a package, the R console will be verbose.
When you inflate a flat file into a package, the R console will be verbose.
The output lists all files that are created or modified, followed by a long list
of checks that run automatically. These checks are the output of `devtools::check()`, which
is included inside `fusen::inflate()`. This function verifies that your package,
@@ -847,7 +849,7 @@ It is also possible to install the package from a specific branch:

```{r, eval = F}
remotes::install_github(
"github_username/repository_name@repo_name"
"github_username/repository_name@branch_name"
)
```

@@ -856,7 +858,7 @@ commit:

```{r, eval = F}
remotes::install_github(
"github_username/repository_name@repo_name",
"github_username/repository_name@branch_name",
ref = "commit_hash"
)
```
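
As far as I can tell from the `{remotes}` documentation, a reference embedded in the repository string takes precedence over the `ref` argument when both are supplied. A sketch that pins a commit without that ambiguity, using the same placeholder names as above, would leave the branch out of the string:

```{r, eval = F}
# Placeholders only: replace the user name, repository and hash with real values
remotes::install_github(
  "github_username/repository_name",
  ref = "commit_hash"
)
```

Since a commit hash already identifies a unique state of the repository, naming the branch as well is not needed to install it.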
21 changes: 13 additions & 8 deletions project_rewrite.qmd
@@ -220,6 +220,7 @@ read_clean <- function(..., sheet){
) |>
mutate(locality = str_trim(locality)) |>
select(year, locality, n_offers, starts_with("average"))
}
```

@@ -329,7 +330,7 @@ We’re now scraping data from Wikipedia of former Luxembourgish communes:
```{r}
get_former_communes <- function(
url = "https://w.wiki/_wFe7",
url = "https://is.gd/lux_former_communes",
min_year = 2009,
table_position = 3
){
@@ -351,14 +352,18 @@ We can scrape current communes:
```{r}
get_current_communes <- function(
url = "https://w.wiki/6nPu",
table_position = 1
url = "https://is.gd/lux_communes",
table_position = 2
){
read_html(url) %>%
html_table() %>%
pluck(table_position) %>%
clean_names()
read_html(url) |>
html_table() |>
pluck(table_position) |>
clean_names() |>
filter(name_2 != "Name") |>
rename(commune = name_2) |>
mutate(commune = str_remove(commune, " .$"))
}
```
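
The pattern `" .$"` used in the last step of the new version matches a space followed by one final character, so it strips a trailing one-character marker from a name and leaves other names untouched. A small sketch with made-up input (the names and the `a` marker below are illustrative, not taken from the Wikipedia table):

```{r}
library(stringr)

# Made-up commune names; the second one carries a one-character trailing marker
communes <- c("Luxembourg", "Esch-sur-Alzette a")

# Remove a space plus a single character at the end of the string, if present
cleaned <- str_remove(communes, " .$")
cleaned
```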
@@ -388,7 +393,7 @@ get_test_communes <- function(former_communes, current_communes){
communes[which(communes == "Clemency")] <- "Clémency"
communes[which(communes == "Redange")] <- "Redange-sur-Attert"
communes[which(communes == "Erpeldange-sur-Sûre")] <- "Erpeldange"
communes[which(communes == "Luxembourg-City")] <- "Luxembourg"
communes[which(communes == "Luxembourg City")] <- "Luxembourg"
communes[which(communes == "Käerjeng")] <- "Kaerjeng"
communes[which(communes == "Petange")] <- "Pétange"
31 changes: 22 additions & 9 deletions project_start.qmd
@@ -383,23 +383,35 @@ there. For this, we need a list of communes from Luxembourg. [Thankfully,
Wikipedia has such a
list](https://en.wikipedia.org/wiki/List_of_communes_of_Luxembourg)^[https://w.wiki/6nPu].

Let's scrape and save this list:
An issue with scraping tables off the web is that they might change in the
future. It is therefore a good idea to save the page by right-clicking on it,
selecting "Save as", and then re-hosting it. I use GitHub Pages to re-host
the Wikipedia page above
[here](https://b-rodrigues.github.io/list_communes/)^[https://is.gd/lux_communes].
I now have full control of this page, and won't get any bad surprises if someone
decides to eventually update the original. Instead of re-hosting it, you could
simply save it as any other file of your project.

So let's scrape and save this list:

```{r}
current_communes <- "https://w.wiki/6nPu" |>
current_communes <- "https://is.gd/lux_communes" |>
rvest::read_html() |>
rvest::html_table() |>
purrr::pluck(1) |>
janitor::clean_names()
purrr::pluck(2) |>
janitor::clean_names() |>
dplyr::filter(name_2 != "Name") |>
dplyr::rename(commune = name_2)
```

We scrape the table from the Wikipedia page using `{rvest}`.
We scrape the table from the re-hosted Wikipedia page using `{rvest}`.
`rvest::html_table()` returns a list of tables from the Wikipedia page, and
then we use `purrr::pluck()` to keep the first table from the website, which is
then we use `purrr::pluck()` to keep the second table from the website, which is
what we need (I made the calls to the packages explicit, because you might not
be familiar with these packages). `janitor::clean_names()` transforms column
names written for human eyes into machine-friendly names (for example `Growth
rate in %` would be transformed to `growth_rate_in_percent`).
rate in %` would be transformed to `growth_rate_in_percent`) and then I use
the `{dplyr}` package for some further cleaning and renaming.
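
To see what each step does without depending on a live page, here is a small sketch that runs the same calls on a toy HTML file. The file contents are made up for illustration; only the function calls match the ones used above:

```{r}
# Toy page standing in for a locally saved copy of the Wikipedia page
page <- tempfile(fileext = ".html")
writeLines(c(
  "<html><body>",
  "<table><tr><th>Note</th></tr><tr><td>not the table we want</td></tr></table>",
  "<table><tr><th>Name</th><th>Growth rate in %</th></tr>",
  "<tr><td>Luxembourg</td><td>1.5</td></tr></table>",
  "</body></html>"
), page)

# html_table() returns one data frame per <table> element found in the page
tables <- rvest::read_html(page) |>
  rvest::html_table()

# Keep the second table and make its column names machine-friendly
communes <- tables |>
  purrr::pluck(2) |>
  janitor::clean_names()

names(communes)
```

Reading from a saved file like this is also what the "save it as any other file of your project" option amounts to: the path simply replaces the URL.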

Let’s see if we have all the communes in our data:

@@ -415,10 +427,11 @@ there’s also a less obvious reason; since 2010, several communes have merged
into new ones. So there are communes that are in our data in 2010 and
2011, but disappear from 2012 onwards. So we need to do several things: first,
get a list of all existing communes from 2010 onwards, and then, harmonise
spelling. Here again, we can use a list from Wikipedia:
spelling. Here again, we can use a list from Wikipedia, which I also re-host
on GitHub Pages to avoid problems in the future:

```{r}
former_communes <- "https://w.wiki/_wFe7" |>
former_communes <- "https://is.gd/lux_former_communes" |>
rvest::read_html() |>
rvest::html_table() |>
purrr::pluck(3) |>
16 changes: 8 additions & 8 deletions repro_cont.qmd
@@ -82,7 +82,7 @@ architecture with their Apple silicon CPUs (as of writing, the Mac Pro is the
only computer manufactured by Apple that doesn't use an Apple silicon CPU and
only because it was released in 2019) and it wouldn't surprise me if other
manufacturers follow suit and develop their own ARM CPUs. This means that
projects written today may not run anymore in the future, because of this
projects written today may not run anymore in the future, because of these
architecture changes. Libraries compiled for current architectures would need to
be recompiled for ARM, and that may be difficult.

@@ -534,7 +534,7 @@ Google search (but I'm giving it to you, dear reader, for free).

Then come `RUN` statements. The first one uses Ubuntu's package manager to first
refresh the repositories (this ensures that our local Ubuntu installation
repositories are in synch with the latest software updates that were pushed to
repositories are in sync with the latest software updates that were pushed to
the central Ubuntu repos). Then we use Ubuntu's package manager to install
`r-base`. `r-base` is the package that installs R. We then finish this
Dockerfile by running `CMD ["R"]`. This is the command that will be executed
@@ -587,7 +587,7 @@ What is going on here? When you run a container, the command specified by `CMD`
gets executed, and then the container quits. So here, the container ran the
command `R`, which started the R interpreter, but then quit immediately. When
quitting R, users should specify if they want to save or not save the workspace.
This is what the message above is telling us. So, how can be use this? Is there
This is what the message above is telling us. So, how can we use this? Is there
a way to use this R version interactively?

Yes, there is a way to use this R version boxed inside our Docker image
@@ -694,7 +694,7 @@ as a file. I’ll explain how later.
The Rocker project offers many different images, which are described
[here](https://rocker-project.org/images/)^[https://rocker-project.org/images/].
We are going to be using the *versioned* images. These are images that ship
specific versions of R. This way, it doesn't matter when the image gets build,
specific versions of R. This way, it doesn't matter when the image gets built,
the same version of R will be installed by getting built from source. Let me
explain why building R from source is important. When we build the image from
the Dockerfile we wrote before, R gets installed from the Ubuntu repositories.
@@ -882,7 +882,7 @@ and final step:

This runs the `R` program from the Linux command line with the option `-e`. This
option allows you to pass an `R` expression to the command line, which needs to
be written between `""`. Using `R -e` will quickly become an habit, because this
be written between `""`. Using `R -e` will quickly become a habit, because this
is how you can run R non-interactively, from the command line. The expression we
pass sets the working directory to `/home/housing`, and then we use
`renv::init()` and `renv::restore()` to restore the packages from the
@@ -1086,7 +1086,7 @@ the R session in the right directory. So we move to the right directory, then we
run the pipeline using `R -e "targets::tar_make()"`. Notice that we do both
operations within a `RUN` statement. This means that the pipeline will run at
build-time (remember, `RUN` statements run at build-time, `CMD` statements at
run-time). In order words, the image will contain the outputs. This way, if the
run-time). In other words, the image will contain the outputs. This way, if the
build process and the pipeline take a long time to run, you can simply leave
them running overnight for example. In the morning, while sipping on your
coffee, you can then simply run the container to instantly get the outputs. This
@@ -1320,7 +1320,7 @@ By following these two rules, you should keep any issues to a minimum. When or
if you need to update R and/or the package library on your machine, simply
create a new Docker image that reflects these changes.

However, if work in a field where operating system versions matter, then yes,
However, if you work in a field where operating system versions matter, then yes,
you should find a way to either use the dockerized environment for development,
or you should install Ubuntu on your computer (the same version as in Docker of
course).
@@ -1636,7 +1636,7 @@ needs mitigation, and thus a plan B. This plan B could be to host the images
yourself, by saving them using `docker save`. Or you could even self-host an
image registry (or lobby your employer/institution/etc to host a registry for
its developers/data scientists/researchers). In any case, it's good to have
options and now what potential risks using this technology entail.
options and know what potential risks using this technology entails.

### Is Docker enough?

4 changes: 2 additions & 2 deletions repro_intro.qmd
@@ -216,7 +216,7 @@ can find in the `renv` folder. Let’s take a look at the contents of this folde

::: {.content-hidden when-format="pdf"}
```bash
owner@localhost ➤ ls renv
owner@localhost ➤ ls -la renv
```
:::

@@ -575,7 +575,7 @@ The first problem, and I’m repeating myself here, is that `{renv}` only record
the R version used for the project, but does not restore it when calling
`renv::restore()`. You need to install the right R version yourself. On Windows
this should be fairly easy to do, but then you need to start juggling R versions
and know which scrips need which R version, which can get confusing.
and know which scripts need which R version, which can get confusing.
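
Because the R version is recorded but not restored, it can help to compare it against the running session before calling `renv::restore()`. A minimal sketch, assuming the usual `renv.lock` layout with a top-level `R` entry holding a `Version` field, and using a made-up lockfile so that it runs anywhere:

```{r}
# Toy lockfile standing in for a project's renv.lock (contents are made up)
lockfile <- tempfile(fileext = ".lock")
writeLines(
  '{"R": {"Version": "4.2.2", "Repositories": []}, "Packages": {}}',
  lockfile
)

# R version recorded in the lockfile versus the one running this session
recorded <- jsonlite::fromJSON(lockfile)$R$Version
running <- as.character(getRversion())

if (recorded != running) {
  message("Lockfile expects R ", recorded, " but this session runs R ", running)
}
```

In a real project, `lockfile` would simply be the path to the project's own `renv.lock`.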

There is the `{rig}` package that makes it easy to install and switch between R
versions that you could check