Vignette on the add_*/remove_*/update_*, adjust_* and tidy functions #274

Merged
merged 6 commits into from
Jan 28, 2024
Changes from 2 commits
254 changes: 254 additions & 0 deletions vignettes/update.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,254 @@
---
title: "Using the add/update/remove and adjust functions"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Using the update and adjust functions}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
echo = TRUE,
collapse = TRUE,
comment = "#>",
out.width = "100%"
)
```

```{r setup, message=FALSE}
library(epipredict)
library(recipes)
Contributor

Suggested change
- library(recipes)
+ library(recipes)
+ library(dplyr)
+ library(workflows)

```

In this vignette, we will state the main goal of the add/update/remove and adjust functions and describe what part of the processing each function is intended for. We will then demonstrate how to use the sets of add/update/remove functions, followed by the adjust functions, and end with a brief discussion on the tidy methods for recipe and frosting objects.
Contributor

Can you add carriage returns so that this line breaks at 80 characters? Makes reviewing and editing much easier. Same for much of the below.


## Main goal of the add/update/remove and adjust functions

The primary goal of the update and adjust functions is to allow the user to modify a `step`, `layer`, `epi_recipe`, `frosting`, or a part of an `epi_workflow` so that they do not have to create a new object each time they wish to make a change to the pre-processing, fitting, or post-processing.
Contributor

for functions, it's better to end them with open+close parenthesis. I think here, these are all "objects", so it doesn't need that, but maybe clearer then to add "objects" in the sentence.


In the context of pre-processing, the goal of the update functions is to add/remove/update an `epi_recipe` or a step in it. For this, we have `add_epi_recipe`, `update_epi_recipe`, and `remove_epi_recipe` to add/update/remove an entire `epi_recipe` in an `epi_workflow` as well as `adjust_epi_recipe` to adjust a particular step in an `epi_recipe` or `epi_workflow` by the step number or name. For a model, one may `add_model`, `update_model`, or `remove_model` in an `epi_workflow`. For post-processing, where the goal is to update a frosting object or a layer in it, we have `add_frosting`, `remove_frosting`, and `update_frosting` to add/update/remove an entire `frosting` object in an `epi_workflow` as well as `adjust_frosting` to adjust a particular layer in a `frosting` or `epi_workflow` by its number or name. A summary of the function uses by processing step is shown by the following table:
Contributor

For example:

Suggested change
- In the context of pre-processing, the goal of the update functions is to add/remove/update an `epi_recipe` or a step in it. For this, we have `add_epi_recipe`, `update_epi_recipe`, and `remove_epi_recipe` to add/update/remove an entire `epi_recipe` in an `epi_workflow` as well as `adjust_epi_recipe` to adjust a particular step in an `epi_recipe` or `epi_workflow` by the step number or name. For a model, one may `add_model`, `update_model`, or `remove_model` in an `epi_workflow`. For post-processing, where the goal is to update a frosting object or a layer in it, we have `add_frosting`, `remove_frosting`, and `update_frosting` to add/update/remove an entire `frosting` object in an `epi_workflow` as well as `adjust_frosting` to adjust a particular layer in a `frosting` or `epi_workflow` by its number or name. A summary of the function uses by processing step is shown by the following table:
+ In the context of pre-processing, the goal of the update functions is to add/remove/update an
+ `epi_recipe` or a step in it. For this, we have `add_epi_recipe()`, `update_epi_recipe()`, and `remove_epi_recipe()` to alter an entire `epi_recipe` in an `epi_workflow` as well as
+ `adjust_epi_recipe()` to alter the arguments to a particular step in an `epi_recipe` or `epi_workflow` by the step number or name.
+ For a model, one may `add_model()`, `update_model()`, or `remove_model()` in an `epi_workflow`.
+ For post-processing, where the goal is to update a frosting object or a layer in it, we have `add_frosting()`, `remove_frosting()`, and `update_frosting()` to alter the entire `frosting` object in an `epi_workflow` as
+ well as `adjust_frosting()` to alter the arguments to a particular layer in a `frosting` or `epi_workflow` by
+ its number or name.
+ A summary of the function uses by processing step is shown by the following table:


| | Add/update/remove functions | adjust functions |
|----------------------------|------------------------------------------------------------|---------------------|
| Pre-processing | `add_epi_recipe`, `update_epi_recipe`, `remove_epi_recipe` | `adjust_epi_recipe` |
Contributor

more functions.

| Model specification | `add_model`, `update_model`, `remove_model` | |
| Post-processing | `add_frosting`, `remove_frosting`, `update_frosting` | `adjust_frosting` |

Since adding/removing/updating frosting as well as adjusting a layer in a `frosting` object proceeds in the same way as performing those tasks on an `epi_recipe`, we will focus on implementing those for an `epi_recipe` in this vignette and only briefly go through some examples for a `frosting` object.

## Add/update/remove an `epi_recipe` in an `epi_workflow`

We start with the built-in `case_death_rate_subset` dataset that contains JHU daily COVID-19 cases and deaths by state and take a subset of it from Nov. 1, 2021 to Dec. 31, 2021 for the four states of Alaska, California, New York, and South Carolina.

```{r}
jhu <- case_death_rate_subset %>%
dplyr::filter(time_value >= as.Date("2021-11-01"), geo_value %in% c("ak", "ca", "ny", "sc"))
Contributor

Suggested change
- dplyr::filter(time_value >= as.Date("2021-11-01"), geo_value %in% c("ak", "ca", "ny", "sc"))
+ filter(time_value >= as.Date("2021-11-01"), geo_value %in% c("ak", "ca", "ny", "sc"))


jhu
```

Then, we construct a simple `epi_recipe` named `r`, where we lag the death rates by 0, 7, and 14 days, lead the death rate by 14 days, omit NA values in all predictors and then in all outcomes (and set `skip = TRUE` to skip over this processing of the outcome variable when the recipe is baked).

```{r}
r <- epi_recipe(jhu) %>%
step_epi_lag(death_rate, lag = c(0, 7, 14)) %>%
step_epi_ahead(death_rate, ahead = 14) %>%
step_naomit(all_predictors()) %>%
step_naomit(all_outcomes(), skip = TRUE)
```

We add this recipe to an `epi_workflow` object by inputting `r` into the `add_epi_recipe` function:

```{r}
wf <- epi_workflow() %>%
add_epi_recipe(r)

wf
```

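We may then go on to specify a linear model and fit the `epi_workflow`. Here we build the workflow in a single call by passing both the recipe and the model to `epi_workflow()`: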
```{r}
# Fit a linear model
wf <- epi_workflow(r, parsnip::linear_reg()) %>% fit(jhu)

wf
```

At this stage, suppose we decide to overhaul our recipe so that we have a different set of pre-processing steps or we want to make multiple changes to existing steps, but we desire to keep the remainder of the `epi_workflow` the same. We can use the `update_epi_recipe` function to trade our current recipe `r` for another recipe `r2` in `wf` as follows:

```{r}
r2 <- epi_recipe(jhu) %>%
step_epi_lag(death_rate, lag = c(0, 1, 7, 14)) %>%
step_epi_lag(case_rate, lag = c(0:7, 14)) %>%
step_epi_ahead(death_rate, ahead = 7) %>%
step_epi_naomit()

wf <- update_epi_recipe(wf, r2)
wf
```
You can see that the output of `wf` depicts the sequence of steps in `r2` instead of `r`, which indicates that the update was successful.

A longer approach to achieve the same end is to use `remove_epi_recipe` to remove the old recipe and then `add_epi_recipe` to add the new one. Under the hood, the `update_epi_recipe` function operates in this way.
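
As a rough sketch of that longer route, the chunk below removes the recipe from `wf` and then adds `r2` back in a second step. It assumes `wf` and `r2` as defined in the chunks above; `wf_two_step` is a name introduced here purely for illustration, so `wf` itself is left unchanged.

```{r}
# Illustrative two-step equivalent of update_epi_recipe(wf, r2).
# The result is stored under a new name so that `wf` is not modified.
wf_two_step <- wf %>%
  remove_epi_recipe() %>%
  add_epi_recipe(r2)

# The steps listed here should match those of r2.
tidy(workflows::extract_preprocessor(wf_two_step))
```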

The `add_epi_recipe` and `remove_epi_recipe` functions offload to the `workflows` versions of the functions (`workflows::add_recipe()` and `workflows::remove_recipe()`) as much as possible. The main reason for using the `epipredict` version is so that we ensure that we do not lose the `epi_workflow` class.
Contributor

I prefer package names to go in curly braces. (Good use of paren on the function names here.)

Suggested change
- The `add_epi_recipe` and `remove_epi_recipe` functions offload to the `workflows` versions of the functions (`workflows::add_recipe()` and `workflows::remove_recipe()`) as much as possible. The main reason for using the `epipredict` version is so that we ensure that we do not lose the `epi_workflow` class.
+ The `add_epi_recipe()` and `remove_epi_recipe()` functions offload to the `{workflows}` versions of the
+ functions as much as possible. The main reason for using the `{epipredict}` version is so that we ensure
+ that we retain the `epi_workflow` class.


To see this, let's look at what happens if we remove our current `epi_recipe` using `workflows::remove_recipe` and then inspect the class of `wf`:

```{r}
wf %>% class() # class before
workflows::remove_recipe(wf) %>% class() # class after removing recipe using workflows function
Contributor

Suggested change
- workflows::remove_recipe(wf) %>% class() # class after removing recipe using workflows function
+ workflows::remove_recipe(wf) %>% class() # class after removing recipe using workflows function

```

We can observe that the result is no longer both an `epi_workflow` and a `workflow`; it has been demoted to a plain `workflow`. While all `epi_workflow`s are `workflow`s, not all `workflow`s are `epi_workflow`s, meaning that there may be compatibility issues and limitations to the tools that may be used from the `epipredict` package on a plain `workflow` object.
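
One way to make this check explicit is with `inherits()`. The sketch below assumes the `wf` object from above and compares the `{workflows}` removal function with the `{epipredict}` one. Neither result is assigned, so `wf` is not modified.

```{r}
# Does the result still carry the epi_workflow class after each kind of removal?
inherits(workflows::remove_recipe(wf), "epi_workflow") # {workflows} version
inherits(remove_epi_recipe(wf), "epi_workflow") # {epipredict} version
```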

Now, while we checked what happens to the above `epi_recipe` if we remove it, note that we did not actually store that change to `wf` (using the assignment operator `<-`). Hence, our `epi_workflow` remains unchanged.
Contributor

Suggested change
- Now, while we checked what happens to the above `epi_recipe` if we remove it, note that we did not actually store that change to `wf` (using the assignment operator `<-`). Hence, our `epi_workflow` remains unchanged.
+ Now, while we checked what happens to the above `epi_recipe` if we remove it, note that we did not actually store that change to `wf`.


```{r}
wf
```

One thing to notice about this workflow output is that the model fit remains the same as when we had `r` as the recipe. This illustrates an important point: any operations performed using the old recipe are not updated automatically. So we should be careful to fit the model using the new recipe, `r2`. Similarly, if predictions were made using the old recipe, then they should be re-generated using the version of the `epi_workflow` that contains the updated recipe. We can use `update_model` to replace the model used in `wf`, and then fit as before:

```{r}
# fit linear model
wf <- update_model(wf, parsnip::linear_reg()) %>% fit(jhu)
wf
```

Alternatively, we may use the `remove_model` followed by `add_model` combination for the same effect.
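
A minimal sketch of that two-step route, assuming `wf` and `jhu` from above, is shown below. The name `wf_alt` is introduced here only for illustration, so `wf` is left as is.

```{r}
# Remove the current model, add a fresh linear model specification, and refit.
wf_alt <- wf %>%
  remove_model() %>%
  add_model(parsnip::linear_reg()) %>%
  fit(jhu)

wf_alt
```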

## Add/update/remove a `frosting` object in an `epi_workflow`

We will now create a `frosting` object for post-processing predictions. In our initial frosting object, `f`, we simply implement predictions on the fitted `epi_workflow`:

```{r}
latest <- get_test_data(recipe = r2, x = jhu)

f <- frosting() %>%
layer_predict()

wf1 <- wf %>% add_frosting(f)
p1 <- predict(wf1, latest)
p1
```

Suppose we decide to augment our post-processing to include a threshold to enforce that the predictions are at least 0. As well, let's include the forecast and target dates as separate columns.

To update the `frosting` while leaving the remainder of the `epi_workflow` the same, we can use the `update_frosting` function as follows:

```{r}
# Update frosting in a workflow and predict
f2 <- frosting() %>%
layer_predict() %>%
layer_threshold(.pred) %>%
layer_add_forecast_date() %>%
layer_add_target_date()

wf2 <- wf1 %>% update_frosting(f2)
p2 <- predict(wf2, latest)
p2
```

Internally, this works by removing the old frosting followed by adding the new frosting, just like when we update a recipe or model.

```{r}
update_frosting
```

If we decide that we do not want the `frosting` post-processing at all, we can remove the `frosting` object from the workflow and make predictions as follows:

```{r}
wf3 <- wf2 %>% remove_frosting()
p3 <- predict(wf3, latest)
p3
```
You can see that the above results from `p3` are the same as from `p1`, when we simply have a prediction layer in the `frosting` post-processing container.
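
To check this programmatically rather than by eye, one option is to compare the prediction columns directly. This sketch assumes that `p1` and `p3` from the chunks above both contain a `.pred` column, as in the printed output.

```{r}
# all.equal() returns TRUE when the two prediction vectors agree
# (up to numerical tolerance), and a description of the differences otherwise.
all.equal(p1$.pred, p3$.pred)
```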

## Adjust a single step of an `epi_recipe`

Suppose that we just want to change a single step in an `epi_recipe` (that is either standalone or a part of an `epi_workflow`). Instead of replacing an entire `epi_recipe`, we can use the `adjust_epi_recipe` function. In this function, the step to be adjusted is indicated by either the step number or name in the `which_step` parameter. Then, the parameter name and update value must be inputted as `...`.

For instance, suppose that we decide to lead the `death_rate` by 14 days instead of 7. We may adjust this step in the recipe of `wf` by setting `which_step` to the step number in the order of operations, which can be obtained by inspecting `r2` or the tidy summary of it:

```{r}
workflows::extract_preprocessor(wf) # step_epi_ahead is the third step in r2
tidy(workflows::extract_preprocessor(wf)) # tidy tibble summary of r2
Comment on lines +248 to +249
Contributor

Suggested change
- workflows::extract_preprocessor(wf) # step_epi_ahead is the third step in r2
- tidy(workflows::extract_preprocessor(wf)) # tidy tibble summary of r2
+ extract_preprocessor(wf) # step_epi_ahead is the third step in r2
+ tidy(extract_preprocessor(wf)) # tidy tibble summary of r2


wf <- wf %>% adjust_epi_recipe(which_step = 3, ahead = 14)
```

Alternatively, we may adjust that step by name by specifying the full name of the step, `step_epi_ahead`, in `which_step`:

```{r}
wf %>% adjust_epi_recipe(which_step = "step_epi_ahead", ahead = 14) # result not assigned to wf since it is the same as above
```

If there are at least two steps in a recipe that share the same name, specifying the name in `which_step` will throw an error as `adjust_epi_recipe` is not intended to be used to modify multiple steps at once. The way, then, to modify a step that has the same name as another is to indicate what number it is in the ordering of the steps. For example, in `r2` there are two steps named `step_epi_lag` - the first step where we lag the death rate, and the second where we lag the case rate. If we want to modify the lags for the `case_rate` variable, we would specify the step number of 2 in `which_step`.

```{r}
wf <- wf %>% adjust_epi_recipe(which_step = 2, lag = c(0, 1, 7, 14, 21))

workflows::extract_preprocessor(wf)
Contributor

Suggested change
- workflows::extract_preprocessor(wf)
+ extract_preprocessor(wf)

```
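
For contrast, the sketch below attempts the same adjustment by name. Since the recipe in `wf` has two steps named `step_epi_lag`, this call is expected to fail; it is wrapped in `try()` so that the error message is shown without halting the vignette. The lag values are only illustrative.

```{r}
# Ambiguous: the name matches both lag steps, so adjust_epi_recipe() should error.
try(adjust_epi_recipe(wf, which_step = "step_epi_lag", lag = c(0, 1, 7, 14, 21)))
```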

We could adjust a recipe directly in the same way as we adjust a recipe in a workflow. The main difference is that we would not input `wf` as the first argument to `adjust_epi_recipe` but rather `r2`.

```{r}
adjust_epi_recipe(r2, which_step = 2, lag = c(0, 1, 7, 14, 21)) # should be same result as above
```

Note that when we adjust the `r2` object directly, we are not adjusting the recipe in the `epi_workflow`. That is, if we modify a step in `r2`, the change will not automatically transfer over to `wf`. We would need to modify the recipe in `wf` directly (`adjust_epi_recipe` on `wf`) or update the recipe in `wf` with a new `epi_recipe` that has undergone the adjustment (using `update_epi_recipe`):

```{r}
r2 <- adjust_epi_recipe(r2, which_step = 2, lag = 0:21)

workflows::extract_preprocessor(wf)
Contributor

Suggested change
- workflows::extract_preprocessor(wf)
+ extract_preprocessor(wf)

Contributor Author

I think I'm going to keep the workflows:: for now because it's a quick and clear way to indicate what functions I used are from that package & I think it's good to lay that out somewhere (because otherwise it can be hard to tell for a beginner which functions are from that package and which are from ours - because it's not exceedingly common).

```
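
A sketch of the second option, assuming `wf` and the adjusted `r2` from above, is to pass the adjusted recipe to `update_epi_recipe()`. The result is stored under the illustrative name `wf_updated`. As noted earlier, an existing model fit is not refreshed by this, so the workflow would need to be fit again before predicting.

```{r}
# Swap the adjusted r2 into the workflow, then inspect the lags of step 2.
wf_updated <- update_epi_recipe(wf, r2)

workflows::extract_preprocessor(wf_updated)
```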

## Adjust a single layer of a `frosting`

Adjusting a layer of a `frosting` object proceeds in the same way as adjusting a step in an `epi_recipe` does. So if we want to change a single layer in a `frosting` (that is either in a standalone object or part of an `epi_workflow`), we can use the `adjust_frosting` function wherein the layer to be adjusted is indicated by either its number or name in the `which_layer` parameter. In addition, the argument name and update value must be inputted as `...`.

Let's work with the frosting object directly instead of working on it through the `epi_workflow` in a simple, illustrative example. Recall frosting `f2` which has the following layers:

```{r}
f2
```

Suppose that we decide to change the upper bound of the prediction threshold to 10 instead of `Inf`. We can adjust this layer in the frosting object by setting `which_layer` to the layer number, 2 (which can be found by inspecting `f2` or `tidy(f2)`):

```{r}
f2 <- f2 %>% adjust_frosting(which_layer = 2, upper = 10)

f2
```

Alternatively, we may adjust that layer by specifying its full name, `layer_threshold`, in `which_layer`, to achieve the same result:

```{r}
f2 %>% adjust_frosting(which_layer = "layer_threshold", upper = 10) # result not assigned to f2 since it is the same as above
```
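
The same adjustment can be made to a `frosting` that sits inside an `epi_workflow`. The sketch below assumes the workflow `wf2` from the earlier section, whose frosting contains a `layer_threshold`, and uses `extract_frosting()` to inspect the result; `wf2_adj` is an illustrative name.

```{r}
# Adjust the threshold layer of the frosting held by the workflow.
wf2_adj <- wf2 %>% adjust_frosting(which_layer = "layer_threshold", upper = 10)

extract_frosting(wf2_adj)
```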

## On the tidy method to inspect an `epi_recipe` or a `frosting` object

The tidy method, when used on an `epi_recipe`, will return a data frame that contains specific overview information about the recipe including the operation number, the operation class (either "step" or "check"), the type of method, a boolean value to indicate whether `prep()` has been used to estimate the operation, a boolean value to indicate whether the step is applied when `bake()` is called, and the id of the operation.

```{r}
tidy(r2)
```

In contrast, printing the `epi_recipe` object shows the inputs (number and roles of the variables) as well as the ordering and a brief written summary of the operations:

```{r}
r2
```

This same general structure persists when we compare the printed output of a frosting object to its tidy tibble. However, the recipe-specific parts are absent: there are no roles in the printed output, and no trained and skip columns in the tidy tibble. Thus, both outputs for a frosting object are simpler than those for an `epi_recipe`.

```{r}
f

tidy(f)
```
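
To see the difference between the two tidy summaries at a glance, one could compare their column names. This is only a sketch, assuming `r2` and `f` as defined above.

```{r}
# Columns that appear in the recipe summary but not in the frosting summary.
setdiff(names(tidy(r2)), names(tidy(f)))
```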