Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compute grid info dplyr #961

Merged
merged 7 commits into from
Nov 14, 2024
Merged

Compute grid info dplyr #961

merged 7 commits into from
Nov 14, 2024

Conversation

topepo
Copy link
Member

@topepo topepo commented Nov 13, 2024

Closes #959.

A rework with Simon of #960

The grids are sorted in the new version, so there were some changes to the expected results.

dplyr::mutate(
.iter_config = purrr::map(data, make_iter_config),
.model = purrr::map(data, ~ tibble::tibble(.iter_model = seq_len(nrow(.x)))),
.num_models = purrr::map_int(.model, nrow)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed for res$.msg_model (and is removed later)

dplyr::select(-.iter_preprocessor) %>%
tidyr::unnest(cols = c(data, .model, .iter_config)) %>%
dplyr::select(-.lab_pre) %>%
dplyr::relocate(dplyr::starts_with(".iter"))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This (and another relocate below) should help with column ordering stability in case we refactor again.

res$.iter_preprocessor <- seq_len(nrow(res))
pp_df <-
dplyr::distinct(res, !!!syms_pre) %>%
dplyr::arrange(!!!syms_pre) %>%
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will make the ordering of the preprocessors predictable. Previously, it would order them as-is. It's no big deal, but someone might wonder why deg_free = 10 is executed before deg_free=2.

This is why there are several sort() calls in the unit tests.

@@ -212,6 +221,12 @@ test_that("compute_grid_info - recipe and model (with and without submodels)", {
list(trees = c(1L, 1000L))
)
)
expect_equal(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't be too paranoid about this.

@@ -185,25 +189,30 @@ test_that("compute_grid_info - recipe and model (with and without submodels)", {
# use grid_regular to (partially) trigger submodel trick
set.seed(1)
param_set <- extract_parameter_set_dials(wflow)
grid <- bind_rows(grid_regular(param_set), grid_space_filling(param_set))
grid <-
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to add more tests for unbalanced grids but this one covers it well.

@topepo topepo marked this pull request as ready for review November 13, 2024 18:00
Copy link
Contributor

@simonpcouch simonpcouch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome. This fixes bugs I had introduced in the previous refactor and I find it more readable.

We're just one format_with_padding() away from being there, though it's proven tricky to situate that in the right place given that each of those .iter_configs live inside of lists. I'm open to entertaining the argument that the padding is ultimately unneeded if you think that's the best way to move forward—those .iter_config indices can be reasonably separated using just "Preprocessor" and "_Model" already. In the case we're fine with that padding going away, consider this an "Approve" merge once we've noted the change in NEWS. Otherwise, a bit more work to do.

I just adjusted the ref in tidymodels/extratests#227 to test this PR. I'm assuming it will fail with a format_with_padding() snap change but otherwise run fine.

expect_equal(unique(res$.iter_model), 1:3)
expect_equal(
res$.iter_config[1:3],
res$.iter_config[res$.iter_preprocessor == 1],
list(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your new enumeration is correct (and fixes the case where I had broken previously), but it does look like we're missing a format_with_padding still. When I run these tests with CRAN compute_grid_info(), I see:

── Failure (test-grid_helpers.R:204:3): compute_grid_info - recipe and model (with and without submodels) ──
res$.iter_config[res$.iter_preprocessor == 1] (`actual`) not equal to list(...) (`expected`).

    actual[[1]]             | expected[[1]]             
[1] "Preprocessor1_Model01" - "Preprocessor1_Model1" [1]
[2] "Preprocessor1_Model02" - "Preprocessor1_Model2" [2]
[3] "Preprocessor1_Model03" - "Preprocessor1_Model3" [3]
[4] "Preprocessor1_Model04" - "Preprocessor1_Model4" [4]

`actual[[2]]`:   "Preprocessor1_Model05" "Preprocessor1_Model06" "Preprocessor1_Model07"
`expected[[2]]`: "Preprocessor1_Model5"  "Preprocessor1_Model6"  "Preprocessor1_Model7" 

`actual[[3]]`:   "Preprocessor1_Model08" "Preprocessor1_Model09" "Preprocessor1_Model10"
`expected[[3]]`: "Preprocessor1_Model8"  "Preprocessor1_Model9"  "Preprocessor1_Model10"

R/grid_helpers.R Outdated
# Compute labels for the models *within* each preprocessing loop.
num_submodels <- purrr::map_int(dat$.submodels, ~ length(unlist(.x)))
num_models <- sum(num_submodels + 1) # +1 for the model being trained
.mod_label <- paste0("Model", 1:num_models)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.mod_label <- paste0("Model", 1:num_models)
.mod_label <- paste0("Model", seq_len(num_models))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be curious if we could just add a format_with_padding() here, but I think that actually needs to happen outside of the loop since a different preprocessor might be paired with models that enum up to the tens / hundreds place.

Copy link
Member Author

@topepo topepo Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this will work in make_iter_config(); all the models within the preprocessor combination are present at that time.

Copy link
Member Author

@topepo topepo Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tidymodels/extratests#227 looks goo after the last commit.

Copy link
Contributor

@simonpcouch simonpcouch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I see passing tests with the CRAN version, too! Let's merge this in.

@topepo topepo merged commit e16bb0d into main Nov 14, 2024
14 checks passed
@topepo topepo deleted the compute-grid-info-dplyr branch November 14, 2024 15:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug in refactored compute_grid_info()
2 participants