-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Compute grid info dplyr #961
Conversation
dplyr::mutate( | ||
.iter_config = purrr::map(data, make_iter_config), | ||
.model = purrr::map(data, ~ tibble::tibble(.iter_model = seq_len(nrow(.x)))), | ||
.num_models = purrr::map_int(.model, nrow) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is needed for res$.msg_model
(and is removed later)
dplyr::select(-.iter_preprocessor) %>% | ||
tidyr::unnest(cols = c(data, .model, .iter_config)) %>% | ||
dplyr::select(-.lab_pre) %>% | ||
dplyr::relocate(dplyr::starts_with(".iter")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This (and another relocate
below) should help with column ordering stability in case we refactor again.
res$.iter_preprocessor <- seq_len(nrow(res)) | ||
pp_df <- | ||
dplyr::distinct(res, !!!syms_pre) %>% | ||
dplyr::arrange(!!!syms_pre) %>% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will make the ordering of the preprocessors predictable. Previously, it would order them as-is. It's no big deal, but someone might wonder why deg_free = 10
is executed before deg_free=2
.
This is why there are several sort()
calls in the unit tests.
@@ -212,6 +221,12 @@ test_that("compute_grid_info - recipe and model (with and without submodels)", { | |||
list(trees = c(1L, 1000L)) | |||
) | |||
) | |||
expect_equal( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't be too paranoid about this.
@@ -185,25 +189,30 @@ test_that("compute_grid_info - recipe and model (with and without submodels)", { | |||
# use grid_regular to (partially) trigger submodel trick | |||
set.seed(1) | |||
param_set <- extract_parameter_set_dials(wflow) | |||
grid <- bind_rows(grid_regular(param_set), grid_space_filling(param_set)) | |||
grid <- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to add more tests for unbalanced grids but this one covers it well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome. This fixes bugs I had introduced in the previous refactor and I find it more readable.
We're just one format_with_padding()
away from being there, though it's proven tricky to situate that in the right place given that each of those .iter_config
s live inside of lists. I'm open to entertaining the argument that the padding is ultimately unneeded if you think that's the best way to move forward—those .iter_config
indices can be reasonably separated using just "Preprocessor" and "_Model" already. In the case we're fine with that padding going away, consider this an "Approve" merge once we've noted the change in NEWS. Otherwise, a bit more work to do.
I just adjusted the ref in tidymodels/extratests#227 to test this PR. I'm assuming it will fail with a format_with_padding()
snap change but otherwise run fine.
expect_equal(unique(res$.iter_model), 1:3) | ||
expect_equal( | ||
res$.iter_config[1:3], | ||
res$.iter_config[res$.iter_preprocessor == 1], | ||
list( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your new enumeration is correct (and fixes the case where I had broken previously), but it does look like we're missing a format_with_padding
still. When I run these tests with CRAN compute_grid_info()
, I see:
── Failure (test-grid_helpers.R:204:3): compute_grid_info - recipe and model (with and without submodels) ──
res$.iter_config[res$.iter_preprocessor == 1] (`actual`) not equal to list(...) (`expected`).
actual[[1]] | expected[[1]]
[1] "Preprocessor1_Model01" - "Preprocessor1_Model1" [1]
[2] "Preprocessor1_Model02" - "Preprocessor1_Model2" [2]
[3] "Preprocessor1_Model03" - "Preprocessor1_Model3" [3]
[4] "Preprocessor1_Model04" - "Preprocessor1_Model4" [4]
`actual[[2]]`: "Preprocessor1_Model05" "Preprocessor1_Model06" "Preprocessor1_Model07"
`expected[[2]]`: "Preprocessor1_Model5" "Preprocessor1_Model6" "Preprocessor1_Model7"
`actual[[3]]`: "Preprocessor1_Model08" "Preprocessor1_Model09" "Preprocessor1_Model10"
`expected[[3]]`: "Preprocessor1_Model8" "Preprocessor1_Model9" "Preprocessor1_Model10"
R/grid_helpers.R
Outdated
# Compute labels for the models *within* each preprocessing loop. | ||
num_submodels <- purrr::map_int(dat$.submodels, ~ length(unlist(.x))) | ||
num_models <- sum(num_submodels + 1) # +1 for the model being trained | ||
.mod_label <- paste0("Model", 1:num_models) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.mod_label <- paste0("Model", 1:num_models) | |
.mod_label <- paste0("Model", seq_len(num_models)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be curious if we could just add a format_with_padding()
here, but I think that actually needs to happen outside of the loop since a different preprocessor might be paired with models that enum up to the tens / hundreds place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's pretty easy with recipes::names0()
: 2a78da3#diff-f8b62de58f2adfdb08f139a17fd26e3ddb0b7a0ab33909f5bbdb6fdb93ba7923L333
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that this will work in make_iter_config()
; all the models within the preprocessor combination are present at that time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tidymodels/extratests#227 looks goo after the last commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I see passing tests with the CRAN version, too! Let's merge this in.
Closes #959.
A rework with Simon of #960
The grids are sorted in the new version, so there were some changes to the expected results.