Compute grid info dplyr #961

topepo · 2024-11-13T16:21:08Z

Closes #959.

A rework with Simon of #960

The grids are sorted in the new version, so there were some changes to the expected results.

topepo · 2024-11-13T16:21:55Z

R/grid_helpers.R

+    dplyr::mutate(
+      .iter_config = purrr::map(data, make_iter_config),
+      .model = purrr::map(data, ~ tibble::tibble(.iter_model = seq_len(nrow(.x)))),
+      .num_models = purrr::map_int(.model, nrow)


This is needed for res$.msg_model (and is removed later)

topepo · 2024-11-13T16:22:59Z

R/grid_helpers.R

+    dplyr::select(-.iter_preprocessor) %>%
+    tidyr::unnest(cols = c(data, .model, .iter_config)) %>%
+    dplyr::select(-.lab_pre) %>%
+    dplyr::relocate(dplyr::starts_with(".iter"))


This (and another relocate below) should help with column ordering stability if we refactor again.
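For readers unfamiliar with the idiom, here is a minimal sketch of what the relocate() call does. The toy tibble and its column values are made up for illustration and are not the real compute_grid_info() output.

```r
library(dplyr)

# Hypothetical result with the .iter columns scattered among the parameters.
res <- tibble::tibble(
  deg_free = c(2L, 2L),
  .iter_preprocessor = c(1L, 1L),
  trees = c(1L, 1000L),
  .iter_model = 1:2
)

# relocate() moves the selected columns to the front by default and keeps
# their relative order, so the layout does not depend on upstream steps.
res <- dplyr::relocate(res, dplyr::starts_with(".iter"))
names(res)
```

Pinning the .iter columns to the front means a later refactor can add or reorder intermediate columns without changing where these land.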

topepo · 2024-11-13T16:25:14Z

R/grid_helpers.R

-    res$.iter_preprocessor <- seq_len(nrow(res))
+    pp_df <-
+      dplyr::distinct(res, !!!syms_pre) %>%
+      dplyr::arrange(!!!syms_pre) %>%


This will make the ordering of the preprocessors predictable. Previously, they were ordered as-is. It's no big deal, but someone might wonder why deg_free = 10 is executed before deg_free = 2.

This is why there are several sort() calls in the unit tests.
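A small sketch of the distinct-then-arrange pattern, assuming a single preprocessing parameter named deg_free. The hunk above is cut off after arrange(), so the final mutate() that assigns the index is an assumption for illustration, not the code in the PR.

```r
library(dplyr)

# Hypothetical grid whose preprocessor values arrive out of order.
grid <- tibble::tibble(
  deg_free = c(10L, 2L, 10L, 2L),
  trees    = c(1L, 1L, 1000L, 1000L)
)

# Symbols for the preprocessor parameters (assumed names for this sketch).
syms_pre <- rlang::syms("deg_free")

pp_df <- grid %>%
  dplyr::distinct(!!!syms_pre) %>%
  dplyr::arrange(!!!syms_pre) %>%
  # Assumed follow-up step: number the sorted preprocessors.
  dplyr::mutate(.iter_preprocessor = dplyr::row_number())

pp_df
```

With the sort in place, deg_free = 2 gets index 1 and deg_free = 10 gets index 2, regardless of the row order in the input grid.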

topepo · 2024-11-13T16:26:13Z

tests/testthat/test-grid_helpers.R

@@ -212,6 +221,12 @@ test_that("compute_grid_info - recipe and model (with and without submodels)", {
      list(trees = c(1L, 1000L))
    )
  )
+  expect_equal(


We can't be too paranoid about this.

topepo · 2024-11-13T16:27:46Z

tests/testthat/test-grid_helpers.R

@@ -185,25 +189,30 @@ test_that("compute_grid_info - recipe and model (with and without submodels)", {
  # use grid_regular to (partially) trigger submodel trick
  set.seed(1)
  param_set <- extract_parameter_set_dials(wflow)
-  grid <- bind_rows(grid_regular(param_set), grid_space_filling(param_set))
+  grid <-


I was going to add more tests for unbalanced grids but this one covers it well.

simonpcouch

Awesome. This fixes bugs I had introduced in the previous refactor and I find it more readable.

We're just one format_with_padding() away from being there, though it's proven tricky to situate that in the right place given that each of those .iter_config values lives inside a list. I'm open to the argument that the padding is ultimately unneeded if you think that's the best way forward; those .iter_config indices can already be reasonably separated using just "Preprocessor" and "_Model". If we're fine with the padding going away, consider this an "Approve" once we've noted the change in NEWS. Otherwise, there's a bit more work to do.

I just adjusted the ref in tidymodels/extratests#227 to test this PR. I'm assuming it will fail with a format_with_padding() snap change but otherwise run fine.

simonpcouch · 2024-11-13T21:24:04Z

tests/testthat/test-grid_helpers.R

  expect_equal(unique(res$.iter_model), 1:3)
  expect_equal(
-    res$.iter_config[1:3],
+    res$.iter_config[res$.iter_preprocessor == 1],
    list(


Your new enumeration is correct (and fixes the case I had broken previously), but it does look like we're still missing a format_with_padding(). When I run these tests with the CRAN compute_grid_info(), I see:

── Failure (test-grid_helpers.R:204:3): compute_grid_info - recipe and model (with and without submodels) ──
res$.iter_config[res$.iter_preprocessor == 1] (`actual`) not equal to list(...) (`expected`).

actual[[1]]                 | expected[[1]]
[1] "Preprocessor1_Model01" - "Preprocessor1_Model1" [1]
[2] "Preprocessor1_Model02" - "Preprocessor1_Model2" [2]
[3] "Preprocessor1_Model03" - "Preprocessor1_Model3" [3]
[4] "Preprocessor1_Model04" - "Preprocessor1_Model4" [4]

`actual[[2]]`:   "Preprocessor1_Model05" "Preprocessor1_Model06" "Preprocessor1_Model07"
`expected[[2]]`: "Preprocessor1_Model5" "Preprocessor1_Model6" "Preprocessor1_Model7"

`actual[[3]]`:   "Preprocessor1_Model08" "Preprocessor1_Model09" "Preprocessor1_Model10"
`expected[[3]]`: "Preprocessor1_Model8" "Preprocessor1_Model9" "Preprocessor1_Model10"

simonpcouch · 2024-11-13T21:24:55Z

R/grid_helpers.R

+  # Compute labels for the models *within* each preprocessing loop.
+  num_submodels <- purrr::map_int(dat$.submodels, ~ length(unlist(.x)))
+  num_models <- sum(num_submodels + 1) # +1 for the model being trained
+  .mod_label <- paste0("Model", 1:num_models)


Suggested change

-  .mod_label <- paste0("Model", 1:num_models)
+  .mod_label <- paste0("Model", seq_len(num_models))
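For context on the suggestion, a quick base R illustration of how the two index forms differ when the count is zero. In this function num_models is presumably always at least 1 because of the + 1, so this is a defensive habit rather than a fix for an observed bug.

```r
n <- 0L

# The colon operator counts down when the right-hand side is below 1.
colon_idx <- 1:n

# seq_len() returns an empty integer vector for a count of zero.
safe_idx <- seq_len(n)

colon_idx
length(safe_idx)
```

Here colon_idx holds two values (1 and 0), while safe_idx is empty.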

Would be curious if we could just add a format_with_padding() here, but I think that actually needs to happen outside of the loop, since a different preprocessor might be paired with models that enumerate up to the tens or hundreds place.

It's pretty easy with recipes::names0(): 2a78da3#diff-f8b62de58f2adfdb08f139a17fd26e3ddb0b7a0ab33909f5bbdb6fdb93ba7923L333

I think that this will work in make_iter_config(); all the models within the preprocessor combination are present at that time.
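To make the padding discussion concrete, here is a rough base R sketch of labelling all models within one preprocessor at once, so the pad width comes from that preprocessor's total. label_models() is a hypothetical stand-in and not the make_iter_config() or format_with_padding() implementation in tune; it uses formatC() for the zero padding.

```r
# Hypothetical helper: build zero-padded labels for one preprocessor.
# `num_submodels` holds the submodel count for each grid row.
label_models <- function(num_submodels, pre_id = 1L) {
  per_row <- num_submodels + 1L          # +1 for the model being trained
  total <- sum(per_row)
  width <- nchar(as.character(total))    # pad to the widest index
  ids <- formatC(seq_len(total), width = width, flag = "0")
  labels <- paste0("Preprocessor", pre_id, "_Model", ids)
  # Split the labels back into one character vector per grid row.
  unname(split(labels, rep(seq_along(per_row), times = per_row)))
}

# Three grid rows with 3, 2, and 2 submodels: 10 models in total.
cfg <- label_models(c(3L, 2L, 2L))
cfg
```

Because the width is derived from the total of 10, the first label is "Preprocessor1_Model01" rather than "Preprocessor1_Model1", which is the shape the failing expectation above was looking for.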

tidymodels/extratests#227 looks good after the last commit.

simonpcouch

Nice, I see passing tests with the CRAN version, too! Let's merge this in.

topepo added 5 commits November 12, 2024 17:58

refactored compute_grid_info() using dplyr, purrr, and tidyr

38fe8a0

remove padding in .config

2a78da3

sort values for tests

a5d94aa

update test specification for different sorting

d719360

fix bug in the messages

7450c12

topepo commented Nov 13, 2024

update snapshots with new remotes

3935987

topepo requested a review from simonpcouch November 13, 2024 17:43

topepo marked this pull request as ready for review November 13, 2024 18:00

simonpcouch reviewed Nov 13, 2024

added padding back

cd1a79f

simonpcouch approved these changes Nov 14, 2024

topepo merged commit e16bb0d into main Nov 14, 2024
14 checks passed

topepo deleted the compute-grid-info-dplyr branch November 14, 2024 15:44
