Cannot save class predictions on fitted resamples when response variable name starts with 'id' #967

Open
gimholte opened this issue Nov 21, 2024 · 0 comments


I am working through a two-class classification workflow. My response variable is a two-level factor named idr_outcome. I tried to tune a boosted tree model with cross-validation via tune_grid. With the default control argument, every parameter combination fit successfully on the supplied resamples.

For later postprocessing, I need the out-of-sample predictions, so I changed the control argument to control = control_grid(save_pred = TRUE). With this change, tune_grid fails with an error.

I discovered that the error appears to be related to the name of my response variable idr_outcome. When I changed the name of the response variable to something else (that does not start with id), the tuning algorithm was able to save the predicted values. I created a reproducible example to explore this behavior. The issue persists when I revert to a simple logistic regression example that only uses fit_resamples.
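Until the cause is tracked down, the observation above suggests a workaround: rename the outcome so that it no longer starts with id before building the recipe. The following is only a minimal base R sketch of that idea. The helper name rename_id_prefixed and the out_ prefix are hypothetical (they are not part of tune or rsample), and the data frame is a small stand-in rather than two_class_dat.

```r
# Hypothetical helper (not from tune or rsample): prefix every column whose
# name starts with "id" so it no longer shares that prefix with the `id`
# column that appears in the resampling results.
rename_id_prefixed <- function(data, prefix = "out_") {
  hits <- startsWith(names(data), "id")
  names(data)[hits] <- paste0(prefix, names(data)[hits])
  data
}

# Small stand-in for the reprex data (assumed shape, not two_class_dat itself)
toy <- data.frame(
  A = c(1.2, 0.7, 2.1),
  B = c(0.3, 1.8, 0.9),
  id_outcome = factor(c("Class1", "Class2", "Class1"))
)

safe <- rename_id_prefixed(toy)
names(safe)
#> [1] "A"              "B"              "out_id_outcome"
```

The recipe formula would then reference the renamed column, for example out_id_outcome ~ A + B. This only sidesteps the error; it says nothing about why the name matters inside tune.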

You'll see in the reprex session info that I am using the (as of writing) latest development version of tune, but I originally encountered the error in the latest release version.

### in R/pull.R
pull_all_outcome_names <- function(resamples, res) {
  all_outcome_names <- purrr::map(res, ~ .x[[".all_outcome_names"]])
  print(resamples)
  resamples$.all_outcome_names <- all_outcome_names
  resamples
}

library(tidyverse)
library(tidymodels)
library(modeldata)
set.seed(98210)

example_dat <-
  two_class_dat %>%
  mutate(id_outcome = Class)

my_rec <- recipe(example_dat, id_outcome ~ A + B)

my_wf <-
  workflow() %>%
  add_model(logistic_reg()) %>%
  add_recipe(my_rec)

my_cv_folds <- vfold_cv(example_dat, v = 5)

# Fails when save_pred = TRUE
fit_resamples(
  my_wf,
  resamples = my_cv_folds,
  metrics = metric_set(mn_log_loss, roc_auc, sensitivity, specificity),
  control = control_resamples(save_pred = TRUE)
)
#> Error in `$<-` at tune/R/pull.R:93:3:
#> ! Assigned data `all_outcome_names` must be compatible with existing
#>   data.
#> ✖ Existing data has 10 rows.
#> ✖ Assigned data has 5 rows.
#> ℹ Only vectors of size 1 are recycled.
#> Caused by error in `vectbl_recycle_rhs_rows()`:
#> ! Can't recycle input of size 5 to size 10.

# Works when save_pred = FALSE
fit_resamples(
  my_wf,
  resamples = my_cv_folds,
  metrics = metric_set(mn_log_loss, roc_auc, sensitivity, specificity),
  control = control_resamples(save_pred = FALSE)
)
#> # Resampling results
#> # 5-fold cross-validation 
#> # A tibble: 5 × 4
#>   splits            id    .metrics         .notes          
#>   <list>            <chr> <list>           <list>          
#> 1 <split [632/159]> Fold1 <tibble [4 × 4]> <tibble [0 × 4]>
#> 2 <split [633/158]> Fold2 <tibble [4 × 4]> <tibble [0 × 4]>
#> 3 <split [633/158]> Fold3 <tibble [4 × 4]> <tibble [0 × 4]>
#> 4 <split [633/158]> Fold4 <tibble [4 × 4]> <tibble [0 × 4]>
#> 5 <split [633/158]> Fold5 <tibble [4 × 4]> <tibble [0 × 4]>

# Fit the same model with a new outcome name
working_rec <- recipe(example_dat, Class ~ A + B)
working_wf <- update_recipe(my_wf, working_rec)

# Works
fit_resamples(
  working_wf,
  resamples = my_cv_folds,
  metrics = metric_set(mn_log_loss, roc_auc, sensitivity, specificity),
  control = control_resamples(save_pred = TRUE)
)
#> # Resampling results
#> # 5-fold cross-validation 
#> # A tibble: 5 × 5
#>   splits            id    .metrics         .notes           .predictions      
#>   <list>            <chr> <list>           <list>           <list>            
#> 1 <split [632/159]> Fold1 <tibble [4 × 4]> <tibble [0 × 4]> <tibble [159 × 6]>
#> 2 <split [633/158]> Fold2 <tibble [4 × 4]> <tibble [0 × 4]> <tibble [158 × 6]>
#> 3 <split [633/158]> Fold3 <tibble [4 × 4]> <tibble [0 × 4]> <tibble [158 × 6]>
#> 4 <split [633/158]> Fold4 <tibble [4 × 4]> <tibble [0 × 4]> <tibble [158 × 6]>
#> 5 <split [633/158]> Fold5 <tibble [4 × 4]> <tibble [0 × 4]> <tibble [158 × 6]>

Created on 2024-11-21 with reprex v2.1.1

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.2 (2024-10-31 ucrt)
#>  os       Windows 10 x64 (build 19045)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_United States.utf8
#>  ctype    English_United States.utf8
#>  tz       America/Los_Angeles
#>  date     2024-11-21
#>  pandoc   3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package      * version    date (UTC) lib source
#>  backports      1.5.0      2024-05-23 [1] CRAN (R 4.4.0)
#>  broom        * 1.0.7      2024-09-26 [1] CRAN (R 4.4.1)
#>  class          7.3-22     2023-05-03 [1] CRAN (R 4.4.2)
#>  cli            3.6.3      2024-06-21 [1] CRAN (R 4.4.1)
#>  codetools      0.2-20     2024-03-31 [1] CRAN (R 4.4.2)
#>  colorspace     2.1-1      2024-07-26 [1] CRAN (R 4.4.1)
#>  data.table     1.16.2     2024-10-10 [1] CRAN (R 4.4.1)
#>  dials        * 1.3.0      2024-07-30 [1] CRAN (R 4.4.1)
#>  DiceDesign     1.10       2023-12-07 [1] CRAN (R 4.4.1)
#>  digest         0.6.37     2024-08-19 [1] CRAN (R 4.4.1)
#>  dplyr        * 1.1.4      2023-11-17 [1] CRAN (R 4.4.1)
#>  evaluate       1.0.1      2024-10-10 [1] CRAN (R 4.4.1)
#>  fansi          1.0.6      2023-12-08 [1] CRAN (R 4.4.1)
#>  fastmap        1.2.0      2024-05-15 [1] CRAN (R 4.4.1)
#>  forcats      * 1.0.0      2023-01-29 [1] CRAN (R 4.4.1)
#>  foreach        1.5.2      2022-02-02 [1] CRAN (R 4.4.1)
#>  fs             1.6.5      2024-10-30 [1] CRAN (R 4.4.1)
#>  furrr          0.3.1      2022-08-15 [1] CRAN (R 4.4.1)
#>  future         1.34.0     2024-07-29 [1] CRAN (R 4.4.1)
#>  future.apply   1.11.3     2024-10-27 [1] CRAN (R 4.4.1)
#>  generics       0.1.3      2022-07-05 [1] CRAN (R 4.4.1)
#>  ggplot2      * 3.5.1      2024-04-23 [1] CRAN (R 4.4.1)
#>  globals        0.16.3     2024-03-08 [1] CRAN (R 4.4.0)
#>  glue           1.8.0      2024-09-30 [1] CRAN (R 4.4.1)
#>  gower          1.0.1      2022-12-22 [1] CRAN (R 4.4.0)
#>  GPfit          1.0-8      2019-02-08 [1] CRAN (R 4.4.1)
#>  gtable         0.3.6      2024-10-25 [1] CRAN (R 4.4.1)
#>  hardhat        1.4.0.9002 2024-11-21 [1] Github (tidymodels/hardhat@aa7204b)
#>  hms            1.1.3      2023-03-21 [1] CRAN (R 4.4.1)
#>  htmltools      0.5.8.1    2024-04-04 [1] CRAN (R 4.4.1)
#>  infer        * 1.0.7      2024-03-25 [1] CRAN (R 4.4.1)
#>  ipred          0.9-15     2024-07-18 [1] CRAN (R 4.4.1)
#>  iterators      1.0.14     2022-02-05 [1] CRAN (R 4.4.1)
#>  knitr          1.48       2024-07-07 [1] CRAN (R 4.4.1)
#>  lattice        0.22-6     2024-03-20 [1] CRAN (R 4.4.2)
#>  lava           1.8.0      2024-03-05 [1] CRAN (R 4.4.1)
#>  lhs            1.2.0      2024-06-30 [1] CRAN (R 4.4.1)
#>  lifecycle      1.0.4      2023-11-07 [1] CRAN (R 4.4.1)
#>  listenv        0.9.1      2024-01-29 [1] CRAN (R 4.4.1)
#>  lubridate    * 1.9.3      2023-09-27 [1] CRAN (R 4.4.1)
#>  magrittr       2.0.3      2022-03-30 [1] CRAN (R 4.4.1)
#>  MASS           7.3-61     2024-06-13 [1] CRAN (R 4.4.2)
#>  Matrix         1.7-1      2024-10-18 [1] CRAN (R 4.4.2)
#>  modeldata    * 1.4.0      2024-06-19 [1] CRAN (R 4.4.1)
#>  munsell        0.5.1      2024-04-01 [1] CRAN (R 4.4.1)
#>  nnet           7.3-19     2023-05-03 [1] CRAN (R 4.4.2)
#>  parallelly     1.38.0     2024-07-27 [1] CRAN (R 4.4.1)
#>  parsnip      * 1.2.1.9003 2024-11-21 [1] Github (tidymodels/parsnip@a212f78)
#>  pillar         1.9.0      2023-03-22 [1] CRAN (R 4.4.1)
#>  pkgconfig      2.0.3      2019-09-22 [1] CRAN (R 4.4.1)
#>  prodlim        2024.06.25 2024-06-24 [1] CRAN (R 4.4.1)
#>  purrr        * 1.0.2      2023-08-10 [1] CRAN (R 4.4.1)
#>  R6             2.5.1      2021-08-19 [1] CRAN (R 4.4.1)
#>  Rcpp           1.0.13     2024-07-17 [1] CRAN (R 4.4.1)
#>  readr        * 2.1.5      2024-01-10 [1] CRAN (R 4.4.1)
#>  recipes      * 1.1.0.9001 2024-11-21 [1] Github (tidymodels/recipes@59345e1)
#>  reprex         2.1.1      2024-07-06 [1] CRAN (R 4.4.1)
#>  rlang          1.1.4      2024-06-04 [1] CRAN (R 4.4.1)
#>  rmarkdown      2.28       2024-08-17 [1] CRAN (R 4.4.1)
#>  rpart          4.1.23     2023-12-05 [1] CRAN (R 4.4.2)
#>  rsample      * 1.2.1.9000 2024-11-21 [1] Github (tidymodels/rsample@f799dba)
#>  rstudioapi     0.17.1     2024-10-22 [1] CRAN (R 4.4.1)
#>  scales       * 1.3.0      2023-11-28 [1] CRAN (R 4.4.1)
#>  sessioninfo    1.2.2      2021-12-06 [1] CRAN (R 4.4.2)
#>  sparsevctrs    0.1.0.9002 2024-11-21 [1] Github (r-lib/sparsevctrs@f72feb2)
#>  stringi        1.8.4      2024-05-06 [1] CRAN (R 4.4.0)
#>  stringr      * 1.5.1      2023-11-14 [1] CRAN (R 4.4.1)
#>  survival       3.7-0      2024-06-05 [1] CRAN (R 4.4.2)
#>  tibble       * 3.2.1      2023-03-20 [1] CRAN (R 4.4.1)
#>  tidymodels   * 1.2.0      2024-03-25 [1] CRAN (R 4.4.1)
#>  tidyr        * 1.3.1      2024-01-24 [1] CRAN (R 4.4.1)
#>  tidyselect     1.2.1      2024-03-11 [1] CRAN (R 4.4.1)
#>  tidyverse    * 2.0.0      2023-02-22 [1] CRAN (R 4.4.1)
#>  timechange     0.3.0      2024-01-18 [1] CRAN (R 4.4.1)
#>  timeDate       4041.110   2024-09-22 [1] CRAN (R 4.4.1)
#>  tune         * 1.2.1.9000 2024-11-21 [1] local
#>  tzdb           0.4.0      2023-05-12 [1] CRAN (R 4.4.1)
#>  utf8           1.2.4      2023-10-22 [1] CRAN (R 4.4.1)
#>  vctrs          0.6.5      2023-12-01 [1] CRAN (R 4.4.1)
#>  withr          3.0.2      2024-10-28 [1] CRAN (R 4.4.1)
#>  workflows    * 1.1.4.9000 2024-11-21 [1] Github (tidymodels/workflows@cd34921)
#>  workflowsets * 1.1.0      2024-03-21 [1] CRAN (R 4.4.1)
#>  xfun           0.49       2024-10-31 [1] CRAN (R 4.4.1)
#>  yaml           2.3.10     2024-07-26 [1] CRAN (R 4.4.1)
#>  yardstick    * 1.3.1      2024-03-21 [1] CRAN (R 4.4.1)
#> 
#>  [1] C:/Users/Gregory.Imholte/AppData/Local/Programs/R/R-4.4.2/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────