I am working through a binary classification workflow. My response variable is a two-level factor named `idr_outcome`. I attempted to tune a boosted tree model with cross-validation via `tune_grid()`. With the default `control` argument, I was able to fit the different parameter combinations on the provided resample object.

For later postprocessing, I needed the out-of-sample predicted values, so I changed the control argument to `control = control_grid(save_pred = TRUE)`. With this change, `tune_grid()` errored out.

The error appears to be related to the name of my response variable, `idr_outcome`. When I renamed the response variable to something that does not start with `id`, the tuning algorithm was able to save the predicted values. I created a reproducible example to explore this behavior; the issue persists with a simple logistic regression that only uses `fit_resamples()`.

You'll see in the reprex session info that I am using the (as of writing) latest development version of `tune`, but I originally encountered the error in the latest release version.
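For reference, the original tuning call looked roughly like the sketch below; the model spec, grid size, and `my_data` are illustrative placeholders, not the real analysis — only the outcome name matters for the bug:

```r
# Illustrative only: a boosted tree tuned over resamples, with an outcome
# whose name starts with "id".
boost_spec <- boost_tree(trees = tune(), learn_rate = tune()) %>%
  set_engine("xgboost") %>%
  set_mode("classification")

tuning_wf <- workflow() %>%
  add_model(boost_spec) %>%
  add_formula(idr_outcome ~ .)

tune_grid(
  tuning_wf,
  resamples = vfold_cv(my_data, v = 5),
  grid = 10,
  # works with the default control; errors with save_pred = TRUE:
  control = control_grid(save_pred = TRUE)
)
```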
### in R/pull.R

```r
pull_all_outcome_names <- function(resamples, res) {
  all_outcome_names <- purrr::map(res, ~ .x[[".all_outcome_names"]])
  print(resamples)
  resamples$.all_outcome_names <- all_outcome_names
  resamples
}
```
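The failure itself is the generic tibble/vctrs row-compatibility check on `$<-`. A minimal sketch of the same error, independent of tune (the column names here are my own):

```r
library(tibble)

# A 10-row tibble, standing in for the resamples tibble after tune has
# added extra rows/columns internally.
tbl <- tibble(x = 1:10)

# Assigning a length-5 list column (one element per fold) to a 10-row
# tibble triggers the vctrs recycling error seen in the reprex below.
try(tbl$y <- vector("list", 5))
```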
```r
library(tidyverse)
library(tidymodels)
library(modeldata)

set.seed(98210)
example_dat <- two_class_dat %>%
  mutate(id_outcome = Class)

my_rec <- recipe(example_dat, id_outcome ~ A + B)

my_wf <-
  workflow() %>%
  add_model(logistic_reg()) %>%
  add_recipe(my_rec)

my_cv_folds <- vfold_cv(example_dat, v = 5)

# Fails when save_pred = TRUE
fit_resamples(
  my_wf,
  resamples = my_cv_folds,
  metrics = metric_set(mn_log_loss, roc_auc, sensitivity, specificity),
  control = control_resamples(save_pred = TRUE)
)
#> Error in `$<-` at tune/R/pull.R:93:3:
#> ! Assigned data `all_outcome_names` must be compatible with existing
#>   data.
#> ✖ Existing data has 10 rows.
#> ✖ Assigned data has 5 rows.
#> ℹ Only vectors of size 1 are recycled.
#> Caused by error in `vectbl_recycle_rhs_rows()`:
#> ! Can't recycle input of size 5 to size 10.

# Works when save_pred = FALSE
fit_resamples(
  my_wf,
  resamples = my_cv_folds,
  metrics = metric_set(mn_log_loss, roc_auc, sensitivity, specificity),
  control = control_resamples(save_pred = FALSE)
)
#> # Resampling results
#> # 5-fold cross-validation
#> # A tibble: 5 × 4
#>   splits            id    .metrics         .notes
#>   <list>            <chr> <list>           <list>
#> 1 <split [632/159]> Fold1 <tibble [4 × 4]> <tibble [0 × 4]>
#> 2 <split [633/158]> Fold2 <tibble [4 × 4]> <tibble [0 × 4]>
#> 3 <split [633/158]> Fold3 <tibble [4 × 4]> <tibble [0 × 4]>
#> 4 <split [633/158]> Fold4 <tibble [4 × 4]> <tibble [0 × 4]>
#> 5 <split [633/158]> Fold5 <tibble [4 × 4]> <tibble [0 × 4]>

# Fit the same model with a new outcome name
working_rec <- recipe(example_dat, Class ~ A + B)
working_wf <- update_recipe(my_wf, working_rec)

# Works
fit_resamples(
  working_wf,
  resamples = my_cv_folds,
  metrics = metric_set(mn_log_loss, roc_auc, sensitivity, specificity),
  control = control_resamples(save_pred = TRUE)
)
#> # Resampling results
#> # 5-fold cross-validation
#> # A tibble: 5 × 5
#>   splits            id    .metrics         .notes           .predictions
#>   <list>            <chr> <list>           <list>           <list>
#> 1 <split [632/159]> Fold1 <tibble [4 × 4]> <tibble [0 × 4]> <tibble [159 × 6]>
#> 2 <split [633/158]> Fold2 <tibble [4 × 4]> <tibble [0 × 4]> <tibble [158 × 6]>
#> 3 <split [633/158]> Fold3 <tibble [4 × 4]> <tibble [0 × 4]> <tibble [158 × 6]>
#> 4 <split [633/158]> Fold4 <tibble [4 × 4]> <tibble [0 × 4]> <tibble [158 × 6]>
#> 5 <split [633/158]> Fold5 <tibble [4 × 4]> <tibble [0 × 4]> <tibble [158 × 6]>
```
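Until this is fixed, one possible interim workaround (my own suggestion, not an official fix) is to rename the outcome so it no longer starts with `id` before resampling, and restore the original name in the collected predictions:

```r
# Workaround sketch: `outcome_tmp` is an arbitrary temporary name of my
# choosing; any name not starting with "id" should behave the same way.
safe_dat <- example_dat %>% rename(outcome_tmp = id_outcome)
safe_rec <- recipe(safe_dat, outcome_tmp ~ A + B)
safe_wf  <- update_recipe(my_wf, safe_rec)

res <- fit_resamples(
  safe_wf,
  resamples = vfold_cv(safe_dat, v = 5),
  control = control_resamples(save_pred = TRUE)
)

# Restore the original outcome name in the out-of-sample predictions.
collect_predictions(res) %>%
  rename(id_outcome = outcome_tmp)
```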
Created on 2024-11-21 with reprex v2.1.1