Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PipeOp "classweights" should throw a warning if it is attached to a learner which does not have the "weights" property #931

Open
FlorianPargent opened this issue Mar 7, 2023 · 2 comments
Assignees
Labels

Comments

@FlorianPargent
Copy link

FlorianPargent commented Mar 7, 2023

I reran code from a now accepted paper of mine on cost-sensitive ML for psychologists with mlr3 in which we used classweights in combination with "classif.log_reg". Recently, mlr3learners removed the "weights" property from "classif.log_reg" (which I think was a very bad decision, see my related issue in the mlr3learners github repository: mlr-org/mlr3learners#265). This made me aware of the following issue in mlr3pipelines:

If I build a pipeline combining a "classweights" pipeop with "classif.log_reg", this will silently ignore my weights (because classif.log_reg does not have the weights property anymore). This behavior is extremely dangerous, and I only noticed it because in my paper I have computed both conditions (with and without weighting) and both gave exactly the same results!

Would it be possible to include a warning when using classweights with learners without this property?

library("mlr3verse")
task = tsk("german_credit")

costs = matrix(c(0, 1, 5, 0), nrow = 2)
dimnames(costs) = list(response = c("good", "bad"), truth = c("good", "bad"))
print(costs)

logreg = lrn("classif.log_reg", predict_type = "prob")
tw = po("classweights", minor_weight = 5)
tw_logreg = as_learner(tw %>>% logreg)

# we get exactly the same benchmark result for both conditions (normal logreg and logreg with weighting)
# the fact that classif.log_reg does not have a weights property (since mlr3learners version 0.5.4) is silently ignored!
set.seed(1)
bm = benchmark(benchmark_grid(list(task), list(logreg, tw_logreg),
  rsmp("cv")), store_models = FALSE)
bm$aggregate(msr("classif.costs", costs = costs))

#     nr      resample_result       task_id                   learner_id resampling_id iters classif.costs
# 1:  1 <ResampleResult[21]> german_credit              classif.log_reg            cv    10         0.851
# 2:  2 <ResampleResult[21]> german_credit classweights.classif.log_reg            cv    10         0.851
> sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.6.3

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] mlr3verse_0.2.7 mlr3_0.14.1    

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.0       xfun_0.37              clusterCrit_1.2.8      listenv_0.9.0          mlr3cluster_0.1.6     
 [6] generics_0.1.3         colorspace_2.1-0       vctrs_0.5.2            htmltools_0.5.4        bbotk_0.7.2           
[11] yaml_2.3.7             paradox_0.11.0         utf8_1.2.3             rlang_1.0.6            pillar_1.8.1          
[16] glue_1.6.2             withr_2.5.0            palmerpenguins_0.1.1   uuid_1.1-0             mlr3fselect_0.9.1     
[21] lifecycle_1.0.3        mlr3learners_0.5.6     munsell_0.5.0          gtable_0.3.1           future_1.31.0         
[26] codetools_0.2-18       evaluate_0.20          mlr3data_0.6.1         knitr_1.42             fastmap_1.1.0         
[31] parallel_4.2.2         fansi_1.0.4            mlr3tuningspaces_0.3.3 renv_0.16.0            backports_1.4.1       
[36] scales_1.2.1           checkmate_2.1.0        mlr3filters_0.7.1      mlr3viz_0.6.1          mlr3tuning_0.17.2     
[41] parallelly_1.34.0      ggplot2_3.4.1          digest_0.6.31          dplyr_1.1.0            grid_4.2.2            
[46] clue_0.3-64            cli_3.6.0              tools_4.2.2            magrittr_2.0.3         tibble_3.1.8          
[51] cluster_2.1.4          mlr3misc_0.11.0        future.apply_1.10.0    pkgconfig_2.0.3        data.table_1.14.8     
[56] mlr3pipelines_0.4.2    rmarkdown_2.20         rstudioapi_0.14        lgr_0.4.4              R6_2.5.1              
[61] globals_0.16.2         compiler_4.2.2
@mb706
Copy link
Collaborator

mb706 commented May 22, 2023

An mlr3pipelines PipeOp does not know about what other operations happen before / after it. In this case, it may be worth considering whether an mlr3 Learner should throw an error when it is given a task with weights but without the weights property.

@mb706 mb706 transferred this issue from mlr-org/mlr3pipelines May 22, 2023
@berndbischl berndbischl self-assigned this Aug 17, 2024
@berndbischl
Copy link
Member

add a unit test to pipelines before this is closed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants