
Voting methods for feature ranking in efs #112

Open · wants to merge 75 commits into main

Conversation

@bblodfon (Contributor) commented Jul 31, 2024

  • Use fastVoteR, which implements 4 voting theory methods in Rcpp (see the sketch after this list)
  • Add embedded_ensemble_fselect()
  • Refactor and simplify the code in both ensemble feature selection functions and in EnsembleFSResult()
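For context, here is a minimal usage sketch of the updated API. This is a hedged sketch, not a definitive example: the function and argument names (`ensemble_fselect()`, `embedded_ensemble_fselect()`, `init_resampling`, `inner_measure`, the `"av"` method id) are taken from this PR's description and the doc snippet below, and may differ from the released version.

```r
library(mlr3)
library(mlr3fselect)

# Sketch only: argument names follow this PR's description and may not
# match the final signature exactly.
efsr = ensemble_fselect(
  fselector = fs("random_search"),
  task = tsk("sonar"),
  learners = lrns(c("classif.rpart", "classif.featureless")),
  init_resampling = rsmp("subsampling", repeats = 5), # outer: N train/test splits
  inner_resampling = rsmp("cv", folds = 3),           # inner: used during optimization
  inner_measure = msr("classif.ce"),                  # scores the inner resample results
  measure = msr("classif.acc"),                       # scores the outer test sets
  terminator = trm("evals", n_evals = 10)
)

# The embedded variant skips the inner optimization, so no inner
# resampling/measure is involved (assumed signature):
efsr_embedded = embedded_ensemble_fselect(
  task = tsk("sonar"),
  learners = lrns(c("classif.rpart", "classif.featureless")),
  init_resampling = rsmp("subsampling", repeats = 5),
  measure = msr("classif.ce")
)

# Aggregate the per-iteration feature sets with one of the fastVoteR
# voting methods; the "av" (approval voting) method id is an assumption.
efsr$feature_ranking(method = "av")
```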

#' can be changed with `$set_active_measure()`.
#' @param inner_measure ([mlr3::Measure])\cr
#' The inner measure used to optimize and score the learners on the train sets
#' generated during the ensemble feature selection process.
Member

Can we say that differently? Scoring on a train set sounds wrong. Is this the outer train set, which is split by the inner resampling? Do we score the inner resample result?

Contributor Author

Yes, it's the outer train set. The inner_resampling generates N train/test splits. The inner_measure is used to optimize/tune on the train set, and you get the best subset plus the final model and its score on that train set. We use these final models to also score the corresponding test splits (the inner resampling result you ask about), with the measure. In embedded efs we only do the second step (no inner_measure is needed/used).
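To make the two scores concrete, a small inspection sketch. The `$set_active_measure()` accessor is quoted in the doc snippet above; the `"inner"` argument value and the `$active_measure` field are assumptions:

```r
# Assumed behavior: the outer `measure` is active by default, and
# switching to "inner" exposes the inner_measure scores instead.
efsr$active_measure               # assumed field reporting the active measure
efsr$set_active_measure("inner")  # assumed argument value
head(efsr$result)                 # per-iteration feature subsets and scores
```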

Contributor Author

I can change the wording to specifically mention the train/test splits of the inner resampling (I also mention that earlier in the doc), what do you think?

Member

> you get the best subset and final model + score on that train set

It is the final model with the best subset and the corresponding performance estimated on the inner resampling. There is no scoring on the outer training set, only scoring on the inner resampling result. This is very similar to nested resampling. Maybe stick to the wording used below Figure 4.5:

https://mlr3book.mlr-org.com/chapters/chapter4/hyperparameter_optimization.html#sec-nested-resampling

Contributor Author

Yes, sorry Marc, it's as you say: when I wrote the comment above, I meant the outer resampling (what we call init_resampling) as the one that generates the train/test splits. And yes, we are pretty much doing nested CV, with the outer resampling being the N-times holdout split. I will update the doc.
