Voting methods for feature ranking in efs #112
R/EnsembleFSResult.R
Outdated
#' can be changed with `$set_active_measure()`.
#' @param inner_measure ([mlr3::Measure])\cr
#' The inner measure used to optimize and score the learners on the train sets
#' generated during the ensemble feature selection process.
Can we say that differently? Scoring on a train set sounds wrong. Is this the outer train set, which is then split by the inner resampling? Do we score the inner resample result?
Yes, it's the outer train set. The `inner_resampling` generates N train/test splits. The `inner_measure` is used to optimize/tune on the train set, and you get the best subset and final model + score on that train set. We also use these final models to score the corresponding test splits (the inner resampling result you ask about) with the `measure`. In embedded efs we only do the second step (no `inner_measure` is needed/used).
I can change the wording to specifically mention the train/test splits of the inner resampling (I also mention that earlier in the doc). What do you think?
> you get the best subset and final model + score on that train set

It is the final model with the best subset, and the corresponding performance is estimated on the inner resampling. There is no scoring on the outer training set, only scoring on the inner resampling result. This is very similar to nested resampling. Maybe stick to the wording used below figure 4.5.
Yes, sorry Marc, it's as you say. When I wrote the comment above, I meant the outer resampling (what we call `init_resampling`) as the one that generates the train/test splits. And yes, we are pretty much doing nested CV, with the outer resampling being the N-times holdout split. I will update the doc.
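To make the distinction discussed in this thread concrete, here is a minimal base-R sketch of the nested scheme. It is an illustration only, not the package's implementation: `mtcars`, `lm`, RMSE, the three candidate subsets, and all helper names are assumptions chosen to keep the example self-contained. The outer loop plays the role of `init_resampling` (repeated holdout), the inner K-fold CV plays the role of `inner_resampling`, and the inner score is what `inner_measure` would estimate.

```r
set.seed(1)
data <- mtcars
target <- "mpg"
# Hypothetical candidate feature subsets (stand-in for a real search)
candidates <- list(c("wt"), c("wt", "hp"), c("wt", "hp", "qsec"))

rmse <- function(truth, pred) sqrt(mean((truth - pred)^2))

# Fit on `train` with the given features, score on `test`
fit_and_score <- function(train, test, features) {
  model <- lm(reformulate(features, response = target), data = train)
  rmse(test[[target]], predict(model, newdata = test))
}

# Inner resampling: K-fold CV that only ever sees the outer train set
inner_cv_score <- function(train, features, fold_id) {
  mean(vapply(sort(unique(fold_id)), function(k) {
    fit_and_score(train[fold_id != k, ], train[fold_id == k, ], features)
  }, numeric(1)))
}

n_repeats <- 5  # outer resampling: N-times holdout
results <- lapply(seq_len(n_repeats), function(i) {
  train_idx <- sample(nrow(data), size = floor(0.8 * nrow(data)))
  outer_train <- data[train_idx, ]
  outer_test <- data[-train_idx, ]

  # Same inner folds for every candidate so the scores are comparable
  fold_id <- sample(rep(1:3, length.out = nrow(outer_train)))
  inner_scores <- vapply(candidates, function(f) {
    inner_cv_score(outer_train, f, fold_id)
  }, numeric(1))
  best <- candidates[[which.min(inner_scores)]]

  list(
    features = best,
    inner_score = min(inner_scores),  # estimated on the inner resampling
    outer_score = fit_and_score(outer_train, outer_test, best)  # outer test set
  )
})
```

Note that nothing is scored on the outer train set itself: the inner score comes from the inner test folds, and the outer score comes from the held-out outer test set.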
fastVoteR, where 4 voting theory methods are now implemented in Rcpp
embedded_ensemble_fselect()
EnsembleFSResult()
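As a rough illustration of what voting-based feature ranking means, the sketch below implements plain approval voting in base R. It does not use fastVoteR or its API, and the source does not say which four methods are implemented; approval voting is chosen here only as a simple example. The function name `approval_ranking`, the feature names, and the subsets are all hypothetical. Each selected feature subset acts as a voter that approves the features it contains, and features are ranked by their (optionally weighted) approval count.

```r
# Hypothetical selected feature subsets, one per final model ("voter")
subsets <- list(
  c("wt", "hp"),
  c("wt", "qsec"),
  c("wt", "hp", "drat"),
  c("hp")
)
features <- c("wt", "hp", "qsec", "drat", "cyl")

# Approval voting: a feature's score is the total weight of voters that selected it
approval_ranking <- function(subsets, features, weights = rep(1, length(subsets))) {
  stopifnot(length(weights) == length(subsets))
  score <- vapply(features, function(f) {
    approves <- vapply(subsets, function(s) f %in% s, logical(1))
    sum(weights[approves])
  }, numeric(1))
  # Highest score first; ties are broken by the original feature order
  ord <- order(-score, seq_along(features))
  data.frame(
    feature = features[ord],
    score = unname(score[ord]),
    stringsAsFactors = FALSE
  )
}

ranking <- approval_ranking(subsets, features)
print(ranking)
```

Passing a `weights` vector (for example, one derived from each model's performance score) turns this into weighted approval voting, so better-performing models contribute more to the ranking.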