Feature Request: Registry of learners #81

RaphaelS1 opened this issue Apr 3, 2020 · 15 comments


Copy link

It would be nice to have a permanent registry specifically for mlr3learners and mlr3learners.<package> that lists all learners available to install.
I.e., something like mlr3::mlr_learners, but instead of a dictionary that gets repopulated, a permanent list of all learners that can be installed at any given time. If this were a table like mlr::listLearners() with properties, that would be a bonus!

Copy link

so not what's there, currently in memory, but what's available for installation?
but that function would need access to github?
and it would need to download packages and install them just to be able to peek inside them?

Copy link

ok, i also just read your other issue #82
the thing is: where would that registry live? and who maintains this how?
currently it is very simple for a new party to add an extension learner package for mlr, without talking to us, you simply put it on github.

Copy link

Not necessarily, I'd imagine that this would just be a registry of strings, e.g. like a datatable that is appended once by whoever adds a new learner, so of the form

id               package   properties
classif.xgboost  xgboost   ...

It would make sense to live in mlr3learners but could also live in mlr3, it would be quite lightweight...
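A minimal sketch of what such a registry of strings could look like, using only base R. The rows, the id-to-package pairing, the placeholder property strings, and the helper `register_learner()` are all assumptions made for illustration; none of this exists in mlr3 or mlr3learners.

```r
# Illustrative registry: rows and property strings are placeholders, not real metadata.
learner_registry <- data.frame(
  id = c("classif.C5.0", "classif.gbm", "regr.gbm"),
  package = c("mlr3learners.c50", "mlr3learners.gbm", "mlr3learners.gbm"),
  properties = c("prop_a,prop_b", "prop_a", "prop_c"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: append one row when a new learner is published.
register_learner <- function(registry, id, package, properties = "") {
  if (id %in% registry$id) stop("learner id already registered: ", id)
  new_row <- data.frame(
    id = id, package = package, properties = properties,
    stringsAsFactors = FALSE
  )
  rbind(registry, new_row)
}

learner_registry <- register_learner(learner_registry, "regr.fnn", "mlr3learners.fnn")

# Look up which learner ids a given extension package provides.
gbm_ids <- learner_registry[learner_registry$package == "mlr3learners.gbm", "id"]
```

Because the table is keyed by learner id rather than by package, one package can contribute several rows.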

Copy link

> currently it is very simple for a new party to add an extension learner package for mlr, without talking to us, you simply put it on github.

Surely one of you checks this package though and verifies it's up to some level of standard? You could also ask the maintainer of the extension to put in a PR to make sure they are added to the registry. Ultimately if they don't do it then it's their loss

Copy link

well, no, what you are describing is a different process.

a) this is how it is, currently. you write a new learner extension package. you put it somewhere on github. we have given you enough unit-testing tools to demonstrate it works. now maybe the whole mlr3-team is on extended leave. you can still publish your package, everything works.

b) if we do what you propose, we now have to update package X on CRAN (mlr3learners, or mlr3, rather the first?) each time we have to update that table?

OTOH you COULD argue that we are already maintaining the wiki table on github?
your whole issue here in one sentence is basically: "why is the wiki table not in machine readable format" correct?

Copy link

I'm not suggesting you push updates to CRAN for each learner. They can wait until the next release.

But yes, that is essentially the issue, because when working in R I don't want to go back and forth between R and GitHub.

Copy link

mllg commented Apr 3, 2020

Machine-readable format:

available.packages(repos = "")

Copy link

mllg commented Apr 3, 2020

Not saying that this is very convenient, we should really look into making the additional learners easier to discover and install.

Copy link

mllg commented Apr 3, 2020

Oh, and if you want properties ... yes, we might need to create a JSON file for this.

Copy link

> Machine-readable format:
>
> available.packages(repos = "")

It would have been nice to post the output too.

> options(width = 200)
> available.packages(repos = "")
                        Package                   Version Priority Depends        Imports                                                          LinkingTo Suggests                               
mlr3learners.C50        "mlr3learners.C50"        "0.1.0" NA       "R (>= 3.1.0)" "C50, data.table, mlr3 (>= 0.1.7), mlr3misc, paradox, R6"        NA        "checkmate, rmarkdown, testthat"       
mlr3learners.c50        "mlr3learners.c50"        "0.1.2" NA       "R (>= 3.1.0)" "C50, data.table, mlr3 (>= 0.1.7), mlr3misc, paradox, R6"        NA        "checkmate, rmarkdown, testthat"       
mlr3learners.extratrees "mlr3learners.extratrees" "0.2.0" NA       "R (>= 3.1.0)" "checkmate, data.table, extraTrees, mlr3, mlr3misc, paradox, R6" NA        "rmarkdown, testthat"                  
mlr3learners.fnn        "mlr3learners.fnn"        "0.2.0" NA       "R (>= 3.1.0)" "checkmate, data.table, FNN, mlr3, mlr3misc, paradox, R6"        NA        "rmarkdown, testthat"                  
mlr3learners.gbm        "mlr3learners.gbm"        "0.1.0" NA       "R (>= 3.1.0)" "data.table, gbm, mlr3, mlr3misc, paradox, R6"                   NA        "checkmate, testthat"                  
mlr3learners.kernlab    "mlr3learners.kernlab"    "0.2.0" NA       "R (>= 3.1.0)" "data.table, kernlab, mlr3, mlr3misc, paradox, R6"               NA        "bibtex, checkmate, testthat"          
mlr3learners.mboost     "mlr3learners.mboost"     "0.3.0" NA       "R (>= 3.1.0)" "data.table, mlr3, mlr3misc, paradox, R6, mboost, withr"         NA        "checkmate, bibtex, testthat"          
mlr3learners.partykit   "mlr3learners.partykit"   "0.2.0" NA       "R (>= 3.1.0)" "data.table, mlr3, mlr3misc, paradox, R6"                        NA        "bibtex, checkmate, partykit, testthat"
                        Enhances License  License_is_FOSS License_restricts_use OS_type Archs MD5sum                             NeedsCompilation File
mlr3learners.C50        NA       "LGPL-3" NA              NA                    NA      NA    "e1a819fb277b7af59ec573f5ec592375" "no"             NA  
mlr3learners.c50        NA       "LGPL-3" NA              NA                    NA      NA    "2fd5ba51ba155ce890d9df31e29aa0e0" "no"             NA  
mlr3learners.extratrees NA       "LGPL-3" NA              NA                    NA      NA    "20763c7a1474efa44ace5f9330255f18" "no"             NA  
mlr3learners.fnn        NA       "LGPL-3" NA              NA                    NA      NA    "ef54c27564a3c571dd626fdeea4dec58" "no"             NA  
mlr3learners.gbm        NA       "LGPL-3" NA              NA                    NA      NA    "0d447219ff12a42b92b3341f0d9068f6" "no"             NA  
mlr3learners.kernlab    NA       "LGPL-3" NA              NA                    NA      NA    "1481447ea6d469e67d6bc333640e0c82" "no"             NA  
mlr3learners.mboost     NA       "LGPL-3" NA              NA                    NA      NA    "94dc921e0c41776cf37a59efd281d6bf" "no"             NA  
mlr3learners.partykit   NA       "LGPL-3" NA              NA                    NA      NA    "4e335cc9c201ab10d90c969709bef746" "no"             NA  
mlr3learners.C50        ""
mlr3learners.c50        ""
mlr3learners.extratrees ""
mlr3learners.fnn        ""
mlr3learners.gbm        ""
mlr3learners.kernlab    ""
mlr3learners.mboost     ""
mlr3learners.partykit   ""

which demonstrates that this does not answer raphael's request? you cannot see the provided learners of the packages. you would have to download and install them.
(which might still be the best way to go? in order to keep maintenance lightweight)

Copy link

please note that the package name DOES not coincide with the learner-name. and an extension package can and should contain multiple learners

Copy link

I also think this is too much information for the average user, it really doesn't need to be more than: id, package (+ properties for a bonus)

Copy link

mllg commented Apr 3, 2020

Ok, then maintaining an extra file seems inevitable.

Copy link

pat-s commented Apr 4, 2020

The reprex below creates meta-information which could be deployed as text files to mlr3learners.drat (keys.txt or packages.txt) in a daily cron run.
All that is needed is dget(<file.txt>) to read back the respective piece of information.

This information can be used to auto-load/auto-install packages/learners behind the scenes.

The only restriction is that one has internet access - but we could assert this.
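As a rough sketch of the auto-install idea, a lookup against such a registry could report which packages behind a learner id are still missing. The registry layout (columns `id` and `package`), the helper name `missing_packages()`, and the made-up package name are assumptions for illustration; only `requireNamespace()` is an actual base R call.

```r
# Hypothetical registry with one row per (learner id, required package) pair.
registry <- data.frame(
  id = c("demo.a", "demo.b"),
  package = c("stats", "notARealPackage12345"),
  stringsAsFactors = FALSE
)

# Return the packages needed by `id` that cannot be loaded in this R library.
missing_packages <- function(id, registry) {
  pkgs <- registry$package[registry$id == id]
  if (length(pkgs) == 0L) stop("unknown learner id: ", id)
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

missing_a <- missing_packages("demo.a", registry)
missing_b <- missing_packages("demo.b", registry)
```

The result could then be handed to `install.packages()` by the caller, which keeps the lookup itself free of side effects.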

So in summary we could have files that store

  • All available learner IDs across the whole mlr3 ecosystem
  • All available learner extension packages across the whole mlr3 ecosystem
  • A machine readable list of all learners including their properties/feature types/whatever there is available after a learner has been created.

(This information could also be used to automate the creation of a nice HTML table, similar to the one we have in mlr2.)


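library(mlr3)  # provides mlr_learners and lrns()
extra_learners <- rownames(available.packages(repos = ""))
# attach every extension package so that its learners are registered in mlr_learners
lapply(extra_learners, require, character.only = TRUE, quietly = TRUE)
keys <- mlr_learners$keys()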
#> [1] "mlr3learners.C50"        "mlr3learners.c50"       
#> [3] "mlr3learners.extratrees" "mlr3learners.fnn"       
#> [5] "mlr3learners.gbm"        "mlr3learners.kernlab"   
#> [7] "mlr3learners.mboost"     "mlr3learners.partykit"
dput(keys, file = paste0(tempdir(), "/keys.txt"))
dget(file = paste0(tempdir(), "/keys.txt"))
#>  [1] "classif.C5.0"         "classif.ctree"        "classif.debug"       
#>  [4] "classif.extratrees"   "classif.featureless"  "classif.fnn"         
#>  [7] "classif.gamboost"     "classif.gbm"          "classif.glmboost"    
#> [10] "classif.glmnet"       "classif.kknn"         "classif.ksvm"        
#> [13] "classif.lda"          "classif.log_reg"      "classif.naive_bayes" 
#> [16] "classif.qda"          "classif.ranger"       "classif.rpart"       
#> [19] "classif.svm"          "classif.xgboost"      "dens.hist"           
#> [22] "dens.kde"             "dens.kdeKD"           "dens.kdeKS"          
#> [25] "dens.locfit"          "dens.logspline"       "dens.mixed"          
#> [28] "dens.nonpar"          "dens.pen"             "dens.plug"           
#> [31] "dens.spline"          "regr.ctree"           "regr.extratrees"     
#> [34] "regr.featureless"     "regr.fnn"             "regr.gamboost"       
#> [37] "regr.gbm"             "regr.glmboost"        "regr.glmnet"         
#> [40] "regr.kknn"            ""              "regr.ksvm"           
#> [43] "regr.lm"              "regr.ranger"          "regr.rpart"          
#> [46] "regr.svm"             "regr.xgboost"         "surv.blackboost"     
#> [49] "surv.coxph"           "surv.cvglmnet"        "surv.flexible"       
#> [52] "surv.gamboost"        "surv.gbm"             "surv.glmboost"       
#> [55] "surv.glmnet"          "surv.kaplan"          "surv.mboost"         
#> [58] "surv.nelson"          "surv.obliqueRSF"      "surv.parametric"     
#> [61] "surv.penalized"       "surv.randomForestSRC" "surv.ranger"         
#> [64] "surv.rpart"           "surv.svm"

all_lrns = lrns(keys)
# collect each learner's properties and required packages as list-columns
properties = mlr3misc::map(all_lrns, function(.x) .x$properties)
package = mlr3misc::map(all_lrns, function(.x) .x$packages)
tibble::tibble(name = keys, package = package, properties = properties) 
#> # A tibble: 65 x 3
#>    name                package   properties  
#>    <chr>               <list>    <named list>
#>  1 classif.C5.0        <chr [1]> <chr [4]>   
#>  2 classif.ctree       <chr [1]> <chr [3]>   
#>  3 classif.debug       <chr [0]> <chr [3]>   
#>  4 classif.extratrees  <chr [1]> <chr [3]>   
#>  5 classif.featureless <chr [0]> <chr [5]>   
#>  6 classif.fnn         <chr [1]> <chr [2]>   
#>  7 classif.gamboost    <chr [1]> <chr [2]>   
#>  8 classif.gbm         <chr [1]> <chr [5]>   
#>  9 classif.glmboost    <chr [1]> <chr [2]>   
#> 10 classif.glmnet      <chr [1]> <chr [3]>   
#> # … with 55 more rows

Created on 2020-04-04 by the reprex package (v0.3.0)
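The tibble above holds list-columns, which are awkward to ship as a plain text file. One possible approach, sketched here in base R, is to collapse each list entry into a comma-separated string before writing with dput() and reading back with dget(). The input values are illustrative stand-ins, and the helper name `flatten()` and the file name are assumptions, not part of the reprex.

```r
# Stand-ins for the list-columns of the tibble (values are illustrative only).
ids <- c("classif.C5.0", "classif.debug")
packages <- list("C50", character(0))
properties <- list(c("prop_a", "prop_b"), "prop_a")

# Collapse each list element into one comma-separated string ("" if empty).
flatten <- function(x) vapply(x, paste, character(1), collapse = ",")

registry <- data.frame(
  id = ids,
  package = flatten(packages),
  properties = flatten(properties),
  stringsAsFactors = FALSE
)

# Round trip through a text file, as a daily job could do before deployment.
path <- file.path(tempdir(), "packages.txt")
dput(registry, file = path)
restored <- dget(path)
```

A learner with no extra package dependency ends up with an empty string in the package column, so consumers would need to treat "" as "nothing to install".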

Copy link

I guess we can close this @mllg @berndbischl ?

mllg closed this as completed Aug 17, 2024

5 participants