Feature Request: Registry of learners #81

RaphaelS1 · 2020-04-03T09:15:34Z

It would be nice to have a permanent registry specifically for mlr3learners and mlr3learners.<package> that lists all available learners to install.
i.e. like mlr3::mlr_learners except not a dictionary that gets repopulated but instead a permanent list of all available learners that can be installed at any given time. If this was a table like mlr::listLearners() with properties that would be a bonus!

berndbischl · 2020-04-03T10:12:52Z

so not what's there, currently in memory, but what is available for installation?
but that function would need access to github?
and it would need to download packages and install them just to be able to peek inside them?

berndbischl · 2020-04-03T10:15:37Z

ok, i also just read your other issue #82
the thing is: where would that registry live? and who maintains it, and how?
currently it is very simple for a new party to add an extension learner package for mlr, without talking to us, you simply put it on github.

RaphaelS1 · 2020-04-03T10:16:57Z

Not necessarily, I'd imagine that this would just be a registry of strings, e.g. like a datatable that is appended once by whoever adds a new learner, so of the form

id	package	properties
classif.xgboost	xgboost	...

It would make sense to live in mlr3learners but could also live in mlr3, it would be quite lightweight...
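
A minimal sketch of what such a registry of strings could look like, assuming a plain data.frame shipped with the package. The names `learner_registry` and `find_learner_package()` are hypothetical and not existing mlr3 API, the extra rows are illustrative, and the `properties` column is deliberately left unfilled, as in the table above.

```r
# Hypothetical registry: one row per learner id, appended by hand when a learner is added.
# The `properties` column is left as NA here because the thread does not spell it out.
learner_registry <- data.frame(
  id = c("classif.xgboost", "regr.xgboost", "classif.gbm"),
  package = c("xgboost", "xgboost", "gbm"),
  properties = NA_character_,
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the package behind a learner id,
# or NA if the id is not listed in the registry.
find_learner_package <- function(id, registry = learner_registry) {
  hit <- registry$package[registry$id == id]
  if (length(hit) == 0L) NA_character_ else hit[[1L]]
}

find_learner_package("classif.xgboost")
find_learner_package("classif.unknown")
```

Because the table holds only strings, looking up a learner would not require installing or loading the package that provides it.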

RaphaelS1 · 2020-04-03T10:17:40Z

> currently it is very simple for a new party to add an extension learner package for mlr, without talking to us, you simply put it on github.

Surely one of you checks this package though and verifies it's up to some level of standard? You could also ask the maintainer of the extension to put in a PR to make sure they are added to the registry. Ultimately if they don't do it then it's their loss

berndbischl · 2020-04-03T10:34:19Z

well, no, what you are describing is a different process.

a) this is how it is, currently. you write a new learner extension package. you put it somewhere on github. we have given you enough unit-testing tools to demonstrate it works. now maybe the whole mlr3-team is on extended leave. you can still publish your package, everything works.

b) if we do what you propose, we now have to update package X on CRAN (mlr3learners, or mlr3, rather the first?) each time we have to update that table?

OTOH you COULD argue that we are already maintaining the wiki table on github?
your whole issue here in one sentence is basically: "why is the wiki table not in machine readable format" correct?

RaphaelS1 · 2020-04-03T10:39:05Z

I'm not suggesting you push updates to CRAN for each learner. They can wait until the next release.

But yes, that is essentially the issue, because when working in R I don't want to go back and forth between R and GitHub.

mllg · 2020-04-03T15:16:02Z

Machine-readable format:

available.packages(repos = "https://mlr3learners.github.io/mlr3learners.drat")

mllg · 2020-04-03T15:17:21Z

Not saying that this is very convenient, we should really look into making the additional learners easier to discover and install.

mllg · 2020-04-03T15:18:47Z

Oh, and if you want properties ... yes, we might need to create a JSON file for this.

berndbischl · 2020-04-03T15:27:41Z

> Machine-readable format:
>
> available.packages(repos = "https://mlr3learners.github.io/mlr3learners.drat")

It would have been nice to have posted the output too.

> options(width = 200)
> available.packages(repos = "https://mlr3learners.github.io/mlr3learners.drat")
                        Package                   Version Priority Depends        Imports                                                          LinkingTo Suggests                               
mlr3learners.C50        "mlr3learners.C50"        "0.1.0" NA       "R (>= 3.1.0)" "C50, data.table, mlr3 (>= 0.1.7), mlr3misc, paradox, R6"        NA        "checkmate, rmarkdown, testthat"       
mlr3learners.c50        "mlr3learners.c50"        "0.1.2" NA       "R (>= 3.1.0)" "C50, data.table, mlr3 (>= 0.1.7), mlr3misc, paradox, R6"        NA        "checkmate, rmarkdown, testthat"       
mlr3learners.extratrees "mlr3learners.extratrees" "0.2.0" NA       "R (>= 3.1.0)" "checkmate, data.table, extraTrees, mlr3, mlr3misc, paradox, R6" NA        "rmarkdown, testthat"                  
mlr3learners.fnn        "mlr3learners.fnn"        "0.2.0" NA       "R (>= 3.1.0)" "checkmate, data.table, FNN, mlr3, mlr3misc, paradox, R6"        NA        "rmarkdown, testthat"                  
mlr3learners.gbm        "mlr3learners.gbm"        "0.1.0" NA       "R (>= 3.1.0)" "data.table, gbm, mlr3, mlr3misc, paradox, R6"                   NA        "checkmate, testthat"                  
mlr3learners.kernlab    "mlr3learners.kernlab"    "0.2.0" NA       "R (>= 3.1.0)" "data.table, kernlab, mlr3, mlr3misc, paradox, R6"               NA        "bibtex, checkmate, testthat"          
mlr3learners.mboost     "mlr3learners.mboost"     "0.3.0" NA       "R (>= 3.1.0)" "data.table, mlr3, mlr3misc, paradox, R6, mboost, withr"         NA        "checkmate, bibtex, testthat"          
mlr3learners.partykit   "mlr3learners.partykit"   "0.2.0" NA       "R (>= 3.1.0)" "data.table, mlr3, mlr3misc, paradox, R6"                        NA        "bibtex, checkmate, partykit, testthat"
                        Enhances License  License_is_FOSS License_restricts_use OS_type Archs MD5sum                             NeedsCompilation File
mlr3learners.C50        NA       "LGPL-3" NA              NA                    NA      NA    "e1a819fb277b7af59ec573f5ec592375" "no"             NA  
mlr3learners.c50        NA       "LGPL-3" NA              NA                    NA      NA    "2fd5ba51ba155ce890d9df31e29aa0e0" "no"             NA  
mlr3learners.extratrees NA       "LGPL-3" NA              NA                    NA      NA    "20763c7a1474efa44ace5f9330255f18" "no"             NA  
mlr3learners.fnn        NA       "LGPL-3" NA              NA                    NA      NA    "ef54c27564a3c571dd626fdeea4dec58" "no"             NA  
mlr3learners.gbm        NA       "LGPL-3" NA              NA                    NA      NA    "0d447219ff12a42b92b3341f0d9068f6" "no"             NA  
mlr3learners.kernlab    NA       "LGPL-3" NA              NA                    NA      NA    "1481447ea6d469e67d6bc333640e0c82" "no"             NA  
mlr3learners.mboost     NA       "LGPL-3" NA              NA                    NA      NA    "94dc921e0c41776cf37a59efd281d6bf" "no"             NA  
mlr3learners.partykit   NA       "LGPL-3" NA              NA                    NA      NA    "4e335cc9c201ab10d90c969709bef746" "no"             NA  
                        Repository                                                    
mlr3learners.C50        "https://mlr3learners.github.io/mlr3learners.drat/src/contrib"
mlr3learners.c50        "https://mlr3learners.github.io/mlr3learners.drat/src/contrib"
mlr3learners.extratrees "https://mlr3learners.github.io/mlr3learners.drat/src/contrib"
mlr3learners.fnn        "https://mlr3learners.github.io/mlr3learners.drat/src/contrib"
mlr3learners.gbm        "https://mlr3learners.github.io/mlr3learners.drat/src/contrib"
mlr3learners.kernlab    "https://mlr3learners.github.io/mlr3learners.drat/src/contrib"
mlr3learners.mboost     "https://mlr3learners.github.io/mlr3learners.drat/src/contrib"
mlr3learners.partykit   "https://mlr3learners.github.io/mlr3learners.drat/src/contrib"

which demonstrates that this does not answer raphael's request? you cannot see the provided learners of the packages. you would have to download and install them.
(which might still be the best way to go? in order to keep maintenance lightweight)

berndbischl · 2020-04-03T15:28:30Z

please note that the package name DOES not coincide with the learner-name. and an extension package can and should contain multiple learners
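
A small sketch of the one-to-many relationship described here, assuming a registry with one row per learner id. The assignment of learner ids to extension packages below is an assumption made for illustration and has not been checked against the actual packages.

```r
# Illustrative rows only: which extension package ships which learner is assumed here.
registry <- data.frame(
  id = c("classif.gamboost", "classif.glmboost", "regr.gamboost", "classif.C5.0"),
  extension = c(
    "mlr3learners.mboost", "mlr3learners.mboost",
    "mlr3learners.mboost", "mlr3learners.C50"
  ),
  stringsAsFactors = FALSE
)

# Group learner ids by the extension package that provides them.
learners_by_extension <- split(registry$id, registry$extension)

learners_by_extension[["mlr3learners.mboost"]]
lengths(learners_by_extension)
```

A package listing alone cannot recover this mapping, which is why the learner id has to be a column of its own.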

RaphaelS1 · 2020-04-03T15:29:36Z

I also think this is too much information for the average user, it really doesn't need to be more than: id, package (+ properties for a bonus)

mllg · 2020-04-03T18:59:40Z

Ok, then maintaining an extra file seems inevitable.

pat-s · 2020-04-04T13:50:45Z

The reprex below creates meta-information which could be deployed as text files to mlr3learners.drat (keys.txt or packages.txt) in a daily CRON run.
All that is needed is dget(<file.txt>) to read the respective piece of information.

This information can be used to auto-load/auto-install packages/learners behind the scenes.

The only restriction is that one has internet access - but we could assert this.

So in summary we could have files that store

- All available learner IDs across the whole mlr3 ecosystem
- All available learner extension packages across the whole mlr3 ecosystem
- A machine-readable list of all learners, including their properties/feature types/whatever is available after a learner has been created.

(This information could also be used to automate the creation of a nice HTML table, similar to the one we have in mlr2)

library(mlr3)
library(mlr3learners)
library(mlr3proba)
library(magrittr)

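# List the extension packages published in the drat repository, then attach each one
# so that its learners are registered in mlr_learners.
extra_learners <- rownames(available.packages(repos = "https://mlr3learners.github.io/mlr3learners.drat"))
invisible(lapply(extra_learners, require, character.only = TRUE, quietly = TRUE))
keys <- mlr_learners$keys()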
print(extra_learners)
#> [1] "mlr3learners.C50"        "mlr3learners.c50"       
#> [3] "mlr3learners.extratrees" "mlr3learners.fnn"       
#> [5] "mlr3learners.gbm"        "mlr3learners.kernlab"   
#> [7] "mlr3learners.mboost"     "mlr3learners.partykit"
dput(keys, file = file.path(tempdir(), "keys.txt"))
dget(file = file.path(tempdir(), "keys.txt"))
#>  [1] "classif.C5.0"         "classif.ctree"        "classif.debug"       
#>  [4] "classif.extratrees"   "classif.featureless"  "classif.fnn"         
#>  [7] "classif.gamboost"     "classif.gbm"          "classif.glmboost"    
#> [10] "classif.glmnet"       "classif.kknn"         "classif.ksvm"        
#> [13] "classif.lda"          "classif.log_reg"      "classif.naive_bayes" 
#> [16] "classif.qda"          "classif.ranger"       "classif.rpart"       
#> [19] "classif.svm"          "classif.xgboost"      "dens.hist"           
#> [22] "dens.kde"             "dens.kdeKD"           "dens.kdeKS"          
#> [25] "dens.locfit"          "dens.logspline"       "dens.mixed"          
#> [28] "dens.nonpar"          "dens.pen"             "dens.plug"           
#> [31] "dens.spline"          "regr.ctree"           "regr.extratrees"     
#> [34] "regr.featureless"     "regr.fnn"             "regr.gamboost"       
#> [37] "regr.gbm"             "regr.glmboost"        "regr.glmnet"         
#> [40] "regr.kknn"            "regr.km"              "regr.ksvm"           
#> [43] "regr.lm"              "regr.ranger"          "regr.rpart"          
#> [46] "regr.svm"             "regr.xgboost"         "surv.blackboost"     
#> [49] "surv.coxph"           "surv.cvglmnet"        "surv.flexible"       
#> [52] "surv.gamboost"        "surv.gbm"             "surv.glmboost"       
#> [55] "surv.glmnet"          "surv.kaplan"          "surv.mboost"         
#> [58] "surv.nelson"          "surv.obliqueRSF"      "surv.parametric"     
#> [61] "surv.penalized"       "surv.randomForestSRC" "surv.ranger"         
#> [64] "surv.rpart"           "surv.svm"

# Instantiate every registered learner, then collect its properties and required packages.
all_lrns <- lrns(keys)
properties <- mlr3misc::map(all_lrns, function(.x) .x$properties) %>%
  setNames(keys)
package <- mlr3misc::map(all_lrns, function(.x) .x$packages)
tibble::tibble(name = keys, package = package, properties = properties)
#> # A tibble: 65 x 3
#>    name                package   properties  
#>    <chr>               <list>    <named list>
#>  1 classif.C5.0        <chr [1]> <chr [4]>   
#>  2 classif.ctree       <chr [1]> <chr [3]>   
#>  3 classif.debug       <chr [0]> <chr [3]>   
#>  4 classif.extratrees  <chr [1]> <chr [3]>   
#>  5 classif.featureless <chr [0]> <chr [5]>   
#>  6 classif.fnn         <chr [1]> <chr [2]>   
#>  7 classif.gamboost    <chr [1]> <chr [2]>   
#>  8 classif.gbm         <chr [1]> <chr [5]>   
#>  9 classif.glmboost    <chr [1]> <chr [2]>   
#> 10 classif.glmnet      <chr [1]> <chr [3]>   
#> # … with 55 more rows

Created on 2020-04-04 by the reprex package (v0.3.0)
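
A hedged sketch of how a file written with dput(), as in the reprex above, might be consumed on the user side. The file name `learners.txt`, the helper `learner_status()`, and the two example entries are hypothetical. A local temporary file stands in for the one that would be deployed to the drat repository.

```r
# Stand-in for a deployed registry file; a real one would be fetched from the repository.
registry_file <- file.path(tempdir(), "learners.txt")
dput(
  list(
    "classif.xgboost" = "xgboost",
    "classif.gbm" = "gbm"
  ),
  file = registry_file
)

# Hypothetical helper: look up the package behind a learner id and report
# whether it is already installed, without attaching it to the search path.
learner_status <- function(id, file = registry_file) {
  registry <- dget(file)
  pkg <- registry[[id]]
  if (is.null(pkg)) {
    stop("Unknown learner id: ", id)
  }
  list(id = id, package = pkg, installed = requireNamespace(pkg, quietly = TRUE))
}

status <- learner_status("classif.xgboost")
status$package
```

The `installed` flag is the piece an auto-install step could act on. The sketch only reports it and leaves the decision to install to the caller.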

sebffischer · 2024-08-17T08:55:11Z

I guess we can close this @mllg @berndbischl ?

RaphaelS1 mentioned this issue Apr 3, 2020

Feature Request: Sugar for installing new learners #82

Open

RaphaelS1 mentioned this issue Jul 20, 2020

adds learners table and overloads lrn #142

Closed

mllg closed this as completed Aug 17, 2024
