adds learners table and overloads lrn #142

Closed
wants to merge 4 commits into from
1 change: 1 addition & 0 deletions DESCRIPTION
@@ -56,6 +56,7 @@ Suggests:
ranger,
rmarkdown,
testthat,
tibble,
Member:

We should not pull in tibble as a soft dependency.
The following snippet overrides the print method for data.frame (it could be extended to data.table) and can live in each user's .Rprofile:

# tibble > data.frame
if (interactive() && requireNamespace("tibble", quietly = TRUE)) {
  print.data.frame = function(x, ...) {
    tibble:::print.tbl(tibble::as_tibble(x), ...)
  }
}

Author:

Doesn't CRAN give a warning/note when using triple colon in packages?

Member:

This is not for inclusion in a package, this is for a local .Rprofile.
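Because the snippet lives in a local .Rprofile, the `:::` question is mostly moot, but the override can also avoid the internal `tibble:::print.tbl` altogether. A minimal sketch, assuming tibble is installed: converting with `tibble::as_tibble()` and calling `print()` lets S3 dispatch pick tibble's own print method, so nothing depends on tibble internals.

```r
# .Rprofile sketch (an assumption, not part of this PR): print data.frames as tibbles
# in interactive sessions without calling the internal tibble:::print.tbl
if (interactive() && requireNamespace("tibble", quietly = TRUE)) {
  print.data.frame = function(x, ...) {
    # as_tibble() returns an object of class tbl_df, so print() dispatches to
    # tibble's method instead of recursing into this data.frame override
    print(tibble::as_tibble(x), ...)
    invisible(x)
  }
}
```

Note that `as_tibble()` drops row names, and data.tables are unaffected because `print.data.table` is dispatched before any data.frame method.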

xgboost
RdMacros:
mlr3misc
130 changes: 130 additions & 0 deletions R/mlr3learners_table.R
@@ -0,0 +1,130 @@
# Ideally this table would be created automatically, with all required packages installed
# and loaded. Required packages are mlr3, mlr3learners, mlr3proba, all packages in the
# mlr3learners org and, when ready, other packages in the mlr3verse that implement
# learners.

# library(mlr3)
# library(mlr3learners)
# library(mlr3proba)
# library(data.table)
extra_learners = rownames(
  available.packages(repos = "https://mlr3learners.github.io/mlr3learners.drat")
)
Comment on lines +10 to +12
Member:

Suggestion: Wrap in tryCatch() because it should not block if no internet is available.
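A minimal sketch of that suggestion, assuming a hypothetical helper name `get_extra_learners()`. `available.packages()` typically signals a warning rather than an error when a repository index cannot be read, so both conditions are caught and an empty vector is returned instead of blocking.

```r
# Hypothetical helper: list packages in the mlr3learners drat repo, but never block
# when the repo is unreachable (e.g. no internet connection)
get_extra_learners = function(repos = "https://mlr3learners.github.io/mlr3learners.drat") {
  on_failure = function(cond) {
    warning("Could not query ", repos, ": ", conditionMessage(cond), call. = FALSE)
    character(0)
  }
  tryCatch(
    as.character(rownames(utils::available.packages(repos = repos))),
    # an unreadable index usually surfaces as a warning, a broken URL as an error
    warning = on_failure,
    error = on_failure
  )
}
```

One trade-off of this design: any warning during the query is treated as a failure, which is conservative but keeps the calling code from silently using a partial list.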

Author:

If this is automated and created during builds then surely there is always internet?

Member:

Ah I did not see that this is not part of the function.

We could possibly store it in the package, though the content would then depend on the version that users have installed.

I guess querying an online resource (could also be the mlr3 GH repo) while requiring internet access would be better? What about both? Try to query the online table and fall back to the local one included in the package (with a warning message that this one might not include all learners).

Author:

> I guess querying an online resource (could also be the mlr3 GH repo) while requiring internet access would be better?

What's/where's the mlr3 GH repo?

> fall back to the local one included in the package (with a warning message that this one might not include all learners).

I think this would slightly defeat the point: say a user wants to install xgboost but does not know that it lives in mlr3learners and has therefore only installed mlr3. Then the table would not show xgboost, nor would it be able to install it if called.

Unless you just mean only fall back to local when internet is not available? I guess that would make sense and be more intuitive than just erroring. Assuming the local one is identical to the code above except with no call to install.packages/available.packages?

Member:

> Unless you just mean only fall back to local when internet is not available?

I meant this, yes.

> What's/where's the mlr3 GH repo?

The mlr3 GitHub repo.

Outlining the process again:

  1. lrns() queries an online resource, which could live in the mlr3data GitHub repo (so as not to bloat mlr3 with frequent CI updates). This resource is regenerated once a day.
  2. If no internet access is available, a static version included in the mlr3 package installation is used. In this case a warning states that the static version was used.
  3. With every release, mlr3 ships the most recent version of this static learner table.
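Steps 1 and 2 can be sketched as a small loader. This is an illustration only: `load_learner_table()` is a hypothetical name, `url` stands for wherever the daily table ends up (e.g. in mlr3data), and `fallback` stands for the static table shipped with mlr3; CSV is assumed as the format purely for the sketch.

```r
# Hypothetical loader: try the online (daily) table first and fall back to the
# static snapshot shipped with the package when it cannot be read
load_learner_table = function(url, fallback) {
  tryCatch(
    utils::read.csv(url, stringsAsFactors = FALSE),
    error = function(e) {
      warning("Online learner table unavailable; using the static copy shipped ",
        "with the package, which may not list all learners.", call. = FALSE)
      fallback
    }
  )
}
```

Only errors trigger the fallback here; connection failures in `read.csv()` raise an error (usually preceded by a warning), so the static copy is returned whenever the online table cannot be opened.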

Author:

Sorry, I completely misread and thought you were saying there's a separate repo just for certain GitHub Actions automations.
I completely agree with the process outlined above. But would this not make mlr3data an import rather than a suggest, as the lrn/lrns functions would depend on it?

Member:

  • The latest version of the learner table (which would be updated by the CI every day) would live in mlr3data and would be queried via the web
  • The static version would live in mlr3 and ship with the package

Author:

That sounds good, and just so I understand, would the static version just have to be manually typed out and updated? Do you want me to try to set up the build for mlr3data so we can close the first part (i.e. the online table)?

Member (@pat-s, Jul 30, 2020):

First I would like to ask @mllg if he agrees that mlr3data would be a good place to store a .csv file containing this information?

> That sounds good and just so I understand would the static version just have to be manually typed out

Not sure I understand what you mean by this.

  1. Read the .csv from mlr3data and store it as an .rda file in mlr3
  2. Ship this .rda file (static snapshot) with an mlr3 release and use it as the static fallback
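The two steps above amount to a read/save/load round trip. A minimal sketch with toy data and temporary files; the object name `learner_table_static` is hypothetical, and in a real package the file would typically be `R/sysdata.rda` (internal data) rather than a temp file.

```r
# Build step (hypothetical): the daily CSV stands in for the table published in mlr3data
csv_path = tempfile(fileext = ".csv")
write.csv(data.frame(id = c("classif.rpart", "regr.rpart"), mlr3_package = "mlr3"),
  csv_path, row.names = FALSE)

# 1. read the CSV and save it as an .rda snapshot
learner_table_static = utils::read.csv(csv_path, stringsAsFactors = FALSE)
rda_path = tempfile(fileext = ".rda")  # in a package: e.g. R/sysdata.rda
save(learner_table_static, file = rda_path)

# 2. at runtime the shipped snapshot is loaded back as the static fallback
snapshot_env = new.env()
load(rda_path, envir = snapshot_env)
```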

Apart from that, you can continue writing a .csv containing the table that will be read in later.
Oh wait, is a .csv suitable? We have a nested structure here, don't we? We could write a JSON file instead?
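The nesting concern is real: columns such as `predict_types` hold a vector per learner (a list column), which CSV cannot store directly. A minimal base-R sketch of one workaround, assuming a `|` separator that never occurs in the values; JSON (e.g. via jsonlite) or `.rds` files preserve list columns natively and avoid the encoding step.

```r
# Toy table with a list column: several predict types per learner
tab = data.frame(id = c("classif.rpart", "surv.coxph"), stringsAsFactors = FALSE)
tab$predict_types = list(c("response", "prob"), c("crank", "distr", "lp"))

# encode: collapse each vector into a single "|"-separated string before writing
flat = tab
flat$predict_types = vapply(tab$predict_types, paste, character(1), collapse = "|")
csv_path = tempfile(fileext = ".csv")
write.csv(flat, csv_path, row.names = FALSE)

# decode: split the strings back into a list column after reading
back = utils::read.csv(csv_path, stringsAsFactors = FALSE)
back$predict_types = strsplit(back$predict_types, "|", fixed = TRUE)
```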

Author:

Got it, sorry I have no experience with external data structures in R so just trying to visualise it, but understand it now :)

# install.packages(extra_learners, repos = "https://mlr3learners.github.io/mlr3learners.drat")
lapply(extra_learners, require, character.only = TRUE, quietly = TRUE)

# construct all learners in attached mlr3verse
keys = mlr_learners$keys()
# potential warnings are either that a required external package is not installed or that a
# package was built under a different R version
all_lrns = suppressWarnings(mlr3::lrns(keys))

# creates a data.table with the id split into name and class, as well as the original id;
# the mlr3 package that the learner is implemented in; the external package it interfaces;
# learner properties; feature types; and predict types
#
# may look better as a tibble; list_mlr3learners() below can return one
#
# ideally this table is hidden from the user, who accesses it through the getter below
learner_table = data.table(t(rbindlist(list(mlr3misc::map(all_lrns, function(.x) {
  idsplt = strsplit(.x$id, ".", TRUE)[[1]]
  list(idsplt[[2]], idsplt[[1]], .x$id, strsplit(.x$man, "::", TRUE)[[1]][1],
    .x$packages[1], .x$properties, .x$feature_types, .x$predict_types)
})))))

colnames(learner_table) = c("name", "class", "id", "mlr3_package", "required_package",
  "properties", "feature_types", "predict_types")
learner_table[, 1:4] = lapply(learner_table[, 1:4], as.character)
rm(all_lrns, extra_learners, keys)
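The nested `t(rbindlist(list(...)))` construction above is hard to follow. As an illustration of the intended shape only, here is a base-R sketch built from mock learners (plain lists with hypothetical values, not real mlr3 `Learner` objects): scalar fields become ordinary columns and vector-valued fields become list columns.

```r
# Mock learners (hypothetical values) standing in for mlr3 Learner objects
mock_lrns = list(
  list(id = "classif.rpart", man = "mlr3::mlr_learners_classif.rpart",
    packages = "rpart", predict_types = c("response", "prob")),
  list(id = "surv.coxph", man = "mlr3proba::mlr_learners_surv.coxph",
    packages = "survival", predict_types = c("crank", "distr", "lp"))
)

# one single-row data.frame per learner for the scalar fields
rows = lapply(mock_lrns, function(l) {
  parts = strsplit(l$id, ".", fixed = TRUE)[[1]]
  data.frame(
    name = parts[2], class = parts[1], id = l$id,
    mlr3_package = strsplit(l$man, "::", fixed = TRUE)[[1]][1],
    required_package = l$packages[1],
    stringsAsFactors = FALSE
  )
})
tab = do.call(rbind, rows)

# vector-valued fields become list columns with one element per learner
tab$predict_types = lapply(mock_lrns, `[[`, "predict_types")
```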

# getter function for the mlr3 learner table, assume it is called `learner_table`
# args:
# hide_cols `character()`: specify which, if any, columns to hide
# filter `list()`: named list of conditions to filter on, names correspond to column names
# in table
# tibble `logical(1)`: if TRUE returns table as tibble otherwise data.table
#
# examples:
# list_mlr3learners(hide_cols = c("properties", "feature_types"),
# filter = list(class = "surv", predict_types = "distr"))
# list_mlr3learners(tibble = TRUE)
list_mlr3learners = function(hide_cols = NULL, filter = NULL, tibble = FALSE) {

  dt = copy(learner_table)

  class = mlr3_package = required_package = NULL # hacky fix to prevent NOTE for global binding

  if (!is.null(filter)) {
    if (!is.null(filter$class)) {
      dt = subset(dt, class %in% filter$class)
    }
    if (!is.null(filter$mlr3_package)) {
      dt = subset(dt, mlr3_package %in% filter$mlr3_package)
    }
    if (!is.null(filter$required_package)) {
      dt = subset(dt, required_package %in% filter$required_package)
    }
    if (!is.null(filter$properties)) {
      dt = subset(dt, mlr3misc::map_lgl(dt$properties,
        function(.x) any(filter$properties %in% .x)))
    }
    if (!is.null(filter$feature_types)) {
      dt = subset(dt, mlr3misc::map_lgl(dt$feature_types,
        function(.x) any(filter$feature_types %in% .x)))
    }
    if (!is.null(filter$predict_types)) {
      dt = subset(dt, mlr3misc::map_lgl(dt$predict_types,
        function(.x) any(filter$predict_types %in% .x)))
    }
  }

  if (!is.null(hide_cols)) {
    dt = subset(dt, select = !(colnames(dt) %in% hide_cols))
  }

  if (tibble) {
    # as_tibble() converts the table column by column
    return(tibble::as_tibble(dt))
  } else {
    return(dt)
  }
}
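The six filter branches above follow one pattern: scalar columns are matched with `%in%`, and list columns keep a row if any requested value occurs in it. A minimal base-R sketch of a generic version, assuming a plain data.frame and a hypothetical function name `filter_table()`:

```r
# Hypothetical generic filter: one loop over the names of `filter` instead of one
# branch per column
filter_table = function(tab, filter = NULL) {
  for (col in names(filter)) {
    if (!col %in% names(tab)) stop("Unknown column: ", col)
    vals = tab[[col]]
    keep = if (is.list(vals)) {
      # list column: keep rows containing any of the requested values
      vapply(vals, function(v) any(filter[[col]] %in% v), logical(1))
    } else {
      # scalar column: plain membership test
      vals %in% filter[[col]]
    }
    tab = tab[keep, , drop = FALSE]
  }
  tab
}

# usage on a toy table
tab = data.frame(id = c("classif.rpart", "surv.coxph"), class = c("classif", "surv"),
  stringsAsFactors = FALSE)
tab$predict_types = list(c("response", "prob"), c("crank", "distr", "lp"))
res = filter_table(tab, list(class = "surv", predict_types = "distr"))
```

Failing on unknown column names is a deliberate choice here; the original silently ignores filter names it does not handle.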


# overloads the lrn function to automatically detect and install learners from any package in
# the mlr3verse; uses list_mlr3learners() filtered on the given key.
# this should probably be implemented in mlr3misc::dictionary_sugar_get instead,
# however this would create a dependency loop unless the learner table also lives in mlr3misc.
# a vectorised version of this for `lrns` follows naturally.
#
# the function filters learner_table and checks whether the required mlr3_package is installed;
# if not, it uses usethis::ui_yeah to ask the user whether to install it. if yes, the package is
# installed and the learner loaded; if no, it errors.
# args:
# .key `character(1)`: learner key
#
# examples:
#
# lrn("classif.ranger")
#
# unloadNamespace("mlr3learners.coxboost")
# utils::remove.packages("mlr3learners.coxboost")
# lrn("surv.coxboost")

lrn = function(.key, ...) {
  id = NULL # hacky fix to prevent NOTE for global binding
  pkg = unlist(subset(list_mlr3learners(), id == .key)$mlr3_package)
  if (length(pkg) == 0L) {
    stop(sprintf("No learner with key '%s' found in the learner table.", .key))
  }
  inst = suppressWarnings(require(pkg, quietly = FALSE, character.only = TRUE))
  if (!inst) {
    ans = usethis::ui_yeah(
      sprintf("%s is not installed but is required, do you want to install this now?", pkg),
      n_no = 1
    )
    if (ans) {
      install.packages(pkg, repos = "https://mlr3learners.github.io/mlr3learners.drat")
      # attach the freshly installed package so its learners are available in mlr_learners
      library(pkg, character.only = TRUE)
    } else {
      stop(sprintf("%s is not installed but is required.", pkg))
    }
  }
  mlr3misc::dictionary_sugar_get(mlr_learners, .key, ...)
}
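The install-or-error flow in `lrn()` can be isolated and tested without prompts or network access. A minimal sketch, assuming a hypothetical helper `ensure_package()` whose `ask` and `install` arguments are injected; in practice `ask` could be `usethis::ui_yeah` and `install` a wrapper around `install.packages()` with the drat repo.

```r
# Hypothetical helper mirroring lrn()'s logic: check, ask, install, or fail
ensure_package = function(pkg, ask = function(msg) FALSE,
  install = utils::install.packages) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(TRUE))
  }
  msg = sprintf("%s is not installed but is required, do you want to install this now?", pkg)
  if (!isTRUE(ask(msg))) {
    stop(sprintf("%s is not installed but is required.", pkg), call. = FALSE)
  }
  install(pkg)
  # fail loudly if the installation did not make the package available
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(sprintf("Installation of %s failed.", pkg), call. = FALSE)
  }
  invisible(TRUE)
}
```

Unlike `require()` in `lrn()`, this sketch only checks that the namespace can be loaded and does not attach the package.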