Check and update data before explanation #245
Conversation
Currently it is not possible to test the custom model, as get_supported_models searches the global environment but apparently not the testing environment.
...and doing this checking within get_model_specs, but I do not require that function at all for custom models, to keep things simple.
I tried to make it through all the code, but I did not finish test-explanation.R and test-models.R. I have a small number of comments and spelling suggestions.
A few more comments. Nothing serious. Mostly spelling/grammar, some notes about ordering tests, some notes about being consistent with commenting, some notes about naming. I did my best to be as thorough as possible but it is possible I missed some things.
95% Annabelle's comments, some minor modifications. Co-authored-by: Annabelle Redelmeier <[email protected]>
Forgotten suggestion. Co-authored-by: Annabelle Redelmeier <[email protected]>
Thanks for a thorough review, this was just what I wanted. I think I have been through it all now. Will fix a few things and then push again.
R/preprocess_data.R
Outdated
# Reorder and delete unused columns
cnms_remove <- setdiff(colnames(data), new_labels)
if (length(cnms_remove) > 0) {
  message(paste0("The column(s) ", paste0(cnms_remove, collapse = ", "), " are not used by the model and thus removed ",
It is just about avoiding too long lines. I am splitting the whole thing into a multiline statement now.
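For what it's worth, a hedged sketch of the multiline version; the variables data and new_labels are taken from the snippet above, and the final message wording is an assumption, not the committed code:

# Split the long message call over several lines to respect the line-length limit
cnms_remove <- setdiff(colnames(data), new_labels)
if (length(cnms_remove) > 0) {
  message(
    paste0(
      "The column(s) ", paste0(cnms_remove, collapse = ", "),
      " are not used by the model and are thus removed from the data."
    )
  )
}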
tests/testthat/test-explanation.R
Outdated
prediction_zero = p0, sample = FALSE)
# Ex 18: Explain combined II - all empirical
approach <- c(rep("empirical", 4))
ex_list[[18]] <- explain(x_test, explainer, approach = approach, prediction_zero = p0)
Good catch. Just a mistake. Will fix
tests/testthat/test-explanation.R
Outdated
explain(x_test, explainer, approach = "ctree", prediction_zero = p0, sample = FALSE,
  mc_cores_create_ctree = multicore, mc_cores_sample_ctree = 1)
)
# Ex 38: Test that ctree with mincriterion equal to the same probability four times gives the same as only passing one
Fixing
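For context, a hedged sketch of the equivalence Ex 38 is meant to test; it assumes explain() forwards a mincriterion argument (a single value or one value per feature) to the ctree approach, and that the returned object exposes the Shapley values in $dt:

# Passing the same mincriterion four times vs. passing it once (assumed signature)
ex_vec <- explain(x_test, explainer, approach = "ctree",
                  prediction_zero = p0, mincriterion = rep(0.95, 4))
ex_one <- explain(x_test, explainer, approach = "ctree",
                  prediction_zero = p0, mincriterion = 0.95)
expect_equal(ex_vec$dt, ex_one$dt)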
and do not accept NULL either
Updated!
The point of this PR is to add proper checks that the data passed to shapr (training data) and explain (testing data) have the correct format, meaning that they contain the right features, have the right classes, and, for factor features, have the right levels. While this was not crucial when we only supported numerical features, it has become important now that we also support factors through the ctree approach. If there are differences between what is defined in the model object and what is in the data, the data are either adjusted/updated with a message, or an error is thrown.
All of this is mainly done through the function preprocess_data. The arguments to that function are the data to check and a list describing the "correct" feature details, which is extracted from the model object using a new function get_model_specs, an updated version of the previous features function (now obsolete, so deleted). preprocess_data calls get_data_specs, a new function that extracts the feature info from the data. The new function check_features is the main contribution: it compares the feature list from the model with that from the data, and is motivated by the earlier make_dummies and apply_dummies functions. check_features does no re-shuffling or deletion of unused columns itself, but returns a feature_list that is passed on to update_data, which handles that. The feature list is written to the explainer at the end of shapr and passed on to explain, where the checking and updating is done for the test data.
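To make the flow concrete, here is a hedged sketch of how the pieces fit together; the function names are from this PR, but the exact signatures and return shapes are assumptions:

# Hypothetical sketch of the check/update pipeline (not the committed API)
model_specs  <- get_model_specs(model)       # "correct" feature details from the model
data_specs   <- get_data_specs(x_train)      # feature details found in the data
feature_list <- check_features(model_specs, data_specs)  # compare; message or error
x_updated    <- update_data(x_train, feature_list)       # reorder/drop columns as needed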
A bunch of remarks:

get_model_specs calls the function get_supported_models, which gives a table of all model classes for which the functions model_type, predict_model (and get_model_specs) are defined; these are needed to explain a model of a given class. This seems like a valid approach, but I currently have some issues testing custom models, as testthat does not see functions that are defined within a testthat environment, and I am probably not allowed to write to .GlobalEnv within the tests. Testing of get_supported_models works and is included in the standard test environment, while tests for custom models are for now done manually by running expect_silent(source("tests/testthat/manual_test_scripts/test_custom_models.R")) in the console. This works, but at some point we should find a better way to do this.
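For reference, a hedged sketch of what the manual custom-model setup looks like when run in the console; the class name my_custom and the underlying lm fit are hypothetical, while the model_type/predict_model method convention is the one described above:

# Hypothetical custom model class; names are illustrative only
model <- lm(Sepal.Length ~ Sepal.Width + Petal.Length, data = iris)
class(model) <- "my_custom"

# Methods defined in the global environment so get_supported_models() can find them
model_type.my_custom <- function(x) "regression"
predict_model.my_custom <- function(x, newdata) {
  class(x) <- "lm"  # restore the underlying class before predicting
  predict(x, newdata)
}

Defined inside a testthat test, these methods live in the test environment and are not visible to the lookup, which is why the manual script above is run from the console instead.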
The make_dummies function now also makes use of the check_features function to reuse more code.

A full round of linting and styling of the full package will be done in a separate PR after this one, so just ignore all of that for now.
The tests are not great, but covr shows that the majority of the code is tested, so I think it is fine for now. Ref Restructure tests #249: we should restructure the tests later, but not for this CRAN release.
Tests are still failing on macOS. I don't understand why; keep ignoring it for now.