Check and update data before explanation #245

martinju · 2021-01-07T21:28:21Z

Updated!

The point of this PR is to add proper checks that the data passed to shapr (training data) and explain (test data) have the correct format. That is, they must contain the right features, the features must have the right classes and, for factor features, the right levels. This was not crucial when we only supported numerical features, but it has become important now that we also support factors through the ctree approach. If what is defined in the model object differs from the data, the data are either adjusted/updated (and a message is printed) or an error is thrown.
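The "adjust with a message, or throw an error" rule for factor features can be illustrated with a minimal R sketch. `align_factor_levels()` is a hypothetical helper written for illustration, not part of shapr. It assumes the model's levels are known as a character vector: unseen levels are an error, while a different order or a subset of levels is adjusted with a message.

```r
# Hypothetical helper (not shapr's API): align a factor in the test data with
# the levels the model was trained on.
align_factor_levels <- function(x, model_levels, feature = "x") {
  if (!is.factor(x)) {
    stop("Feature ", feature, " must be a factor.")
  }
  data_levels <- levels(x)
  unknown <- setdiff(data_levels, model_levels)
  if (length(unknown) > 0) {
    # Levels the model has never seen cannot be handled: throw an error.
    stop("Feature ", feature, " has levels not known to the model: ",
         paste(unknown, collapse = ", "))
  }
  if (!identical(data_levels, model_levels)) {
    # Same or fewer levels, but in a different order/set: adjust and inform.
    message("Levels of feature ", feature, " updated to match the model.")
    x <- factor(as.character(x), levels = model_levels)
  }
  x
}
```

For example, `align_factor_levels(factor(c("b", "a"), levels = c("b", "a")), c("a", "b"))` keeps the values `"b", "a"` but returns them with the model's level order `a, b`.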

All of this is mainly done through the function preprocess_data. Its arguments are the data to check and a list describing the "correct" feature details. This list is extracted from the model object by the new function get_model_specs, an updated version of the previous features function (now obsolete, so deleted). preprocess_data calls get_data_specs, a new function that extracts the feature info from the data. The new function check_features is the main contribution: it compares the feature list from the model with that from the data, and is motivated by the earlier make_dummies and apply_dummies functions. check_features does no reshuffling, deletion of unused columns or the like, but returns a feature_list that is passed on to update_data, which handles that. The feature list is written to the explainer at the end of shapr and passed on to explain, where the same checking and updating is done for the test data.
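The spec extraction and comparison steps described above can be sketched in a few lines of R. This is a minimal illustration under stated assumptions, not the shapr implementation. `get_data_specs_sketch()` and `check_features_sketch()` are hypothetical names, the data are plain data.frames, and the model specs are assumed to be a list with `labels`, `classes` (named by feature) and `factor_levels`. Like check_features, the sketch does not modify the data; it only returns what a later update step would need.

```r
# Hypothetical sketch: extract feature specs (labels, classes, levels) from data.
get_data_specs_sketch <- function(data) {
  list(
    labels = colnames(data),
    classes = vapply(data, function(col) class(col)[1], character(1)),
    factor_levels = lapply(data, function(col) if (is.factor(col)) levels(col) else NULL)
  )
}

# Hypothetical sketch: compare the model's feature specs with the data's specs.
check_features_sketch <- function(model_specs, data_specs) {
  labs <- model_specs$labels
  # Error: the data lack features the model needs.
  missing <- setdiff(labs, data_specs$labels)
  if (length(missing) > 0) {
    stop("Features missing from the data: ", paste(missing, collapse = ", "))
  }
  # Error: a feature has a different class than in the model.
  wrong_class <- labs[model_specs$classes[labs] != data_specs$classes[labs]]
  if (length(wrong_class) > 0) {
    stop("Features with the wrong class: ", paste(wrong_class, collapse = ", "))
  }
  # Flag (but do not fix) factor features whose levels differ from the model's.
  fac <- labs[model_specs$classes[labs] == "factor"]
  relevel <- fac[!vapply(fac, function(f) {
    identical(model_specs$factor_levels[[f]], data_specs$factor_levels[[f]])
  }, logical(1))]
  # No reordering or column deletion here; an update step would use this list.
  list(labels = labs, relevel = relevel)
}
```

Unused columns in the data (such as an extra column) are simply ignored at this stage, which mirrors the split of responsibilities between check_features and update_data described above.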

A bunch of remarks

get_model_specs calls the function get_supported_models, which gives a table of all model classes for which the functions model_type, predict_model (and get_model_specs) are defined; these functions are needed to explain a model of that class. This seems like a valid approach, but I currently have some issues testing custom models, as testthat does not see functions that are defined within a testthat environment, and I am probably not allowed to write to .GlobalEnv within the tests. Testing of get_supported_models works and is included in the standard test environment, while tests for custom models are for now done manually by running expect_silent(source("tests/testthat/manual_test_scripts/test_custom_models.R")) in the console. This works, but at some point we should find a better way to do this.
The make_dummies function now also makes use of the check_features function to reuse more code.
A full round of linting and styling of the package will be done in a separate PR after this one, so just ignore all of that for now.
The tests are not great, but covr::covr() shows that the majority of the code is tested, so I think it is fine for now. Ref. Restructure tests #249: we should restructure the tests later, but not for this CRAN release.
Tests are still failing on macOS. I don't understand why, so I keep ignoring this for now.
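The get_supported_models remark (functions defined inside a test are not found) comes down to which environment is searched. The following minimal R sketch reproduces the effect. `find_method_classes()` and `demo_lookup()` are hypothetical names, and the name-scanning approach is an assumption about how such a lookup could work, not a copy of get_supported_models. A `<generic>.<class>` function defined in a local environment (as inside a test_that() block) is visible when that environment is searched, but not when only globalenv() is searched.

```r
# Hypothetical sketch: list the classes for which functions named
# "<generic>.<class>" exist in the given environment.
find_method_classes <- function(generic, envir = globalenv()) {
  pattern <- paste0("^", generic, "\\.")
  fun_names <- ls(envir = envir, pattern = pattern)
  # Keep only objects that are actually functions.
  is_fun <- vapply(fun_names, function(nm) is.function(get(nm, envir = envir)), logical(1))
  sub(pattern, "", fun_names[is_fun])
}

demo_lookup <- function() {
  # Defined in this function's local environment, like a method defined inside test_that().
  predict_model.my_custom_model <- function(x, newdata) rep(0, nrow(newdata))
  list(
    local = find_method_classes("predict_model", envir = environment()),
    global = find_method_classes("predict_model", envir = globalenv())
  )
}
```

Here `demo_lookup()$local` is `"my_custom_model"`, while `demo_lookup()$global` does not contain it. One possible design, offered only as a suggestion, is to let the lookup take the environment to search as an argument (for example `parent.frame()`), so tests could pass their own environment instead of writing to .GlobalEnv.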


R/explanation.R

aredelmeier

I tried to make it through all the code, but I did not finish test-explanation.R and test-models.R. I have a few comments and spelling fixes to suggest.

R/features.R

R/preprocess_data.R

R/shapley.R

vignettes/understanding_shapr.Rmd

aredelmeier

A few more comments. Nothing serious: mostly spelling/grammar, plus some notes about ordering tests, consistent commenting, and naming. I did my best to be as thorough as possible, but it is possible I missed some things.

R/features.R

R/preprocess_data.R

tests/testthat/test-explanation.R

tests/testthat/test-models.R

tests/testthat/test-sampling.R


martinju

Thanks for a thorough review, this was just what I wanted. I think I have been through it all now. Will fix a few things and then push again.

martinju · 2021-01-26T07:58:52Z

R/preprocess_data.R

+  # Reorder and delete unused columns
+  cnms_remove <- setdiff(colnames(data), new_labels)
+  if (length(cnms_remove) > 0) {
+    message(paste0("The columns(s) ",paste0(cnms_remove,collapse=", ")," is not used by the model and thus removed ",


It is just about avoiding overly long lines. I am splitting the whole statement over multiple lines now.
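A multi-line version of the "reorder and delete unused columns" step might look like the following minimal R sketch. `remove_unused_columns()` is a hypothetical stand-in, the message wording is illustrative (the original message is truncated in the diff above), and it assumes `data` is a plain data.frame; shapr itself works with data.table, where column selection by a character vector needs `with = FALSE` or `..new_labels` instead.

```r
# Hypothetical sketch of the "reorder and delete unused columns" step.
# Assumes `data` is a plain data.frame.
remove_unused_columns <- function(data, new_labels) {
  cnms_remove <- setdiff(colnames(data), new_labels)
  if (length(cnms_remove) > 0) {
    message(
      paste0(
        "The column(s) ",
        paste0(cnms_remove, collapse = ", "),
        " is/are not used by the model and thus removed from the data."
      )
    )
  }
  # Reorder to the model's feature order; errors if a label is absent from data.
  data[, new_labels, drop = FALSE]
}
```

With `df <- data.frame(a = 1:2, b = 3:4, c = 5:6)`, calling `remove_unused_columns(df, c("c", "a"))` reports that column `b` is removed and returns the columns in the order `c, a`.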

martinju · 2021-01-26T08:05:07Z

tests/testthat/test-explanation.R

-                           prediction_zero = p0, sample = FALSE)
+      # Ex 18: Explain combined II - all empirical
+      approach <- c(rep("empirical", 4))
+      ex_list[[18]] <- explain(x_test, explainer, approach = approach, prediction_zero = p0)


Good catch. Just a mistake. Will fix

martinju · 2021-01-26T08:08:40Z

tests/testthat/test-explanation.R

-      explain(x_test, explainer, approach = "ctree", prediction_zero = p0, sample = FALSE,
-              mc_cores_create_ctree = multicore, mc_cores_sample_ctree = 1)
-    )
+      # Ex 38: Test that ctree with mincriterion equal to same probability four times gives the same as only passing one


tests/testthat/test-explanation.R

tests/testthat/test-models.R


martinju added 30 commits November 20, 2020 16:25

NOTE ON TODO

b7cda2a

edit stop message

d0b33f6

create custom model checking function

bd53c55

save current WIP state

486e3e7

start of new get_model_features structure

0cbc73a

done with get_model_features -- continue with get_data_features!

a7064f0

added get_data_features

29b671d

done with shapr functions. Do the same with explain

349b4d3

cleanup old functions

476a703

explanation-func

37ee3bc

started tests + adding factor level reordering

35ed600

work on updater

8e5a7d2

finished update data

0b0112d

started fixing update_data

c5b6145

Finished tests for check_features ++

7214b53

edits + docu

d98f5d2

Some documentation + tests for update_data

037d4a1

Update FULLY checked explainer object for tests

debea32

bugfix for how dummylist was used + started to improve tests for shapr

ca02ebf

bugfix + more tests

3377083

Various testing updates

1c3704f

Currently not possible to test the custom model as get_supported_models searches globalenv, but apparently not the testing environment

hack to simplify bugfix with testthat/methods

8448d49

.

e0cec62

bugfix

4194177

fixed methods-stuff in tests by attach/detach inside the test

8dc927d

updating todo

cbbc182

updating examples + todo

61ef798

adding function fix_data ++

5b8547d

fix feature_labels issue

e2201d1

fix all tests (for now)

3aba239

martinju added 15 commits January 15, 2021 12:58

move from model_type to model_checker

fd9f432

and doing this checking within get_model_specs, but I don't require this function at all for custom models to simplify

.

7912b68

update vignette with new style

acd81a8

Fix tests

56a8486

remove native column

4b907e7

ensure naming of f_lists are being used

b0623c0

.

4375439

docs and examples done

80901b3

make actions run also on cranversion repo

29bfa76

Use suggested packages conditionally -- Get back on CRAN (#246)

8bad733

remove $p in shapr

e2c4e10

cran submission

74e3117

Merge remote-tracking branch 'origin/cranversion' into martin/check_f…

b61c58b

…actors

Update examples and tests according to CRAN policy

332b966

Add manual test ++

4cee96a

martinju marked this pull request as ready for review January 22, 2021 12:25

martinju requested a review from aredelmeier January 22, 2021 12:25

aredelmeier reviewed Jan 22, 2021

R/explanation.R

aredelmeier reviewed Jan 22, 2021

R/explanation.R

aredelmeier reviewed Jan 22, 2021


aredelmeier approved these changes Jan 25, 2021


martinju and others added 3 commits January 25, 2021 22:08

Apply suggestions from code review

13e92c9

95% Annabelles comments. Some minor modifications Co-authored-by: Annabelle Redelmeier <[email protected]>

Apply suggestions from code review

09407d5

forgotten suggestion Co-authored-by: Annabelle Redelmeier <[email protected]>

Auto stash before merge of "martin/check_factors" and "origin/martin/…

6d8cf0f

…check_factors"

martinju commented Jan 26, 2021


martinju added 3 commits January 26, 2021 11:16

Review updates

2761d59

Allow NULL feature labels in model for checking

320a20e

do factor level checking only for factors

596c2da

and do not accept NULL either

martinju merged commit ba36e74 into master Jan 26, 2021

martinju deleted the martin/check_factors branch January 26, 2021 13:43
