Check and update data before explanation #245

Merged
merged 80 commits into from
Jan 26, 2021
Changes from 77 commits
Commits
b7cda2a
NOTE ON TODO
martinju Nov 20, 2020
d0b33f6
edit stop message
martinju Nov 23, 2020
bd53c55
create custom model checking function
martinju Nov 23, 2020
486e3e7
save current WIP state
martinju Nov 23, 2020
0cbc73a
start of new get_model_features structure
martinju Nov 23, 2020
a7064f0
done with get_model_features -- continue with get_data_features!
martinju Nov 23, 2020
29b671d
added get_data_features
martinju Nov 24, 2020
349b4d3
done with shapr functions. Do the same with explain
martinju Nov 24, 2020
476a703
cleanup old functions
martinju Nov 24, 2020
37ee3bc
explanation-func
martinju Nov 24, 2020
35ed600
started tests + adding factor level reordering
martinju Nov 24, 2020
8e5a7d2
work on updater
martinju Nov 25, 2020
0b0112d
finished update data
martinju Nov 25, 2020
c5b6145
started fixing update_data
martinju Nov 25, 2020
7214b53
Finished tests for check_features ++
martinju Nov 25, 2020
d98f5d2
edits + docu
martinju Nov 25, 2020
037d4a1
Some documentation + tests for update_data
martinju Dec 8, 2020
debea32
Update FULLY checked explainer object for tests
martinju Dec 8, 2020
ca02ebf
bugfix for how dummylist was used + started to improve tests for shapr
martinju Dec 8, 2020
3377083
bugfix + more tests
martinju Dec 9, 2020
1c3704f
Various testing updates
martinju Dec 10, 2020
8448d49
hack to simplify bugfix with testthat/methods
martinju Dec 10, 2020
e0cec62
.
martinju Dec 10, 2020
4194177
bugfix
martinju Dec 10, 2020
8dc927d
fixed methods-stuff in tests by attach/detach inside the test
martinju Dec 10, 2020
cbbc182
updating todo
martinju Dec 10, 2020
61ef798
updating examples + todo
martinju Jan 4, 2021
5b8547d
adding function fix_data ++
martinju Jan 6, 2021
e2201d1
fix feature_labels issue
martinju Jan 6, 2021
3aba239
fix all tests (for now)
martinju Jan 6, 2021
0400d5a
remove model_type from core of explainer
martinju Jan 6, 2021
c9a14f7
wrapping up
martinju Jan 7, 2021
ef1c718
check passed
martinju Jan 7, 2021
b0c57b2
style explanation.R
martinju Jan 7, 2021
9af5227
update todo
martinju Jan 7, 2021
bf80e55
.
martinju Jan 7, 2021
ed3f540
renormalize line endings on windows
martinju Jan 7, 2021
e762cfc
naming and simplification
martinju Jan 7, 2021
90ffdc4
docs
martinju Jan 7, 2021
1a57ae6
update preprocess_data
martinju Jan 7, 2021
4ae6428
pass data feature description if no check for model
martinju Jan 7, 2021
ceacfae
add specs type instead of names to check_features
martinju Jan 7, 2021
8f03bb3
idea update
martinju Jan 8, 2021
375a5eb
git add --renormalize
martinju Jan 8, 2021
ab8a750
Started to update make and apply dummies
martinju Jan 8, 2021
9acbd5c
.
martinju Jan 13, 2021
62c47a5
make dummies and apply_dummies updated
martinju Jan 13, 2021
9d4ee53
make tests pass
martinju Jan 13, 2021
12b295c
dummylist -> feature_list within package
martinju Jan 13, 2021
3826222
x -> model in get_model_specs
martinju Jan 13, 2021
0fc8a40
move processing to new file
martinju Jan 13, 2021
e5d5ecc
.
martinju Jan 13, 2021
ca544cf
f_list_2 edits
martinju Jan 13, 2021
87aeabe
some doc updates
martinju Jan 13, 2021
b318e45
Removed feature_labels from shapr to always require get_model_specs
martinju Jan 13, 2021
5c3974b
.
martinju Jan 13, 2021
bdfa9b9
modified example
martinju Jan 14, 2021
f0806e1
fixing check_features + passing tests
martinju Jan 15, 2021
f2ebb9e
make cusotm model example
martinju Jan 15, 2021
fd9f432
move from model_type to model_checker
martinju Jan 15, 2021
7912b68
.
martinju Jan 15, 2021
acd81a8
update vignette with new style
martinju Jan 17, 2021
56a8486
Fix tests
martinju Jan 17, 2021
4b907e7
remove native column
martinju Jan 17, 2021
b0623c0
ensure naming of f_lists are being used
martinju Jan 17, 2021
4375439
.
martinju Jan 17, 2021
80901b3
docs and examples done
martinju Jan 20, 2021
29bfa76
make actions run also on cranversion repo
martinju Jan 21, 2021
8bad733
Use suggested packages conditionally -- Get back on CRAN (#246)
martinju Jan 21, 2021
e2c4e10
remove $p in shapr
martinju Jan 21, 2021
74e3117
cran submission
martinju Jan 21, 2021
b61c58b
Merge remote-tracking branch 'origin/cranversion' into martin/check_f…
martinju Jan 22, 2021
332b966
Update examples and tests according to CRAN policy
martinju Jan 22, 2021
4cee96a
Add manual test ++
martinju Jan 22, 2021
13e92c9
Apply suggestions from code review
martinju Jan 25, 2021
09407d5
Apply suggestions from code review
martinju Jan 25, 2021
6d8cf0f
Auto stash before merge of "martin/check_factors" and "origin/martin/…
martinju Jan 25, 2021
2761d59
Review updates
martinju Jan 26, 2021
320a20e
Allow NULL feature labels in model for checking
martinju Jan 26, 2021
596c2da
do factor level checking only for factors
martinju Jan 26, 2021
4 changes: 4 additions & 0 deletions .github/workflows/R-CMD-check.yaml
@@ -2,7 +2,11 @@ on:
push:
branches:
- master
- cranversion
pull_request:
branches:
- master
- cranversion

name: R-CMD-check

2 changes: 2 additions & 0 deletions .github/workflows/lint.yaml
@@ -2,9 +2,11 @@ on:
push:
branches:
- master
- cranversion
pull_request:
branches:
- master
- cranversion

name: lint

5 changes: 3 additions & 2 deletions .github/workflows/pkgdown.yaml
@@ -1,7 +1,8 @@
on:
push:
branches: master

branches:
- master
- cranversion
name: pkgdown

jobs:
2 changes: 2 additions & 0 deletions CRAN-RELEASE
@@ -0,0 +1,2 @@
This package was submitted to CRAN on 2021-01-21.
Once it is accepted, delete this file and tag the release (commit 8bad7333).
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,5 +1,5 @@
Package: shapr
Version: 0.1.3
Version: 0.1.4.9000
Title: Prediction Explanation with Dependence-Aware Shapley Values
Description: Complex machine learning models are often hard to interpret. However, in
many situations it is crucial to understand and explain why a model made a specific
33 changes: 20 additions & 13 deletions NAMESPACE
@@ -6,17 +6,17 @@ S3method(explain,ctree)
S3method(explain,ctree_comb_mincrit)
S3method(explain,empirical)
S3method(explain,gaussian)
S3method(features,gam)
S3method(features,glm)
S3method(features,lm)
S3method(features,ranger)
S3method(features,xgb.Booster)
S3method(model_type,default)
S3method(model_type,gam)
S3method(model_type,glm)
S3method(model_type,lm)
S3method(model_type,ranger)
S3method(model_type,xgb.Booster)
S3method(get_model_specs,gam)
S3method(get_model_specs,glm)
S3method(get_model_specs,lm)
S3method(get_model_specs,ranger)
S3method(get_model_specs,xgb.Booster)
S3method(model_checker,default)
S3method(model_checker,gam)
S3method(model_checker,glm)
S3method(model_checker,lm)
S3method(model_checker,ranger)
S3method(model_checker,xgb.Booster)
S3method(plot,shapr)
S3method(predict_model,default)
S3method(predict_model,gam)
@@ -29,21 +29,25 @@ S3method(prepare_data,ctree)
S3method(prepare_data,empirical)
S3method(prepare_data,gaussian)
export(aicc_full_single_cpp)
export(check_features)
export(correction_matrix_cpp)
export(create_ctree)
export(explain)
export(feature_combinations)
export(feature_matrix_cpp)
export(features)
export(get_data_specs)
export(get_model_specs)
export(hat_matrix_cpp)
export(mahalanobis_distance_cpp)
export(make_dummies)
export(model_type)
export(model_checker)
export(observation_impute_cpp)
export(predict_model)
export(prepare_data)
export(preprocess_data)
export(rss_cpp)
export(shapr)
export(update_data)
export(weight_matrix_cpp)
importFrom(Rcpp,sourceCpp)
importFrom(data.table,":=")
@@ -65,9 +69,12 @@ importFrom(graphics,hist)
importFrom(graphics,plot)
importFrom(graphics,rect)
importFrom(stats,as.formula)
importFrom(stats,contrasts)
importFrom(stats,model.frame)
importFrom(stats,model.matrix)
importFrom(stats,predict)
importFrom(stats,setNames)
importFrom(utils,head)
importFrom(utils,methods)
importFrom(utils,tail)
useDynLib(shapr, .registration = TRUE)
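
Reviewer note: the NAMESPACE changes above register per-class S3 methods (for example `get_model_specs` and `model_checker` for `lm`, `glm`, `gam`, `ranger` and `xgb.Booster`, plus a `model_checker` default). The sketch below illustrates that dispatch pattern only. The generic `model_feature_labels` and its methods are hypothetical names invented for illustration, not shapr functions, and the return value is an assumption about what such a method might report.

```r
# Hypothetical generic (not part of shapr): dispatches on the class of `model`.
model_feature_labels <- function(model) {
  UseMethod("model_feature_labels")
}

# Fallback used when no class-specific method exists.
model_feature_labels.default <- function(model) {
  stop("No method for models of class '", class(model)[1], "'.")
}

# Method for lm objects (glm objects also inherit from lm):
# read the feature names from the model's terms.
model_feature_labels.lm <- function(model) {
  attr(stats::terms(model), "term.labels")
}

fit <- stats::lm(dist ~ speed, data = datasets::cars)
model_feature_labels(fit)
```

A custom model class would need its own method; otherwise the call falls through to the default and stops with an error.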
4 changes: 4 additions & 0 deletions NEWS.md
@@ -1,4 +1,8 @@

# shapr 0.1.4

* Patch to fulfill CRAN policy of using packages under Suggests conditionally (in tests and examples)

# shapr 0.1.3

* Fix installation error on Solaris
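
Reviewer note: the 0.1.4 entry above refers to the CRAN policy that packages listed under Suggests may only be used conditionally. The examples in this PR do this by wrapping code in `requireNamespace("MASS", quietly = TRUE)` and `requireNamespace("ggplot2", quietly = TRUE)`. A minimal sketch of the same idea follows; `with_suggested` is a hypothetical helper written for illustration and is not part of shapr.

```r
# Hypothetical helper (not part of shapr): evaluate `code` only if `pkg`
# is installed. R evaluates arguments lazily, so `code` is never run
# when the package is missing.
with_suggested <- function(pkg, code) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    force(code)
    invisible(TRUE)
  } else {
    message("Package '", pkg, "' is not installed; skipping.")
    invisible(FALSE)
  }
}

# Runs the body only when MASS is available.
with_suggested("MASS", {
  data("Boston", package = "MASS")
  print(dim(Boston))
})
```

In examples and tests the plain `if (requireNamespace(...)) { ... }` form used in the diff is usually enough; the helper only makes the skip explicit.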
76 changes: 17 additions & 59 deletions R/explanation.R
@@ -16,11 +16,11 @@
#'
#' @param ... Additional arguments passed to \code{\link{prepare_data}}
#'
#' @details The most important thing to notice is that \code{shapr} has implemented three different
#' @details The most important thing to notice is that \code{shapr} has implemented four different
#' approaches for estimating the conditional distributions of the data, namely \code{"empirical"},
#' \code{"gaussian"} and \code{"copula"}.
#' \code{"gaussian"}, \code{"copula"} and \code{"ctree"}.
#'
#' In addition to this the user will also have the option of combining the three approaches.
#' In addition, the user also has the option of combining the four approaches.
#' E.g. if you're in a situation where you have trained a model that consists of 10 features,
#' and you'd like to use the \code{"gaussian"} approach when you condition on a single feature,
#' the \code{"empirical"} approach if you condition on 2-5 features, and \code{"copula"} version
@@ -60,9 +60,10 @@
#'
#' @export
#'
#' @author Camilla Lingjaerde, Nikolai Sellereite
#' @author Camilla Lingjaerde, Nikolai Sellereite, Martin Jullum, Annabelle Redelmeier
#'
#' @examples
#' if (requireNamespace("MASS", quietly = TRUE)) {
#' # Load example data
#' data("Boston", package = "MASS")
#'
@@ -99,19 +100,22 @@
#' print(explain1$dt)
#'
#' # Plot the results
#' if (requireNamespace("ggplot2", quietly = TRUE)) {
#' plot(explain1)
#' }
#' }
explain <- function(x, explainer, approach, prediction_zero, ...) {
extras <- list(...)

# Check input for x
if (!is.matrix(x) & !is.data.frame(x)) {
stop("x should be a matrix or a dataframe.")
stop("x should be a matrix or a data.frame/data.table.")
}

# Check input for approach
if (!(is.vector(approach) &&
is.atomic(approach) &&
(length(approach) == 1 | length(approach) == length(explainer$feature_labels)) &&
(length(approach) == 1 | length(approach) == length(explainer$feature_list$labels)) &&
all(is.element(approach, c("empirical", "gaussian", "copula", "ctree"))))
) {
stop(
@@ -123,16 +127,7 @@ explain <- function(x, explainer, approach, prediction_zero, ...) {
)
}

# Check that x contains correct variables
if (!all(explainer$feature_labels %in% colnames(x))) {
stop(
paste0(
"\nThe test data, x, does not contain all features necessary for\n",
"generating predictions. Please modify x so that all labels given\n",
"by explainer$feature_labels is present in colnames(x)."
)
)
}


if (length(approach) > 1) {
class(x) <- "combined"
@@ -175,7 +170,7 @@ explain.empirical <- function(x, explainer, approach, prediction_zero,
start_aicc = 0.1, w_threshold = 0.95, ...) {

# Add arguments to explainer object
explainer$x_test <- explainer_x_test(x, explainer$feature_labels)
explainer$x_test <- as.matrix(preprocess_data(x, explainer$feature_list)$x_dt)
explainer$approach <- approach
explainer$type <- type
explainer$fixed_sigma_vec <- fixed_sigma_vec
@@ -207,8 +202,9 @@ explain.empirical <- function(x, explainer, approach, prediction_zero,
#' @export
explain.gaussian <- function(x, explainer, approach, prediction_zero, mu = NULL, cov_mat = NULL, ...) {


# Add arguments to explainer object
explainer$x_test <- explainer_x_test(x, explainer$feature_labels)
explainer$x_test <- as.matrix(preprocess_data(x, explainer$feature_list)$x_dt)
explainer$approach <- approach

# If mu is not provided directly, use mean of training data
@@ -246,7 +242,7 @@ explain.gaussian <- function(x, explainer, approach, prediction_zero, mu = NULL,
explain.copula <- function(x, explainer, approach, prediction_zero, ...) {

# Setup
explainer$x_test <- explainer_x_test(x, explainer$feature_labels)
explainer$x_test <- as.matrix(preprocess_data(x, explainer$feature_list)$x_dt)
explainer$approach <- approach

# Prepare transformed data
@@ -314,7 +310,7 @@ explain.ctree <- function(x, explainer, approach, prediction_zero,
}

# Add arguments to explainer object
explainer$x_test <- explainer_x_test_dt(x, explainer$feature_labels)
explainer$x_test <- preprocess_data(x, explainer$feature_list)$x_dt
explainer$approach <- approach
explainer$mincriterion <- mincriterion
explainer$minsplit <- minsplit
@@ -341,7 +337,7 @@ explain.combined <- function(x, explainer, approach, prediction_zero,
# Get indices of combinations
l <- get_list_approaches(explainer$X$n_features, approach)
explainer$return <- TRUE
explainer$x_test <- explainer_x_test(x, explainer$feature_labels)
explainer$x_test <- as.matrix(preprocess_data(x, explainer$feature_list)$x_dt)

dt_l <- list()
for (i in seq_along(l)) {
@@ -398,32 +394,6 @@ get_list_approaches <- function(n_features, approach) {
return(l)
}

#' @keywords internal
explainer_x_test <- function(x_test, feature_labels) {

# Remove variables that were not used for training
x <- data.table::as.data.table(x_test)
cnms_remove <- setdiff(colnames(x), feature_labels)
if (length(cnms_remove) > 0) x[, (cnms_remove) := NULL]
data.table::setcolorder(x, feature_labels)

return(as.matrix(x))
}

#' @keywords internal
explainer_x_test_dt <- function(x_test, feature_labels) {

# Remove variables that were not used for training
# Same as explainer_x_test() but doesn't convert to a matrix
# Useful for ctree method which sometimes takes categorical features
x <- data.table::as.data.table(x_test)
cnms_remove <- setdiff(colnames(x), feature_labels)
if (length(cnms_remove) > 0) x[, (cnms_remove) := NULL]
data.table::setcolorder(x, feature_labels)

return(x)
}


#' @rdname explain
#' @name explain
@@ -462,15 +432,3 @@ get_list_ctree_mincrit <- function(n_features, mincriterion) {
}
return(l)
}

#' @keywords internal
explainer_x_test <- function(x_test, feature_labels) {

# Remove variables that were not used for training
x <- data.table::as.data.table(x_test)
cnms_remove <- setdiff(colnames(x), feature_labels)
if (length(cnms_remove) > 0) x[, (cnms_remove) := NULL]
data.table::setcolorder(x, feature_labels)

return(as.matrix(x))
}
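
Reviewer note: this file's diff removes the `explainer_x_test` helpers and the inline check that all feature labels are present in `colnames(x)`, and routes test data through `preprocess_data(x, explainer$feature_list)$x_dt` instead. The internals of `preprocess_data` are not shown in this part of the diff, so the sketch below only mirrors what the removed code did (check presence, drop unused columns, reorder). It uses base R rather than data.table, and `select_features` is a hypothetical name, not a shapr function.

```r
# Hypothetical helper (base R only; NOT shapr's preprocess_data()).
# Keeps only the columns named in `feature_labels`, in that order,
# and fails early if any of them is missing from `x`.
select_features <- function(x, feature_labels) {
  x <- as.data.frame(x)
  missing_labels <- setdiff(feature_labels, colnames(x))
  if (length(missing_labels) > 0) {
    stop(
      "x is missing the feature(s): ",
      paste(missing_labels, collapse = ", ")
    )
  }
  x[, feature_labels, drop = FALSE]
}

# Column `c` was not used for training, so it is dropped;
# the remaining columns follow the order of the labels.
x_test <- data.frame(c = 7:8, a = 1:2, b = c(0.5, 1.5))
select_features(x_test, c("a", "b"))
```

Callers that need a numeric matrix, as the non-ctree approaches in the diff do, could wrap the result in `as.matrix()`.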