Releases: oegedijk/explainerdashboard
v0.3.4.1: fixes detailed shap plots bug when cats=None
Fixes dtreeviz 1.3 breaking change bug
Release Notes
Version 0.3.4:
Bug Fixes
- Fixes incompatibility bug with dtreeviz >= 1.3
- Fixes `ExplainerHub` `dbc.Jumbotron` style bug
Improvements
- raises `ValueError` when passing `shap='deep'`, as it is not yet correctly supported
v0.3.3.1: minor bugfix with outliers and nan
Fixes a bug with removing outliers when NaNs are present.
v0.3.3: better pipeline support and thread safety
Version 0.3.3:
Highlights:
- Adding support for cross validated metrics
- Better support for pipelines by using kernel explainer
- Making explainer threadsafe by adding locks
- Remove outliers from shap dependence plots
Breaking Changes
- parameter `permutation_cv` has been deprecated and replaced by parameter `cv`, which now also calculates cross-validated metrics in addition to cross-validated permutation importances.
New Features
- metrics now get calculated with cross-validation over `X` when you pass the `cv` parameter to the explainer. This is useful when for some reason you want to pass the training set to the explainer (see the sketch below this list).
- adds winsorization to shap dependence and shap interaction plots
- If `shap='guess'` fails (unable to guess the right type of shap explainer), then default to the model-agnostic `shap='kernel'`.
- Better support for sklearn `Pipelines`: if not able to extract transformer+model, then default to `shap.KernelExplainer` to explain the entire pipeline.
- you can now remove outliers from shap dependence/interaction plots with `remove_outliers=True`: filters all outliers beyond 1.5*IQR
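A minimal sketch of the new `cv` and `remove_outliers` options (assuming `model`, `X_train` and `y_train` are already defined, and that `remove_outliers` is accepted by the plot method; the feature name `"Age"` is illustrative):

```python
from explainerdashboard import ClassifierExplainer

# assumption: model, X_train, y_train are defined elsewhere
explainer = ClassifierExplainer(model, X_train, y_train, cv=5)

explainer.metrics()  # now cross-validated over X with 5 folds
explainer.plot_dependence("Age", remove_outliers=True)  # drop outliers beyond 1.5*IQR
```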
Bug Fixes
- Sets proper `threading.Lock`s before making calls to the shap explainer to prevent race conditions with dashboards calling for shap values in multiple threads (shap is unfortunately not threadsafe).
Improvements
- single shap row `KernelExplainer` calculations now go without `tqdm` progress bar
- added cutoff TPR and FPR to ROC AUC plot
- added cutoff precision and recall to PR AUC plot
- put a loading spinner on shap contrib table
v0.3.2.2: more bugfixes
Version 0.3.2.2:
`index_dropdown=False` now works for indexes not listed by `set_index_list_func()`, as long as they can be found by `set_index_exists_func`.
New Features
- adds `set_index_exists_func` to add a function that checks whether an index exists, besides those listed by `set_index_list_func()` (see the sketch below).
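A minimal sketch (the lookup inside `index_exists_in_db` is a hypothetical stand-in for your own storage check):

```python
def index_exists_in_db(index):
    # hypothetical check against external storage
    return index in {"passenger_123", "passenger_456"}

explainer.set_index_exists_func(index_exists_in_db)
```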
Bug Fixes
- bug fix to make `shap.KernelExplainer` (used with explainer parameter `shap='kernel'`) work with `RegressionExplainer`
- bug fix when no explicit `labels` are passed with index selector
- components only update if `explainer.index_exists()`: no more `IndexNotFoundErrors`
- fixed title bug for regression index selector labeled 'Custom'
- `get_y()` now returns `.item()` when necessary
- removed ticks from confusion matrix plot when no `labels` param passed (this bug got reintroduced in a recent plotly release)
Improvements
- new helper function `get_shap_row(index)` to calculate or look up a single row of shap values.
v0.3.2.1: add index_dropdown=False to regression dashboard
Bugfix: the new `index_dropdown=False` feature was not working correctly for regression dashboards.
v0.3.2: custom metrics
Version 0.3.2:
Highlights:
- Control which metrics to show or use your own custom metrics with `show_metrics`
- Set the naming for onehot features with all `0`s with `cats_notencoded`
- Speed up plots by displaying only a random sample of markers in scatter plots with `plot_sample`
- make index selection a free text field with `index_dropdown=False`
New Features
- new parameter `show_metrics` for `explainer.metrics()`, `ClassifierModelSummaryComponent` and `RegressionModelSummaryComponent` (see the sketch below this list):
  - pass a list of metrics and only those metrics get displayed, in that order
  - you can also pass custom scoring functions, as long as they are of the form `metric_func(y_true, y_pred)`: `show_metrics=[metric_func]`
  - For `ClassifierExplainer`, what is passed to the custom metric function depends on whether the function takes additional parameters `cutoff` and `pos_label`. If these are not arguments, then `y_true=self.y_binary(pos_label)` and `y_pred=np.where(self.pred_probas(pos_label)>cutoff, 1, 0)`. Else the raw `self.y` and `self.pred_probas` are passed for the custom metric function to do something with.
  - custom functions are also stored to `dashboard.yaml` and imported upon loading `ExplainerDashboard.from_config()`
- new parameter `cats_notencoded`: a dict to indicate how to name the value of a onehotencoded feature when all onehot columns equal 0. Defaults to `'NOT_ENCODED'`, but can be adjusted with this parameter, e.g. `cats_notencoded=dict(Deck="Deck not known")`.
- new parameter `plot_sample` to only plot a random sample in the various scatter plots. When you have a large dataset, this can significantly speed up various plots without sacrificing much expressiveness: `ExplainerDashboard(explainer, plot_sample=1000).run()`
- new parameter `index_dropdown=False` will replace the index dropdowns with a free text field. This can be useful when you have a lot of potential indexes and the user is expected to know the index string. Input will be checked for validity with `explainer.index_exists(index)`, and the field indicates when the input index does not exist. If the index does not exist, it will not be forwarded to other components, unless you also set `index_check=False`.
- adds mean absolute percentage error to the regression metrics. If it is too large, a warning will be printed. Can be excluded with the new `show_metrics` parameter.
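A minimal sketch combining these parameters (assuming a fitted `model` plus `X_test`/`y_test`, and that built-in metrics can be referenced by name, e.g. `'accuracy'`; the custom metric function is illustrative):

```python
from explainerdashboard import ClassifierExplainer, ExplainerDashboard

def false_positive_rate(y_true, y_pred):
    # custom metric of the required metric_func(y_true, y_pred) form
    fp = ((y_pred == 1) & (y_true == 0)).sum()
    return fp / max((y_true == 0).sum(), 1)

explainer = ClassifierExplainer(model, X_test, y_test,
                                cats=['Deck'],
                                cats_notencoded=dict(Deck="Deck not known"))

ExplainerDashboard(explainer,
                   show_metrics=['accuracy', false_positive_rate],
                   plot_sample=1000,      # scatter plots draw a random sample of 1000
                   index_dropdown=False   # free text index field instead of dropdown
                   ).run()
```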
Bug Fixes
- `get_classification_df` added to `ClassificationComponent` dependencies.
Improvements
- accepting a single column `pd.DataFrame` for `y`, and automatically converting it to a `pd.Series`
- if the WhatIf `FeatureInputComponent` detects the presence of missing onehot features (i.e. rows where all columns of the onehotencoded feature equal 0), then adds `'NOT_ENCODED'` or the matching value from `cats_notencoded` to the dropdown options.
- Generating the `name` parameter for `ExplainerComponents` for which no name is given is now done with a deterministic process instead of a random `uuid`. This should help with scaling custom dashboards across cluster deployments. Also drops the `shortuuid` dependency.
- `ExplainerDashboard` now prints out the local IP address when starting the dashboard.
- `get_index_list()` is only called once upon starting the dashboard.
v0.3.1: responsive classifier components
Version 0.3.1:
This version is mostly about pre-calculating and optimizing the classifier statistics
components. Those components should now be much more responsive with large datasets.
New Features
- new methods `roc_auc_curve(pos_label)` and `pr_auc_curve(pos_label)`
- new method `get_classification_df(...)` to get a dataframe with the number of labels above and below a given cutoff.
  - this now gets used by `plot_classification(...)`
- new method `confusion_matrix(cutoff, binary, pos_label)`
- added parameter `sort_features` to `FeatureInputComponent` (see the sketch below this list):
  - defaults to `'shap'`: order features by mean absolute shap
  - if set to `'alphabet'`, features are sorted alphabetically
- added parameter `fill_row_first` to `FeatureInputComponent`:
  - defaults to `True`: fill the first row first, then the next row, etc.
  - if `False`: fill the first column first, then the second column, etc.
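A minimal sketch of these parameters (assuming a fitted `explainer`, and that `FeatureInputComponent` is imported from `explainerdashboard.custom`):

```python
from explainerdashboard.custom import FeatureInputComponent

# sort the input fields alphabetically and fill column-by-column
feature_input = FeatureInputComponent(explainer,
                                      sort_features='alphabet',
                                      fill_row_first=False)
```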
Bug Fixes
- categorical mappings now updateable with `pandas<=1.2` and `python==3.6`
- title now overridable for `RegressionRandomIndexComponent`
- added assert check on `summary_type` for `ShapSummaryComponent`
Improvements
- pre-calculating `lift_curve_df` only once and then storing it for each `pos_label`
  - plus: storing only 100 evenly spaced rows of `lift_curve_df`
  - dashboard should be more responsive for large datasets
- pre-calculating `roc_auc_curve` and `pr_auc_curve`
  - dashboard should be more responsive for large datasets
- pre-calculating confusion matrices
  - dashboard should be more responsive for large datasets
- pre-calculating `classification_df`s
  - dashboard should be more responsive for large datasets
- confusion matrix: added axis title, moved predicted labels to bottom of graph
- precision plot component: when only adjusting the cutoff, simply update the cutoff line without recalculating the whole plot.
v0.3.0.1: dependency fixes
Version 0.3.0.1:
Some of the new features of version 0.3 only work with `pandas>=1.2`, which is not available for Python 3.6.
Breaking Changes
- new dependency requirement `pandas>=1.2` also implies `python>=3.7`
Bug Fixes
- updates `pandas` version to be compatible with categorical feature operations
- updates `dtreeviz` version to make `xgboost` and `pyspark` dependencies optional
v0.3.0: reducing memory footprint
Version 0.3.0:
This is a major release and comes with lots of breaking changes to the lower-level `ClassifierExplainer` and `RegressionExplainer` API. The higher-level `ExplainerComponent` and `ExplainerDashboard` API has not been changed however, except for the deprecation of the `cats` and `hide_cats` parameters.
Explainers generated with `explainerdashboard <= 0.2.20.1` will not work with this version! So if you have stored explainers to disk, you either have to rebuild them with this new version, or downgrade back to `explainerdashboard==0.2.20.1`! (hope you pinned your dependencies in production! ;)
The main motivation for these breaking changes was to improve memory usage of the dashboards, especially in production. This led to the deprecation of the dual cats grouped/not grouped functionality of the dashboard. Once I had committed to that breaking change, I decided to clean up the entire API and do all the needed breaking changes at once.
Breaking Changes
- onehot encoded features (passed with the `cats` parameter) are now merged by default. This means that the `cats=True` parameter has been removed from all explainer methods, and the `group cats` toggle has been removed from all `ExplainerComponents`. This saves both on code complexity and memory usage. If you wish to see the individual contributions of onehot encoded columns, simply don't pass them to the `cats` parameter upon construction.
- Deprecated explainer attributes:
  - `BaseExplainer`: `shap_values_cats`, `shap_interaction_values_cats`, `permutation_importances_cats`, `get_dfs()`, `formatted_contrib_df()`, `to_sql()`, `check_cats()`, `equivalent_col`
  - `ClassifierExplainer`: `get_prop_for_label`
- Naming changes to attributes:
  - `BaseExplainer`:
    - `importances_df()` -> `get_importances_df()`
    - `feature_permutations_df()` -> `get_feature_permutations_df()`
    - `get_int_idx(index)` -> `get_idx(index)`
    - `contrib_df()` -> `get_contrib_df()`
    - `contrib_summary_df()` -> `get_summary_contrib_df()`
    - `interaction_df()` -> `get_interactions_df()`
    - `shap_values` -> `get_shap_values_df`
    - `plot_shap_contributions()` -> `plot_contributions()`
    - `plot_shap_summary()` -> `plot_importances_detailed()`
    - `plot_shap_dependence()` -> `plot_dependence()`
    - `plot_shap_interaction()` -> `plot_interaction()`
    - `plot_shap_interaction_summary()` -> `plot_interactions_detailed()`
    - `plot_interactions()` -> `plot_interactions_importance()`
    - `n_features()` -> `n_features`
    - `shap_top_interaction()` -> `top_shap_interactions`
    - `shap_interaction_values_by_col()` -> `shap_interactions_values_for_col()`
  - `ClassifierExplainer`:
    - `self.pred_probas` -> `self.pred_probas()`
    - `precision_df()` -> `get_precision_df()`
    - `lift_curve_df()` -> `get_liftcurve_df()`
  - `RandomForestExplainer`/`XGBExplainer`:
    - `decision_trees` -> `shadow_trees`
    - `decisiontree_df()` -> `get_decisionpath_df()`
    - `decisiontree_summary_df()` -> `get_decisionpath_summary_df()`
    - `decision_path_file()` -> `decisiontree_file()`
    - `decision_path()` -> `decisiontree()`
    - `decision_path_encoded()` -> `decisiontree_encoded()`
New Features
- new `Explainer` parameter `precision`: defaults to `'float64'`. Can be set to `'float32'` to save on memory usage: `ClassifierExplainer(model, X, y, precision='float32')` (see the sketches after this list)
- new `memory_usage()` method to show which internal attributes take the most memory.
- for multiclass classifiers: `keep_shap_pos_label_only(pos_label)` method:
  - drops shap values and shap interactions for all labels except `pos_label`
  - this should significantly reduce memory usage for multiclass classification models.
  - not needed for binary classifiers.
- added `get_index_list()`, `get_X_row(index)` and `get_y(index)` methods.
  - these can be overridden with `.set_index_list_func()`, `.set_X_row_func()` and `.set_y_func()`.
  - by overriding these functions you can for example sample observations from a database or other external storage instead of from `X_test`, `y_test` (see the second sketch after this list).
- added `Popout` buttons to all the major graphs that open a large modal showing just the graph. This makes it easier to focus on a particular graph without distraction from the rest of the dashboard and all its toggles.
- added `max_cat_colors` parameter to `plot_importance_detailed`, `plot_dependence` and `plot_interactions_detailed`
  - prevents plotting from getting slow with categorical features with many categories
  - defaults to `5`
  - can be set as `**kwarg` to `ExplainerDashboard`
- adds category limits and sorting to the `RegressionVsCol` component
- adds property `X_merged` that gives a dataframe with the onehot columns merged.
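A minimal sketch of the memory-related additions (assuming a fitted multiclass `model` with `X_test`/`y_test`; the `pos_label` value `1` is illustrative):

```python
from explainerdashboard import ClassifierExplainer

# float32 roughly halves the memory of the float64 default
explainer = ClassifierExplainer(model, X_test, y_test, precision='float32')

explainer.memory_usage()               # show which internal attributes take the most memory
explainer.keep_shap_pos_label_only(1)  # multiclass only: drop shap values for other labels
```

And a sketch of overriding the index lookup methods to serve rows from external storage (the `db_*` helpers are hypothetical stand-ins for your own storage layer):

```python
# db_list_indexes, db_get_X_row and db_get_y are hypothetical helpers
explainer.set_index_list_func(lambda: db_list_indexes())
explainer.set_X_row_func(lambda index: db_get_X_row(index))
explainer.set_y_func(lambda index: db_get_y(index))
```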
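Note that whatever you plug into `.set_X_row_func()` should presumably return data in the same shape as a row of `X_test`; treat the sketch above as an outline of the override mechanism rather than a definitive contract.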
Bug Fixes
- shap dependence: when no point cloud, do not highlight!
- Fixed bug with calculating contributions plot/table for whatif component,
when InputFeatures had not fully loaded, resulting in shap error.
Improvements
- saving `X.copy()` instead of using a reference to `X`
  - this would result in more memory usage in development though, so you can `del X_test` to save memory.
- `ClassifierExplainer` only stores shap (interaction) values for the positive class: shap values for the negative class are generated on the fly by multiplying with `-1`.
- encoding onehot columns as `np.int8`, saving memory usage
- encoding categorical features as `pd.category`, saving memory usage
- added a base `TreeExplainer` class that `RandomForestExplainer` and `XGBExplainer` both derive from
  - will make it easier to extend tree explainers to other models in the future, e.g. catboost and lightgbm
- got rid of the callable properties (that were there to assure backward compatibility) and replaced them with regular methods.