Releases: oegedijk/explainerdashboard
v0.3.4.1: fixes detailed shap plots bug when cats=None
Fixes dtreeviz 1.3 breaking change bug
Release Notes
Version 0.3.4:
Bug Fixes
- Fixes incompatibility bug with dtreeviz >= 1.3
- Fixes `ExplainerHub` `dbc.Jumbotron` style bug
Improvements
- raises `ValueError` when passing `shap='deep'`, as it is not yet correctly supported
v0.3.3.1: minor bugfix with outliers and nan
Fixes a bug with removing outliers when NaNs are present.
v0.3.3: better pipeline support and thread safety
Version 0.3.3:
Highlights:
- Adding support for cross validated metrics
- Better support for pipelines by using kernel explainer
- Making explainer threadsafe by adding locks
- Remove outliers from shap dependence plots
Breaking Changes
- parameter `permutation_cv` has been deprecated and replaced by parameter `cv`, which now also calculates cross-validated metrics in addition to cross-validated permutation importances.
New Features
- metrics now get calculated with cross-validation over `X` when you pass the `cv` parameter to the explainer. This is useful when for some reason you want to pass the training set to the explainer (see the sketch below this list).
- adds winsorization to shap dependence and shap interaction plots
- If `shap='guess'` fails (unable to guess the right type of shap explainer), then default to the model-agnostic `shap='kernel'`.
- Better support for sklearn `Pipelines`: if not able to extract transformer+model, then default to `shap.KernelExplainer` to explain the entire pipeline.
- you can now remove outliers from shap dependence/interaction plots with `remove_outliers=True`: filters all outliers beyond 1.5*IQR
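A minimal sketch of the new `cv` and `remove_outliers` options (assuming `model`, `X_train` and `y_train` are already defined, and that `remove_outliers` is accepted by the plot method; the feature name `"Age"` is illustrative):

```python
from explainerdashboard import ClassifierExplainer

# assumption: model, X_train, y_train are defined elsewhere
explainer = ClassifierExplainer(model, X_train, y_train, cv=5)

explainer.metrics()  # now cross-validated over X with 5 folds
explainer.plot_dependence("Age", remove_outliers=True)  # drop outliers beyond 1.5*IQR
```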
Bug Fixes
- Sets proper `threading.Lock`s before making calls to the shap explainer to prevent race conditions with dashboards calling for shap values in multiple threads (shap is unfortunately not threadsafe).
Improvements
- single shap row `KernelExplainer` calculations now go without `tqdm` progress bar
- added cutoff TPR and FPR to ROC AUC plot
- added cutoff precision and recall to PR AUC plot
- put a loading spinner on shap contrib table
v0.3.2.2: more bugfixes
Version 0.3.2.2:
`index_dropdown=False` now works for indexes not listed by `set_index_list_func()`, as long as they can be found by `set_index_exists_func`.
New Features
- adds `set_index_exists_func` to add a function that checks whether an index exists, besides those listed by `set_index_list_func()` (see the sketch below).
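A minimal sketch (the lookup inside `index_exists_in_db` is a hypothetical stand-in for your own storage check):

```python
def index_exists_in_db(index):
    # hypothetical check against external storage
    return index in {"passenger_123", "passenger_456"}

explainer.set_index_exists_func(index_exists_in_db)
```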
Bug Fixes
- bug fix to make `shap.KernelExplainer` (used with explainer parameter `shap='kernel'`) work with `RegressionExplainer`
- bug fix when no explicit `labels` are passed with index selector
- components only update if `explainer.index_exists()`: no more `IndexNotFoundErrors`
- fixed title bug for regression index selector labeled 'Custom'
- `get_y()` now returns `.item()` when necessary
- removed ticks from confusion matrix plot when no `labels` param passed (this bug got reintroduced in a recent plotly release)
Improvements
- new helper function `get_shap_row(index)` to calculate or look up a single row of shap values.
v0.3.2.1: add index_dropdown=False to regression dashboard
Bugfix: the new `index_dropdown=False` feature was not working correctly for regression dashboards.
v0.3.2: custom metrics
Version 0.3.2:
Highlights:
- Control which metrics to show or use your own custom metrics with `show_metrics`
- Set the naming for onehot features with all `0`s with `cats_notencoded`
- Speed up plots by displaying only a random sample of markers in scatter plots with `plot_sample`
- make index selection a free text field with `index_dropdown=False`
New Features
- new parameter `show_metrics` for `explainer.metrics()`, `ClassifierModelSummaryComponent` and `RegressionModelSummaryComponent` (see the sketch below this list):
  - pass a list of metrics and only those metrics get displayed, in that order
  - you can also pass custom scoring functions, as long as they are of the form `metric_func(y_true, y_pred)`: `show_metrics=[metric_func]`
  - For `ClassifierExplainer`, what is passed to the custom metric function depends on whether the function takes additional parameters `cutoff` and `pos_label`. If these are not arguments, then `y_true=self.y_binary(pos_label)` and `y_pred=np.where(self.pred_probas(pos_label)>cutoff, 1, 0)`. Else the raw `self.y` and `self.pred_probas` are passed for the custom metric function to do something with.
  - custom functions are also stored to `dashboard.yaml` and imported upon loading `ExplainerDashboard.from_config()`
- new parameter `cats_notencoded`: a dict to indicate how to name the value of a onehotencoded feature when all onehot columns equal 0. Defaults to `'NOT_ENCODED'`, but can be adjusted with this parameter, e.g. `cats_notencoded=dict(Deck="Deck not known")`.
- new parameter `plot_sample` to only plot a random sample in the various scatter plots. When you have a large dataset, this can significantly speed up various plots without sacrificing much expressiveness: `ExplainerDashboard(explainer, plot_sample=1000).run()`
- new parameter `index_dropdown=False` will replace the index dropdowns with a free text field. This can be useful when you have a lot of potential indexes and the user is expected to know the index string. Input will be checked for validity with `explainer.index_exists(index)`, and the field indicates when the input index does not exist. If the index does not exist, it will not be forwarded to other components, unless you also set `index_check=False`.
- adds mean absolute percentage error to the regression metrics. If it is too large, a warning will be printed. Can be excluded with the new `show_metrics` parameter.
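A minimal sketch combining these parameters (assuming a fitted `model` plus `X_test`/`y_test`, and that built-in metrics can be referenced by name, e.g. `'accuracy'`; the custom metric function is illustrative):

```python
from explainerdashboard import ClassifierExplainer, ExplainerDashboard

def false_positive_rate(y_true, y_pred):
    # custom metric of the required metric_func(y_true, y_pred) form
    fp = ((y_pred == 1) & (y_true == 0)).sum()
    return fp / max((y_true == 0).sum(), 1)

explainer = ClassifierExplainer(model, X_test, y_test,
                                cats=['Deck'],
                                cats_notencoded=dict(Deck="Deck not known"))

ExplainerDashboard(explainer,
                   show_metrics=['accuracy', false_positive_rate],
                   plot_sample=1000,      # scatter plots draw a random sample of 1000
                   index_dropdown=False   # free text index field instead of dropdown
                   ).run()
```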
Bug Fixes
- `get_classification_df` added to `ClassificationComponent` dependencies.
Improvements
- accepting a single column `pd.DataFrame` for `y`, and automatically converting it to a `pd.Series`
- if the WhatIf `FeatureInputComponent` detects the presence of missing onehot features (i.e. rows where all columns of the onehotencoded feature equal 0), then adds `'NOT_ENCODED'` or the matching value from `cats_notencoded` to the dropdown options.
- Generating the `name` parameter for `ExplainerComponents` for which no name is given is now done with a deterministic process instead of a random `uuid`. This should help with scaling custom dashboards across cluster deployments. Also drops the `shortuuid` dependency.
- `ExplainerDashboard` now prints out the local IP address when starting the dashboard.
- `get_index_list()` is only called once upon starting the dashboard.
v0.3.1: responsive classifier components
Version 0.3.1:
This version is mostly about pre-calculating and optimizing the classifier statistics
components. Those components should now be much more responsive with large datasets.
New Features
- new methods `roc_auc_curve(pos_label)` and `pr_auc_curve(pos_label)`
- new method `get_classification_df(...)` to get a dataframe with the number of labels above and below a given cutoff.
  - this now gets used by `plot_classification(...)`
- new method `confusion_matrix(cutoff, binary, pos_label)`
- added parameter `sort_features` to `FeatureInputComponent` (see the sketch below this list):
  - defaults to `'shap'`: order features by mean absolute shap
  - if set to `'alphabet'`, features are sorted alphabetically
- added parameter `fill_row_first` to `FeatureInputComponent`:
  - defaults to `True`: fill the first row first, then the next row, etc.
  - if `False`: fill the first column first, then the second column, etc.
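A minimal sketch of these parameters (assuming a fitted `explainer`, and that `FeatureInputComponent` is imported from `explainerdashboard.custom`):

```python
from explainerdashboard.custom import FeatureInputComponent

# sort the input fields alphabetically and fill column-by-column
feature_input = FeatureInputComponent(explainer,
                                      sort_features='alphabet',
                                      fill_row_first=False)
```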
Bug Fixes
- categorical mappings now updateable with `pandas<=1.2` and `python==3.6`
- title now overridable for `RegressionRandomIndexComponent`
- added assert check on `summary_type` for `ShapSummaryComponent`
Improvements
- pre-calculating `lift_curve_df` only once and then storing it for each `pos_label`
  - plus: storing only 100 evenly spaced rows of `lift_curve_df`
  - dashboard should be more responsive for large datasets
- pre-calculating `roc_auc_curve` and `pr_auc_curve`
  - dashboard should be more responsive for large datasets
- pre-calculating confusion matrices
  - dashboard should be more responsive for large datasets
- pre-calculating `classification_df`s
  - dashboard should be more responsive for large datasets
- confusion matrix: added axis title, moved predicted labels to bottom of graph
- precision plot component: when only adjusting the cutoff, simply update the cutoff line without recalculating the whole plot.
v0.3.0.1: dependency fixes
Version 0.3.0.1:
Some of the new features of version 0.3 only work with `pandas>=1.2`, which is not available for Python 3.6.
Breaking Changes
- new dependency requirement `pandas>=1.2` also implies `python>=3.7`
Bug Fixes
- updates `pandas` version to be compatible with categorical feature operations
- updates `dtreeviz` version to make `xgboost` and `pyspark` dependencies optional
v0.3.0: reducing memory footprint
Version 0.3.0:
This is a major release and comes with lots of breaking changes to the lower-level `ClassifierExplainer` and `RegressionExplainer` API. The higher-level `ExplainerComponent` and `ExplainerDashboard` API has not been changed however, except for the deprecation of the `cats` and `hide_cats` parameters.
Explainers generated with `explainerdashboard <= 0.2.20.1` will not work with this version! So if you have stored explainers to disk, you either have to rebuild them with this new version, or downgrade back to `explainerdashboard==0.2.20.1`! (hope you pinned your dependencies in production! ;)
The main motivation for these breaking changes was to improve memory usage of the dashboards, especially in production. This led to the deprecation of the dual cats grouped/not grouped functionality of the dashboard. Once I had committed to that breaking change, I decided to clean up the entire API and do all the needed breaking changes at once.
Breaking Changes
- onehot encoded features (passed with the `cats` parameter) are now merged by default. This means that the `cats=True` parameter has been removed from all explainer methods, and the `group cats` toggle has been removed from all `ExplainerComponents`. This saves both on code complexity and memory usage. If you wish to see the individual contributions of onehot encoded columns, simply don't pass them to the `cats` parameter upon construction.
- Deprecated explainer attributes:
  - `BaseExplainer`: `shap_values_cats`, `shap_interaction_values_cats`, `permutation_importances_cats`, `get_dfs()`, `formatted_contrib_df()`, `to_sql()`, `check_cats()`, `equivalent_col`
  - `ClassifierExplainer`: `get_prop_for_label`
- Naming changes to attributes:
  - `BaseExplainer`:
    - `importances_df()` -> `get_importances_df()`
    - `feature_permutations_df()` -> `get_feature_permutations_df()`
    - `get_int_idx(index)` -> `get_idx(index)`
    - `contrib_df()` -> `get_contrib_df()`
    - `contrib_summary_df()` -> `get_summary_contrib_df()`
    - `interaction_df()` -> `get_interactions_df()`
    - `shap_values` -> `get_shap_values_df`
    - `plot_shap_contributions()` -> `plot_contributions()`
    - `plot_shap_summary()` -> `plot_importances_detailed()`
    - `plot_shap_dependence()` -> `plot_dependence()`
    - `plot_shap_interaction()` -> `plot_interaction()`
    - `plot_shap_interaction_summary()` -> `plot_interactions_detailed()`
    - `plot_interactions()` -> `plot_interactions_importance()`
    - `n_features()` -> `n_features`
    - `shap_top_interaction()` -> `top_shap_interactions`
    - `shap_interaction_values_by_col()` -> `shap_interactions_values_for_col()`
  - `ClassifierExplainer`:
    - `self.pred_probas` -> `self.pred_probas()`
    - `precision_df()` -> `get_precision_df()`
    - `lift_curve_df()` -> `get_liftcurve_df()`
  - `RandomForestExplainer`/`XGBExplainer`:
    - `decision_trees` -> `shadow_trees`
    - `decisiontree_df()` -> `get_decisionpath_df()`
    - `decisiontree_summary_df()` -> `get_decisionpath_summary_df()`
    - `decision_path_file()` -> `decisiontree_file()`
    - `decision_path()` -> `decisiontree()`
    - `decision_path_encoded()` -> `decisiontree_encoded()`
New Features
- new `Explainer` parameter `precision`: defaults to `'float64'`. Can be set to `'float32'` to save on memory usage: `ClassifierExplainer(model, X, y, precision='float32')` (see the sketches after this list)
- new `memory_usage()` method to show which internal attributes take the most memory.
- for multiclass classifiers: `keep_shap_pos_label_only(pos_label)` method:
  - drops shap values and shap interactions for all labels except `pos_label`
  - this should significantly reduce memory usage for multiclass classification models.
  - not needed for binary classifiers.
- added `get_index_list()`, `get_X_row(index)` and `get_y(index)` methods.
  - these can be overridden with `.set_index_list_func()`, `.set_X_row_func()` and `.set_y_func()`.
  - by overriding these functions you can for example sample observations from a database or other external storage instead of from `X_test`, `y_test` (see the second sketch after this list).
- added `Popout` buttons to all the major graphs that open a large modal showing just the graph. This makes it easier to focus on a particular graph without distraction from the rest of the dashboard and all its toggles.
- added `max_cat_colors` parameter to `plot_importance_detailed`, `plot_dependence` and `plot_interactions_detailed`
  - prevents plotting from getting slow with categorical features with many categories
  - defaults to `5`
  - can be set as `**kwarg` to `ExplainerDashboard`
- adds category limits and sorting to the `RegressionVsCol` component
- adds property `X_merged` that gives a dataframe with the onehot columns merged.
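A minimal sketch of the memory-related additions (assuming a fitted multiclass `model` with `X_test`/`y_test`; the `pos_label` value `1` is illustrative):

```python
from explainerdashboard import ClassifierExplainer

# float32 roughly halves the memory of the float64 default
explainer = ClassifierExplainer(model, X_test, y_test, precision='float32')

explainer.memory_usage()               # show which internal attributes take the most memory
explainer.keep_shap_pos_label_only(1)  # multiclass only: drop shap values for other labels
```

And a sketch of overriding the index lookup methods to serve rows from external storage (the `db_*` helpers are hypothetical stand-ins for your own storage layer):

```python
# db_list_indexes, db_get_X_row and db_get_y are hypothetical helpers
explainer.set_index_list_func(lambda: db_list_indexes())
explainer.set_X_row_func(lambda index: db_get_X_row(index))
explainer.set_y_func(lambda index: db_get_y(index))
```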
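Note that whatever you plug into `.set_X_row_func()` should presumably return data in the same shape as a row of `X_test`; treat the sketch above as an outline of the override mechanism rather than a definitive contract.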
Bug Fixes
- shap dependence: when no point cloud, do not highlight!
- Fixed bug with calculating contributions plot/table for whatif component,
when InputFeatures had not fully loaded, resulting in shap error.
Improvements
- saving `X.copy()` instead of using a reference to `X`
  - this would result in more memory usage in development though, so you can `del X_test` to save memory.
- `ClassifierExplainer` only stores shap (interaction) values for the positive class: shap values for the negative class are generated on the fly by multiplying with `-1`.
- encoding onehot columns as `np.int8`, saving memory usage
- encoding categorical features as `pd.category`, saving memory usage
- added a base `TreeExplainer` class that `RandomForestExplainer` and `XGBExplainer` both derive from
  - will make it easier to extend tree explainers to other models in the future, e.g. catboost and lightgbm
- got rid of the callable properties (that were there to assure backward compatibility) and replaced them with regular methods.