Merge pull request #72 from oegedijk/dev
Dev: version 0.3
oegedijk authored Jan 27, 2021
2 parents e1e6254 + 056c80e commit f58767a
Showing 46 changed files with 58,526 additions and 99,783 deletions.
7 changes: 3 additions & 4 deletions .gitignore
@@ -129,11 +129,9 @@ dmypy.json

# Pyre type checker
.pyre/
catboost_info/learn_error.tsv
catboost_info/time_left.tsv
catboost_info/learn/events.out.tfevents
.vscode/settings.json
catboost_info/*
.vscode/settings.json

scratch_notebook.ipynb
scratch_import.py
show_and_tell_draft.md
@@ -166,4 +164,5 @@ dashboard1.yaml
dashboard2.yaml
users.yaml
users.json
store_test.csv

20 changes: 0 additions & 20 deletions .vscode/settings.json

This file was deleted.

53 changes: 43 additions & 10 deletions README.md
@@ -22,7 +22,7 @@ a single [ExplainerHub](https://explainerdashboard.readthedocs.io/en/latest/hub.

Examples deployed at: [titanicexplainer.herokuapp.com](http://titanicexplainer.herokuapp.com),
detailed documentation at [explainerdashboard.readthedocs.io](http://explainerdashboard.readthedocs.io),
example notebook on how to launch dashboard for different models [here](https://github.com/oegedijk/explainerdashboard/blob/master/dashboard_examples.ipynb), and an example notebook on how to interact with the explainer object [here](https://github.com/oegedijk/explainerdashboard/blob/master/explainer_examples.ipynb).
example notebook on how to launch dashboard for different models [here](notebooks/dashboard_examples.ipynb), and an example notebook on how to interact with the explainer object [here](notebooks/explainer_examples.ipynb).

Works with `scikit-learn`, `xgboost`, `catboost`, `lightgbm` and others.

@@ -300,7 +300,6 @@ cats toggle will be hidden on every component that has one:
```python
ExplainerDashboard(explainer,
no_permutations=True, # do not show or calculate permutation importances
hide_cats=True, # hide the group cats toggles
hide_depth=True, # hide the depth (no of features) dropdown
hide_sort=True, # hide sort type dropdown in contributions graph/table
hide_orientation=True, # hide orientation dropdown in contributions graph/table
@@ -336,9 +335,9 @@ ExplainerDashboard(explainer,
col='Fare', # initial feature in shap graphs
color_col='Age', # color feature in shap dependence graph
interact_col='Age', # interaction feature in shap interaction
cats=False, # do not group categorical onehot features
depth=5, # only show top 5 features
sort='low-to-high', # sort features from lowest shap to highest in contributions graph/table
cats_topx=3, # show only the top 3 categories for categorical features
cats_sort='alphabet', # sort categorical features alphabetically
orientation='horizontal', # horizontal bars in contributions graph
index='Rugg, Miss. Emily', # initial index to display
@@ -364,12 +363,12 @@ a few toggles:
from explainerdashboard.custom import *

class CustomDashboard(ExplainerComponent):
def __init__(self, explainer, **kwargs):
def __init__(self, explainer, name=None):
super().__init__(explainer, title="Custom Dashboard")
self.confusion = ConfusionMatrixComponent(explainer,
self.confusion = ConfusionMatrixComponent(explainer, name=self.name+"cm",
hide_selector=True, hide_percentage=True,
cutoff=0.75)
self.contrib = ShapContributionsGraphComponent(explainer,
self.contrib = ShapContributionsGraphComponent(explainer, name=self.name+"contrib",
hide_selector=True, hide_cats=True,
hide_depth=True, hide_sort=True,
index='Rugg, Miss. Emily')
@@ -452,17 +451,51 @@ or with waitress (also works on Windows):
$ waitress-serve dashboard:app
```


### Minimizing memory usage

When you deploy a dashboard for a dataset with a large number of rows (`n`) and
columns (`m`), the memory usage of the dashboard can be substantial. You can check the (approximate)
memory usage with `explainer.memory_usage()`. In order to reduce the memory
footprint there are a number of things you can do:

1. Not including the shap interaction tab: shap interaction values have shape (`n*m*m`),
so they can take a substantial amount of memory.
2. Setting a lower precision. By default shap values are stored as `'float64'`,
but you can store them as `'float32'` instead and save half the space:
`ClassifierExplainer(model, X_test, y_test, precision='float32')`. You
can of course also set a lower precision on your `X_test` dataset yourself.
3. For multiclass classifiers, by default `ClassifierExplainer` calculates
shap values for all classes. If you're only interested in a single class
you can drop the shap values for the other classes: `explainer.keep_shap_pos_label_only(pos_label)`
4. Storing data externally. You can for example only store a subset of 10,000 rows in
the explainer itself (enough to generate importance and dependence plots),
and store the rest of your millions of rows of input data in an external file
or database:
- with `explainer.set_X_row_func()` you can set a function that takes
an `index` as argument and returns a single-row dataframe with model-compatible
input data for that index. This function can include a query
to a database or a file read.
- with `explainer.set_y_func()` you can set a function that takes
an `index` as argument and returns the observed outcome `y` for
that index.
- with `explainer.set_index_list_func()` you can set a function
that returns a list of available indexes that can be queried.
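The factor-of-two saving from a lower precision (point 2 above) can be illustrated with the stdlib `array` module; numpy's `'float64'` and `'float32'` dtypes store values at the same 8 versus 4 bytes per value:

```python
from array import array

# 1,000 shap values stored at double vs single precision
values = [0.1 * i for i in range(1000)]
doubles = array('d', values)   # 'd' = C double, like numpy float64
singles = array('f', values)   # 'f' = C float, like numpy float32

print(doubles.itemsize, singles.itemsize)   # bytes per value: 8 4
print(len(doubles) * doubles.itemsize)      # 8000 bytes
print(len(singles) * singles.itemsize)      # 4000 bytes: half the space
```

The same trade-off applies: single precision keeps roughly 7 significant digits, which is plenty for plotting shap values.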

Important: these functions can be called multiple times by multiple independent
components, so it is probably best to implement some kind of caching. The
functions you pass can also be methods, so that you have access to all of the
internals of the explainer.
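A minimal sketch of such an external lookup with caching, using the stdlib `sqlite3` and `functools.lru_cache`. The table name and columns are made up, and a real implementation would return a one-row model-compatible dataframe instead of a dict:

```python
import sqlite3
from functools import lru_cache

# stand-in for your external storage
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE passengers (idx TEXT PRIMARY KEY, fare REAL, age REAL)")
conn.execute("INSERT INTO passengers VALUES ('Rugg, Miss. Emily', 10.5, 21.0)")

@lru_cache(maxsize=1024)  # components may ask for the same index repeatedly
def get_X_row(index):
    """Fetch one row of model input data for `index` from the database."""
    row = conn.execute(
        "SELECT fare, age FROM passengers WHERE idx = ?", (index,)
    ).fetchone()
    # in real use: return pd.DataFrame([row], columns=['fare', 'age'])
    return dict(zip(['fare', 'age'], row))

# explainer.set_X_row_func(get_X_row)  # then hook it into the explainer
print(get_X_row('Rugg, Miss. Emily'))  # {'fare': 10.5, 'age': 21.0}
```

The `lru_cache` means repeated requests for the same index from different components hit the cache instead of the database.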


## Documentation

Documentation can be found at [explainerdashboard.readthedocs.io](https://explainerdashboard.readthedocs.io/en/latest/).

Example notebook on how to launch dashboards for different model types here: [dashboard_examples.ipynb](https://github.com/oegedijk/explainerdashboard/blob/master/dashboard_examples.ipynb).
Example notebook on how to launch dashboards for different model types here: [dashboard_examples.ipynb](notebooks/dashboard_examples.ipynb).

Example notebook on how to interact with the explainer object here: [explainer_examples.ipynb](https://github.com/oegedijk/explainerdashboard/blob/master/explainer_examples.ipynb).
Example notebook on how to interact with the explainer object here: [explainer_examples.ipynb](notebooks/explainer_examples.ipynb).

Example notebook on how to design a custom dashboard: [custom_examples.ipynb](https://github.com/oegedijk/explainerdashboard/blob/master/custom_examples.ipynb).
Example notebook on how to design a custom dashboard: [custom_examples.ipynb](notebooks/custom_examples.ipynb).



117 changes: 117 additions & 0 deletions RELEASE_NOTES.md
@@ -1,5 +1,122 @@
# Release Notes


## 0.3.0:
This is a major release that comes with lots of breaking changes to the lower-level
`ClassifierExplainer` and `RegressionExplainer` API. The higher-level `ExplainerComponent` and `ExplainerDashboard` API has not
changed, however, except for the deprecation of the `cats` and `hide_cats` parameters.

Explainers generated with version `explainerdashboard <= 0.2.20.1` will not work
with this version, so if you have stored explainers to disk you either have to
rebuild them with this new version, or downgrade back to `explainerdashboard==0.2.20.1`!
(hope you pinned your dependencies in production! ;)

The main motivation for these breaking changes was to reduce the memory usage of the
dashboards, especially in production. This led to the deprecation of the
dual cats grouped/not-grouped functionality of the dashboard. Once I had committed
to that breaking change, I decided to clean up the entire API and make all the
needed breaking changes at once.


### Breaking Changes
- onehot encoded features are now merged by default. This means that the `cats=True`
parameter has been removed from all explainer methods, and the `group cats`
toggle has been removed from all `ExplainerComponents`. This saves both
on code complexity and memory usage. If you wish to see the individual
contributions of onehot encoded columns, simply don't pass them to the
`cats` parameter upon construction.
- Deprecated explainer attributes:
- `BaseExplainer`:
- `self.shap_values_cats`
- `self.shap_interaction_values_cats`
- `permutation_importances_cats`
- `self.get_dfs()`
- `formatted_contrib_df()`
- `self.to_sql()`
- `self.check_cats()`
- `equivalent_col`
- `ClassifierExplainer`:
- `get_prop_for_label`
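Merging onehot-encoded shap values amounts to summing the contributions of the individual onehot columns into a single value for the original feature. A stdlib-only illustration with made-up feature names and numbers:

```python
# shap values for one observation, with 'Deck' onehot encoded into three columns
shap_values = {
    'Age': 0.05,
    'Fare': -0.12,
    'Deck_A': 0.03,
    'Deck_B': -0.01,
    'Deck_C': 0.02,
}

# merge the onehot columns back into a single 'Deck' contribution
onehot_cols = ['Deck_A', 'Deck_B', 'Deck_C']
merged = {k: v for k, v in shap_values.items() if k not in onehot_cols}
merged['Deck'] = sum(shap_values[c] for c in onehot_cols)

print(merged)  # 'Deck_A/B/C' replaced by a single 'Deck' contribution of about 0.04
```

Since the merged values can always be recomputed this way, storing only the merged representation loses nothing except the per-column breakdown.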

- Naming changes to attributes:
- `BaseExplainer`:
- `importances_df()` -> `get_importances_df()`
- `feature_permutations_df()` -> `get_feature_permutations_df()`
- `get_int_idx(index)` -> `get_idx(index)`
- `contrib_df()` -> `get_contrib_df()` *
- `contrib_summary_df()` -> `get_summary_contrib_df()` *
- `interaction_df()` -> `get_interactions_df()` *
- `shap_values` -> `get_shap_values_df`
- `plot_shap_contributions()` -> `plot_contributions()`
- `plot_shap_summary()` -> `plot_importances_detailed()`
- `plot_shap_dependence()` -> `plot_dependence()`
- `plot_shap_interaction()` -> `plot_interaction()`
- `plot_shap_interaction_summary()` -> `plot_interactions_detailed()`
- `plot_interactions()` -> `plot_interactions_importance()`
- `n_features()` -> `n_features`
- `shap_top_interaction()` -> `top_shap_interactions`
- `shap_interaction_values_by_col()` -> `shap_interactions_values_for_col()`
- `ClassifierExplainer`:
- `self.pred_probas` -> `self.pred_probas()`
- `precision_df()` -> `get_precision_df()` *
- `lift_curve_df()` -> `get_liftcurve_df()` *
- `RandomForestExplainer`/`XGBExplainer`:
- `decision_trees` -> `shadow_trees`
- `decisiontree_df()` -> `get_decisionpath_df()`
- `decisiontree_summary_df()` -> `get_decisionpath_summary_df()`
- `decision_path_file()` -> `decisiontree_file()`
- `decision_path()` -> `decisiontree()`
- `decision_path_encoded()` -> `decisiontree_encoded()`
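If you have code written against the old names, a thin compatibility shim can forward them to the new ones until you migrate. This is a hypothetical sketch, not part of the library, and the mapping below covers only a few of the renames:

```python
# hypothetical shim mapping a few old explainer method names to the new ones
OLD_TO_NEW = {
    'contrib_df': 'get_contrib_df',
    'plot_shap_contributions': 'plot_contributions',
    'plot_shap_dependence': 'plot_dependence',
}

class LegacyNameShim:
    """Wraps an explainer and forwards deprecated method names to the new API."""
    def __init__(self, explainer):
        self._explainer = explainer

    def __getattr__(self, name):
        # only called for attributes not found on the shim itself
        target = OLD_TO_NEW.get(name, name)
        return getattr(self._explainer, target)

# demo with a stand-in object instead of a real explainer:
class FakeExplainer:
    def get_contrib_df(self):
        return "contrib_df result"

shim = LegacyNameShim(FakeExplainer())
print(shim.contrib_df())  # old name still works: contrib_df result
```

New names pass through unchanged, so the shim can be dropped once all call sites are migrated.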

### New Features
- new `Explainer` parameter `precision`: defaults to `'float64'`. Can be set to
`'float32'` to save on memory usage: `ClassifierExplainer(model, X, y, precision='float32')`
- new `memory_usage()` method to show which internal attributes take the most memory.
- for multiclass classifiers: `keep_shap_pos_label_only(pos_label)` method:
- drops shap values and shap interactions for all labels except `pos_label`
- this should significantly reduce memory usage for multiclass classification
models.
- not needed for binary classifiers.
- added `get_index_list()`, `get_X_row(index)`, and `get_y(index)` methods.
- these can be overridden with `.set_index_list_func()`, `.set_X_row_func()`
and `.set_y_func()`.
- by overriding these functions you can for example sample observations
from a database or other external storage instead of from `X_test`, `y_test`.
- added `Popout` buttons to all the major graphs that open a large modal
showing just the graph. This makes it easier to focus on a particular
graph without distraction from the rest of the dashboard and all its toggles.
- added `max_cat_colors` parameter to `plot_importances_detailed`, `plot_dependence` and `plot_interactions_detailed`
- prevents plotting getting slow with categorical features with many categories.
- defaults to `5`
- can be set as `**kwarg` to `ExplainerDashboard`
- adds category limits and sorting to `RegressionVsCol` component
- adds property `X_merged` that gives a dataframe with the onehot columns merged.

### Bug Fixes
- shap dependence: when no point cloud, do not highlight!
- Fixed bug with calculating contributions plot/table for whatif component,
when InputFeatures had not fully loaded, resulting in shap error.

### Improvements
- saving `X.copy()` instead of using a reference to `X`
- this can result in more memory usage during development,
so you can `del X_test` to save memory.
- `ClassifierExplainer` only stores shap (interaction) values for the positive
class: shap values for the negative class are generated on the fly
by multiplying with `-1`.
- encoding onehot columns as `np.int8` to save memory
- encoding categorical features as `pd.category` to save memory
- added base `TreeExplainer` class that `RandomForestExplainer` and `XGBExplainer` both derive from
- will make it easier to extend tree explainers to other models in the future
- e.g. catboost and lightgbm
- got rid of the callable properties (that were there to ensure backward compatibility)
and replaced them with regular methods.
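As noted in the improvements above, for a binary classifier the shap values of the two classes are mirror images, so only the positive class needs to be stored. A tiny sketch with made-up numbers:

```python
# stored shap values for the positive class of a binary classifier
pos_shap = [0.12, -0.05, 0.31, -0.40]

# negative-class values are generated on the fly by flipping the sign
neg_shap = [-v for v in pos_shap]

print(neg_shap)  # [-0.12, 0.05, -0.31, 0.4]
```

This works because for two classes the probabilities sum to one, so any contribution toward the positive class is an equal contribution away from the negative class.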


## 0.2.20.1:


34 changes: 8 additions & 26 deletions TODO.md
@@ -1,16 +1,10 @@

# TODO:

## Bugs:
- dash contributions reload bug: Exception: Additivity check failed in TreeExplainer!
- shap dependence: when no point cloud, do not highlight!

## Layout:
- Find a proper frontender to help :)
## Version 0.3:
- check InlineExplainer

## dfs:
- wrap shap values in pd.DataFrames?
- wrap predictions in pd.Series?
## Bugs:

## Plots:
- make plot background transparent?
@@ -21,10 +15,6 @@
- https://community.plotly.com/t/announcing-plotly-py-4-12-horizontal-and-vertical-lines-and-rectangles/46783
- add some of these:
https://towardsdatascience.com/introducing-shap-decision-plots-52ed3b4a1cba
- shap dependence plot, sort categorical features by:
- alphabet
- number of obs
- mean abs shap

### Classifier plots:
- move predicted and actual to outer layer of ConfusionMatrixComponent
@@ -36,23 +26,15 @@
### Regression plots:



## Explainers:
- add get_X_row() and get_index_list() methods, and implement it throughout the dashboard.
- minimize pd.DataFrame and np.array size:
- astype(float16), pd.category, etc
- pass n_jobs to pdp_isolate
- add option drop non-cats
- add ExtraTrees and GradientBoostingClassifier to tree visualizers
- add plain language explanations
- could add an `in_words=True` parameter to the `explainer.plot_*` functions, in which
case instead of a plot the function returns a verbal description of the
relationship in the plot.
- Then add an "in words" button to the components that shows a popup with
the verbal explanation.
- rename RandomForestExplainer and XGBExplainer methods into something more logical
- Breaking change!


## notebooks:

@@ -85,16 +67,13 @@
- add pos_label_name property to PosLabelConnector search
- add "number of indexes" indicator to RandomIndexComponents for current restrictions
- set equivalent_col when toggling cats in dependence/interactions

- add width/height to components
- whatif:
- Add a constraints function to whatif component:
- tests if current feature input is allowed
- gives specific feedback when constraint broken
- could build WhatIfComponentException for this?
- Add sliders option to what if component


## Methods:
- add support for SamplingExplainer, PartitionExplainer, PermutationExplainer, AdditiveExplainer
- add support for LimeTabularExplainer:
@@ -110,13 +89,17 @@
- write tests for explainer_plots

## Docs:
- add memory savings to docs:
- memory_usage()
- keep_shap_pos_label_only()
- set_X_row_func, etc
- add cats_topx cats_sort to docs
- add hide_wizard and wizard to docs
- add hide_poweredby to docs
- add Docker deploy example (from issue)
- document register_components no longer necessary
- add new whatif parameters to README and docs
- add section to README on storing and loading explainer/dashboard from file/config
- add section to docs and README on storing and loading explainer/dashboard from file/config

- retake screenshots of components as cards
- Add type hints:
@@ -130,7 +113,6 @@
## Library level:
- Make example heroku deployment repo
- Make example heroku ExplainerHub repo
- hide (prefix '_') to non-public API class methods
- submit pull request to shap with broken test for
https://github.com/slundberg/shap/issues/723

7 changes: 4 additions & 3 deletions docs/source/cli.rst
@@ -1,5 +1,5 @@
``explainerdashboard`` CLI
**************************
explainerdashboard CLI
**********************

The library comes with a ``explainerdashboard`` command line tool (CLI) that
you can use to build and run explainerdashboards from your terminal.
@@ -23,7 +23,8 @@ from the command line by running::

$ explainerdashboard run explainer.joblib

Or to run on specific port, not launch a browser or show help::
The CLI uses the ``waitress`` web server by default to run your dashboard.
To run on a specific port, not launch a browser or show help::

$ explainerdashboard run explainer.joblib --port 8051
$ explainerdashboard run explainer.joblib --no-browser
