Causal and asymmetric Shapley values implementation #273
Conversation
Thank you, @igbucur, for taking the time to prepare this PR! I have looked at the code and iterated through the most important parts with some example data. While I have some minor comments, it does indeed seem to work well 👍

Before we start discussing details, I have a broader question/comment: You have added the causal method as a new approach, which applies the practical implementation method from Theorem 1 in your paper while assuming a Gaussian distribution for the data. Please correct me if I am wrong, but I don't see any reason the method should be restricted to the Gaussian distribution. We have implemented a series of other approaches for estimating the conditional distributions, and it would be great if the user could use the causal method with any of these. Allowing that will require some changes in the main package, but from what I understand, it can be carried out by figuring out which conditional distributions need to be estimated, and in what order, and then simply looping over the different chain components -- adding new sampled columns iteratively, similarly to how you did with the Gaussian method. What do you think, @igbucur? Did I miss any details regarding this possibility? If you think it is doable, I could assist you in making the appropriate modifications in the core package.
Thank you for the feedback, @martinju. Yes, there should be no reason for the approach to be limited to the Gaussian distribution, and we could in principle use any of the approaches for estimating the conditional distributions when computing the causal Shapley values. I think it would be doable. Perhaps it would then be better to have a causal flag, used in a similar way to asymmetric, instead of treating it as a separate approach in explain. The other approaches for which to implement causal Shapley values would then be "empirical", "copula", "ctree", and "independence"?
Sounds good! Yes, I am thinking that whenever causal_ordering is not NULL, the causal ordering is respected with the method specified under approach (gaussian, copula, ctree, empirical or independence). We will have to think a bit about the best way to implement this. Ongoing work on a "batch mode" allowing just parts of the subsets to be handled simultaneously (see #244) may also affect this a bit. In any case, I believe the best starting point would be to create a function which figures out which conditional distributions need to be computed, based on the S-matrix (or X-matrix) created in the …
Thanks for the tip. Yeah, I think this makes sense, but I'll have to give some more thought to how to implement it.
@igbucur Are you currently working on this? If so, let me know if you want to chat about how to go about it!
@martinju Yes, I think I'm ready to give it a go. I had a look at how to tackle the proposed extension and here are my thoughts:
What do you think? Does this approach seem reasonable?
Sounds good! I think what we really need is the two functions, say A(S, j) and B(S, j), which give p(X_Sbar | X_S) = \prod_j p(X_{A(S,j)} | X_{B(S,j)}) for the specified causal ordering. I believe it would be best to compute these within the shapr function and store them in some object there, which is then used in explain/prepare_data by iteratively updating the data matrix to perform prediction on. In prepare_data this could be achieved either by replacing the lapply call as you write, or by modifying the sampling functions to actually do iterative sampling. I am not sure which approach is preferable at the moment. Note that the empirical (+ independence) methods are not constructed as an lapply around sampling functions, and ctree also requires an initial model-fitting procedure. My main point is that I think the construction of the "routine" needed for the specific iterative sampling should be created already in the shapr function :-)
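To make the A(S, j)/B(S, j) idea above concrete, here is a minimal sketch in Python (shapr itself is R; the name `causal_factors` and the exact conditioning rule are illustrative assumptions, not shapr API). It decomposes p(X_Sbar | X_S) into one factor per chain component, assuming each component's out-of-coalition features are conditioned on the in-coalition features of that component plus all features of earlier components:

```python
def causal_factors(causal_ordering, S):
    """Hypothetical helper: for a coalition S, decompose
    p(X_Sbar | X_S) into factors p(X_A | X_B), one per chain
    component, to be sampled in causal order.

    causal_ordering: list of components, e.g. [[0], [1, 2], [3]]
    S: set of in-coalition feature indices
    Returns a list of (A, B) index pairs.
    """
    factors = []
    upstream = set()  # features belonging to earlier components
    for component in causal_ordering:
        comp = set(component)
        A = sorted(comp - S)  # out-of-coalition features to sample here
        # Assumption: condition on the in-coalition features of this
        # component plus everything from earlier components.
        B = sorted((comp & S) | upstream)
        if A:
            factors.append((A, B))
        upstream |= comp
    return factors
```

For example, with ordering [[0], [1, 2], [3]] and S = {0, 2}, this returns [([1], [0, 2]), ([3], [0, 1, 2])], i.e. p(X_{1,3} | X_{0,2}) = p(X_1 | X_0, X_2) · p(X_3 | X_0, X_1, X_2).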
Okay, I will think about how it could be done in the … I was thinking about encapsulating the causal ordering functionality either in a new custom …
Maybe I was unclear, but the function you talk about, taking features and causal ordering as inputs, is exactly what I was thinking about putting in the shapr function. :-)
And let me know if you want me to put together a function like that in shapr. It should be rather straightforward, I think.
This branch contains an implementation for computing causal and asymmetric Shapley values, based on the supplementary code for the paper [1]. The code is adapted from the package CauSHAPley (https://gitlab.science.ru.nl/gbucur/caushapley/).
Asymmetric Shapley values were proposed in [2] as a way to incorporate causal knowledge in the real world by restricting the possible permutations of the features when computing the Shapley values to those consistent with a (partial) causal ordering.
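The permutation restriction can be sketched as follows (a Python illustration only; `is_consistent` is a hypothetical helper, not part of this branch): a permutation is kept when features from earlier components of the partial causal ordering never appear after features from later components.

```python
def is_consistent(perm, causal_ordering):
    """Hypothetical helper: check that a feature permutation respects
    a (partial) causal ordering, i.e. features from earlier components
    never appear after features from later components.

    perm: sequence of feature indices in insertion order
    causal_ordering: list of components; earlier components are causes
    """
    rank = {f: j for j, comp in enumerate(causal_ordering) for f in comp}
    ranks = [rank[f] for f in perm]
    # Component ranks must be non-decreasing along the permutation.
    return all(a <= b for a, b in zip(ranks, ranks[1:]))
```

With ordering [[0], [1, 2], [3]], the permutation (0, 2, 1, 3) is consistent (features 1 and 2 may appear in either order within their component), while (1, 0, 2, 3) is not, since feature 1 precedes its cause 0.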
Causal Shapley values were proposed in [1] as a way to explain the total effect of features on the prediction, taking into account their causal relationships, by adapting the sampling procedure in shapr.
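The adapted sampling procedure amounts to iteratively filling in the out-of-coalition features along the causal ordering. A minimal Python sketch, assuming a user-supplied `sample_conditional` callback as a stand-in for any of shapr's conditional-sampling approaches (gaussian, copula, ctree, ...):

```python
import numpy as np

def sample_chain(x, factors, sample_conditional):
    """Sketch of the adapted sampling step: iteratively draw the
    out-of-coalition features of instance x along the causal ordering.

    factors: list of (A, B) index pairs in causal order, where A are
        the features to sample and B the features to condition on
    sample_conditional: hypothetical callback drawing values for
        features A given the current values z[B]
    """
    z = np.asarray(x, dtype=float).copy()
    for A, B in factors:
        # Earlier draws are already written into z, so downstream
        # factors condition on freshly sampled upstream values.
        z[A] = sample_conditional(A, B, z)
    return z
```

The key point is that each factor conditions on the running vector `z`, not on the original instance, so sampled upstream values propagate downstream, mirroring the iterative column updates discussed above.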
The two ideas can be combined to obtain asymmetric causal Shapley values. For more details, see [1].
The branch adds the following functions for computing causal Shapley values:
The branch adds the following functionality for computing asymmetric Shapley values:
Finally, the function shapr gets two new parameters:
These parameters are saved in the explainer object returned by shapr, which is why the stored reference objects in the test suite have been updated. The branch also adds a number of basic tests for the new functionality.
References:
[1] Heskes, T., Sijben, E., Bucur, I. G., & Claassen, T. (2020). Causal Shapley Values: Exploiting Causal Knowledge to Explain Individual Predictions of Complex Models. Advances in Neural Information Processing Systems, 33.
[2] Frye, C., Rowat, C., & Feige, I. (2020). Asymmetric Shapley values: incorporating causal knowledge into model-agnostic explainability. Advances in Neural Information Processing Systems, 33.