When providing `n_combinations` to `shapr`, sampling of feature combinations is performed unless `n_combinations` is larger than `2^m` (the number of unique combinations), in which case all combinations are used instead. Currently, `n_combinations` specifies how many samples are drawn, not how many unique combinations end up being used (and the latter is what matters from a memory/runtime perspective). A drawback of this is that if you have, say, 13 features and set `n_combinations = NULL` (to use the exact approach), you get all 8192 unique combinations, whereas if you set `n_combinations = 10000` (our recommendation for larger values of `m`), you get only 2000-3000 unique combinations.
I suggest changing the behavior of `n_combinations` so that it represents the number of unique combinations to sample. This can be achieved by sampling a larger number of combinations and cutting off once `n_combinations` unique combinations have been reached. To ensure we sample enough combinations, we should use a while loop that repeats the sampling until `n_combinations` unique combinations are obtained.
While this could increase the runtime of the `shapr` function slightly, this part of the code is fast anyway.
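A minimal sketch of the while-loop idea is shown below. The function name, the `batch_size` argument, and the uniform sampling of coalitions are illustrative assumptions, not `shapr`'s actual implementation (the real sampling distribution over coalitions may differ); the point is only the sample-then-deduplicate loop that stops at `n_combinations` unique coalitions.

```r
# Sketch: keep sampling coalitions in batches until `n_combinations` *unique*
# coalitions are obtained, then cut off. Uniform sampling is used purely for
# illustration; names and defaults are hypothetical.
sample_unique_combinations <- function(m, n_combinations, batch_size = 1000) {
  n_combinations <- min(n_combinations, 2^m)  # cannot exceed the exact number
  unique_combos <- character(0)

  while (length(unique_combos) < n_combinations) {
    # Sample coalition sizes, then the features within each coalition,
    # and encode each coalition as a sorted string for easy deduplication
    sizes <- sample(0:m, size = batch_size, replace = TRUE)
    new_combos <- vapply(
      sizes,
      function(s) paste(sort(sample(m, s)), collapse = " "),
      character(1)
    )
    unique_combos <- unique(c(unique_combos, new_combos))
  }

  unique_combos[seq_len(n_combinations)]
}

# With m = 13 and n_combinations = 10000 this returns all 2^13 = 8192 coalitions;
# with n_combinations = 3000 it returns exactly 3000 unique coalitions.
```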
The functions `feature_exact` / `feature_group` and `feature_not_exact` / `feature_group_not_exact` (see #277) are very similar and share a lot of the same code. We could probably combine each pair into a single function with an additional `group` argument?
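Below is a purely illustrative sketch of the `group`-argument pattern for the exact case. The function name and structure are hypothetical, not the actual internals of the functions above; it only shows how a feature-wise and a group-wise variant could share one code path.

```r
# Sketch: one function handles both feature-wise and group-wise combinations.
# Without `group`, each unit is a single feature; with `group`, each unit is a
# group of features, and coalitions are formed over groups instead.
feature_combinations_exact <- function(m, group = NULL) {
  n_units <- if (is.null(group)) m else length(group)

  # All 2^n_units coalitions over the units
  combos <- c(
    list(integer(0)),
    unlist(lapply(seq_len(n_units),
                  function(s) combn(n_units, s, simplify = FALSE)),
           recursive = FALSE)
  )

  if (!is.null(group)) {
    # Map group-level coalitions back to the underlying feature indices
    combos <- lapply(combos, function(idx) as.integer(sort(unlist(group[idx]))))
  }
  combos
}

# Feature-wise: 2^3 = 8 coalitions over features 1..3
# feature_combinations_exact(3)
# Group-wise: 2^2 = 4 coalitions over the groups {1, 2} and {3}
# feature_combinations_exact(3, group = list(1:2, 3))
```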