When providing `n_combinations` to `shapr`, sampling of feature combinations is performed unless `n_combinations` is larger than `2^m` (the number of unique combinations), in which case all combinations are used instead. Currently, `n_combinations` specifies how many samples are drawn, not how many unique combinations end up being used (and the latter is what matters from a memory/runtime perspective). A drawback of this is that if you have, say, 13 features and set `n_combinations = NULL` (to use the exact approach), you get all 8192 unique combinations, whereas if you set `n_combinations = 10000` (our recommendation for larger values of `m`), you get only 2000-3000 unique combinations.
I suggest changing the behavior of `n_combinations` so that it represents the number of unique combinations to sample. This can be achieved by sampling a larger number of combinations and cutting off once `n_combinations` unique combinations have been reached. To ensure we sample enough combinations, we should use a while loop that repeats the sampling until `n_combinations` unique combinations are obtained.
While this could increase the runtime of the `shapr` function slightly, this part of the code is fast anyway.
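A minimal sketch of the while-loop idea is shown below. The function name, the `batch_size` argument, and the uniform sampling of coalitions are illustrative assumptions, not `shapr`'s actual implementation (the real sampling distribution over coalitions may differ); the point is only the sample-then-deduplicate loop that stops at `n_combinations` unique coalitions.

```r
# Sketch: keep sampling coalitions in batches until `n_combinations` *unique*
# coalitions are obtained, then cut off. Uniform sampling is used purely for
# illustration; names and defaults are hypothetical.
sample_unique_combinations <- function(m, n_combinations, batch_size = 1000) {
  n_combinations <- min(n_combinations, 2^m)  # cannot exceed the exact number
  unique_combos <- character(0)

  while (length(unique_combos) < n_combinations) {
    # Sample coalition sizes, then the features within each coalition,
    # and encode each coalition as a sorted string for easy deduplication
    sizes <- sample(0:m, size = batch_size, replace = TRUE)
    new_combos <- vapply(
      sizes,
      function(s) paste(sort(sample(m, s)), collapse = " "),
      character(1)
    )
    unique_combos <- unique(c(unique_combos, new_combos))
  }

  unique_combos[seq_len(n_combinations)]
}

# With m = 13 and n_combinations = 10000 this returns all 2^13 = 8192 coalitions;
# with n_combinations = 3000 it returns exactly 3000 unique coalitions.
```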
The functions `feature_exact` / `feature_group` and `feature_not_exact` / `feature_group_not_exact` (see #277) are very similar and share a lot of the same code. We could probably combine each pair into a single function with an additional `group` argument?
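Below is a purely illustrative sketch of the `group`-argument pattern for the exact case. The function name and structure are hypothetical, not the actual internals of the functions above; it only shows how a feature-wise and a group-wise variant could share one code path.

```r
# Sketch: one function handles both feature-wise and group-wise combinations.
# Without `group`, each unit is a single feature; with `group`, each unit is a
# group of features, and coalitions are formed over groups instead.
feature_combinations_exact <- function(m, group = NULL) {
  n_units <- if (is.null(group)) m else length(group)

  # All 2^n_units coalitions over the units
  combos <- c(
    list(integer(0)),
    unlist(lapply(seq_len(n_units),
                  function(s) combn(n_units, s, simplify = FALSE)),
           recursive = FALSE)
  )

  if (!is.null(group)) {
    # Map group-level coalitions back to the underlying feature indices
    combos <- lapply(combos, function(idx) as.integer(sort(unlist(group[idx]))))
  }
  combos
}

# Feature-wise: 2^3 = 8 coalitions over features 1..3
# feature_combinations_exact(3)
# Group-wise: 2^2 = 4 coalitions over the groups {1, 2} and {3}
# feature_combinations_exact(3, group = list(1:2, 3))
```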