Implement unique sampling in feature_not_exact #227

Open
martinju opened this issue Aug 14, 2020 · 1 comment

@martinju
Member

When n_combinations is passed to shapr, feature combinations are sampled unless n_combinations is larger than 2^m (the number of unique combinations for m features), in which case all combinations are used instead. Currently, n_combinations specifies the number of samples drawn, not the number of unique combinations used in the end, and the latter is what matters from a memory/runtime perspective. A drawback is that with, say, 13 features, setting n_combinations = NULL (to use the exact method) gives all 8192 unique combinations, while setting n_combinations = 10000 (our recommendation for larger values of m) gives just 2000-3000 unique combinations.

I suggest changing n_combinations to represent the number of unique feature combinations to sample. This can be achieved by first sampling a larger number of combinations and then cutting off once n_combinations unique combinations are reached. To ensure enough combinations are sampled, we should use a while loop that repeats the sampling until n_combinations unique combinations are obtained.

While this could increase runtime for the shapr function a little bit, this part of the code is fast anyway.
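
The sample-until-unique loop proposed above could look roughly like the following. This is a minimal sketch in Python for illustration only: shapr is an R package and this is not its code. The function name `sample_unique_combinations`, the `batch_size` argument, and the uniform draws over subsets are all assumptions; the sampling scheme shapr actually uses may weight combinations differently.

```python
import random


def sample_unique_combinations(m, n_combinations, batch_size=None, seed=None):
    """Sample feature subsets until n_combinations unique ones are collected.

    Hypothetical helper for illustration; not part of shapr.
    Returns (list of feature-index tuples, number of draws performed).
    """
    n_total = 2 ** m

    def to_features(mask):
        # Decode a bitmask into a tuple of feature indices.
        return tuple(j for j in range(m) if (mask >> j) & 1)

    if n_combinations >= n_total:
        # Nothing to sample: use every combination (the exact case).
        return [to_features(mask) for mask in range(n_total)], 0

    rng = random.Random(seed)
    batch_size = batch_size or n_combinations
    unique = set()
    n_draws = 0

    # Keep drawing batches until enough unique combinations are found.
    while len(unique) < n_combinations:
        for _ in range(batch_size):
            # Assumption: uniform draw over all 2^m subsets.
            unique.add(rng.getrandbits(m))
            n_draws += 1
            if len(unique) == n_combinations:
                break  # cut off as soon as the target is reached

    return [to_features(mask) for mask in sorted(unique)], n_draws


combos, n_draws = sample_unique_combinations(m=13, n_combinations=3000, seed=1)
print(len(combos), "unique combinations from", n_draws, "draws")
```

With this behavior the caller controls the number of unique combinations directly, and the number of raw draws becomes an internal detail that grows as n_combinations approaches 2^m.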

@JensWahl
Contributor

The functions feature_exact/feature_group and feature_not_exact / feature_group_not_exact (see #277) are very similar and reuse a lot of the same code. We could probably combine them into two functions with an additional group argument?

@martinju martinju self-assigned this Jan 15, 2022
@martinju martinju moved this to Todo in shapr 1.0.0 May 6, 2022
@martinju martinju assigned JensWahl and unassigned martinju Aug 24, 2022
@JensWahl JensWahl moved this from Todo to Done in shapr 1.0.0 Sep 12, 2022