inefficient in findBestK=TRUE #223

Open
epurdom opened this issue Sep 29, 2017 · 0 comments

Owner

epurdom commented Sep 29, 2017

If the user chooses findBestK=c(TRUE,FALSE) with a range of values, e.g. ks=4:15, we are extremely inefficient: we run each of k=4-15 with findBestK=FALSE, and then for findBestK=TRUE we RERUN all of k=4-15 to find the best K. This is because everything is run in parallel without cross-talk.

Similarly, if findBestK=TRUE, we throw away the clusterings for k=4-15 and save only the best, which seems like a waste given that we just calculated them...

Perhaps we should change findBestK so that it calculates and saves k=4-15, then post-processes those to get the best. I.e., clusterMany would internally also set findBestK=FALSE, then do the findBestK clustering last as just a silhouette processing of the saved results.

Could also add a slot to save the silhouette values so they could easily be plotted later.
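
A rough sketch of the proposed flow, in Python rather than the package's R just to show the shape of the idea: cluster once per k, keep every result, then pick the best k from saved silhouette values. All names here (`cluster_many`, `gap_cluster`, `mean_silhouette`) are hypothetical and are not the clusterMany API; `gap_cluster` is a toy stand-in for the real clustering step, and the silhouette is the plain average silhouette width on Euclidean distance, which may differ from what findBestK actually computes.

```python
from math import dist


def mean_silhouette(points, labels):
    """Average silhouette width; singleton clusters contribute 0 by convention."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


def gap_cluster(points, k):
    """Toy stand-in clusterer (assumption): cut sorted 1-D points at the k-1 largest gaps."""
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    cuts = set(sorted(
        range(1, len(order)),
        key=lambda g: points[order[g]][0] - points[order[g - 1]][0],
        reverse=True,
    )[: k - 1])
    labels = [0] * len(points)
    current = 0
    for pos, i in enumerate(order):
        if pos in cuts:
            current += 1
        labels[i] = current
    return labels


def cluster_many(points, ks, cluster_fn):
    """Run cluster_fn once per k, save everything, then choose the best k afterwards."""
    results = {k: cluster_fn(points, k) for k in ks}  # the findBestK=FALSE runs, all kept
    silhouettes = {k: mean_silhouette(points, labs) for k, labs in results.items()}
    best_k = max(silhouettes, key=silhouettes.get)  # findBestK=TRUE as pure post-processing
    return results, silhouettes, best_k


points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,), (10.0,), (10.1,), (10.2,)]
results, silhouettes, best_k = cluster_many(points, range(2, 6), gap_cluster)
print(best_k, silhouettes)
```

Here the "best K" clustering is just `results[best_k]`, so nothing is rerun, and the saved `silhouettes` dict plays the role of the suggested slot for plotting later.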
