inefficient in findBestK=TRUE #223

epurdom · 2017-09-29T08:21:00Z

If the user chooses findBestK=c(TRUE,FALSE) with a range of values, e.g. ks=4:15, we are extremely inefficient: we run each of k=4-15 with findBestK=FALSE, and then for findBestK=TRUE we RERUN all of k=4-15 to find the best K. This is because everything is run in parallel without cross-talk.

Similarly, if findBestK=TRUE, we throw away the results for k=4-15 and only save the best, which seems like a waste given that we just calculated them...

Perhaps findBestK should be changed so that it calculates and saves k=4-15, then post-processes those to get the best. I.e., clusterMany would internally also set findBestK=FALSE for the clustering runs, then do the findBestK clustering last, as just a silhouette post-processing of the saved results.

Could also add a slot to save the silhouette values so they could easily be plotted later.
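A minimal sketch of the proposal above, written in Python purely for illustration (it is not the package's R code or API). The names `cluster_many`, `cluster_fn`, and `silhouette_mean` are hypothetical, and `cluster_fn(dist, k)` is assumed to return one integer label per sample. Each k is clustered exactly once, the best k is then chosen by post-processing the saved results with the average silhouette width, and the silhouette scores are kept so they could be plotted later.

```python
from typing import Callable, Dict, Iterable, List, Sequence

Matrix = Sequence[Sequence[float]]


def silhouette_mean(dist: Matrix, labels: Sequence[int]) -> float:
    """Average silhouette width, given a full distance matrix and labels."""
    clusters: Dict[int, List[int]] = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        return 0.0  # assumption: score a single-cluster result as 0
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # usual convention: singletons contribute 0
        # a: mean distance to the other members of the same cluster
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(labels)


def cluster_many(
    dist: Matrix,
    ks: Iterable[int],
    cluster_fn: Callable[[Matrix, int], Sequence[int]],
) -> dict:
    """Hypothetical driver: cluster once per k, then pick the best k."""
    # One clustering per k; these are the findBestK=FALSE results.
    per_k = {k: list(cluster_fn(dist, k)) for k in ks}
    # The findBestK=TRUE result is pure post-processing of per_k.
    scores = {k: silhouette_mean(dist, labels) for k, labels in per_k.items()}
    best_k = max(scores, key=scores.get)
    # Keep the scores (the proposed "slot") so they can be plotted later.
    return {"per_k": per_k, "silhouette": scores, "best_k": best_k}
```

With this shape, requesting both the per-k results and the best k costs no extra clustering runs, since the only additional work is the silhouette calculation over results that are already stored.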

epurdom added the enhancement label Sep 29, 2017
