k-means out of memory error on large data sets #179
Comments
You can try to convert your data to …
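This comment is truncated in the page capture; assuming it suggests storing the input with a smaller element type such as Float32 (an assumption on my part, since the original wording is cut off), the conversion is a one-liner and roughly halves the memory taken by the data and any same-typed intermediates:

```julia
using Clustering

# Hypothetical data: 3 features × 500,000 points, column-wise as Clustering.kmeans expects.
X = rand(3, 500_000)

# Float32 halves memory relative to Float64 at the cost of some precision.
X32 = Float32.(X)

# A modest k is shown here; with k = 50_000 the distance matrix itself remains the bottleneck.
result = kmeans(X32, 1_000; maxiter=50)
```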
Are there any plans to provide a mini-batch version, such as https://scikit-learn.org/stable/modules/generated/sklearn.cluster.MiniBatchKMeans.html ?
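Nothing in this thread indicates that the package ships a mini-batch variant. Purely as an illustration of the algorithm behind scikit-learn's MiniBatchKMeans (per-center decaying step sizes), a plain-Julia sketch might look like the following; the name `minibatch_kmeans`, the initialization, and the fixed iteration count are all assumptions, not Clustering.jl API:

```julia
using Random

# Hypothetical sketch of mini-batch k-means. X is a d×n matrix with points
# as columns; returns the k estimated centers.
function minibatch_kmeans(X::AbstractMatrix{<:Real}, k::Integer;
                          batchsize::Integer = 1024, iters::Integer = 100,
                          rng = Random.default_rng())
    d, n = size(X)
    centers = float.(X[:, randperm(rng, n)[1:k]])   # initialize from k random points
    updates = zeros(Int, k)                         # how often each center has moved
    for _ in 1:iters
        for j in rand(rng, 1:n, batchsize)          # sample a mini-batch with replacement
            x = view(X, :, j)
            # nearest center for this point (squared Euclidean distance)
            best, bestdist = 1, Inf
            for c in 1:k
                dist = 0.0
                @inbounds for i in 1:d
                    dist += (x[i] - centers[i, c])^2
                end
                if dist < bestdist
                    best, bestdist = c, dist
                end
            end
            # pull the winning center toward the point with a decaying step size
            updates[best] += 1
            eta = 1 / updates[best]
            @views centers[:, best] .= (1 - eta) .* centers[:, best] .+ eta .* x
        end
    end
    return centers
end
```

Memory here is dominated by the d×k centers and the sampled batch, so a 500,000-point / 50,000-cluster problem no longer needs an n×k distance matrix.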
@jjlynch2 The memory problem you mention happens because the implementation stores a 500,000×50,000 distance matrix when using that many points and clusters. Ideally it would be very useful to have the option to choose a backend implementation when fitting the k-means, so that users could opt into different trade-offs (maybe you care a lot about memory but not that much about speed, maybe you want to maximize speed even at a higher memory cost, etc.).
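To put numbers on that: with the figures from this thread, the dense distance matrix alone is well beyond 64 GB, even in single precision:

```julia
n, k = 500_000, 50_000          # points and clusters from this thread
entries = n * k                 # 2.5e10 pairwise distances
println(entries * 8 / 2^30)     # ≈ 186.3 GiB as Float64
println(entries * 4 / 2^30)     # ≈ 93.1 GiB as Float32
```

So any approach that materializes the full point-by-center distance matrix cannot fit in 64 GB at this scale; memory has to be bounded by processing points in batches or chunks instead.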
Hi! I'm facing a similar issue when trying to cluster a large matrix. On a side note: are there any plans to implement a faster k-means algorithm, or any kind of support for parallelism or GPUs, as there is in Python's ecosystem?
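Nothing in this thread confirms built-in multi-threading or GPU support, but the assignment step itself can be written memory-light and threaded with stock Julia; a sketch under those assumptions (the helper name `assign_threaded!` is hypothetical):

```julia
using Base.Threads

# Hypothetical sketch: label each point with its nearest center on the fly,
# threading over points, so peak memory is O(d·(n + k)) rather than O(n·k).
# X is d×n (points as columns), C is d×k (centers as columns).
function assign_threaded!(labels::Vector{Int}, X::AbstractMatrix, C::AbstractMatrix)
    d, n = size(X)
    k = size(C, 2)
    @threads for j in 1:n
        best, bestdist = 1, Inf
        for c in 1:k
            dist = 0.0
            @inbounds for i in 1:d
                dist += (X[i, j] - C[i, c])^2
            end
            if dist < bestdist
                best, bestdist = c, dist
            end
        end
        labels[j] = best
    end
    return labels
end
```

Run with `julia -t auto`, this trades the precomputed distance matrix for distances recomputed each iteration; a GPU version would need a dedicated kernel and is not sketched here.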
I'm looking to switch to Julia for my k-means clustering needs. However, I regularly run k-means on three-dimensional data sets with 500,000 data points on average. Typically I use k-means to identify clusters numbering about 10% of the points, or roughly 50,000 clusters. I am unable to run this because it fails with an out-of-memory error on a machine with 64 GB of RAM. Is there a way around this, or should I just develop my own high-performance k-means implementation in Julia?