
Optimized K-means for 1D case (flash1dkmeans integration for faster quantization) #72

Open · wants to merge 1 commit into main
Conversation

SyphonArch (Contributor) commented:
Hi again! I previously hinted at faster 1D-specific K-means optimizations in #60, and mentioned that my library flash1dkmeans achieves this in #67.

Here I propose a simple integration of my library into nuq.py.

This yields a further 5x speedup on top of the 22.7x speedup from #60; with this integration, each Llama 2 7B layer can be quantized in 2-3 seconds. Excluding file I/O time, this brings quantization of the whole model close to 1 minute, down from 6 minutes (which was, in turn, down from the original 2 hours!)

In our Any-Precision LLM codebase we actually brought this time down to roughly 30 seconds by using Numba multithreading (made possible by calling the underlying Numba functions of flash1dkmeans) and by pipelining the disk I/O. However, those are separate additional optimizations; in this PR I focus on providing a drop-in replacement for sklearn's K-means.

The main speedup comes from reducing the time complexity of both K-means++ initialization and Lloyd's algorithm iterations by exploiting prefix-sum arrays over the sorted data, an optimization that is only possible with 1D data.
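To illustrate the idea (this is a minimal sketch of the general technique, not the flash1dkmeans implementation): in 1D, each cluster owns a contiguous slice of the sorted data, so a Lloyd iteration needs only binary searches for the cluster boundaries plus prefix-sum lookups for the new means, costing O(k log n) instead of O(nk).

```python
import numpy as np

def lloyd_iteration_1d(x_sorted, prefix, centroids):
    """One Lloyd step on sorted 1D data.

    x_sorted : data points in ascending order
    prefix   : prefix sums of x_sorted, prefix[i] == x_sorted[:i].sum()
    centroids: current centroids in ascending order
    """
    # Cluster boundaries are the midpoints between adjacent centroids;
    # binary search locates each boundary in the sorted data.
    midpoints = (centroids[:-1] + centroids[1:]) / 2
    bounds = np.searchsorted(x_sorted, midpoints)
    starts = np.concatenate(([0], bounds))
    ends = np.concatenate((bounds, [len(x_sorted)]))

    # Each cluster mean comes from two prefix-sum lookups.
    sums = prefix[ends] - prefix[starts]
    counts = ends - starts
    # Keep a centroid unchanged if its cluster is empty.
    return np.where(counts > 0, sums / np.maximum(counts, 1), centroids)

x = np.sort(np.array([0.1, 0.2, 0.25, 4.0, 4.1, 9.0, 9.5]))
prefix = np.concatenate(([0.0], np.cumsum(x)))
c = np.array([0.0, 5.0, 10.0])
for _ in range(10):
    c = lloyd_iteration_1d(x, prefix, c)
```

The same sorted prefix sums also accelerate K-means++ seeding, since cumulative distance weights can be sampled with binary search rather than a linear scan.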

If you are interested in further speeding up the quantization, please consider testing this code.

Questions are welcome!

I noticed that scikit-learn was not listed in the original pyproject.toml dependencies, despite its use in nuq.py. If dependencies exclusive to the quantization pipeline are not meant to be included in pyproject.toml, you may want to drop that part of this PR.
