I previously contributed a pull request that reduced the runtime of the main clustering algorithm from over two hours to just six minutes for the Llama 2 7B model (#60). In the 'Further Suggestions' section of that PR, I mentioned potential optimizations by exploiting the 1D nature of the task.
I'm excited to share that I've developed a Python package, flash1dkmeans, which implements a faster 1D K-means algorithm. This package is now part of the Any-Precision LLM project, a variable bit-rate quantization scheme using SqueezeLLM as the seed model. With this new implementation, we've managed to further reduce the execution time for SqueezeLLM to 38 seconds on an i9-13900K machine, achieving a further tenfold speed increase.
If you're interested in integrating this speed enhancement, the code in Any-Precision LLM serves as an example: we use the package there to create the seed model. For maximum performance gains, consider accelerating the caller function with `@numba.njit(parallel=True)`. However, even a standard multiprocessing pool should yield significant improvements.
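The parallel-caller pattern looks something like the sketch below. The toy Lloyd's-iteration kernel here is just a stand-in so the snippet runs on its own; in practice you'd call one of flash1dkmeans's numba-compatible routines instead (see the package README for the exact entry points and signatures).

```python
import numpy as np
from numba import njit, prange

@njit
def lloyd_1d(x, k, n_iter=20):
    # Toy 1D Lloyd's iterations; a stand-in for a flash1dkmeans kernel.
    centroids = np.linspace(x.min(), x.max(), k)
    labels = np.empty(x.size, dtype=np.int64)
    for _ in range(n_iter):
        for i in range(x.size):
            labels[i] = np.argmin(np.abs(centroids - x[i]))
        for j in range(k):
            mask = labels == j
            if mask.any():
                centroids[j] = x[mask].mean()
    return centroids, labels

@njit(parallel=True)
def cluster_rows(weights, k):
    # Each row (e.g. one output channel of a weight matrix) is an
    # independent 1D clustering problem, so prange parallelizes cleanly.
    n_rows = weights.shape[0]
    centroids = np.empty((n_rows, k))
    labels = np.empty(weights.shape, dtype=np.int64)
    for r in prange(n_rows):
        c, l = lloyd_1d(weights[r], k)
        centroids[r] = c
        labels[r] = l
    return centroids, labels

cents, labs = cluster_rows(np.random.randn(64, 4096), 8)
```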
This package can serve as an almost drop-in replacement for sklearn's K-means if you're looking to speed up SqueezeLLM further; a rough sketch of the swap is below. Of course, sticking with sklearn for better transparency is perfectly fine too. I wanted to share these findings, as your work helped create ours 👍.
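For reference, the swap looks roughly like this (the `kmeans_1d` signature below is illustrative; see the flash1dkmeans README for exact usage):

```python
import numpy as np
from sklearn.cluster import KMeans
from flash1dkmeans import kmeans_1d  # illustrative entry point; check the README

x = np.random.randn(4096)  # e.g. one channel's weights

# sklearn: K-means expects 2D input, so the 1D weights must be reshaped
km = KMeans(n_clusters=8, n_init=10).fit(x.reshape(-1, 1))
sk_centroids = np.sort(km.cluster_centers_.ravel())

# flash1dkmeans: operates on the 1D array directly (illustrative signature)
centroids, labels = kmeans_1d(x, 8)
```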