I previously contributed a pull request that reduced the runtime of the main clustering algorithm from over two hours to just six minutes for the Llama 2 7B model (#60). In the 'Further Suggestions' section of that PR, I mentioned potential optimizations by exploiting the 1D nature of the task.
I'm excited to share that I've developed a Python package, flash1dkmeans, which implements a faster 1D K-means algorithm. This package is now part of the Any-Precision LLM project, a variable bit-rate quantization scheme using SqueezeLLM as the seed model. With this new implementation, we've managed to further reduce the execution time for SqueezeLLM to 38 seconds on an i9-13900K machine, achieving a further tenfold speed increase.
If you're interested in integrating this speed enhancement, the code in Any-Precision LLM serves as an example: we use the package there to create the seed model. For maximum performance gains, consider accelerating the caller function with `@numba.njit(parallel=True)`. However, even a standard multiprocessing pool should yield significant improvements.
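The parallel-caller pattern looks something like the sketch below. The toy Lloyd's-iteration kernel here is just a stand-in so the snippet runs on its own; in practice you'd call one of flash1dkmeans's numba-compatible routines instead (see the package README for the exact entry points and signatures).

```python
import numpy as np
from numba import njit, prange

@njit
def lloyd_1d(x, k, n_iter=20):
    # Toy 1D Lloyd's iterations; a stand-in for a flash1dkmeans kernel.
    centroids = np.linspace(x.min(), x.max(), k)
    labels = np.empty(x.size, dtype=np.int64)
    for _ in range(n_iter):
        for i in range(x.size):
            labels[i] = np.argmin(np.abs(centroids - x[i]))
        for j in range(k):
            mask = labels == j
            if mask.any():
                centroids[j] = x[mask].mean()
    return centroids, labels

@njit(parallel=True)
def cluster_rows(weights, k):
    # Each row (e.g. one output channel of a weight matrix) is an
    # independent 1D clustering problem, so prange parallelizes cleanly.
    n_rows = weights.shape[0]
    centroids = np.empty((n_rows, k))
    labels = np.empty(weights.shape, dtype=np.int64)
    for r in prange(n_rows):
        c, l = lloyd_1d(weights[r], k)
        centroids[r] = c
        labels[r] = l
    return centroids, labels

cents, labs = cluster_rows(np.random.randn(64, 4096), 8)
```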
This package can serve as an almost drop-in replacement for sklearn's K-means if you're looking to speed up SqueezeLLM further; a rough sketch of the swap is below. Of course, sticking with sklearn for better transparency is perfectly fine too. I wanted to share these findings, as your work helped create ours 👍.
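For reference, the swap looks roughly like this (the `kmeans_1d` signature below is illustrative; see the flash1dkmeans README for exact usage):

```python
import numpy as np
from sklearn.cluster import KMeans
from flash1dkmeans import kmeans_1d  # illustrative entry point; check the README

x = np.random.randn(4096)  # e.g. one channel's weights

# sklearn: K-means expects 2D input, so the 1D weights must be reshaped
km = KMeans(n_clusters=8, n_init=10).fit(x.reshape(-1, 1))
sk_centroids = np.sort(km.cluster_centers_.ravel())

# flash1dkmeans: operates on the 1D array directly (illustrative signature)
centroids, labels = kmeans_1d(x, 8)
```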