v0.2.0
First release:
- Accelerated CPU inference for ChatGLM-6B and ChatGLM2-6B, enabling real-time chatting on a MacBook.
- Support for int4/int5/int8 quantization, KV caching, efficient sampling, parallel computing, and streaming generation.
- Python binding, web demo, and more possibilities.
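To illustrate the quantization feature above, here is a minimal sketch of symmetric int4 quantization with a per-group scale. This is an illustration of the general technique only, not this project's actual kernel; the function names and grouping are assumptions for the example.

```python
def quantize_int4(weights):
    """Illustrative symmetric int4 quantization: map floats to [-8, 7]
    with a single shared scale (not this project's actual implementation)."""
    # int4 signed range is [-8, 7]; scale so the largest magnitude maps to 7
    scale = max(abs(w) for w in weights) / 7.0 or 1.0  # avoid division by zero
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int4(q, scale):
    """Recover approximate float weights from quantized values."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 0.07]
q, scale = quantize_int4(weights)
restored = dequantize_int4(q, scale)
```

Each weight is stored in 4 bits plus one shared scale, trading a small reconstruction error (bounded by half the scale) for a roughly 8x size reduction versus fp32.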