v0.2.0
First release:
- Accelerated CPU inference for ChatGLM-6B and ChatGLM2-6B, enabling real-time chatting on a MacBook.
- Support for int4/int5/int8 quantization, KV caching, efficient sampling, parallel computing, and streaming generation.
- Python binding, web demo, and more possibilities.
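To illustrate the quantization feature above, here is a minimal sketch of symmetric int4 quantization with a per-group scale. This is an illustration of the general technique only, not this project's actual kernel; the function names and grouping are assumptions for the example.

```python
def quantize_int4(weights):
    """Illustrative symmetric int4 quantization: map floats to [-8, 7]
    with a single shared scale (not this project's actual implementation)."""
    # int4 signed range is [-8, 7]; scale so the largest magnitude maps to 7
    scale = max(abs(w) for w in weights) / 7.0 or 1.0  # avoid division by zero
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int4(q, scale):
    """Recover approximate float weights from quantized values."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 0.07]
q, scale = quantize_int4(weights)
restored = dequantize_int4(q, scale)
```

Each weight is stored in 4 bits plus one shared scale, trading a small reconstruction error (bounded by half the scale) for a roughly 8x size reduction versus fp32.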