v0.2.0

li-plus released this 08 Jul 04:33
· 77 commits to main since this release
f0433b4

First release:

  • Accelerated CPU inference for ChatGLM-6B and ChatGLM2-6B, enabling real-time chat on a MacBook.
  • Supports int4/int5/int8 quantization, KV cache, efficient sampling, parallel computing, and streaming generation.
  • Python binding, web demo, and more possibilities.
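
To give a rough idea of what the integer quantization listed above means in practice, here is a minimal sketch of symmetric block quantization in plain Python. It is a generic illustration, not necessarily the scheme used in this release: the function names `quantize_block` and `dequantize_block`, the single per-block scale, and the rounding rule are all assumptions made for the example.

```python
from typing import List, Tuple


def quantize_block(values: List[float], bits: int = 8) -> Tuple[float, List[int]]:
    """Quantize one block of weights to signed `bits`-bit integers.

    Hypothetical helper for illustration: one float scale is stored per block,
    and every weight is stored as a small integer multiple of that scale.
    """
    qmax = 2 ** (bits - 1) - 1  # 127 for int8, 7 for int4
    amax = max((abs(v) for v in values), default=0.0)
    # Map the largest magnitude in the block onto qmax; avoid dividing by zero.
    scale = amax / qmax if amax > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return scale, quantized


def dequantize_block(scale: float, quantized: List[int]) -> List[float]:
    """Recover approximate float weights from the scale and the integers."""
    return [scale * q for q in quantized]


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 0.0]
    for bits in (8, 4):
        scale, q = quantize_block(weights, bits)
        restored = dequantize_block(scale, q)
        worst = max(abs(a - b) for a, b in zip(weights, restored))
        print(f"int{bits}: scale={scale:.5f} q={q} max_error={worst:.5f}")
```

Fewer bits shrink the stored weights but widen the rounding error, which is the trade-off behind offering several quantization levels.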