v0.1.0
What's Changed
- Support Falcon 180B by @casper-hansen in #35
- [NEW] GEMV kernel implementation by @casper-hansen in #40
- Allow user to use custom calibration data for quantization by @boehm-e in #27
- Safetensors and model sharding by @casper-hansen in #47
- 2x faster context processing with GEMV by @casper-hansen in #58
- Support kv_heads by @casper-hansen in #60
- Refactor quantization code by @casper-hansen in #62
- Support Windows by @qwopqwop200 in #53
- Improve model loading by @casper-hansen in #66
Full Changelog: v0.0.2...v0.1.0