v1.3.3
What's Changed
- fix GPTQ params loading
- improve decode latency for long sequences twofold
- feat: add more latency metrics in forward by @OlivierDehaene in #1346
- fix: max_past default value must be -1, not 0 by @OlivierDehaene in #1348
Full Changelog: v1.3.2...v1.3.3