v1.3.3

@OlivierDehaene released this 15 Dec 00:22

What's Changed

  • fix GPTQ params loading
  • improve decode latency for long sequences twofold
  • feat: add more latency metrics in forward by @OlivierDehaene in #1346
  • fix: max_past default value must be -1, not 0 by @OlivierDehaene in #1348

Full Changelog: v1.3.2...v1.3.3