
v0.4.0

Released by @OlivierDehaene on 09 Mar 15:10 (commit 411d624)

Features

  • router: support best_of sampling
  • router: support left truncation
  • server: support typical sampling
  • launcher: allow local models
  • clients: add text-generation Python client
  • launcher: allow parsing num_shard from CUDA_VISIBLE_DEVICES
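The feature list above is terse. What follows is a minimal, self-contained Python sketch of what several of these options mean conceptually. It is not the router, server, or launcher implementation, and every function name in it (`typical_filter`, `sample_typical`, `left_truncate`, `pick_best_of`, `num_shard_from_cuda_visible_devices`) is hypothetical. The sketch rests on four assumptions:

- Typical sampling follows the locally typical sampling rule: rank tokens by how close their surprisal is to the distribution's entropy, then keep them until their probability mass reaches `typical_p`.
- `best_of` keeps the candidate with the highest summed log-probability.
- Left truncation keeps the last `truncate` prompt tokens.
- The shard count equals the number of comma-separated entries in `CUDA_VISIBLE_DEVICES`.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def log_softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable log-softmax (subtract the max before exponentiating).
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def typical_filter(logits: Sequence[float], typical_p: float,
                   min_tokens_to_keep: int = 1) -> List[int]:
    """Hypothetical helper: indices kept by locally typical sampling."""
    logp = log_softmax(logits)
    probs = [math.exp(lp) for lp in logp]
    entropy = -sum(p * lp for p, lp in zip(probs, logp))
    # Rank tokens by |surprisal - entropy|, most "typical" first.
    order = sorted(range(len(logits)), key=lambda i: abs(-logp[i] - entropy))
    kept: List[int] = []
    mass = 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        # Stop once enough probability mass (and enough tokens) is kept.
        if mass >= typical_p and len(kept) >= min_tokens_to_keep:
            break
    return kept


def sample_typical(logits: Sequence[float], typical_p: float,
                   rng: random.Random) -> int:
    """Hypothetical helper: sample one token id from the typical set."""
    kept = typical_filter(logits, typical_p)
    m = max(logits[i] for i in kept)
    # Renormalize over the kept tokens only.
    weights = [math.exp(logits[i] - m) for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


def left_truncate(input_ids: Sequence[int], truncate: Optional[int]) -> List[int]:
    """Hypothetical helper: keep only the last `truncate` prompt tokens."""
    if truncate is None or len(input_ids) <= truncate:
        return list(input_ids)
    return list(input_ids[len(input_ids) - truncate:])


def pick_best_of(candidates: Sequence[Tuple[str, Sequence[float]]]) -> Tuple[str, Sequence[float]]:
    """Hypothetical helper: candidates are (text, per-token logprobs) pairs;
    return the one with the highest total log-probability."""
    return max(candidates, key=lambda c: sum(c[1]))


def num_shard_from_cuda_visible_devices(value: str) -> int:
    """Hypothetical helper: one shard per device listed, e.g. "0,1" -> 2."""
    devices = [d for d in value.split(",") if d.strip()]
    if not devices:
        raise ValueError("CUDA_VISIBLE_DEVICES is set but lists no devices")
    return len(devices)
```

With a sharply peaked distribution such as logits `[10.0, 0.0, 0.0]` and `typical_p=0.9`, only token `0` survives the filter, so `sample_typical` always returns it. Truncating from the left, rather than the right, keeps the end of the prompt, which is usually the part the model must continue.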

Fix

  • server: do not warp prefill logits
  • server: fix formatting issues in generate_stream tokens
  • server: fix galactica batch
  • server: fix index out of range issue with watermarking