v0.4.0

OlivierDehaene released this 09 Mar 15:10

Features

router: support best_of sampling
router: support left truncation
server: support typical sampling
launcher: allow local models
clients: add text-generation Python client
launcher: allow parsing num_shard from CUDA_VISIBLE_DEVICES
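Two of the features above can be illustrated with a minimal Python sketch. It is not the project's implementation: the helper names `num_shard_from_env` and `truncate_left` are hypothetical, and the sketch assumes that the shard count equals the number of comma-separated entries in CUDA_VISIBLE_DEVICES and that left truncation keeps only the most recent tokens of an input.

```python
import os
from typing import List, Mapping, Optional


def num_shard_from_env(env: Optional[Mapping[str, str]] = None) -> Optional[int]:
    """Hypothetical helper: infer a shard count from CUDA_VISIBLE_DEVICES.

    Assumption: one shard per listed device. Returns None when the
    variable is unset or empty, so the caller can fall back to a default.
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES", "").strip()
    if not value:
        return None
    # Count non-empty entries, tolerating stray spaces and trailing commas.
    return len([device for device in value.split(",") if device.strip()])


def truncate_left(token_ids: List[int], max_tokens: int) -> List[int]:
    """Hypothetical helper: drop tokens from the left, keeping the last max_tokens."""
    if max_tokens <= 0:
        return []
    return token_ids[-max_tokens:]
```

Truncating from the left rather than the right preserves the end of the prompt, which is usually the part closest to the text being generated.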

Fixes

server: do not warp prefill logits
server: fix formatting issues in generate_stream tokens
server: fix galactica batch
server: fix index out of range issue with watermarking
