v2.0.4
Main changes
What's Changed
- OpenAI function calling compatible support by @phangiabao98 in #1888
- Fixing types. by @Narsil in #1906
- Types. by @Narsil in #1909
- Fixing signals. by @Narsil in #1910
- Removing some unused code. by @Narsil in #1915
- MI300 compatibility by @fxmarty in #1764
- Add TGI monitoring guide through Grafana and Prometheus by @fxmarty in #1908
- Update grafana template by @fxmarty in #1918
- Fix TunableOp bug by @fxmarty in #1920
- Fix TGI issues with ROCm by @fxmarty in #1921
- Fixing the download strategy for ibm-fms by @Narsil in #1917
- ROCm: make CK FA2 default instead of Triton by @fxmarty in #1924
- docs: Fix grafana dashboard url by @edwardzjl in #1925
- feat: include token in client test like server tests by @drbh in #1932
- Creating doc automatically for supported models. by @Narsil in #1929
- fix: use path inside of speculator config by @drbh in #1935
- feat: add train medusa head tutorial by @drbh in #1934
- reenable xpu for tgi by @sywangyi in #1939
- Fixing some legacy behavior (big swapout of serverless on legacy stuff). by @Narsil in #1937
- Add completion route to client and add stop parameter where it's missing by @thomas-schillaci in #1869
- Improving the logging system. by @Narsil in #1938
- Fixing codellama loads by using purely AutoTokenizer. by @Narsil in #1947
New Contributors
- @phangiabao98 made their first contribution in #1888
- @edwardzjl made their first contribution in #1925
- @thomas-schillaci made their first contribution in #1869
Full Changelog: v2.0.3...v2.0.4