FEAT: support guided decoding for vllm async engine #2391

wxiwnd · 2024-10-03T06:17:44Z

Support guided decoding for the vllm async engine.
This is waiting on a new vllm release; a version bump will be needed.

#1562
vllm-project/vllm#8252

qinxuye · 2024-10-11T05:01:49Z

Which version is required?

wxiwnd · 2024-10-11T05:16:24Z

> Which version is required?

The first release after 0.6.2. We are waiting for vllm to publish a new version.

qinxuye · 2024-10-17T09:37:46Z

vllm has released v0.6.3. Is this PR ready to work?

wxiwnd · 2024-10-17T17:49:21Z

I will do the test.

wxiwnd · 2024-10-22T09:57:53Z

> vllm has released v0.6.3. Is this PR ready to work?

It works on my machine now.
This PR also implements the `response_format` parameter, for example:

curl --location --request POST 'http://ip:9997/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen1.5-32b-chat-int4",
    "messages": [
        {
            "role": "user",
            "content": "give me a recipe in json format"
        }
    ],
    "temperature": 0,
    "max_tokens": 1000,
    "stream": true,
    "response_format": {"type": "json_object"}
}'
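The REST layer has to turn `response_format` (and any explicit guided-decoding fields) into engine options; commit 60e3e3e adds an `extract_guided_params()` helper to restful_api.py for this, but its code is not shown in this thread. The following is a minimal, hypothetical Python sketch of that kind of split, not Xinference's actual implementation. The `guided_*` names mirror the extra parameters of vllm's OpenAI-compatible server; forwarding them unchanged and mapping `{"type": "json_object"}` to a `guided_json_object` flag are assumptions made for illustration.

```python
from typing import Any, Dict, Tuple

# Guided-decoding fields named after vllm's OpenAI-compatible server parameters.
# Assumption: Xinference forwards them to the engine under the same names.
GUIDED_KEYS = (
    "guided_json",
    "guided_regex",
    "guided_choice",
    "guided_grammar",
    "guided_decoding_backend",
    "guided_whitespace_pattern",
)


def extract_guided_params(
    body: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical helper: split a request body into (other_params, guided_params).

    The input dict is copied, never modified.
    """
    other = dict(body)
    guided = {key: other.pop(key) for key in GUIDED_KEYS if key in other}

    response_format = other.pop("response_format", None)
    if response_format and response_format.get("type") == "json_object":
        # Hypothetical flag meaning "constrain output to any valid JSON object".
        guided["guided_json_object"] = True
    # A {"type": "text"} format needs no constraint, so it is simply dropped.
    return other, guided


# Usage with the request body from the curl example above.
request = {
    "model": "qwen1.5-32b-chat-int4",
    "messages": [{"role": "user", "content": "give me a recipe in json format"}],
    "temperature": 0,
    "max_tokens": 1000,
    "stream": True,
    "response_format": {"type": "json_object"},
}
other, guided = extract_guided_params(request)
print(guided)  # {'guided_json_object': True}
```

Copying the body first keeps the caller's request intact, so the same dict can still be logged or echoed back after the guided options have been pulled out.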

qinxuye · 2024-10-22T14:31:50Z

Can you confirm that no exception is raised when an older version of vllm is installed?

wxiwnd · 2024-10-26T08:58:28Z

> Can you confirm there is no exception if the vllm is an old version?

It now works properly even if the installed vllm version is older than 0.6.3.
In that case, all guided decoding parameters are ignored.
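The thread does not show how the version check is written. Below is a minimal Python sketch of the behaviour described above, under stated assumptions: the installed version string would come from `vllm.__version__`, guided options are recognised by a `guided_` key prefix, and the names `parse_version` and `drop_guided_if_unsupported` (plus the warning log) are hypothetical, not the code in the PR.

```python
import logging
import re
from typing import Any, Dict, Tuple

logger = logging.getLogger(__name__)

# First vllm release whose async engine accepts guided decoding, per this thread.
MIN_GUIDED_DECODING_VLLM = (0, 6, 3)


def parse_version(version: str) -> Tuple[int, ...]:
    """Keep only the leading numeric release segment: "0.6.3.post1" -> (0, 6, 3)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group(0).split("."))


def drop_guided_if_unsupported(
    params: Dict[str, Any], vllm_version: str
) -> Dict[str, Any]:
    """Hypothetical compatibility shim: strip guided_* keys on old vllm."""
    if parse_version(vllm_version) >= MIN_GUIDED_DECODING_VLLM:
        return dict(params)
    dropped = [key for key in params if key.startswith("guided_")]
    if dropped:
        # Assumption: warn instead of raising, so old installs keep working.
        logger.warning(
            "vllm %s does not support guided decoding; ignoring %s",
            vllm_version,
            dropped,
        )
    return {k: v for k, v in params.items() if not k.startswith("guided_")}


params = {"temperature": 0, "guided_json": {"type": "object"}}
print(drop_guided_if_unsupported(params, "0.6.2"))  # {'temperature': 0}
print(drop_guided_if_unsupported(params, "0.6.3.post1") == params)  # True
```

This simplified parser treats a pre-release such as `0.6.3rc1` as `0.6.3`; `packaging.version.Version` would order pre-releases strictly if that distinction matters.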

wxiwnd · 2024-11-24T18:36:40Z

This feature has been tested on my machine and appears to be functioning properly. @qinxuye

qinxuye · 2024-11-28T12:00:13Z

> This feature has been tested on my machine and appears to be functioning properly. @qinxuye

Thanks. Hopefully we can merge this before the new version is released tomorrow.

qinxuye

LGTM

qinxuye · 2024-11-28T12:01:49Z

Thanks. I think we can port guided decoding to the other engines later.

wxiwnd · 2024-11-28T14:06:18Z

> Thanks, I think we can port the ability of guide decoding for other engines later.

That's OK, we can use outlines directly for other engines.

XprobeBot added the feature label Oct 3, 2024

XprobeBot added this to the v0.15 milestone Oct 3, 2024

wxiwnd marked this pull request as draft October 3, 2024 06:18

wxiwnd marked this pull request as ready for review October 5, 2024 07:55

wxiwnd force-pushed the feat/guided_generation branch 2 times, most recently from 2968700 to cd0812a on October 15, 2024 10:35

wxiwnd force-pushed the feat/guided_generation branch 7 times, most recently from 4d9e044 to 852c86c on October 22, 2024 09:30

wxiwnd force-pushed the feat/guided_generation branch 3 times, most recently from 823887f to df849b1 on October 26, 2024 08:35

qinxuye reviewed Oct 30, 2024

Review comment on xinference/_compat.py (outdated, resolved)

XprobeBot modified the milestones: v0.15, v0.16 Oct 30, 2024

wxiwnd force-pushed the feat/guided_generation branch 4 times, most recently from b8025ea to eb816c1 on November 5, 2024 17:49

wxiwnd force-pushed the feat/guided_generation branch 2 times, most recently from d1d41bf to 9d13391 on November 5, 2024 18:00

qinxuye reviewed Nov 19, 2024

Review comment on xinference/_compat.py (outdated, resolved)

Review comment on xinference/api/restful_api.py (outdated, resolved)

wxiwnd added 3 commits November 22, 2024 15:39

feat: support guided decoding for vllm async engine

dbfcc20

Signed-off-by: wxiwnd <[email protected]>

feat: support response_format

b59fe78

Signed-off-by: wxiwnd <[email protected]>

change(restful-api): add extract_guided_params()

60e3e3e

Signed-off-by: wxiwnd <[email protected]>

wxiwnd force-pushed the feat/guided_generation branch from 9d13391 to 60e3e3e on November 22, 2024 07:47

wxiwnd marked this pull request as draft November 22, 2024 08:32

wxiwnd added 2 commits November 22, 2024 17:08

revert: revert to 9d13391

71e9f5a

Signed-off-by: wxiwnd <[email protected]>

refactor: use pydantic model

dbf5216

Signed-off-by: wxiwnd <[email protected]>

wxiwnd marked this pull request as ready for review November 24, 2024 18:33

wxiwnd requested a review from qinxuye November 24, 2024 18:36

XprobeBot modified the milestones: v0.16, v1.x Nov 25, 2024

qinxuye approved these changes Nov 28, 2024

qinxuye merged commit 23f09f9 into xorbitsai:main Nov 28, 2024
12 of 13 checks passed

qinxuye changed the title from "feat: support guided decoding for vllm async engine" to "FEAT: support guided decoding for vllm async engine" on Nov 28, 2024
