Error when calling /v1/chat/completions on a glm-4v-9b service started with vLLM #630
Comments
Please share the script you use to call the API.
After startup I call it from Postman; it can also be reproduced with this curl command:

curl --location --request POST 'http://136.1.5.93:10085/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data-raw '{
    "model": "glm-4v-9b",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "How do I read a file in Python?"
        }
    ]
}'
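For readers who prefer Python, the same request body can be built and inspected before sending; this is a minimal sketch assuming the host, port, and model name from the curl command above:

```python
import json

# Endpoint taken from the curl command above; adjust host/port for your setup.
url = "http://136.1.5.93:10085/v1/chat/completions"

# The same payload curl sends via --data-raw.
payload = {
    "model": "glm-4v-9b",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "How do I read a file in Python?"},
    ],
}

# Serialize to the JSON string that goes on the wire.
body = json.dumps(payload, ensure_ascii=False)
print(body)
```

Posting `body` to `url` with any HTTP client (requests, httpx, Postman) reproduces the reported error when the model lacks a chat template.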
@sixsixcoder It seems a chat template is required. Does the GLM-4v-9b model ship with one?
@sixsixcoder I submitted a PR on ModelScope that adds a chat_template and fixes this issue.
Thank you for your contribution.
Right, vLLM always applies a chat template, so this is a good suggestion, thank you very much. We will verify that the template also works with transformers and merge everything together.
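To illustrate what a chat template produces, here is a rough sketch of rendering a message list into GLM-4-style prompt text. The special-token names (`[gMASK]<sop>`, `<|system|>`, `<|user|>`, `<|assistant|>`) follow GLM-4's commonly published conversation format but are assumptions here; verify them against the `chat_template` actually added in the PR:

```python
def render_glm4_chat(messages):
    """Render a message list into GLM-4 style prompt text.

    NOTE: the special tokens below are assumed from GLM-4's published
    conversation format; check the model's tokenizer_config.json.
    """
    out = "[gMASK]<sop>"
    for m in messages:
        out += f"<|{m['role']}|>\n{m['content']}"
    # Trailing assistant tag signals the model to start generating.
    out += "<|assistant|>"
    return out

prompt = render_glm4_chat([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "How do I read a file in Python?"},
])
print(prompt)
```

The real chat_template expresses the same transformation as a Jinja template so that both transformers' `apply_chat_template` and vLLM can use it.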
System Info
vllm 0.6.3.post1
transformers 4.46.1
glm-4v-9b
Who can help?
No response
Information
Reproduction
1. Launch script: CUDA_VISIBLE_DEVICES=5,7 python -m vllm.entrypoints.openai.api_server --model=/beeb/ap/iaf/models/modelscope/hub/glm-4v-9b --served-model-name=glm-4v-9b --device=cuda --port=10085 --host=0.0.0.0 --tensor-parallel-size=2 --dtype=auto --trust-remote-code
2. Call the API: /v1/chat/completions
3. The following error appears:
As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.
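Until a chat_template ships with the model files, vLLM's OpenAI-compatible server can be pointed at a template file explicitly via its `--chat-template` flag. This is a sketch of the original launch command with that flag added; the template path is hypothetical and must point to a real Jinja template for GLM-4v:

```shell
CUDA_VISIBLE_DEVICES=5,7 python -m vllm.entrypoints.openai.api_server \
  --model=/beeb/ap/iaf/models/modelscope/hub/glm-4v-9b \
  --served-model-name=glm-4v-9b \
  --device=cuda --port=10085 --host=0.0.0.0 \
  --tensor-parallel-size=2 --dtype=auto --trust-remote-code \
  --chat-template=/path/to/glm4v_chat_template.jinja  # hypothetical path
```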
Expected behavior
The endpoint should return a proper answer to the request.