
The latest version of xinference fails to launch the qwen2-vl-instruct model #2554

Open
majestichou opened this issue Nov 14, 2024 · 9 comments
@majestichou

System Info

cuda 12.2,centos7

Running Xinference with Docker?

  • docker
  • pip install
  • installation from source

Version info

V0.16.3

The command used to start Xinference

docker run -d -v /home/llm-test/embedding_and_rerank_model:/root/models -p 9998:9997 --gpus all xprobe/xinference:latest xinference-local -H 0.0.0.0

Reproduction

  1. Download the Qwen2-VL-7B-Instruct model to the target directory /home/llm-test/embedding_and_rerank_model.
  2. Start Xinference with: docker run -d -v /home/llm-test/embedding_and_rerank_model:/root/models -p 9998:9997 --gpus all xprobe/xinference:latest xinference-local -H 0.0.0.0
  3. Open the web UI, select Launch Model, choose the qwen2-vl-instruct model, set Model Path to /root/models/Qwen2-VL-7B-Instruct, and click the launch button.
  4. The launch fails with the following error:
2024-11-14 08:35:27,521 xinference.core.worker 140 ERROR    Failed to load model qwen2-vl-instruct-0
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/worker.py", line 897, in launch_builtin_model
    await model_ref.load()
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 231, in send
    return self._process_result_message(result)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 659, in send
    result = await self._run_coro(message.message_id, coro)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 370, in _run_coro
    return await coro
  File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 398, in load
    self._model.load()
  File "/usr/local/lib/python3.10/dist-packages/xinference/model/llm/transformers/qwen2_vl.py", line 53, in load
    from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
ImportError: [address=0.0.0.0:43921, pid=1213] cannot import name 'Qwen2VLForConditionalGeneration' from 'transformers' (/usr/local/lib/python3.10/dist-packages/transformers/__init__.py)

Expected behavior

The model should launch normally.

@XprobeBot added the gpu label on Nov 14, 2024
@XprobeBot added this to the v0.16 milestone on Nov 14, 2024
@codingl2k1
Contributor

Which version of transformers are you running? You can try updating transformers.
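
For reference, a quick way to check which transformers version ships inside the running container (a sketch; the container name is a placeholder; Qwen2VLForConditionalGeneration only exists in transformers >= 4.45):

docker exec -it <xinference_container> python3 -c "import transformers; print(transformers.__version__)"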

@majestichou
Author

@codingl2k1 Huh? Isn't the version bundled in the image good enough? I'm using the Docker image.

@jacobdong

@codingl2k1
The qwen2-audio model has the same problem:

2024-11-17 16:25:41 ImportError: [address=0.0.0.0:46487, pid=176] cannot import name 'Qwen2AudioForConditionalGeneration' from 'transformers' (/usr/local/lib/python3.10/dist-packages/transformers/__init__.py)

@ChiayenGu

I ran into this problem too; my Docker image version is 0.16.0.

@JumpNew

JumpNew commented Nov 18, 2024

After upgrading transformers to the latest version, the model starts fine.
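
One way to apply this without rebuilding the image is to upgrade inside the existing container (a sketch; the container name is a placeholder, and the change is lost if the container is recreated):

docker exec -it <xinference_container> pip install -U "transformers>=4.46"
docker restart <xinference_container>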

@harryzwh

Same here.
And I actually loaded the AWQ version, but why is only the Transformers engine available? Does vLLM support vision models with AWQ quantization?

@cyhasuka
Contributor

> Same here. And I actually loaded the AWQ version, but why is only the Transformers engine available? Does vLLM support vision models with AWQ quantization?

Yes, please confirm vllm>=0.6.4.
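
A quick way to confirm the installed vllm version inside the container (a sketch; the container name is a placeholder):

docker exec -it <xinference_container> python3 -c "import vllm; print(vllm.__version__)"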

@harryzwh

> Same here. And I actually loaded the AWQ version, but why is only the Transformers engine available? Does vLLM support vision models with AWQ quantization?
>
> Yes, please confirm vllm>=0.6.4.

Confirmed that updating transformers to >4.46 and rebuilding the Docker image fixed this issue. However, changing the base Docker image from vllm 0.6.0 to 0.6.4 introduces a number of errors, mainly because Python is also updated from 3.10 to 3.12. Still figuring out how to build a Docker image based on vllm 0.6.4.
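
For this particular import error, a minimal derived image is usually enough and avoids switching the vllm base image (a sketch; the image tag and version pin are assumptions):

# Dockerfile
FROM xprobe/xinference:latest
RUN pip install --no-cache-dir "transformers>=4.46"

Build it with docker build -t xinference-qwen2vl . and use that tag in place of xprobe/xinference:latest in the original docker run command.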

@cnrbi1

cnrbi1 commented Nov 24, 2024

After calling this model twice, GPU memory usage doubles. Is there any release mechanism? It runs out of memory (OOM) very quickly.

@XprobeBot modified the milestones: v0.16, v1.x on Nov 25, 2024