Error when running xinference-local --host 0.0.0.0 --port 9997 after installation #1835

pan-common · 2024-07-10T12:30:53Z

System Info

Ubuntu 20.04
NVIDIA-SMI 535.104.05
Driver Version: 535.104.05
CUDA Version: 12.2

Running Xinference with Docker?

docker
pip install
installation from source

Version info

Name: xinference
Version: 0.13.0
Summary: Model Serving Made Easy
Home-page: https://github.com/xorbitsai/inference
Author: Qin Xuye
Author-email: [email protected]
License: Apache License 2.0
Location: /root/anaconda3/envs/py311/lib/python3.11/site-packages
Requires: aioprometheus, async-timeout, click, fastapi, fsspec, gradio, huggingface-hub, modelscope, openai, opencv-contrib-python, passlib, peft, pillow, pydantic, pynvml, python-jose, requests, s3fs, sse-starlette, tabulate, timm, torch, tqdm, typer, typing-extensions, uvicorn, xoscar
Required-by:

The command used to start Xinference

xinference-local --host 0.0.0.0 --port 9997

Reproduction

(py311) root@b721c068038e:/opt/xinference# xinference-local --host 0.0.0.0 --port 9997
2024-07-10 12:28:08,395 xinference.core.supervisor 83095 INFO Xinference supervisor 0.0.0.0:44062 started
2024-07-10 12:28:08,425 xinference.core.worker 83095 INFO Starting metrics export server at 0.0.0.0:None
2024-07-10 12:28:08,431 xinference.core.worker 83095 INFO Checking metrics export server...
2024-07-10 12:28:09,600 xinference.core.worker 83095 INFO Metrics server is started at: http://0.0.0.0:41815
2024-07-10 12:28:09,601 xinference.core.worker 83095 INFO Xinference worker 0.0.0.0:44062 started
2024-07-10 12:28:09,602 xinference.core.worker 83095 INFO Purge cache directory: /root/.xinference/cache
2024-07-10 12:28:11,604 xinference.core.worker 83095 ERROR Report status got error.
Traceback (most recent call last):
File "/root/anaconda3/envs/py311/lib/python3.11/site-packages/xinference/core/worker.py", line 800, in report_status
status = await asyncio.to_thread(gather_node_info)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/envs/py311/lib/python3.11/asyncio/threads.py", line 25, in to_thread
return await loop.run_in_executor(None, func_call)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/root/anaconda3/envs/py311/lib/python3.11/site-packages/xinference/core/worker.py", line 799, in report_status
async with timeout(2):
File "/root/anaconda3/envs/py311/lib/python3.11/site-packages/async_timeout/__init__.py", line 141, in __aexit__
self._do_exit(exc_type)
File "/root/anaconda3/envs/py311/lib/python3.11/site-packages/async_timeout/__init__.py", line 228, in _do_exit
raise asyncio.TimeoutError
TimeoutError
2024-07-10 12:28:14,296 xinference.api.restful_api 82961 INFO Starting Xinference at endpoint: http://0.0.0.0:9997
2024-07-10 12:28:14,648 uvicorn.error 82961 INFO Uvicorn running on http://0.0.0.0:9997 (Press CTRL+C to quit)
2024-07-10 12:28:18,618 xinference.core.worker 83095 ERROR Report status got error.
Traceback (most recent call last):
File "/root/anaconda3/envs/py311/lib/python3.11/site-packages/xinference/core/worker.py", line 800, in report_status
status = await asyncio.to_thread(gather_node_info)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/envs/py311/lib/python3.11/asyncio/threads.py", line 25, in to_thread
return await loop.run_in_executor(None, func_call)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/root/anaconda3/envs/py311/lib/python3.11/site-packages/xinference/core/worker.py", line 799, in report_status
async with timeout(2):
File "/root/anaconda3/envs/py311/lib/python3.11/site-packages/async_timeout/__init__.py", line 141, in __aexit__
self._do_exit(exc_type)
File "/root/anaconda3/envs/py311/lib/python3.11/site-packages/async_timeout/__init__.py", line 228, in _do_exit
raise asyncio.TimeoutError
TimeoutError
2024-07-10 12:28:25,628 xinference.core.worker 83095 ERROR Report status got error.
Traceback (most recent call last):
File "/root/anaconda3/envs/py311/lib/python3.11/site-packages/xinference/core/worker.py", line 800, in report_status
status = await asyncio.to_thread(gather_node_info)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/envs/py311/lib/python3.11/asyncio/threads.py", line 25, in to_thread
return await loop.run_in_executor(None, func_call)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Expected behavior

Xinference should run normally on the GPU.

ChengjieLi28 · 2024-07-11T02:45:20Z

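@pan-common The error occurs when the worker reports its status to the supervisor.
First, try enabling debug logging to see whether a more specific error shows up. Your error output is also incomplete: please paste the full log, including everything after "During handling of the above exception, another exception occurred:".
Then, the following bypasses the status-reporting step; see whether it starts: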

XINFERENCE_DISABLE_HEALTH_CHECK=1 xinference-local --host 0.0.0.0 --port 9997
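
To illustrate why the log shows a CancelledError followed by a TimeoutError, here is a minimal sketch of the same pattern: a blocking function run via asyncio.to_thread under a deadline. It is not Xinference's code. The traceback shows async_timeout's timeout(2); the sketch swaps in the standard library's asyncio.wait_for so it runs without extra packages, and slow_gather_node_info and report_status_once are hypothetical stand-ins.

```python
import asyncio
import time


def slow_gather_node_info(delay: float) -> dict:
    # Hypothetical stand-in for a blocking status probe (not Xinference's function).
    time.sleep(delay)
    return {"ok": True}


async def report_status_once(delay: float, limit: float) -> str:
    # Run the blocking probe in a worker thread and give up after `limit` seconds.
    try:
        await asyncio.wait_for(
            asyncio.to_thread(slow_gather_node_info, delay), timeout=limit
        )
    except asyncio.TimeoutError:
        # The awaiting task is cancelled first; the caller then sees a timeout.
        return "timeout"
    return "reported"


def demo() -> tuple:
    fast = asyncio.run(report_status_once(0.0, 2.0))   # probe finishes in time
    slow = asyncio.run(report_status_once(0.3, 0.05))  # probe exceeds the deadline
    return fast, slow


if __name__ == "__main__":
    print(demo())
```

Note that in this sketch the timeout does not interrupt the worker thread; the blocking call keeps running until it returns. Whether something similar makes gather_node_info slow here is a guess that the logs do not confirm.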

github-actions · 2024-07-19T19:03:45Z

This issue is stale because it has been open for 7 days with no activity.

github-actions · 2024-08-06T06:21:15Z

This issue is stale because it has been open for 7 days with no activity.

gs80140 · 2024-09-26T09:04:13Z

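> @pan-common The error occurs when the worker reports its status to the supervisor. First, try enabling debug logging to see whether a more specific error shows up. Your error output is also incomplete: please paste the full log, including everything after "During handling of the above exception, another exception occurred:". Then, the following bypasses the status-reporting step; see whether it starts:
> XINFERENCE_DISABLE_HEALTH_CHECK=1 xinference-local --host 0.0.0.0 --port 9997

I ran into this problem too. After adding XINFERENCE_DISABLE_HEALTH_CHECK=1 as you suggested, it starts. The full error output is as follows: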

`
WARNING 09-26 17:01:26 _custom_ops.py:18] Failed to import from vllm._C with ImportError('/home/hum/anaconda3/envs/xinf/lib/python3.11/site-packages/vllm/_C.abi3.so: undefined symbol: cuTensorMapEncodeTiled')
2024-09-26 17:01:32,290 xinference.core.supervisor 667146 INFO Xinference supervisor 127.0.0.1:22599 started
/home/hum/anaconda3/envs/xinf/lib/python3.11/site-packages/torch/cuda/__init__.py:128: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11040). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
2024-09-26 17:01:32,316 xinference.core.worker 667146 INFO Starting metrics export server at 127.0.0.1:None
2024-09-26 17:01:32,322 xinference.core.worker 667146 INFO Checking metrics export server...
2024-09-26 17:01:34,445 xinference.core.worker 667146 INFO Metrics server is started at: http://127.0.0.1:34503
2024-09-26 17:01:34,446 xinference.core.worker 667146 INFO Purge cache directory: /home/hum/.xinference/cache
2024-09-26 17:01:34,449 xinference.core.supervisor 667146 DEBUG [request ee1ead84-7be5-11ef-9d4d-208810cdd0e8] Enter add_worker, args: <xinference.core.supervisor.SupervisorActor object at 0x7f7fa559aff0>,127.0.0.1:22599, kwargs:
2024-09-26 17:01:34,449 xinference.core.supervisor 667146 DEBUG Worker 127.0.0.1:22599 has been added successfully
2024-09-26 17:01:34,449 xinference.core.supervisor 667146 DEBUG [request ee1ead84-7be5-11ef-9d4d-208810cdd0e8] Leave add_worker, elapsed time: 0 s
2024-09-26 17:01:34,449 xinference.core.worker 667146 INFO Connected to supervisor as a fresh worker
2024-09-26 17:01:34,463 xinference.core.worker 667146 INFO Xinference worker 127.0.0.1:22599 started
2024-09-26 17:01:36,466 xinference.core.worker 667146 ERROR Report status got error.
Traceback (most recent call last):
File "/home/hum/anaconda3/envs/xinf/lib/python3.11/site-packages/xinference/core/worker.py", line 1026, in report_status
status = await asyncio.to_thread(gather_node_info)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hum/anaconda3/envs/xinf/lib/python3.11/asyncio/threads.py", line 25, in to_thread
return await loop.run_in_executor(None, func_call)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/hum/anaconda3/envs/xinf/lib/python3.11/site-packages/xinference/core/worker.py", line 1025, in report_status
async with timeout(2):
File "/home/hum/anaconda3/envs/xinf/lib/python3.11/site-packages/async_timeout/__init__.py", line 141, in __aexit__
self._do_exit(exc_type)
File "/home/hum/anaconda3/envs/xinf/lib/python3.11/site-packages/async_timeout/__init__.py", line 228, in _do_exit
raise asyncio.TimeoutError
TimeoutError
2024-09-26 17:01:36,477 xinference.core.supervisor 667146 DEBUG Worker 127.0.0.1:22599 resources: {}
2024-09-26 17:01:37,274 xinference.core.supervisor 667146 DEBUG Enter get_status, args: <xinference.core.supervisor.SupervisorActor object at 0x7f7fa559aff0>, kwargs:
2024-09-26 17:01:37,275 xinference.core.supervisor 667146 DEBUG Leave get_status, elapsed time: 0 s
2024-09-26 17:01:39,377 xinference.api.restful_api 666994 INFO Starting Xinference at endpoint: http://127.0.0.1:9997
2024-09-26 17:01:39,543 uvicorn.error 666994 INFO Uvicorn running on http://127.0.0.1:9997 (Press CTRL+C to quit)
2024-09-26 17:01:43,485 xinference.core.worker 667146 ERROR Report status got error.
Traceback (most recent call last):
File "/home/hum/anaconda3/envs/xinf/lib/python3.11/site-packages/xinference/core/worker.py", line 1026, in report_status
status = await asyncio.to_thread(gather_node_info)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hum/anaconda3/envs/xinf/lib/python3.11/asyncio/threads.py", line 25, in to_thread
return await loop.run_in_executor(None, func_call)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/hum/anaconda3/envs/xinf/lib/python3.11/site-packages/xinference/core/worker.py", line 1025, in report_status
async with timeout(2):
File "/home/hum/anaconda3/envs/xinf/lib/python3.11/site-packages/async_timeout/__init__.py", line 141, in __aexit__
self._do_exit(exc_type)
File "/home/hum/anaconda3/envs/xinf/lib/python3.11/site-packages/async_timeout/__init__.py", line 228, in _do_exit
raise asyncio.TimeoutError
TimeoutError
2024-09-26 17:01:50,493 xinference.core.worker 667146 ERROR Report status got error.
Traceback (most recent call last):
File "/home/hum/anaconda3/envs/xinf/lib/python3.11/site-packages/xinference/core/worker.py", line 1026, in report_status
status = await asyncio.to_thread(gather_node_info)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hum/anaconda3/envs/xinf/lib/python3.11/asyncio/threads.py", line 25, in to_thread
return await loop.run_in_executor(None, func_call)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/hum/anaconda3/envs/xinf/lib/python3.11/site-packages/xinference/core/worker.py", line 1025, in report_status
async with timeout(2):
File "/home/hum/anaconda3/envs/xinf/lib/python3.11/site-packages/async_timeout/__init__.py", line 141, in __aexit__
self._do_exit(exc_type)
File "/home/hum/anaconda3/envs/xinf/lib/python3.11/site-packages/async_timeout/__init__.py", line 228, in _do_exit
raise asyncio.TimeoutError
TimeoutError

`

gs80140 · 2024-09-26T09:15:06Z

There is probably a CUDA version requirement, right?
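
On the CUDA question: the PyTorch warning in the log above reports "found version 11040". Assuming the usual CUDA convention of encoding a version as 1000 * major + 10 * minor, a small helper can turn that number into a readable version. decode_cuda_version is a hypothetical name for illustration, not part of PyTorch or Xinference.

```python
from typing import Tuple


def decode_cuda_version(encoded: int) -> Tuple[int, int]:
    # Assumes the CUDA encoding 1000 * major + 10 * minor.
    major = encoded // 1000
    minor = (encoded % 1000) // 10
    return major, minor


if __name__ == "__main__":
    # The value reported in the PyTorch warning above.
    print(decode_cuda_version(11040))  # (11, 4)
```

By that reading, the driver in this log supports CUDA up to 11.4, while the original report lists CUDA 12.2, so the two environments may be failing for different reasons.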

jasinliu · 2024-11-24T08:37:51Z

Same error on the latest version. Startup is very slow, and I don't know why.

jasinliu · 2024-11-24T08:41:37Z

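> @pan-common The error occurs when the worker reports its status to the supervisor. First, try enabling debug logging to see whether a more specific error shows up. Your error output is also incomplete: please paste the full log, including everything after "During handling of the above exception, another exception occurred:". Then, the following bypasses the status-reporting step; see whether it starts:
>
> XINFERENCE_DISABLE_HEALTH_CHECK=1 xinference-local --host 0.0.0.0 --port 9997

The subsequent errors are just the same TimeoutError repeated, so it is presumably retrying over and over. After bypassing the status-reporting step, it starts quickly.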
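
For scripted launches, the workaround quoted above can be applied from Python by setting the environment variable on the child process only. This is a sketch under assumptions: build_launch and launch are hypothetical helper names, and the command and variable are copied from this thread rather than checked against the Xinference documentation.

```python
import os
import subprocess
from typing import Dict, List, Tuple


def build_launch(host: str = "0.0.0.0", port: int = 9997) -> Tuple[List[str], Dict[str, str]]:
    # Same command as in the thread, split into argv form.
    cmd = ["xinference-local", "--host", host, "--port", str(port)]
    # Copy the current environment and add the workaround variable for the child only.
    env = dict(os.environ)
    env["XINFERENCE_DISABLE_HEALTH_CHECK"] = "1"
    return cmd, env


def launch() -> subprocess.Popen:
    # Starts the server; requires xinference-local to be on PATH.
    cmd, env = build_launch()
    return subprocess.Popen(cmd, env=env)
```

Calling launch() returns immediately with a process handle; the parent's own environment is left unchanged.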

XprobeBot added the gpu label Jul 10, 2024

XprobeBot added this to the v0.13.1 milestone Jul 10, 2024

XprobeBot modified the milestones: v0.13.1, v0.13.2 Jul 12, 2024

github-actions bot added the stale label Jul 19, 2024

XprobeBot modified the milestones: v0.13.2, v0.13.4 Jul 26, 2024

github-actions bot removed the stale label Jul 27, 2024

github-actions bot added the stale label Aug 6, 2024
