Bug under heavy user load: hangs automatically once clients reach 2500 #2001

Open
1 of 3 tasks
stevensy123 opened this issue Aug 1, 2024 · 14 comments

@stevensy123

System Info

Ubuntu20.04 CUDA12.2.0

Running Xinference with Docker?

  • docker
  • pip install
  • installation from source

Version info

0.13.0

The command used to start Xinference

The official Docker startup command

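Reproduction

Contents of the log file:
-06-26 01:57:22,996 xoscar.backends.core 1 WARNING Actor caller has created too many clients (1750 >= 100), the global router may not be set.
2024-06-26 02:00:53,354 xoscar.backends.core 1 WARNING Actor caller has created too many clients (1760 >= 100), the global router may not be set.
This problem appears once there are many user calls. When the client count reaches 2500, the running model is automatically unregistered, and its GPU memory is not released.
Is it that the Docker installation cannot support calls from multiple users? Does the pip installation have this problem too?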

Expected behavior

Normal operation with multiple users.

@XprobeBot XprobeBot added the gpu label Aug 1, 2024
@XprobeBot XprobeBot added this to the v0.14.0 milestone Aug 1, 2024
@stevensy123
Author

@qinxuye could you please give some guidance?

@codingl2k1
Contributor

Are there any more logs?

@stevensy123
Author

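@codingl2k1 There are no other warnings or errors in the log file. It just keeps printing the lines above, adding one line for every 10 additional clients. It is unclear where the limit of 100 is set. How can I change the limit of 100, and the one at 2500? It does not actually affect usage.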
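The pattern described above (a threshold of 100, then one warning line per 10 additional clients) can be illustrated with a small simulation. This is a hypothetical sketch written only from the log lines in this thread, not xoscar's actual code: `WARN_THRESHOLD`, `WARN_EVERY`, `should_warn`, and `register_clients` are made-up names. It only shows that such a check reports the count and does not cap it, which would be consistent with counts like 1750 appearing next to `>= 100`.

```python
import logging

logger = logging.getLogger("client_count_demo")

# Hypothetical values inferred from the log lines in this thread.
WARN_THRESHOLD = 100  # the ">= 100" shown in the warning message
WARN_EVERY = 10       # one warning line per 10 additional clients


def should_warn(client_count: int) -> bool:
    """Return True when this client count would produce a warning line."""
    if client_count < WARN_THRESHOLD:
        return False
    return (client_count - WARN_THRESHOLD) % WARN_EVERY == 0


def register_clients(total: int) -> list[int]:
    """Simulate creating `total` clients; return the counts that were warned about."""
    warned = []
    for count in range(1, total + 1):
        if should_warn(count):
            logger.warning("too many clients (%s >= %s)", count, WARN_THRESHOLD)
            warned.append(count)
    return warned
```

In this sketch nothing stops the count from growing; the check only logs. Whether the real limit behaves the same way would need to be confirmed in the xoscar source.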

@masktone

masktone commented Aug 2, 2024

xinference
Similar situation

@codingl2k1
Contributor

This InvalidStateError has been fixed by this PR: xorbitsai/xoscar#87. Are you using the latest xinference?

@stevensy123
Author

> This InvalidStateError has been fixed by this PR: xorbitsai/xoscar#87. Are you using the latest xinference?

v0.13.0. This problem does not affect usage. The "too many clients" warning is the problem I want to solve.

@michaelxu1107

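I am using version 0.10.3, and I also frequently see warnings in the log such as "WARNING Actor caller has created too many clients (1750 >= 100), the global router may not be set". Is this because connection resources are not released after a client request completes?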

@michaelxu1107

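> I am using version 0.10.3, and I also frequently see warnings in the log such as "WARNING Actor caller has created too many clients (1750 >= 100), the global router may not be set". Is this because connection resources are not released after a client request completes?

User concurrency is very low.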

@qinxuye
Contributor

qinxuye commented Aug 6, 2024

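We have never been able to reproduce this problem. Could you run pip list and share the versions? Please also tell us which model and which engine you are using.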
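For anyone who wants to report only the relevant versions rather than a full pip list, a minimal sketch using the standard library's `importlib.metadata` might look like this. The package list is an assumption based on the packages named in this thread, and `collect_versions` is a hypothetical helper name; adjust both as needed.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed list: the packages mentioned in this thread. Adjust as needed.
PACKAGES = ["xinference", "xoscar", "vllm", "torch"]


def collect_versions(packages):
    """Map each package name to its installed version, or None if not installed."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in collect_versions(PACKAGES).items():
        print(f"{name}=={ver}" if ver else f"{name}: not installed")
```

Run it with the same Python interpreter that runs Xinference (for example, inside the container), otherwise it reports the versions of a different environment.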

@michaelxu1107

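> We have never been able to reproduce this problem. Could you run pip list and share the versions? Please also tell us which model and which engine you are using.

Hello, on our side we are using Python 3.10.11 and xinference 0.10.3. The model is Qwen1.5-32B-Chat and the inference engine is vllm. The Python dependencies are as follows: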
accelerate==0.29.3
addict==2.4.0
aiobotocore==2.7.0
aiofiles==23.2.1
aiohttp==3.9.4
aioitertools==0.11.0
aioprometheus==23.12.0
aiosignal==1.3.1
aliyun-python-sdk-core==2.15.1
aliyun-python-sdk-kms==2.16.2
altair==5.3.0
annotated-types==0.6.0
anthropic==0.25.2
anyio==4.3.0
async-timeout==4.0.3
attrdict==2.0.1
attrs==23.2.0
auto_gptq==0.7.1
autoawq==0.2.3
autoawq_kernels==0.0.6
bcrypt==4.1.2
bitsandbytes==0.42.0
blessed==1.20.0
blinker==1.7.0
botocore==1.31.64
Brotli==1.1.0
certifi==2024.2.2
cffi==1.16.0
charset-normalizer==3.3.2
chatglm-cpp==0.3.1
click==8.1.7
cloudpickle==3.0.0
cmake==3.29.2
colorama==0.4.6
coloredlogs==15.0.1
ConfigArgParse==1.7
contourpy==1.2.1
controlnet-aux==0.0.7
crcmod==1.7
cryptography==42.0.5
cycler==0.12.1
Cython==3.0.10
dataclasses-json==0.6.4
datasets==2.19.0
diffusers==0.27.2
dill==0.3.8
diskcache==5.6.3
distro==1.9.0
ecdsa==0.19.0
einops==0.7.0
exceptiongroup==1.2.1
fastapi==0.110.2
ffmpy==0.3.2
filelock==3.13.4
FlagEmbedding==1.2.9
Flask==3.0.3
Flask-Cors==4.0.0
Flask-Login==0.6.3
fonttools==4.51.0
frozenlist==1.4.1
fsspec==2023.10.0
gast==0.5.4
gekko==1.1.1
gevent==24.2.1
geventhttpclient==2.2.0
gradio==4.26.0
gradio_client==0.15.1
greenlet==3.0.3
h11==0.14.0
httpcore==1.0.5
httptools==0.6.1
httpx==0.27.0
huggingface-hub==0.22.2
humanfriendly==10.0
idna==3.7
imageio==2.34.0
importlib_metadata==7.1.0
importlib_resources==6.4.0
iniconfig==2.0.0
interegular==0.3.3
itsdangerous==2.2.0
Jinja2==3.1.3
jmespath==0.10.0
joblib==1.4.0
jsonpatch==1.33
jsonpointer==2.4
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
kiwisolver==1.4.5
langchain==0.1.16
langchain-community==0.0.34
langchain-core==0.1.45
langchain-text-splitters==0.0.1
langsmith==0.1.49
lark==1.1.9
lazy_loader==0.4
llama_cpp_python @ file:///home/aigc/iqas/llama_cpp_python-0.2.57-cp310-cp310-manylinux_2_17_x86_64.whl
llvmlite==0.42.0
locust==2.25.0
markdown-it-py==3.0.0
MarkupSafe==2.1.5
marshmallow==3.21.1
matplotlib==3.8.4
mdurl==0.1.2
modelscope==1.13.3
mpmath==1.3.0
msgpack==1.0.8
multidict==6.0.5
multiprocess==0.70.16
mypy-extensions==1.0.0
nest-asyncio==1.6.0
networkx==3.3
ninja==1.11.1.1
numba==0.59.1
numpy==1.26.4
nvidia-ml-py==12.550.52
nvidia-nccl-cu12==2.18.1
openai==1.17.1
opencv-contrib-python==4.9.0.80
opencv-python==4.9.0.80
optimum==1.19.0
orjson==3.10.1
oss2==2.18.4
outlines==0.0.34
packaging==23.2
pandas==2.2.2
passlib==1.7.4
peft==0.10.0
pillow==10.3.0
platformdirs==4.2.0
plumbum==1.8.2
prometheus_client==0.20.0
protobuf==5.26.1
psutil==5.9.8
py-cpuinfo==9.0.0
pyarrow==16.0.0
pyarrow-hotfix==0.6
pyasn1==0.6.0
pycparser==2.22
pycryptodome==3.20.0
pydantic==2.7.0
pydantic-settings==2.2.1
pydantic_core==2.18.1
pydub==0.25.1
Pygments==2.17.2
pynvml==11.5.0
pyparsing==3.1.2
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-jose==3.3.0
python-multipart==0.0.9
pytz==2024.1
PyYAML==6.0.1
pyzmq==26.0.0
quantile-python==1.1
ray==2.11.0
referencing==0.34.0
regex==2024.4.16
requests==2.31.0
rich==13.7.1
rouge==1.0.1
roundrobin==0.0.4
rpds-py==0.18.0
rpyc==6.0.0
rsa==4.9
ruff==0.3.7
s3fs==2023.10.0
safetensors==0.4.3
scikit-image==0.23.1
scikit-learn==1.4.2
scipy==1.13.0
semantic-version==2.10.0
sentence-transformers==2.7.0
sentencepiece==0.2.0
sglang==0.1.14
shellingham==1.5.4
simplejson==3.19.2
six==1.16.0
sniffio==1.3.1
sortedcontainers==2.4.0
SQLAlchemy==2.0.29
sse-starlette==2.1.0
starlette==0.37.2
starlette-context==0.3.6
sympy==1.12
tabulate==0.9.0
tblib==3.0.0
tenacity==8.2.3
threadpoolctl==3.4.0
tifffile==2024.2.12
tiktoken==0.6.0
timm==0.9.16
tokenizers==0.15.2
tomli==2.0.1
tomlkit==0.12.0
toolz==0.12.1
torch @ file:///home/aigc/iqas/torch-2.1.2%2Bcu121-cp310-cp310-linux_x86_64.whl
torchvision @ file:///home/aigc/iqas/torchvision-0.16.2%2Bcu121-cp310-cp310-linux_x86_64.whl
tqdm==4.66.2
transformers==4.39.3
transformers-stream-generator==0.0.5
triton==2.1.0
typer==0.11.1
typing-inspect==0.9.0
typing_extensions==4.11.0
tzdata==2024.1
urllib3==2.0.7
uvicorn==0.29.0
uvloop==0.19.0
vllm==0.4.0.post1
watchfiles==0.21.0
wcwidth==0.2.13
websockets==11.0.3
Werkzeug==3.0.2
wrapt==1.16.0
xformers==0.0.23.post1
xinference==0.10.3
xoscar==0.3.0
xxhash==3.4.1
yapf==0.40.2
yarl==1.9.4
zipp==3.18.1
zmq==0.0.0
zope.event==5.0
zope.interface==6.3
zstandard==0.22.0

@XprobeBot XprobeBot modified the milestones: v0.14, v0.15 Sep 3, 2024
@TragedyN

same issue

@codingl2k1
Contributor

> same issue

Is it the same log as above?

@TragedyN

TragedyN commented Oct 16, 2024

> > same issue
>
> Is it the same log as above?

logs/local_1729066531438/xinference.log prints:
2024-10-16 08:53:34,093 xoscar.backends.core 8 WARNING Actor caller has created too many clients (940 >= 100), the global router may not be set.
2024-10-16 08:53:34,977 xoscar.backends.core 8 WARNING Actor caller has created too many clients (950 >= 100), the global router may not be set.
2024-10-16 08:53:35,036 xoscar.backends.core 8 WARNING Actor caller has created too many clients (960 >= 100), the global router may not be set.
2024-10-16 08:53:35,528 xoscar.backends.core 8 WARNING Actor caller has created too many clients (970 >= 100), the global router may not be set.
2024-10-16 08:53:35,730 xoscar.backends.core 8 WARNING Actor caller has created too many clients (980 >= 100), the global router may not be set.
I am using Python 3.10.12 and xinference 0.15.3. The model is Qwen2.5-7B-Ins, the inference engine is vllm 0.6.2, the torch version is 2.4.0, and the xoscar version is 0.3.3.
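To quantify how fast the client count grows, the warning lines can be parsed from xinference.log. The sketch below assumes the line layout shown in the log excerpts in this thread (timestamp first, then a message containing `too many clients (N >= 100)`); `parse_client_counts` and `clients_per_second` are hypothetical helper names, not part of xinference or xoscar.

```python
import re
from datetime import datetime

# Matches lines shaped like the warnings quoted in this thread.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*"
    r"too many clients \((?P<count>\d+) >= \d+\)"
)


def parse_client_counts(lines):
    """Return (timestamp, client_count) pairs for every matching warning line."""
    samples = []
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            ts = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S,%f")
            samples.append((ts, int(match["count"])))
    return samples


def clients_per_second(samples):
    """Average growth rate between the first and last sample, or None if undefined."""
    if len(samples) < 2:
        return None
    (t0, c0), (t1, c1) = samples[0], samples[-1]
    elapsed = (t1 - t0).total_seconds()
    return (c1 - c0) / elapsed if elapsed > 0 else None


if __name__ == "__main__":
    # The path is the one quoted above; replace it with your own log file.
    with open("logs/local_1729066531438/xinference.log", encoding="utf-8") as fh:
        print(clients_per_second(parse_client_counts(fh)))
```

For the five lines quoted above, the count rises by 40 clients (940 to 980) in about 1.64 seconds, which may be a useful figure to compare against the actual request rate.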

@XprobeBot XprobeBot modified the milestones: v0.15, v0.16 Oct 30, 2024
@mikck

mikck commented Nov 18, 2024

same issue

image

@codingl2k1 codingl2k1 self-assigned this Nov 19, 2024
@XprobeBot XprobeBot modified the milestones: v0.16, v1.x Nov 25, 2024
8 participants