System Info / 系統信息
CentOS 7
Running Xinference with Docker? / 是否使用 Docker 运行 Xinference?
Version info / 版本信息
Docker image: xinference v0.16.3
The command used to start Xinference / 用以启动 xinference 的命令
Cluster deployment:

master:

```shell
docker run -itd --shm-size=4g --ulimit memlock=-1 -v /data2/sang/models:/models -v /data2/sang/xinference:/xinference -e XINFERENCE_HOME=/xinference -p 6000:6000 --name xinference-master --gpus all --net=host registry.cn-hangzhou.aliyuncs.com/xprobe_xinference/xinference:v0.16.3 xinference-supervisor -H 172.20.107.176 --port 6000
```

worker1:

```shell
docker run -itd --shm-size=4g --ulimit memlock=-1 -v /data2/sang/models:/models -v /data2/sang/xinference:/xinference -e XINFERENCE_HOME=/xinference -p 6001:6001 --name xinference-worker --gpus all --net=host registry.cn-hangzhou.aliyuncs.com/xprobe_xinference/xinference:v0.16.3 xinference-worker -e "http://172.20.107.176:6000" -H 172.20.107.176 --worker-port 6001
```

worker2:

```shell
docker run -itd --shm-size=4g --ulimit memlock=-1 -v /data2/sang/models:/models -v /data2/sang/xinference:/xinference -e XINFERENCE_HOME=/xinference -p 6001:6001 --name xinference-worker --gpus all --net=host registry.cn-hangzhou.aliyuncs.com/xprobe_xinference/xinference:v0.16.3 xinference-worker -e "http://172.20.107.176:6000" -H 172.20.107.175 --worker-port 6001
```
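Once both worker containers are up, one way to confirm the supervisor can see the cluster is to query it directly (a sketch only; the supervisor address comes from the commands above, and `/v1/models` is the OpenAI-compatible listing endpoint exposed by the supervisor):

```shell
# List the models currently registered with the supervisor; an empty list
# just means no model has been launched yet.
curl http://172.20.107.176:6000/v1/models

# The bundled CLI can query the same supervisor endpoint:
xinference list -e http://172.20.107.176:6000
```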
Reproduction / 复现过程
Launch any model with replica set to 2 (so each worker loads one copy), then stop one of the workers (`docker stop xinference-worker`). The remaining worker is still healthy, yet the whole cluster stops working and the model can no longer serve requests.
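The reproduction steps above can be sketched as follows. This is an illustrative sequence, not the exact commands used: the model name is only an example, and the flags follow my understanding of the xinference CLI (check `xinference launch --help` for your version):

```shell
# Launch a model with two replicas so each worker loads one copy.
# "qwen2-instruct" is a hypothetical example model name.
xinference launch --model-name qwen2-instruct --replica 2 \
    -e http://172.20.107.176:6000

# Stop one of the two workers to trigger the failure:
docker stop xinference-worker

# Expected: the replica on the surviving worker keeps serving.
# Observed: requests through the supervisor fail cluster-wide.
curl http://172.20.107.176:6000/v1/models
```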
Expected behavior / 期待表现
I found that other users have already reported this issue. I hope the cluster can keep serving in this situation, since the other worker is still alive and has the model loaded as well.