fail to get response from manager, error rpc error: code = Unknown desc = can't find kubepods-besteffort-pod6f1c7606_fb47_4c34_82aa_9b9966435a65.slice from docker #184

Open
hyc-yuchen opened this issue May 26, 2023 · 2 comments

Comments

@hyc-yuchen

When I run nvidia-smi in a pod, it fails with this error:
fail to get response from manager, error rpc error: code = Unknown desc = can't find kubepods-besteffort-pod6f1c7606_fb47_4c34_82aa_9b9966435a65.slice from docker
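
The slice name in the error looks like the naming the kubelet uses with the systemd cgroup driver, where the dashes in the pod UID are replaced by underscores. Below is a minimal Python sketch, assuming that convention, that recovers the QoS class and pod UID from the message so the affected pod can be looked up. The helper name `parse_pod_slice` is hypothetical and is not part of gpu-manager.

```python
import re

# Assumed systemd-driver naming: kubepods[-<qos>]-pod<uid with underscores>.slice
# Guaranteed pods are assumed to have no QoS segment in the slice name.
SLICE_RE = re.compile(
    r"kubepods(?:-(?P<qos>besteffort|burstable))?-pod(?P<uid>[0-9a-f_]+)\.slice"
)


def parse_pod_slice(message):
    """Return (qos_class, pod_uid) found in an error message, or None."""
    match = SLICE_RE.search(message)
    if match is None:
        return None
    qos = match.group("qos") or "guaranteed"
    # Undo the dash-to-underscore substitution to get the Kubernetes pod UID.
    uid = match.group("uid").replace("_", "-")
    return qos, uid


err = (
    "can't find kubepods-besteffort-"
    "pod6f1c7606_fb47_4c34_82aa_9b9966435a65.slice from docker"
)
print(parse_pod_slice(err))
# prints ('besteffort', '6f1c7606-fb47-4c34-82aa-9b9966435a65')
```

The recovered UID can then be compared against `metadata.uid` of the pods on the node to confirm which pod gpu-manager failed to locate.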

@pandaoknight

Same problem. I have already set '--container-runtime-endpoint=/var/run/containerd/containerd.sock'.

image: tkestack/gpu-manager:v1.1.5
runtime: containerd
K8s: v1.24.17

It may be a ctr namespace problem, but I don't know how to debug it.

@xxsoul

xxsoul commented Jul 26, 2024

Is the host machine using cgroup v1 or v2? The gpu-manager code uses the cgroup v1 path to read the PID of the container process as seen from the host. If the host is running cgroup v2, gpu-manager will not be able to read it.
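
To answer the v1/v2 question on a given node, here is a minimal Python sketch of a common heuristic: on a unified (v2) hierarchy the cgroup root contains a `cgroup.controllers` file, while a `unified` subdirectory usually indicates systemd's hybrid layout. The function name `detect_cgroup_version` and the default root path are assumptions for illustration. This is not how gpu-manager itself performs the check.

```python
import os


def detect_cgroup_version(root="/sys/fs/cgroup"):
    """Heuristically classify the cgroup layout mounted at `root`."""
    # cgroup v2 exposes cgroup.controllers at the root of the unified hierarchy.
    if os.path.isfile(os.path.join(root, "cgroup.controllers")):
        return "v2"
    # systemd hybrid mode mounts a v2 tree under <root>/unified next to v1 controllers.
    if os.path.isdir(os.path.join(root, "unified")):
        return "hybrid"
    # Otherwise assume a legacy v1 layout if the root exists at all.
    if os.path.isdir(root):
        return "v1"
    return "unknown"


if __name__ == "__main__":
    print(detect_cgroup_version())
```

Run it on the host, not inside the pod. If it reports `v2`, the explanation above would account for the "can't find ... .slice" error.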
