K3S compatibility #1

Open
dbkegley opened this issue Mar 27, 2019 · 1 comment

Comments

@dbkegley

First of all, thank you for sharing your work on this plugin. It has already saved me a lot of time.

I have started working on k3s compatibility for this device plugin, and I have gotten to the point where the plugin discovers available TPUs and registers them with the kubelet.

$ kubectl describe nodes
...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource              Requests   Limits
  --------              --------   ------
  cpu                   100m (2%)  0 (0%)
  memory                70Mi (7%)  170Mi (18%)
  ephemeral-storage     0 (0%)     0 (0%)
  kkohtaka.org/edgetpu  1          1

I0327 17:02:58.014104       1 plugin.go:98] Started gRPC service on plugin socket
I0327 17:02:58.014183       1 plugin.go:101] Started monitoring devices
I0327 17:02:58.014211       1 plugin.go:49] gRPC server started.
I0327 17:02:58.015416       1 plugin.go:118] Opened connection to kubelet socket
I0327 17:02:58.015486       1 plugin.go:121] Registering dpServer: &{[] 0xd58100 0xd58140}
I0327 17:02:58.017211       1 plugin.go:134] Registered device plugin
I0327 17:02:58.019309       1 server.go:56] Start watching devices
I0327 17:02:58.019419       1 server.go:66] Update a device list
I0327 17:02:58.019455       1 server.go:126] Starting Edge TPU device monitor
I0327 17:03:03.184672       1 server.go:155] Edge TPU became active.
I0327 17:03:03.185096       1 server.go:66] Update a device list
I0327 17:04:03.390627       1 server.go:79] Container TPU request: &ContainerAllocateRequest{DevicesIDs:[42],}
I0327 17:04:03.390765       1 server.go:80] Allocating devices... Device IDs: [42]

Side note: it looks like the device ID is hardcoded to 42, so only one TPU is currently allowed per node. Do you plan to support multiple devices?

I am able to schedule a pod which requests a TPU but the container fails to start due to:

Events:
  Type     Reason     Age               From                  Message
  ----     ------     ----              ----                  -------
  Normal   Scheduled  86s               default-scheduler     Successfully assigned default/edgetpu-demo-54f5l to raspberrypi
  Normal   Pulling    2s (x2 over 84s)  kubelet, raspberrypi  pulling image "quay.io/kkohtaka/edgetpu-demo:arm32"
  Warning  Failed     2s                kubelet, raspberrypi  Error: failed to generate container "41e1245b846a2a815f54ca40741d10fc071f91b34eadf22309c52f949ea1d4ce" spec: failed to set devices mapping [&Device{ContainerPath:/dev/bus/usb,HostPath:/dev/bus/usb,Permissions:rw,}]: not a device node
  Normal   Pulled     1s (x2 over 2s)   kubelet, raspberrypi  Successfully pulled image "quay.io/kkohtaka/edgetpu-demo:arm32"
  Warning  Failed     1s                kubelet, raspberrypi  Error: failed to generate container "d72f48a0b04a560b1c7e81ec20d686b6ca3710f0a02fd38cecb1b937b52e8d05" spec: failed to set devices mapping [&Device{ContainerPath:/dev/bus/usb,HostPath:/dev/bus/usb,Permissions:rw,}]: not a device node

It looks like I need to specify an absolute path to the device, but I'm not sure what that would be. Could you point me in the right direction? I'm using a Raspberry Pi B+ with a Coral USB Accelerator for testing.

If you're open to it, I'd be happy to submit a PR for k3s support once I get this working.
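The "not a device node" error above suggests that the runtime rejects /dev/bus/usb because it is a directory rather than a character or block device. A minimal sketch of one possible approach, assuming the individual nodes below that directory (paths of the form /dev/bus/usb/BUS/DEV) are what should be mapped, is to walk the directory and collect only real device nodes. The helper names isDeviceNode and findDeviceNodes are hypothetical and are not part of this plugin, and the sketch does not filter for the Edge TPU specifically:

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// isDeviceNode reports whether path is a character or block device node.
// A directory such as /dev/bus/usb returns false.
func isDeviceNode(path string) (bool, error) {
	info, err := os.Stat(path)
	if err != nil {
		return false, err
	}
	return info.Mode()&fs.ModeDevice != 0, nil
}

// findDeviceNodes walks root and returns every device node beneath it.
// Hypothetical helper: it returns all USB nodes, not only Edge TPUs.
func findDeviceNodes(root string) ([]string, error) {
	var nodes []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.Type()&fs.ModeDevice != 0 {
			nodes = append(nodes, path)
		}
		return nil
	})
	return nodes, err
}

func main() {
	root := "/dev/bus/usb" // path taken from the error message above
	if ok, err := isDeviceNode(root); err == nil {
		fmt.Printf("%s is a device node: %v\n", root, ok)
	}
	nodes, err := findDeviceNodes(root)
	if err != nil {
		fmt.Fprintln(os.Stderr, "walk failed:", err)
		return
	}
	for _, n := range nodes {
		fmt.Println(n)
	}
}
```

Each returned path could then be used as both HostPath and ContainerPath in its own device mapping instead of the directory. Whether that is enough for the demo container to find the accelerator is untested here.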

@kkohtaka
Owner

Thanks for trying this project. I started it as a hobby, but contributions are welcome.

> It looks like I need to specify an absolute path to the device, but I'm not sure what that would be. Could you point me in the right direction?

I don't know how to solve this yet, because I haven't tried this device plugin on k3s. Currently, the plugin is tested only on Kubernetes deployed with kubeadm on a Raspberry Pi B.

> Side note: it looks like the device ID is hardcoded to 42, so only one TPU is currently allowed per node. Do you plan to support multiple devices?

You're correct. I will fix this once I find a reliable way to generate an ID for each device.

If you can solve these issues, I would be happy to review your PRs.
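On generating an ID for each device, one possible starting point is to derive the ID from the USB device node path instead of using the fixed value 42. The sketch below assumes paths of the form /dev/bus/usb/BUS/DEV. The function name deviceID and the "edgetpu-" prefix are hypothetical. Bus and device numbers can change when a device is replugged or re-enumerated, so this may not be reliable enough on its own:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// isDigits reports whether s is non-empty and contains only ASCII digits.
func isDigits(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}

// deviceID builds an ID from the last two path elements of a USB device
// node path (bus number and device number). Hypothetical helper; the ID
// is only as stable as the numbers the kernel assigns.
func deviceID(nodePath string) (string, error) {
	clean := filepath.Clean(nodePath)
	dev := filepath.Base(clean)
	bus := filepath.Base(filepath.Dir(clean))
	if !isDigits(bus) || !isDigits(dev) {
		return "", fmt.Errorf("unexpected USB device path %q", nodePath)
	}
	return fmt.Sprintf("edgetpu-%s-%s", bus, dev), nil
}

func main() {
	id, err := deviceID("/dev/bus/usb/001/004")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(id) // prints "edgetpu-001-004"
}
```

An ID built this way would let the plugin advertise one entry per discovered node. A serial number or physical port path might be a more stable source if one is available.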
