Failed to create pod sandbox #28

Closed
DragonHunter274 opened this issue Sep 16, 2024 · 9 comments

@DragonHunter274

DragonHunter274 commented Sep 16, 2024

I'm trying to use zeropod on k3s (Kubernetes v1.27.7) and get this error:
FailedCreatePodSandBox 13s (x11 over 2m20s) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox runtime: no runtime for "zeropod" is configured

Kubernetes manifest:
apiVersion: v1
kind: List
items:
- apiVersion: v1
  kind: Namespace
  metadata:
    labels:
      kubernetes.io/metadata.name: zeropod-system
    name: zeropod-system
  spec:
    finalizers:
    - kubernetes

- apiVersion: v1
  kind: ServiceAccount
  metadata:
    name: zeropod-node
    namespace: zeropod-system

- apiVersion: rbac.authorization.k8s.io/v1
  kind: ClusterRole
  metadata:
    name: zeropod:pod-updater
  rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "update"]

- apiVersion: rbac.authorization.k8s.io/v1
  kind: ClusterRole
  metadata:
    name: zeropod:runtimeclass-installer
  rules:
  - apiGroups: ["node.k8s.io"]
    resources: ["runtimeclasses"]
    verbs: ["create", "delete", "update"]

- apiVersion: rbac.authorization.k8s.io/v1
  kind: ClusterRoleBinding
  metadata:
    name: zeropod:pod-updater
  roleRef:
    apiGroup: rbac.authorization.k8s.io
    kind: ClusterRole
    name: zeropod:pod-updater
  subjects:
  - kind: ServiceAccount
    name: zeropod-node
    namespace: zeropod-system

- apiVersion: rbac.authorization.k8s.io/v1
  kind: ClusterRoleBinding
  metadata:
    name: zeropod:runtimeclass-installer
  roleRef:
    apiGroup: rbac.authorization.k8s.io
    kind: ClusterRole
    name: zeropod:runtimeclass-installer
  subjects:
  - kind: ServiceAccount
    name: zeropod-node
    namespace: zeropod-system

- apiVersion: apps/v1
  kind: DaemonSet
  metadata:
    labels:
      app.kubernetes.io/name: zeropod-node
    name: zeropod-node
    namespace: zeropod-system
  spec:
    revisionHistoryLimit: 10
    selector:
      matchLabels:
        app.kubernetes.io/name: zeropod-node
    template:
      metadata:
        labels:
          app.kubernetes.io/name: zeropod-node
      spec:
        containers:
        - args:
          - -metrics-addr=:8080
          - -status-labels=true
          command:
          - /zeropod-manager
          image: ghcr.io/ctrox/zeropod-manager:v0.4.1
          imagePullPolicy: IfNotPresent
          name: manager
          ports:
          - containerPort: 8080
            name: metrics
            protocol: TCP
          securityContext:
            capabilities:
              add:
              - SYS_PTRACE
              - SYS_ADMIN
              - NET_ADMIN
            privileged: true
          volumeMounts:
          - mountPath: /run/zeropod
            name: zeropod-run
          - mountPath: /hostproc
            name: hostproc
          - mountPath: /sys/fs/bpf
            name: bpf
        initContainers:
        - args:
          - -criu-image=ghcr.io/ctrox/zeropod-criu:v3.19
          image: ghcr.io/ctrox/zeropod-installer:v0.4.1
          imagePullPolicy: IfNotPresent
          name: installer
          volumeMounts:
          - mountPath: /etc/containerd
            name: containerd-etc
          - mountPath: /run/containerd
            name: containerd-run
          - mountPath: /opt/zeropod
            name: zeropod-opt
          - mountPath: /run/systemd
            name: systemd-run
          - mountPath: /etc/criu
            name: criu-etc
        - args:
          - mount | grep "/sys/fs/bpf type bpf" || mount -t bpf bpf /sys/fs/bpf
          command:
          - /bin/sh
          - -c
          - --
          image: alpine:3.19.1
          imagePullPolicy: IfNotPresent
          name: prepare-bpf-fs
          securityContext:
            privileged: true
          volumeMounts:
          - mountPath: /sys/fs/bpf
            mountPropagation: Bidirectional
            name: bpf
        nodeSelector:
          zeropod.ctrox.dev/node: "true"
        serviceAccountName: zeropod-node
        tolerations:
        - operator: Exists
        volumes:
        - hostPath:
            path: /var/lib/rancher/k3s/agent/etc/containerd/
          name: containerd-etc
        - hostPath:
            path: /run/k3s/containerd
          name: containerd-run
        - hostPath:
            path: /var/lib/rancher/k3s/agent/containerd
          name: zeropod-opt
        - hostPath:
            path: /run/zeropod
          name: zeropod-run
        - hostPath:
            path: /run/systemd
          name: systemd-run
        - hostPath:
            path: /etc/criu
          name: criu-etc
        - hostPath:
            path: /proc
            type: Directory
          name: hostproc
        - hostPath:
            path: /sys/fs/bpf
            type: Directory
          name: bpf
    updateStrategy:
      rollingUpdate:
        maxSurge: 0
        maxUnavailable: 1
      type: RollingUpdate
@ctrox
Owner

ctrox commented Sep 16, 2024

Hi, if the runtimeclass is not installed, I suspect the installer did not finish properly. Can you check the logs of the installer?

kubectl -n zeropod-system logs -l app.kubernetes.io/name=zeropod-node -c installer

If you have no zeropod-node pod, your node might simply be missing the required label: kubectl label node <your node> zeropod.ctrox.dev/node=true

@DragonHunter274
Author

The weird thing is that the RuntimeClass is installed. The installer log is:

2024/09/16 13:42:17 installed criu binaries
2024/09/16 13:42:17 installing runtime for containerd
2024/09/16 13:42:17 runtime already configured, no need to restart containerd
2024/09/16 13:42:17 installed runtime
2024/09/16 13:42:17 installed runtimeClass
2024/09/16 13:42:17 installer completed

kubectl get runtimeclass -A

NAME      HANDLER   AGE
zeropod   zeropod   31h

@ctrox
Owner

ctrox commented Sep 18, 2024

Oh, then the problem is somewhere else. I think containerd does not know about the zeropod runtime. This can happen if containerd got configured (as the log seems to confirm) but k3s/containerd was not restarted properly. Can you manually restart the k3s agent/server on the affected node to see if that fixes the issue?

@DragonHunter274
Author

I already tried that; it didn't help.

@ctrox
Owner

ctrox commented Sep 18, 2024

Can you post the containerd config?

cat /var/lib/rancher/k3s/agent/etc/containerd/config.toml
# and the template
cat /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl

@DragonHunter274
Author

Containerd config
# File generated by k3s. DO NOT EDIT. Use config.toml.tmpl instead.
version = 2

[plugins."io.containerd.internal.v1.opt"]
  path = "/var/lib/rancher/k3s/agent/containerd"
[plugins."io.containerd.grpc.v1.cri"]
  stream_server_address = "127.0.0.1"
  stream_server_port = "10010"
  enable_selinux = false
  enable_unprivileged_ports = true
  enable_unprivileged_icmp = true
  sandbox_image = "rancher/mirrored-pause:3.6"

[plugins."io.containerd.grpc.v1.cri".containerd]
  snapshotter = "overlayfs"
  disable_snapshot_annotations = true


[plugins."io.containerd.grpc.v1.cri".cni]
  bin_dir = "/var/lib/rancher/k3s/data/bf3548384eaabb3435bf08112f1b0cba1afc5add6a6f2f2372aa2906a598fd04/bin"
  conf_dir = "/var/lib/rancher/k3s/agent/etc/cni/net.d"


[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  runtime_type = "io.containerd.runc.v2"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true

config.toml.tmpl doesn't exist.
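
The config.toml posted here defines only the runc runtime, which is consistent with the "no runtime for "zeropod" is configured" error. As a rough sketch, the runtime handlers that a containerd config defines could be listed like this. The function names `list_runtimes` and `has_runtime` are made up for illustration, and the pattern assumes the `version = 2` CRI plugin section names seen in this config; other containerd config versions may name the sections differently.

```shell
#!/bin/sh
# Sketch only: list_runtimes and has_runtime are hypothetical helper names.

# Print the runtime handler names defined in a containerd CRI config file.
# Matches section headers such as
#   [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
# and skips sub-sections such as "...runtimes.runc.options".
list_runtimes() {
  sed -n 's/^[[:space:]]*\[plugins\."io\.containerd\.grpc\.v1\.cri"\.containerd\.runtimes\."\{0,1\}\([^]."]*\)"\{0,1\}][[:space:]]*$/\1/p' "$1"
}

# Succeed if the config file ($1) defines the runtime handler named $2.
has_runtime() {
  list_runtimes "$1" | grep -qx -- "$2"
}
```

For example, `has_runtime /var/lib/rancher/k3s/agent/etc/containerd/config.toml zeropod || echo "zeropod runtime missing"` would report the missing handler for a config like the one in this comment.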

@DragonHunter274
Author

I forgot to configure -runtime=k3s. I just fixed that, but it still doesn't create the .tmpl file.

@DragonHunter274
Author

Another restart of k3s fixed it. The nginx example runs now, but it never scales down.
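
The fix reported in this thread had two parts: setting `-runtime=k3s` and restarting k3s. A minimal sketch for checking the first part follows. It assumes the flag belongs in the args of the `installer` init container (the thread does not say where it was set), it takes the namespace and DaemonSet names from the manifest posted in this issue, and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; resource names come from the
# manifest posted in this issue.

# Print the args of the "installer" init container of the zeropod-node DaemonSet.
installer_args() {
  kubectl -n zeropod-system get daemonset zeropod-node \
    -o jsonpath='{.spec.template.spec.initContainers[?(@.name=="installer")].args}'
}

# Succeed if the given args text contains the -runtime=k3s flag.
args_have_k3s_runtime() {
  printf '%s\n' "$1" | grep -qF -- '-runtime=k3s'
}

# Report whether the flag is set on the live DaemonSet.
check_installer_flag() {
  if args_have_k3s_runtime "$(installer_args)"; then
    echo "installer has -runtime=k3s"
  else
    echo "installer is missing -runtime=k3s" >&2
    return 1
  fi
}

# Assumption: k3s runs as a systemd unit named k3s (server) or k3s-agent (agent).
#   sudo systemctl restart k3s        # on a server node
#   sudo systemctl restart k3s-agent  # on an agent node
```

Running `check_installer_flag` before the restart makes it easier to tell whether a remaining failure comes from the missing flag or from the restart not picking up the new config.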

@DragonHunter274
Author

Closing this and continuing with the other issue in #29.
