Driver crashes unexpectedly with Failed to read /host/proc/mounts requiring pod restart #284

Open
dienhartd opened this issue Nov 4, 2024 · 9 comments
Labels
bug Something isn't working

Comments

@dienhartd

/kind bug

NOTE: If this is a filesystem related bug, please take a look at the Mountpoint repo to submit a bug report

What happened?
Periodically, without warning, one of my S3 Mountpoint driver pods will fail with GRPC errors until I delete it, which usually causes a dependent pod to fail to start. The replacement pod created immediately after the deletion works fine, but this requires manual intervention once I notice the dependent pod crashing due to the missing PV.

What you expected to happen?
Error not to occur.

How to reproduce it (as minimally and precisely as possible)?
Unclear.

Anything else we need to know?:
Logs

I1104 11:59:40.249998       1 credential.go:95] NodePublishVolume: Using driver identity
I1104 11:59:40.250015       1 node.go:146] NodePublishVolume: mounting d-cluster at /var/lib/kubelet/pods/97e71fea-b356-4d87-a086-5f06fe651ea7/volumes/kubernetes.io~csi/s3-pv/mount with options [--allow-delete --allow-other --gid=100 --uid=1000]
E1104 11:59:40.250106       1 mount.go:214] Failed to read /host/proc/mounts on try 1: open /host/proc/mounts: invalid argument
E1104 11:59:40.250106       1 mount.go:214] Failed to read /host/proc/mounts on try 1: open /host/proc/mounts: invalid argument
E1104 11:59:40.250106       1 mount.go:214] Failed to read /host/proc/mounts on try 1: open /host/proc/mounts: invalid argument
E1104 11:59:40.250106       1 mount.go:214] Failed to read /host/proc/mounts on try 1: open /host/proc/mounts: invalid argument
E1104 11:59:40.350345       1 mount.go:214] Failed to read /host/proc/mounts on try 2: open /host/proc/mounts: invalid argument
E1104 11:59:40.350345       1 mount.go:214] Failed to read /host/proc/mounts on try 2: open /host/proc/mounts: invalid argument
E1104 11:59:40.350345       1 mount.go:214] Failed to read /host/proc/mounts on try 2: open /host/proc/mounts: invalid argument
E1104 11:59:40.350345       1 mount.go:214] Failed to read /host/proc/mounts on try 2: open /host/proc/mounts: invalid argument
E1104 11:59:40.450642       1 mount.go:214] Failed to read /host/proc/mounts on try 3: open /host/proc/mounts: invalid argument
E1104 11:59:40.450642       1 mount.go:214] Failed to read /host/proc/mounts on try 3: open /host/proc/mounts: invalid argument
E1104 11:59:40.450642       1 mount.go:214] Failed to read /host/proc/mounts on try 3: open /host/proc/mounts: invalid argument
E1104 11:59:40.450642       1 mount.go:214] Failed to read /host/proc/mounts on try 3: open /host/proc/mounts: invalid argument
E1104 11:59:40.550806       1 driver.go:136] GRPC error: rpc error: code = Internal desc = Could not mount "d-cluster" at "/var/lib/kubelet/pods/97e71fea-b356-4d87-a086-5f06fe651ea7/volumes/kubernetes.io~csi/s3-pv/mount": Could not check if "/var/lib/kubelet/pods/97e71fea-b356-4d87-a086-5f06fe651ea7/volumes/kubernetes.io~csi/s3-pv/mount" is a mount point: stat /var/lib/kubelet/pods/97e71fea-b356-4d87-a086-5f06fe651ea7/volumes/kubernetes.io~csi/s3-pv/mount: no such file or directory, Failed to read /host/proc/mounts after 3 tries: open /host/proc/mounts: invalid argument
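For reference, the retry pattern the log lines above suggest (three attempts roughly 100 ms apart, then the last error wrapped and surfaced through the GRPC response) can be sketched as follows. This is a minimal illustration only, not the driver's actual mount.go implementation; the path and retry parameters are taken from the log output.

```go
// Minimal sketch of the read-with-retries behaviour implied by the logs
// above; not the driver's actual code. Three attempts ~100 ms apart, then
// the last error is wrapped and returned, which is what ends up in the
// "Failed to read /host/proc/mounts after 3 tries" part of the GRPC error.
package main

import (
	"fmt"
	"os"
	"time"
)

func readMountsWithRetries(path string, tries int, delay time.Duration) ([]byte, error) {
	var lastErr error
	for attempt := 1; attempt <= tries; attempt++ {
		data, err := os.ReadFile(path)
		if err == nil {
			return data, nil
		}
		lastErr = err
		fmt.Fprintf(os.Stderr, "Failed to read %s on try %d: %v\n", path, attempt, err)
		time.Sleep(delay)
	}
	return nil, fmt.Errorf("Failed to read %s after %d tries: %w", path, tries, lastErr)
}

func main() {
	if _, err := readMountsWithRetries("/host/proc/mounts", 3, 100*time.Millisecond); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```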

Environment

  • Kubernetes version (use kubectl version):
    Client Version: v1.31.1
    Server Version: v1.30.5-eks-ce1d5eb

  • Driver version: v1.9.0
    The S3 Mountpoint driver was installed through eksctl, i.e. eksctl create addon aws-mountpoint-s3-csi-driver

I was directed by @muddyfish to file this issue here: #174 (comment)

@dannycjones
Contributor

dannycjones commented Nov 6, 2024

Thanks for opening the bug report, @dienhartd. We'll investigate further.

Would you be able to review dmesg on the host and see if there are any error messages at the time of the issue, and share them if so? In particular, any error messages related to opening of /host/proc/mounts would be of interest.
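If it helps with correlation, a small diagnostic along these lines (hypothetical, not part of the driver) could be run inside the CSI node pod, assuming the host's /proc is mounted at /host/proc as in the logs above. It attempts the same read that mount.go retries and prints the underlying errno, which can then be matched against any dmesg entries from the same time.

```go
// Hypothetical diagnostic, not part of the driver: attempt the same read of
// /host/proc/mounts that the driver retries, and print the underlying errno
// so it can be compared with dmesg output from the host at the same time.
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

func main() {
	const path = "/host/proc/mounts" // path as seen from inside the node pod
	data, err := os.ReadFile(path)
	if err != nil {
		var errno syscall.Errno
		if errors.As(err, &errno) {
			// EINVAL here would match the "invalid argument" in the driver logs.
			fmt.Printf("read %s failed: %v (errno %d)\n", path, err, int(errno))
		} else {
			fmt.Printf("read %s failed: %v\n", path, err)
		}
		os.Exit(1)
	}
	fmt.Printf("read %s ok: %d bytes\n", path, len(data))
}
```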

@dannycjones added the bug (Something isn't working) label on Nov 6, 2024
@dannycjones
Contributor

Please can you let us know what operating system you're running on the cluster nodes too!

@John-Funcity

> Please can you let us know what operating system you're running on the cluster nodes too!

I have the same problem; I was running on Amazon Linux 2.

@dannycjones
Contributor

> > Please can you let us know what operating system you're running on the cluster nodes too!

> I have the same problem; I was running on Amazon Linux 2.

Thanks for sharing, @John-Funcity. Please can you open a new issue so we can get logs relevant to your problem, and also include information such as the dmesg logs as mentioned in #284 (comment).

@John-Funcity

> > > Please can you let us know what operating system you're running on the cluster nodes too!

> > I have the same problem; I was running on Amazon Linux 2.

> Thanks for sharing, @John-Funcity. Please can you open a new issue so we can get logs relevant to your problem, and also include information such as the dmesg logs as mentioned in #284 (comment).

[    2.280531] systemd[1]: systemd 219 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 -SECCOMP +BLKID +ELFUTILS +KMOD +IDN)
[    2.291650] systemd[1]: Detected virtualization amazon.
[    2.295150] systemd[1]: Detected architecture x86-64.
[    2.298554] systemd[1]: Running in initial RAM disk.
[    2.302928] systemd[1]: No hostname configured.
[    2.306128] systemd[1]: Set hostname to <localhost>.
[    2.309546] systemd[1]: Initializing machine ID from VM UUID.
[    2.336041] systemd[1]: Reached target Local File Systems.
[    2.340338] systemd[1]: Reached target Swap.
[    2.344257] systemd[1]: Created slice Root Slice.
[    2.497890] XFS (nvme0n1p1): Mounting V5 Filesystem
[    2.666828] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input0
[    3.033970] XFS (nvme0n1p1): Ending clean mount
[    3.253141] systemd-journald[863]: Received SIGTERM from PID 1 (systemd).
[    3.309998] printk: systemd: 18 output lines suppressed due to ratelimiting
[    3.537461] SELinux:  Runtime disable is deprecated, use selinux=0 on the kernel cmdline.
[    3.543529] SELinux:  Disabled at runtime.
[    3.610275] audit: type=1404 audit(1732528464.939:2): enforcing=0 old_enforcing=0 auid=4294967

@John-Funcity

[screenshot attached]

@muddyfish
Contributor

Thanks @John-Funcity for the information, but could you please open a new issue so we're able to root cause the issues separately from this one? Please include the dmesg logs and other logs following the logging guide: https://github.com/awslabs/mountpoint-s3-csi-driver/blob/main/docs/LOGGING.md

@John-Funcity

Maybe this is the problem?
https://karpenter.sh/v1.0/troubleshooting/
[screenshot attached]

@John-Funcity

MountVolume.SetUp failed for volume "s3-models-pv" : rpc error: code = Internal desc = Could not mount "xxxx-models-test" at "/var/lib/kubelet/pods/xxxxxxxxx/volumes/kubernetes.io~csi/s3-models-pv/mount": Could not check if "/var/lib/kubelet/pods/xxxxxxxx/volumes/kubernetes.io~csi/s3-models-pv/mount" is a mount point: stat /var/lib/kubelet/pods/xxxxxxxxx/volumes/kubernetes.io~csi/s3-models-pv/mount: no such file or directory, Failed to read /host/proc/mounts after 3 tries: open /host/proc/mounts: invalid argument
