executable file /catatonit [et al] not found in $PATH: No such file or directory #17042
New symptom, but I think it's likely to be the same root cause:
Worse, though, this is one of those "leave-everything-hosed" issues. Subsequent pod rm fails:
...then more:
...and then (almost) all subsequent tests fail. |
I got into this while trying to fix slow ZFS performance. I had just installed a fresh Fedora root on ZFS and got slow MySQL, slow podman build (COPY), and now this issue. I still have no idea where it comes from, but I'll leave this trail here while I'm at it. The first time I ran Play, it went OK. Now it fails. |
Someone please pay attention here. See also #17216.
|
Another one, f37 remote rootless:
All subsequent tests fail. @containers/podman-maintainers pretty please TAL |
Another one:
I just created a Jira issue for it, in hopes that it will get some attention. Unfortunately Jira disappears issues as soon as they're created, so I have no way of actually linking to it, sorry. |
@edsantiago you can add the created issue to the current sprint. Then it'll pop up on the board. |
@vrothberg what happened is, I created a Jira issue, there was a brief popup saying "Successfully created issue", and it disappeared, and now I don't know how to find it. |
@edsantiago click on "Issues" in the top bar. Then select "Reported by me". The issue you've created should show up there. |
Thank you! Issue is: https://issues.redhat.com/browse/RUN-1773 (Sorry, I can find no way to add it to a sprint.) |
I moved it. Click on "Edit", then select the "Prioritization" field. Not intuitive, but at some point I managed to recall it. |
A friendly reminder that this issue had no activity for 30 days. |
@edsantiago still an issue? |
Yes, just today, f37 remote rootless, in a kube test:
|
Two yesterday, in the same CI run:
|
OK, this is not only... @giuseppe, could you take a look at https://api.cirrus-ci.com/v1/artifact/task/4732541031153664/html/int-podman-fedora-37-rootless-host-boltdb.log.html#t--Aardvark-Test-2--Two-containers--same-subnet--1? |
Plot twist: it just happened in debian with runc:
Not only that, but we go into everything-hosed territory (#17216) after that failure. |
Maybe it's just another symptom of the "everything's hosed" issue? |
It seems to relate to |
this error usually happens when there is no storage mounted at that path and the |
Another one today, f37 rootless. So far we have this happening on debian, f36, f37. Root and rootless. Podman and podman-remote. boltdb and sqlite. |
This continues to happen. Yesterday I saw an interesting variation in f37 remote aarch64 root:
From there, of course, it goes into the unlinkat/EBUSY/hosed pit of despair. This is a different symptom -- the typical one is "executable not found" -- but this is too close to be coincidence. To me, it all points to a problem unpacking or mounting images. And this is starting to scare me because the behavior seems to manifest as "random files/directories missing from container", which in turn can lead to countless failure possibilities. |
Updated title. Removed 'crun', because this happens with runc too. I'm pretty sure it has nothing to do with the runtime, or if it does, it's maybe a race between unpacking the image and starting the runtime. |
I am treating this as a weird variation of this bug:
|
Here's another variation, f38 rootless, where a subsequent
|
If these are only rootless, then it could be the same problem (killed pause process). Basically the exec ends up in a different mountns, so it cannot see the overlay container mount; the mountpoint is empty and the exec fails. But I don't really see the connection to the other errors in the system tests, as these things shouldn't happen there. |
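A minimal sketch for testing the killed-pause-process theory on an affected rootless machine. The pause pid file path below is an assumption about the podman layout (not something the thread confirms), so adjust it if your version keeps it elsewhere:

```sh
#!/bin/sh
# Rootless-only sketch. Assumes the pause pid file path below.
pause_pid_file="${XDG_RUNTIME_DIR}/libpod/tmp/pause.pid"

if [ ! -f "$pause_pid_file" ] || ! kill -0 "$(cat "$pause_pid_file")" 2>/dev/null; then
    echo "pause process is gone; the next podman call will set up a new mountns"
    exit 1
fi

# Compare the pause process's mount namespace with the one a fresh
# `podman unshare` session joins; they should match.
pause_ns=$(readlink "/proc/$(cat "$pause_pid_file")/ns/mnt")
current_ns=$(podman unshare readlink /proc/self/ns/mnt)
echo "pause mountns:  $pause_ns"
echo "podman mountns: $current_ns"
if [ "$pause_ns" != "$current_ns" ]; then
    echo "MISMATCH: an exec from here would not see the container's overlay mount"
fi
```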
The overall issue is not rootless-only; here's the list of incidents as of right now. Please take this with a huge grain of salt, because some of the logs below are different manifestations that I believe are the same issue, but I could easily be wrong.
|
I'm seeing a strong correlation between this bug and the unlinkat-ebusy one (#17216). |
Another one, debian root remote, and now this one is followed by unmounting/EINVAL (#18831) instead of unlinkat/EBUSY. |
Here's a completely baffling one:
I can't tell if the spurious message is coming from |
I have encountered something similar when trying docker-compose with podman 4.6.1 installed via Podman Desktop. Please check out issue-3560 in the Podman Desktop repo. |
Another one (remote f39beta root) where "crun can't find touch", then everything gets hosed:
Then, after that, all tests fail with:
This looks a LOT like #18831 and all of its everything-is-hosed variations, but here the strings "umount" or "unmount" do not appear in the error logs, so, keeping separate. Anyhow, @giuseppe, maybe there's a connection? Here are the recent ones:
Seen in: sys podman/remote fedora-37/fedora-39? root/rootless host boltdb/sqlite |
Move the execution of RecordWrite() before the graphDriver Cleanup(). This addresses a longstanding issue that occurs when the Podman cleanup process is forcibly terminated and, on some occasions, the termination happens after the Cleanup() but before the change is recorded. As a result, the next user is not notified about the change and will mount the container without the home directory below (the infamous /var/lib/containers/storage/overlay mount). Then, the next time the graphDriver is initialized, the home directory is mounted on top of the existing mounts, causing some containers to fail with ENOENT since all their files are hidden, while others cannot be cleaned up since their mount directory is covered by the home directory mount.

Closes: containers/podman#18831
Closes: containers/podman#17216
Closes: containers/podman#17042

Signed-off-by: Giuseppe Scrivano <[email protected]>
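For anyone trying to confirm that failure mode on a wedged machine, here is a rough sketch of a check, assuming the default rootful graphroot under /var/lib/containers/storage (this is only an illustration of the symptom described above, not an official diagnostic):

```sh
#!/bin/sh
# List overlay-related mounts in mount order and see whether the storage
# "home" (/var/lib/containers/storage/overlay) was mounted *after* the
# per-container */merged mounts. If so, those merged dirs are covered and
# look empty, matching the ENOENT / "not found in $PATH" failures.
# Default rootful path; rootless uses ~/.local/share/containers/storage/overlay
# and must be inspected from inside `podman unshare`.
storage_home=/var/lib/containers/storage/overlay

# Field 5 of /proc/self/mountinfo is the mount point; entries appear in
# the order the mounts were created.
awk -v home="$storage_home" 'index($5, home) == 1 {print $5}' /proc/self/mountinfo
```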
Not fixed. Flaked in my PR with c/storage vendored in. remote f39 root:
|
Yeah, this is a different flake. I think it is fixed with #20299. Without the PR, I could reproduce the flake with the following script (it would reproduce in around 5 minutes on a CI machine):

```sh
#!/bin/sh
set -xeuo pipefail
PODMAN=bin/podman-remote
filename="/dev/shm/foo123"
$PODMAN create --init-ctr always --net host --pod new:foobar fedora /bin/sh -c "date +%T.%N > $filename"
$PODMAN create --name verify --net host --pod foobar -t alpine top
$PODMAN pod start foobar
$PODMAN pod stop foobar
$PODMAN pod start foobar
$PODMAN rm -fa
$PODMAN pod rm -fa
```

and run it in a loop. It has already been running for longer than an hour with the patch |
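For completeness, a trivial way to do the "run it in a loop" part, assuming the script above was saved as ./repro.sh (the wrapper is mine, not part of the original comment):

```sh
# Run the reproducer until it fails, i.e. until the flake reproduces.
i=0
while ./repro.sh; do
    i=$((i + 1))
    echo "=== iteration $i passed ==="
done
echo "reproducer failed after $i clean iterations"
```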
When running as a service, the c.state.Mounted flag could get out of sync if the container is cleaned up through the cleanup process. To avoid this, always check if the mountpoint is really present before skipping the mount.

[NO NEW TESTS NEEDED]

Closes: containers#17042

Signed-off-by: Giuseppe Scrivano <[email protected]>
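A hand-run approximation of that check from the CLI, as a sketch only: the .GraphDriver.Data.MergedDir field name and the need for `podman unshare` in rootless setups are my assumptions, not taken from the commit.

```sh
#!/bin/sh
# Ask podman where a container's merged dir is supposed to be, then verify
# something is actually mounted there, mirroring the "check if the
# mountpoint is really present" idea from the commit above.
# Usage: ./check-mount.sh <container>   (rootless: run under `podman unshare`)
ctr="$1"
merged=$(podman inspect --format '{{.GraphDriver.Data.MergedDir}}' "$ctr")
echo "reported mountpoint: $merged"

if findmnt "$merged" >/dev/null 2>&1; then
    echo "a filesystem is mounted there"
else
    echo "nothing mounted at $merged: the Mounted flag and reality disagree"
fi
```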
Issue Description
Weird new flake, seen twice so far, both times in the same test: