regression with kernel 6.9: user containers hang forever in pasta setup with podman-5.0.0~rc6 #22052

martinpitt · 2024-03-15T05:34:43Z

Issue Description

Since around yesterday, the "cockpit-podman revdeps" test has started to fail, in e.g. #22047, #22048, #22050 (these three were landed red). This happens both with the "podman-next" copr, e.g. this or this run, and last night it also started on pure rawhide runs without the -next COPR, as the newer podman versions are
being pushed into rawhide

Steps to reproduce the issue

Install the latest podman on rawhide. It's not yet on the mirrors, so need to install from koji:

dnf update -y https://kojipkgs.fedoraproject.org//packages/podman/5.0.0~rc6/2.eln136/x86_64/podman-5.0.0~rc6-2.eln136.x86_64.rpm

As user:

podman run -it --rm quay.io/libpod/busybox

Describe the results you received

It hangs forever without any message, and doesn't give a shell. There is no journal message or even a resource consuming process. It doesn't react to Control-C or Control-, the only way is to kill -9 them.

With --log-level=debug:

DEBU[0000] Check for idmapped mounts support create mapped mount: operation not permitted 
DEBU[0000] Created container "4855a61b527c337ce0f5fea32a46a74a1b24c562d4f388f88c6e60c26125e126" 
DEBU[0000] Container "4855a61b527c337ce0f5fea32a46a74a1b24c562d4f388f88c6e60c26125e126" has work directory "/home/admin/.local/share/containers/storage/overlay-containers/4855a61b527c337ce0f5fea32a46a74a1b24c562d4f388f88c6e60c26125e126/userdata" 
DEBU[0000] Container "4855a61b527c337ce0f5fea32a46a74a1b24c562d4f388f88c6e60c26125e126" has run directory "/run/user/1000/containers/overlay-containers/4855a61b527c337ce0f5fea32a46a74a1b24c562d4f388f88c6e60c26125e126/userdata" 
DEBU[0000] Handling terminal attach                     
DEBU[0000] Enabling signal proxying                     
DEBU[0000] overlay: test mount indicated that volatile is being used 
DEBU[0000] overlay: mount_data=lowerdir=/home/admin/.local/share/containers/storage/overlay/l/BUS2BM2KZRRKHOPMUIJI6SDTFS,upperdir=/home/admin/.local/share/containers/storage/overlay/679a40b64d1bb81c6b90f36167af4d19b3b3e08e09376c9525184b67b6216da5/diff,workdir=/home/admin/.local/share/containers/storage/overlay/679a40b64d1bb81c6b90f36167af4d19b3b3e08e09376c9525184b67b6216da5/work,userxattr,volatile,context="system_u:object_r:container_file_t:s0:c41,c43" 
DEBU[0000] Mounted container "4855a61b527c337ce0f5fea32a46a74a1b24c562d4f388f88c6e60c26125e126" at "/home/admin/.local/share/containers/storage/overlay/679a40b64d1bb81c6b90f36167af4d19b3b3e08e09376c9525184b67b6216da5/merged" 
DEBU[0000] Created root filesystem for container 4855a61b527c337ce0f5fea32a46a74a1b24c562d4f388f88c6e60c26125e126 at /home/admin/.local/share/containers/storage/overlay/679a40b64d1bb81c6b90f36167af4d19b3b3e08e09376c9525184b67b6216da5/merged 
INFO[0000] Received shutdown.Stop(), terminating!        PID=1705
DEBU[0000] Made network namespace at /run/user/1000/netns/netns-96db5179-aa12-e4df-1672-bb9eab89f633 for container 4855a61b527c337ce0f5fea32a46a74a1b24c562d4f388f88c6e60c26125e126 
DEBU[0000] pasta arguments: --config-net -t none -u none -T none -U none --no-map-gw --dns none --quiet --netns /run/user/1000/netns/netns-96db5179-aa12-e4df-1672-bb9eab89f633

So that "Received Stop()" is certainly unusual.

After that, podman is completely blocked. In a separate terminal, podman ps also hangs.

Describe the results you expected

Well, it should work 😉

It does work fine with system containers.

podman info output

host:
  arch: amd64
  buildahVersion: 1.35.0
  cgroupControllers:
  - cpu
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.10-1.fc41.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.10, commit: '
  cpuUtilization:
    idlePercent: 97.91
    systemPercent: 1.1
    userPercent: 0.99
  cpus: 1
  databaseBackend: sqlite
  distribution:
    distribution: fedora
    variant: cloud
    version: "41"
  eventLogger: journald
  freeLocks: 2048
  hostname: fedora-rawhide-127-0-0-2-2201
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 524288
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 524288
      size: 65536
  kernel: 6.9.0-0.rc0.20240313gitb0546776ad3f.4.fc41.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 582688768
  memTotal: 1133514752
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.10.0-1.fc40.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.10.0
    package: netavark-1.10.3-2.fc41.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.10.3
  ociRuntime:
    name: crun
    package: crun-1.14.4-1.fc41.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.14.4
      commit: a220ca661ce078f2c37b38c92e66cf66c012d9c1
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20240220.g1e6f92b-1.fc41.x86_64
    version: |
      pasta 0^20240220.g1e6f92b-1.fc41.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: false
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 1133506560
  swapTotal: 1133506560
  uptime: 0h 14m 41.00s
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /home/admin/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/admin/.local/share/containers/storage
  graphRootAllocated: 12802043904
  graphRootUsed: 1898729472
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 1
  runRoot: /run/user/1000/containers
  transientStore: false
  volumePath: /home/admin/.local/share/containers/storage/volumes
version:
  APIVersion: 5.0.0-rc6
  Built: 1710288000
  BuiltTime: Wed Mar 13 00:00:00 2024
  GitCommit: ""
  GoVersion: go1.21.3
  Os: linux
  OsArch: linux/amd64
  Version: 5.0.0-rc6

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

Yes

Additional environment details

Fedora rawhide cloud image in a VM, with latest dnf update:

podman-5.0.0~rc6-2.eln136.x86_64
containers-common-0.58.0-1.fc41.noarch
passt-0^20240220.g1e6f92b-1.fc41.x86_64
kernel-core-6.9.0-0.rc0.20240313gitb0546776ad3f.4.fc41.x86_64

Additional information

No response

The text was updated successfully, but these errors were encountered:

martinpitt · 2024-03-15T05:37:02Z

What worries me: This doesn't seem to turn up in any of "your" integration tests. Certainly at least some of them cover user podman and pasta? Do they set up some additional magic which circumvents this bug? That may give a clue what happened. Thanks!

Luap99 · 2024-03-15T09:31:50Z

Well I doubt that either podman or pasta changed. I cannot see anything related to that on the podman side and the passt version is still the same that we use in our CI. Of course I have no proof yet but if I would think it is far more likely some kernel bug given that the version shows kernel 6.9 (unreleased)

Luap99 · 2024-03-15T09:39:48Z

I reproduced with --network pasta and podman 5-rc1 on the new 6.9 kernel. Then I downgraded to an older 6.8 kernel and there it didn't hang, thus confirming we have a kernel bug that seems to cause pasta not to exit...
cc @sbrivio-rh

martinpitt · 2024-03-15T09:41:59Z

Confirmed. I tested it with kernel-core-6.8.0-63.fc41.x86_64 (i.e. dnf update -y but without a reboot), and it works with that.

sbrivio-rh · 2024-03-15T09:46:17Z

Checking. @martinpitt could you tell me the upstream SHA that kernel package is sourced from?

Luap99 · 2024-03-15T09:54:07Z

@sbrivio-rh I used a kernel from yesterday, commit seems to be 480e035fc4

Luap99 · 2024-03-15T10:04:33Z

running strace I see

9  --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=1102, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
1099  waitid(P_ALL, 0, NULL, WNOHANG|WEXITED, NULL) = 0
1099  waitid(P_ALL, 0, NULL, WNOHANG|WEXITED, NULL) = -1 ECHILD (No child processes)
1099  rt_sigreturn({mask=[]})           = 1102
1099  sendto(7, [{nlmsg_len=32, nlmsg_type=0x10 /* NLMSG_??? */, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=6, nlmsg_pid=0}, "\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00"], 32, 0, NULL, 0) = 32
1099  recvfrom(7, [{nlmsg_len=36, nlmsg_type=NLMSG_ERROR, nlmsg_flags=NLM_F_CAPPED, nlmsg_seq=6, nlmsg_pid=1101}, {error=0, msg={nlmsg_len=32, nlmsg_type=0x10 /* NLMSG_??? */, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=6, nlmsg_pid=0}}], 65536, 0, NULL, NULL) = 36
1099  sendto(7, [{nlmsg_len=32, nlmsg_type=0x12 /* NLMSG_??? */, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=7, nlmsg_pid=0}, "\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"], 32, 0, NULL, 0) = 32
1099  recvfrom(7, [{nlmsg_len=1492, nlmsg_type=0x10 /* NLMSG_??? */, nlmsg_flags=0, nlmsg_seq=7, nlmsg_pid=1101}, "\x00\x00\x01\x00\x02\x00\x00\x00\x02\x10\x00\x00\x00\x00\x00\x00\x0b\x00\x03\x00\x65\x6e\x70\x31\x73\x30\x00\x00\x08\x00\x0d\x00"...], 65536, 0, NULL, NULL) = 1492
1099  recvfrom(7, [{nlmsg_len=36, nlmsg_type=NLMSG_ERROR, nlmsg_flags=NLM_F_CAPPED, nlmsg_seq=7, nlmsg_pid=1101}, {error=0, msg={nlmsg_len=32, nlmsg_type=0x12 /* NLMSG_??? */, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=7, nlmsg_pid=0}}], 65536, 0, NULL, NULL) = 36
1099  sendto(7, [{nlmsg_len=40, nlmsg_type=0x10 /* NLMSG_??? */, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=8, nlmsg_pid=0}, "\x00\x00\x00\x00\x02\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x08\x00\x04\x00\xf0\xff\x00\x00"], 40, 0, NULL, 0) = 40
1099  recvfrom(7, [{nlmsg_len=36, nlmsg_type=NLMSG_ERROR, nlmsg_flags=NLM_F_CAPPED, nlmsg_seq=8, nlmsg_pid=1101}, {error=0, msg={nlmsg_len=40, nlmsg_type=0x10 /* NLMSG_??? */, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=8, nlmsg_pid=0}}], 65536, 0, NULL, NULL) = 36
1099  sendto(5, [{nlmsg_len=24, nlmsg_type=RTM_GETADDR, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK|NLM_F_DUMP, nlmsg_seq=9, nlmsg_pid=0}, {ifa_family=AF_INET, ifa_prefixlen=0, ifa_flags=0, ifa_scope=RT_SCOPE_UNIVERSE, ifa_index=if_nametoindex("enp1s0")}], 24, 0, NULL, 0) = 24
1099  recvfrom(5, [[{nlmsg_len=88, nlmsg_type=RTM_NEWADDR, nlmsg_flags=NLM_F_MULTI|NLM_F_DUMP_FILTERED, nlmsg_seq=9, nlmsg_pid=1099}, {ifa_family=AF_INET, ifa_prefixlen=24, ifa_flags=0, ifa_scope=RT_SCOPE_UNIVERSE, ifa_index=if_nametoindex("enp1s0")}, [[{nla_len=8, nla_type=IFA_ADDRESS}, inet_addr("192.168.122.92")], [{nla_len=8, nla_type=IFA_LOCAL}, inet_addr("192.168.122.92")], [{nla_len=8, nla_type=IFA_BROADCAST}, inet_addr("192.168.122.255")], [{nla_len=11, nla_type=IFA_LABEL}, "enp1s0"], [{nla_len=8, nla_type=IFA_FLAGS}, 0], [{nla_len=20, nla_type=IFA_CACHEINFO}, {ifa_prefered=3564, ifa_valid=3564, cstamp=577, tstamp=577}]]], [{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI|NLM_F_DUMP_FILTERED, nlmsg_seq=9, nlmsg_pid=1099}, 0]], 65536, 0, NULL, NULL) = 108
1099  sendto(7, [{nlmsg_len=88, nlmsg_type=0x14 /* NLMSG_??? */, nlmsg_flags=NLM_F_REQUEST|NLM_F_MULTI|NLM_F_ACK|0x400, nlmsg_seq=10, nlmsg_pid=0}, "\x02\x18\x00\x00\x02\x00\x00\x00\x08\x00\x01\x00\xc0\xa8\x7a\x5c\x08\x00\x02\x00\xc0\xa8\x7a\x5c\x08\x00\x04\x00\xc0\xa8\x7a\xff"...], 88, 0, NULL, 0) = 88
1099  recvfrom(7, [{nlmsg_len=36, nlmsg_type=NLMSG_ERROR, nlmsg_flags=NLM_F_CAPPED, nlmsg_seq=10, nlmsg_pid=1101}, {error=0, msg={nlmsg_len=88, nlmsg_type=0x14 /* NLMSG_??? */, nlmsg_flags=NLM_F_REQUEST|NLM_F_MULTI|NLM_F_ACK|0x400, nlmsg_seq=10, nlmsg_pid=0}}], 65536, 0, NULL, NULL) = 36
1099  sendto(5, [{nlmsg_len=36, nlmsg_type=RTM_GETROUTE, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK|NLM_F_DUMP, nlmsg_seq=11, nlmsg_pid=0}, {rtm_family=AF_INET, rtm_dst_len=0, rtm_src_len=0, rtm_tos=0, rtm_table=RT_TABLE_MAIN, rtm_protocol=RTPROT_UNSPEC, rtm_scope=RT_SCOPE_UNIVERSE, rtm_type=RTN_UNICAST, rtm_flags=0}, [{nla_len=8, nla_type=RTA_OIF}, if_nametoindex("enp1s0")]], 36, 0, NULL, 0) = 36
1099  recvfrom(5, [[{nlmsg_len=68, nlmsg_type=RTM_NEWROUTE, nlmsg_flags=NLM_F_MULTI|NLM_F_DUMP_FILTERED, nlmsg_seq=11, nlmsg_pid=1099}, {rtm_family=AF_INET, rtm_dst_len=0, rtm_src_len=0, rtm_tos=0, rtm_table=RT_TABLE_MAIN, rtm_protocol=RTPROT_DHCP, rtm_scope=RT_SCOPE_UNIVERSE, rtm_type=RTN_UNICAST, rtm_flags=0}, [[{nla_len=8, nla_type=RTA_TABLE}, RT_TABLE_MAIN], [{nla_len=8, nla_type=RTA_PRIORITY}, 100], [{nla_len=8, nla_type=RTA_PREFSRC}, inet_addr("192.168.122.92")], [{nla_len=8, nla_type=RTA_GATEWAY}, inet_addr("192.168.122.1")], [{nla_len=8, nla_type=RTA_OIF}, if_nametoindex("enp1s0")]]], [{nlmsg_len=68, nlmsg_type=RTM_NEWROUTE, nlmsg_flags=NLM_F_MULTI|NLM_F_DUMP_FILTERED, nlmsg_seq=11, nlmsg_pid=1099}, {rtm_family=AF_INET, rtm_dst_len=24, rtm_src_len=0, rtm_tos=0, rtm_table=RT_TABLE_MAIN, rtm_protocol=RTPROT_KERNEL, rtm_scope=RT_SCOPE_LINK, rtm_type=RTN_UNICAST, rtm_flags=0}, [[{nla_len=8, nla_type=RTA_TABLE}, RT_TABLE_MAIN], [{nla_len=8, nla_type=RTA_DST}, inet_addr("192.168.122.0")], [{nla_len=8, nla_type=RTA_PRIORITY}, 100], [{nla_len=8, nla_type=RTA_PREFSRC}, inet_addr("192.168.122.92")], [{nla_len=8, nla_type=RTA_OIF}, if_nametoindex("enp1s0")]]], [{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI|NLM_F_DUMP_FILTERED, nlmsg_seq=11, nlmsg_pid=1099}, 0]], 65536, 0, NULL, NULL) = 156
1099  recvfrom(5,  <unfinished ...>

and then nothing until I kill it with SIGINT, full log attached
log.txt

sbrivio-rh · 2024-03-15T10:10:59Z

Reproduced using a procfs entry as reference, pasta properly sets up the timer:

openat(AT_FDCWD, "/proc/3513/ns", O_RDONLY|O_CLOEXEC) = 16
fstatfs(16, {f_type=PROC_SUPER_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x16, 0]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_NOSUID|ST_NODEV|ST_NOEXEC|ST_RELATIME}) = 0
timerfd_create(CLOCK_MONOTONIC, TFD_CLOEXEC) = 17
timerfd_settime(17, 0, {it_interval={tv_sec=1, tv_nsec=0}, it_value={tv_sec=1, tv_nsec=0}}, NULL) = 0
epoll_ctl(4, EPOLL_CTL_ADD, 17, {events=EPOLLIN, data={u32=4360, u64=68719481096}}) = 0

and then also hangs on a recvfrom() on a RTM_GETROUTE request, similar to your case. It's some netlink change that blocks it completely. If you pass address and gateway explicitly it works. Looking into netlink / routing table front-end changes...

sbrivio-rh · 2024-03-15T10:13:03Z

Current suspicion of mine is commit 87d381973e49 ("genetlink: fit NLMSG_DONE into same read() as families").

martinpitt · 2024-03-15T10:13:48Z

@sbrivio-rh I suppose your question to me is obsolete, but kernel-core-6.9.0-0.rc0.20240313gitb0546776ad3f.4.fc41.x86_64 has a git reference in it. I suppose that's the upstream one? Otherwise it's somewhere in https://src.fedoraproject.org/rpms/kernel/c/77aed3137518b5c493265a6c9fa1eec2a13ea6ca?branch=rawhide

Luap99 · 2024-03-15T10:18:06Z

@martinpitt yes b0546776ad3f is a upstream commit

sbrivio-rh · 2024-03-15T10:18:50Z

@sbrivio-rh I suppose your question to me is obsolete, but kernel-core-6.9.0-0.rc0.20240313gitb0546776ad3f.4.fc41.x86_64 has a git reference in it. I suppose that's the upstream one?

Ah, sorry, it is! I was just using a net.git tree which didn't include b0546776ad3f yet. Switched to mainline torvalds/linux.git now.

sbrivio-rh · 2024-03-15T10:35:16Z

Another workaround for pasta: --no-copy-routes.

sbrivio-rh · 2024-03-15T11:29:13Z

Patch at: https://archives.passt.top/passt-dev/[email protected]/.

I think we should report this as a possible kernel regression, even if the new behaviour is correct and 87d381973e49 made its way to mainline through net-next, and I doubt it will be included in any -stable branch, so probably releasing a fixed version of pasta in the near future is enough.

I didn't bisect though.

sbrivio-rh · 2024-03-15T11:52:58Z

Reported here: https://lore.kernel.org/all/20240315124808.033ff58d@elisabeth/.

In any case, I'll prepare a new release in a bit. I'm not aware of any other distribution already shipping that commit in their latest kernel package.

Luap99 · 2024-03-18T11:19:28Z

The pasta fix is in passt-0^20240318.g615d370 (https://bodhi.fedoraproject.org/updates/FEDORA-2024-4b5b35a749)

Thus I am going to close this one.

dustymabe · 2024-03-18T15:40:15Z

I'll note here that we had some other non-pasta related tests that were failing in CoreOS: coreos/fedora-coreos-tracker#1693

Though, it does appear these tests do now also pass with https://bodhi.fedoraproject.org/updates/FEDORA-2024-4b5b35a749

Not sure why container building would have been affected, but figured I'd mention it anyway.

Luap99 · 2024-03-18T15:55:57Z

Not sure why container building would have been affected, but figured I'd mention it anyway.

Because pasta is the default for rootless networking now. So podman run, podman build, etc.. they all configure the netns with pasta (assuming rootless of course)

dustymabe · 2024-03-18T16:07:48Z

Makes sense. Thanks

martinpitt · 2024-03-18T16:11:14Z

That kernel fix for the original issue here seems to have landed. But very recent runs now fail differently:

$ podman run -d --name test-sh-user --stop-timeout 0 localhost/test-alpine sh
Error: pasta failed with exit code 1:
netlink: Unexpected sequence number (6 != 10)

I'll file this as a proper/separate issue tomorrow morning, just a little pre-warning in case that rings a bell, and to explain why rawhide runs are still red everywhere.

sbrivio-rh · 2024-03-18T17:31:39Z

$ podman run -d --name test-sh-user --stop-timeout 0 localhost/test-alpine sh
Error: pasta failed with exit code 1:
netlink: Unexpected sequence number (6 != 10)

@dgibson

sbrivio-rh · 2024-03-18T18:46:22Z

That kernel fix for the original issue here seems to have landed.

It wasn't actually a kernel fix -- the kernel's behaviour was in fact correct, even though different from what it has been in the past ~5 (at least?) years.

But very recent runs now fail differently:

$ podman run -d --name test-sh-user --stop-timeout 0 localhost/test-alpine sh
Error: pasta failed with exit code 1:
netlink: Unexpected sequence number (6 != 10)

It looks like either a kernel issue, or we fail to account for four messages from the kernel (most likely) after kernel commit 87d381973e49 ("genetlink: fit NLMSG_DONE into same read() as families").

Some relevant information meanwhile: the kernel version was meanwhile bumped to 6.9.0-0.rc0.20240315gite5eb28f6d1af.7.fc41.x86_64, matching upstream (torvalds/linux.git) commit e5eb28f6d1af. Addresses and routes:

[   14.040569] cloud-init[877]: ci-info: ++++++++++++++++++++++++++++++++++++++Net device info+++++++++++++++++++++++++++++++++++++++
[   14.042445] cloud-init[877]: ci-info: +--------+------+-----------------------------+---------------+--------+-------------------+
[   14.044223] cloud-init[877]: ci-info: | Device |  Up  |           Address           |      Mask     | Scope  |     Hw-Address    |
[   14.045991] cloud-init[877]: ci-info: +--------+------+-----------------------------+---------------+--------+-------------------+
[   14.047739] cloud-init[877]: ci-info: |  eth0  | True |        172.31.17.200        | 255.255.240.0 | global | 06:4d:a6:73:32:bf |
[   14.049466] cloud-init[877]: ci-info: |  eth0  | True | fe80::44d:a6ff:fe73:32bf/64 |       .       |  link  | 06:4d:a6:73:32:bf |
[   14.051195] cloud-init[877]: ci-info: |   lo   | True |          127.0.0.1          |   255.0.0.0   |  host  |         .         |
[   14.052960] cloud-init[877]: ci-info: |   lo   | True |           ::1/128           |       .       |  host  |         .         |
[   14.054681] cloud-init[877]: ci-info: +--------+------+-----------------------------+---------------+--------+-------------------+
[   14.056432] cloud-init[877]: ci-info: +++++++++++++++++++++++++++++Route IPv4 info+++++++++++++++++++++++++++++
[   14.057942] cloud-init[877]: ci-info: +-------+-------------+-------------+---------------+-----------+-------+
[   14.059449] cloud-init[877]: ci-info: | Route | Destination |   Gateway   |    Genmask    | Interface | Flags |
[   14.060985] cloud-init[877]: ci-info: +-------+-------------+-------------+---------------+-----------+-------+
[   14.062503] cloud-init[877]: ci-info: |   0   |   0.0.0.0   | 172.31.16.1 |    0.0.0.0    |    eth0   |   UG  |
[   14.064027] cloud-init[877]: ci-info: |   1   | 172.31.16.0 |   0.0.0.0   | 255.255.240.0 |    eth0   |   U   |
[   14.065565] cloud-init[877]: ci-info: +-------+-------------+-------------+---------------+-----------+-------+
[   14.067085] cloud-init[877]: ci-info: +++++++++++++++++++Route IPv6 info+++++++++++++++++++
[   14.068352] cloud-init[877]: ci-info: +-------+-------------+---------+-----------+-------+
[   14.069670] cloud-init[877]: ci-info: | Route | Destination | Gateway | Interface | Flags |
[   14.070922] cloud-init[877]: ci-info: +-------+-------------+---------+-----------+-------+
[   14.072215] cloud-init[877]: ci-info: |   0   |  fe80::/64  |    ::   |    eth0   |   U   |
[   14.073488] cloud-init[877]: ci-info: |   2   |  multicast  |    ::   |    eth0   |   U   |
[   14.074849] cloud-init[877]: ci-info: +-------+-------------+---------+-----------+-------+

dgibson · 2024-03-19T00:52:47Z

@martinpitt I've installed the rawhide kernel and have reproduced... well, not exactly the problem you describe, but something wrong in the netlink handling. I'm debugging now.

Luap99 · 2024-03-19T13:26:52Z

FYI I just saw another user run into this on a fedora bz: https://bugzilla.redhat.com/show_bug.cgi?id=2270257#c9

He confirmed that the new passt version from today solves the issue for him.

martinpitt · 2024-03-19T13:38:01Z

The rawhide log came back, and bad news: It did install the latest passt-0^20240319.gd35bcbe-1.fc41.x86_64, but it still has some failures with "netlink: Unexpected sequence number (6 != 10)". However, that bugzilla also mentions "with podman 5.0.0~~rc7", and that run only saw 5.0.0~~rc6-2 . I'll try to update that as well. it also runs a previous kernel, but as I can't just reboot a TF instance while the test is running, I can't pull in that.

Validate containers/podman#22052

Luap99 · 2024-03-19T13:41:27Z

The podman version rc 6 or 7 should not make a difference.
Note the bz was reported against f40, not rawhide with the 6.9 kernel so that could also make a difference.

Validate containers/podman#22052

martinpitt · 2024-03-19T14:55:12Z

"fun" -- I changed cockpit-project/cockpit-podman#1636 to also install the latest podman and passt koji builds for F40, and now both F40 and rawhide fail on the "Unexpected sequence number (6 != 10)" issue.

dgibson · 2024-03-20T02:21:39Z

Drat the Unexpected sequence number error seems to be independent of the other one, and still present in the latest build. Unfortunately neither @sbrivio-rh nor I have any good theories on the cause so far. Nor have we been able to reproduce it, or even have good theories on what the conditions might be that trigger it.

@martinpitt @Luap99 have either of you seen this outside of the difficult to adjust CI context? At present my next step is to ask for some additional logging from the reporter of the BZ.

dgibson · 2024-03-20T02:32:58Z

I filed pasta upstream bug 83 to track the Unexpected sequence number problem.

dgibson · 2024-03-20T05:42:11Z

I now have a draft fix for upstream bug 83, here. I'm talking to @sbrivio-rh about a release.

martinpitt · 2024-03-20T06:04:47Z

Meh, I first tried to reproduce this with "tmt" itself, but that is annoyingly difficult. I then tried to boot an F40 cloud image out of thin air with just curl and QEMU:

Run F40 cloud VM:

curl -o fedora.qcow2 -L https://download.fedoraproject.org/pub/fedora/linux/development/40/Cloud/x86_64/images/Fedora-Cloud-Base-Generic.x86_64-40-20240319.n.0.qcow2
# nothing fancy, just admin:foobar and root:foobar
curl -L -O https://github.com/cockpit-project/bots/raw/main/machine/cloud-init.iso
qemu-system-x86_64 -cpu host -enable-kvm -nographic -m 2048 -drive file=fedora.qcow2,if=virtio -snapshot -cdrom cloud-init.iso -net nic,model=virtio -net user,hostfwd=tcp::2201-:22

Log in with SSH (more comfortable than the console), password "foobar":

ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o CheckHostIP=no  -p 2201 admin@localhost

inside the VM:

sudo dnf install --enablerepo=updates-testing podman
podman run -it --rm quay.io/libpod/busybox

But that again works fine. But it seems @dgibson is ahead of that anyway -- can you reproduce it now?

If not, the Testing Farm has a cli for "reserve" , which you can use to get an interactive real TF instance. It does require some initial approval by @thrix though.

dgibson · 2024-03-20T09:05:09Z

I have reproduced something that looks very similar, so I think so. To make it work, I've had to create a second network interface, which doesn't appear to be the case in the other cases we've seen. My working theory is that in those other cases something has created an additional interface after the ip addr etc. information was dumped.

sbrivio-rh · 2024-03-20T10:53:04Z

I now have a draft fix for upstream bug 83, here. I'm talking to @sbrivio-rh about a release.

Wednesday's release coming.

sbrivio-rh · 2024-03-20T11:18:29Z

There you go, passt-0^20240320.g71dd405-1.fc41.

Validate containers/podman#22052

martinpitt · 2024-03-20T13:29:01Z

I updated the test PR to that version. As expected, f40 still failed the same way, as there's no new passt there. But the rawhide run PASSED 💯 🥳 🎉

Thanks @dgibson and @sbrivio-rh ! Can you please release this to F40, too?

sbrivio-rh · 2024-03-20T15:25:07Z

Sure, here's passt-0^20240320.g71dd405-1.fc40. Thanks for testing and following up with this!

Validate containers/podman#22052

Luap99 · 2024-03-20T15:38:27Z

Thanks all, I am going to close given it is fixed in the latest release

martinpitt · 2024-03-20T16:29:47Z

Updated cockpit-project/cockpit-podman#1636 to that F40 update and it's squeaky green now 💚 Thanks all!

edsantiago · 2024-03-20T19:29:59Z

Pasta 03-20 is now in CI VMs, PR #22082

Since f919dc7 ("conf, netlink: Don't require a default route to start"), if there is only one host interface with routes, we will pick that as the template interface, even if there are no default routes for an IP version. Unfortunately this selection had a serious flaw: in some cases it would 'return' in the middle of an nl_foreach() loop, meaning we wouldn't consume all the netlink responses for our query. This could cause later netlink operations to fail as we read leftover responses from the aborted query. Rewrite the interface detection to avoid this problem. While we're there: * Perform detection of both default and non-default routes in a single pass, avoiding an ugly goto * Give more detail on error and working but unusual paths about the situation (no suitable interface, multiple possible candidates, etc.). Fixes: f919dc7 ("conf, netlink: Don't require a default route to start") Link: https://bugs.passt.top/show_bug.cgi?id=83 Link: containers/podman#22052 Link: https://bugzilla.redhat.com/show_bug.cgi?id=2270257 Signed-off-by: David Gibson <[email protected]> [sbrivio: Use info(), not warn() for somewhat expected cases where one IP version has no default routes, or no routes at all] Signed-off-by: Stefano Brivio <[email protected]>

A recent kernel change 87d381973e49 ("genetlink: fit NLMSG_DONE into same read() as families") changed netlink behaviour so that the NLMSG_DONE terminating a bunch of responses can go in the same datagram as those responses, rather than in a separate one. Our netlink code is supposed to handle that behaviour, and indeed does so for most cases, using the nl_foreach() macro. However, there was a subtle error in nl_route_dup() which doesn't work with this change. f00b153 ("netlink: Don't try to get further datagrams in nl_route_dup() on NLMSG_DONE") attempted to fix this, but has its own subtle error. The problem arises because nl_route_dup(), unlike other cases doesn't just make a single pass through all the responses to a netlink request. It needs to get all the routes, then make multiple passes through them. We don't really have anywhere to buffer multiple datagrams, so we only support the case where all the routes fit in a single datagram - but we need to fail gracefully when that's not the case. After receiving the first datagram of responses (with nl_next()) we have a first loop scanning them. It needs to exit when either we run out of messages in the datagram (!NLMSG_OK()) or when we get a message indicating the last response (nl_status() <= 0). What we do after the loop depends on which exit case we had. If we saw the last response, we're done, but otherwise we need to receive more datagrams to discard the rest of the responses. We attempt to check for that second case by re-checking NLMSG_OK(nh, status). However in the got-last-response case, we've altered status from the number of remaining bytes to the error code (usually 0). That means NLMSG_OK() now returns false even if it didn't during the loop check. To fix this we need separate variables for the number of bytes left and the final status code. We also checked status after the loop, but this was redundant: we can only exit the loop with NLMSG_OK() == true if status <= 0. Reported-by: Martin Pitt <[email protected]> Fixes: f00b153 ("netlink: Don't try to get further datagrams in nl_route_dup() on NLMSG_DONE") Fixes: 4d6e9d0 ("netlink: Always process all responses to a netlink request") Link: containers/podman#22052 Signed-off-by: David Gibson <[email protected]> Signed-off-by: Stefano Brivio <[email protected]>

martinpitt added the kind/bug Categorizes issue or PR as related to a bug. label Mar 15, 2024

martinpitt mentioned this issue Mar 15, 2024

[v5.0] podman machine start: make checkExclusiveActiveVM check race free #22047

Merged

Luap99 added the pasta pasta(1) bugs or features label Mar 15, 2024

martinpitt changed the title ~~user containers hang forever in pasta setup with podman-5.0.0~rc6, lock up all other commands~~ regression with kernel 6.9: user containers hang forever in pasta setup with podman-5.0.0~rc6 Mar 15, 2024

dustymabe mentioned this issue Mar 15, 2024

[rawhide] kernel 6.9 regressions causing podman test failures coreos/fedora-coreos-tracker#1693

Closed

Luap99 closed this as completed Mar 18, 2024

martinpitt mentioned this issue Mar 18, 2024

test/browser: run tests inside the tasks container cockpit-project/cockpit-podman#1628

Merged

martinpitt added a commit to martinpitt/cockpit-podman that referenced this issue Mar 19, 2024

TEST: New passt/podman version

bfe414c

Validate containers/podman#22052

martinpitt added a commit to martinpitt/cockpit-podman that referenced this issue Mar 19, 2024

TEST: New passt/podman version

5877662

Validate containers/podman#22052

martinpitt added a commit to martinpitt/cockpit-podman that referenced this issue Mar 19, 2024

TEST: New passt/podman version

980e3b9

Validate containers/podman#22052

martinpitt mentioned this issue Mar 20, 2024

The virtual provision does not work with fedora-40 teemtee/tmt#2771

Closed

martinpitt added a commit to martinpitt/cockpit-podman that referenced this issue Mar 20, 2024

TEST: New passt/podman version

2149c8d

Validate containers/podman#22052

martinpitt added a commit to martinpitt/cockpit-podman that referenced this issue Mar 20, 2024

TEST: New passt/podman version

52a37ae

Validate containers/podman#22052

martinpitt added a commit to martinpitt/cockpit-podman that referenced this issue Mar 20, 2024

TEST: New passt/podman version

399f12d

Validate containers/podman#22052

martinpitt added a commit to martinpitt/cockpit-podman that referenced this issue Mar 20, 2024

TEST: New passt/podman version

33852da

Validate containers/podman#22052

Luap99 closed this as completed Mar 20, 2024

bc-lee mentioned this issue Mar 21, 2024

Podman 5.0.0 fails to build image on rootful machine #22109

Closed

stale-locking-app bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Jun 19, 2024

stale-locking-app bot locked as resolved and limited conversation to collaborators Jun 19, 2024

regression with kernel 6.9: user containers hang forever in pasta setup with podman-5.0.0~rc6 #22052

regression with kernel 6.9: user containers hang forever in pasta setup with podman-5.0.0~rc6 #22052

Comments

martinpitt commented Mar 15, 2024 • edited Loading

Issue Description

Steps to reproduce the issue

Describe the results you received

Describe the results you expected

podman info output

Podman in a container

Privileged Or Rootless

Upstream Latest Release

Additional environment details

Additional information

martinpitt commented Mar 15, 2024

Luap99 commented Mar 15, 2024

Luap99 commented Mar 15, 2024

martinpitt commented Mar 15, 2024

sbrivio-rh commented Mar 15, 2024

Luap99 commented Mar 15, 2024

Luap99 commented Mar 15, 2024

sbrivio-rh commented Mar 15, 2024

sbrivio-rh commented Mar 15, 2024

martinpitt commented Mar 15, 2024

Luap99 commented Mar 15, 2024

sbrivio-rh commented Mar 15, 2024

sbrivio-rh commented Mar 15, 2024

sbrivio-rh commented Mar 15, 2024

sbrivio-rh commented Mar 15, 2024

Luap99 commented Mar 18, 2024

dustymabe commented Mar 18, 2024

Luap99 commented Mar 18, 2024

dustymabe commented Mar 18, 2024

martinpitt commented Mar 18, 2024

sbrivio-rh commented Mar 18, 2024

sbrivio-rh commented Mar 18, 2024

dgibson commented Mar 19, 2024

Luap99 commented Mar 19, 2024

martinpitt commented Mar 19, 2024

Luap99 commented Mar 19, 2024

martinpitt commented Mar 19, 2024

dgibson commented Mar 20, 2024

dgibson commented Mar 20, 2024

dgibson commented Mar 20, 2024 • edited Loading

martinpitt commented Mar 20, 2024

dgibson commented Mar 20, 2024

sbrivio-rh commented Mar 20, 2024

sbrivio-rh commented Mar 20, 2024

martinpitt commented Mar 20, 2024

sbrivio-rh commented Mar 20, 2024

Luap99 commented Mar 20, 2024

martinpitt commented Mar 20, 2024

edsantiago commented Mar 20, 2024

martinpitt commented Mar 15, 2024 •

edited

Loading

dgibson commented Mar 20, 2024 •

edited

Loading