Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regression with kernel 6.9: user containers hang forever in pasta setup with podman-5.0.0~rc6 #22052

Closed
martinpitt opened this issue Mar 15, 2024 · 50 comments
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. pasta pasta(1) bugs or features

Comments

@martinpitt
Copy link
Contributor

martinpitt commented Mar 15, 2024

Issue Description

Since around yesterday, the "cockpit-podman revdeps" test has started to fail, in e.g. #22047, #22048, #22050 (these three were landed red). This happens both with the "podman-next" copr, e.g. this or this run, and last night it also started on pure rawhide runs without the -next COPR, as the newer podman versions are
being pushed into rawhide

Steps to reproduce the issue

Install the latest podman on rawhide. It's not yet on the mirrors, so need to install from koji:

dnf update -y https://kojipkgs.fedoraproject.org//packages/podman/5.0.0~rc6/2.eln136/x86_64/podman-5.0.0~rc6-2.eln136.x86_64.rpm

As user:

podman run -it --rm quay.io/libpod/busybox

Describe the results you received

It hangs forever without any message, and doesn't give a shell. There is no journal message or even a resource consuming process. It doesn't react to Control-C or Control-, the only way is to kill -9 them.

With --log-level=debug:

DEBU[0000] Check for idmapped mounts support create mapped mount: operation not permitted 
DEBU[0000] Created container "4855a61b527c337ce0f5fea32a46a74a1b24c562d4f388f88c6e60c26125e126" 
DEBU[0000] Container "4855a61b527c337ce0f5fea32a46a74a1b24c562d4f388f88c6e60c26125e126" has work directory "/home/admin/.local/share/containers/storage/overlay-containers/4855a61b527c337ce0f5fea32a46a74a1b24c562d4f388f88c6e60c26125e126/userdata" 
DEBU[0000] Container "4855a61b527c337ce0f5fea32a46a74a1b24c562d4f388f88c6e60c26125e126" has run directory "/run/user/1000/containers/overlay-containers/4855a61b527c337ce0f5fea32a46a74a1b24c562d4f388f88c6e60c26125e126/userdata" 
DEBU[0000] Handling terminal attach                     
DEBU[0000] Enabling signal proxying                     
DEBU[0000] overlay: test mount indicated that volatile is being used 
DEBU[0000] overlay: mount_data=lowerdir=/home/admin/.local/share/containers/storage/overlay/l/BUS2BM2KZRRKHOPMUIJI6SDTFS,upperdir=/home/admin/.local/share/containers/storage/overlay/679a40b64d1bb81c6b90f36167af4d19b3b3e08e09376c9525184b67b6216da5/diff,workdir=/home/admin/.local/share/containers/storage/overlay/679a40b64d1bb81c6b90f36167af4d19b3b3e08e09376c9525184b67b6216da5/work,userxattr,volatile,context="system_u:object_r:container_file_t:s0:c41,c43" 
DEBU[0000] Mounted container "4855a61b527c337ce0f5fea32a46a74a1b24c562d4f388f88c6e60c26125e126" at "/home/admin/.local/share/containers/storage/overlay/679a40b64d1bb81c6b90f36167af4d19b3b3e08e09376c9525184b67b6216da5/merged" 
DEBU[0000] Created root filesystem for container 4855a61b527c337ce0f5fea32a46a74a1b24c562d4f388f88c6e60c26125e126 at /home/admin/.local/share/containers/storage/overlay/679a40b64d1bb81c6b90f36167af4d19b3b3e08e09376c9525184b67b6216da5/merged 
INFO[0000] Received shutdown.Stop(), terminating!        PID=1705
DEBU[0000] Made network namespace at /run/user/1000/netns/netns-96db5179-aa12-e4df-1672-bb9eab89f633 for container 4855a61b527c337ce0f5fea32a46a74a1b24c562d4f388f88c6e60c26125e126 
DEBU[0000] pasta arguments: --config-net -t none -u none -T none -U none --no-map-gw --dns none --quiet --netns /run/user/1000/netns/netns-96db5179-aa12-e4df-1672-bb9eab89f633 

So that "Received Stop()" is certainly unusual.

After that, podman is completely blocked. In a separate terminal, podman ps also hangs.

Describe the results you expected

Well, it should work 😉

It does work fine with system containers.

podman info output

host:
  arch: amd64
  buildahVersion: 1.35.0
  cgroupControllers:
  - cpu
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.10-1.fc41.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.10, commit: '
  cpuUtilization:
    idlePercent: 97.91
    systemPercent: 1.1
    userPercent: 0.99
  cpus: 1
  databaseBackend: sqlite
  distribution:
    distribution: fedora
    variant: cloud
    version: "41"
  eventLogger: journald
  freeLocks: 2048
  hostname: fedora-rawhide-127-0-0-2-2201
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 524288
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 524288
      size: 65536
  kernel: 6.9.0-0.rc0.20240313gitb0546776ad3f.4.fc41.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 582688768
  memTotal: 1133514752
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.10.0-1.fc40.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.10.0
    package: netavark-1.10.3-2.fc41.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.10.3
  ociRuntime:
    name: crun
    package: crun-1.14.4-1.fc41.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.14.4
      commit: a220ca661ce078f2c37b38c92e66cf66c012d9c1
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20240220.g1e6f92b-1.fc41.x86_64
    version: |
      pasta 0^20240220.g1e6f92b-1.fc41.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: false
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 1133506560
  swapTotal: 1133506560
  uptime: 0h 14m 41.00s
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /home/admin/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/admin/.local/share/containers/storage
  graphRootAllocated: 12802043904
  graphRootUsed: 1898729472
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 1
  runRoot: /run/user/1000/containers
  transientStore: false
  volumePath: /home/admin/.local/share/containers/storage/volumes
version:
  APIVersion: 5.0.0-rc6
  Built: 1710288000
  BuiltTime: Wed Mar 13 00:00:00 2024
  GitCommit: ""
  GoVersion: go1.21.3
  Os: linux
  OsArch: linux/amd64
  Version: 5.0.0-rc6

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

Yes

Additional environment details

Fedora rawhide cloud image in a VM, with latest dnf update:

podman-5.0.0~rc6-2.eln136.x86_64
containers-common-0.58.0-1.fc41.noarch
passt-0^20240220.g1e6f92b-1.fc41.x86_64
kernel-core-6.9.0-0.rc0.20240313gitb0546776ad3f.4.fc41.x86_64

Additional information

No response

@martinpitt
Copy link
Contributor Author

What worries me: This doesn't seem to turn up in any of "your" integration tests. Certainly at least some of them cover user podman and pasta? Do they set up some additional magic which circumvents this bug? That may give a clue what happened. Thanks!

@Luap99
Copy link
Member

Luap99 commented Mar 15, 2024

Well I doubt that either podman or pasta changed. I cannot see anything related to that on the podman side and the passt version is still the same that we use in our CI. Of course I have no proof yet but if I would think it is far more likely some kernel bug given that the version shows kernel 6.9 (unreleased)

@Luap99 Luap99 added the pasta pasta(1) bugs or features label Mar 15, 2024
@Luap99
Copy link
Member

Luap99 commented Mar 15, 2024

I reproduced with --network pasta and podman 5-rc1 on the new 6.9 kernel. Then I downgraded to an older 6.8 kernel and there it didn't hang, thus confirming we have a kernel bug that seems to cause pasta not to exit...
cc @sbrivio-rh

@martinpitt
Copy link
Contributor Author

Confirmed. I tested it with kernel-core-6.8.0-63.fc41.x86_64 (i.e. dnf update -y but without a reboot), and it works with that.

@martinpitt martinpitt changed the title user containers hang forever in pasta setup with podman-5.0.0~rc6, lock up all other commands regression with kernel 6.9: user containers hang forever in pasta setup with podman-5.0.0~rc6 Mar 15, 2024
@sbrivio-rh
Copy link
Collaborator

Checking. @martinpitt could you tell me the upstream SHA that kernel package is sourced from?

@Luap99
Copy link
Member

Luap99 commented Mar 15, 2024

@sbrivio-rh I used a kernel from yesterday, commit seems to be 480e035fc4

@Luap99
Copy link
Member

Luap99 commented Mar 15, 2024

running strace I see

9  --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=1102, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
1099  waitid(P_ALL, 0, NULL, WNOHANG|WEXITED, NULL) = 0
1099  waitid(P_ALL, 0, NULL, WNOHANG|WEXITED, NULL) = -1 ECHILD (No child processes)
1099  rt_sigreturn({mask=[]})           = 1102
1099  sendto(7, [{nlmsg_len=32, nlmsg_type=0x10 /* NLMSG_??? */, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=6, nlmsg_pid=0}, "\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00"], 32, 0, NULL, 0) = 32
1099  recvfrom(7, [{nlmsg_len=36, nlmsg_type=NLMSG_ERROR, nlmsg_flags=NLM_F_CAPPED, nlmsg_seq=6, nlmsg_pid=1101}, {error=0, msg={nlmsg_len=32, nlmsg_type=0x10 /* NLMSG_??? */, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=6, nlmsg_pid=0}}], 65536, 0, NULL, NULL) = 36
1099  sendto(7, [{nlmsg_len=32, nlmsg_type=0x12 /* NLMSG_??? */, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=7, nlmsg_pid=0}, "\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"], 32, 0, NULL, 0) = 32
1099  recvfrom(7, [{nlmsg_len=1492, nlmsg_type=0x10 /* NLMSG_??? */, nlmsg_flags=0, nlmsg_seq=7, nlmsg_pid=1101}, "\x00\x00\x01\x00\x02\x00\x00\x00\x02\x10\x00\x00\x00\x00\x00\x00\x0b\x00\x03\x00\x65\x6e\x70\x31\x73\x30\x00\x00\x08\x00\x0d\x00"...], 65536, 0, NULL, NULL) = 1492
1099  recvfrom(7, [{nlmsg_len=36, nlmsg_type=NLMSG_ERROR, nlmsg_flags=NLM_F_CAPPED, nlmsg_seq=7, nlmsg_pid=1101}, {error=0, msg={nlmsg_len=32, nlmsg_type=0x12 /* NLMSG_??? */, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=7, nlmsg_pid=0}}], 65536, 0, NULL, NULL) = 36
1099  sendto(7, [{nlmsg_len=40, nlmsg_type=0x10 /* NLMSG_??? */, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=8, nlmsg_pid=0}, "\x00\x00\x00\x00\x02\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x08\x00\x04\x00\xf0\xff\x00\x00"], 40, 0, NULL, 0) = 40
1099  recvfrom(7, [{nlmsg_len=36, nlmsg_type=NLMSG_ERROR, nlmsg_flags=NLM_F_CAPPED, nlmsg_seq=8, nlmsg_pid=1101}, {error=0, msg={nlmsg_len=40, nlmsg_type=0x10 /* NLMSG_??? */, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=8, nlmsg_pid=0}}], 65536, 0, NULL, NULL) = 36
1099  sendto(5, [{nlmsg_len=24, nlmsg_type=RTM_GETADDR, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK|NLM_F_DUMP, nlmsg_seq=9, nlmsg_pid=0}, {ifa_family=AF_INET, ifa_prefixlen=0, ifa_flags=0, ifa_scope=RT_SCOPE_UNIVERSE, ifa_index=if_nametoindex("enp1s0")}], 24, 0, NULL, 0) = 24
1099  recvfrom(5, [[{nlmsg_len=88, nlmsg_type=RTM_NEWADDR, nlmsg_flags=NLM_F_MULTI|NLM_F_DUMP_FILTERED, nlmsg_seq=9, nlmsg_pid=1099}, {ifa_family=AF_INET, ifa_prefixlen=24, ifa_flags=0, ifa_scope=RT_SCOPE_UNIVERSE, ifa_index=if_nametoindex("enp1s0")}, [[{nla_len=8, nla_type=IFA_ADDRESS}, inet_addr("192.168.122.92")], [{nla_len=8, nla_type=IFA_LOCAL}, inet_addr("192.168.122.92")], [{nla_len=8, nla_type=IFA_BROADCAST}, inet_addr("192.168.122.255")], [{nla_len=11, nla_type=IFA_LABEL}, "enp1s0"], [{nla_len=8, nla_type=IFA_FLAGS}, 0], [{nla_len=20, nla_type=IFA_CACHEINFO}, {ifa_prefered=3564, ifa_valid=3564, cstamp=577, tstamp=577}]]], [{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI|NLM_F_DUMP_FILTERED, nlmsg_seq=9, nlmsg_pid=1099}, 0]], 65536, 0, NULL, NULL) = 108
1099  sendto(7, [{nlmsg_len=88, nlmsg_type=0x14 /* NLMSG_??? */, nlmsg_flags=NLM_F_REQUEST|NLM_F_MULTI|NLM_F_ACK|0x400, nlmsg_seq=10, nlmsg_pid=0}, "\x02\x18\x00\x00\x02\x00\x00\x00\x08\x00\x01\x00\xc0\xa8\x7a\x5c\x08\x00\x02\x00\xc0\xa8\x7a\x5c\x08\x00\x04\x00\xc0\xa8\x7a\xff"...], 88, 0, NULL, 0) = 88
1099  recvfrom(7, [{nlmsg_len=36, nlmsg_type=NLMSG_ERROR, nlmsg_flags=NLM_F_CAPPED, nlmsg_seq=10, nlmsg_pid=1101}, {error=0, msg={nlmsg_len=88, nlmsg_type=0x14 /* NLMSG_??? */, nlmsg_flags=NLM_F_REQUEST|NLM_F_MULTI|NLM_F_ACK|0x400, nlmsg_seq=10, nlmsg_pid=0}}], 65536, 0, NULL, NULL) = 36
1099  sendto(5, [{nlmsg_len=36, nlmsg_type=RTM_GETROUTE, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK|NLM_F_DUMP, nlmsg_seq=11, nlmsg_pid=0}, {rtm_family=AF_INET, rtm_dst_len=0, rtm_src_len=0, rtm_tos=0, rtm_table=RT_TABLE_MAIN, rtm_protocol=RTPROT_UNSPEC, rtm_scope=RT_SCOPE_UNIVERSE, rtm_type=RTN_UNICAST, rtm_flags=0}, [{nla_len=8, nla_type=RTA_OIF}, if_nametoindex("enp1s0")]], 36, 0, NULL, 0) = 36
1099  recvfrom(5, [[{nlmsg_len=68, nlmsg_type=RTM_NEWROUTE, nlmsg_flags=NLM_F_MULTI|NLM_F_DUMP_FILTERED, nlmsg_seq=11, nlmsg_pid=1099}, {rtm_family=AF_INET, rtm_dst_len=0, rtm_src_len=0, rtm_tos=0, rtm_table=RT_TABLE_MAIN, rtm_protocol=RTPROT_DHCP, rtm_scope=RT_SCOPE_UNIVERSE, rtm_type=RTN_UNICAST, rtm_flags=0}, [[{nla_len=8, nla_type=RTA_TABLE}, RT_TABLE_MAIN], [{nla_len=8, nla_type=RTA_PRIORITY}, 100], [{nla_len=8, nla_type=RTA_PREFSRC}, inet_addr("192.168.122.92")], [{nla_len=8, nla_type=RTA_GATEWAY}, inet_addr("192.168.122.1")], [{nla_len=8, nla_type=RTA_OIF}, if_nametoindex("enp1s0")]]], [{nlmsg_len=68, nlmsg_type=RTM_NEWROUTE, nlmsg_flags=NLM_F_MULTI|NLM_F_DUMP_FILTERED, nlmsg_seq=11, nlmsg_pid=1099}, {rtm_family=AF_INET, rtm_dst_len=24, rtm_src_len=0, rtm_tos=0, rtm_table=RT_TABLE_MAIN, rtm_protocol=RTPROT_KERNEL, rtm_scope=RT_SCOPE_LINK, rtm_type=RTN_UNICAST, rtm_flags=0}, [[{nla_len=8, nla_type=RTA_TABLE}, RT_TABLE_MAIN], [{nla_len=8, nla_type=RTA_DST}, inet_addr("192.168.122.0")], [{nla_len=8, nla_type=RTA_PRIORITY}, 100], [{nla_len=8, nla_type=RTA_PREFSRC}, inet_addr("192.168.122.92")], [{nla_len=8, nla_type=RTA_OIF}, if_nametoindex("enp1s0")]]], [{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI|NLM_F_DUMP_FILTERED, nlmsg_seq=11, nlmsg_pid=1099}, 0]], 65536, 0, NULL, NULL) = 156
1099  recvfrom(5,  <unfinished ...>

and then nothing until I kill it with SIGINT, full log attached
log.txt

@sbrivio-rh
Copy link
Collaborator

Reproduced using a procfs entry as reference, pasta properly sets up the timer:

openat(AT_FDCWD, "/proc/3513/ns", O_RDONLY|O_CLOEXEC) = 16
fstatfs(16, {f_type=PROC_SUPER_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0x16, 0]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_NOSUID|ST_NODEV|ST_NOEXEC|ST_RELATIME}) = 0
timerfd_create(CLOCK_MONOTONIC, TFD_CLOEXEC) = 17
timerfd_settime(17, 0, {it_interval={tv_sec=1, tv_nsec=0}, it_value={tv_sec=1, tv_nsec=0}}, NULL) = 0
epoll_ctl(4, EPOLL_CTL_ADD, 17, {events=EPOLLIN, data={u32=4360, u64=68719481096}}) = 0

and then also hangs on a recvfrom() on a RTM_GETROUTE request, similar to your case. It's some netlink change that blocks it completely. If you pass address and gateway explicitly it works. Looking into netlink / routing table front-end changes...

@sbrivio-rh
Copy link
Collaborator

Current suspicion of mine is commit 87d381973e49 ("genetlink: fit NLMSG_DONE into same read() as families").

@martinpitt
Copy link
Contributor Author

@sbrivio-rh I suppose your question to me is obsolete, but kernel-core-6.9.0-0.rc0.20240313gitb0546776ad3f.4.fc41.x86_64 has a git reference in it. I suppose that's the upstream one? Otherwise it's somewhere in https://src.fedoraproject.org/rpms/kernel/c/77aed3137518b5c493265a6c9fa1eec2a13ea6ca?branch=rawhide

@Luap99
Copy link
Member

Luap99 commented Mar 15, 2024

@martinpitt yes b0546776ad3f is a upstream commit

@sbrivio-rh
Copy link
Collaborator

@sbrivio-rh I suppose your question to me is obsolete, but kernel-core-6.9.0-0.rc0.20240313gitb0546776ad3f.4.fc41.x86_64 has a git reference in it. I suppose that's the upstream one?

Ah, sorry, it is! I was just using a net.git tree which didn't include b0546776ad3f yet. Switched to mainline torvalds/linux.git now.

@sbrivio-rh
Copy link
Collaborator

Another workaround for pasta: --no-copy-routes.

@sbrivio-rh
Copy link
Collaborator

Patch at: https://archives.passt.top/passt-dev/[email protected]/.

I think we should report this as a possible kernel regression, even if the new behaviour is correct and 87d381973e49 made its way to mainline through net-next, and I doubt it will be included in any -stable branch, so probably releasing a fixed version of pasta in the near future is enough.

I didn't bisect though.

@sbrivio-rh
Copy link
Collaborator

Reported here: https://lore.kernel.org/all/20240315124808.033ff58d@elisabeth/.

In any case, I'll prepare a new release in a bit. I'm not aware of any other distribution already shipping that commit in their latest kernel package.

@Luap99
Copy link
Member

Luap99 commented Mar 18, 2024

The pasta fix is in passt-0^20240318.g615d370 (https://bodhi.fedoraproject.org/updates/FEDORA-2024-4b5b35a749)

Thus I am going to close this one.

@Luap99 Luap99 closed this as completed Mar 18, 2024
@dustymabe
Copy link
Contributor

I'll note here that we had some other non-pasta related tests that were failing in CoreOS: coreos/fedora-coreos-tracker#1693

Though, it does appear these tests do now also pass with https://bodhi.fedoraproject.org/updates/FEDORA-2024-4b5b35a749

Not sure why container building would have been affected, but figured I'd mention it anyway.

@Luap99
Copy link
Member

Luap99 commented Mar 18, 2024

Not sure why container building would have been affected, but figured I'd mention it anyway.

Because pasta is the default for rootless networking now. So podman run, podman build, etc.. they all configure the netns with pasta (assuming rootless of course)

@dustymabe
Copy link
Contributor

Makes sense. Thanks

@martinpitt
Copy link
Contributor Author

That kernel fix for the original issue here seems to have landed. But very recent runs now fail differently:

$ podman run -d --name test-sh-user --stop-timeout 0 localhost/test-alpine sh
Error: pasta failed with exit code 1:
netlink: Unexpected sequence number (6 != 10)

I'll file this as a proper/separate issue tomorrow morning, just a little pre-warning in case that rings a bell, and to explain why rawhide runs are still red everywhere.

@sbrivio-rh
Copy link
Collaborator

$ podman run -d --name test-sh-user --stop-timeout 0 localhost/test-alpine sh
Error: pasta failed with exit code 1:
netlink: Unexpected sequence number (6 != 10)

@dgibson

@sbrivio-rh
Copy link
Collaborator

That kernel fix for the original issue here seems to have landed.

It wasn't actually a kernel fix -- the kernel's behaviour was in fact correct, even though different from what it has been in the past ~5 (at least?) years.

But very recent runs now fail differently:

$ podman run -d --name test-sh-user --stop-timeout 0 localhost/test-alpine sh
Error: pasta failed with exit code 1:
netlink: Unexpected sequence number (6 != 10)

It looks like either a kernel issue, or we fail to account for four messages from the kernel (most likely) after kernel commit 87d381973e49 ("genetlink: fit NLMSG_DONE into same read() as families").

Some relevant information meanwhile: the kernel version was meanwhile bumped to 6.9.0-0.rc0.20240315gite5eb28f6d1af.7.fc41.x86_64, matching upstream (torvalds/linux.git) commit e5eb28f6d1af. Addresses and routes:

[   14.040569] cloud-init[877]: ci-info: ++++++++++++++++++++++++++++++++++++++Net device info+++++++++++++++++++++++++++++++++++++++
[   14.042445] cloud-init[877]: ci-info: +--------+------+-----------------------------+---------------+--------+-------------------+
[   14.044223] cloud-init[877]: ci-info: | Device |  Up  |           Address           |      Mask     | Scope  |     Hw-Address    |
[   14.045991] cloud-init[877]: ci-info: +--------+------+-----------------------------+---------------+--------+-------------------+
[   14.047739] cloud-init[877]: ci-info: |  eth0  | True |        172.31.17.200        | 255.255.240.0 | global | 06:4d:a6:73:32:bf |
[   14.049466] cloud-init[877]: ci-info: |  eth0  | True | fe80::44d:a6ff:fe73:32bf/64 |       .       |  link  | 06:4d:a6:73:32:bf |
[   14.051195] cloud-init[877]: ci-info: |   lo   | True |          127.0.0.1          |   255.0.0.0   |  host  |         .         |
[   14.052960] cloud-init[877]: ci-info: |   lo   | True |           ::1/128           |       .       |  host  |         .         |
[   14.054681] cloud-init[877]: ci-info: +--------+------+-----------------------------+---------------+--------+-------------------+
[   14.056432] cloud-init[877]: ci-info: +++++++++++++++++++++++++++++Route IPv4 info+++++++++++++++++++++++++++++
[   14.057942] cloud-init[877]: ci-info: +-------+-------------+-------------+---------------+-----------+-------+
[   14.059449] cloud-init[877]: ci-info: | Route | Destination |   Gateway   |    Genmask    | Interface | Flags |
[   14.060985] cloud-init[877]: ci-info: +-------+-------------+-------------+---------------+-----------+-------+
[   14.062503] cloud-init[877]: ci-info: |   0   |   0.0.0.0   | 172.31.16.1 |    0.0.0.0    |    eth0   |   UG  |
[   14.064027] cloud-init[877]: ci-info: |   1   | 172.31.16.0 |   0.0.0.0   | 255.255.240.0 |    eth0   |   U   |
[   14.065565] cloud-init[877]: ci-info: +-------+-------------+-------------+---------------+-----------+-------+
[   14.067085] cloud-init[877]: ci-info: +++++++++++++++++++Route IPv6 info+++++++++++++++++++
[   14.068352] cloud-init[877]: ci-info: +-------+-------------+---------+-----------+-------+
[   14.069670] cloud-init[877]: ci-info: | Route | Destination | Gateway | Interface | Flags |
[   14.070922] cloud-init[877]: ci-info: +-------+-------------+---------+-----------+-------+
[   14.072215] cloud-init[877]: ci-info: |   0   |  fe80::/64  |    ::   |    eth0   |   U   |
[   14.073488] cloud-init[877]: ci-info: |   2   |  multicast  |    ::   |    eth0   |   U   |
[   14.074849] cloud-init[877]: ci-info: +-------+-------------+---------+-----------+-------+

@dgibson
Copy link
Collaborator

dgibson commented Mar 19, 2024

@martinpitt I've installed the rawhide kernel and have reproduced... well, not exactly the problem you describe, but something wrong in the netlink handling. I'm debugging now.

@Luap99
Copy link
Member

Luap99 commented Mar 19, 2024

FYI I just saw another user run into this on a fedora bz: https://bugzilla.redhat.com/show_bug.cgi?id=2270257#c9

He confirmed that the new passt version from today solves the issue for him.

@martinpitt
Copy link
Contributor Author

The rawhide log came back, and bad news: It did install the latest passt-0^20240319.gd35bcbe-1.fc41.x86_64, but it still has some failures with "netlink: Unexpected sequence number (6 != 10)". However, that bugzilla also mentions "with podman 5.0.0rc7", and that run only saw 5.0.0rc6-2 . I'll try to update that as well. it also runs a previous kernel, but as I can't just reboot a TF instance while the test is running, I can't pull in that.

martinpitt added a commit to martinpitt/cockpit-podman that referenced this issue Mar 19, 2024
@Luap99
Copy link
Member

Luap99 commented Mar 19, 2024

The podman version rc 6 or 7 should not make a difference.
Note the bz was reported against f40, not rawhide with the 6.9 kernel so that could also make a difference.

martinpitt added a commit to martinpitt/cockpit-podman that referenced this issue Mar 19, 2024
martinpitt added a commit to martinpitt/cockpit-podman that referenced this issue Mar 19, 2024
@martinpitt
Copy link
Contributor Author

"fun" -- I changed cockpit-project/cockpit-podman#1636 to also install the latest podman and passt koji builds for F40, and now both F40 and rawhide fail on the "Unexpected sequence number (6 != 10)" issue.

@dgibson
Copy link
Collaborator

dgibson commented Mar 20, 2024

Drat the Unexpected sequence number error seems to be independent of the other one, and still present in the latest build. Unfortunately neither @sbrivio-rh nor I have any good theories on the cause so far. Nor have we been able to reproduce it, or even have good theories on what the conditions might be that trigger it.

@martinpitt @Luap99 have either of you seen this outside of the difficult to adjust CI context? At present my next step is to ask for some additional logging from the reporter of the BZ.

@dgibson
Copy link
Collaborator

dgibson commented Mar 20, 2024

I filed pasta upstream bug 83 to track the Unexpected sequence number problem.

@dgibson
Copy link
Collaborator

dgibson commented Mar 20, 2024

I now have a draft fix for upstream bug 83, here. I'm talking to @sbrivio-rh about a release.

@martinpitt
Copy link
Contributor Author

Meh, I first tried to reproduce this with "tmt" itself, but that is annoyingly difficult. I then tried to boot an F40 cloud image out of thin air with just curl and QEMU:

Run F40 cloud VM:

curl -o fedora.qcow2 -L https://download.fedoraproject.org/pub/fedora/linux/development/40/Cloud/x86_64/images/Fedora-Cloud-Base-Generic.x86_64-40-20240319.n.0.qcow2
# nothing fancy, just admin:foobar and root:foobar
curl -L -O https://github.com/cockpit-project/bots/raw/main/machine/cloud-init.iso
qemu-system-x86_64 -cpu host -enable-kvm -nographic -m 2048 -drive file=fedora.qcow2,if=virtio -snapshot -cdrom cloud-init.iso -net nic,model=virtio -net user,hostfwd=tcp::2201-:22

Log in with SSH (more comfortable than the console), password "foobar":

ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o CheckHostIP=no  -p 2201 admin@localhost

inside the VM:

sudo dnf install --enablerepo=updates-testing podman
podman run -it --rm quay.io/libpod/busybox

But that again works fine. But it seems @dgibson is ahead of that anyway -- can you reproduce it now?

If not, the Testing Farm has a cli for "reserve" , which you can use to get an interactive real TF instance. It does require some initial approval by @thrix though.

@dgibson
Copy link
Collaborator

dgibson commented Mar 20, 2024

I have reproduced something that looks very similar, so I think so. To make it work, I've had to create a second network interface, which doesn't appear to be the case in the other cases we've seen. My working theory is that in those other cases something has created an additional interface after the ip addr etc. information was dumped.

@sbrivio-rh
Copy link
Collaborator

I now have a draft fix for upstream bug 83, here. I'm talking to @sbrivio-rh about a release.

Wednesday's release coming.

@sbrivio-rh
Copy link
Collaborator

There you go, passt-0^20240320.g71dd405-1.fc41.

martinpitt added a commit to martinpitt/cockpit-podman that referenced this issue Mar 20, 2024
martinpitt added a commit to martinpitt/cockpit-podman that referenced this issue Mar 20, 2024
martinpitt added a commit to martinpitt/cockpit-podman that referenced this issue Mar 20, 2024
@martinpitt
Copy link
Contributor Author

I updated the test PR to that version. As expected, f40 still failed the same way, as there's no new passt there. But the rawhide run PASSED 💯 🥳 🎉

Thanks @dgibson and @sbrivio-rh ! Can you please release this to F40, too?

@sbrivio-rh
Copy link
Collaborator

Sure, here's passt-0^20240320.g71dd405-1.fc40. Thanks for testing and following up with this!

martinpitt added a commit to martinpitt/cockpit-podman that referenced this issue Mar 20, 2024
@Luap99
Copy link
Member

Luap99 commented Mar 20, 2024

Thanks all, I am going to close given it is fixed in the latest release

@Luap99 Luap99 closed this as completed Mar 20, 2024
@martinpitt
Copy link
Contributor Author

Updated cockpit-project/cockpit-podman#1636 to that F40 update and it's squeaky green now 💚 Thanks all!

@edsantiago
Copy link
Member

Pasta 03-20 is now in CI VMs, PR #22082

hswong3i pushed a commit to alvistack/passt-top-passt that referenced this issue Mar 22, 2024
Since f919dc7 ("conf, netlink: Don't require a default route to
start"), if there is only one host interface with routes, we will pick that
as the template interface, even if there are no default routes for an IP
version.  Unfortunately this selection had a serious flaw: in some cases
it would 'return' in the middle of an nl_foreach() loop, meaning we
wouldn't consume all the netlink responses for our query.  This could cause
later netlink operations to fail as we read leftover responses from the
aborted query.

Rewrite the interface detection to avoid this problem.  While we're there:
  * Perform detection of both default and non-default routes in a single
    pass, avoiding an ugly goto
  * Give more detail on error and working but unusual paths about the
    situation (no suitable interface, multiple possible candidates, etc.).

Fixes: f919dc7 ("conf, netlink: Don't require a default route to start")
Link: https://bugs.passt.top/show_bug.cgi?id=83
Link: containers/podman#22052
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2270257
Signed-off-by: David Gibson <[email protected]>
[sbrivio: Use info(), not warn() for somewhat expected cases where one
 IP version has no default routes, or no routes at all]
Signed-off-by: Stefano Brivio <[email protected]>
hswong3i pushed a commit to alvistack/passt-top-passt that referenced this issue Mar 22, 2024
A recent kernel change 87d381973e49 ("genetlink: fit NLMSG_DONE into
same read() as families") changed netlink behaviour so that the
NLMSG_DONE terminating a bunch of responses can go in the same
datagram as those responses, rather than in a separate one.

Our netlink code is supposed to handle that behaviour, and indeed does
so for most cases, using the nl_foreach() macro.  However, there was a
subtle error in nl_route_dup() which doesn't work with this change.
f00b153 ("netlink: Don't try to get further datagrams in
nl_route_dup() on NLMSG_DONE") attempted to fix this, but has its own
subtle error.

The problem arises because nl_route_dup(), unlike other cases doesn't
just make a single pass through all the responses to a netlink
request.  It needs to get all the routes, then make multiple passes
through them.  We don't really have anywhere to buffer multiple
datagrams, so we only support the case where all the routes fit in a
single datagram - but we need to fail gracefully when that's not the
case.

After receiving the first datagram of responses (with nl_next()) we
have a first loop scanning them.  It needs to exit when either we run
out of messages in the datagram (!NLMSG_OK()) or when we get a message
indicating the last response (nl_status() <= 0).

What we do after the loop depends on which exit case we had.  If we
saw the last response, we're done, but otherwise we need to receive
more datagrams to discard the rest of the responses.

We attempt to check for that second case by re-checking NLMSG_OK(nh,
status).  However in the got-last-response case, we've altered status
from the number of remaining bytes to the error code (usually 0). That
means NLMSG_OK() now returns false even if it didn't during the loop
check.  To fix this we need separate variables for the number of bytes
left and the final status code.

We also checked status after the loop, but this was redundant: we can
only exit the loop with NLMSG_OK() == true if status <= 0.

Reported-by: Martin Pitt <[email protected]>
Fixes: f00b153 ("netlink: Don't try to get further datagrams in nl_route_dup() on NLMSG_DONE")
Fixes: 4d6e9d0 ("netlink: Always process all responses to a netlink request")
Link: containers/podman#22052
Signed-off-by: David Gibson <[email protected]>
Signed-off-by: Stefano Brivio <[email protected]>
@stale-locking-app stale-locking-app bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Jun 19, 2024
@stale-locking-app stale-locking-app bot locked as resolved and limited conversation to collaborators Jun 19, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. pasta pasta(1) bugs or features
Projects
None yet
Development

No branches or pull requests

6 participants