Podman Container Bridge Networking Doesn't Work IF docker commands have been run on a booted new Next node #1822

Open
fifofonix opened this issue Oct 28, 2024 · 6 comments

Comments

@fifofonix

Describe the bug

A Podman bridge network created for a new container does not work if the node has previously run any docker command (even one as simple as docker ps). The container cannot even reach DNS servers.

Note: The FCOS FAQ has recommended against concurrent docker/podman usage for several years. However, until now, operations as simple as this have succeeded, allowing users in most cases to run both in parallel (up to the current test).

Reproduction steps

  1. Provision a new, minimal next node on any platform (VMware or AWS).
  2. Launch a podman container and verify its networking: sudo podman run -it fedora:41 curl -v google.com # Succeeds
  3. Reboot node
  4. Run docker ps, which activates docker and possibly triggers some networking operation.
  5. Simple podman container networking now fails, without even the ability to reach DNS servers: sudo podman run -it fedora:41 curl -v google.com # Fails
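
The steps above can be sketched as a small script. This is only an illustration, not an official test: the function names and the DRY_RUN switch are hypothetical, and the podman command drops -it and adds --rm and quiet curl flags so it can run non-interactively. By default the commands are printed rather than executed.

```shell
#!/usr/bin/env bash
# Sketch of the reproduction steps; names here are illustrative only.
# Set DRY_RUN=0 to execute the commands; by default they are only printed.
DRY_RUN="${DRY_RUN:-1}"

# Print a command, and execute it unless DRY_RUN is 1.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

# Steps 2 and 5: check networking from inside a podman container.
check_podman_net() {
    run sudo podman run --rm fedora:41 curl -sS -o /dev/null google.com
}

# Step 4: run a trivial docker command, which activates docker.
touch_docker() {
    run docker ps
}

# "before" covers step 2; "after" covers steps 4 and 5 (run it after the reboot).
case "${1:-}" in
    before) check_podman_net ;;
    after)  touch_docker && check_podman_net ;;
esac
```

Assuming the script is saved as repro.sh (a hypothetical name), DRY_RUN=0 ./repro.sh before would be expected to succeed on a fresh node, and DRY_RUN=0 ./repro.sh after, run following the reboot in step 3, would be expected to end with a curl resolution error.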

Expected behavior

Podman container can access the internet

Actual behavior

Podman container cannot even access DNS servers.

System details

  • Reproduced on vSphere and AWS.

Butane or Ignition config

variant: fcos
version: 1.4.0
passwd:
  users:
    # The core user has sudo privileges by default.
    - name: core
storage:
  files:
    - path: /etc/hostname
      mode: 0420
      overwrite: true
      contents:
        inline: "d-mini-1"
    - path: /etc/NetworkManager/system-connections/Wired\ connection\ 1
      mode: 0600
      contents:
        inline: |
          [connection]
          id=Wired connection 1
          type=ethernet
          [ipv4]
          dns-search=ec2.internal;amce.com;ACME.COM;
          method=auto
          [ipv6]
          addr-gen-mode=default
          method=auto
    - path: /etc/ssh/sshd_config.d/05-acme.conf
      mode: 0644
      overwrite: true
      contents:
        inline: |
          TrustedUserCAKeys /etc/ssh/trusted-user-ca-keys.pem
          HostCertificate /etc/ssh/ssh_host_ecdsa_key-cert.pub
    - path: /etc/sysconfig/docker
      mode: 0644
      overwrite: true
      contents:
        # We customise this because we want to suppress --live-restore which
        # is incompatible with docker swarm.
        inline: |
          OPTIONS="--selinux-enabled
          --default-ulimit nofile=64000:64000
          --init-path /usr/libexec/docker/docker-init
          --userland-proxy-path /usr/libexec/docker/docker-proxy
          "
    - path: /etc/sysconfig/chronyd
      mode: 0644
      overwrite: true
      contents:
        inline: |
          OPTIONS="'server acme-ntp-1'
          "

Additional information

Non-working output:

core@d-pdm-mini-1:~$ sudo podman run -it fedora:41 curl -v google.com
* Could not resolve host: google.com
* shutting down connection #0
curl: (6) Could not resolve host: google.com
@dustymabe
Member

If your node is in this bad state, what happens if you reboot and then run podman without running docker ps first?

@fifofonix
Author

fifofonix commented Oct 29, 2024

Podman bridge networking works again after the reboot, as expected.

Transcript of this on a newly provisioned node:

Fedora CoreOS 41.20241024.1.0
Tracker: https://github.com/coreos/fedora-coreos-tracker
Discuss: https://discussion.fedoraproject.org/tag/coreos

core@d-mini-1:~$ docker ps
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
core@d-mini-1:~$ sudo podman run -it fedora:41 curl google.com
Resolved "fedora" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Trying to pull registry.fedoraproject.org/fedora:41...
Getting image source signatures
Copying blob 7dc6d0552bc8 done   |
Copying config a7e63d9477 done   |
Writing manifest to image destination
curl: (6) Could not resolve host: google.com
core@d-mini-1:~$ sudo systemctl reboot

Broadcast message from root@d-mini-1 on pts/1 (Tue 2024-10-29 12:40:40 UTC):

The system will reboot now!

core@d-mini-1:~$ Shared connection to 172.25.81.77 closed.
(pdm-3.12) me@MY-MAC:~/.ssh$ ssh 172.25.81.77
Fedora CoreOS 41.20241024.1.0
Tracker: https://github.com/coreos/fedora-coreos-tracker
Discuss: https://discussion.fedoraproject.org/tag/coreos

Last login: Tue Oct 29 12:39:31 2024 from 156.111.209.171
core@d-mini-1:~$ sudo podman run -it fedora:41 curl google.com
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>
core@d-mini-1:~$

@fifofonix
Author

fifofonix commented Oct 29, 2024

I should also note that I am running several long-lived next nodes that use podman and docker concurrently without issue. Perhaps some legacy settings carried forward on older nodes as they upgrade keep things working, while newly provisioned nodes that default to some newer networking behaviour hit the issue...
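
The thread does not identify a cause. Assuming, without confirmation, that the difference between old and new nodes lies in network-backend or firewall state, a read-only snapshot like the following could be collected on both kinds of node and compared. The section and collect helpers are hypothetical names. The script deliberately uses systemctl rather than the docker CLI, since running a docker command is what triggers the problem in the first place.

```shell
#!/usr/bin/env bash
# Collect network and firewall state for comparison between nodes.
# Nothing here is intended to change system state.

# Run a command under a labelled header; a failing command is noted, not fatal.
section() {
    local title="$1"; shift
    echo "=== ${title} ==="
    "$@" 2>&1 || echo "(command failed: $*)"
    echo
}

collect() {
    section "podman network backend" sudo podman info --format '{{.Host.NetworkBackend}}'
    section "nftables ruleset" sudo nft list ruleset
    section "iptables FORWARD chain" sudo iptables -S FORWARD
    section "docker service state" systemctl is-active docker.service
    section "container DNS lookup" sudo podman run --rm fedora:41 getent hosts google.com
}

# Only collect when explicitly asked, so the file can also be sourced.
if [ "${1:-}" = "--run" ]; then
    collect
fi
```

Running it once on a freshly booted node and once after docker ps, then diffing the two outputs, may show which rules or settings changed. Whether any difference found is actually the cause would still need to be verified.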

@dustymabe
Member

@fifofonix considering it's against recommendation and also corrects itself on reboot, I'm not sure there is much for us to do here other than characterize the problem, so that if others encounter it we can point them here.

Do you agree?

@fifofonix
Author

I think that, because it is against recommendation, just characterizing the problem is fine.

The reboot only corrects the problem until someone uses docker again. I can anticipate that a docker user who wants to experiment with podman for a oneshot systemd unit on one of their docker nodes, and who does not realise that parallel use is not recommended, is going to end up confused and give up on the experiment.

If there were some more details on how to separate namespaces, that would be useful to me.

Personally, I have used podman to run some higher-security workloads with additional capabilities that I did not want to expose to the docker process. For now these dual-workload nodes run fine, but only until I reprovision them. I will have to relocate workloads, which is not a problem now that I know about the issue.
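
For the oneshot-unit scenario described above, one way to make the failure less confusing is a pre-flight check that warns when docker is active on the node. This is a minimal sketch with hypothetical function names, not a fix. It only detects whether docker.service is running right now; whether the problem persists after docker has been stopped is not established in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight guard for a unit that runs podman.

# Succeeds (exit 0) if docker.service is currently active.
# The checker command is injectable so the logic can be exercised without systemd.
docker_is_active() {
    local checker="${1:-systemctl}"
    "$checker" is-active --quiet docker.service
}

# Print a warning to stderr when docker is active.
# Always returns 0 so the check never blocks the unit.
warn_if_docker_active() {
    if docker_is_active "${1:-systemctl}"; then
        echo "warning: docker.service is active; podman bridge networking may fail on this node" >&2
    fi
    return 0
}

# Only run the check when explicitly asked, so the file can also be sourced.
if [ "${1:-}" = "--run" ]; then
    warn_if_docker_active
fi
```

The script could be called with --run from an ExecStartPre= line in the unit, so the warning lands in the journal next to the podman failure.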

3 participants
@dustymabe @fifofonix and others