
netbird 0.32.0 breaks K3s 1.32.2+k3s1 with flannel due to iptables conflicts #2926

Open
christian-schlichtherle opened this issue Nov 21, 2024 · 3 comments


@christian-schlichtherle

Describe the problem

We operate an IoT project where some K3s nodes are placed at customer premises. We install Netbird 0.32.0 on each node first, then install K3s v1.32.2+k3s1 with flannel. When installing K3s, we provide flannel-iface=wt0 to tell it to use the Netbird interface for node-to-node communication.
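For reference, a sketch of such a K3s install (the exact invocation on our nodes may differ):

curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.32.2+k3s1 sh -s - --flannel-iface=wt0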

This works to some extent, but there is a problem: when the Netbird service starts, it sets up its iptables rules, and flannel sets up its own iptables rules as well. There seem to be conflicts between those rules, so communication breaks after every restart of the Netbird service, e.g. when installing an upgrade. As a workaround, I have to restart the k3s(-agent) service after every restart of the Netbird service.

Summing it up, to restart all Netbird services in the cluster, I have to do something like this:

ansible k3s_server -b -m shell --forks 1 -a 'systemctl restart netbird && sleep 3 && systemctl restart k3s'
ansible k3s_agent -b -m shell -a 'systemctl restart netbird && sleep 3 && systemctl restart k3s-agent'

As you can imagine, this is not a sustainable solution, just a hacky workaround.

Is this a known issue? What are my options: wait for a fix, or try another CNI such as Cilium?

To Reproduce

Steps to reproduce the behavior:

  • Install Netbird on a bunch of nodes
  • Install K3s on the nodes with flannel-iface=wt0
  • Restart only the netbird service and watch in-cluster communication break, e.g. kubectl logs <any-pod> no longer works (see the check below).
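A minimal way to observe the breakage (the pod and target names below are illustrative, not from our cluster):

kubectl run probe --image=busybox --restart=Never -- sleep 3600
kubectl get pods -o wide                         # note the IP of a pod on another node
kubectl exec probe -- ping -c 3 <other-pod-ip>   # fails after 'systemctl restart netbird'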

Expected behavior

Restarting the NetBird service should not break in-cluster communication; flannel's iptables rules should be left alone.

Are you using NetBird Cloud?

Yes

NetBird version

0.32.0

NetBird status -dA output:

n/a

Do you face any (non-mobile) client issues?

Yes.

Screenshots

n/a

Additional context

See above.

@christian-schlichtherle (Author)

BTW: This is a long-standing problem; I just had no time to report it earlier.

@lixmal (Contributor) commented Nov 22, 2024

Hi @christian-schlichtherle, can you post your iptables/nftables before and after your workaround?

iptables-save
nft list ruleset

You might need to install nftables for the nft tool to be available
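For example, on Debian/Ubuntu (the package name may differ on other distros):

sudo apt-get install nftables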

@christian-schlichtherle (Author)

@lixmal I have run these commands. Unfortunately, the output reveals too much sensitive information to share here, but in order to get a meaningful diff, I processed the output as follows:

ansible my-worker-node -b -a 'iptables-save' > 10_iptables-save_before
ansible my-worker-node -b -a 'nft list ruleset' > 10_nft_list_ruleset_before
ansible my-worker-node -b -m service -a 'name=netbird state=restarted'
ansible my-worker-node -b -a 'iptables-save' > 20_iptables-save_after_netbird_restart
ansible my-worker-node -b -a 'nft list ruleset' > 20_nft_list_ruleset_after_netbird_restart
ansible my-worker-node -b -m service -a 'name=k3s-agent state=restarted'
ansible my-worker-node -b -a 'iptables-save' > 30_iptables-save_after_k3s_agent_restart
ansible my-worker-node -b -a 'nft list ruleset' > 30_nft_list_ruleset_after_k3s_agent_restart
for file in ??_iptables-save_*; do grep -v -e '^#' -e '^\*' -e 'COMMIT' < "$file" | sort > "$file.sorted"; done
for file in ??_nft_list_ruleset_*; do grep -v -e '^#' -e '^\s*$' -e '^\s*table' < "$file" | sort > "$file.sorted"; done
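This yields *.sorted files that can be compared pairwise with a plain text diff, e.g.:

diff 10_iptables-save_before.sorted 20_iptables-save_after_netbird_restart.sorted
diff 10_nft_list_ruleset_before.sorted 20_nft_list_ruleset_after_netbird_restart.sorted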

The only difference between the 10_*.sorted and 20_*.sorted files was in the packet counters. Yet, pod-to-pod communication is definitely broken after restarting the netbird service. So now we know that it has nothing to do with iptables/nftables rules. I'm sorry for the misleading title of this issue.

Another mistake I made in my original posting was saying that a restart of the netbird service breaks kubectl logs: that's not correct. This command still works (it doesn't require flannel). However, pod-to-pod communication is definitely broken; in our case, a client could not connect to another service anymore. After a final restart of the k3s-agent service, it worked again.

Summing it up, a restart of the netbird service does break flannel, even though the details in my original posting were not exactly correct. I hope this information helps to reproduce the issue.
