Network connectivity lost when kube-router starts after upgrade to Fedora CoreOS 37 #1415
Comments
Not sure if it means anything yet, but it seems that iptables is dropping my return pings. Specifically this rule:
This rule looks different on (working) CoreOS 36:
iptables isn't my forte, but it seems that on CoreOS 36 we're only dropping marked packets, whereas on CoreOS 37 we're dropping all packets?
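To make the "marked packets vs. all packets" distinction concrete, here is a minimal Python sketch that flags a DROP rule with no match conditions. The actual rules are not reproduced in this thread, so the chain name, comment text, and mark value in the usage note are purely illustrative assumptions, and `is_unconditional_drop` is a hypothetical helper, not part of kube-router or iptables.

```python
import shlex

# Options that identify or annotate a rule without restricting which packets
# it matches. This short allow-list is an assumption made for the sketch.
NON_MATCH_OPTIONS = {"-A", "-j", "--comment"}


def is_unconditional_drop(rule):
    """Return True if an `iptables -S` style rule drops every packet it sees."""
    tokens = shlex.split(rule)
    try:
        target = tokens[tokens.index("-j") + 1]
    except (ValueError, IndexError):
        return False  # no jump target at all
    if target != "DROP":
        return False

    i = 0
    while i < len(tokens):
        tok = tokens[i]
        # "-m comment" only loads the comment module; it matches everything.
        if tok == "-m" and i + 1 < len(tokens) and tokens[i + 1] == "comment":
            i += 2
            continue
        if tok in NON_MATCH_OPTIONS:
            i += 2  # skip the option and its argument
            continue
        return False  # anything else is treated as a real match condition
    return True
```

With an illustrative rule such as `-A KUBE-FIREWALL -m mark --mark 0x8000/0x8000 -j DROP` the helper returns `False`, while the same rule without the `-m mark` part returns `True`, which is the pattern that would explain all traffic being dropped.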
Removing
Hmm, looks like
AFAICT it's set here: https://github.com/kubernetes/kubernetes/blob/9edd4d86c8badea712b44013c156ad57778268eb/pkg/kubelet/kubelet_network_linux.go#L152-L159
If I manually replace the rule on a CoreOS 37 node, the node appears to come up normally. Edit: False alarm. Something 'fixes' it a few minutes later.
I'm guessing that this is probably the same issue as #1370. Essentially, the bug happens because iptables 1.8.8 is not backwards compatible with iptables 1.8.7, which was bundled in the kube-router container. The above issue goes into it in great detail, but the TL;DR is:
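Separately from the summary in #1370, a rough way to spot this kind of host/container mismatch is to compare the output of `iptables --version` on the node and inside the kube-router container. The sketch below only parses and compares two such strings; the function names are hypothetical, and the sample strings in the usage note are assumed examples of the usual `iptables vX.Y.Z (backend)` output, not captured from the affected cluster.

```python
import re

# Matches output such as "iptables v1.8.8 (nf_tables)" or "iptables v1.8.7 (legacy)".
VERSION_RE = re.compile(r"iptables v(\d+)\.(\d+)\.(\d+)(?: \((\w+)\))?")


def parse_iptables_version(output):
    """Return ((major, minor, patch), backend) parsed from `iptables --version` output."""
    match = VERSION_RE.search(output)
    if match is None:
        raise ValueError("unrecognised iptables version output: %r" % output)
    major, minor, patch, backend = match.groups()
    return (int(major), int(minor), int(patch)), backend


def versions_differ(host_output, container_output):
    """Return True when the host and container report different iptables versions."""
    host_version, _ = parse_iptables_version(host_output)
    container_version, _ = parse_iptables_version(container_output)
    return host_version != container_version
```

For example, `versions_differ("iptables v1.8.8 (nf_tables)", "iptables v1.8.7 (nf_tables)")` returns `True`. A difference alone does not prove the rules will be misread, but it is a cheap first check before digging into the rules themselves.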
Yep, that looks identical! However, I'm already using kube-router-1.5.3, so perhaps it's not fixed, or perhaps it was bumped again?
Oh... In the issue description you said that you were running 1.5.1.
So I did, and that's more likely to be correct than what I have specified in the DS. I'll bet it's not rolling out because one of the nodes is hosed 🤔 I will fix it and report back (tomorrow). Thanks!
Absolutely! Sorry this one bit ya. 😞 The ultimate fix is dependent on upstream, but hopefully we'll have a chance to get around to #1372 soon as well, which should greatly reduce the likelihood of this happening.
Confirmed this is working in v1.5.3. Thanks again, and sorry for the noise!
What happened?
This is mostly a placeholder as I don't yet have any useful idea what's going on other than some coarse symptoms.
One of my nodes dropped out of my cluster. It had no external network connectivity at all: I couldn't even ping in or out. After some investigation, the trigger turned out to be that Zincati had upgraded it from Fedora CoreOS 36 to Fedora CoreOS 37. The node comes up fine and kubelet starts. As soon as kubelet starts kube-router, the node loses network connectivity. This doesn't happen on Fedora CoreOS 36. The workaround was to downgrade to Fedora CoreOS 36 and temporarily disable upgrades. This happens to any node which upgrades to Fedora CoreOS 37.
After a little debugging, it seems to relate to iptables rules. Logging into an affected node, I can get network connectivity back by flushing the iptables rules. This obviously isn't very useful, but it's an observation. My next step is to comb through the generated iptables rules to try to find a single trigger.
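One way to do that combing is to diff a rule dump from a working node against one from an affected node. This is a minimal sketch, assuming both dumps are in `iptables-save` format (optionally with `[packets:bytes]` counters); the function names are hypothetical, and the sketch ignores rule ordering, which can also matter.

```python
import re

# Optional "[packets:bytes]" counter prefix emitted by `iptables-save -c`.
COUNTER_RE = re.compile(r"^\[\d+:\d+\]\s*")


def extract_rules(dump):
    """Return the set of '-A ...' rule lines from an iptables-save style dump."""
    rules = set()
    for line in dump.splitlines():
        line = COUNTER_RE.sub("", line.strip())
        # Skip comments, table headers (*filter), chain policies (:INPUT ...) and COMMIT.
        if line.startswith("-A "):
            rules.add(line)
    return rules


def diff_rules(working_dump, broken_dump):
    """Return (rules only on the working node, rules only on the broken node)."""
    working = extract_rules(working_dump)
    broken = extract_rules(broken_dump)
    return sorted(working - broken), sorted(broken - working)
```

Feeding it the dumps from a CoreOS 36 node and a CoreOS 37 node should narrow the search to the handful of rules that differ, rather than reading every generated rule by hand.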
How can we reproduce the behavior you experienced?
🤷
**Screenshots / Architecture Diagrams / Network Topologies**
If applicable, add those here to help explain your problem.
**System Information (please complete the following information):**
- `kube-router --version`: Running kube-router version v1.5.1, built on 2022-07-29T22:25:31+0000, go1.17.10
- kube-router arguments:
  - --run-router=true
  - --run-firewall=true
  - --run-service-proxy=true
  - --bgp-graceful-restart=true
  - --kubeconfig=/var/lib/kube-router/kubeconfig
  - --runtime-endpoint=/var/run/crio/crio.sock
- `kubectl version`: 1.25.3

**Logs, other output, metrics**
I'm using https://github.com/cloudnativelabs/kube-router/blob/v1.5/daemonset/kubeadm-kuberouter-all-features-dsr.yaml
Additional context
Add any other context about the problem here.