Node behind NAT - Flannel setup #6941
koxu1996
started this conversation in
Show and tell
Replies: 1 comment
Thanks a lot for sharing this!
ℹ️ I am sharing this because I had a networking problem that was bugging me for over 6 months, and today I finally found the solution 😌. Hope it helps someone!
TL;DR: If you are behind NAT, then enable persistent keepalive for WireGuard.
My story
I basically wanted to offload some traffic to my home machine (behind NAT), as it was idle most of the time. In my cluster I already had ServiceLB enabled and the WireGuard backend for Flannel. So I set up an RKE2 agent and hoped it would work right away with Traefik ingress.
However, from the beginning I had weird, intermittent connection issues. For example, cert-manager was failing with "context deadline exceeded", which was caused by a request timeout. The workaround was ridiculous but always worked: create a pod, exec into it, and ping the target URL. At first I thought it was a DNS issue, but later I discovered that inter-node connectivity as a whole was somehow broken, at least for nodes trying to reach my home machine.

I have a dynamic public IP, so I configured my Mikrotik router to forward all required ports according to the docs. The most important one was port 51820 UDP (Canal CNI with WireGuard IPv4). I spent hours setting up firewall rules (I even added a rule accepting all traffic), but my home node was still sometimes unreachable.
NAT workaround
I did a lot of things to track down the issue: analyzed iptables, observed traffic with tcpdump, logged packets in the router, but nothing helped. While playing around with network interfaces, mostly trying to understand the WireGuard connection, I discovered that a ping from my home machine to one of the nodes unblocked the connection between those nodes for the next 3 minutes 🤔. This led to even more time spent on Mikrotik configuration, as this was clearly a NAT issue.

Final solution
The breakthrough came from running wg show, where I discovered that the latest handshake had happened over an hour ago. I thought handshakes should happen much more frequently... After some quick research I discovered that WireGuard has persistent keepalives ✨.

So I adjusted /var/lib/rancher/rke2/server/manifests/rke2-canal-config.yaml:

     apiVersion: helm.cattle.io/v1
     kind: HelmChartConfig
     metadata:
       name: rke2-canal
       namespace: kube-system
     spec:
       valuesContent: |-
         flannel:
           backend: "wireguard"
    +      keepaliveInterval: 25
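For a quick experiment before changing the chart config, the same WireGuard setting can in principle be applied at runtime with wg set. The sketch below only prints the commands instead of running them. The function name keepalive_cmds is made up for illustration, and flannel-wg is an assumed interface name (wg show interfaces lists the real one). Flannel may overwrite peer settings it manages, so treat this as a temporary test, not a replacement for the config above.

```shell
#!/bin/sh
# Hypothetical helper: reads `wg show <iface> peers` output on stdin
# (one peer public key per line) and prints, without executing,
# one `wg set ... persistent-keepalive` command per peer.
keepalive_cmds() {
    iface="$1"      # WireGuard interface name (assumed: flannel-wg)
    interval="$2"   # keepalive interval in seconds
    while IFS= read -r key; do
        [ -n "$key" ] || continue   # skip empty lines
        printf 'wg set %s peer %s persistent-keepalive %s\n' \
            "$iface" "$key" "$interval"
    done
}

# Usage (run as root on the node behind NAT, review the output first):
#   wg show flannel-wg peers | keepalive_cmds flannel-wg 25
```

Printing the commands first makes it easy to check the peer keys before piping the output to sh.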
I restarted the Canal daemon set with

    kubectl rollout restart ds rke2-canal -n kube-system

and boom: inter-node traffic is now working as expected 🚀!
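To confirm that the tunnel stays alive afterwards, the handshake age seen in wg show can be checked from a script. This is a minimal sketch, not something from the original setup: the function name stale_peers and the 180-second threshold are assumptions, and flannel-wg is an assumed interface name. It relies on wg show <iface> latest-handshakes, which prints one public key and one Unix timestamp per peer.

```shell
#!/bin/sh
# Hypothetical helper: reads `wg show <iface> latest-handshakes` output
# on stdin ("<public-key><TAB><unix-timestamp>" per peer) and prints the
# peers whose last handshake is older than max_age seconds.
# A timestamp of 0 means no handshake has completed yet.
stale_peers() {
    max_age="$1"
    now="${2:-$(date +%s)}"   # optional second argument pins "now" for testing
    awk -v max="$max_age" -v now="$now" '
        NF == 2 {
            if ($2 == 0)
                print $1, "never"
            else if (now - $2 > max)
                print $1, (now - $2) "s"
        }'
}

# Usage (assumed interface name and threshold):
#   wg show flannel-wg latest-handshakes | stale_peers 180
# The configured interval itself can be inspected with:
#   wg show flannel-wg persistent-keepalive
```

No output means every peer has completed a handshake within the threshold.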