-
Can you please share the "broken" config? I see references to moving from ens22 -> ens22.2, but it doesn't state explicitly where in the config that change is made.
-
I change it in the bond definition on the nodes:
so that the tagged interface gets enslaved to the multihoming bond. (I also do it on the multihomed server.)
But unfortunately it does not work. I also found out that if I create a veth pair, put one end into mhbond0 and the other end into a VLAN-aware bridge with pvid 2, and add ens22 to that bridge, it works:
It's strange that it works with the veth interface but not with the VLAN interface.
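Roughly, the veth workaround looks like this with iproute2 (the veth and bridge names here are only illustrative; mhbond0, ens22 and pvid 2 are the real pieces, and the exact VLAN membership may differ in my setup):

    # create a veth pair (names are illustrative)
    ip link add veth-mh type veth peer name veth-br

    # one end is enslaved to the multihoming bond
    ip link set veth-mh master mhbond0

    # the other end and ens22 go into a VLAN-aware bridge
    ip link add br-vlan2 type bridge vlan_filtering 1
    ip link set veth-br master br-vlan2
    ip link set ens22 master br-vlan2

    # vlan 2: tagged on ens22, untagged (pvid 2) towards the bond via the veth
    bridge vlan add dev ens22 vid 2
    bridge vlan add dev veth-br vid 2 pvid untagged

    ip link set veth-mh up; ip link set veth-br up; ip link set br-vlan2 up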
-
Try leaving the bond config alone - just add a separate vlan subinterface and put that vlan in the bridge.
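Something along these lines in /etc/network/interfaces, for example (the bridge and VXLAN names are placeholders, and the bond stanza is only indicative):

    # MH bond left as it was, with the physical interface enslaved
    auto mhbond0
    iface mhbond0
        bond-slaves ens22
        bond-mode 802.3ad

    # separate VLAN 2 subinterface of the bond
    auto mhbond0.2
    iface mhbond0.2

    # the subinterface (not the bond itself) joins the bridge carrying the VNI
    auto brvni2
    iface brvni2
        bridge-ports mhbond0.2 vxlan2
        bridge-stp off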
-
Thank you for the suggestion. I suppose I leave frr.conf as is.
-
An Ethernet Segment represents an attachment circuit - in this case an LACP LAG. Multiple vlans can be carried over the same AC without needing a separate ES-ID. To bring that back to the config, that means they'd all just be different vlans carried over the same bond interface. So you'd only need 1 ES-ID per MH bond regardless of how many vlans you configure on each MH bond.
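As a rough frr.conf sketch (the ES values are placeholders), the ES config sits on the bond itself and is shared by every VLAN carried over it:

    interface mhbond0
     evpn mh es-id 1
     evpn mh es-sys-mac 44:38:39:ff:00:01
     evpn mh es-df-pref 100
    !
    ! mhbond0.2, mhbond0.3, ... (or tagged VLANs of a VLAN-aware bridge)
    ! need no es-id of their own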
-
Thank you for the clarification!
-
There's not enough info here to help isolate what's going on. Right now all we know about your network is that you've configured EVPN-MH. At the very least you should provide a network diagram showing where the test hosts are, an idea of what you've deployed (e.g. EVPN Symmetric vs Centralized, who is the GW, what VRFs exist and whether/where they're being leaked, etc.) and specific details around working/non-working flows (SMAC/DMAC/SIP/DIP).
-
Thanks for the response. I'm sorry that I missed it somehow. Sure. Here is the info.
Inter-subnet routing occurs in VRF 'vrflan', and there is an anycast default gateway on each hypervisor:
Each of the hypervisors is connected to each VNI via a veth pair. No leaking is configured. When more than one multihoming link is connected, I see random packet loss and TCP connections freezing/dropping when accessing hosts/VMs across the multihoming bond from a different VLAN/VNI, for example host0 <-> vm101.
My current config:
at1:/etc/network/interfaces
at1:/etc/frr/frr.conf
The configuration on the other hypervisors (at2, at3) is identical (except IP addresses .1 -> .2, .3, etc.).
switch:/etc/network/interfaces
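The anycast gateway itself is just an SVI with the same address and MAC on every hypervisor, placed into vrflan, roughly like this (the interface name, address and MAC below are only illustrative, not the real values):

    auto lan1000
    iface lan1000
        address 192.168.100.1/24
        hwaddress 44:38:39:ff:00:aa
        vrf vrflan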
-
Thank you very much for the answer! I had a chance to test it today and this really did the trick. Except for DHCP.
It seems to be working now for the IP protocol, but I found out that DHCP does not work. If there is only one multihoming link (no matter which one), DHCP works as expected. But if there is more than one MH link, DHCP doesn't work. To be precise, it is partially working. I use dnsmasq as the DHCP server on one of the nodes, for example on at1, on the interface lan1000:
Half of my diskless stations on this subnet don't boot, although there are records of the obtained addresses in the log file. The other half boots up to the stage of IP address configuration by DHCP, which then fails. My laptop also fails to get an address by DHCP. For testing, I set up two multihoming links to at1 and at2 and tried running dhcrelay with different options on at2 while dnsmasq was running on at1, but it did not improve the situation.
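The dnsmasq side is plain, roughly like this (the range below is illustrative, not the real one):

    # /etc/dnsmasq.conf
    interface=lan1000
    bind-interfaces
    dhcp-range=192.168.100.100,192.168.100.200,12h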
-
It would be good to understand functionally what is failing, that way we can try to understand why it's failing. e.g. When DHCP fails, is it due to missing packets or something else? When DHCP succeeds but the boot fails, is there a corresponding network failure? The DHCP failure for your laptop seems like it would be easiest to diagnose (since you can get a packet capture + DHCP client logs), so I'd probably recommend starting from there first. If you can narrow down specifically what is causing the failure, we can help look at why those failures are occurring.
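For the capture, something along these lines on the laptop and on at1 should be enough (use whatever interface name applies on each box):

    # DHCP is UDP 67/68; -e prints MAC addresses, which helps correlate with EVPN/ES behaviour
    tcpdump -e -n -vvv -i lan1000 'udp port 67 or udp port 68'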
-
Logs via:
Client logs via:
I also sniffed the packets on the switch and found out that in the good situation communication goes in both directions, while in the bad situation I only see request packets from the DHCP client. So it seems to me that the DHCP responses get filtered out on the host where dnsmasq is running (i.e. on at1). There is no firewall configured. All I do is enable the second multihoming link on the switch, which is enough to break DHCP.
-
Did you ever get past this? I have run into pretty much the exact same problem with DHCP on an MH setup. From my tracing of the traffic, split-horizon filters are not actually being applied on the tagged interfaces of the multi-homed bonds. This is especially problematic with DHCP when the multi-homed bond comes from a switch...
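For anyone comparing notes, the ES and per-VLAN ES-EVI state that FRR believes it has programmed can be dumped with something like the following:

    vtysh -c 'show evpn es detail'
    vtysh -c 'show evpn es-evi'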
-
Unfortunately no, the company head decided to switch to a simpler setup. I was hoping to get back to this issue, but haven't done so yet. If I have any updates on this in the future, I'll post them here. Please let me know if you manage to resolve this issue.
-
FRR VERSION: 8.2.2-1+pve1
OPERATING SYSTEM VERSION: Debian 11 Bullseye (Proxmox 7.2)
KERNEL VERSION: 5.15.35-1-pve
Hello, my goal is to allow several VNIs to be accessible as VLANs to a multihomed server.
Before attempting that, I tried doing multihoming with a single VLAN.
I use these config statements in frr.conf on the nodes:
es-df-pref is different between nodes.
I tried setting up multihoming, and it works when it is configured on the physical interface ens22:
But as soon as I change ens22 to ens22.2 on the nodes (and the appropriate bond on the server), it does not work, although I can access the server via this VLAN (when taking it out of the bond and assigning the IP).
/etc/frr/frr.conf (node 1)
/etc/network/interfaces (node 1)
The other nodes have similar configurations.
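To illustrate the change (simplified; the bond name and mode here are only indicative, not the full config): in the working case the physical interface is enslaved to the MH bond, and in the broken case the VLAN subinterface is enslaved instead:

    # working: physical interface enslaved to the MH bond
    auto mhbond0
    iface mhbond0
        bond-slaves ens22
        bond-mode 802.3ad

    # not working: VLAN subinterface enslaved instead
    #     bond-slaves ens22.2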
Steps To Reproduce
Expected behavior
I expect the server to be accessible on step 3
Versions