Multi-Chassis Link Aggregation Groups, often abbreviated MLAG, but also known under the acronyms CLAG, MC-LAG, MCT, SMLT, VLT, or vPC, provide a relatively simple method for active/active Layer 2 (Ethernet) redundancy. The usual implementation is based on two switches that work with independent control planes and use some (proprietary) protocol to negotiate MLAG operations.
Basically every vendor of networking equipment, i.e., switches in the context of MLAG, uses different terminology to describe more or less the same thing. I will use the terms MLAG, peer link, and split brain as follows:
The potential problem described here occurs in a split brain situation, i.e., when the peer link between the two MLAG peer switches of an MLAG pair fails. It pertains to a simple MLAG setup comprising just two switches, Sw1 and Sw2, and two servers, Srv1 and Srv2. Two servers provide some motivation for first hop gateway functionality on the MLAG peer switches, because they could be in different VLANs with different subnets.
As long as there are no link failures and no single-attached devices (also known as orphan ports), the connection between the two switches of an MLAG pair is nearly unused. Most data frames received by one of the MLAG peer switches are forwarded to another local (MLAG) port.
But as soon as the MLAG peer switches start to perform some Layer 3 (IP) functionality, the peer link is required. For example, ARP resolution by one of the MLAG peer switches may require the response frame to traverse the peer link, depending on the load sharing decision performed locally on the responding device.
If the peer link between the two MLAG peer switches fails, ARP resolution may fail, inhibiting some IP communication.
Device-local load sharing decisions determine whether ARP resolution works or fails for any given device. Some IP communication will work, some will fail. Mitigations for this problem exist and should be considered for any MLAG deployment.
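The dependency on the responding device's load sharing decision can be made concrete with a small simulation. This is only a sketch under stated assumptions: the XOR-based hash, the MAC addresses, and the function names `lag_member` and `arp_resolves` are invented for illustration, and real LAG hashing is implementation specific. It also assumes that Sw1 sends the ARP request from its own MAC address.

```python
from typing import Sequence


def lag_member(src_mac: str, dst_mac: str, members: Sequence[str]) -> str:
    """Pick a LAG member link by XOR-ing the last octets of both MACs.

    This toy hash is an assumption for illustration, not a vendor algorithm.
    """
    src_low = int(src_mac.split(":")[-1], 16)
    dst_low = int(dst_mac.split(":")[-1], 16)
    return members[(src_low ^ dst_low) % len(members)]


def arp_resolves(requesting_switch: str, server_mac: str,
                 switch_mac: str, peer_link_up: bool) -> bool:
    """Return True if the server's ARP reply reaches the requesting switch."""
    # The server, not the switch, decides which LAG member carries the reply.
    receiving_switch = lag_member(server_mac, switch_mac, ["Sw1", "Sw2"])
    if receiving_switch == requesting_switch:
        return True  # reply arrives directly on the local MLAG port
    # Otherwise the reply has to cross the peer link to reach the requester.
    return peer_link_up


if __name__ == "__main__":
    sw1_mac = "02:00:00:00:01:01"  # hypothetical MAC used by Sw1 for ARP
    for server_mac in ("02:00:00:00:00:0a", "02:00:00:00:00:0b"):
        for peer_link_up in (True, False):
            ok = arp_resolves("Sw1", server_mac, sw1_mac, peer_link_up)
            state = "peer link up" if peer_link_up else "peer link down"
            print(server_mac, state, "->", "resolved" if ok else "failed")
```

With the peer link up, both hypothetical servers resolve. With the peer link down, only the server whose hash happens to select the link to Sw1 still resolves, which is the "some will work, some will fail" behavior described above.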
One way to mitigate this problem (and similar ones) is to disable the MLAG ports of one of the MLAG peers in a split brain situation. This requires defining one of the peers to fulfill a primary role, and the other to fulfill a secondary role. Then the secondary MLAG peer is configured to disable its MLAG ports if a split brain situation is detected.
Usually an additional, potentially logical, link between the MLAG peers is established to allow the secondary MLAG peer to distinguish between failure of the MLAG peer link and failure of the primary MLAG peer (this is an additional keepalive connection). An MLAG setup without this additional connection cannot provide mitigation of both the failure of any MLAG peer switch and failure of just the MLAG peer connection. The same basic idea is supported in some switch stacking and many chassis bonding solutions as well.
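The decision the secondary MLAG peer has to make can be sketched as a small function. The names `Action` and `secondary_action` are hypothetical and do not correspond to any vendor's implementation; the sketch only encodes the reasoning above and assumes that both the peer link state and the keepalive state are available as simple booleans.

```python
from enum import Enum


class Action(Enum):
    KEEP_FORWARDING = "keep MLAG ports up"
    DISABLE_MLAG_PORTS = "disable MLAG ports"


def secondary_action(peer_link_up: bool, keepalive_up: bool) -> Action:
    """Decide what the secondary MLAG peer should do with its MLAG ports."""
    if peer_link_up:
        # Normal operation: nothing to mitigate.
        return Action.KEEP_FORWARDING
    if keepalive_up:
        # Peer link is down but the primary is still alive: split brain.
        return Action.DISABLE_MLAG_PORTS
    # Peer link and keepalive are both down: assume the primary has failed,
    # so the secondary has to keep forwarding on its own.
    return Action.KEEP_FORWARDING


if __name__ == "__main__":
    for peer_link_up in (True, False):
        for keepalive_up in (True, False):
            action = secondary_action(peer_link_up, keepalive_up)
            print(peer_link_up, keepalive_up, "->", action.value)
```

Without the keepalive connection, the last two cases cannot be told apart, which is why such a setup can handle only one of the two failure types.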
If the MLAG implementation is combined with an anycast VTEP for redundant server connectivity to a VXLAN fabric, the secondary MLAG peer's anycast VTEP needs to be disabled as well as any (other) MLAG port in a split brain situation. The important part is to no longer advertise the anycast VTEP IP into the underlay.
Another mitigation would be to use the additional keepalive connection for synchronization of ARP cache contents instead of disabling the MLAG ports of the secondary peer.
If the VXLAN deployment uses a control plane protocol like EVPN, information about MAC address to IP address associations can be distributed via this protocol. This not only potentially allows a switch to answer ARP requests locally and thereby suppress ARP flooding, it can also mitigate the ARP resolution problem described above, as long as the control plane protocol provides this feature and still works correctly after a failure of the MLAG peer link, e.g., via Layer 3 uplinks. This mitigation is a possible replacement for disabling the MLAG ports of the secondary MLAG peer, because it prevents a split brain situation in the case of a failed MLAG peer link.
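The idea of answering ARP requests from control plane information can be sketched as a simple lookup table. The class `ArpSuppressionCache` and its method names are hypothetical. The sketch assumes that MAC/IP bindings arrive from the control plane (in EVPN these are carried in MAC/IP Advertisement routes) and ignores VLANs, VNIs, and aging.

```python
from typing import Dict, Optional


class ArpSuppressionCache:
    """Toy table of IP-to-MAC bindings learned via the control plane."""

    def __init__(self) -> None:
        self._ip_to_mac: Dict[str, str] = {}

    def learn_from_control_plane(self, ip: str, mac: str) -> None:
        # Called when a binding is advertised by another switch.
        self._ip_to_mac[ip] = mac

    def withdraw(self, ip: str) -> None:
        # Called when the advertisement is withdrawn; unknown IPs are ignored.
        self._ip_to_mac.pop(ip, None)

    def answer_arp(self, target_ip: str) -> Optional[str]:
        # Return the MAC to put into a local ARP reply, or None if the
        # request still has to be flooded.
        return self._ip_to_mac.get(target_ip)


if __name__ == "__main__":
    cache = ArpSuppressionCache()
    cache.learn_from_control_plane("192.0.2.10", "02:00:00:00:00:0a")
    print(cache.answer_arp("192.0.2.10"))  # answered locally
    print(cache.answer_arp("192.0.2.11"))  # None: would be flooded
```

Because the binding is learned via the control plane rather than from an ARP reply crossing the peer link, the lookup keeps working after a peer link failure as long as the control plane sessions stay up.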
In a network comprising more than just two switches, i.e., switches in addition to the MLAG pair (possibly additional MLAG pairs), a virtual peer link can be used, i.e., peer link functionality implemented by encapsulating frames for the MLAG peer for transport over the switch's uplinks. This allows treating the MLAG peers equally, without primary or secondary designation. If an MLAG peer loses all uplinks, it needs to disable all MLAG ports. As long as at least one uplink of the other peer is still working, the MLAG construct provides connectivity. This obviously does not work if there are only direct connections between the two MLAG peer switches and thus no uplinks at all.
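The symmetric rule used with a virtual peer link can be written down in a few lines. The function names and the uplink names are hypothetical; the sketch assumes each peer knows the up/down state of its own uplinks and nothing else.

```python
from typing import Dict


def mlag_ports_enabled(uplink_states: Dict[str, bool]) -> bool:
    """A peer keeps its MLAG ports up only while at least one uplink works."""
    return any(uplink_states.values())


def pair_provides_connectivity(peers: Dict[str, Dict[str, bool]]) -> bool:
    """The MLAG pair still serves its hosts if any peer keeps its ports up."""
    return any(mlag_ports_enabled(uplinks) for uplinks in peers.values())


if __name__ == "__main__":
    peers = {
        "Sw1": {"uplink1": False, "uplink2": False},  # Sw1 lost all uplinks
        "Sw2": {"uplink1": True, "uplink2": False},   # Sw2 has one left
    }
    for name, uplinks in peers.items():
        print(name, "MLAG ports enabled:", mlag_ports_enabled(uplinks))
    print("pair provides connectivity:", pair_provides_connectivity(peers))
```

Note that a peer without any configured uplinks evaluates to "disable MLAG ports" here, which mirrors the remark that this approach does not apply to two directly connected switches without uplinks.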
It may in general be helpful to disable all downlinks of a switch that has lost all its uplinks in order to signal downstream devices (e.g., servers) to switch to another port (unless there are single-attached devices). Some hypervisor vendors expect this network behavior.
If there are any single-attached devices connected to any of the MLAG peer switches, a failure of the connection between the MLAG peers cannot always be mitigated. Thus the network design should avoid single-attached devices, e.g., by adding an additional switch to connect all hosts that cannot use an MLAG connection.
A single-attached device is similar to an MLAG-attached device where all but one link has failed. An MLAG construct cannot protect against arbitrary multiple failures, and a single-attached device is equivalent to a dual-attached device where some failure has already occurred. Thus the first real failure is actually a multiple-failure situation for single-attached devices.