I thought VXLAN was just another type of encapsulation when I first saw it. There was a payload behind a series of headers, not unlike GRE. Except the header was bigger than GRE's, and VXLAN itself is coupled with other protocols like LISP or BGP EVPN L2VPN. There was an emphasis on decoupling IP from physical location. The network assumed a different, more glamorous moniker: the fabric. Even with only a glimpse of what VXLAN was, I understood that this was a proposal for the future of networking.
Whether all of the networks of the future will be fabrics is yet to be determined. There are a lot of pieces to a VXLAN fabric, and that was the major reason why learning about it was so rewarding. I don’t have access to an SD-Access controller running LISP, nor do I have a working version of DNA Center, so I studied VXLAN in the data center. Cisco Modeling Labs provides access to Nexus 9K images, and that was all I needed to play with VXLAN.
Long before I typed the first commands into the CLI, I had to learn how VXLAN worked. From a broad perspective, it is similar to other overlay solutions: the idea is to build a logical network that exists on top of a physical network. Whenever you build a GRE tunnel, you configure two endpoints with two virtual IPs that are then encapsulated in a header containing the physical IPs. The outer header is what the physical, or underlay, network interacts with, and therefore the devices between the endpoints are unaware of the inner, or overlay, IPs. A traceroute from one endpoint to the other using the overlay IP is one hop; to the overlay devices, they are directly connected to each other.
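As a point of comparison, here is a minimal GRE sketch in IOS-style syntax, with made-up addresses. The tunnel interfaces carry the overlay IPs, while `tunnel source` and `tunnel destination` supply the physical IPs that end up in the outer header:

```
! Router A -- physical (underlay) address 203.0.113.1
interface Tunnel0
 ip address 10.0.0.1 255.255.255.252   ! overlay IP
 tunnel source 203.0.113.1             ! outer header source
 tunnel destination 203.0.113.2        ! outer header destination

! Router B mirrors the configuration
interface Tunnel0
 ip address 10.0.0.2 255.255.255.252
 tunnel source 203.0.113.2
 tunnel destination 203.0.113.1
```

From Router A, `traceroute 10.0.0.2` shows a single hop, even if the underlay path crosses several routers.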
If VXLAN did only that, build a logical connection between two endpoints, it wouldn’t be a solution for future networks. The VXLAN header is essentially an extension of the 802.1Q tag. The number of devices in the modern data center has exploded due to virtual machines and cloud services, so the 4096 VLANs that 802.1Q offers are insufficient. VXLAN’s 24-bit segment identifier extends that number to roughly 16 million segments, though the actual number a given piece of hardware supports will vary.
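On NX-OS, this extension shows up as a simple mapping from a VLAN to a VXLAN network identifier (VNI). A sketch, using the segment numbers that appear later in this post:

```
feature vn-segment-vlan-based

vlan 10
  vn-segment 30010   ! VLAN 10 now identifies VXLAN segment 30010
vlan 20
  vn-segment 30020
```

The VLAN keeps its local, 12-bit meaning on the switch; the VNI is what identifies the segment across the fabric.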
Extending the number of segments is great, but it is not sufficient for VXLAN to do its magic. Two other pieces contribute to the flexibility of VXLAN: BGP EVPN L2VPN and anycast routing. Start with anycast. I always thought it was the odd child of IP routing. Broadcast, though wasteful, has its uses. Unknown unicast has a clear use, and of course so does multicast. Anycast is a nearest-node address shared between many devices; that’s the definition I remember reading on my first encounter with it. I could vaguely see why it would be used, but the benefits of anycast didn’t become apparent until I started to study VXLAN.
Remember that the goal of VXLAN is to decouple IP from physical location. This means that if I had an address of 192.168.1.100, I could be behind any switch in the network. Traditionally, subnets existed in fixed spaces: 192.168.1.0/24 is over here in this part of the building and 192.168.3.0/24 is over there in that part of the building. A device that moved from one place to another would require a different address. Not so with VXLAN. Making this work requires that the default gateway move with the device. Clearly, physically moving the gateway is impractical, but logically moving it is possible.
This is where anycast enters the solution. L3 switches acting as default gateways are configured with anycast addresses. As long as they are connected to hosts on a particular VXLAN segment, they also have that segment’s default gateway IP configured on the corresponding SVI. For example, if a switch hosted segments 30010 and 30020, the SVI for the VLAN mapped to segment 30010 would carry that segment’s default gateway IP, and likewise for 30020. Any device from segments 30010 and 30020 can be moved to this switch and still maintain reachability to the rest of the network. From the device’s perspective, the default gateway hasn’t changed at all, because the gateway’s IP and MAC information are unchanged.
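On a Nexus 9K, the anycast gateway is a global virtual MAC plus a per-SVI mode. A sketch, assuming VLAN 10 is the VLAN mapped to segment 30010 and the MAC and IP are made up; every leaf in the fabric would carry the same configuration:

```
feature fabric forwarding
fabric forwarding anycast-gateway-mac 2020.0000.00aa   ! identical virtual MAC on every leaf

interface Vlan10
  no shutdown
  ip address 192.168.1.1/24                 ! the segment's default gateway IP
  fabric forwarding mode anycast-gateway    ! answer ARP with the shared virtual MAC
```

Because every leaf presents the same gateway IP and MAC, a host that moves sees no change at all.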
The other major piece of this solution is BGP EVPN L2VPN. It has an awesome name and is used in other places like VPLS/VPWS MPLS networks. Like all MP-BGP address families, it comes with extended communities and several different route types that comprise the NLRIs. The big picture here is that BGP is used to distribute MAC information. When a host device first contacts its default gateway, it provides its MAC information in the message. The gateway then advertises that MAC information into the BGP EVPN L2VPN control plane, essentially saying, “If you are part of this VXLAN segment, listen up. I have a device with this MAC address and this IP, so if you need to reach it, send your packets my way.”
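On NX-OS, wiring this up means telling the VTEP interface to learn remote MACs from BGP rather than flood-and-learn, and activating the l2vpn evpn address family toward the spines. A compressed sketch with a made-up AS number and neighbor address:

```
nv overlay evpn
feature bgp
feature nv overlay

interface nve1
  no shutdown
  host-reachability protocol bgp       ! remote MACs come from EVPN, not flooding
  source-interface loopback1
  member vni 30010

router bgp 65000
  neighbor 10.255.0.1 remote-as 65000  ! spine peering, iBGP in this sketch
    address-family l2vpn evpn
      send-community extended

evpn
  vni 30010 l2
    rd auto
    route-target import auto
    route-target export auto
```

The `rd auto` and `route-target auto` lines let the switch derive the per-VNI values instead of typing them by hand.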
Of course, there are a lot of technical details, like which route type the gateway uses to advertise that information, whether the peering between the gateways and the spine switches is iBGP or eBGP, and what mechanism is used for BUM (broadcast, unknown unicast, and multicast) traffic, but I’ll leave those details out. Practically, this MAC-IP association is what allows host device mobility. Let’s walk through how that process occurs.
- Host device behind gateway 1 is moved to gateway 2.
- Host device sends a packet to the default gateway, now gateway 2.
- Gateway 2 realizes that a remotely reachable host is now a locally reachable host and advertises that information into the BGP EVPN L2VPN control plane.
- All gateways update their routing tables to reflect that information.
- Traffic to the host device is routed to gateway 2 until the next mobility event.
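The steps above can be observed from the CLI. These are the NX-OS show commands I would reach for to watch a mobility event unfold (outputs omitted):

```
show bgp l2vpn evpn           ! the EVPN table, including type-2 (MAC/IP) routes
show l2route evpn mac all     ! which MACs are local vs. learned through BGP
show nve peers                ! remote VTEPs discovered through the control plane
```

After the move, the host’s MAC entry should flip from a remote next hop to a local one on gateway 2, and from local to remote on gateway 1.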
This is just a broad view of how VXLAN works. There are many other considerations, and the actual configuration is no joke either. When everything is configured and running, it’s a treat to see. I used a combination of Cisco configuration guides along with the Cisco Press book *Building Data Centers with VXLAN BGP EVPN: A Cisco NX-OS Perspective* for most of the theory.