VXLAN L3 with EVPN: building a complete overlay network

Preface

VXLAN (Virtual eXtensible LAN) is an overlay network technology defined in RFC 7348. VXLAN is essentially MAC-in-UDP encapsulation (Ethernet frames carried in UDP over IP), so that an L2 network can be built on top of an L3 IP network. The L2 network here is the overlay network, the L3 network is the underlay network, and the underlay is transparent (imperceptible) to the overlay. The L3 network can be an IP network inside a data center or a VPN-based IP network across data centers. Therefore, VXLAN can achieve the following:


Provide L2 connections between multiple racks or computer rooms in a DC (Data Center). Using VXLAN can avoid building physical L2 connections between racks or computer rooms, which makes wiring simpler.
Provide L2 connectivity between multiple remote DCs as long as there is L3 connectivity between these DCs.
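To make the MAC-in-UDP idea concrete, here is a minimal Python sketch of the 8-byte VXLAN header described in RFC 7348. The function names are illustrative only, and the outer Ethernet, IP and UDP headers are deliberately left out; this is a sketch of the header layout, not a packet generator.

```python
import struct

VXLAN_UDP_PORT = 4789          # IANA-assigned destination UDP port for VXLAN
VXLAN_FLAG_VNI_VALID = 0x08    # "I" flag: the VNI field carries a valid value


def build_vxlan_header(vni: int) -> bytes:
    """Return the 8-byte VXLAN header for a 24-bit VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: 8 flag bits + 24 reserved bits. Word 2: 24-bit VNI + 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(header: bytes) -> int:
    """Extract the VNI from the first 8 bytes of a VXLAN payload."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return vni_word >> 8


def encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Prepend the VXLAN header to an inner Ethernet frame (outer headers omitted)."""
    return build_vxlan_header(vni) + inner_frame
```

The 24-bit VNI is what allows many isolated L2 networks to share one underlay: the receiving VTEP reads the VNI to decide which overlay network the inner frame belongs to.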

VXLAN provides an L2 service on top of the L3 network for hosts (physical and virtual) in the overlay network, which is usually called VXLAN bridging. At the same time, the overlay L2 network supports multi-tenancy, that is, it supports multiple isolated L2 networks. How do these isolated L2 networks communicate with each other? As with other isolation technologies, such as VLAN, routing is required. So this time let's talk about VXLAN routing and EVPN's support for it. VXLAN routing is traditionally done through a centralized L3 GW (VXLAN router): all L3 traffic has to pass through the L3 GW before it can be forwarded from one L2 network to another.

If two hosts are in different VXLAN L2 networks but under the same VTEP (VXLAN Tunnel Endpoint), traffic between them must first flow from the VTEP to the L3 GW, be forwarded at Layer 3 there, and then be sent back to the same VTEP. This kind of roundabout traffic is called hairpin traffic. VXLAN routing works this way, but it has two problems. One is the traffic bottleneck: the performance of the centralized L3 GW determines the maximum rate of Layer 3 traffic. The other is the hairpin traffic itself, which adds unnecessary network load.

EVPN can be used as the control plane of VXLAN, and EVPN technology also provides a solution for L3 optimization of VXLAN. Let's take a look at how EVPN completes VXLAN routing. First, let's look at the optimization of the data plane proposed by EVPN, and then introduce how EVPN as the control plane supports the optimized data plane.

1. Routing and Bridging Integration

VTEP itself supports L2 bridging. L2 communication between hosts connected to the same VTEP will not generate VXLAN data, but will be forwarded directly at VTEP. VXLAN IRB (Integrated Routing and Bridging) refers to the simultaneous implementation of L2 bridging and L3 routing functions on VTEP. L3 traffic under the same VTEP will not generate VXLAN data, but will be forwarded at VTEP.

The implementation of IRB is described in Integrated Routing and Bridging in EVPN, an EVPN application draft that has since been published as RFC 9135.

1.1 VRF

VRF stands for Virtual Routing and Forwarding, sometimes also expanded as VPN Routing and Forwarding, and is similar in spirit to a Linux network namespace. VRFs generally run on dedicated network devices, and each VRF has its own independent forwarding information. This way a single device, rather than several, is enough to create isolated environments for multiple tenants: each tenant uses its own VRF to route and forward independently. In the figure below, three isolated VRFs serve three tenants on one device.

As the name suggests, a VRF's function is routing and forwarding. EVPN has two types of VRF: MAC-VRF and IP-VRF. A MAC-VRF can be regarded as an L2 switch, and an IP-VRF can be regarded as an L3 router. The relationship between them is shown in the following figure:
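As a rough code-level picture of the same relationship, the sketch below models a MAC-VRF as a MAC table and an IP-VRF as a per-tenant routing table with longest-prefix match. The class names, fields and addresses are assumptions made for illustration; real devices hold considerably more state.

```python
from dataclasses import dataclass, field
import ipaddress
from typing import Optional


@dataclass
class MacVrf:
    """L2 table for one subnet (one L2 VNI): behaves like a switch."""
    l2_vni: int
    mac_table: dict = field(default_factory=dict)  # MAC -> local port or remote VTEP

    def learn(self, mac: str, location: str) -> None:
        self.mac_table[mac.lower()] = location

    def lookup(self, mac: str) -> Optional[str]:
        return self.mac_table.get(mac.lower())


@dataclass
class IpVrf:
    """L3 table for one tenant: behaves like a router."""
    tenant: str
    routes: dict = field(default_factory=dict)  # ip_network -> next hop

    def add_route(self, prefix: str, next_hop: str) -> None:
        self.routes[ipaddress.ip_network(prefix)] = next_hop

    def lookup(self, dst_ip: str) -> Optional[str]:
        addr = ipaddress.ip_address(dst_ip)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        # Longest-prefix match, as a router would do.
        return self.routes[max(matches, key=lambda net: net.prefixlen)]


# Two tenants reuse the same prefix without interfering with each other.
tenant_a = IpVrf("tenant-a")
tenant_b = IpVrf("tenant-b")
tenant_a.add_route("10.0.1.0/24", "vtep1")
tenant_b.add_route("10.0.1.0/24", "vtep3")
```

Because each tenant has its own `IpVrf` instance, the overlapping `10.0.1.0/24` prefixes resolve independently, which is the isolation property described above.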

1.2 Distributed Anycast Gateway (DAG)

A router forwards traffic between the gateways it hosts. Since an IP-VRF can be regarded as a router, an IP-VRF also needs gateways. The problems of a centralized gateway were mentioned earlier. Centralization problems can generally be solved by distribution, so a distributed gateway is defined here: the DAG (Distributed Anycast Gateway). A DAG exists in the relevant IP-VRF on every VTEP and has the same MAC/IP address everywhere. In other words, the original centralized VXLAN L3 GW is replicated, placed on each VTEP, and attached to the IP-VRF. The corresponding schematic diagram is shown below. The same gateway of the same tenant is present on every VTEP.

Since a DAG exists on every VTEP, the two previous problems disappear. First, the bottleneck is gone because L3 forwarding is now spread across all VTEPs instead of being concentrated in one device. Second, the hairpin traffic is gone because every VTEP has its own gateway, so L3 forwarding between hosts under the same VTEP never leaves that VTEP (no VXLAN traffic is generated). Using the same IP/MAC on all VTEPs also reduces IP usage and means that when a host migrates from one VTEP to another, the gateway information in the host does not need to be updated.
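The effect on traffic can be sketched by counting the VXLAN tunnel segments a routed packet crosses in each design. This is a toy model with hypothetical VTEP names, VNIs and addresses, and it assumes the centralized L3 GW is a separate device from the VTEPs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnycastGateway:
    ip: str
    mac: str


def deploy_dag(vtep_names, gateways_by_vni):
    """Install the same gateway IP/MAC for every subnet on every VTEP."""
    return {name: dict(gateways_by_vni) for name in vtep_names}


def tunnel_segments_centralized(src_vtep, dst_vtep, l3gw):
    """Routed traffic always detours through the single L3 GW."""
    return [(src_vtep, l3gw), (l3gw, dst_vtep)]


def tunnel_segments_distributed(src_vtep, dst_vtep):
    """Routed traffic is handled by the DAG on the source VTEP."""
    return [] if src_vtep == dst_vtep else [(src_vtep, dst_vtep)]


# Hypothetical addresses, chosen only for illustration.
gateways = {
    10: AnycastGateway("10.0.1.1", "02:00:00:00:00:01"),
    20: AnycastGateway("10.0.2.1", "02:00:00:00:00:01"),
}
fabric = deploy_dag(["vtep1", "vtep2", "vtep3"], gateways)
```

For two hosts under `vtep1`, the centralized model yields two tunnel segments (out to the gateway and back, the hairpin), while the distributed model yields none. A host that moves from `vtep1` to `vtep3` also finds an identical gateway entry there, so its default gateway and ARP entry stay valid.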

When the host connected to the VTEP needs to perform L3 routing forwarding, the nearest DAG, that is, the DAG on the current VTEP, is always selected to complete the routing forwarding. How is the routing forwarding here completed? There are two ways.

1.3 Asymmetric IRB

Take the following figure as an example. The request and the reply travel through different VXLANs, so the round-trip paths of a complete exchange differ. This mode is therefore called asymmetric routing (Asymmetric IRB).

Let's take ping as an example. When Host A accesses Host B, Host A sends a ping packet to VTEP1 (V1), to which it is connected. Since this is a cross-subnet request, Host A sets the destination MAC address of the ping packet to its gateway's MAC address (the DAG MAC address). Based on the destination MAC address, VTEP1 recognizes that this request needs L3 forwarding, so it hands the packet to IP-VRF for processing. IP-VRF stores the VXLAN ID and MAC address corresponding to Host B. How did that information get there? It is populated by the EVPN control plane, which will be introduced later. Next, VTEP1 replaces the destination MAC in the ping packet (originally the DAG MAC) with Host B's MAC, encapsulates the ping packet in a VXLAN packet with that VXLAN ID, and sends it to VTEP3 (V3) through the yellow VXLAN tunnel. The yellow VXLAN is the VXLAN in which Host B is located.

VTEP3 receives the data from the yellow VXLAN tunnel and processes it directly in its own MAC-VRF. As mentioned earlier, MAC-VRF is equivalent to an L2 switch, so MAC-VRF can directly forward the request to Host B, which is directly connected to the current VTEP. The overall process is as follows:

When Host B returns data to Host A, it is first sent to VTEP3. Similarly, VTEP3 directly encapsulates the data into blue VXLAN data where Host A is located and sends it to VTEP1. Similarly, VTEP1 receives the blue VXLAN data, searches the local MAC-VRF, finds the corresponding MAC address record, and sends it to Host A.
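The request and reply steps above can be sketched as follows. This is a simplified model, not a real data plane: the MAC addresses, IPs and VNI numbers are hypothetical, the IP-VRF is a plain dictionary, and only the destination MAC rewrite mentioned in the text is modeled.

```python
from dataclasses import dataclass

DAG_MAC = "02:00:00:00:00:01"   # hypothetical anycast gateway MAC
BLUE_VNI, YELLOW_VNI = 10, 20   # hypothetical L2 VNIs of Host A and Host B


@dataclass(frozen=True)
class Frame:
    dst_mac: str
    dst_ip: str


@dataclass(frozen=True)
class HostEntry:
    mac: str
    l2_vni: int
    vtep: str


@dataclass(frozen=True)
class VxlanPacket:
    remote_vtep: str
    vni: int
    inner: Frame


def asymmetric_ingress(frame, ip_vrf):
    """Source VTEP: route in IP-VRF, then encapsulate in the destination's L2 VNI."""
    if frame.dst_mac != DAG_MAC:
        raise ValueError("not sent to the gateway, so this is L2 traffic")
    entry = ip_vrf[frame.dst_ip]
    routed = Frame(dst_mac=entry.mac, dst_ip=frame.dst_ip)  # DAG MAC -> host MAC
    return VxlanPacket(entry.vtep, entry.l2_vni, routed)


def asymmetric_egress(packet, mac_vrfs):
    """Destination VTEP: bridge only, using the MAC-VRF of the received VNI."""
    return mac_vrfs[packet.vni][packet.inner.dst_mac]


host_a = HostEntry("02:aa:00:00:00:0a", BLUE_VNI, "vtep1")
host_b = HostEntry("02:bb:00:00:00:0b", YELLOW_VNI, "vtep3")
ip_vrf = {"10.0.1.10": host_a, "10.0.2.20": host_b}  # same content on both VTEPs
vtep1_mac_vrfs = {BLUE_VNI: {host_a.mac: "port-to-host-a"}}
vtep3_mac_vrfs = {YELLOW_VNI: {host_b.mac: "port-to-host-b"}}

request = asymmetric_ingress(Frame(DAG_MAC, "10.0.2.20"), ip_vrf)  # runs at VTEP1
reply = asymmetric_ingress(Frame(DAG_MAC, "10.0.1.10"), ip_vrf)    # runs at VTEP3
```

The request is carried in the yellow VNI and the reply in the blue VNI, which is exactly the asymmetry the mode is named after. Note that only the ingress function touches the IP-VRF; the egress side is pure bridging.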

The following picture from Arista shows that routing occurs at the source VTEP, after which VXLAN encapsulation is performed and the packet travels over the VXLAN Layer 2 network:

1.4 Symmetric IRB

Take the following figure as an example. The request and the reply travel through the same VXLAN, so the round-trip path of a complete exchange is the same. This mode is therefore called symmetric routing (Symmetric IRB).

The difference between symmetric routing and asymmetric routing is that symmetric routing has an additional black L3 VNI.

Let's take ping as an example. When Host A accesses Host Y, Host A sends a ping packet to VTEP1 (V1), to which it is connected. Since this is a cross-subnet request, Host A sets the destination MAC address of the ping packet to its gateway's MAC address (the DAG MAC address). Based on the destination MAC address, VTEP1 recognizes that this request needs L3 forwarding, so it hands the packet to IP-VRF for processing. So far this is the same as asymmetric routing. The next part is different. IP-VRF stores the L3 VNI and the MAC address of VTEP3 (V3), the VTEP behind which Host Y is located. VTEP1 replaces the destination MAC of the ping packet (originally the DAG MAC) with the MAC address of VTEP3, encapsulates the packet in VXLAN using the L3 VNI, and sends it to VTEP3 through the black VXLAN tunnel.

VTEP3 receives the black VXLAN data, first routes it in its own IP-VRF, then sends it to the corresponding MAC-VRF, and finally forwards it to Host Y. The return journey is a similar process.
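The symmetric flow can be sketched in the same style. Again this is a simplified model with hypothetical MACs, IPs and VNI numbers; the point is that both VTEPs consult the IP-VRF and that both directions use the same L3 VNI.

```python
from dataclasses import dataclass

DAG_MAC = "02:00:00:00:00:01"   # hypothetical anycast gateway MAC
L3_VNI = 5000                   # hypothetical L3 VNI (the "black" tunnel)


@dataclass(frozen=True)
class Frame:
    dst_mac: str
    dst_ip: str


@dataclass(frozen=True)
class RemoteRoute:
    vtep: str
    vtep_mac: str
    l3_vni: int


@dataclass(frozen=True)
class LocalHost:
    mac: str
    l2_vni: int
    port: str


@dataclass(frozen=True)
class VxlanPacket:
    remote_vtep: str
    vni: int
    inner: Frame


def symmetric_ingress(frame, remote_routes):
    """Source VTEP: route in IP-VRF, rewrite to the remote VTEP's MAC, use the L3 VNI."""
    if frame.dst_mac != DAG_MAC:
        raise ValueError("not sent to the gateway, so this is L2 traffic")
    route = remote_routes[frame.dst_ip]
    routed = Frame(dst_mac=route.vtep_mac, dst_ip=frame.dst_ip)
    return VxlanPacket(route.vtep, route.l3_vni, routed)


def symmetric_egress(packet, local_hosts):
    """Destination VTEP: route again in IP-VRF, then bridge in the host's MAC-VRF."""
    host = local_hosts[packet.inner.dst_ip]
    delivered = Frame(dst_mac=host.mac, dst_ip=packet.inner.dst_ip)
    return host.l2_vni, host.port, delivered


vtep1_remote = {"10.0.2.20": RemoteRoute("vtep3", "02:03:03:03:03:03", L3_VNI)}
vtep3_remote = {"10.0.1.10": RemoteRoute("vtep1", "02:01:01:01:01:01", L3_VNI)}
vtep1_local = {"10.0.1.10": LocalHost("02:aa:00:00:00:0a", 10, "port-to-host-a")}
vtep3_local = {"10.0.2.20": LocalHost("02:bb:00:00:00:0b", 20, "port-to-host-y")}

request = symmetric_ingress(Frame(DAG_MAC, "10.0.2.20"), vtep1_remote)
reply = symmetric_ingress(Frame(DAG_MAC, "10.0.1.10"), vtep3_remote)
```

Unlike the asymmetric case, `vtep1_remote` holds no per-host MAC or L2 VNI for the remote subnet; VTEP1 only needs to know which VTEP to reach and over which L3 VNI.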

The overall process is shown in the figure below:

Comparing the two routing methods: asymmetric routing (Asymmetric IRB) is simple to implement and requires no additional VXLAN allocation, but gateways for all subnets must be created on all VTEPs, even on a VTEP that has no host in a given subnet. As the asymmetric routing figure shows, VTEP1 must attach to both VXLAN tunnels even though it has no host in the yellow VXLAN. Symmetric routing (Symmetric IRB) is more complicated to implement, but a VTEP only needs to attach to the VXLANs of the hosts it manages plus the L3 VNI. The two distributed routing methods each have advantages and disadvantages, and neither can replace the other. Some vendors have implemented both methods, and some have implemented only one.
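The configuration trade-off can be expressed as a small function that returns which VNIs a single VTEP has to carry for one tenant under each mode. The function name and the VNI numbers are hypothetical.

```python
def vnis_required(mode, local_l2_vnis, tenant_l2_vnis, l3_vni):
    """Return the set of VNIs one VTEP has to configure for a tenant."""
    if mode == "asymmetric":
        # Every subnet of the tenant, whether or not a local host uses it.
        return set(tenant_l2_vnis)
    if mode == "symmetric":
        # Only the locally attached subnets plus the tenant's L3 VNI.
        return set(local_l2_vnis) | {l3_vni}
    raise ValueError(f"unknown IRB mode: {mode}")


tenant_subnets = {10, 20, 30}   # all L2 VNIs of the tenant
local_subnets = {10}            # this VTEP only has hosts in VNI 10

asymmetric = vnis_required("asymmetric", local_subnets, tenant_subnets, 5000)
symmetric = vnis_required("symmetric", local_subnets, tenant_subnets, 5000)
```

With three tenant subnets the difference is small, but the asymmetric set grows with the total number of subnets in the tenant, while the symmetric set grows only with the subnets actually attached to the VTEP.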

2. VXLAN Routing Control Plane

The sections above introduced MAC-VRF and IP-VRF. MAC-VRF implements bridging and stores L2 forwarding information. How L2 forwarding information is distributed was covered in the previous article, VXLAN with EVPN as Control Plane, so it is not repeated here; interested readers can go back and read it. Next, let's look at how EVPN distributes the L3 forwarding information required in IP-VRF.

Let's first review the format of MAC/IP Route (Route type 2) when EVPN is used as the VXLAN control layer.

When only L2 information is transmitted, two fields are optional: IP Address and L3 VXLAN ID. If L3 information needs to be transmitted, they are no longer optional. First, the IP address must be filled in so that IP-VRF can look up the encapsulation information by destination IP. If Symmetric IRB is used, the L3 VXLAN ID must also be filled in, because symmetric routing requires a dedicated VXLAN channel to carry L3 data.

Next, let's go over how the control plane distributes this data. A VTEP still obtains the IP/MAC of a locally connected host through local learning (by inspecting ARP or other packets) and then generates a BGP route. This BGP route contains both MAC forwarding information and IP forwarding information. The BGP route therefore carries two RTs (Route Targets), one for MAC-VRF and the other for IP-VRF. The VTEP on the other end imports the BGP route based on the RTs. MAC-VRF records the MAC forwarding information, and IP-VRF records the IP forwarding information, as follows:

Asymmetric IRB: IP-VRF records the host's IP, the host's corresponding L2 VNI, the host's VTEP information, and the host's MAC address. With this information, IP-VRF can complete L3 forwarding and VXLAN data encapsulation based on the destination IP address.
Symmetric IRB: IP-VRF records the host's IP, the host's corresponding L3 VNI, and the host's VTEP information, which here includes the VTEP's MAC address and is therefore more involved than in the asymmetric case. With this information, IP-VRF can complete L3 forwarding and VXLAN data encapsulation based on the destination IP address.
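The import logic described above can be sketched like this. It is a simplified model of Route Type 2 that keeps only the fields discussed in the text; the `vtep_mac` field is an assumption of this sketch standing in for however the VTEP's MAC reaches the receiver, and the RT strings and addresses are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MacIpRoute:
    """Simplified EVPN Route Type 2; only the fields discussed in the text."""
    mac: str
    l2_vni: int
    vtep: str                       # next hop: the advertising VTEP
    route_targets: frozenset
    ip: Optional[str] = None        # optional for pure L2, required for L3
    l3_vni: Optional[int] = None    # required for Symmetric IRB
    vtep_mac: Optional[str] = None  # assumption: VTEP MAC travels with the route


def import_route(route, mac_vrf_rt, ip_vrf_rt, mode, mac_vrf, ip_vrf):
    """Install one received route into the local MAC-VRF and/or IP-VRF."""
    if mac_vrf_rt in route.route_targets:
        mac_vrf[route.mac] = route.vtep
    if ip_vrf_rt in route.route_targets and route.ip is not None:
        if mode == "asymmetric":
            ip_vrf[route.ip] = {"mac": route.mac, "vni": route.l2_vni, "vtep": route.vtep}
        elif mode == "symmetric":
            if route.l3_vni is None:
                raise ValueError("Symmetric IRB needs an L3 VNI in the route")
            ip_vrf[route.ip] = {"mac": route.vtep_mac, "vni": route.l3_vni, "vtep": route.vtep}
        else:
            raise ValueError(f"unknown IRB mode: {mode}")


route = MacIpRoute(
    mac="02:bb:00:00:00:0b", l2_vni=20, vtep="vtep3",
    route_targets=frozenset({"65000:20", "65000:5000"}),
    ip="10.0.2.20", l3_vni=5000, vtep_mac="02:03:03:03:03:03",
)
mac_vrf, ip_vrf_asym, ip_vrf_sym = {}, {}, {}
import_route(route, "65000:20", "65000:5000", "asymmetric", mac_vrf, ip_vrf_asym)
import_route(route, "65000:20", "65000:5000", "symmetric", {}, ip_vrf_sym)
```

The same advertisement ends up as a host MAC plus L2 VNI in the asymmetric IP-VRF and as a VTEP MAC plus L3 VNI in the symmetric one, matching the two list items above.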

Therefore, it can be seen that by injecting the optional IP address and L3 VXLAN ID into the original Route type2, the information required for VXLAN routing can be transmitted.

This mode is suitable for hosts that are directly connected to VTEP, and the hosts of a tenant network are scattered under various VTEPs. Each VTEP is connected to hosts with discrete IP addresses.

What if the VTEP is not connected to a host but to other forwarding devices, such as a router? At this time, the VTEP will see a large number of consecutive IP addresses sent from the same MAC address (router gateway MAC address). If Route Type 2 is still used at this time, the following problems will occur:

The IP address range could be represented by an IP prefix (the CIDR of the other subnets connected through the router), but with Route Type 2 every IP address needs its own BGP route. In other words, forwarding information that a single BGP route could express has to be split into thousands of BGP routes.
When the router gateway is a floating IP, if migration occurs, the IP address will remain unchanged, but the corresponding MAC address may change. In this case, all IP addresses forwarded through the gateway need to send a BGP Route to update the MAC address changes.
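A quick calculation shows the scale of the first problem. The sketch below counts the worst case of one Route Type 2 per usable host address behind the router; the prefixes are arbitrary examples and the function name is hypothetical.

```python
import ipaddress


def type2_routes_for_prefix(prefix: str) -> int:
    """Worst case: one Route Type 2 per usable host address in the prefix."""
    return sum(1 for _ in ipaddress.ip_network(prefix).hosts())


routes_for_24 = type2_routes_for_prefix("192.168.10.0/24")
routes_for_16 = type2_routes_for_prefix("172.16.0.0/16")
```

A single /16 behind the router can therefore require tens of thousands of host routes, and in the floating-IP case every one of them would have to be re-advertised when the gateway MAC changes.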

3. Route Type 5

Therefore, to support forwarding devices connected behind a VTEP while still supporting VXLAN L3, EVPN needs to be extended further. Another EVPN application draft, IP Prefix Advertisement in EVPN (since published as RFC 9136), solves the above problems. Let's review all the new MP-BGP route types of EVPN.

This brings us to Route Type 5. Let's take a look at its format.

For VXLAN, you only need to pay attention to the following:

IP Prefix Length and IP Prefix: destination IP address segment.
GW IP Address: The gateway reachable by VTEP. The IP address segment represented by IP Prefix is further forwarded through this gateway.

Route Type 5 is not used alone, but is used in conjunction with the previous Route Type 2. VTEP first encapsulates the GW IP (such as the router gateway) to which it is connected and its corresponding MAC address and other information in Route Type 2, and then sends it to other VTEPs. In this way, other VTEPs first know how to reach the GW IP. Here, the GW IP can be regarded as a Host IP.

Next, Route Type 5 is sent. As can be seen from the figure above, the only thing Route Type 5 does is associate an IP address range (the IP Prefix) with the GW IP.

Let’s review the two questions above:

Propagation of L3 forwarding information for an IP Prefix. Now, no matter how many addresses the IP Prefix covers, only one Route Type 2 is needed to advertise the gateway and one Route Type 5 to advertise the IP Prefix. When a VTEP (IP-VRF) receives a packet whose destination IP falls within the IP Prefix, it finds the GW IP for that prefix and sends the packet to the GW IP. Once the packet reaches the GW IP, the gateway's own forwarding capability takes it the rest of the way to the real device. This is similar to forwarding across multiple routers in a physical network.
If the GW IP is a floating IP and its MAC address changes, you only need to send a BGP withdrawal for the previous Route Type 2 of the GW IP and then advertise a new Route Type 2 carrying the new MAC address. The IP Prefix route does not need to be updated at all.
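Both points can be sketched as a two-step (recursive) lookup in the IP-VRF: the prefix learned from Route Type 5 resolves to the GW IP, and the GW IP resolves through Route Type 2 to a MAC and VTEP. The class, method names and addresses are assumptions made for this illustration.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class IpVrf:
    host_routes: dict = field(default_factory=dict)    # Route Type 2: IP -> (MAC, VTEP)
    prefix_routes: dict = field(default_factory=dict)  # Route Type 5: prefix -> GW IP

    def add_type2(self, ip, mac, vtep):
        self.host_routes[ip] = (mac, vtep)

    def withdraw_type2(self, ip):
        self.host_routes.pop(ip, None)

    def add_type5(self, prefix, gw_ip):
        self.prefix_routes[ipaddress.ip_network(prefix)] = gw_ip

    def resolve(self, dst_ip):
        """Return (MAC, VTEP) for dst_ip, following a prefix to its GW IP if needed."""
        if dst_ip in self.host_routes:
            return self.host_routes[dst_ip]
        addr = ipaddress.ip_address(dst_ip)
        matches = [net for net in self.prefix_routes if addr in net]
        if not matches:
            return None
        gw_ip = self.prefix_routes[max(matches, key=lambda net: net.prefixlen)]
        return self.host_routes.get(gw_ip)


vrf = IpVrf()
vrf.add_type2("10.0.1.254", "02:aa:00:00:00:01", "vtep2")   # the router gateway
vrf.add_type5("172.16.0.0/16", "10.0.1.254")                # subnets behind it
before = vrf.resolve("172.16.5.9")

# The floating GW IP moves to a new MAC: only the Type 2 route is replaced.
vrf.withdraw_type2("10.0.1.254")
vrf.add_type2("10.0.1.254", "02:aa:00:00:00:02", "vtep2")
after = vrf.resolve("172.16.5.9")
```

After the MAC change, `prefix_routes` is untouched and every address in the /16 immediately resolves to the new MAC, which is why a single withdraw and re-advertise is enough.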

4. Conclusion

The application of EVPN in VXLAN L3 has been discussed in general. Looking back, EVPN was first proposed to solve the problem of MPLS L2 VPN caused by data layer learning (flood-learn). The solution is to add the EVPN protocol address family on the basis of MP-BGP, where E stands for Ethernet. With the development of technology, EVPN is no longer limited to the scope of L2, and when discussing EVPN, it no longer refers to a specific L2 VPN, but to a protocol address family in MP-BGP.

On the other hand, since VXLAN itself does not have a control layer, and VXLAN was originally proposed to obtain forwarding information through data layer learning, EVPN can also be used to provide a control layer for VXLAN, suppress VXLAN BUM packets, and improve the efficiency of the VXLAN data layer. As the control layer of VXLAN, EVPN also provides the transmission of L2 and L3 forwarding information. Therefore, combining EVPN and VXLAN can provide a complete overlay basic network.

Everything discussed so far stays within a single data center. VXLAN itself can build an overlay network on top of any underlay L3 network. As mentioned at the beginning of the article, if there is L3 connectivity between DCs, VXLAN can be built across them as well. Therefore, another major use of EVPN combined with VXLAN is to interconnect DCs. I hope to have the opportunity to talk about this next time.

About the author: Xiao Honghui, graduated from the Graduate School of the Chinese Academy of Sciences, has 8 years of work experience, including 6 years of experience in cloud computing development. He is active in the OpenStack community and has contributed more than 300 commits and more than 30,000 lines of code. He is currently focusing on virtual network technologies such as SDN/NFV. All opinions in this article represent the author's personal opinions only and have nothing to do with the author's current or previous company.
