Combining VXLAN and EVPN

EVPN has been one of the hottest networking technologies of recent years. If you haven't heard of it, your networking skills may be getting out of date, so go and read the previous article, "EVPN Introduction", first! EVPN stands for Ethernet VPN. As the name suggests, it is an L2 VPN implementation. In fact, that is exactly how it was first proposed: as the next-generation L2 VPN, meant to improve on VPLS (Virtual Private LAN Service). The original EVPN is therefore a set of control plane and data plane technologies for L2 VPN across a WAN (Wide Area Network), with MPLS as the data plane. All of this is covered in "EVPN Introduction". Today we look at a different application of EVPN: combining the EVPN control plane with VXLAN.

As a control plane, EVPN can typically work with three data planes: MPLS, PBB, and NVO. NVO (Network Virtualization Overlay) includes VXLAN (Virtual eXtensible Local Area Network).

Traditional VXLAN Working Mode

VXLAN is an overlay network technology. I assume readers already have some understanding of overlays and VXLAN, so I will not explain them here. Let's look at how a VXLAN network works.

VTEP

First, let's look at the key component of a VXLAN network, the VTEP (VXLAN Tunnel Endpoint). A VTEP is a network device, and VXLAN traffic travels between VTEPs. Logically, a VTEP has two interfaces: an uplink and a downlink. The uplink connects to the Underlay network; original frames are encapsulated in VXLAN format and sent across the Underlay through the uplink. The downlink connects to the Overlay network; original frames enter and leave the VTEP through the downlink. A VTEP can therefore be regarded as an edge device that connects the Overlay and Underlay networks.

For example, when a VLAN 100 frame in the Overlay reaches the VTEP through the downlink, it is first mapped to VXLAN ID 1001. The VTEP then searches its L2 Table for the corresponding remote VTEP, using the destination MAC address of the original frame and the VXLAN ID just obtained. If an entry is found, the original Ethernet frame is encapsulated into a VXLAN packet and sent out through the uplink.

The peer VTEP receives the VXLAN packet on its uplink, decapsulates it to recover the original Ethernet frame, maps the VXLAN ID back to a VLAN ID, tags the frame with VLAN 100, and finally sends it out through the downlink. In this way, the VLAN 100 networks behind the two VTEPs are connected. (Note: although both sides use VLAN 100 here, the VLAN IDs mapped to the same VXLAN ID on the two VTEPs can actually differ.)
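The lookup on the sending VTEP can be sketched in a few lines of Python. This is only an illustration of the two steps described above (VLAN to VXLAN ID, then VXLAN ID plus destination MAC to remote VTEP); the table contents, the MAC and IP values, and the function name `lookup_remote_vtep` are assumptions made up for the example.

```python
from typing import Optional

# Per-VTEP VLAN -> VXLAN ID mapping (illustrative values).
VLAN_TO_VNI = {100: 1001}

# L2 Table: (VXLAN ID, destination MAC) -> remote VTEP IP (illustrative values).
L2_TABLE = {(1001, "00:00:00:00:00:0b"): "10.0.0.2"}


def lookup_remote_vtep(vlan_id: int, dst_mac: str) -> Optional[str]:
    """Map the VLAN to its VXLAN ID, then look up the remote VTEP.

    Returns None when no entry exists, which is the case handled by
    flood-learn or by a control plane.
    """
    vni = VLAN_TO_VNI[vlan_id]
    return L2_TABLE.get((vni, dst_mac))


known = lookup_remote_vtep(100, "00:00:00:00:00:0b")
unknown = lookup_remote_vtep(100, "00:00:00:00:00:0c")
```

Because the key includes the VXLAN ID rather than the VLAN ID, each VTEP can keep its own VLAN numbering, which is why the VLAN IDs on the two sides do not have to match.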

The original Ethernet Frame is encapsulated into an IP/UDP packet, and the data transmission becomes the IP/UDP packet transmission between VTEPs. The network between VTEPs can be a Layer 2 network, a Layer 3 network, or even more complex, but this is transparent to VLAN 100.
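As a rough sketch of what the encapsulation adds, the 8-byte VXLAN header from RFC 7348 can be built with Python's `struct` module. The header layout (a flags byte with the VNI-valid bit `0x08`, a 24-bit VXLAN ID, reserved bits elsewhere) and UDP destination port 4789 come from the RFC; the function names are illustrative, and the outer Ethernet, IP, and UDP headers are left out.

```python
import struct

VXLAN_UDP_PORT = 4789         # IANA-assigned destination port for VXLAN
VXLAN_FLAG_VNI_VALID = 0x08   # "I" flag: the VXLAN ID field is valid


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit VXLAN ID."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VXLAN ID must fit in 24 bits")
    # Word 1: flags in the top byte, then 24 reserved bits.
    # Word 2: VXLAN ID in the top 24 bits, then 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Read the VXLAN ID back out of a VXLAN header."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return vni_word >> 8


def encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Return the UDP payload: VXLAN header followed by the original frame."""
    return pack_vxlan_header(vni) + inner_frame
```

The result of `encapsulate` is what travels as the UDP payload between the two VTEP IP addresses; the Underlay only ever sees that outer IP/UDP packet.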

flood-learn

In the previous example, if the corresponding Remote VTEP is not found in the VTEP L2 Table, flood-learn is used to obtain the VTEP of the other end.

To better describe flood-learn, assume the leftmost VM already knows the destination MAC (the entry in the VTEP L2 Table has aged out, but the ARP cache in the VM has not). When the leftmost VM pings the rightmost VM, the ping packet is sent to the VTEP. Since the VTEP cannot find the corresponding remote VTEP, it does the following:

  • It encapsulates the original Ethernet frame in VXLAN format, with a multicast address as the outer destination IP address of the VXLAN packet.
  • It sends the VXLAN packet to all other VTEPs in the multicast group.

This is the flood process. Because every VTEP in the multicast group is a receiver, the rightmost VM receives the flooded ping packet. The rightmost VTEP first learns the MAC address, VXLAN ID, and corresponding VTEP of the leftmost VM from that packet. With this information, the rightmost VM's reply is sent directly to the leftmost VTEP. The leftmost VTEP in turn learns the MAC address, VXLAN ID, and corresponding VTEP of the rightmost VM from the reply and records them in its own L2 Table. This is the learn process. The difference from flood-learn in a switch is that a switch records the mapping between a switch port and a MAC, while a VTEP records the mapping between a remote VTEP (IP address) and a MAC.

Next time the leftmost VM wants to access the rightmost VM, it does not need to flood anymore. It can directly check the VTEP L2 Table to find the corresponding remote VTEP.
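The whole flood-learn exchange can be imitated with a toy model. The sketch below is a simplification under stated assumptions: the `Vtep` class, the plain Python list standing in for the multicast group, and the MAC and IP strings are all invented for illustration, and no real packets are built.

```python
class Vtep:
    """Toy VTEP that learns remote MACs purely from received traffic."""

    def __init__(self, ip, group):
        self.ip = ip
        self.group = group        # stand-in for the multicast group
        self.local_macs = set()   # MACs attached to the downlink
        self.l2_table = {}        # (vni, mac) -> remote VTEP IP
        self.flood_count = 0
        group.append(self)

    def send(self, vni, src_mac, dst_mac):
        remote_ip = self.l2_table.get((vni, dst_mac))
        if remote_ip is None:
            # Unknown destination: flood to every other group member.
            self.flood_count += 1
            targets = [v for v in self.group if v is not self]
        else:
            # Known destination: unicast to the recorded remote VTEP.
            targets = [v for v in self.group if v.ip == remote_ip]
        for vtep in targets:
            vtep.receive(vni, src_mac, dst_mac, outer_src_ip=self.ip)

    def receive(self, vni, src_mac, dst_mac, outer_src_ip):
        # Learn: the inner source MAC is reachable via the outer source IP.
        self.l2_table[(vni, src_mac)] = outer_src_ip
        return dst_mac in self.local_macs  # True if delivered locally


group = []
left, middle, right = (Vtep(ip, group) for ip in ("10.0.0.1", "10.0.0.2", "10.0.0.3"))
left.local_macs.add("mac-a")
right.local_macs.add("mac-b")

left.send(1001, "mac-a", "mac-b")   # destination unknown: flooded
right.send(1001, "mac-b", "mac-a")  # reply: unicast straight to the left VTEP
left.send(1001, "mac-a", "mac-b")   # destination now known: no second flood
```

After the three calls, the left VTEP has flooded exactly once, and the middle VTEP has learned `mac-a` from the flood but knows nothing about `mac-b`, since the reply was unicast.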

So we can see that VXLAN forwarding information can be obtained through flood-learn in the data plane alone. VXLAN can work without a control plane, much like VPLS!

EVPN control plane

VXLAN is defined in RFC 7348. The RFC defines only data plane behavior and does not specify a control plane for VXLAN. In the early days of VXLAN, forwarding information was learned through the data plane, which was relatively simple to implement and lowered the barrier for vendors to support VXLAN. However, as networks grow, relying entirely on the data plane can cause broadcast and multicast storms, so VXLAN also needs a control plane.

An SDN controller can also serve as the VXLAN control plane. SDN controllers are commonly used in OpenStack to control Open vSwitch, managing VTEPs directly through OVSDB and OpenFlow flow tables. This is interesting and powerful, but it is a separate topic. This article only discusses EVPN as the control plane.

EVPN as the control plane for NVO is defined in the IETF draft draft-ietf-bess-evpn-overlay (since published as RFC 8365). The previous article mentioned that the EVPN implementation is modeled on BGP/MPLS L3 VPN. When EVPN is used as the VXLAN control plane, it keeps the same architecture, but the components of the architecture change.

The specific transformations include:

  • The PE device becomes a VTEP, sometimes also called an NVE (Network Virtualization Edge). The MP-BGP sessions are correspondingly established between VTEPs.
  • The data plane becomes VXLAN, carried over the Underlay network.
  • The CE device becomes a server, which can be a virtual server or a physical server.

Control plane data transmission

As in traditional EVPN, learning between the server and the VTEP is still local: the VTEP learns the MAC address and port of each locally attached device by reading Ethernet frames. Between VTEPs, MAC/IP routes (Route Type 2) are exchanged through MP-BGP. There is one small difference. With MPLS as the data plane, the MAC/IP route carries an MPLS Label; with VXLAN as the data plane, it carries the VXLAN ID. Conveniently, the VXLAN ID is also 3 bytes long, so it fits in the field originally used for the MPLS Label. The corresponding NLRI information is as follows:
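To make the field layout concrete, here is a hedged Python sketch that packs a MAC/IP Advertisement route following the field order given in RFC 7432 (Route Distinguisher, Ethernet Segment Identifier, Ethernet Tag ID, MAC, IP, label), with the 3-byte label field carrying the VXLAN ID. The function name and all example values are assumptions for illustration; the optional second label field is not modeled, and this is not a complete BGP UPDATE.

```python
import ipaddress
import struct

ROUTE_TYPE_MAC_IP = 2  # EVPN Route Type 2


def pack_mac_ip_route(rd: bytes, esi: bytes, eth_tag: int,
                      mac: str, ip: str, vni: int) -> bytes:
    """Pack an EVPN Type 2 NLRI with the VXLAN ID in the 3-byte label field."""
    if len(rd) != 8 or len(esi) != 10:
        raise ValueError("RD must be 8 bytes and ESI must be 10 bytes")
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must be 6 bytes")
    # The IP address is optional: 0, 4 (IPv4), or 16 (IPv6) bytes.
    ip_bytes = ipaddress.ip_address(ip).packed if ip else b""
    body = (
        rd
        + esi
        + struct.pack("!I", eth_tag)
        + bytes([48]) + mac_bytes                # MAC length in bits, then MAC
        + bytes([len(ip_bytes) * 8]) + ip_bytes  # IP length in bits, then IP
        + vni.to_bytes(3, "big")                 # label field reused for VXLAN ID
    )
    return bytes([ROUTE_TYPE_MAC_IP, len(body)]) + body


# Illustrative values only: a type 0 RD (AS 65000, number 1) and an all-zero ESI.
example_rd = struct.pack("!HHI", 0, 65000, 1)
example_route = pack_mac_ip_route(example_rd, bytes(10), 0,
                                  "00:00:00:00:00:0a", "192.168.1.10", 1001)
```

The last three bytes of the packed route hold the VXLAN ID, which is the only place where this encoding differs from the MPLS case described above.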

MAC/IP routes are sent to peer VTEPs via MP-BGP. In practice, iBGP sessions must form a full mesh (every pair of speakers is directly connected), so to reduce the configuration burden a BGP RR (Route Reflector) is usually introduced. A BGP RR reflects the routes it receives from one BGP speaker to all other connected BGP peers. With a BGP RR, each BGP speaker only needs a session with the RR. Otherwise, under full mesh, every BGP speaker must establish a BGP session with every other BGP peer.

So, in an environment with a BGP RR, the network topology looks like the figure below. Doesn't it look a lot like a Spine-Leaf network?

After each VTEP learns its local MAC addresses, it sends them to the BGP RR through MP-BGP. The BGP RR then reflects the received MAC forwarding information to all other VTEPs. After reflection, every VTEP holds the MAC forwarding information of all other VTEPs, as shown in the following figure:

Take a look at the L2 Table of each VTEP in the figure. The first column is the MAC address, the second column is the corresponding remote VTEP (for a remote MAC) or the port on the current VTEP (for a local MAC), and the third column is the VXLAN ID. These three columns were mentioned when introducing the VTEP. The fourth column is used for MAC Mobility (MAC migration), which is introduced separately later.
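A quick calculation shows why the RR helps. The helper names below are illustrative, and the second function assumes each VTEP peers with every reflector and that the reflectors are separate devices.

```python
def full_mesh_sessions(speakers: int) -> int:
    """Sessions needed when every pair of BGP speakers peers directly."""
    return speakers * (speakers - 1) // 2


def route_reflector_sessions(clients: int, reflectors: int = 1) -> int:
    """Sessions needed when each client peers only with the reflectors."""
    return clients * reflectors


sessions_mesh = full_mesh_sessions(10)      # 45
sessions_rr = route_reflector_sessions(10)  # 10
```

With 10 VTEPs, a full mesh needs 45 sessions, while a single RR needs only 10, and the gap widens quadratically as VTEPs are added.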

In this way, the control plane data is distributed to each VTEP.
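The distribution step can be sketched as a toy model in which a reflector copies each advertised MAC to every other VTEP. The class names, the tuple layout (next hop, VXLAN ID, sequence number), and the server and port names are assumptions chosen to mirror the four table columns described above; real MP-BGP sessions and route attributes are not modeled.

```python
class RouteReflector:
    """Toy RR: forwards each advertisement to all clients except the sender."""

    def __init__(self):
        self.clients = []

    def advertise(self, sender, mac, vni, seq=0):
        for client in self.clients:
            if client is not sender:
                # Remote entry: the next hop is the advertising VTEP's IP.
                client.l2_table[mac] = (sender.ip, vni, seq)


class Vtep:
    def __init__(self, ip, rr):
        self.ip = ip
        self.rr = rr
        self.l2_table = {}  # mac -> (next hop, VXLAN ID, sequence number)
        rr.clients.append(self)

    def learn_local(self, mac, port, vni):
        # Local entry: the next hop is the downlink port, not a VTEP IP.
        self.l2_table[mac] = (port, vni, 0)
        self.rr.advertise(self, mac, vni)


rr = RouteReflector()
vtep1, vtep2, vtep3 = (Vtep(ip, rr) for ip in ("10.0.0.1", "10.0.0.2", "10.0.0.3"))
vtep1.learn_local("mac-server-a", "port1", 1001)
vtep2.learn_local("mac-server-b", "port1", 1001)
```

After these two local learning events, every VTEP already knows where both servers are, without any flooding in the data plane.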

Data plane data transmission

With the control plane data in place, the data plane is much simpler. Suppose Server A wants to reach Server B. The local VTEP looks up its L2 Table, finds VTEP2, encapsulates the frame as VXLAN, and sends it to VTEP2. VTEP2 decapsulates the VXLAN packet and forwards the frame to the locally attached Server B. From the data plane's point of view, the result is the same with or without EVPN. EVPN is only responsible for the VXLAN control plane, that is, for distributing MAC forwarding information, and has no effect on the VXLAN data plane.

This is how EVPN works as the control plane of VXLAN. Not too complicated, is it? Next, let's take a look at MAC Mobility.

MAC Mobility

MAC Mobility is defined in RFC 7432, which means it is not specific to VXLAN. Let's first look at the problem MAC Mobility solves.

In practice, servers are often migrated, for example through virtual machine migration or the relocation of a physical machine room. Take the figure above as an example. When Server A migrates from VTEP1 to VTEP3, VTEP3 discovers Server A through local data plane learning (reading the Ethernet header of an ARP or DHCP packet). Normally, the MP-BGP process on VTEP3 would advertise this newly learned MAC to the other VTEPs in a Route Type 2 route. But now there are two problems. First, VTEP3 already holds MAC forwarding information for Server A saying that Server A is behind VTEP1, so VTEP3's local data conflicts. Second, VTEP1 and VTEP2 also hold MAC forwarding information for Server A. How should they handle the new forwarding information sent by VTEP3?

You might say that the later update simply overwrites the earlier one. But EVPN, or rather MP-BGP, is an L7 protocol, and the update that arrives later is not necessarily the newer data. For example, Server A migrates to VTEP3, VTEP3 learns Server A's MAC locally and advertises it, and VTEP2 receives this route. Because of network congestion, however, VTEP1's older route for Server A reaches VTEP2 a little later. If later arrivals always overwrite earlier ones, the old information overwrites the new.

Before discussing the solution, let's first review the MP-BGP Route Types and BGP Extended Communities newly added by EVPN.

MAC Mobility is implemented with the Extended Community marked in the figure. A BGP Extended Community is auxiliary information attached to BGP NLRI information. RT (Route Target) is the most commonly used BGP Extended Community. The format of the MAC Mobility Extended Community is defined as follows:
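For reference, RFC 7432 lays this community out as 8 bytes: a Type of 0x06, a Sub-Type of 0x00, a one-byte Flags field, one reserved byte, and a 4-byte Sequence Number. A minimal Python sketch of that layout follows; the function names are illustrative, and the Flags field is simply passed through as a number without interpreting it.

```python
import struct

EVPN_EXT_COMMUNITY_TYPE = 0x06
MAC_MOBILITY_SUBTYPE = 0x00


def pack_mac_mobility(sequence: int, flags: int = 0) -> bytes:
    """Pack the 8-byte MAC Mobility Extended Community."""
    return struct.pack("!BBBBI", EVPN_EXT_COMMUNITY_TYPE,
                       MAC_MOBILITY_SUBTYPE, flags, 0, sequence)


def unpack_sequence(community: bytes) -> int:
    """Return the Sequence Number from a MAC Mobility Extended Community."""
    type_, subtype, _flags, _reserved, sequence = struct.unpack("!BBBBI", community)
    if (type_, subtype) != (EVPN_EXT_COMMUNITY_TYPE, MAC_MOBILITY_SUBTYPE):
        raise ValueError("not a MAC Mobility Extended Community")
    return sequence
```

The 4-byte Sequence Number is the only field the rest of this section relies on.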

The working process is as follows. When a VTEP learns a MAC address through local data plane learning and its L2 Table has no record of that MAC address, the MAC/IP route it then advertises carries a MAC Mobility Extended Community with Sequence Number 0. (The community can also be omitted, in which case the sequence number defaults to 0.) This is the meaning of the 0 in the fourth column in the diagram above.

When a VTEP learns a MAC address through local data plane learning and its L2 Table already has a record of that MAC address, the VTEP first updates its own L2 Table, overwriting the previous MAC forwarding information. The MAC/IP route it then advertises must carry a MAC Mobility Extended Community whose Sequence Number is the value in the original record plus 1. When other VTEPs receive this MAC/IP route and compare it with their local records, they see that it is newer MAC forwarding information and overwrite the original record. If the Sequence Number in a received MAC/IP route is smaller than the one currently recorded, the route is discarded.

The Sequence Number in MAC Mobility can be regarded as the version number of MAC forwarding information. A higher version can overwrite a lower version.
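The version-number behavior can be sketched as follows. This is a simplified model, not an implementation of RFC 7432: the `L2Table` class and its method names are invented for the example, a route with an equal sequence number is simply accepted, and re-learning a MAC on the same local port is assumed not to bump the sequence number.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    next_hop: str  # remote VTEP IP, or local port name
    vni: int
    seq: int = 0


class L2Table:
    def __init__(self):
        self.entries = {}

    def learn_local(self, mac, port, vni):
        """Record a locally learned MAC; return the sequence number to advertise."""
        old = self.entries.get(mac)
        if old is None:
            seq = 0              # first sighting of this MAC
        elif old.next_hop == port:
            return old.seq       # assumption: same place, nothing moved
        else:
            seq = old.seq + 1    # MAC moved here: bump the version
        self.entries[mac] = Entry(port, vni, seq)
        return seq

    def receive_route(self, mac, vtep_ip, vni, seq=0):
        """Apply a received MAC/IP route; return False if it is stale."""
        old = self.entries.get(mac)
        if old is not None and seq < old.seq:
            return False         # older version: discard
        self.entries[mac] = Entry(vtep_ip, vni, seq)
        return True


# Server A starts behind VTEP1 (10.0.0.1), then migrates to VTEP3 (10.0.0.3).
vtep2, vtep3 = L2Table(), L2Table()
vtep2.receive_route("mac-a", "10.0.0.1", 1001, seq=0)
vtep3.receive_route("mac-a", "10.0.0.1", 1001, seq=0)

new_seq = vtep3.learn_local("mac-a", "port1", 1001)                    # becomes 1
accepted = vtep2.receive_route("mac-a", "10.0.0.3", 1001, seq=new_seq)
stale = vtep2.receive_route("mac-a", "10.0.0.1", 1001, seq=0)          # delayed old route
```

In the example, the delayed route from VTEP1 arrives at VTEP2 after the newer one from VTEP3 and is rejected, so arrival order no longer matters.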

Seen from another angle, without a MAC Mobility mechanism in the control plane, a migrated server can only be relearned after the stale entries in the L2 Table age out and flood-learn runs again, which takes a relatively long time. This is one of the benefits of EVPN as a control plane.

Summary

EVPN was proposed as the next generation of L2 VPN, and an L2 VPN is really a logical Layer 2 network on top of a WAN. Compare L2 VPN with an Overlay technology such as VXLAN and there are many similarities: both have an Overlay and an Underlay, the Overlay is an L2 network in both, and both have corresponding edge devices. It is precisely because of these similarities that EVPN, originally the control plane of L2 VPN, can serve as the control plane of an Overlay network such as VXLAN. A VXLAN network with EVPN as its control plane not only reduces the amount of broadcast and multicast traffic in the network, but also gains other advantages of EVPN, such as the MAC Mobility introduced in this article. In a VXLAN Fabric architecture that uses EVPN as the control plane, the full-featured BGP protocol (to be precise, MP-BGP) makes it possible to connect different PODs and even different sites efficiently. From this perspective, EVPN as the control plane of VXLAN is no less useful than EVPN as an L2 VPN.

About the author: Xiao Honghui, graduated from the Graduate School of the Chinese Academy of Sciences, has 8 years of work experience, including 6 years of experience in cloud computing development. He is active in the OpenStack community and has contributed more than 300 commits and more than 30,000 lines of code. He is currently focusing on virtual network technologies such as SDN/NFV. All opinions in this article represent the author's personal opinions only and have nothing to do with the author's current or previous company.
