Why does the cluster need an Overlay network?

Engineers with some knowledge of computer networks or Kubernetes networking have probably heard of overlay networks. An overlay network is not actually a new technology: it is a computer network built on top of another network, a form of network virtualization. The evolution of cloud computing virtualization in recent years has driven the adoption of network virtualization technology.

Figure 1 - Extended Network

Because an Overlay network is a virtual network built on top of another computer network, it cannot exist on its own. The underlying network that the Overlay depends on is called the Underlay network, and the two concepts often appear in pairs.

The underlay network is the infrastructure layer that actually carries user IP traffic. The relationship between the underlay and the overlay is somewhat like that between physical machines and virtual machines: the underlay network and the physical machine are both real entities, corresponding to actual network devices and computing devices, while the overlay network and the virtual machine are virtual layers built on top of those entities in software.

Figure 2 - Networking and computing

Before analyzing the role of the Overlay network, we need a general understanding of how it is commonly implemented. In practice, Overlay networks are usually built with Virtual Extensible LAN (VxLAN) technology. In the figure below, two physical machines can reach each other through a Layer 3 IP network:

Figure 3 - Overlay network composed of VxLAN

VxLAN uses virtual tunnel endpoint (VTEP) devices to add a second layer of encapsulation to the packets a server sends, and to strip it from the packets it receives.

In the figure above, the two VTEPs connect to each other and exchange information such as the MAC and IP addresses in the network. For example, the VTEP on Server 1 needs to know that to reach the virtual machine 10.0.0.2 in the green network, it must first reach Server 2 at IP address 204.79.197.200. These mappings can be configured manually by a network administrator, learned automatically, or distributed by an upper-level controller. When the green virtual machine 10.0.0.1 wants to send data to the green 10.0.0.2, the following steps take place (a code sketch of the same steps appears after the list):

Figure 4 - Data packets in the overlay network

(1) The green 10.0.0.1 sends the IP packet to the VTEP;

(2) When the VTEP on Server 1 receives the packet sent by 10.0.0.1, it will:

  • read the destination virtual machine's MAC address from the received Ethernet frame;
  • look up that MAC address in the local forwarding table to find the IP address of the server it lives on, which is 204.79.197.200;
  • construct a new UDP packet whose payload is the virtual network identifier (VxLAN Network Identifier, VNI) of the green network together with the original frame;
  • send the new UDP packet out to the network;

(3) When the VTEP on Server 2 receives the UDP packet, it will:

  • remove the protocol headers from the UDP packet;
  • check the VNI carried in the packet;
  • forward the inner frame to the target green virtual machine 10.0.0.2;

(4) The green 10.0.0.2 receives the packet sent by the green 10.0.0.1;
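
To make the four steps concrete, here is a minimal, self-contained Go sketch of the VTEP's work on both sides of the tunnel. The forwarding table, MAC address, and function names are hypothetical placeholders that mirror Figure 4; only the 8-byte header layout and UDP port 4789 come from RFC 7348, and a real VTEP lives in the kernel or a virtual switch rather than in application code.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// fdb is a hypothetical forwarding table: inner destination MAC -> IP of the
// VTEP that owns it. Real VTEPs learn or receive these entries; the values
// here mirror Figure 4.
var fdb = map[string]string{
	"0a:58:0a:00:00:02": "204.79.197.200", // green 10.0.0.2 lives behind Server 2
}

// encapsulate performs step (2): look up the remote VTEP for the destination
// MAC and prepend the 8-byte VxLAN header carrying the VNI. The result would
// be sent as a UDP datagram to the remote VTEP on port 4789.
func encapsulate(frame []byte, dstMAC string, vni uint32) (string, []byte, error) {
	vtepIP, ok := fdb[dstMAC]
	if !ok {
		return "", nil, fmt.Errorf("no VTEP known for %s", dstMAC)
	}
	packet := make([]byte, 8+len(frame))
	packet[0] = 0x08                                // I flag: the VNI field is valid
	binary.BigEndian.PutUint32(packet[4:8], vni<<8) // 24-bit VNI in bytes 4-6
	copy(packet[8:], frame)
	return vtepIP, packet, nil
}

// decapsulate performs step (3): validate the header, read the VNI, and
// return the original frame for local delivery in that virtual network.
func decapsulate(packet []byte) (uint32, []byte, error) {
	if len(packet) < 8 || packet[0]&0x08 == 0 {
		return 0, nil, fmt.Errorf("not a valid VxLAN packet")
	}
	vni := binary.BigEndian.Uint32(packet[4:8]) >> 8
	return vni, packet[8:], nil
}

func main() {
	frame := []byte("inner Ethernet frame from 10.0.0.1 to 10.0.0.2")
	vtepIP, packet, _ := encapsulate(frame, "0a:58:0a:00:00:02", 100)
	fmt.Printf("send %d bytes over UDP to %s:4789\n", len(packet), vtepIP)

	vni, inner, _ := decapsulate(packet)
	fmt.Printf("VNI %d, payload: %s\n", vni, inner)
}
```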

During the transmission, neither party is aware of the conversions performed by the underlying network. They believe they can reach each other over a Layer 2 network, but in fact the traffic crosses the Layer 3 IP network through the tunnel established between the VTEPs. Besides VxLAN, there are many other implementations of Overlay networks, and they all work in a similar way. Although an Overlay network can use the underlying network to form a Layer 2 network spanning multiple data centers, its packet encapsulation and decapsulation add overhead, so why do our clusters need Overlay networks at all? This article introduces three problems that Overlay networks solve:

  • Migration of virtual machines and instances within a cluster, across clusters, or between data centers is common in cloud computing;
  • The number of virtual machines in a single cluster may be very large, and the resulting flood of MAC addresses and ARP requests puts tremendous pressure on network devices;
  • The traditional network isolation technology, VLAN, can only express 4096 virtual networks, while public clouds and large-scale virtualization clusters need far more virtual networks to meet their isolation requirements.

Virtual Machine Migration

Kubernetes is now the de facto standard in container orchestration. Although many traditional industries still deploy services on physical machines, more and more computing workloads will run on virtual machines in the future. Virtual machine migration is the process of moving a virtual machine from one physical device to another. Because of routine updates and maintenance, large-scale virtual machine migration within a cluster is common. A large cluster of thousands of physical machines makes resource scheduling easier: we can use migration to improve resource utilization, tolerate virtual machine failures, and improve node portability.

When the host a virtual machine runs on goes down for maintenance or other reasons, the instance needs to be migrated to another host. To keep the service uninterrupted, the IP address must remain unchanged throughout the migration. Because the Overlay implements the Layer 2 network on top of the network layer, a virtual LAN can span multiple physical machines as long as they are reachable at Layer 3. After a virtual machine or container migrates, it is still in the same Layer 2 network, so its IP address does not need to change.

Figure 5 - VM migration across data centers

As shown in the figure above, although the migrated virtual machine is in a different data center from the others, the two data centers are connected over the IP protocol, so the migrated virtual machine can still form a Layer 2 network with the virtual machines in the original cluster through the Overlay network. The hosts within the cluster are completely unaware of, and do not care about, the underlying network architecture; all they know is that the virtual machines can reach one another.

Virtual Machine Scale

We introduced in Why MAC addresses do not need to be globally unique that communication in a Layer 2 network depends on MAC addresses, and that a traditional Layer 2 network requires its network devices to store a translation table from IP addresses to MAC addresses.

Currently, the largest cluster officially supported by Kubernetes is 5,000 nodes. If each of these 5,000 nodes ran only a single container, the pressure on the internal network devices would be modest. In reality, however, a 5,000-node cluster contains tens of thousands or even hundreds of thousands of containers, and when one container broadcasts an ARP request, every container in the cluster receives it, creating an extremely high network load.

In an Overlay network built with VxLAN, the data sent by virtual machines is re-encapsulated into IP packets, so the physical network only needs to know the MAC addresses of the VTEPs. This shrinks the MAC address table from hundreds of thousands of entries down to a few thousand, and ARP requests spread only among the VTEPs in the cluster; after a remote VTEP decapsulates the data, it broadcasts it only locally and does not disturb other VTEPs. This still places real demands on the cluster's network devices, but it greatly reduces the pressure on the core network devices (a back-of-the-envelope calculation follows the figure below).

Figure 6 - ARP request in the overlay network
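
A rough calculation shows the order of magnitude involved. The node and container counts below are assumptions chosen purely for illustration, not measurements:

```go
package main

import "fmt"

func main() {
	// Hypothetical sizes: a 5,000-node cluster where each node runs
	// 30 containers, each container with its own MAC address.
	nodes, containersPerNode := 5000, 30

	withoutOverlay := nodes * containersPerNode // every container MAC visible to the fabric
	withOverlay := nodes                        // only one VTEP per node is visible

	fmt.Println("MAC entries without an overlay:", withoutOverlay) // 150000
	fmt.Println("MAC entries with VxLAN VTEPs:", withOverlay)      // 5000
}
```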

Overlay networks are closely related to software-defined networking (SDN)[^4]. SDN separates the data plane, which forwards packets, from the control plane, which computes and distributes forwarding tables. VxLAN's RFC 7348 defines only the data plane: a network built from it can learn MAC and ARP entries through the traditional flood-and-learn mechanism[^5], but in large-scale clusters we still want a control plane to distribute the forwarding tables, as sketched below.
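
The sketch below illustrates that division of labor. The controller type and its announce method are hypothetical; real control planes (BGP EVPN, for example) are far more elaborate, but the principle is the same: forwarding entries are pushed to the VTEPs instead of being flooded and learned.

```go
package main

import "fmt"

// fdbEntry is one control-plane announcement: which VTEP owns a MAC in a VNI.
type fdbEntry struct {
	vni    uint32
	mac    string
	vtepIP string
}

// controller is a hypothetical control plane. It pushes every new entry to
// each VTEP agent, so the data plane never has to flood unknown destinations
// across the cluster to discover them.
type controller struct {
	vteps []chan fdbEntry // one push channel per VTEP agent
}

func (c *controller) announce(e fdbEntry) {
	for _, v := range c.vteps {
		v <- e // distribute the entry to the VTEP's local forwarding table
	}
}

func main() {
	vtep := make(chan fdbEntry, 1)
	c := &controller{vteps: []chan fdbEntry{vtep}}

	// A new VM comes up behind Server 2; the controller tells every VTEP.
	c.announce(fdbEntry{vni: 100, mac: "0a:58:0a:00:00:02", vtepIP: "204.79.197.200"})
	fmt.Println("VTEP learned without flooding:", <-vtep)
}
```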

Network Isolation

Large-scale data centers often provide cloud computing services externally, and the same physical cluster may be split into small blocks assigned to different tenants. Because Layer 2 data frames may be broadcast, these tenants must be isolated from each other for security, preventing their traffic from interfering with, or even maliciously attacking, one another. Traditional network isolation uses virtual LAN (VLAN) technology, which represents the virtual network ID with 12 bits, for an upper limit of 4096 virtual networks.

Figure 7 - VLAN protocol header

4096 virtual networks are far from enough for a large-scale data center. VxLAN instead uses a 24-bit VNI, which can represent 16,777,216 virtual networks in total, enough to meet the multi-tenant network isolation needs of a data center.

Figure 8 - VxLAN protocol header

More virtual networks, however, are really just a side benefit of VxLAN and should not be the decisive factor for adopting it. IEEE 802.1ad, an extension of the VLAN protocol, allows two 802.1Q headers to be added to an Ethernet frame, and the 24 bits formed by the two VLAN IDs can likewise represent 16,777,216 virtual networks[^6]. Solving network isolation is therefore not, by itself, a sufficient reason to use VxLAN or an Overlay network.
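
For reference, the ID-space arithmetic behind this subsection is just powers of two, using only the header field widths named above:

```go
package main

import "fmt"

func main() {
	fmt.Println("802.1Q VLAN IDs (12 bits):", 1<<12)            // 4096
	fmt.Println("VxLAN VNIs (24 bits):", 1<<24)                 // 16777216
	fmt.Println("802.1ad double tag (12+12 bits):", 1<<(12+12)) // also 16777216
}
```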

Summary

Today's data centers contain multiple clusters and huge numbers of physical machines. The Overlay network is an intermediate layer between the virtual machines and the underlying network devices. Through it, we can solve the problem of virtual machine migration, reduce the pressure on Layer 2 core network devices, and provide a much larger number of virtual networks:

  • When VxLAN forms the Layer 2 network, reachability at Layer 2 is preserved even when virtual machines migrate across clusters, availability zones, and data centers, which helps keep online services available, improves cluster resource utilization, and tolerates virtual machine and node failures;
  • The number of virtual machines in a cluster may be dozens of times the number of physical machines, so a virtualized cluster can contain one or two orders of magnitude more MAC addresses than a traditional physical cluster, more than network devices can comfortably bear; Overlay networks use IP encapsulation and a control plane to reduce the MAC table entries and ARP requests inside the cluster;
  • The VxLAN protocol header uses a 24-bit VNI to identify a virtual network, representing about 16 million virtual networks in total, and we can allocate network bandwidth to each virtual network separately to meet the network isolation requirements of multiple tenants.

Note that an Overlay network is only a virtual network on top of the physical network: using this technology does not directly solve scalability problems in the cluster, and VxLAN is not the only way to build an Overlay network. We can consider different technologies in different scenarios, such as NVGRE and GRE. Finally, here are some relatively open questions that interested readers may want to think through:

  • VxLAN encapsulates the original packets in UDP for transmission across the network. What methods do NVGRE and STT use to transmit data?
  • What technologies or software can be used to deploy an Overlay network in Kubernetes?
