XDP technology for high-performance network framework

1. Basic Concepts of XDP

XDP stands for eXpress Data Path, a high-performance, programmable packet-processing framework provided by the Linux kernel. XDP takes over packets in the RX path of the network card, processes them at speed by running eBPF instructions inside the kernel, and hands the remaining traffic off seamlessly to the kernel protocol stack.

XDP is not a kernel bypass; it adds a fast data path between the network card and the kernel protocol stack. Because it is built on eBPF, XDP inherits eBPF's key properties: programmability, live updates without restarts, and safety.

XDP offload to SmartNICs is an extension of the XDP concept. On a SmartNIC that can execute eBPF instructions, the XDP program that would otherwise run on the host CPU is loaded onto the NIC itself, which saves CPU resources and offloads packet-processing rules to hardware at the same time.

With the help of eBPF, XDP provides a high-performance network-processing framework in which users define packet-processing behavior by following the standard eBPF programming model. The kernel also adds the AF_XDP address family: packets matched in the kernel XDP framework can be delivered through AF_XDP sockets to user space, which extends XDP from in-kernel processing to user-space application scenarios.
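To make the model concrete, the following is a minimal, self-contained sketch of an XDP program (not taken from this article; the port number 9999, the program name, and the simplifying assumption of no IPv4 options are illustrative). It parses the Ethernet/IPv4/UDP headers with verifier-friendly bounds checks, drops UDP packets destined to one port, and passes all other traffic to the normal protocol stack.

/* Minimal XDP sketch: drop IPv4 UDP packets to port 9999, pass everything else. */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/in.h>
#include <linux/udp.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("xdp")
int xdp_drop_udp_9999(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    /* Parse Ethernet header; every access is bounds-checked for the verifier. */
    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end || eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_PASS;

    struct iphdr *iph = (void *)(eth + 1);
    if ((void *)(iph + 1) > data_end || iph->protocol != IPPROTO_UDP)
        return XDP_PASS;

    /* Simplifying assumption for the sketch: no IPv4 options present. */
    struct udphdr *udph = (void *)(iph + 1);
    if ((void *)(udph + 1) > data_end)
        return XDP_PASS;

    if (udph->dest == bpf_htons(9999))
        return XDP_DROP;      /* dropped in the driver, never reaches the stack */

    return XDP_PASS;          /* everything else continues into the TCP/IP stack */
}

char _license[] SEC("license") = "GPL";

Compiled with clang to a BPF object file, a program like this can be attached to a NIC with standard tooling such as iproute2's ip link command or libbpf (a loader sketch appears in the next section).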

2. XDP overall framework

Figure 1 below shows the overall XDP framework: where it sits within the kernel and how it meets the requirements of a data plane development framework.

Figure 1 covers the network card device, the XDP framework, the TCP/IP protocol stack, the socket interface, and the application layer, i.e. the entire path of a network packet from the NIC to the server application. The gray block in the middle of Figure 1 (XDP Packet Processor) is the XDP framework itself; its data plane processing unit sits between the NIC driver and the in-kernel protocol stack and actually runs in the driver layer.

A packet traveling from the NIC toward the CPU first reaches the XDP framework via the NIC driver and is processed there by the user-defined eBPF program. The possible verdicts include drop, forward, and receive locally. A packet whose verdict is local receive continues along the original kernel path into the TCP/IP protocol stack, while packets that are forwarded or dropped are handled entirely within the XDP framework. In real networks this traffic makes up a large share of the total, and its execution path is much shorter than the traditional kernel path.

The black dotted lines inside the XDP Packet Processor show the upper-layer control channel used to load, update, and configure eBPF programs in the XDP framework; the kernel provides the corresponding system calls so that the control plane can manage the data plane. Taken together, Figure 1 shows XDP as a combined high-performance network data plane and control plane.
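As an illustration of that control channel, the sketch below (an assumed example, not the article's code) uses libbpf from user space to open a compiled XDP object file, load it (which triggers the in-kernel verifier and JIT), and attach it to a network interface. The file name xdp_prog.o, the program name xdp_drop_udp_9999, and the interface eth0 are placeholders; link with -lbpf.

/* User-space control-plane sketch: load an XDP object and attach it with libbpf. */
#include <stdio.h>
#include <net/if.h>
#include <linux/if_link.h>
#include <bpf/libbpf.h>
#include <bpf/bpf.h>

int main(void)
{
    int ifindex = if_nametoindex("eth0");
    struct bpf_object *obj = bpf_object__open_file("xdp_prog.o", NULL);

    if (ifindex == 0 || !obj)
        return 1;
    if (bpf_object__load(obj))        /* runs the in-kernel verifier and JIT */
        return 1;

    struct bpf_program *prog =
        bpf_object__find_program_by_name(obj, "xdp_drop_udp_9999");
    if (!prog)
        return 1;

    /* Attach to the interface; this flag refuses to replace an existing program. */
    if (bpf_xdp_attach(ifindex, bpf_program__fd(prog),
                       XDP_FLAGS_UPDATE_IF_NOEXIST, NULL) < 0) {
        perror("bpf_xdp_attach");
        return 1;
    }

    printf("XDP program attached to eth0\n");
    return 0;
}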

3. Introduction to XDP application development

The XDP framework is built on eBPF. BPF is a general-purpose RISC-like instruction set, first proposed at Lawrence Berkeley Laboratory in 1992. In 2013 BPF was extended into eBPF, which was officially merged into the Linux kernel in 2014. eBPF provides a mechanism for running small programs when various kernel and application events occur. Figure 2 below shows the eBPF development and execution workflow and how it applies to XDP.

Figure 2 shows a typical eBPF development and execution flow. Developers write programs in a restricted subset of C (the code runs in the kernel, so the standard C library is not available) and compile them with LLVM/clang into eBPF bytecode. After the eBPF verifier accepts the program, the in-kernel JIT compiler translates the eBPF instructions into native processor instructions (opcodes), and the program is loaded into one of the kernel's predefined hooks. The XDP framework is the hook the kernel exposes in the NIC driver for the fast network data path; other typical kernel hooks include kernel functions (kprobes), user-space functions (uprobes), system calls, fentry/fexit, tracepoints, network routing, TC, TCP congestion-control algorithms, and sockets.
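The same workflow applies to every hook, not only XDP. As a hedged illustration, the sketch below attaches a trivial eBPF program to the sys_enter_execve tracepoint; the build command in the comment shows the clang step described above, and the file names and message text are assumptions for the example.

/* Illustrative build step (the clang stage described above):
 *   clang -O2 -g -target bpf -c trace_execve.bpf.c -o trace_execve.bpf.o
 * The resulting object is verified and JIT-compiled by the kernel at load time. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("tracepoint/syscalls/sys_enter_execve")
int trace_execve(void *ctx)
{
    char comm[16];

    bpf_get_current_comm(comm, sizeof(comm));   /* helper: name of the current task */
    bpf_printk("execve called by %s", comm);    /* readable via the kernel trace_pipe */
    return 0;
}

char _license[] SEC("license") = "GPL";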

XDP is thus one specific application of eBPF: the kernel's implementation of a fast network path. Figure 3 below lists the typical kernel hook points that support eBPF and their applications.

Compared with traditional user-mode/kernel-mode programs, eBPF/XDP has the following typical features:

(1) As shown in Figure 4 below, the in-kernel JIT compiler maps eBPF bytecode to native processor instructions to achieve high performance, while the program verifier checks safety and provides a sandboxed execution environment. The safety checks include whether the program contains loops the verifier cannot bound, whether it exceeds the size limit, whether memory accesses are out of bounds, and whether it contains unreachable instructions. A major advantage is that programs can be updated live without interrupting the workload.
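As an illustration of these verifier constraints (an assumed example, not from the article), the sketch below sums the first bytes of a packet using a compile-time-bounded, unrolled loop with an explicit bounds check on every access; a loop whose bound depends on packet contents, sketched in the trailing comment, would be rejected at load time on older kernels.

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_bounded_loop(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;
    unsigned char *p = data;
    unsigned int sum = 0;

    /* Compile-time-bounded loop: fully unrolled by clang, each access
     * bounds-checked, so the verifier can prove termination and memory safety. */
#pragma unroll
    for (int i = 0; i < 16; i++) {
        if ((void *)(p + i + 1) > data_end)
            break;
        sum += p[i];
    }

    /* By contrast, something like
     *     while (p < (unsigned char *)data_end && *p != 0) p++;
     * has no bound the verifier can prove on older kernels and is rejected. */
    bpf_printk("first-16-byte sum: %u", sum);   /* only to use the result */
    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";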

(2) Data exchange between kernel-mode eBPF programs and user space is implemented through BPF maps, which behave much like the shared memory used for inter-process communication. Supported map types include hash tables, arrays, LRU (least recently used) caches, ring buffers, stack traces, and LPM (longest prefix match) tries for routing tables. As shown in Figure 5 below, BPF maps act as the data-exchange channel between user space and kernel space.
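A minimal sketch of this kernel/user data exchange follows (an assumed example; the map name pkt_count and its sizes are illustrative). The XDP program counts packets per IPv4 source address in a hash map; a user-space agent can read the same map through libbpf's bpf_map_lookup_elem() without any additional copy path.

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 1024);
    __type(key, __u32);      /* IPv4 source address */
    __type(value, __u64);    /* packet counter */
} pkt_count SEC(".maps");

SEC("xdp")
int xdp_count_src(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;
    struct ethhdr *eth = data;
    struct iphdr *iph;

    if ((void *)(eth + 1) > data_end || eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_PASS;
    iph = (void *)(eth + 1);
    if ((void *)(iph + 1) > data_end)
        return XDP_PASS;

    __u32 key = iph->saddr;
    __u64 one = 1, *val = bpf_map_lookup_elem(&pkt_count, &key);
    if (val)
        __sync_fetch_and_add(val, 1);                      /* atomic increment */
    else
        bpf_map_update_elem(&pkt_count, &key, &one, BPF_ANY);
    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";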

(3) eBPF compensates for the missing standard C library with helper functions. Common helpers obtain random numbers, read the current time, access maps, read process/cgroup context, process and forward network packets, access socket data, perform tail calls, walk process stacks, and read system call arguments. In practice, the man bpf-helpers command provides the full reference. Figure 6 below shows the helper for obtaining random numbers, whose name begins with the bpf_ prefix.
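The short sketch below (an assumed example) uses two of the helpers mentioned above, bpf_get_prandom_u32() and bpf_ktime_get_ns(), to sample roughly one packet in 256 and log a timestamp, standing in for what rand() and clock_gettime() would do in ordinary user-space C.

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_sample(struct xdp_md *ctx)
{
    if ((bpf_get_prandom_u32() & 0xff) == 0) {        /* ~1 of every 256 packets */
        __u64 now_ns = bpf_ktime_get_ns();            /* monotonic timestamp */
        bpf_printk("sampled packet at %llu ns", now_ns);
    }
    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";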

(4) Compared with development models such as pure kernel modules (kmodules), eBPF provides distinctive tail-call and function-call mechanisms. Because kernel stack space is scarce, and because eBPF forbids unbounded loops and limits call depth (at most 32 chained tail calls), eBPF introduced tail calls and function calls as the way to jump between eBPF programs. Both mechanisms are designed for performance: a tail call reuses the current stack frame and jumps directly into another eBPF program (see the bpf_tail_call helper manual for details). Since the programs involved are independent of one another, the tail-call mechanism effectively lets developers orchestrate logic at the granularity of whole programs. Starting with Linux 4.16 and LLVM 6.0, eBPF supports function calls (bpf2bpf calls), and kernels after 5.9 allow tail calls and function calls to be mixed. Tail calls produce a larger overall program image but lower memory consumption at run time; function calls produce a smaller image but use more memory. Developers can choose flexibly according to their needs. Figure 7 below shows tail calls and function calls cooperating within one eBPF program, where "tail call" denotes a tail call and "bpf2bpf call" denotes a function call.

Figure 7 Mixed collaboration of eBPF program tail calls and function calls
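The following sketch illustrates the two mechanisms together (an assumed example; the program names, the jump_table layout, and the classification rule are illustrative, and the programs referenced by the table would be installed from user space). classify() is a bpf2bpf function call with its own stack frame, while bpf_tail_call() reuses the current stack frame to jump into another eBPF program.

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
    __uint(max_entries, 8);
    __type(key, __u32);
    __type(value, __u32);
} jump_table SEC(".maps");

/* bpf2bpf function call: a real call with its own stack frame. */
static __attribute__((noinline)) int classify(struct xdp_md *ctx)
{
    return (ctx->data_end - ctx->data) > 128 ? 1 : 0;
}

SEC("xdp")
int xdp_dispatch(struct xdp_md *ctx)
{
    __u32 slot = classify(ctx);

    /* Tail call: jump to the program stored at jump_table[slot];
     * if the slot is empty, execution falls through to XDP_PASS. */
    bpf_tail_call(ctx, &jump_table, slot);
    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";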

4. Comparison of XDP with similar technologies

Taking DPDK, the most widely used user-mode data plane development framework, as the benchmark, the data-flow diagram below illustrates the implementation differences between XDP and DPDK. As shown in Figure 8, DPDK bypasses the kernel entirely and runs in user space, whereas XDP runs inside the kernel, between the network card and the kernel protocol stack. DPDK is a new data plane framework independent of the kernel, while XDP is a fast data path attached to the kernel (in contrast to the kernel's original, slower network path).

The following is a specific comparison between XDP and DPDK:

(1) DPDK monopolizes CPU cores and requires huge-page memory. XDP does not pin CPUs and does not need huge pages, so its hardware requirements are lower than DPDK's.

(2) Projects that use DPDK as the data plane framework require heavy development investment; typical reference projects are FD.io (VPP) and OVS-DPDK. XDP is a fast data path native to the kernel and is a lightweight data plane framework.

(3) DPDK needs code and licensing support at every level, from NIC drivers to user-mode protocol stacks. XDP is maintained and released under the Linux Foundation, with its technical ecosystem maintained by the IO Visor sub-project.

(4) DPDK has advantages in scenarios such as large capacity and high throughput. XDP has advantages in scenarios such as cloud native.

Typical application scenarios and projects for XDP currently include the following (a minimal firewall-style sketch follows the list):

  • DDoS Protection
  • Firewall
  • Load balancing based on XDP
  • Pre-protocol stack processing
  • Cloud native application service optimization (such as K8S, OpenStack, Docker and other service improvement projects)
  • Flow Control
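As that firewall-style sketch, the program below (an assumed example, not one of the listed projects) drops packets whose source address matches a CIDR prefix stored in an LPM-trie map populated from user space, using the longest-prefix-match map type mentioned earlier; attack traffic is discarded at the driver before it ever reaches the kernel stack.

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

struct ipv4_lpm_key {
    __u32 prefixlen;      /* number of significant bits */
    __u32 addr;           /* IPv4 address, network byte order */
};

struct {
    __uint(type, BPF_MAP_TYPE_LPM_TRIE);
    __uint(map_flags, BPF_F_NO_PREALLOC);   /* required for LPM tries */
    __uint(max_entries, 1024);
    __type(key, struct ipv4_lpm_key);
    __type(value, __u8);
} blocked_prefixes SEC(".maps");

SEC("xdp")
int xdp_firewall(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;
    struct ethhdr *eth = data;
    struct iphdr *iph;

    if ((void *)(eth + 1) > data_end || eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_PASS;
    iph = (void *)(eth + 1);
    if ((void *)(iph + 1) > data_end)
        return XDP_PASS;

    struct ipv4_lpm_key key = { .prefixlen = 32, .addr = iph->saddr };
    if (bpf_map_lookup_elem(&blocked_prefixes, &key))
        return XDP_DROP;      /* source falls inside a blocked prefix */
    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";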

5. Notable open source projects based on eBPF/XDP

Cilium is an open source project that uses eBPF and XDP to provide fast, in-kernel networking and security-policy enforcement for containers. Cilium implements distributed load balancing for traffic between Pods and to external services and can completely replace kube-proxy, using efficient hash tables in eBPF that allow nearly unlimited scaling. It also supports advanced features such as integrated ingress and egress gateways, bandwidth management, and service meshes, and provides deep network and security visibility and monitoring.

As shown in Figure 9 below, eBPF/XDP (the bee in the figure) sits between services such as containers and Pods and the network card, and uses XDP to improve the performance and security of the upper-layer services. Work that the kernel previously could not perform is completed dynamically, cleverly, and safely at nodes along the kernel data path.

Figure 10 gives a concrete example of how the Cilium project uses XDP and eBPF to extend functionality at the kernel's NIC and socket layers. On the left of Figure 10, user-defined network-processing code is injected into the XDP framework at the NIC driver layer; on the right, socket-processing code is inserted at the socket layer. This allows functionality to be extended dynamically without modifying the kernel, and upper-layer applications such as containers and Pods receive functional upgrades transparently.


The Cilium project provides a strong reference solution for improving service performance and security in cloud native scenarios. As shown in Figure 11, many common cloud native services gain performance and security improvements from eBPF/XDP.

Figure 11 The core value of eBPF/XDP in the Cilium project

6. Development prospects of XDP

To realize a flexible data plane and accelerate NFV applications, the Linux Foundation created the IO Visor sub-project to build an open, programmable network data plane based on the Linux kernel; XDP is a sub-project of IO Visor. The lack of virtualization support in the Linux kernel is IO Visor's biggest challenge in NFV scenarios, and XDP compensates for this through the eBPF virtual machine's ability to load programs live. However, almost all virtual machines run in user space; given the safety requirements placed on the eBPF virtual machine running inside the kernel, moving virtualization-related work into kernel space remains a major challenge.

In terms of performance, Sebastiano Miano and others used eBPF programs attached to the XDP and TC hooks to reimplement Linux's iptables firewall in 2019, delivering performance several times to dozens of times higher than the original iptables as the number of rules grows. In 2021, Yoann Ghigoff and others built an in-kernel Memcached cache layer on top of eBPF, XDP, and TC, achieving higher performance than a DPDK kernel-bypass solution.

The XDP project has opened a new path between the traditional kernel model and new user-mode frameworks, filling the resource-investment gap created by the large leap to new technologies. Microsoft announced in 2022 that it plans to support XDP on the Windows platform. As the ecosystem matures, the lightweight design, live updates, high-performance data path, and safety and reliability that XDP brings will play an increasingly important role.

China Mobile Smart Home Center will keep close track of XDP technology, continue to follow the industry's direction from a technical perspective, remain open to and actively embrace emerging technologies, and work to help the industry deliver tangible digital services to the public.
