Author | Chen Yunhao (Huanhe)

1. Background

1. Why did container networks emerge?

In a car engine production workshop, the components of an engine are assembled in a fixed order, which requires that each station knows the exact location of the next component. Once the engine is assembled, the finished car is still missing many parts, such as the chassis and body, so the engine must be shipped to an assembly center, and for that we must know the assembly center's geographical location. In a container network, these "locations" are IP addresses. This example covers both communication between containers on the same node and communication across nodes.

With the development of cloud computing, communication between applications has evolved from physical machine networks and virtual machine networks to today's container networks. Unlike physical machines and virtual machines, containers can be understood as standard, lightweight, portable, self-contained boxes: they are isolated from each other, each with its own environment and resources. As environments grow more complex, however, containers need to exchange information with other containers or with endpoints outside the cluster. The container must therefore have a "name" at the network level, that is, an IP address, and so container networks came into being.

Let's look at the origin of container networking from a technical perspective. First, the essence of a container: it is built on the following Linux namespaces:
IPC: System V IPC and POSIX message queues
Network: network devices, network protocol stack, network ports, etc.
PID: processes
Mount: mount points
UTS: hostname and domain name
USR: users and user groups

Since the network stacks of the host and the container, and of different containers, are not connected and there is no unified control plane, containers cannot perceive each other. To solve this problem, the container network discussed in this article emerged; combined with different network virtualization technologies, it has produced a variety of container network solutions.

2. Basic requirements for container networks

IP-per-Pod: each Pod has an independent IP address, and all containers in the Pod share one network namespace.
All Pods in the cluster are on a directly connected, flat network and can reach each other by IP.
All containers can access each other directly without NAT.
All nodes and all containers can access each other directly without NAT.
The IP that a container sees for itself is the same IP that other containers see for it.
A Service's cluster IP can only be accessed inside the cluster; external requests must go through NodePort, LoadBalancer, or Ingress.

2. Introduction to Network Plug-ins

1. Network plug-in overview

The container and its host are two separate places; connecting them requires a bridge. But as long as the container side has no name, the bridge cannot be established. The container side must be named first, and only then can the bridge be completed. The network plug-in is what names the container side and builds the bridge: it inserts a network interface into the container network namespace (for example, one end of a veth pair) and makes the necessary changes on the host (for example, attaching the other end of the veth to a bridge). It then assigns a free IP address to the interface by calling an IPAM plug-in (IP address management plug-in) and sets up the routing rules for that IP.

For K8s, the network is one of the most important capabilities, because without a good network, Pods cannot communicate well across nodes or even on the same node. Yet when designing the network, K8s adopted only one principle: "flexibility"! How is it flexible? K8s itself implements very little network logic; instead, it defines a specification:
That's right: the specification amounts to little more than the three responsibilities just described, namely inserting an interface into the container's network namespace, making the corresponding changes on the host, and assigning an IP with routes through IPAM. Such a simple and flexible specification is the famous CNI specification. Precisely because K8s itself "does nothing", everyone is free to implement their own CNI plug-ins, that is, network plug-ins. Besides the well-known community plug-ins Calico and Bifrost, Alibaba has also developed Hybridnet, a network plug-in with excellent functionality and performance.
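To make the shape of a CNI plug-in concrete, here is a minimal skeleton in Go using the community CNI library (github.com/containernetworking/cni). This is only a sketch of the calling convention, not code from any plug-in discussed below; the actual interface, IPAM, and route logic is left as comments.

```go
package main

import (
	"github.com/containernetworking/cni/pkg/skel"
	"github.com/containernetworking/cni/pkg/types"
	current "github.com/containernetworking/cni/pkg/types/100"
	"github.com/containernetworking/cni/pkg/version"
)

// cmdAdd is called by the container runtime when a Pod sandbox is created.
// A real plug-in would create a veth pair here, move one end into args.Netns,
// ask an IPAM plug-in for a free IP, and install routes for that IP.
func cmdAdd(args *skel.CmdArgs) error {
	result := &current.Result{CNIVersion: current.ImplementedSpecVersion}
	// ... interface, IP, and route setup would go here ...
	return types.PrintResult(result, result.CNIVersion)
}

// cmdDel must undo everything cmdAdd did (release the IP, delete the veth).
func cmdDel(args *skel.CmdArgs) error { return nil }

func cmdCheck(args *skel.CmdArgs) error { return nil }

func main() {
	skel.PluginMain(cmdAdd, cmdCheck, cmdDel, version.All, "minimal CNI plug-in sketch")
}
```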
Hybridnet is an open source container networking solution designed for hybrid cloud. It is integrated with Kubernetes and is already used by several well-known PaaS platforms.
Hybridnet focuses on efficient large-scale clustering, heterogeneous infrastructure, and user-friendliness.
Calico is a widely adopted, proven open source networking and network security solution for Kubernetes, virtual machines, and bare-metal workloads. Calico provides two major services for cloud-native applications:
Network connectivity between workloads
Network security policies between workloads
Bifrost is an open source solution that enables L2 networking for Kubernetes and supports the following features:
Network traffic in Bifrost can be managed and monitored with traditional equipment.
Supports macvlan access to Service traffic.

2. Communication path introduction

Overlay solution: a cross-host network that connects containers on different hosts into the same virtual network.
VXLAN (Virtual eXtensible Local Area Network) is one of the NVO3 (Network Virtualization over Layer 3) standard technologies defined by the IETF. It adopts an L2-over-L4 (MAC-in-UDP) encapsulation, wrapping Layer 2 frames in a Layer 3 protocol so that a Layer 2 network can be extended across a Layer 3 domain, meeting data-center needs such as virtual machine migration within a large Layer 2 domain and multi-tenancy.
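As a concrete illustration, the following Go sketch creates a VXLAN device like the eth0.vxlan20 used later in this article, via the github.com/vishvananda/netlink library. The library choice, device names, VNI, and addresses are illustrative assumptions, not Hybridnet source code.

```go
package main

import (
	"log"
	"net"

	"github.com/vishvananda/netlink"
)

func main() {
	parent, err := netlink.LinkByName("eth0") // underlay NIC that carries the UDP packets
	if err != nil {
		log.Fatal(err)
	}
	vxlan := &netlink.Vxlan{
		LinkAttrs:    netlink.LinkAttrs{Name: "eth0.vxlan20"},
		VxlanId:      20,                          // VNI
		VtepDevIndex: parent.Attrs().Index,        // send/receive through eth0
		SrcAddr:      net.ParseIP("192.168.0.10"), // local VTEP address (assumed)
		Port:         4789,                        // IANA-assigned VXLAN UDP port
	}
	// Equivalent to `ip link add eth0.vxlan20 type vxlan id 20 dev eth0 ...`.
	if err := netlink.LinkAdd(vxlan); err != nil {
		log.Fatal(err)
	}
	if err := netlink.LinkSetUp(vxlan); err != nil {
		log.Fatal(err)
	}
}
```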
An IPIP tunnel is implemented on top of a TUN device, which can encapsulate a Layer 3 packet (an IP packet) inside another Layer 3 packet. Linux natively supports several IPIP tunnel types, all relying on TUN devices:
ipip: the common IPIP tunnel, an IPv4 packet encapsulated in another IPv4 packet.
gre: Generic Routing Encapsulation, a mechanism for encapsulating any network-layer protocol over any network-layer protocol, so it works for both IPv4 and IPv6.
sit: mainly used to encapsulate IPv6 packets in IPv4 packets, that is, IPv6 over IPv4.
isatap: Intra-Site Automatic Tunnel Addressing Protocol; similar to sit, it is also used for IPv6 tunnel encapsulation.
vti: Virtual Tunnel Interface, an IPsec tunneling technology.
In this article we use the common IPIP tunnel type, ipip.

Underlay solution: a network composed of physical devices such as switches and routers, driven by Ethernet, routing, and VLAN protocols.
Border Gateway Protocol (BGP) is a path-vector routing protocol that exchanges reachable routes between autonomous systems (AS) and selects the best route.
VLAN (Virtual Local Area Network) is a communication technology that logically divides a physical LAN into multiple broadcast domains. Hosts within one VLAN can communicate directly, while hosts in different VLANs cannot, confining broadcast traffic to a single VLAN.

3. Principles of network plug-ins
4. Classification and comparison of network plug-ins
5. Network plug-in application scenarios

Given the complex network conditions in data centers, we need to choose a container network solution that fits the requirements.
In this article, we analyze in detail the Pod data-link paths of Hybridnet VXLAN, Hybridnet VLAN, Hybridnet BGP, Calico IPIP, Calico BGP, and Bifrost (based on a modified macvlan).

3. Network plug-in architecture and communication path

1. Hybridnet

Overall architecture
Communication Path

(1) VXLAN mode
Communication process of Pod1 accessing Pod2 (same node)

Packet sending:
1. Pod1's traffic goes through the veth pair, from Pod1's eth0 to hybrXXX on the host side, and enters the host network stack.
2. Based on the destination IP, the host's policy routing directs the traffic to routing table 39999, where it matches Pod2's route.
3. The traffic enters Pod2's container network stack through the hybrYYY interface, completing the send.

Packet return:
1. Pod2's traffic goes through the veth pair, from Pod2's eth0 to hybrYYY on the host side, and enters the host network stack.
2. Based on the destination IP, the host's policy routing directs the traffic to routing table 39999, where it matches Pod1's route.
3. The traffic enters Pod1's container network stack through the hybrXXX interface, completing the return.
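A minimal sketch of the host-side state this path relies on: a policy-routing rule pointing at table 39999 plus a per-Pod /32 route through its hybr* veth, written with the github.com/vishvananda/netlink library. The table number and device naming follow the text; the Pod IP, rule priority, and rule selector are assumptions (Hybridnet's real rules are more specific).

```go
package main

import (
	"log"
	"net"

	"github.com/vishvananda/netlink"
)

func main() {
	// Equivalent of `ip rule add pref 39999 lookup 39999` (selector omitted for brevity).
	rule := netlink.NewRule()
	rule.Table = 39999
	rule.Priority = 39999
	if err := netlink.RuleAdd(rule); err != nil {
		log.Fatal(err)
	}

	// Equivalent of `ip route add 10.0.0.2/32 dev hybrYYY table 39999`.
	link, err := netlink.LinkByName("hybrYYY")
	if err != nil {
		log.Fatal(err)
	}
	_, podDst, _ := net.ParseCIDR("10.0.0.2/32") // Pod2's address (assumed)
	route := &netlink.Route{
		LinkIndex: link.Attrs().Index,
		Dst:       podDst,
		Table:     39999,
		Scope:     netlink.SCOPE_LINK, // directly reachable through the veth
	}
	if err := netlink.RouteAdd(route); err != nil {
		log.Fatal(err)
	}
}
```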
Communication process of Pod1 accessing Pod2 (across nodes)

Packet sending:
1. Pod1's traffic goes through the veth pair, from Pod1's eth0 to hybrXXX on the host side, and enters the host network stack.
2. Based on the destination IP, the host's policy routing directs the traffic to routing table 40000, where it matches the rule that Pod2's network segment must be sent to the eth0.vxlan20 device.
3. The forwarding table of the eth0.vxlan20 device records the mapping between the peer VTEP's MAC address and its remote IP.
4. Passing through eth0.vxlan20, the traffic is encapsulated with a UDP header.
5. A route lookup shows the remote VTEP is on the same network segment as the local machine; the peer physical NIC's MAC address is resolved, and the packet is sent out through Node1's eth0 physical NIC.
6. The traffic arrives at Node2's eth0 physical NIC and is decapsulated (the UDP header removed) by the eth0.vxlan20 device.
7. Following routing table 39999, the traffic enters Pod2's container network stack through the hybrYYY interface, completing the send.

Packet return:
1. Pod2's traffic goes through the veth pair, from Pod2's eth0 to hybrYYY on the host side, and enters the host network stack.
2. Based on the destination IP, the host's policy routing directs the traffic to routing table 40000, where it matches the rule that Pod1's network segment must be sent to the eth0.vxlan20 device.
3. The forwarding table of the eth0.vxlan20 device records the mapping between the peer VTEP's MAC address and its remote IP.
4. Passing through eth0.vxlan20, the traffic is encapsulated with a UDP header.
5. A route lookup shows the remote VTEP is on the same network segment; the peer physical NIC's MAC address is resolved, and the packet is sent out through Node2's eth0 physical NIC.
6. The traffic arrives at Node1's eth0 physical NIC and is decapsulated by the eth0.vxlan20 device.
7. Following routing table 39999, the traffic enters Pod1's container network stack through the hybrXXX interface, completing the return.
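The "forwarding table of the eth0.vxlan20 device" in step 3 is an FDB entry on the VXLAN device, the programmatic equivalent of `bridge fdb append <peer-vtep-mac> dev eth0.vxlan20 dst <remote-node-ip>`. A netlink sketch with assumed MAC and IP values:

```go
package main

import (
	"log"
	"net"

	"github.com/vishvananda/netlink"
	"golang.org/x/sys/unix"
)

func main() {
	vxlan, err := netlink.LinkByName("eth0.vxlan20")
	if err != nil {
		log.Fatal(err)
	}
	peerVtepMAC, _ := net.ParseMAC("aa:bb:cc:dd:ee:ff") // peer VTEP's MAC (assumed)
	entry := &netlink.Neigh{
		LinkIndex:    vxlan.Attrs().Index,
		Family:       unix.AF_BRIDGE,        // an FDB entry, not an ARP entry
		Flags:        netlink.NTF_SELF,      // the entry lives on the device itself
		State:        netlink.NUD_PERMANENT, // static: no flood-and-learn needed
		HardwareAddr: peerVtepMAC,
		IP:           net.ParseIP("192.168.0.11"), // remote node (VTEP) IP (assumed)
	}
	if err := netlink.NeighAppend(entry); err != nil {
		log.Fatal(err)
	}
}
```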
(2) VLAN mode

Communication process of Pod1 accessing Pod2 (same node)

Packet sending:
1. Pod1's traffic goes through the veth pair, from Pod1's eth0 to hybrXXX on the host side, and enters the host network stack.
2. Based on the destination IP, the host's policy routing directs the traffic to routing table 39999, where it matches Pod2's route.
3. The traffic enters Pod2's container network stack through the hybrYYY interface, completing the send.

Packet return:
1. Pod2's traffic goes through the veth pair, from Pod2's eth0 to hybrYYY on the host side, and enters the host network stack.
2. Based on the destination IP, the host's policy routing directs the traffic to routing table 39999, where it matches Pod1's route.
3. The traffic enters Pod1's container network stack through the hybrXXX interface, completing the return.
Communication process of Pod1 accessing Pod2 (across nodes)

Packet sending:
1. Pod1's traffic goes through the veth pair, from Pod1's eth0 to hybrXXX on the host side, and enters the host network stack.
2. Based on the destination IP, the host's policy routing directs the traffic to routing table 10001, where it matches Pod2's route.
3. Following that route, the traffic leaves through the eth0 physical NIC underlying the eth0.20 VLAN sub-interface and reaches the switch.
4. The switch matches Pod2's MAC address and forwards the traffic to Node2's eth0 physical NIC.
5. The traffic is received by the eth0.20 VLAN sub-interface and, following the route in table 39999, enters Pod2's container network stack through the hybrYYY interface, completing the send.

Packet return:
1. Pod2's traffic goes through the veth pair, from Pod2's eth0 to hybrYYY on the host side, and enters the host network stack.
2. Based on the destination IP, the host's policy routing directs the traffic to routing table 10001, where it matches Pod1's route.
3. Following that route, the traffic leaves through the eth0 physical NIC underlying the eth0.20 VLAN sub-interface and reaches the switch.
4. The switch matches Pod1's MAC address and forwards the traffic to Node1's eth0 physical NIC.
5. The traffic is received by the eth0.20 VLAN sub-interface and, following the route in table 39999, enters Pod1's container network stack through the hybrXXX interface, completing the return.
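The eth0.20 device above is an ordinary 802.1Q sub-interface. A sketch of creating it with netlink, equivalent to `ip link add link eth0 name eth0.20 type vlan id 20` (device name and VLAN ID follow the text):

```go
package main

import (
	"log"

	"github.com/vishvananda/netlink"
)

func main() {
	parent, err := netlink.LinkByName("eth0")
	if err != nil {
		log.Fatal(err)
	}
	vlan := &netlink.Vlan{
		LinkAttrs: netlink.LinkAttrs{
			Name:        "eth0.20",
			ParentIndex: parent.Attrs().Index, // tagged frames go out through eth0
		},
		VlanId: 20, // the 802.1Q tag the switch expects
	}
	if err := netlink.LinkAdd(vlan); err != nil {
		log.Fatal(err)
	}
	if err := netlink.LinkSetUp(vlan); err != nil {
		log.Fatal(err)
	}
}
```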
(3) BGP mode

Communication process of Pod1 accessing Pod2 (same node)

Packet sending:
1. Pod1's traffic goes through the veth pair, from Pod1's eth0 to hybrXXX on the host side, and enters the host network stack.
2. Based on the destination IP, the host's policy routing directs the traffic to routing table 39999, where it matches Pod2's route.
3. The traffic enters Pod2's container network stack through the hybrYYY interface, completing the send.

Packet return:
1. Pod2's traffic goes through the veth pair, from Pod2's eth0 to hybrYYY on the host side, and enters the host network stack.
2. Based on the destination IP, the host's policy routing directs the traffic to routing table 39999, where it matches Pod1's route.
3. The traffic enters Pod1's container network stack through the hybrXXX interface, completing the return.
Communication process of Pod1 accessing Pod2 (across nodes)

Packet sending:
1. Pod1's traffic goes through the veth pair, from Pod1's eth0 to hybrXXX on the host side, and enters the host network stack.
2. Based on the destination IP, the host's policy routing directs the traffic to routing table 10001, where it matches the default route.
3. Following that route, the traffic is sent to the switch at 10.0.0.1.
4. The switch matches the specific route for Pod2 and sends the traffic to Node2's eth0 physical NIC.
5. The traffic enters Pod2's container network stack through the hybrYYY interface, completing the send.

Packet return:
1. Pod2's traffic goes through the veth pair, from Pod2's eth0 to hybrYYY on the host side, and enters the host network stack.
2. Based on the destination IP, the host's policy routing directs the traffic to routing table 10001, where it matches the default route.
3. Following that route, the traffic is sent to the switch at 10.0.0.1.
4. The switch matches the specific route for Pod1 and sends the traffic to Node1's eth0 physical NIC.
5. The traffic enters Pod1's container network stack through the hybrXXX interface, completing the return.

2. Calico

Basic concepts:
Overall architecture
Communication Path

(1) IPIP mode
Communication process of Pod1 accessing Pod2 (same node)

Packet sending:
1. Pod1's traffic goes through the veth pair, from Pod1's eth0 to caliXXX on the host side, and enters the host network stack.
2. Based on the destination IP, the traffic matches Pod2's route in the routing table.
3. The traffic enters Pod2's container network stack through the caliYYY interface, completing the send.

Packet return:
1. Pod2's traffic goes through the veth pair, from Pod2's eth0 to caliYYY on the host side, and enters the host network stack.
2. Based on the destination IP, the traffic matches Pod1's route in the routing table.
3. The traffic enters Pod1's container network stack through the caliXXX interface, completing the return.
Communication process of Pod1 accessing Pod2 (across nodes)

Packet sending:
1. Pod1's traffic goes through the veth pair, from Pod1's eth0 to caliXXX on the host side, and enters the host network stack. (src: Pod1 IP, dst: Pod2 IP)
2. Based on the destination IP, the traffic matches the route that forwards it to the tunl0 device. (src: Pod1 IP, dst: Pod2 IP)
3. tunl0 performs IPIP encapsulation (an outer IP header is added), and the packet is sent out through the eth0 physical NIC. (outer src: Node1 IP, outer dst: Node2 IP)
4. The traffic enters Node2's host network stack through Node2's eth0 NIC. (outer src: Node1 IP, outer dst: Node2 IP)
5. The traffic enters tunl0 for IPIP decapsulation. (src: Pod1 IP, dst: Pod2 IP)
6. The traffic enters Pod2's container network stack through the caliYYY interface, completing the send. (src: Pod1 IP, dst: Pod2 IP)

Packet return:
1. Pod2's traffic goes through the veth pair, from Pod2's eth0 to caliYYY on the host side, and enters the host network stack. (src: Pod2 IP, dst: Pod1 IP)
2. Based on the destination IP, the traffic matches the route that forwards it to the tunl0 device. (src: Pod2 IP, dst: Pod1 IP)
3. tunl0 performs IPIP encapsulation, and the packet is sent out through the eth0 physical NIC. (outer src: Node2 IP, outer dst: Node1 IP)
4. The traffic enters Node1's host network stack through Node1's eth0 NIC. (outer src: Node2 IP, outer dst: Node1 IP)
5. The traffic enters tunl0 for IPIP decapsulation. (src: Pod2 IP, dst: Pod1 IP)
6. The traffic enters Pod1's container network stack through the caliXXX interface, completing the return. (src: Pod2 IP, dst: Pod1 IP)
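A sketch of the tunl0 mechanics described above: an ipip device plus a route that pushes the remote Pod CIDR into the tunnel with the peer node as next hop, roughly `ip route add 10.244.2.0/24 via <Node2IP> dev tunl0 onlink`. The CIDR and node IP are assumptions, not values from the text, and this is not Calico's actual code (Calico's Felix programs equivalent state).

```go
package main

import (
	"log"
	"net"

	"github.com/vishvananda/netlink"
)

func main() {
	// An ipip device; the kernel encapsulates IPv4 in IPv4 for traffic routed into it.
	tun := &netlink.Iptun{LinkAttrs: netlink.LinkAttrs{Name: "tunl0"}}
	if err := netlink.LinkAdd(tun); err != nil {
		log.Printf("tunl0 may already exist (auto-created by the ipip module): %v", err)
	}
	if err := netlink.LinkSetUp(tun); err != nil {
		log.Fatal(err)
	}

	link, err := netlink.LinkByName("tunl0")
	if err != nil {
		log.Fatal(err)
	}
	_, podCIDR, _ := net.ParseCIDR("10.244.2.0/24") // Pod CIDR hosted on Node2 (assumed)
	route := &netlink.Route{
		LinkIndex: link.Attrs().Index,
		Dst:       podCIDR,
		Gw:        net.ParseIP("192.168.0.11"), // Node2's IP: becomes the outer header's dst
		Flags:     int(netlink.FLAG_ONLINK),    // gateway reachable without its own route
	}
	if err := netlink.RouteAdd(route); err != nil {
		log.Fatal(err)
	}
}
```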
(2) BGP mode

Communication process of Pod1 accessing Pod2 (same node)

Packet sending:
1. Pod1's traffic goes through the veth pair, from Pod1's eth0 to caliXXX on the host side, and enters the host network stack.
2. Based on the destination IP, the traffic matches Pod2's route in the routing table.
3. The traffic enters Pod2's container network stack through the caliYYY interface, completing the send.

Packet return:
1. Pod2's traffic goes through the veth pair, from Pod2's eth0 to caliYYY on the host side, and enters the host network stack.
2. Based on the destination IP, the traffic matches Pod1's route in the routing table.
3. The traffic enters Pod1's container network stack through the caliXXX interface, completing the return.
Communication process of Pod1 accessing Pod2 (across nodes)

Packet sending:
1. Pod1's traffic goes through the veth pair, from Pod1's eth0 to caliXXX on the host side, and enters the host network stack.
2. Based on the destination IP, the traffic matches the route for Pod2's network segment and is sent out through Node1's eth0 physical NIC.
3. The traffic arrives at Node2's eth0 physical NIC and enters Pod2's container network stack through the caliYYY interface, completing the send.

Packet return:
1. Pod2's traffic goes through the veth pair, from Pod2's eth0 to caliYYY on the host side, and enters the host network stack.
2. Based on the destination IP, the traffic matches the route for Pod1's network segment and is sent out through Node2's eth0 physical NIC.
3. The traffic arrives at Node1's eth0 physical NIC and enters Pod1's container network stack through the caliXXX interface, completing the return.
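In BGP mode no encapsulation device is needed; the only host-side state is an ordinary kernel route to the peer's Pod CIDR, which Calico's BIRD daemon installs after the BGP exchange. A minimal sketch with assumed addresses, equivalent to `ip route add 10.244.2.0/24 via 192.168.0.11`:

```go
package main

import (
	"log"
	"net"

	"github.com/vishvananda/netlink"
)

func main() {
	_, podCIDR, _ := net.ParseCIDR("10.244.2.0/24") // Pod CIDR announced by Node2 (assumed)
	route := &netlink.Route{
		Dst: podCIDR,
		Gw:  net.ParseIP("192.168.0.11"), // Node2's eth0 address (assumed)
	}
	if err := netlink.RouteAdd(route); err != nil {
		log.Fatal(err)
	}
}
```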
3. Bifrost

Overall architecture

Communication Path

(1) MACVLAN mode
Communication process of Pod1 accessing Pod2 (same VLAN)

Packet sending:
1. Pod1's traffic goes out through its macvlan interface: Pod1's eth0 passes through the Layer 2 network into the eth0-10 VLAN sub-interface.
2. Because the macvlan is in bridge mode, Pod2's MAC address can be matched directly.
3. The traffic enters Pod2's container network stack from the eth0-10 VLAN sub-interface, completing the send.

Packet return:
1. Pod2's traffic goes out through its macvlan interface: Pod2's eth0 passes through the Layer 2 network into the eth0-10 VLAN sub-interface.
2. Because the macvlan is in bridge mode, Pod1's MAC address can be matched directly.
3. The traffic enters Pod1's container network stack from the eth0-10 VLAN sub-interface, completing the return.
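A sketch of creating a bridge-mode macvlan device on a parent interface, equivalent to `ip link add pod1 link eth0-10 type macvlan mode bridge`. In bridge mode the parent forwards frames between its macvlan children, which is why the two Pods above reach each other purely in Layer 2; in a real plug-in the new device would then be moved into the Pod's network namespace. Names are illustrative, not Bifrost source code.

```go
package main

import (
	"log"

	"github.com/vishvananda/netlink"
)

func main() {
	parent, err := netlink.LinkByName("eth0-10") // the VLAN sub-interface from the text
	if err != nil {
		log.Fatal(err)
	}
	mv := &netlink.Macvlan{
		LinkAttrs: netlink.LinkAttrs{
			Name:        "pod1", // would become the Pod's eth0 after moving into its netns
			ParentIndex: parent.Attrs().Index,
		},
		Mode: netlink.MACVLAN_MODE_BRIDGE, // siblings can talk to each other via the parent
	}
	if err := netlink.LinkAdd(mv); err != nil {
		log.Fatal(err)
	}
}
```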
Communication process of Pod1 accessing Pod2 (across VLANs)

Packet sending:
1. Pod1's traffic goes out through its macvlan interface: Pod1's eth0 takes the default route (gateway 5.0.0.1) into the eth0-5 VLAN sub-interface.
2. Since the MAC address of gateway 5.0.0.1 is resolved on eth0-5, the traffic is sent from the eth0 physical NIC to the switch.
3. The switch matches Pod2's MAC address.
4. The traffic enters the physical NIC of the host where Pod2 runs, and then the corresponding eth0-10 VLAN sub-interface.
5. The traffic enters Pod2's container network stack from the eth0-10 VLAN sub-interface, completing the send.

Packet return:
1. Pod2's traffic goes out through its macvlan interface: Pod2's eth0 takes the default route (gateway 10.0.0.1) into the eth0-10 VLAN sub-interface.
2. Since the MAC address of gateway 10.0.0.1 is resolved on eth0-10, the traffic is sent from the eth0 physical NIC to the switch.
3. The switch matches Pod1's MAC address.
4. The traffic enters the physical NIC of the host where Pod1 runs, and then the corresponding eth0-5 VLAN sub-interface.
5. The traffic enters Pod1's container network stack from the eth0-5 VLAN sub-interface, completing the return.

4. Problems and Future Development

1. IPv4/IPv6 dual stack

Background

IP, the most basic element of the Internet, is a protocol designed for communication between interconnected computer networks. It is precisely because of the IP protocol that the Internet could grow into the world's largest open computer communication network. As the Internet developed, the IP protocol produced two versions: IPv4 and IPv6.
IPv4 is the fourth version of the Internet Protocol, a datagram transmission mechanism used in computer networks and the first IP version to be widely deployed. Every device connected to the Internet (whether a switch, a PC, or any other device) is assigned a unique IP address, such as 192.149.252.76. IPv4 uses 32-bit (4-byte) addresses, providing about 4.3 billion addresses. As more and more users came online, the global pool of unallocated IPv4 addresses was finally exhausted in November 2019. This is one reason the Internet Engineering Task Force (IETF) proposed IPv6.
IPv6 is the sixth version of the Internet Protocol, proposed by the IETF as the next-generation protocol to replace IPv4. It not only solves the scarcity of network addresses but also removes obstacles for various devices to access the Internet. An IPv6 address is 128 bits long, supporting 2^128 (about 3.4x10^38) addresses. For example, 3ffe:1900:fe21:4545:0000:0000:0000:0000 is an IPv6 address; IPv6 addresses are written as 8 groups of 4 hexadecimal digits separated by colons (a short code sketch after the list below contrasts the two formats).

When IPv4 was mainstream and IPv6 had not yet been adopted, the main problems were:
The number of IPv4 addresses no longer meets demand, and IPv6 addresses are needed for expansion.
As the domestic next-generation Internet policy becomes clearer, customer data centers need to use IPv6 to comply with stricter regulations.
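The promised sketch, using Go's standard net/netip package with the address values from the examples above:

```go
package main

import (
	"fmt"
	"net/netip"
)

func main() {
	v4 := netip.MustParseAddr("192.149.252.76")
	v6 := netip.MustParseAddr("3ffe:1900:fe21:4545::")

	fmt.Println(v4.Is4(), v4.BitLen()) // true 32  -> ~4.3 billion addresses
	fmt.Println(v6.Is6(), v6.BitLen()) // true 128 -> ~3.4e38 addresses
	fmt.Println(v6.StringExpanded())   // 3ffe:1900:fe21:4545:0000:0000:0000:0000
}
```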
Status quo

Reasons why Calico IPIP does not support IPv6: the common ipip tunnel type in the Linux kernel only encapsulates IPv4 in IPv4; carrying IPv6 would require a different tunnel type (such as ip6tnl or sit).
2. Multiple network cards (multiple communication mechanisms)

Background

Usually in K8s a Pod has only one interface, that is, a single network card, used for Pod-to-Pod communication within the cluster network. When a Pod needs to communicate with a heterogeneous network, multiple interfaces can be created in the Pod, that is, multiple network cards. Current issues:
Status quo

There are two solutions to implement multiple network cards:
3. Network traffic control

Background

In a data center, network traffic is usually divided into two types: traffic between external users and internal servers, called north-south (vertical) traffic; and traffic between internal servers, called east-west (horizontal) traffic. In the scope of the container cloud, we define east-west traffic as traffic between hosts and containers, between containers, or between hosts within a cluster, and north-south traffic as traffic between the outside and the inside of the container cloud. Current issues:
Status quo