A thorough understanding of container network communication


Author | Chen Yunhao (Huanhe)

1. Background

1. Why did container networks emerge?

In a car-engine production workshop, an engine's components are assembled in a fixed order, which requires that each station know exactly where the next component is. Once the engine is assembled, the finished car is still missing many parts, such as the chassis and body, so the engine must be shipped to an assembly plant, whose location must also be known.

These locations correspond to IP addresses in a container network, and that is exactly how container networks work: the example above covers both communication between containers on the same node and communication across nodes.

With the development of cloud computing, communication between applications has evolved from physical-machine networks, through virtual-machine networks, to today's container networks. Unlike physical and virtual machines, containers can be thought of as standardized, lightweight, portable, self-contained boxes: isolated from one another, each using its own environment and resources. But as environments grow more complex, containers need to exchange information with other containers or with the world outside the cluster. At that point a container must have a name at the network level, i.e. an IP address, and thus container networks came into being.

Let's look at the origin of container networking from a technical perspective, starting with the essence of containers, which rests on the following mechanisms:

  • cgroup: implements resource quotas.
  • overlayfs: implements file-system security and portability.
  • namespace: implements resource isolation, covering:

IPC: System V IPC and POSIX message queues;

Network: network devices, the network protocol stack, network ports, etc.;

PID: processes;

Mount: mount points;

UTS: host name and domain name;

USR: users and user groups.

Since the network stacks of the host and the containers, and of the containers among themselves, are not connected, and there is no unified control plane, containers cannot perceive each other directly. To solve this problem, the container network discussed in this article emerged; combined with different network virtualization technologies, it has produced a variety of container network solutions.
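To make this concrete, here is a minimal sketch (Python driving the iproute2 CLI; requires Linux and root, and all names and addresses are illustrative, not from any particular plugin) that creates an isolated network namespace and then wires it to the host with a veth pair, which is exactly the gap a container network has to bridge:

```python
import subprocess

def sh(cmd: str) -> None:
    """Run a shell command, raising on failure."""
    subprocess.run(cmd, shell=True, check=True)

# A fresh network namespace has its own, empty network stack (only a
# loopback device), so it cannot reach the host or other namespaces.
sh("ip netns add demo")

# A veth pair is one way to connect it: one end stays on the host,
# the other end is moved into the namespace.
sh("ip link add veth-host type veth peer name veth-demo")
sh("ip link set veth-demo netns demo")

# Give both ends addresses and bring them up.
sh("ip addr add 10.244.0.1/24 dev veth-host")
sh("ip link set veth-host up")
sh("ip netns exec demo ip addr add 10.244.0.2/24 dev veth-demo")
sh("ip netns exec demo ip link set veth-demo up")
sh("ip netns exec demo ip link set lo up")

# The namespace can now reach the host; clean up afterwards.
sh("ip netns exec demo ping -c 1 10.244.0.1")
sh("ip netns del demo")
```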

2. Basic requirements for container networks

  • IP-per-Pod: each Pod has an independent IP address, and all containers in the Pod share one network namespace.
  • All Pods in the cluster are on a directly connected, flat network and can reach each other by IP.
  • All containers can access each other directly without NAT.
  • All nodes and all containers can access each other directly without NAT.
  • The IP a container sees itself as is the same IP that other containers see it as.
  • A Service's cluster IP is reachable only inside the cluster; external requests must come in through NodePort, LoadBalancer, or Ingress.

2. Introduction to Network Plug-ins

1. Network plug-in overview

A container and the host it runs on are two separate places; to connect them, a bridge must be built. But because the container side initially has no name (no network identity), the bridge cannot be established until the container side is named. The network plugin plays exactly this role: it names the container side and builds the bridge.

That is, the network plugin inserts a network interface into the container's network namespace (for example, one end of a veth pair) and makes the necessary changes on the host (for example, attaching the other end of the veth to a bridge). It then assigns a free IP address to the interface by calling an IPAM (IP address management) plugin and sets up the routes corresponding to that IP address.
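As a rough illustration of the IPAM step, here is a toy allocator in Python. The class name HostLocalIPAM and the address block are invented for this sketch; a real host-local IPAM plugin persists its state on disk and follows the CNI plugin protocol rather than living in process memory:

```python
import ipaddress

class HostLocalIPAM:
    """Toy IPAM: hands out free addresses from a per-node block,
    the way a host-local IPAM plugin conceptually works."""

    def __init__(self, cidr: str):
        self.pool = ipaddress.ip_network(cidr)
        # Skip the first host address, conventionally the gateway.
        self.free = list(self.pool.hosts())[1:]
        self.allocated = {}

    def allocate(self, container_id: str) -> str:
        ip = self.free.pop(0)
        self.allocated[container_id] = ip
        return f"{ip}/{self.pool.prefixlen}"

    def release(self, container_id: str) -> None:
        self.free.insert(0, self.allocated.pop(container_id))

ipam = HostLocalIPAM("10.244.1.0/24")
print(ipam.allocate("ctr-a"))  # 10.244.1.2/24
print(ipam.allocate("ctr-b"))  # 10.244.1.3/24
```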

For K8s, the network is one of the most important functions, because without a good network, pods on different nodes in the cluster, or even on the same node, cannot communicate properly.

However, when designing the network, K8s adopted only one principle: "flexibility". How is that achieved? K8s itself implements hardly any network-specific logic; instead, it defines a specification:

  • A configuration file provides the name of the network plugin to use, along with the information that plugin needs.
  • The CRI invokes this plugin, passing it the container's runtime information, including the container's network namespace, container ID, and so on.
  • K8s does not care how the plugin is implemented internally, as long as it ultimately produces the pod IP.

That's right: just these three points. This simple, flexible specification is the famous CNI specification.
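For concreteness, here is a sketch of how a runtime invokes a CNI plugin under that specification: the network config comes from a file, the container's runtime information travels in CNI_* environment variables (CNI_COMMAND, CNI_CONTAINERID, CNI_NETNS, CNI_IFNAME are defined by the CNI spec), and the plugin prints the resulting IP as JSON on stdout. The config values and the /opt/cni/bin path are illustrative assumptions:

```python
import json, os, subprocess

# A minimal network config of the kind read from /etc/cni/net.d/:
# the plugin's name ("type") plus whatever that plugin needs.
net_config = {
    "cniVersion": "0.4.0",
    "name": "demo-net",
    "type": "bridge",  # plugin binary to execute
    "bridge": "cni0",
    "ipam": {"type": "host-local", "subnet": "10.244.1.0/24"},
}

def call_plugin(container_id: str, netns_path: str) -> dict:
    """Invoke a CNI plugin the way the runtime does: runtime details in
    CNI_* environment variables, config on stdin, result JSON on stdout."""
    env = dict(
        os.environ,
        CNI_COMMAND="ADD",
        CNI_CONTAINERID=container_id,
        CNI_NETNS=netns_path,
        CNI_IFNAME="eth0",
        CNI_PATH="/opt/cni/bin",
    )
    proc = subprocess.run(
        ["/opt/cni/bin/bridge"],
        input=json.dumps(net_config).encode(),
        env=env, capture_output=True, check=True,
    )
    return json.loads(proc.stdout)  # the pod IP is in the "ips" field

# Example (requires the reference CNI plugins installed):
# result = call_plugin("ctr-a", "/var/run/netns/demo")
```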

However, precisely because K8s itself "does nothing", everyone is free to implement their own CNI plugin, i.e. network plugin. Besides the well-known community plugins Calico and Bifrost, Alibaba has also developed Hybridnet, a network plugin with excellent functionality and performance.

  • Hybridnet

Hybridnet is an open source container networking solution designed for hybrid cloud, integrated with Kubernetes and used by the following PaaS platforms:

  • Alibaba Cloud ACK Release
  • Alibaba Cloud AECP
  • Ant Financial SOFAStack

Hybridnet focuses on efficient large-scale clustering, heterogeneous infrastructure, and user-friendliness.

  • Calico

Calico is a widely adopted, proven open source networking and network security solution for Kubernetes, virtual machines, and bare metal workloads. Calico provides two major services for Cloud Native applications:

  • Network connectivity between workloads
  • Network security policy between workloads

  • Bifrost

Bifrost is an open source solution providing L2 networking for Kubernetes, with the following features:

  • Network traffic can be managed and monitored with traditional devices.
  • Supports macvlan access to Service traffic.

2. Communication path introduction

Overlay solution: a cross-host network that places containers on different hosts into the same virtual network.

  • VXLAN

VXLAN (Virtual eXtensible Local Area Network) is one of the NVO3 (Network Virtualization over Layer 3) standard technologies defined by the IETF. It uses L2-over-L4 (MAC-in-UDP) encapsulation, wrapping Layer 2 frames inside a Layer 3/4 protocol, which extends the Layer 2 network across Layer 3 boundaries and meets data-center needs for large-scale Layer 2 virtual-machine migration and multi-tenancy.
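A small sketch makes the encapsulation concrete: the VXLAN header itself is just 8 bytes (flags plus a 24-bit VNI) in front of the original Layer 2 frame, and the kernel then wraps the result in UDP (destination port 4789) and an outer IP header carrying the node addresses. The VNI value and dummy frame below are illustrative:

```python
import struct

def vxlan_encap(vni: int, inner_frame: bytes) -> bytes:
    """Build the VXLAN payload carried inside UDP: an 8-byte header
    (I flag set, 24-bit VNI) followed by the original Ethernet frame."""
    flags = 0x08 << 24           # "I" bit: the VNI field is valid
    header = struct.pack("!II", flags, vni << 8)
    return header + inner_frame

# Full on-wire layering after the kernel adds the outer headers:
#   outer Ethernet / outer IP (node IPs) / UDP / VXLAN /
#   inner Ethernet (pod MACs) / inner IP (pod IPs)
payload = vxlan_encap(20, b"\x00" * 14)  # VNI 20, dummy 14-byte L2 header
print(payload.hex())
```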

  • IPIP

An IPIP tunnel is implemented on top of the TUN device, which can encapsulate a Layer 3 packet (an IP packet) inside another Layer 3 packet. Linux natively supports several IPIP tunnel types, all relying on TUN devices:

ipip: the common IPIP tunnel, which encapsulates an IPv4 packet inside another IPv4 packet.

gre: Generic Routing Encapsulation, which defines a mechanism for encapsulating one network-layer protocol over another, so it is applicable to both IPv4 and IPv6.

sit: mainly used to encapsulate IPv6 packets inside IPv4 packets, i.e. IPv6 over IPv4.

isatap: Intra-Site Automatic Tunnel Addressing Protocol, similar to sit, also used for IPv6 tunnel encapsulation.

vti: Virtual Tunnel Interface, an IPsec tunneling technology.

In this article we use the common IPIP tunnel type, ipip.
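As a quick illustration of the plumbing (Linux and root assumed; the addresses are illustrative, and the mirror-image commands would run on the peer node), an ipip tunnel can be created by hand like this:

```python
import subprocess

def sh(cmd: str) -> None:
    subprocess.run(cmd, shell=True, check=True)

# Create a plain ipip tunnel toward a peer node.
sh("modprobe ipip")
sh("ip tunnel add tun0 mode ipip local 172.16.0.1 remote 172.16.0.2")
sh("ip addr add 10.10.10.1/30 dev tun0")
sh("ip link set tun0 up")

# Traffic routed into tun0 now leaves the node as IPv4-in-IPv4:
#   outer IP (172.16.0.1 -> 172.16.0.2) / inner IP (10.10.10.1 -> 10.10.10.2)
```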

Underlay solution: the network formed by physical devices such as switches and routers, driven by Ethernet protocols, routing protocols, and VLAN protocols.

  • BGP

Border Gateway Protocol (BGP) is a path-vector routing protocol that establishes reachable routes between autonomous systems (AS) and selects optimal routes.

  • VLAN

VLAN (Virtual Local Area Network) is a communication technology that logically divides a physical LAN into multiple broadcast domains. Hosts within a VLAN can communicate directly, while hosts in different VLANs cannot, confining broadcast frames to a single VLAN.

3. Principles of network plug-ins

  • Calico uses IPIP tunneling or establishes BGP sessions between hosts so that container routes are learned mutually, solving cross-node communication.
  • Hybridnet uses VXLAN tunneling, establishes BGP sessions between hosts for mutual route learning, or uses ARP proxying, to solve cross-node communication.
  • Bifrost uses the kernel macvlan module together with the switch's VLAN capability to solve container communication.

4. Classification and comparison of network plug-ins

  • Network plug-in classification


Overlay Solution

Underlay Solution

Mainstream solution

Routing or SDN solution: Calico IPIP/Calico VXLAN

Calico BGP/MACVLAN/IPVLAN

advantage

  1. No intrusion into the physical network
  2. Simple maintenance and management
  1. high performance
  2. Network traffic can be managed and monitored

shortcoming

  1. Container networks are difficult to monitor
  2. Containers access the outside of the cluster through Node SNAT, which cannot accurately manage traffic
  1. Invasion of existing networks
  2. High maintenance and management workload
  3. Occupies existing network IP addresses, requiring detailed planning in the early stages
  • Network plugin comparison


HybridNet

calico ipip

calico bgp

bifrost

Supported scenarios

overlay/underlay

overlay

underlay

underlay

Network stack

IPv4/IPv6

IPv4

IPv4/IPv6

IPv4

Communications Technology

vxlan/vlan/bgp

ipip

bgp

macvlan

Communication Mechanism

Tunnel communication/Layer 2+Layer 3 communication/Layer 3 communication

Tunnel communication

Layer 3 communication

Layer 2 communication

Container communication

veth pair

veth pair

veth pair

macvlan subinterface

Whether to support fixed IP/fixed IP pool

yes

yes

yes

yes

IPPool Mode

block + detail

block (e.g. 1.1.1.0/24)

block (e.g. 1.1.1.0/24)

detail (such as 1.1.1.1~1.1.1.9)

North-South traffic export

SNAT/podIP

SNAT

SNAT/podIP

podIP

Whether network policy is supported

yes

yes

yes

Commercial version support

  • SNAT: the source IP address of the packet is translated.
  • podIP: direct communication via the pod IP.
  • veth pair: in Linux, a pair of veth interfaces can be created such that a packet sent into one end is received at the other. Container traffic passes through the host-side veth interface into the host network stack, i.e. it traverses the host's iptables rules before leaving through the physical network card.
  • macvlan sub-interface: completely independent of the host's main interface, with its own MAC address and IP address. For external communication, container traffic does not enter the host network stack and does not traverse the host's iptables rules; it is forwarded purely at Layer 2 and sent out by the physical network card.

5. Network plug-in application scenarios

Given the complex network conditions in data centers, the container network solution must be chosen according to need:

  • If you want minimal intrusion into the data center's physical network, choose a tunnel (overlay) solution.
    • If dual-stack support is required, choose the hybridnet VXLAN solution.
    • If single-stack IPv4 is enough, Calico IPIP or Calico VXLAN are options.
  • If the data center supports and uses BGP:
    • If the hosts are in the same network segment, choose the Calico BGP solution (dual stack supported).
    • If the hosts are in different network segments, choose the hybridnet BGP solution (dual stack supported).
  • For businesses pursuing high performance and low latency, solutions such as macvlan and ipvlan L2 exist.
  • In public cloud scenarios, choose the Terway solution, another ipvlan L3 solution, or a tunnel solution.
  • There are also solutions built to cover all scenarios, such as hybridnet and Multus; Multus is an open source meta-plugin that combines the capabilities of other CNI plugins.

In this article, we analyze the pod data links of hybridnet VXLAN, hybridnet VLAN, hybridnet BGP, Calico IPIP, Calico BGP, and the macvlan-based Bifrost in detail.

3. Network plug-in architecture and communication path

1. Hybridnet

Overall architecture

  • Hybridnet-daemon: controls the data-plane configuration on each node, such as iptables rules and policy routes.

Communication Path

(1) VXLAN mode

  • Same-node communication

Communication process of Pod1 accessing Pod2

Packet sending process:

Pod1's traffic passes through the veth pair, i.e. from Pod1's eth0 to hybrXXX on the host side, and enters the host network stack.

Based on the destination IP, the traffic matches the 39999 table in the host's policy routing, and within table 39999 it matches Pod2's route.

The traffic enters Pod2's container network stack through the hybrYYY interface, completing the forward path.

Return process:

Pod2's traffic passes through the veth pair, i.e. from Pod2's eth0 to hybrYYY on the host side, and enters the host network stack.

Based on the destination IP, the traffic matches the 39999 table in the host's policy routing, and within table 39999 it matches Pod1's route.

The traffic enters Pod1's container network stack through the hybrXXX interface, completing the return path.

  • Cross-node communication

Communication process of Pod1 accessing Pod2

Packet sending process:

Pod1's traffic passes through the veth pair, i.e. from Pod1's eth0 to hybrXXX on the host side, and enters the host network stack.

Based on the destination IP, the traffic matches the 40000 table in the host's policy routing, and within table 40000 it matches the route sending the network segment where Pod2 resides to the eth0.vxlan20 interface.

The forwarding table (FDB) of the eth0.vxlan20 device records the mapping between the peer VTEP's MAC address and its remote IP.

Passing through the eth0.vxlan20 interface, the traffic is VXLAN-encapsulated, i.e. wrapped with a UDP header.

A route lookup shows that the peer node is in the same network segment as the local machine, so the peer's physical-NIC MAC address is resolved by MAC address lookup, and the packet is sent out through Node1's eth0 physical network card.

The traffic arrives at Node2's eth0 physical network card and is decapsulated (the UDP header removed) by the eth0.vxlan20 interface.

Following the 39999 table, the traffic enters Pod2's container network stack through the hybrYYY interface, completing the forward path.

Return process:

Pod2's traffic passes through the veth pair, i.e. from Pod2's eth0 to hybrYYY on the host side, and enters the host network stack.

Based on the destination IP, the traffic matches the 40000 table in the host's policy routing, and within table 40000 it matches the route sending the network segment where Pod1 resides to the eth0.vxlan20 interface.

The forwarding table (FDB) of the eth0.vxlan20 device records the mapping between the peer VTEP's MAC address and its remote IP.

Passing through the eth0.vxlan20 interface, the traffic is VXLAN-encapsulated, i.e. wrapped with a UDP header.

A route lookup shows that the peer node is in the same network segment as the local machine, so the peer's physical-NIC MAC address is resolved, and the packet is sent out through Node2's eth0 physical network card.

The traffic arrives at Node1's eth0 physical network card and is decapsulated by the eth0.vxlan20 interface.

Following the 39999 table, the traffic enters Pod1's container network stack through the hybrXXX interface, completing the return path.
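When tracing this path on a real node, a few iproute2 queries expose each hop described above. The sketch below assumes this article's example names (tables 39999/40000, device eth0.vxlan20) and simply prints what the node knows:

```python
import subprocess

# Inspection commands for the hybridnet VXLAN path (names follow the
# example setup in this article).
for cmd in [
    "ip rule",                           # policy routing: which table applies
    "ip route show table 40000",         # overlay routes toward remote pods
    "ip route show table 39999",         # routes toward local pods
    "bridge fdb show dev eth0.vxlan20",  # peer VTEP MAC -> remote node IP
    "ip -d link show eth0.vxlan20",      # VNI, UDP port, local VTEP details
]:
    print(f"# {cmd}")
    subprocess.run(cmd, shell=True)
```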

(2) VLAN Mode

  • Same-node communication

Communication process of Pod1 accessing Pod2

Packet sending process:

Pod1's traffic passes through the veth pair, i.e. from Pod1's eth0 to hybrXXX on the host side, and enters the host network stack.

Based on the destination IP, the traffic matches the 39999 table in the host's policy routing, and within table 39999 it matches Pod2's route.

The traffic enters Pod2's container network stack through the hybrYYY interface, completing the forward path.

Return process:

Pod2's traffic passes through the veth pair, i.e. from Pod2's eth0 to hybrYYY on the host side, and enters the host network stack.

Based on the destination IP, the traffic matches the 39999 table in the host's policy routing, and within table 39999 it matches Pod1's route.

The traffic enters Pod1's container network stack through the hybrXXX interface, completing the return path.

  • Cross-node communication

Communication process of Pod1 accessing Pod2

Packet sending process:

Pod1's traffic passes through the veth pair, i.e. from Pod1's eth0 to hybrXXX on the host side, and enters the host network stack.

Based on the destination IP, the traffic matches the 10001 table in the host's policy routing, and within table 10001 it matches Pod2's corresponding route.

Following that route, the traffic leaves through the eth0 physical network card underlying the eth0.20 VLAN interface and reaches the switch.

The switch matches Pod2's MAC address and delivers the traffic to Node2's eth0 physical network card.

The traffic is received by the eth0.20 VLAN interface, and following the route matched in table 39999, it enters Pod2's container network stack through the hybrYYY interface, completing the forward path.

Return process:

Pod2's traffic passes through the veth pair, i.e. from Pod2's eth0 to hybrYYY on the host side, and enters the host network stack.

Based on the destination IP, the traffic matches the 10001 table in the host's policy routing, and within table 10001 it matches Pod1's corresponding route.

Following that route, the traffic leaves through the eth0 physical network card underlying the eth0.20 VLAN interface and reaches the switch.

The switch matches Pod1's MAC address and delivers the traffic to Node1's eth0 physical network card.

The traffic is received by the eth0.20 VLAN interface, and following the route matched in table 39999, it enters Pod1's container network stack through the hybrXXX interface, completing the return path.

(3) BGP mode

  • Same-node communication

Communication process of Pod1 accessing Pod2

Packet sending process:

Pod1's traffic passes through the veth pair, i.e. from Pod1's eth0 to hybrXXX on the host side, and enters the host network stack.

Based on the destination IP, the traffic matches the 39999 table in the host's policy routing, and within table 39999 it matches Pod2's route.

The traffic enters Pod2's container network stack through the hybrYYY interface, completing the forward path.

Return process:

Pod2's traffic passes through the veth pair, i.e. from Pod2's eth0 to hybrYYY on the host side, and enters the host network stack.

Based on the destination IP, the traffic matches the 39999 table in the host's policy routing, and within table 39999 it matches Pod1's route.

The traffic enters Pod1's container network stack through the hybrXXX interface, completing the return path.

  • Cross-node communication

Communication process of Pod1 accessing Pod2

Packet sending process:

Pod1's traffic passes through the veth pair, i.e. from Pod1's eth0 to hybrXXX on the host side, and enters the host network stack.

Based on the destination IP, the traffic matches the 10001 table in the host's policy routing, and within table 10001 it matches the default route.

Following that route, the traffic is sent to the switch at 10.0.0.1.

The switch matches the specific route for Pod2 and delivers the traffic to Node2's eth0 physical network card.

The traffic enters Pod2's container network stack through the hybrYYY interface, completing the forward path.

Return process:

Pod2's traffic passes through the veth pair, i.e. from Pod2's eth0 to hybrYYY on the host side, and enters the host network stack.

Based on the destination IP, the traffic matches the 10001 table in the host's policy routing, and within table 10001 it matches the default route.

Following that route, the traffic is sent to the switch at 10.0.0.1.

The switch matches the specific route for Pod1 and delivers the traffic to Node1's eth0 physical network card.

The traffic enters Pod1's container network stack through the hybrXXX interface, completing the return path.

2. Calico

Basic concepts:

  • A pure Layer 3 data-center network solution.
  • Uses the Linux kernel to implement a vRouter on each host for data forwarding.
  • vRouters propagate routing information via the BGP protocol.
  • Provides rich and flexible network policy rules based on iptables.

Overall architecture

  • Felix: runs on each container host node and is responsible for configuring routes, ACLs, and related state to ensure container connectivity.
  • BIRD: the BGP client that distributes the routing information Felix writes into the kernel across the Calico network, ensuring that communication between containers works.
  • etcd: a distributed key/value store responsible for the consistency of network metadata, ensuring the accuracy of Calico network state.
  • RR: route reflector. By default, Calico works in node-mesh mode, in which all nodes peer with one another. Node-mesh works well in small deployments, but at large scale the number of BGP sessions consumes too many resources. One or more BGP route reflectors avoid this by centralizing route distribution, reducing network resource consumption and improving Calico's efficiency and stability (see the arithmetic sketch below).
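The session-count arithmetic behind that last point is easy to check: a full node-mesh needs n(n-1)/2 BGP sessions, while peering every node with a couple of route reflectors needs only about 2n. A quick sketch:

```python
def sessions_full_mesh(n: int) -> int:
    # Every node peers with every other node.
    return n * (n - 1) // 2

def sessions_route_reflector(n: int, rr: int = 2) -> int:
    # Each ordinary node peers only with the route reflectors.
    return (n - rr) * rr

for n in (10, 100, 500):
    print(n, sessions_full_mesh(n), sessions_route_reflector(n))
# 10 -> 45 vs 16; 100 -> 4950 vs 196; 500 -> 124750 vs 996
```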

Communication Path

(1) IPIP mode

  • Same-node communication

Communication process of Pod1 accessing Pod2

Packet sending process:

Pod1's traffic passes through the veth pair, i.e. from Pod1's eth0 to caliXXX on the host side, and enters the host network stack.

Based on the destination IP, the traffic matches Pod2's route in the routing table.

The traffic enters Pod2's container network stack through the caliYYY interface, completing the forward path.

Return process:

Pod2's traffic passes through the veth pair, i.e. from Pod2's eth0 to caliYYY on the host side, and enters the host network stack.

Based on the destination IP, the traffic matches Pod1's route in the routing table.

The traffic enters Pod1's container network stack through the caliXXX interface, completing the return path.

  • Cross-node communication

Communication process of Pod1 accessing Pod2

Packet sending process:

Pod1's traffic passes through the veth pair, i.e. from Pod1's eth0 to caliXXX on the host side, and enters the host network stack. (src: Pod1 IP, dst: Pod2 IP)

Based on the destination IP, the traffic matches the route in the routing table that forwards it to the tunl0 interface. (src: Pod1 IP, dst: Pod2 IP)

At tunl0 the traffic is IPIP-encapsulated, i.e. wrapped with an outer IP header, and sent out through the eth0 physical network card. (src: Node1 IP, dst: Node2 IP)

The traffic enters Node2's host network stack through Node2's eth0 network card. (src: Node1 IP, dst: Node2 IP)

The traffic enters tunl0 and is IPIP-decapsulated. (src: Pod1 IP, dst: Pod2 IP)

The traffic enters Pod2's container network stack through the caliYYY interface, completing the forward path. (src: Pod1 IP, dst: Pod2 IP)

Return process:

Pod2's traffic passes through the veth pair, i.e. from Pod2's eth0 to caliYYY on the host side, and enters the host network stack. (src: Pod2 IP, dst: Pod1 IP)

Based on the destination IP, the traffic matches the route in the routing table that forwards it to the tunl0 interface. (src: Pod2 IP, dst: Pod1 IP)

At tunl0 the traffic is IPIP-encapsulated, i.e. wrapped with an outer IP header, and sent out through the eth0 physical network card. (src: Node2 IP, dst: Node1 IP)

The traffic enters Node1's host network stack through Node1's eth0 network card. (src: Node2 IP, dst: Node1 IP)

The traffic enters tunl0 and is IPIP-decapsulated. (src: Pod2 IP, dst: Pod1 IP)

The traffic enters Pod1's container network stack through the caliXXX interface, completing the return path. (src: Pod2 IP, dst: Pod1 IP)
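The header arithmetic of this path can be made explicit with a few bytes of Python: the outer IPv4 header carries the node IPs with protocol number 4 (IP-in-IP), while the untouched inner header still carries the pod IPs. The addresses below are illustrative, and the checksum is left at zero for brevity:

```python
import socket
import struct

def ipv4_header(src: str, dst: str, proto: int, payload_len: int) -> bytes:
    """Minimal 20-byte IPv4 header: no options, checksum left zero."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20 + payload_len, 0, 0, 64, proto, 0,
        socket.inet_aton(src), socket.inet_aton(dst),
    )

inner = ipv4_header("10.244.1.2", "10.244.2.3", 6, 0)           # pod -> pod, TCP
outer = ipv4_header("172.16.0.1", "172.16.0.2", 4, len(inner))  # node -> node, proto 4 = IPIP
packet = outer + inner  # what tunl0 hands to eth0
print(packet.hex())
```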

(2) BGP mode

  • Same-node communication

Communication process of Pod1 accessing Pod2

Packet sending process:

Pod1's traffic passes through the veth pair, i.e. from Pod1's eth0 to caliXXX on the host side, and enters the host network stack.

Based on the destination IP, the traffic matches Pod2's route in the routing table.

The traffic enters Pod2's container network stack through the caliYYY interface, completing the forward path.

Return process:

Pod2's traffic passes through the veth pair, i.e. from Pod2's eth0 to caliYYY on the host side, and enters the host network stack.

Based on the destination IP, the traffic matches Pod1's route in the routing table.

The traffic enters Pod1's container network stack through the caliXXX interface, completing the return path.

  • Cross-node communication

Communication process of Pod1 accessing Pod2

Packet sending process:

Pod1's traffic passes through the veth pair, i.e. from Pod1's eth0 to caliXXX on the host side, and enters the host network stack.

Based on the destination IP, the traffic matches the route for Pod2's network segment in the routing table and is sent out through Node1's eth0 physical network card.

The traffic arrives at Node2's eth0 physical network card and enters Pod2's container network stack through the caliYYY interface, completing the forward path.

Return process:

Pod2's traffic passes through the veth pair, i.e. from Pod2's eth0 to caliYYY on the host side, and enters the host network stack.

Based on the destination IP, the traffic matches the route for Pod1's network segment in the routing table and is sent out through Node2's eth0 physical network card.

The traffic arrives at Node1's eth0 physical network card and enters Pod1's container network stack through the caliXXX interface, completing the return path.
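On a Calico BGP node, this path is visible as ordinary kernel routes: BIRD installs a route for each remote pod block pointing at the peer node's address. A tiny sketch (a Linux node running Calico is assumed; `calicoctl node status` shows the BGP sessions themselves):

```python
import subprocess

# Remote pod CIDRs appear as plain routes learned from BIRD, e.g.
#   10.244.2.0/24 via 172.16.0.2 dev eth0 proto bird
subprocess.run("ip route | grep bird", shell=True)
```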

3. Bifrost

Overall architecture

  • veth0-bifrXXX: Bifrost implements Service access for macvlan through a veth pair: Service traffic enters the host network stack via this interface, where kube-proxy + iptables convert it into direct access to the pod.
  • eth0: the eth0 interface inside the container is a macvlan interface created on top of the host's VLAN sub-interface.

Communication Path

(1) MACVLAN mode

  • Same node and same VLAN communication

Communication process of Pod1 accessing Pod2

Packet sending process:

Pod1's traffic passes through its macvlan interface, i.e. from Pod1's eth0 across the Layer 2 network into the eth0-10 VLAN sub-interface.

Since the macvlan is in bridge mode, Pod2's MAC address can be matched directly.

The traffic enters Pod2's container network stack from the eth0-10 VLAN sub-interface, completing the forward path.

Return process:

Pod2's traffic passes through its macvlan interface, i.e. from Pod2's eth0 across the Layer 2 network into the eth0-10 VLAN sub-interface.

Since the macvlan is in bridge mode, Pod1's MAC address can be matched directly.

The traffic enters Pod1's container network stack from the eth0-10 VLAN sub-interface, completing the return path.

  • Same-node cross-vlan communication

Communication process of Pod1 accessing Pod2

Packet sending process:

Pod1's traffic passes through its macvlan interface, i.e. Pod1's eth0 follows the default route (gateway 5.0.0.1) into the eth0-5 VLAN sub-interface.

Since the MAC address of gateway 5.0.0.1 is resolved on eth0-5, the traffic is sent from the eth0 physical network card to the switch.

The switch matches Pod2's MAC address.

The traffic enters the physical network card of the host where Pod2 resides, and then the corresponding eth0-10 VLAN sub-interface.

The traffic enters Pod2's container network stack from the eth0-10 VLAN sub-interface, completing the forward path.

Return process:

Pod2's traffic passes through its macvlan interface, i.e. Pod2's eth0 follows the default route (gateway 10.0.0.1) into the eth0-10 VLAN sub-interface.

Since the MAC address of gateway 10.0.0.1 is resolved on eth0-10, the traffic is sent from the eth0 physical network card to the switch.

The switch matches Pod1's MAC address.

The traffic enters the physical network card of the host where Pod1 resides, and then the corresponding eth0-5 VLAN sub-interface.

The traffic enters Pod1's container network stack from the eth0-5 VLAN sub-interface, completing the return path.

  • Cross-node communication

Communication process of Pod1 accessing Pod2

Packet sending process:

Pod1's traffic passes through its macvlan interface, i.e. Pod1's eth0 follows the default route (gateway 5.0.0.1) into the eth0-5 VLAN sub-interface.

Since the MAC address of gateway 5.0.0.1 is resolved on eth0-5, the traffic is sent from the eth0 physical network card to the switch.

The switch matches Pod2's MAC address.

The traffic enters the physical network card of the host where Pod2 resides, and then the corresponding eth0-10 VLAN sub-interface.

The traffic enters Pod2's container network stack from the eth0-10 VLAN sub-interface, completing the forward path.

Return process:

Pod2's traffic passes through its macvlan interface, i.e. Pod2's eth0 follows the default route (gateway 10.0.0.1) into the eth0-10 VLAN sub-interface.

Since the MAC address of gateway 10.0.0.1 is resolved on eth0-10, the traffic is sent from the eth0 physical network card to the switch.

The switch matches Pod1's MAC address.

The traffic enters the physical network card of the host where Pod1 resides, and then the corresponding eth0-5 VLAN sub-interface.

The traffic enters Pod1's container network stack from the eth0-5 VLAN sub-interface, completing the return path.
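The wiring behind all three Bifrost paths can be reproduced by hand. The sketch below (Linux and root assumed; VLAN 10, parent NIC eth0, pod address 10.0.0.10/24, and gateway 10.0.0.1 are illustrative values matching this article's examples) builds a VLAN sub-interface, stacks a macvlan on it, and hands it to a pod-like namespace:

```python
import subprocess

def sh(cmd: str) -> None:
    subprocess.run(cmd, shell=True, check=True)

sh("ip link add link eth0 name eth0-10 type vlan id 10")       # VLAN sub-interface
sh("ip link set eth0-10 up")
sh("ip link add macv0 link eth0-10 type macvlan mode bridge")  # pod's future NIC
sh("ip netns add pod2")
sh("ip link set macv0 netns pod2")
sh("ip netns exec pod2 ip link set macv0 name eth0")
sh("ip netns exec pod2 ip addr add 10.0.0.10/24 dev eth0")
sh("ip netns exec pod2 ip link set eth0 up")
sh("ip netns exec pod2 ip route add default via 10.0.0.1")     # gateway on the switch
```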

4. Problems and Future Development

1. IPv4/IPv6 dual stack

Background

IP, the most basic element of the Internet, is a protocol designed for communication between interconnected computer networks. It is precisely because of the IP protocol that the Internet has grown into the world's largest open computer communication network. As the Internet developed, the IP protocol evolved into two versions: IPv4 and IPv6.

  • IPv4

IPv4 is the fourth version of the Internet Protocol, a datagram transport mechanism used in computer networks and the first widely deployed version of IP. Every device connected to the Internet (whether a switch, a PC, or anything else) is assigned a unique IP address, such as 192.149.252.76. IPv4 uses 32-bit (4-byte) addresses, providing roughly 4.3 billion of them. As more and more users came online, the global pool of IPv4 addresses was finally exhausted in November 2019, which is one of the reasons the Internet Engineering Task Force (IETF) proposed IPv6.

  • IPv6

IPv6 is the sixth version of the Internet Protocol, proposed by the IETF as the next-generation protocol to replace IPv4. It not only solves the scarcity of network address resources but also removes obstacles for various devices to access the Internet. An IPv6 address is 128 bits long, supporting about 3.4×10^38 addresses. For example, 3ffe:1900:fe21:4545:0000:0000:0000:0000 is an IPv6 address; IPv6 addresses are written as 8 groups of 4 hexadecimal digits, separated by colons.

Today, with IPv4 still mainstream, the main problems are:

  • The number of IPv4 addresses no longer meets demand, and IPv6 addresses are needed for expansion.
  • As the domestic next-generation Internet development policy becomes clearer, customer data centers need IPv6 to comply with stricter regulations.

Current status

| | HybridNet | Calico IPIP | Calico BGP | Bifrost |
| --- | --- | --- | --- | --- |
| IPv6/dual-stack support | yes | no | yes | no |

Reason why Calico IPIP does not support IPv6:

  • The ipip tunnel type encapsulates packets inside IPv4 packets, so it cannot carry IPv6 traffic.

2. Multiple network cards (multiple communication mechanisms)

Background

Usually in K8s, a Pod has only one interface, i.e. a single network card, used for pod-to-pod communication within the cluster network. When a Pod needs to communicate with heterogeneous networks, it can be given multiple interfaces, i.e. multiple network cards.

Current issues:

  • Some customers have limited real IP resources, which makes a pure underlay solution impossible.
  • Some customers want to separate the UDP network from the TCP network, i.e. carry UDP traffic on an independent network, which a network model built around a single TCP-oriented interface cannot provide.

Current status

There are two approaches to implementing multiple network cards (see the sketch after this list):

  • A single CNI, when calling IPAM, creates the corresponding interfaces and allocates suitable IP resources according to its CNI config.
  • A meta-CNI calls each delegate CNI in turn to create the corresponding interface and allocate suitable IP resources, e.g. the Multus approach.
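As a sketch of the second (meta-CNI) approach, a Multus-style top-level config simply lists delegate CNI configs, and the meta plugin calls each in turn so the pod ends up with one interface per delegate. Everything below is illustrative, not a verbatim Multus config:

```python
import json

multi_nic_config = {
    "cniVersion": "0.4.0",
    "name": "multi-net",
    "type": "meta-plugin",  # hypothetical meta-CNI binary
    "delegates": [
        # eth0: the cluster network.
        {"type": "calico", "ipam": {"type": "calico-ipam"}},
        # net1: a second interface on a heterogeneous network.
        {"type": "macvlan", "master": "eth1",
         "ipam": {"type": "host-local", "subnet": "192.168.50.0/24"}},
    ],
}
print(json.dumps(multi_nic_config, indent=2))
```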

3. Network traffic control

Background

Usually in a data center, we divide network traffic into two types: traffic between external users and internal servers, known as north-south or vertical traffic, and traffic between servers inside the data center, known as east-west or horizontal traffic.

In the scope of container cloud, we define east-west traffic as traffic between hosts and containers, between containers, or between hosts within a cluster, and north-south traffic as traffic between the outside of the container cloud and the inside of the container cloud.

Current issues:

  • Traditional firewalls cannot control traffic in container-cloud east-west scenarios; the ability to control traffic between services or between containers is needed.

Current status

| | Calico | Cilium | Bifrost (commercial version) |
| --- | --- | --- | --- |
| Technical foundation | iptables | eBPF | eBPF |
| Applicable scenarios | Layer 3 routing, with traffic passing through the host network stack | Layer 2, plus the communication modes Cilium supports | Mainstream CNI plugins |

