Spiderpool: How to solve the problem of zombie IP recycling

How can zombie IPs be reclaimed in an Underlay network? Spiderpool, an open source cloud native networking project, provides a solution. Let's take a look.

01 Underlay Network Solution

Why do we need an Underlay network solution? In a data center private cloud, there are many application scenarios that require an Underlay network:

  • Low latency and high throughput: In some application scenarios that require low latency and high throughput, the Underlay network solution is usually more advantageous than the Overlay network solution. Since the Underlay network is built on the physical network, it can provide faster and more stable network transmission services.
  • Traditional host applications moving to the cloud: In data centers, many traditional host applications still rely on traditional networking practices, such as their own service exposure and discovery and connections across multiple subnets. An Underlay network solution meets the needs of these applications better.
  • Data center network management: Data center managers usually need to implement security control over applications, such as using firewalls, VLAN isolation, etc. In addition, they also need to use traditional network observation methods to implement cluster network monitoring. Using an Underlay network solution can more easily meet these requirements.
  • Independent host network card planning: Some special scenarios, such as KubeVirt, storage projects, and log projects, require dedicated host network cards to guarantee bandwidth isolation on the underlying subnet. An Underlay network solution supports these needs better, which improves application performance and reliability.

As private clouds become more common in data centers, Underlay networks have become an important and widely used part of data center network architecture, providing more efficient network transmission and better management of the network topology.

02 Zombie IP Problem in Underlay Network

What is a zombie IP? It is an IP address that IPAM still records as assigned to a Pod, although that Pod no longer exists in the Kubernetes cluster.
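The definition above amounts to a set difference between what IPAM has recorded and what is alive in the cluster. The following is a minimal sketch of that check, not Spiderpool's actual implementation: the function name, the record layout, and the choice to compare by Pod UID rather than by name are all assumptions made for illustration.

```python
def find_zombie_ips(ipam_records, live_pod_uids):
    """Return the IPs that IPAM still records for Pods that no longer exist.

    ipam_records:  dict mapping IP -> (namespace, pod_name, pod_uid); a
                   hypothetical record layout used only in this sketch.
    live_pod_uids: set of UIDs of the Pods currently present in the cluster.
    """
    return sorted(
        ip
        for ip, (_namespace, _name, uid) in ipam_records.items()
        if uid not in live_pod_uids
    )


if __name__ == "__main__":
    records = {
        "10.6.0.1": ("default", "web-0", "uid-a"),
        "10.6.0.2": ("default", "web-1", "uid-b"),  # Pod is gone, record remains
        "10.6.0.3": ("default", "web-2", "uid-c"),
    }
    print(find_zombie_ips(records, {"uid-a", "uid-c"}))
```

Comparing by UID is a deliberate choice in this sketch: a recreated Pod can reuse the name of its predecessor but always receives a new UID, so a name-based comparison could mistake a stale record for a live one.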

In production, zombie IPs inevitably appear in a cluster, for example:

  • When a Pod is deleted, the CNI delete call fails because of a network anomaly or a crash of the CNI binary, so the IP address is never reclaimed by the CNI.
  • After a node crashes unexpectedly, its Pods remain in the Terminating state indefinitely, and the IP addresses they occupy cannot be released.

In a Kubernetes cluster using an Underlay network, when zombie IPs appear, the following problems may occur:

  • Limited IP resources in the Underlay network: A large-scale cluster may run a very large number of Pods, and IPAM allocates an IP from a specified Underlay subnet to each Pod for network communication. Zombie IPs can therefore waste a large share of the IP resources, or leave no Underlay IPs available at all.
  • Fixed IP requirements lead to new Pod startup failure: Suppose an IP pool with 10 IP addresses is dedicated to an application with 10 replicas. If the zombie IP problem described above occurs, the old Pod's IP cannot be reclaimed, and the new Pod cannot start because no IP is available to it. This threatens the stability and reliability of the application and may even prevent the whole application from operating normally.
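The fixed-pool case can be reproduced with a toy allocator. This is a minimal sketch under stated assumptions: the `FixedPool` class, the `10.6.0.x` addresses, and the Pod names are hypothetical and only illustrate why a single unreleased address blocks a replacement Pod when the ratio of IPs to replicas is 1:1.

```python
import ipaddress


class FixedPool:
    """Toy IP pool with a fixed number of addresses (illustrative only)."""

    def __init__(self, first_ip, size):
        start = ipaddress.ip_address(first_ip)
        self.free = [str(start + i) for i in range(size)]
        self.allocated = {}  # ip -> pod name

    def allocate(self, pod):
        if not self.free:
            raise RuntimeError(f"no IP available for {pod}")
        ip = self.free.pop(0)
        self.allocated[ip] = pod
        return ip

    def release(self, ip):
        self.allocated.pop(ip)
        self.free.append(ip)


if __name__ == "__main__":
    pool = FixedPool("10.6.0.1", 10)
    ips = [pool.allocate(f"app-{i}") for i in range(10)]

    # app-0 is deleted, but the CNI delete call fails: release() never runs,
    # so its address stays recorded as allocated (a zombie IP).
    try:
        pool.allocate("app-0-replacement")
    except RuntimeError as err:
        print("startup blocked:", err)

    # Once the zombie IP is reclaimed, the replacement Pod gets an address.
    pool.release(ips[0])
    print("allocated:", pool.allocate("app-0-replacement"))
```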

03 Solution: Spiderpool

Spiderpool (https://github.com/spidernet-io/spiderpool) is an Underlay network solution for Kubernetes. By providing lightweight meta plug-ins and IPAM plug-ins, Spiderpool flexibly integrates and strengthens the existing CNI projects in the open source community, simplifies the operation and maintenance of IPAM in the Underlay network to the greatest extent, and makes multi-CNI collaboration truly feasible. It supports running in bare metal, virtual machines, public clouds and other environments.

Spiderpool uses the following IP reclamation mechanisms to solve the zombie IP problem in the Underlay network:

  • For Pods in the Terminating state, Spiderpool automatically releases their IP addresses once the Pod's spec.terminationGracePeriodSeconds has elapsed. This feature is controlled by the environment variable SPIDERPOOL_GC_TERMINATING_POD_IP_ENABLED, and it addresses failure scenarios in which a node crashes unexpectedly.
  • In failure scenarios such as a failed CNI delete call, a Pod that was assigned an IP address is destroyed while the address remains recorded in IPAM, which creates a zombie IP. Spiderpool automatically reclaims these zombie IP addresses through a combination of periodic scans and event-triggered scans.
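The two rules above can be combined into a single scan pass. The sketch below is a simplified model and not Spiderpool's code: the `Pod` dataclass, the `delete_requested_at` field, and the `ips_to_reclaim` function are hypothetical names, and the `gc_terminating` flag only stands in for the behavior that SPIDERPOOL_GC_TERMINATING_POD_IP_ENABLED switches on or off.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pod:
    uid: str
    # Time (in seconds) at which deletion was requested; None if the Pod is
    # not being deleted. Hypothetical field used only by this sketch.
    delete_requested_at: Optional[float] = None
    # Mirrors spec.terminationGracePeriodSeconds (Kubernetes default: 30).
    grace_period_seconds: float = 30.0


def ips_to_reclaim(ipam_records, pods, now, gc_terminating=True):
    """Return the IPs that one scan pass would reclaim.

    ipam_records: dict mapping IP -> owning Pod UID
    pods:         dict mapping UID -> Pod for Pods that still exist
    """
    reclaim = []
    for ip, uid in ipam_records.items():
        pod = pods.get(uid)
        if pod is None:
            # Recorded in IPAM, but the Pod is gone: a zombie IP.
            reclaim.append(ip)
        elif (
            gc_terminating
            and pod.delete_requested_at is not None
            and now >= pod.delete_requested_at + pod.grace_period_seconds
        ):
            # Still Terminating after its grace period, e.g. on a crashed node.
            reclaim.append(ip)
    return reclaim
```

A periodic scan would call such a function on a timer, while an event-driven scan would call it when a Pod update or delete event arrives; both paths can share the same decision logic.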

04 Equal Proportional IP Allocation Test

An IPAM must allocate IP addresses accurately, and Spiderpool additionally provides robust reclamation of zombie IPs. The author ran the following equal proportional IP allocation test to verify this. The test is based on version 0.3.1 of the CNI Specification and uses Macvlan with Spiderpool (version v0.6.0) as the solution under test. For comparison, it also covers several network solutions from the open source community: Whereabouts (version v0.6.2) with Macvlan, Kube-ovn (version v1.11.8), and Calico-ipam (version v3.26.1). The test scenario is as follows:

1. Create 1,000 Pods and limit the number of available IPv4/IPv6 addresses to 1,000 each, ensuring that the ratio of available IP addresses to Pods is 1:1.

2. Use the following command to rebuild the 1,000 Pods at once, and record the time it takes for all 1,000 rebuilt Pods to reach the Running state. Because the set of IP addresses is fixed, rebuilding Pods concurrently involves IP reclamation, preemption, and conflicts, so this step verifies whether each IPAM plug-in can quickly reassign the limited IP addresses and keep application recovery fast.

 ~# kubectl get pod | grep "prefix" | awk '{print $1}' | xargs kubectl delete pod

3. Power off all nodes and then power them on again to simulate fault recovery, and record the time it takes for 1,000 Pods to reach the Running state again.

4. Delete all Deployments and record the time it takes for all Pods to completely disappear.
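Each step above records an elapsed time until a Pod count reaches a target. The original does not show how the timing was taken, so the following is only a sketch of one way to do it: `wait_until_count` and its parameters are hypothetical, and `count_pods` is an assumed callable that could, for instance, wrap a `kubectl get pod` query.

```python
import time


def wait_until_count(count_pods, target, poll_interval=1.0, timeout=3600.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll count_pods() until it returns target; return the elapsed seconds.

    count_pods: callable returning the current number of Pods of interest,
                e.g. Running Pods (target=1000) or remaining Pods (target=0).
    clock and sleep are injectable so the helper can be exercised without
    real waiting.
    """
    start = clock()
    while True:
        if count_pods() == target:
            return clock() - start
        if clock() - start > timeout:
            raise TimeoutError(f"target {target} not reached in {timeout}s")
        sleep(poll_interval)
```

The measured time is only accurate to within one `poll_interval`, so the interval should be small relative to the durations being compared.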

The test data is as follows:

Spiderpool and Kube-ovn allocate IPs for all Pods on every node of the cluster from the same CIDR, so IP allocation and release face fierce contention, which makes allocation performance more challenging. Whereabouts and Calico instead give each node a small IP set, so contention is relatively low and allocation performance is less of a challenge. Even so, the experimental data shows that Spiderpool's IP allocation performance is very good although its IPAM principle puts it at a disadvantage.

In the test of the Macvlan + Whereabouts combination, 922 Pods in the creation scenario reached the Running state at a relatively uniform rate within 14m25s. After that, the growth rate dropped sharply, and it took 21m49s in total for all 1,000 Pods to reach the Running state. In the rebuild scenario, Whereabouts could no longer allocate IPs to Pods after 55 Pods had reached the Running state. Because the ratio of IP addresses to Pods in the test is 1:1, an IPAM component that fails to reclaim IPs correctly leaves new Pods unable to start, since no IP is available to them.
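The slowdown in the Whereabouts creation scenario can be quantified from the figures quoted above. The short calculation below uses only those numbers; the helper name `to_seconds` is introduced for illustration.

```python
def to_seconds(minutes, seconds):
    """Convert a duration given as minutes and seconds to seconds."""
    return minutes * 60 + seconds


first_phase = to_seconds(14, 25)  # 922 Pods were Running by this point
total = to_seconds(21, 49)        # all 1,000 Pods were Running by this point

rate_first = 922 / first_phase                    # Pods per second
rate_tail = (1000 - 922) / (total - first_phase)  # Pods per second

print(f"first 922 Pods: {rate_first:.2f} Pods/s")  # 1.07 Pods/s
print(f"last 78 Pods:   {rate_tail:.2f} Pods/s")   # 0.18 Pods/s
print(f"slowdown:       {rate_first / rate_tail:.1f}x")  # 6.1x
```

In other words, the last 78 Pods took 7m24s, an allocation rate roughly six times lower than during the first phase.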

05 Conclusion

The tests above show that Spiderpool performs well in every test scenario. Spiderpool is an IPAM solution for Underlay networks and faces more complex IP address preemption and conflict issues, yet its IP allocation and reclamation capabilities are comparable to those of mainstream Overlay CNIs such as Calico.
