vSwitch expansion in the ACK cluster Terway network scenario

Table of contents

  • 1. Terway Network Introduction
  • 2. Problem phenomenon
  • 3. Capacity expansion operation
    • 3.1 Add a switch and configure NAT
    • 3.2 Configuring Terway for Cluster
    • 3.3 Restart Terway

1. Terway Network Introduction

Terway is an open-source CNI (Container Network Interface) plug-in developed by Alibaba Cloud and built on its Virtual Private Cloud (VPC) network. It supports standard Kubernetes network policies for defining access rules between containers. You can use Terway to provide network connectivity within a Kubernetes cluster.

Terway builds the Pod network by assigning native elastic network interfaces (ENIs) to Pods. Its network policy support follows the Kubernetes standard and is compatible with Calico network policies.

With Terway, each Pod has its own network stack and IP address. Traffic between Pods on the same ECS instance is forwarded inside the machine. Traffic between Pods on different ECS instances is forwarded directly through the ENIs of the VPC. Because packets do not need to be encapsulated with a tunneling technology such as VXLAN, Terway offers higher communication performance.

In short, Terway's defining feature is that it uses the capabilities of cloud ECS instances to flatten the Pod network and the node network into one, with Pods taking their IP addresses directly from the vSwitch in the VPC.

2. Problem phenomenon

Because Terway assigns Pod IP addresses from the vSwitch, every additional node and Pod consumes available IP addresses of the vSwitch in the VPC. If the business grows rapidly in a short period and Pods consume a large number of addresses, a vSwitch that was sized too small during early planning can run out of available IP addresses.

When this happens, newly created Pods stay in the ContainerCreating state, and describing the Pod shows the message "error allocate ip...". The Terway log on the node where the Pod is located contains the following:

    Message: The specified VSwitch "vsw-xxxxx" has not enough IpAddress.

This message means that the vSwitch has run out of IP addresses. You can log in to the vSwitch console to check the number of available IP addresses of the vSwitch where the node is located. If very few or none remain, the capacity must be expanded.
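
As a rough aid for scanning collected Terway logs, the following Python sketch pulls the affected vSwitch IDs out of log lines. The regular expression is based only on the single message quoted above; the function name and the sample lines are illustrative assumptions, and real log lines may carry extra fields around the message.

```python
import re

# Pattern derived from the one log message quoted in this article (assumption:
# other Terway versions may word the message differently).
_EXHAUSTED = re.compile(r'The specified VSwitch "(vsw-[^"]+)" has not enough IpAddress')


def exhausted_vswitches(log_lines):
    """Return the vSwitch IDs reported as out of IP addresses, in first-seen order."""
    seen = []
    for line in log_lines:
        match = _EXHAUSTED.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


# Hypothetical sample input: one matching line and one unrelated line.
sample = [
    'Message: The specified VSwitch "vsw-xxxxx" has not enough IpAddress.',
    "some unrelated log line",
]
print(exhausted_vswitches(sample))
```

The IDs it returns are the vSwitches to look up in the console.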

3. Capacity expansion operation

3.1 Add a switch and configure NAT

Create a new vSwitch in the corresponding VPC from the VPC management console. The new vSwitch must be in the same availability zone as the vSwitch that is short of IP addresses, because when Terway assigns a Pod IP it uses a vSwitch in the availability zone where the node is located. Capacity therefore has to be added in that same availability zone.

Keep this in mind both when creating the first vSwitch for a cluster and when expanding capacity later. As Pod density increases, the demand for Pod IP addresses grows, so it is recommended that the prefix length of a vSwitch created for Pods be 19 or less, that is, each network segment contains at least 8192 IP addresses.
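
The relationship between prefix length and segment size can be checked with Python's standard ipaddress module. This is a small sketch: the CIDR blocks are made-up examples, and the figure is the raw size of the block, without subtracting any addresses the cloud provider may reserve.

```python
import ipaddress


def vswitch_size(cidr):
    """Return the total number of addresses in a CIDR block (raw size)."""
    return ipaddress.ip_network(cidr).num_addresses


# Hypothetical CIDR blocks, chosen only to compare prefix lengths.
for cidr in ("10.0.0.0/19", "10.0.32.0/20", "10.0.48.0/24"):
    print(cidr, vswitch_size(cidr))
```

A /19 block yields 8192 addresses, and each additional prefix bit halves that number.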

After the vSwitch is created, configure a NAT policy for it so that it can access the external network.

3.2 Configuring Terway for Cluster

Add the vSwitch created above to Terway's ConfigMap:

    kubectl -n kube-system edit cm eni-config

For configuration examples, refer to the Terway configuration reference [1]. Part of the configuration is described below:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: eni-config
      namespace: kube-system
    data:
      10-terway.conf: |-
        {
          "cniVersion": "0.3.0",
          "name": "terway",
          "type": "terway"
        }
      disable_network_policy: "true"
      eni_conf: |-
        {
          "version": "1",                                          # version
          "max_pool_size": 80,                                     # maximum water level of the resource pool
          "min_pool_size": 20,                                     # minimum water level of the resource pool
          "credential_path": "/var/addon/token-config",
          "vswitches": {"cn-shanghai-f": ["vsw-AAA", "vsw-BBB"]},  # associated vSwitches (ENI multi-IP mode)
          "eni_tags": {"ack.aliyun.com": "xxxxxxxxx"},
          "service_cidr": "172.16.0.0/16",                         # Service CIDR
          "security_group": "sg-xxxxxxx",                          # security group ID
          "vswitch_selection_policy": "ordered"
        }

In the vswitches section, vsw-AAA is the existing vSwitch that is short of IP addresses, and vsw-BBB is the newly created vSwitch being added. The # annotations are explanatory only. JSON has no comment syntax, so leave them out of the actual ConfigMap.
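
The edit to the vswitches field can also be expressed programmatically. The sketch below assumes the eni_conf value has been copied out as plain JSON without any # annotations; the helper name add_vswitch is hypothetical, and the zone and vSwitch IDs are the placeholders from the example above. It only transforms text and does not talk to the cluster.

```python
import json


def add_vswitch(eni_conf_text, zone, vswitch_id):
    """Return eni_conf JSON text with vswitch_id appended to the list for zone."""
    conf = json.loads(eni_conf_text)
    # Create the zone entry if it does not exist yet, then append once.
    zone_list = conf.setdefault("vswitches", {}).setdefault(zone, [])
    if vswitch_id not in zone_list:
        zone_list.append(vswitch_id)
    return json.dumps(conf, indent=2)


# Trimmed-down stand-in for the real eni_conf value (assumption for illustration).
current = (
    '{"version": "1", '
    '"vswitches": {"cn-shanghai-f": ["vsw-AAA"]}, '
    '"vswitch_selection_policy": "ordered"}'
)
updated = add_vswitch(current, "cn-shanghai-f", "vsw-BBB")
print(updated)
```

The new vSwitch is appended after the existing one rather than replacing it, which matches the example configuration where vsw-AAA stays in the list. Calling the helper twice with the same ID leaves the list unchanged.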

The max_pool_size and min_pool_size parameters above set the water levels of the resource pool. Terway connects the container network using the network resources of the underlying virtualization layer, and creating or releasing those resources requires a series of API calls. Making these calls every time a Pod is created or destroyed would make Pod setup slow, so Terway caches resources in a pool. When the pool falls below the minimum water level, resources are replenished automatically. When it rises above the maximum water level, resources are released. This keeps both resource utilization and allocation efficient.

In effect, IP addresses are pre-allocated. Choose the values according to the maximum number of secondary ENIs and the maximum number of Pods supported by the node's instance type.
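
The water-level rule can be illustrated with a simplified Python model. This is not Terway's actual implementation, only a sketch of the behaviour described above; the function name is hypothetical, and it treats the pool as a plain count of idle addresses.

```python
def pool_adjustment(idle, min_pool_size, max_pool_size):
    """Return how many addresses to allocate (positive) or release (negative).

    Simplified model of the water-level rule: refill up to the minimum,
    drain down to the maximum, otherwise leave the pool alone.
    """
    if min_pool_size > max_pool_size:
        raise ValueError("min_pool_size must not exceed max_pool_size")
    if idle < min_pool_size:
        return min_pool_size - idle
    if idle > max_pool_size:
        return max_pool_size - idle
    return 0


# Using the example values min_pool_size=20 and max_pool_size=80:
print(pool_adjustment(5, 20, 80))    # below the minimum: replenish
print(pool_adjustment(100, 20, 80))  # above the maximum: release
print(pool_adjustment(50, 20, 80))   # in between: no change
```

A higher minimum makes Pod creation faster at the cost of holding more vSwitch addresses idle, which is why the values should be weighed against the remaining capacity of the vSwitch.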

3.3 Restart Terway

Restart all Terway Pods so that the cache is refreshed and the new configuration takes effect quickly.

    kubectl -n kube-system delete pod -l app=terway-eniip
    kubectl -n kube-system get pod | grep terway

After the restart, check whether the previously failing Pods have obtained IP addresses.

When troubleshooting Pod IP allocation, you can also enter the Terway Pod on the node and run the following command to list the IP addresses currently assigned to Pods and the idle IP addresses that have already been allocated from the vSwitch.

    # terway-cli mapping
    Status | Pod Name | Res ID | Factory Res ID
    Normal | node-problem-detector-l5h52 | 00:16:10:48:3e:37.10.244.18.167 | 00:16:10:48:3e:37.10.244.18.167
    ...
    Idle | | 00:16:10:48:3e:37.10.244.18.132 |
    00:16:10:48:3e:37.10.244.18.18 | 00:16:10:48:3e:37.10.244.18.18
    00:16:10:48:3e:37.10.244.18.54 | 00:16:10:48:3e:37.10.244.18.54
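
In the sample output, each Res ID looks like a MAC address followed by a dot and an IP address. Assuming that layout holds (it is inferred from this sample only, not from Terway documentation), a small Python helper with the hypothetical name split_res_id can separate the two parts when post-processing the listing:

```python
def split_res_id(res_id):
    """Split a Res ID of the assumed form '<MAC>.<IP>' into (mac, ip).

    A MAC address written with colons contains no dots, so the first dot
    marks the boundary between the two parts.
    """
    mac, _, ip = res_id.partition(".")
    return mac, ip


print(split_res_id("00:16:10:48:3e:37.10.244.18.167"))
```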

See you ~

References

[1] Terway configuration reference: https://github.com/AliyunContainerService/terway/blob/main/docs/dynamic-config.md

This article is reprinted from the WeChat public account "Xianren Technology". To reprint this article, please contact the Xianren Technology public account.
