A major technical challenge for container cloud platforms: Should the network choose SDN or underlay, or...?


How should you choose a network when building a container cloud platform? And if you choose an SDN network, which SDN implementation should you pick?

Networking has always been a hard problem for container cloud platforms. Should the platform use an SDN network or bridge to the Underlay network? If it uses an SDN network, how do you choose among so many SDN implementations?

The question comes from @Dongxin, a system architect at a bank. The following content is drawn from the practical experience shared by peers in the twt community. Everyone is welcome to join the discussion and share their views.

@liufengyi Software Architect of an Internet Bank:

Prioritize a flat network that interconnects with the rest of the company's network, that is, build on the company's existing network. This keeps the cost of adapting applications much lower. If the containerized applications are relatively independent services, you can consider an overlay. If the scale is not large, some open source network components are worth considering.

@A system engineer at a financial enterprise:

Calico, BGP, Ingress, nginx, and so on are all workable:

1. Calico connects the network inside the cluster with the network outside it. However, as more and more nodes inside the cluster are accessed from outside, operations and maintenance become harder.

2. BGP requires configuring internal routing protocols, which causes some performance loss.

3. Ingress and nginx are similar. Use either component to load-balance external traffic into the cluster.

4. hostNetwork mode.

5. NodePort mode.

Which to choose depends on your own IT architecture and regulatory requirements.

@zhuqibs Software Development Engineer:

As for SDN versus underlay: if you are building it yourself, you will definitely not choose SDN. It comes down to cost, folks! Cisco wanted to build ACI for us last year, and a single switch cost more than 10,000 yuan. If you are a bank with plenty of money, that doesn't matter, but small and medium-sized enterprises are short of funds, and with the epidemic on top of that they can hardly afford it.

We have also used VMware's NSX-T, which had bugs. Our PKS environment had an episode every month: each time, network storage was completely blocked and I/O slowed to a crawl. The vendor sent people to work on it for more than half a year and even replaced the switches, but the problem was never solved. In the end they compensated us with three servers and we replaced everything with ordinary vCenter.

SDN sounds good, but the reality is bleak. Of course, if you have the money, can tolerate mistakes, and want to pursue new technology, go ahead. Public clouds such as Alibaba Cloud and Tencent Cloud are basically all SDN. They have the money and the people to fill in the potholes.

Therefore, most companies only need Underlay plus Kubernetes' own Calico, flannel, or cattle, and that will generally be fine. The only drawbacks are that there is no hard isolation on the network and nothing customized. For those we can use a public cloud, because building it ourselves costs too much.

@xiaoping378 System Architect of a technology company:

1. Since we are talking about building a container cloud platform, the network infrastructure must already be in place, so we do not consider data-center-level SDN solutions. We only consider building on the existing network.

2. Don't put blind faith in commercial solutions. Open source or commercial, every solution must comply with the CNI interface of k8s. It is not recommended to put too many functions into the container network solution, such as bandwidth limiting or security group policies.

3. Since the host layer can generally guarantee Layer 2 reachability, the simplest solution is flannel. Flannel now supports enabling DirectRouting (DR) in vxlan mode. In plain terms, within the same subnet it behaves like host-gw, with the host acting as a software router and performance close to that of the bare network; across subnets it uses vxlan. This balances performance and scalability. In addition, flannel has focused on the problem of routing tables and ARP tables growing too large in large-scale container clouds, and achieves the following: each added host adds only one route entry, one FDB entry, and one ARP entry.

4. If you need container network isolation and security policies (this is actually not essential: for network isolation you can set scheduling policies at the project level to achieve physical isolation), consider the Canal network solution, which combines Calico and flannel.
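The DirectRouting behavior described in point 3 can be sketched roughly as follows. This is only an illustration of the decision, not flannel's actual code: the `choose_path` helper and all addresses are hypothetical, and the `NET_CONF` dictionary mirrors the shape of flannel's net-conf.json with the `DirectRouting` option of the vxlan backend turned on.

```python
import ipaddress
import json

# Shape of flannel's net-conf.json with DirectRouting enabled on the vxlan backend.
# The 10.244.0.0/16 pod network is an example value, not taken from the discussion.
NET_CONF = {
    "Network": "10.244.0.0/16",
    "Backend": {"Type": "vxlan", "DirectRouting": True},
}


def choose_path(local_host_ip, remote_host_ip, host_subnet):
    """Hypothetical helper: pick 'direct' (host-gw style route) when both
    hosts sit in the same subnet, otherwise fall back to 'vxlan'."""
    subnet = ipaddress.ip_network(host_subnet)
    same_subnet = (
        ipaddress.ip_address(local_host_ip) in subnet
        and ipaddress.ip_address(remote_host_ip) in subnet
    )
    return "direct" if same_subnet else "vxlan"


# Serialized form, as it would appear in a config file.
NET_CONF_JSON = json.dumps(NET_CONF, indent=2)
```

For example, `choose_path("192.168.1.10", "192.168.1.11", "192.168.1.0/24")` selects the direct route, while a remote host at 192.168.2.11 would fall back to vxlan encapsulation.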

@Garyy An insurance system engineer:

Ideas for building container networks:

Container networking has become a two-horse race: Docker's CNM on one side, and CNI, led by Google, CoreOS, and Kubernetes, on the other. First, it should be made clear that CNM and CNI are not network implementations; they are network specifications and models. From an R&D perspective, they are just a set of interfaces. They do not care whether you use Flannel or Calico underneath. What CNM and CNI care about is network management.
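To make the "just a set of interfaces" point concrete, here is a minimal sketch. It is not the real CNI API (real CNI plugins are executables driven by JSON configuration); the `NetworkPlugin` interface, the `SimplePoolPlugin` class, and the `start_container` function are all hypothetical, loosely modeled on CNI's ADD and DEL operations.

```python
from abc import ABC, abstractmethod
import ipaddress


class NetworkPlugin(ABC):
    """Hypothetical interface, loosely modeled on CNI's ADD and DEL operations."""

    @abstractmethod
    def add(self, container_id):
        """Attach the container to the network and return its IP address."""

    @abstractmethod
    def delete(self, container_id):
        """Detach the container and release its IP address."""


class SimplePoolPlugin(NetworkPlugin):
    """Toy implementation that hands out addresses from a single subnet."""

    def __init__(self, cidr):
        self._free = [str(ip) for ip in ipaddress.ip_network(cidr).hosts()]
        self._assigned = {}  # container_id -> IP address

    def add(self, container_id):
        if container_id not in self._assigned:
            self._assigned[container_id] = self._free.pop(0)
        return self._assigned[container_id]

    def delete(self, container_id):
        ip = self._assigned.pop(container_id, None)
        if ip is not None:
            self._free.append(ip)


def start_container(plugin, container_id):
    # The runtime calls only the interface; it does not know or care
    # which implementation (Flannel-like, Calico-like, ...) sits behind it.
    return plugin.add(container_id)
```

Swapping `SimplePoolPlugin` for any other class that implements the same two methods would leave `start_container` unchanged, which is the property the specification is meant to provide.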

Our survey of network requirements found that business departments care mainly about the following points: 1. connecting the container network with the physical network; 2. the faster the better; 3. the fewer changes the better; 4. as few risk points as possible.

Container network solutions can be roughly classified along three dimensions: protocol stack level, traversal form, and isolation method.

Protocol stack level: Layer 2 is the easiest to understand. It is common in traditional data centers and virtualization scenarios, and relies on bridging with ARP and MAC learning. Its biggest flaw is broadcast, because Layer 2 broadcast limits the number of nodes.

Layer 3 (pure routed forwarding) is generally based on BGP, which learns the routing state of the whole data center on its own. Its biggest advantage is IP reachability: as long as the network is IP-based, traffic can traverse it. It clearly scales well. In actual deployments, however, most enterprise networks are tightly controlled. For example, some enterprises do not open BGP to developers for security reasons, or the enterprise network does not run BGP at all, and in those cases you are constrained.

Layer 2 plus Layer 3 solves both the scaling problem of pure Layer 2 and the restrictions of pure Layer 3. In cloud VPC scenarios in particular, you can use the VPC's cross-node Layer 3 forwarding capability.
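As a rough illustration of pure Layer 3 forwarding, the sketch below looks up the next hop for a pod IP using longest-prefix match over a route table of the kind BGP might distribute. The `ROUTES` table, the addresses, and the `next_hop` function are assumptions made for the example, not output from any real BGP speaker.

```python
import ipaddress

# Hypothetical routes: pod CIDR -> node IP (next hop), as each node
# might advertise them to its peers.
ROUTES = {
    "10.244.1.0/24": "192.168.1.11",
    "10.244.2.0/24": "192.168.1.12",
    "10.244.0.0/16": "192.168.1.1",  # aggregate route used as a fallback
}


def next_hop(dst_ip, routes):
    """Return the next hop for dst_ip using longest-prefix match, or None."""
    dst = ipaddress.ip_address(dst_ip)
    best = None  # (network, hop) with the longest matching prefix so far
    for cidr, hop in routes.items():
        net = ipaddress.ip_network(cidr)
        if dst in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, hop)
    return best[1] if best else None
```

No broadcast is involved: a node only needs a route entry per pod CIDR, which is why this approach scales better than pure Layer 2.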

Traversal form: This depends closely on the actual deployment environment. There are two traversal forms: Underlay and Overlay.

Underlay: In a well-controlled network, we generally use Underlay. Put simply, whether the hosts are bare metal or virtual machines, as long as the entire network is under your control, container traffic can pass through it directly. That is Underlay.

Overlay: Overlay is common in cloud scenarios. Beneath the Overlay is a controlled VPC network. When an IP or MAC address outside the VPC's jurisdiction appears, the VPC will not let it through. In this case, we can use an Overlay.

An Overlay network virtualizes the physical network and pools its resources, which is the key to cloud-network integration. Combining an Overlay network with SDN, with the SDN controller serving as the Overlay's control plane, makes it easier to integrate network and compute components, and is an ideal choice for moving the network toward cloud platform services.
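The reason an Overlay gets through a controlled VPC can be sketched with a toy model. Everything here is hypothetical (the `Packet` type, the `VPC_KNOWN_IPS` set, and the three functions); it only shows the idea that the VPC sees node addresses on the outer packet while the pod addresses travel inside as payload.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    payload: object  # application data, or an inner Packet when encapsulated


# Hypothetical node IPs that the VPC knows about and allows.
VPC_KNOWN_IPS = {"172.31.0.10", "172.31.0.11"}


def vpc_forward(pkt: Packet) -> Optional[Packet]:
    """The VPC drops any packet whose addresses it does not manage."""
    if pkt.src in VPC_KNOWN_IPS and pkt.dst in VPC_KNOWN_IPS:
        return pkt
    return None


def encapsulate(inner: Packet, src_node: str, dst_node: str) -> Packet:
    """Wrap the pod-to-pod packet in a node-to-node packet."""
    return Packet(src=src_node, dst=dst_node, payload=inner)


def decapsulate(outer: Packet) -> Packet:
    """Unwrap the inner packet on the receiving node."""
    if not isinstance(outer.payload, Packet):
        raise ValueError("packet is not encapsulated")
    return outer.payload
```

A pod packet from 10.244.1.5 to 10.244.2.7 sent directly would be dropped by `vpc_forward`, whereas the same packet wrapped with `encapsulate` between the two node IPs passes and can be recovered with `decapsulate`.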

Isolation method: Isolation is usually done with either VLAN or VXLAN.

VLAN: VLAN is widely used in data centers, but it has a problem: the total number of tenants is limited, because the 12-bit VLAN ID caps the number of VLANs.

VXLAN: VXLAN is the more mainstream isolation method today because it scales well and runs over IP.
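The difference in tenant capacity follows directly from the width of the identifier fields. The short calculation below uses the 12-bit VLAN ID of IEEE 802.1Q and the 24-bit VXLAN Network Identifier (VNI) of RFC 7348; the variable and function names are just for illustration.

```python
VLAN_ID_BITS = 12    # VLAN ID field in the IEEE 802.1Q tag
VXLAN_VNI_BITS = 24  # VXLAN Network Identifier field (RFC 7348)


def id_space(bits):
    """Number of distinct identifiers a field of the given width can hold."""
    return 2 ** bits


vlan_total = id_space(VLAN_ID_BITS)     # 4096 values
vlan_usable = vlan_total - 2            # IDs 0 and 4095 are reserved, leaving 4094
vxlan_total = id_space(VXLAN_VNI_BITS)  # 16,777,216 values

print(f"usable VLAN IDs: {vlan_usable}")
print(f"VXLAN VNIs:      {vxlan_total}")
```

If each tenant needs its own isolated segment, VLAN runs out at roughly four thousand tenants, while VXLAN allows about 16 million.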

@Steven99 Software Architect:

Personally, I don't think the choice of container network is the key point. Whichever network is used, it should be transparent to the end user, so there is no need to obsess over the network model.

The key considerations are more likely security, stability, and ease of use. We used Calico and ran into quite a few problems, so we are considering replacing it. Open source products always require a lot of extra work: testing, verification, and gradual tuning. Without actually using one, it is hard to say which is more suitable. Your requirements tend to become clear as you use it.

Container security and container network security may be the real key points, especially for production workloads. Once the number of services reaches a certain scale, many unexpected problems appear. Of course, it is hard to choose without hands-on experience, so try something first. The more you use it, the better you will understand what you need.
