In the early days of a website, a single machine usually provides all services centrally. As business volume grows, both performance and stability become harder to guarantee, and the natural response is to expand capacity by grouping multiple machines into a cluster that serves external traffic. The website, however, still exposes a single entry point, such as www.taobao.com. When a user enters www.taobao.com in the browser, how is that request distributed to the different machines in the cluster? This is the job of load balancing.
(I) A simple understanding of layer 4 and layer 7 load balancing
① Layer 4 load balancing is based on IP address plus port; layer 7 load balancing is based on application-layer information such as the URL. Similarly, there is layer 2 load balancing based on MAC addresses and layer 3 load balancing based on IP addresses. In other words, a layer 2 load balancer receives requests on a virtual MAC address and distributes them to real MAC addresses; a layer 3 load balancer receives requests on a virtual IP address and distributes them to real IP addresses; a layer 4 load balancer receives requests on a virtual IP plus port and distributes them to real servers; a layer 7 load balancer receives requests for a virtual URL or host name and distributes them to real servers.
② "Layer 4 to layer 7 load balancing" means that, when balancing load across back-end servers, the device decides how to forward traffic based on layer 4 or layer 7 information. Layer 4 load balancing publishes a layer 3 IP address (the VIP) together with a layer 4 port number to identify the traffic that should be balanced, performs NAT on that traffic, forwards it to a back-end server, and records which server is handling each TCP or UDP flow. All subsequent traffic on the same connection is forwarded to the same server. Layer 7 load balancing builds on layer 4 (layer 7 cannot exist without layer 4) and additionally considers application-layer characteristics. Take a Web server: besides identifying the traffic to handle by VIP plus port 80, the device can also decide how to balance load based on layer 7 information such as the URL, the browser type, and the language.
Suppose the Web servers are divided into two groups, one serving Chinese content and the other English content. A layer 7 load balancer can then identify the user's language automatically when the user visits the domain name and select the corresponding server group for load balancing.
③ Load balancers are often called layer 4 switches or layer 7 switches. A layer 4 switch mainly analyzes the IP layer and the TCP/UDP layer to balance layer 4 traffic. A layer 7 switch supports layer 4 load balancing and also analyzes application-layer information, such as the URI or cookies in the HTTP protocol.
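The two decision styles described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular product works: the backend addresses, the pool names `zh` and `en`, and the class names are all hypothetical, and the hash-based choice stands in for whatever selection method a real device is configured with. The layer 4 part sees only addresses and ports and records each flow so that later packets go to the same server; the layer 7 part reads the `Accept-Language` request header to pick a language group.

```python
import hashlib
from typing import Dict, List, Tuple

# Hypothetical backend pools; every address here is illustrative only.
L4_BACKENDS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]
L7_POOLS = {
    "zh": ["10.0.1.11", "10.0.1.12"],
    "en": ["10.0.2.11", "10.0.2.12"],
}

# (client_ip, client_port, vip, vip_port, protocol)
FlowKey = Tuple[str, int, str, int, str]


class Layer4Balancer:
    """Picks a backend from addresses and ports only, then pins the flow."""

    def __init__(self, backends: List[str]) -> None:
        self.backends = list(backends)
        self.flows: Dict[FlowKey, str] = {}  # the recorded flow-to-server mapping

    def forward(self, key: FlowKey) -> str:
        if key not in self.flows:
            # First packet of the flow: choose a backend and remember it.
            digest = hashlib.sha256(repr(key).encode()).digest()
            index = int.from_bytes(digest[:4], "big") % len(self.backends)
            self.flows[key] = self.backends[index]
        # Later packets of the same flow reuse the recorded backend.
        return self.flows[key]


def pick_language_pool(headers: Dict[str, str]) -> str:
    """Layer 7 decision: use the first Accept-Language entry, default to 'en'."""
    accept = headers.get("Accept-Language", "")
    first = accept.split(",")[0].strip().lower()
    return "zh" if first.startswith("zh") else "en"


class Layer7Balancer:
    """Selects a server group from request content, then round-robins inside it."""

    def __init__(self, pools: Dict[str, List[str]]) -> None:
        self.pools = pools
        self.counters = {name: 0 for name in pools}

    def route(self, headers: Dict[str, str]) -> str:
        name = pick_language_pool(headers)
        servers = self.pools[name]
        server = servers[self.counters[name] % len(servers)]
        self.counters[name] += 1
        return server
```

Calling `Layer4Balancer(L4_BACKENDS).forward(key)` twice with the same key returns the same address, which mirrors the "record which server handles this flow" step, while `Layer7Balancer(L7_POOLS).route(headers)` cannot decide anything until the request headers are available.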
Note: Many load balancers can do both layer 4 switching and layer 7 switching.
(II) Load balancing devices are often referred to as "layer 4 to layer 7 switches", so what is the difference between layer 4 and layer 7?
First, the technical principles differ.
Layer 4 load balancing chooses the internal server mainly from the destination address and port in the packet, combined with the server selection method configured on the load balancing device. Taking TCP as an example, when the device receives a SYN from the client, it selects a suitable server in this way, rewrites the destination IP address in the packet to the back-end server's IP, and forwards the packet directly to that server. The TCP three-way handshake is established directly between the client and the server; the load balancing device only forwards packets, much like a router. In some deployments, the device also rewrites the original source address while forwarding, so that the server's reply packets return correctly to the load balancing device.
Layer 7 load balancing, also known as "content switching", chooses the internal server mainly from the meaningful application-layer content in the message, combined with the configured server selection method. Taking TCP as an example again, if the device is to select a server based on application-layer content, it must first complete the three-way handshake with the client itself, in place of the final server. Only then can it receive the client's message carrying the application-layer content and decide on the final internal server from specific fields in that message and the configured server selection method.
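The ordering constraint for layer 7 can be made concrete with a small sketch: the backend cannot be chosen until enough bytes of the request have arrived on the already-established client connection. The code below assumes HTTP/1.x and uses a hypothetical routing table (`ROUTES`, `DEFAULT_BACKEND`) keyed by URL path prefix; the names and addresses are illustrative and not taken from any real device.

```python
from typing import Dict, Optional, Tuple

# Hypothetical routing table: URL path prefix -> backend address.
ROUTES = {
    "/static/": "10.0.3.11:8080",
    "/api/": "10.0.4.11:8080",
}
DEFAULT_BACKEND = "10.0.5.11:8080"


def parse_request_head(data: bytes) -> Optional[Tuple[str, str, Dict[str, str]]]:
    """Return (method, path, headers) once the header block is complete, else None."""
    end = data.find(b"\r\n\r\n")
    if end == -1:
        return None  # headers not fully received yet
    lines = data[:end].decode("iso-8859-1").split("\r\n")
    parts = lines[0].split(" ")
    if len(parts) != 3:
        raise ValueError("malformed request line")
    method, path, _version = parts
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, path, headers


def choose_backend(data: bytes) -> Optional[str]:
    """Decide on a backend from the bytes read so far on the client connection."""
    head = parse_request_head(data)
    if head is None:
        return None  # keep reading; no layer 7 decision is possible yet
    _method, path, _headers = head
    for prefix, backend in ROUTES.items():
        if path.startswith(prefix):
            return backend
    return DEFAULT_BACKEND
```

A layer 4 device would have made its choice on the first SYN, before any of these bytes exist; here `choose_backend` returns `None` until the request head is complete, and only then can a second connection to the chosen backend be opened.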
In this case, the load balancing device behaves more like a proxy server: it establishes one TCP connection with the front-end client and a separate one with the back-end server. From this technical principle it follows that layer 7 load balancing places higher demands on the device, and that its layer 7 processing capacity will inevitably be lower than in a layer 4 deployment.
Second, the application scenarios differ.
The benefit of layer 7 load balancing is that it makes the whole network more "intelligent". For example, among the traffic of users visiting a website, requests for images can be forwarded to dedicated image servers where caching can be applied, and requests for text can be forwarded to dedicated text servers where compression can be applied. This is only one small use of layer 7. In principle, this approach can modify the client's request and the server's response in any way, which greatly improves the flexibility of the application system at the network layer. Many functions usually deployed on back-end servers such as Nginx or Apache can be moved forward onto the load balancing device, for example rewriting headers in client requests, or filtering keywords and inserting content in server responses.
Another frequently mentioned function is security. In a SYN Flood, one of the most common network attacks, the attacker controls many source clients and uses forged IP addresses to send SYN packets to the same target. The attack typically sends a very large number of SYN packets to exhaust the relevant resources on the server and cause a Denial of Service (DoS).
From the technical principles it follows that in layer 4 mode these SYN packets are forwarded to the back-end servers, whereas in layer 7 mode they are terminated on the load balancing device and do not affect the normal operation of the back-end servers. In addition, the load balancing device can apply multiple layer 7 policies to filter specific messages, for example application-level threats such as SQL injection, which further improves the overall security of the system at the application level.
Layer 7 load balancing today focuses mainly on HTTP, so it is used mostly for websites, internal information platforms, and other systems built on the B/S (browser/server) architecture. Layer 4 load balancing serves other TCP applications, such as ERP systems built on the C/S (client/server) architecture.
Third, issues to consider for layer 7 applications.
1: Is it really necessary? Layer 7 can indeed make traffic handling more intelligent, but it inevitably brings more complex device configuration, a heavier load on the load balancer, and more difficult troubleshooting. The system design should consider how layer 4 and layer 7 applications will be mixed.
2: Can it really improve security? In a SYN Flood attack, for example, layer 7 mode does shield the servers from this traffic, but the load balancing device itself must have strong anti-DDoS capabilities. Otherwise, even if the servers are healthy, a failure of the load balancing device, which is the central scheduler, brings down the entire application.
3: Is it flexible enough?
The advantage of layer 7 is that it makes the traffic of the whole application intelligent, but this requires the load balancing device to offer a complete set of layer 7 functions, so that customers can schedule traffic by application according to their own circumstances. The simplest test is whether the device can replace the scheduling function of the back-end Nginx or Apache servers. Only a load balancing device that provides a layer 7 application development interface, allowing customers to define functions according to their needs, can truly deliver strong flexibility and intelligence.
(III) Introduction to layer 4 and layer 7 load balancing
Load balancing is built on top of the existing network structure. It provides an inexpensive, effective, and transparent way to expand the bandwidth of network devices and servers, increase throughput, strengthen network data processing capabilities, and improve network flexibility and availability.
Load balancing has two meanings. First, a large volume of concurrent accesses or data traffic is shared among multiple node devices and processed separately, which reduces the time users wait for a response. Second, a single heavy computation is split among multiple node devices for parallel processing; after each node finishes, the results are combined and returned to the user, which greatly increases the system's processing capacity.
The load balancing technology introduced in this article mainly refers to balancing traffic load across the servers and applications in a server cluster. At present, load balancing is mostly used to improve the availability and scalability of Internet server programs running on Web servers, FTP servers, and other mission-critical servers.
Load balancing technology classification
There are many different load balancing technologies in use today to meet different application requirements.
The following classification is based on the device used for load balancing, the network layer at which it is applied (with reference to the OSI reference model), and the geographical structure of the application.
Software/hardware load balancing
A software load balancing solution installs one or more additional pieces of software on the operating systems of one or more servers to achieve load balancing, for example DNS Load Balance or CheckPoint Firewall-1 ConnectControl. Its advantages are that it is tailored to a specific environment, simple to configure, flexible to use, low in cost, and able to meet general load balancing needs.
Software solutions also have many disadvantages. The additional software on each server consumes an unpredictable amount of system resources, and the more powerful the module, the more it consumes; when the volume of connection requests is very large, the software itself can become the factor that decides whether the server keeps working. Software scalability is limited by the operating system, and bugs in the operating system itself often cause security issues.
A hardware load balancing solution installs a load balancing device directly between the servers and the external network. This device is usually called a load balancer. Because a dedicated device performs a dedicated task independently of the operating system, overall performance is greatly improved. Combined with diverse load balancing strategies and intelligent traffic management, it can meet load balancing needs optimally.
Load balancers come in many forms. Besides standalone load balancers, some are integrated into switching devices placed between the servers and the Internet link, and some are built from a PC with two network adapters, one connected to the Internet and the other to the internal network of the back-end server cluster.
Generally speaking, hardware load balancing is superior to software in functionality and performance, but it is expensive.