Please look at this case first. For a certain key-value application, the overall system resource consumption is shown below: the sockets interface plus TCP is the system bottleneck. According to the model in the figure below, the bottleneck lies in TCP (including the sockets interface), so improving system throughput requires optimizing TCP. Because network delay also exists, what affects user experience most is how quickly data can be delivered to the client, which falls into the category of traffic optimization. This article describes how to optimize TCP performance and TCP data delivery.

1. What is TCP Acceleration?

Quoting the definition from Baidu Encyclopedia:

The mainstream TCP acceleration methods fall into two categories: traffic-based accelerated delivery and packet-processing performance optimization. The traffic-based approach speeds up packet delivery mainly by modifying the congestion control algorithm. Packet-processing optimization includes kernel tuning, TCP offload, and user-mode TCP protocol stacks; these methods improve packet-processing efficiency and thus raise system throughput.

2. Traffic-based TCP acceleration

(1) TCP bilateral acceleration

TCP bilateral acceleration requires deploying hardware devices or installing software at both ends of the TCP connection. Its advantage is that techniques such as compression can be used to further improve TCP transmission efficiency; its disadvantage is that it is difficult to deploy. Bilateral acceleration is generally used for long-distance access between different branches of a company. The following figure is an example of bilateral acceleration: the two TCP acceleration devices talk to each other over SCTP, while each original TCP endpoint talks to its nearby acceleration device over ordinary TCP. This transparent-proxy arrangement makes it easy for the acceleration devices to use special techniques between themselves.

(2) Unilateral acceleration

TCP unilateral acceleration only requires deploying software or a device at one end of the TCP connection to speed up data delivery. Most unilateral acceleration is achieved by modifying TCP's congestion control algorithm. The following figure shows the unilateral acceleration of a commercial product: packets are blasted out without any slow-start phase. Sending packets regardless of network conditions can indeed improve performance in most scenarios, but the gain comes from grabbing shared Internet bandwidth, much like driving in the emergency lane of a highway.

Google's congestion control algorithm BBR, which has emerged in recent years, can be regarded as a form of unilateral acceleration. The figure above shows that, compared with the traditional congestion control algorithm CUBIC, BBR still performs well under network packet loss. The reason is that BBR abandons packet loss as the direct congestion signal and instead determines the sending rate and window size by continuously estimating the bottleneck bandwidth and the minimum RTT. In mobile applications, most packet losses are not caused by router congestion, so BBR adapts better to mobile scenarios.
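Besides enabling BBR system-wide (described next), an application can also request a particular congestion control algorithm on its own sockets via the TCP_CONGESTION socket option. The following is a minimal sketch, assuming a Linux host where the tcp_bbr module is available:

    /* Sketch: ask for BBR on one socket instead of changing the system default.
     * Assumes Linux with the tcp_bbr module loaded or loadable. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>   /* TCP_CONGESTION */

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) { perror("socket"); return 1; }

        const char *algo = "bbr";
        if (setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, algo, strlen(algo)) < 0)
            perror("setsockopt(TCP_CONGESTION)");   /* e.g. bbr not available */

        char in_use[16] = {0};
        socklen_t len = sizeof(in_use);
        if (getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, in_use, &len) == 0)
            printf("congestion control in use: %s\n", in_use);

        close(fd);
        return 0;
    }

If the setsockopt() call fails, the socket simply keeps the system-wide default algorithm, so the change degrades gracefully.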
In Linux kernel version 4.9 and above (excluding Docker environments), using the BBR algorithm generally only requires adding the following two lines to the sysctl.conf file:

    net.core.default_qdisc=fq
    net.ipv4.tcp_congestion_control=bbr

Then execute sysctl -p to make them take effect.

The advantage of TCP unilateral acceleration is that it only needs to be deployed on one side. The disadvantages are that functions such as compression cannot be used directly, and that it largely undermines the fairness of the Internet.

3. Kernel optimization

(1) Pure kernel optimization

According to Wikipedia:

From this we learn that the kernel needs to be optimized according to the actual scenario and cannot be tuned by a generic recipe. Reasonable tuning can improve performance considerably (sometimes tenfold), but blind tuning can just as easily degrade it. In today's kernels, the TCP buffers automatically adjust their size to match the transfer in most scenarios, so the main kernel-level decision left is choosing a suitable congestion control algorithm. The following is a comparison of the Linux default algorithm, CUBIC, and the BBR algorithm under packet loss: BBR is far less affected by loss than CUBIC, and its throughput far exceeds CUBIC's when the loss rate is below 20%. BBR is generally recommended when packet loss is not caused by network congestion, such as in mobile applications; scenarios with large bandwidth and long RTTs can also refer to this comparison.

(2) Dedicated fast path

Because the kernel processes TCP in a general-purpose way, the execution path differs from scenario to scenario. Specializing the path for a particular scenario can greatly improve TCP processing performance. For this approach, see the paper "TAS: TCP Acceleration as an OS Service".

4. TCP offload

The kernel consumes performance in four main areas:

Currently, the commonly used TCP offload features are TCP/IP checksum offload and TCP segmentation offload, which mainly address the second of the problems above. The other problems require more capable NIC support; for details, see https://www.dell.com/downloads/global/power/1q04-her.pdf. The "TCP Offload Engines" paper describes the memory copying done by network applications; see the figure below for details. On both the send path and the receive path there are many memory copies in the system, and they directly limit the performance of the TCP/IP stack. TCP offload can let the NIC copy data directly into the application buffer, reducing the number of copies and thereby improving performance (a software-only sketch of the same copy-reduction idea appears after the conclusion). TCP offload may be a future trend: it is easy to deploy, has low cost, and is a direction worth watching.

5. User-Mode TCP Protocol Stacks

User-mode protocol stacks are useful in the following situations:

For a discussion of whether to use a user-mode TCP stack, see https://blog.cloudflare.com/why-we-use-the-linux-kernels-tcp-stack/.

6. Conclusion

TCP acceleration is a very large field. Applied well, it can greatly improve a program's performance, so do not overlook it when optimizing.
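To illustrate the copy-reduction point from section 4 even without offload-capable hardware: on Linux, an application can already avoid the user-space copy on the send path with sendfile(), which moves data from the page cache to the socket inside the kernel. The following is a minimal sketch of that software-only technique (it is not TCP offload itself); conn_fd is assumed to be an already-connected TCP socket.

    /* Sketch: send a whole file over a connected TCP socket with sendfile(),
     * avoiding the intermediate copy through a user-space buffer.
     * Assumes Linux; conn_fd is an already-connected TCP socket. */
    #include <sys/sendfile.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>

    static int send_whole_file(int conn_fd, const char *path)
    {
        int file_fd = open(path, O_RDONLY);
        if (file_fd < 0)
            return -1;

        struct stat st;
        if (fstat(file_fd, &st) < 0) {
            close(file_fd);
            return -1;
        }

        off_t offset = 0;
        while (offset < st.st_size) {
            ssize_t n = sendfile(conn_fd, file_fd, &offset, st.st_size - offset);
            if (n <= 0) {          /* error or unexpected end of file */
                close(file_fd);
                return -1;
            }
        }

        close(file_fd);
        return 0;
    }

Hardware TCP offload goes further by also taking over segmentation and checksum work, but the direction is the same: fewer copies and less CPU work per byte sent.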