Let's talk about how long it takes to establish a TCP connection

When developing everyday backend services for the Internet, whether you use C, Java, PHP or Golang, you will inevitably call MySQL, Redis and other components to fetch data, make RPC calls, or invoke other RESTful APIs. Under the hood, these calls are almost always carried over TCP, because among transport layer protocols TCP offers reliable connections, retransmission on error, congestion control and so on, which is why it is far more widely used than UDP.


You have surely also heard that TCP has a drawback: its connection overhead is relatively high. However, most technical blogs only say the overhead is "high" or "low" and rarely give a quantitative analysis, which frankly carries little real information. Drawing on my daily work, I want to actually pin this down: how long does it take to establish a TCP connection? Milliseconds, or microseconds? Can we at least give a rough estimate? Of course, many factors affect TCP connection time, such as packet loss on the network; today I will only cover the situations I run into most often in practice.

1. Normal TCP connection establishment process

To understand how long it takes to establish a TCP connection, we need to understand the connection establishment process in detail. In the previous article "Illustrated Linux Network Packet Receiving Process", we introduced how data packets are received at the receiving end. The data packet comes out from the sender and passes through the network to the receiver's network card. After the receiver's network card DMAs the data packet to the RingBuffer, the kernel processes it through hard interrupts, soft interrupts and other mechanisms (if the data is user data, it will eventually be sent to the socket's receive queue and wake up the user process).

In the soft interrupt, when the kernel takes a packet off the RingBuffer, the packet is represented by the struct sk_buff structure (see the kernel source include/linux/skbuff.h). Its data member holds the received bytes. As the packet is processed layer by layer up the protocol stack, each layer finds the header it cares about by adjusting pointers into this data buffer.

For TCP packets, there is an important field in the header: flags.

By setting different flag bits, TCP packets are divided into types such as SYN, FIN, ACK, and RST. The client uses the connect system call to ask the kernel to send SYN, ACK and other packets to establish a TCP connection with the server. On the server side, many connection requests may arrive at once, so the kernel also uses some auxiliary data structures: the semi-connection queue and the full-connection queue. Let's take a look at the entire connection process:

In this connection process, let's briefly analyze the time consumed in each step:

  • The client sends a SYN packet: the client usually sends the SYN through the connect system call, which costs CPU time for the local system call and the soft interrupt
  • The SYN is transmitted to the server: the SYN leaves the client's network card and begins to "cross mountains and seas, and pass through crowds of people..." This is a long-distance network transmission
  • The server processes the SYN packet: the kernel receives the packet in a soft interrupt, puts it into the semi-connection queue, and then sends back a SYN/ACK. Another bit of CPU overhead
  • The SYN/ACK is transmitted to the client: after leaving the server, the SYN/ACK also crosses many mountains, and possibly many oceans, to reach the client. Another long network journey
  • The client processes the SYN/ACK: the client kernel receives the packet, spends a few microseconds handling it, and then sends out the ACK. Again, soft interrupt processing overhead
  • The ACK is transmitted to the server: like the SYN, it travels roughly the same distance once more. Another long trip over the network
  • The server receives the ACK: the server kernel receives and processes the ACK, then moves the corresponding connection from the semi-connection queue to the full-connection queue
  • The server-side user process wakes up: the user process blocked on the accept system call is woken up and takes the established connection out of the full-connection queue. This costs the CPU overhead of a context switch

The above steps can be simply divided into two categories:

  • The first category is the kernel spending CPU to receive, send or process packets, including system calls, soft interrupts and context switches; each of these typically costs only a few microseconds (us).
  • The second category is network transmission. Once a packet leaves a machine, it has to traverse cables, switches and routers, so network transmission takes far longer than local CPU processing: anywhere from a few ms to several hundred ms depending on the distance between the two ends.

1 ms equals 1,000 us, so network transmission typically takes about 1,000 times as long as the CPU work at the two ends, sometimes even up to 100,000 times. In a normal TCP connection setup, therefore, the network delay is essentially all that matters. One RTT is the round-trip time of a packet between the two servers. Viewed globally, establishing a TCP connection takes about three one-way transmissions plus a little CPU work on each side, which adds up to slightly more than 1.5 RTT. From the client's point of view, however, the connection is considered established as soon as its ACK is sent, so the client-side connection time only covers two one-way transmissions, that is, a little more than one RTT. (The same holds from the server's perspective: from receiving the SYN to receiving the ACK is also one RTT.)
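
If you want to put a number on this yourself from the client side, one quick sketch is curl's built-in timing output (the URL below is just a placeholder for whatever HTTP service you are testing). time_connect is measured from the start of the request until the TCP handshake completes, so it roughly equals one RTT plus the small CPU overheads described above:

  $ curl -o /dev/null -s -w 'tcp connect: %{time_connect}s, total: %{time_total}s\n' http://example.com/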

2. Abnormal situations when TCP connection is established

In the previous section, you can see that from the client's perspective, under normal circumstances, the total time taken for a TCP connection is approximately the time taken for a network RTT. If everything were so simple, I think there would be no need for my sharing this time. Things are not always so beautiful, and there will always be accidents. In some cases, it may cause the network transmission time to increase during the connection, the CPU processing overhead to increase, or even the connection to fail. Now let's talk about the various bumps and bruises I encountered online.

1. The client connect system call takes abnormally long

Normally, a system call takes only a few microseconds. However, in the article "Tracking the murderer who drained the server CPU!", I described a case one of my servers ran into: the operations staff reported that the service did not have enough CPU and needed to be scaled out. The monitoring at the time looked like this:

The service had been handling about 2,000 qps with CPU idle consistently above 70%. How could the CPU suddenly be insufficient? Even stranger, during the period when CPU idle hit its lowest point, the load was not high (the server is a 4-core machine; a load of 3-4 would be perfectly normal). The investigation found that when the client side had around 30,000 TCP connections in TIME_WAIT, the available local ports ran short, and the CPU cost of the connect system call grew by more than 100 times: each call took about 2,500 us (microseconds), that is, into the millisecond range.

When you hit this problem, the TCP connection setup time only grows by about 2 ms, which by itself might look acceptable. The real problem is that those 2+ ms are spent burning CPU cycles hunting for a free local port, so the issue is not small at all. The fix is simple and there are several options: modify the kernel parameter net.ipv4.ip_local_port_range to reserve more port numbers, or switch to long-lived connections.
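
As a rough sketch of those remedies, assuming a reasonably recent Linux with the ss tool (the port range below is only an example value, pick one that fits your environment):

  # count local connections currently stuck in TIME_WAIT
  $ ss -tan state time-wait | wc -l
  # widen the usable local port range (example values)
  $ sudo sysctl -w net.ipv4.ip_local_port_range="10000 65000"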

2. The semi-connection/full-connection queue is full

If either queue is full during connection establishment, the SYN or ACK sent by the client is simply discarded. After the client waits for a while without a response, it performs a TCP retransmission. Take the semi-connection queue as an example:

Note that the TCP handshake retransmission timeout is on the order of seconds. In other words, once a full queue on the server side makes a connection attempt fail, establishing that connection takes at least several seconds; within the same data center it normally takes well under 1 millisecond, so this is roughly a 1,000x increase. For programs that serve users in real time, the impact on user experience is severe. And if the retransmitted handshake still fails, the user's request has very likely already timed out before the second retry.
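
How long those retries take is governed by kernel settings; a minimal check, assuming a Linux client (on the kernels I have worked with, the initial handshake retransmission timeout is on the order of one to a few seconds depending on kernel version, doubling on each retry):

  # how many times an unanswered SYN is retransmitted before connect() gives up
  $ cat /proc/sys/net/ipv4/tcp_syn_retries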

There is an even worse situation, which can affect other users as well. Suppose you use a process/thread pool model to provide service, such as php-fpm. The fpm worker is blocking: while it is handling one request it cannot serve others. If you have 100 workers and 50 of them are stuck for a while in the handshake with a Redis or MySQL server (note: in this case your server is the client side of the TCP connection), then for that period you effectively have only 50 workers left. Those 50 may not be able to keep up, and your service backs up. If this lasts a little longer it can trigger an avalanche and take the whole service down.

Since the consequences can be so serious, how can we check whether the service at hand is suffering from a full semi-/full-connection queue? On the client side, you can capture packets and look for SYN retransmissions. Even occasional SYN retransmissions suggest that the corresponding server's connection queues may have a problem.
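
A minimal capture sketch for the client side, assuming the interface name, backend address and port below are placeholders you would replace; since only SYN packets are kept, seeing the same SYN repeated after a second or more means the handshake is being retransmitted:

  # keep only packets with the SYN flag set, headed to the backend under suspicion
  $ sudo tcpdump -i eth0 -nn 'tcp[tcpflags] & tcp-syn != 0 and dst host {backend ip} and dst port {backend port}'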

On the server side, checking is easier. netstat -s shows the cumulative number of packets dropped because the semi-connection queue was full; since it is a running total, you need the watch command to monitor it dynamically. If the number below keeps growing while you watch, the server is currently dropping packets because its semi-connection queue is full, and you may need to lengthen that queue.

  $ watch 'netstat -s | grep LISTEN'
  8 SYNs to LISTEN sockets ignored

For the full-connection queue, the check is similar.

  $ watch 'netstat -s | grep overflowed'
  160 times the listen queue of a socket overflowed

If your service is dropping packets because a queue is full, one remedy is to increase the length of the semi-connection/full-connection queue. In the Linux kernel, the semi-connection queue length is mainly controlled by tcp_max_syn_backlog, so raise it to a suitable value.

  # cat /proc/sys/net/ipv4/tcp_max_syn_backlog
  1024
  # echo "2048" > /proc/sys/net/ipv4/tcp_max_syn_backlog

The full-connection queue length is the smaller of the backlog passed in when the application calls listen and the kernel parameter net.core.somaxconn, so you may need to adjust both your application and this kernel parameter.

  # cat /proc/sys/net/core/somaxconn
  128
  # echo "256" > /proc/sys/net/core/somaxconn

After the change, we can confirm the final effective length through the Send-Q output of the ss command:

  $ ss -nlt
  Recv-Q Send-Q Local Address:Port Peer Address:Port
  0      128    *:80               *:*

Recv-Q shows how much of the process's full-connection queue is currently in use. If Recv-Q is already close to Send-Q, don't wait for packets to be dropped before enlarging the full-connection queue.

If queue overflows are still very occasional after increasing the queues, we can tolerate them for now. But what if long waits persist? Another approach is to fail fast instead of making the client wait for a timeout: on backend services such as Redis or MySQL, set the kernel parameter tcp_abort_on_overflow to 1, so that when the queue is full the server immediately sends a reset to the client rather than leaving the client-side process/thread waiting in vain. The client then receives the error "connection reset by peer". Sacrificing one user's request is better than dragging down the whole site.
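
A minimal sketch of turning this on (weigh the trade-off first; the setting applies to the whole machine, not just one service):

  # reply with RST instead of silently dropping when the accept queue overflows
  # echo "1" > /proc/sys/net/ipv4/tcp_abort_on_overflow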

3. TCP connection time measurement

I wrote a very simple piece of code to measure, on the client side, how long creating a TCP connection takes.

  <?php
  // fill in the server address and port before running
  $ip = {server ip};
  $port = {server port};
  $count = 50000;

  function buildConnect($ip, $port, $num) {
      for ($i = 0; $i < $num; $i++) {
          // create a TCP socket
          $socket = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
          if ($socket === false) {
              echo "$ip $port socket_create() failed because: " . socket_strerror(socket_last_error()) . "\n";
              sleep(5);
              continue;
          }

          // perform the three-way handshake
          if (false === socket_connect($socket, $ip, $port)) {
              echo "$ip $port socket_connect() failed because: " . socket_strerror(socket_last_error($socket)) . "\n";
              sleep(5);
              continue;
          }
          socket_close($socket);
      }
  }

  $t1 = microtime(true);
  buildConnect($ip, $port, $count);
  $t2 = microtime(true);
  echo (($t2 - $t1) * 1000) . 'ms';

Before testing, the local Linux machine needs enough available ports. If fewer than 50,000 are available, it is best to widen the range first.

  # echo "5000 65000" > /proc/sys/net/ipv4/ip_local_port_range

1. Normal situation

Note: Do not use a machine with online services running on either the client or server side, otherwise your test may affect normal user access.

First, my client is located in an IDC data center in Huailai, Hebei, and the server is a machine in the company's Guangdong data center. The ping latency between them is about 37 ms. After using the above script to establish 50,000 connections, the average time per connection was also 37 ms. As explained earlier, from the client's point of view the handshake counts as successful once the third packet is sent, so only one RTT (two one-way transmissions) is needed. Although there are system call and soft interrupt overheads on both client and server, these normally amount to only a few microseconds and barely affect the total connection time.

Next, I switched to a target server located in Beijing: still some distance from Huailai, but much closer than Guangdong. This time the ping RTT was about 1.6-1.7 ms, and over 50,000 connections the measured average was 1.64 ms per connection.

Another experiment was conducted. This time, the server and client were located in the same computer room, and the ping delay was around 0.2ms~0.3ms. After running the above script, the experimental result was that 50,000 TCP connections consumed a total of 11,605ms, an average of 0.23ms each time.

Online architecture tips: notice that within the same data center the latency is only a few tenths of a millisecond, yet crossing to a nearby data center already quadruples the TCP handshake time, and going all the way to Guangdong makes it roughly a hundred times larger. When deploying online, the ideal approach is to put the MySQL, Redis and other services your service depends on in the same region and the same data center (if you want to push it further, even the same rack), because then every network packet exchange, TCP connection establishment included, becomes much faster. Try to avoid long-distance calls across regional data centers.

2. Connection queue overflow

Having tested across regions, across data centers and across machines, this time, for speed, let's connect directly to the local machine. Pinging the local IP or 127.0.0.1 shows a latency of about 0.02 ms, and the local host is bound to have a shorter RTT than any other machine, so I assumed the connections would be very fast. The experiment: continuously establishing 50,000 TCP connections took 27,154 ms in total, about 0.54 ms each on average. What?! How can this be much slower than connecting to another machine? With the theory above in mind, the explanation is that the local RTT is so short that a huge number of connection requests arrive almost instantly and fill up the full-connection or semi-connection queue. Once a queue is full, any connection request that hits it suffers a handshake delay of more than 3 seconds, which is why the average in the result above looks far higher than the RTT.

During the experiment, I captured packets with tcpdump and confirmed this picture: a small number of handshakes took more than 3 seconds, because the semi-connection queue was full and the client had to retransmit the SYN after its timeout expired.

We then changed the script to sleep for 1 second after every 500 connections, and the stalls disappeared (alternatively, we could have lengthened the connection queues). The result: establishing 50,000 TCP connections to the local machine took 102,399 ms as seen from the client. Deducting the 100 seconds of sleep, each TCP connection took 0.048 ms on average, slightly higher than the ping latency. This is because once the RTT becomes small enough, the kernel's CPU overhead starts to show; in addition, a TCP connection is more involved than ping's ICMP echo, so a delay about 0.02 ms higher than ping is normal.

4. Conclusion

Under abnormal circumstances, establishing a TCP connection may take several seconds. One drawback is that it hurts user experience and may even cause the current request to time out. Another is that it can trigger an avalanche. So if your server accesses data over short-lived connections, you must monitor whether connection establishment ever goes wrong and optimize when it does. Of course, you can also use a local in-memory cache, or use a connection pool to keep long-lived connections; both approaches avoid the overhead of the TCP handshake altogether.

Under normal circumstances, TCP connection establishment takes roughly one RTT between the two machines, which is unavoidable. However, you can control the physical distance between them to reduce that RTT. For example, deploy the Redis instance you access as close as possible to the backend machine, so that the RTT can drop from tens of milliseconds down to as little as 0.1 ms.

Finally, let's think one step further: if we deploy the server in Beijing, is it workable for users in New York to access it? As noted above, whether within one data center or across nearby data centers, signal propagation time can basically be ignored (the physical distance is short), and network latency is mostly forwarding delay in the equipment. But across half the globe, propagation time must be counted. The great-circle distance from Beijing to New York is about 15,000 km, so even ignoring forwarding delay, the round trip at the speed of light (an RTT is a round trip, so the distance is covered twice) is roughly 15,000 km * 2 / 300,000 km/s = 0.1 s = 100 ms. The real latency is larger still, generally over 200 ms. With that kind of latency it is very hard to give users a snappy, sub-second experience, so for overseas users it is best to build a data center locally or rent overseas servers.
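
As a back-of-the-envelope check of the 100 ms figure (bc is used purely as a calculator here; the second line reflects the common rule of thumb that light in optical fiber travels at roughly two thirds of its vacuum speed):

  # 15,000 km each way, covered twice, at ~300,000 km/s in vacuum
  $ echo "scale=3; 15000 * 2 / 300000" | bc
  .100
  # at ~200,000 km/s in fiber the propagation floor is closer to 150 ms
  $ echo "scale=3; 15000 * 2 / 200000" | bc
  .150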
