Hello everyone, I am Xiaolin.

I received a private message from a reader saying that this question came up in a ByteDance interview: if the server goes down, what happens to the client's TCP connection?

If "the server goes down" means "the server process crashes", the server's kernel sends a FIN message when the process dies and completes the four-way wave (TCP's FIN/ACK connection teardown) with the client.

However, if "the server goes down" means "the server host crashes", there is no four-way wave. What happens next depends on whether the client sends data:

- If the client sends data, it receives no response from the server, so it retransmits after a timeout. Once the retransmissions reach a certain threshold, the client's TCP connection is disconnected.
- If the client never sends data, it depends on whether the client has enabled the TCP keepalive mechanism:
  - If it is enabled, after a period of time the client detects that the server's TCP connection no longer exists and disconnects its own TCP connection.
  - If it is not enabled, the client's TCP connection stays in place and is never disconnected.

That is the concise answer. Let's go through it in detail.

If the server process crashes, what happens to the client?

TCP connection state is maintained by the kernel, so when the server process crashes, the kernel reclaims all of that process's TCP connection resources. As part of this, the kernel sends the first FIN of the wave, and the remaining steps are also completed inside the kernel without any involvement from the process. So even though the server process has exited, the four-way wave with the client still completes.

I also ran an experiment myself, using the kill -9 command to simulate a process crash. After the process was killed, the server sent a FIN message and completed the four-way wave with the client.

What happens to the client when the server host goes down?

A sudden power loss on the server host is one example of a host crash. When the host crashes, it cannot perform the four-way wave with the client, so at the moment of the crash the client has no way to notice immediately. It can only discover that the server's connection no longer exists through later data exchange. We therefore discuss two situations:

- After the server host crashes, the client sends data.
- After the server host crashes, the client does not send data.
After the server host crashes, if the client sends data

After the server host crashes, the client sends a data message and receives no response. After waiting for a while, the client triggers the timeout retransmission mechanism and retransmits the unacknowledged data. When the retransmissions reach a certain threshold, the kernel decides that the TCP connection is broken and reports the error to the application through the socket interface, and the client's TCP connection is then disconnected.

How many times does TCP retransmit data packets?

On Linux there is a configuration item called tcp_retries2, with a default value of 15. This kernel parameter controls the maximum number of timeout retransmissions once a TCP connection has been established.

However, tcp_retries2 = 15 does not mean that the kernel literally counts 15 retransmissions before telling the application that the connection is dead. Instead, the kernel calculates a timeout from the value of tcp_retries2 (for tcp_retries2 = 15, the calculated timeout is 924600 ms). Once the total time spent retransmitting exceeds this timeout, the threshold is considered reached: retransmission stops and the TCP connection is disconnected.

During timeout retransmission, the retransmission timeout (RTO) of each round grows exponentially. For example, if the first-round RTO is 200 milliseconds, the second-round RTO is 400 milliseconds, the third-round RTO is 800 milliseconds, and so on.

RTO is calculated from the RTT (the round-trip time of a packet). The larger the RTT, the larger the calculated RTO, and the fewer rounds of retransmission it takes to reach the timeout above.
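To make the arithmetic concrete, here is a minimal Python sketch of how a timeout of 924600 ms can arise from tcp_retries2 = 15. It assumes that the first RTO is 200 ms, that each round doubles the RTO, and that the RTO is capped at 120 seconds. The function and constant names are hypothetical, and this is a simplified model of the calculation, not the kernel's own code.

```python
# Assumed bounds for this sketch: 200 ms floor and 120 s ceiling for the RTO.
RTO_MIN_MS = 200
RTO_MAX_MS = 120_000


def rto_schedule(retries, rto_base_ms=RTO_MIN_MS, rto_max_ms=RTO_MAX_MS):
    """Return the wait of every round, in milliseconds.

    There is one wait after the original transmission plus one wait after
    each of the `retries` retransmissions, so the list has retries + 1 items.
    """
    waits = []
    rto = min(rto_base_ms, rto_max_ms)
    for _ in range(retries + 1):
        waits.append(rto)
        # Exponential backoff: double the RTO each round, up to the cap.
        rto = min(rto * 2, rto_max_ms)
    return waits


def total_timeout_ms(retries, rto_base_ms=RTO_MIN_MS, rto_max_ms=RTO_MAX_MS):
    """Sum of all waits: the time before the connection is given up on."""
    return sum(rto_schedule(retries, rto_base_ms, rto_max_ms))


if __name__ == "__main__":
    for round_no, wait in enumerate(rto_schedule(15), start=1):
        print(f"round {round_no:2d}: wait {wait} ms")
    print("total:", total_timeout_ms(15), "ms")
```

Under these assumptions the first ten waits double from 200 ms up to 102.4 seconds, the remaining six are held at 120 seconds, and the sum is 924600 ms.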
The minimum RTO and maximum RTO are defined in the Linux kernel:

#define TCP_RTO_MAX ((unsigned)(120*HZ))
#define TCP_RTO_MIN ((unsigned)(HZ/5))

With HZ ticks per second (1000 on Linux 2.6+), TCP_RTO_MIN works out to about 200 milliseconds and TCP_RTO_MAX to about 120 seconds. If tcp_retries2 is set to 15 and the RTT is relatively small, the initial RTO is approximately the lower limit of 200 ms, which means it takes 924.6 seconds before the upper layer (the application) is notified that the TCP connection is broken.

The RTO doubles in each round until it reaches TCP_RTO_MAX, and then stays at that value for the remaining rounds.

After the server host crashes, if the client does not send data

After the server host crashes, if the client does not send data, the outcome depends on whether the TCP keepalive mechanism is enabled.

If TCP keepalive is not enabled and the client does not send data, the client's TCP connection stays in place indefinitely. So when keepalive is not used and neither side transmits data, one side's TCP connection being in the ESTABLISHED state does not mean that the other side's TCP connection is still healthy.

If TCP keepalive is enabled, then after the server host crashes, even if the client does not send data, TCP will after a period of time send probe messages to check whether the server is still alive.
The TCP keepalive mechanism can therefore determine, through probe messages, whether the other side's TCP connection is still alive even when the two sides exchange no data.

What exactly does the TCP keepalive mechanism do?

The principle of the TCP keepalive mechanism is as follows. Define a time period; if there is no connection-related activity during this period, the keepalive mechanism starts to work and sends a probe message at a fixed interval. The probe message contains very little data. If several consecutive probes get no response, the current TCP connection is considered dead, and the kernel reports the error to the upper-layer application.

The Linux kernel has parameters for the keepalive time, the number of keepalive probes, and the interval between keepalive probes. The default values are:

net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9

The meaning of each parameter is as follows:

- tcp_keepalive_time = 7200: the keepalive time is 7200 seconds (2 hours). If there is no connection-related activity for 2 hours, the keepalive mechanism starts.
- tcp_keepalive_intvl = 75: probes are sent 75 seconds apart.
- tcp_keepalive_probes = 9: if 9 probes go unanswered, the other side is considered unreachable and the connection is terminated.
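As a rough illustration, the Python sketch below computes the detection time implied by the usual Linux defaults (tcp_keepalive_time = 7200, tcp_keepalive_intvl = 75, tcp_keepalive_probes = 9) and shows one way an application might opt a socket in to keepalive. The function names are hypothetical. SO_KEEPALIVE is the standard socket option; TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are per-socket overrides that exist on Linux but not on every platform, so the sketch only sets them when Python's socket module exposes them.

```python
import socket


def keepalive_detection_seconds(idle=7200, interval=75, probes=9):
    """Idle time before the first probe plus the time spent on unanswered probes."""
    return idle + interval * probes


def enable_keepalive(sock, idle=7200, interval=75, probes=9):
    """Turn on TCP keepalive for one socket.

    Without SO_KEEPALIVE the kernel never sends probes for this socket,
    whatever the system-wide sysctl values are.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Per-socket overrides of the system-wide defaults (assumption: Linux).
    # Skipped silently on platforms where the constant is not defined.
    overrides = (
        ("TCP_KEEPIDLE", idle),       # seconds of idleness before probing
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", probes),      # unanswered probes before giving up
    )
    for name, value in overrides:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


if __name__ == "__main__":
    print(keepalive_detection_seconds(), "seconds with the defaults")
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Much shorter values than the defaults, chosen only for illustration.
        enable_keepalive(s, idle=60, interval=10, probes=3)
        print("detection time:", keepalive_detection_seconds(60, 10, 3), "seconds")
```

With the defaults the function returns 7875 seconds; the shorter per-socket values in the usage example bring that down to 90 seconds for that one connection.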
In other words, with the Linux defaults (7200 seconds of idle time, then 9 probes sent 75 seconds apart, 7200 + 75 × 9 = 7875 seconds) it takes at least 2 hours, 11 minutes and 15 seconds to discover a "dead" connection.

Note that an application that wants to use TCP keepalive must set the SO_KEEPALIVE option through the socket interface. If the option is not set, the keepalive mechanism does not take effect.

Isn't the TCP keepalive mechanism detection time too long?

Yes, it is a bit long. TCP keepalive is implemented in the TCP layer (kernel mode) and is a fallback for all programs built on the TCP transport protocol. The application layer can implement its own detection mechanism, which can tell whether the other side is alive in a much shorter time.

For example, web server software generally provides a keepalive_timeout parameter to specify the timeout of HTTP persistent connections. If it is set to 60 seconds, the web server starts a timer. If the client makes no new request within 60 seconds of completing its last HTTP request, the timer expires and triggers a callback that releases the connection.

Summary

If "the server goes down" means "the server process crashes", the kernel sends a FIN message when the process crashes and completes the four-way wave with the client.

However, if "the server goes down" means "the server host crashes", there is no four-way wave. What happens next depends on whether the client sends data:

- If the client sends data, it receives no response from the server, so it retransmits after a timeout. Once the retransmissions reach a certain threshold, the client's TCP connection is disconnected.
- If the client never sends data, it depends on whether the client has enabled the TCP keepalive mechanism:
  - If it is enabled, then after the client has exchanged no data for a period of time, the keepalive mechanism is triggered to probe whether the other side still exists. If the other side is found to be gone, the client disconnects its own TCP connection.
  - If it is not enabled, the client's TCP connection stays in place and remains in the ESTABLISHED state.