ByteDance first-round interview: The server is down, is the client's TCP connection still there?

Hello everyone, I am Xiaolin.

I received a private message from a reader saying that a ByteDance interview included this question: if the server goes down, what happens to the client's TCP connection?

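If "the server is down" means "the server process has crashed", the server's kernel sends a FIN message and completes the four-way close (the "four waves") with the client.

However, if "the server is down" means "the server host is down", there is no four-way close. What happens next depends on whether the client sends data: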

  • If the client sends data, since the server no longer exists, the client's data message will time out and be retransmitted. When the number of retransmissions reaches a certain threshold, the TCP connection will be disconnected;
  • If the client does not send data, check whether the client has enabled the TCP keepalive mechanism.

If it is enabled, after a period of time, the client detects that the TCP connection to the server no longer exists and will disconnect its own TCP connection;

If it is not enabled, the client's TCP connection will always exist and will not be disconnected.

The above is a concise answer. Let’s talk about it in detail below.

If the server process crashes, what happens to the client?

TCP connection information is maintained by the kernel. When the server process crashes, the kernel reclaims all of that process's TCP connection resources, so it is the kernel that sends the first FIN message of the close sequence. The remaining steps are also completed in the kernel and do not require the process to take part. Therefore, even though the server process has exited, the TCP four-way close (four waves) with the client can still be completed.

I also ran an experiment myself, using the kill -9 command to simulate a process crash, and found that after the process was killed, the server sent a FIN message and completed the four-way close with the client.
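The experiment can be approximated on a single machine. The following is a minimal sketch, not the original test setup: it assumes a POSIX system, runs a throwaway server in a child process, kills it with SIGKILL (the signal that kill -9 sends), and then checks what the client reads. The names SERVER_CODE and fin_after_kill are made up for illustration.

```python
import socket
import subprocess
import sys

# Throwaway server: listen on an ephemeral loopback port, accept one
# connection, then sleep until it is killed.
SERVER_CODE = """
import socket, time
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
print(srv.getsockname()[1], flush=True)
conn, _ = srv.accept()
print("accepted", flush=True)
time.sleep(60)
"""


def fin_after_kill() -> bytes:
    """Kill the server process and return what the client reads afterwards."""
    proc = subprocess.Popen([sys.executable, "-c", SERVER_CODE],
                            stdout=subprocess.PIPE, text=True)
    try:
        port = int(proc.stdout.readline())
        client = socket.create_connection(("127.0.0.1", port), timeout=5)
        proc.stdout.readline()   # wait until the server has accepted us
        proc.kill()              # SIGKILL on POSIX, same signal as `kill -9`
        proc.wait()
        # The process is gone, but the kernel closes its sockets and sends FIN.
        # recv() returning b"" means the peer has closed its side.
        data = client.recv(1024)
        client.close()
        return data
    finally:
        if proc.poll() is None:
            proc.kill()
            proc.wait()
        proc.stdout.close()


if __name__ == "__main__":
    print(fin_after_kill())
```

An empty bytes result is how a received FIN shows up at the socket interface. To see the individual FIN and ACK packets, a packet capture on the loopback interface is still needed.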

What happens to the client when the server host goes down?

When the server host suddenly loses power, this situation is considered a server host crash.

When the server host crashes, it cannot perform the four-way close with the client. Therefore, at the moment the server host crashes, the client cannot immediately perceive it. The client can only find out that the connection to the server no longer exists during later data exchange.

Therefore, we will discuss two situations:

  • After the server host crashes, the client sends data;
  • After the server host crashes, the client does not send data.

After the server host crashes, if the client sends data

After the server host crashes, the client sends a data message. Since no response is received, after waiting for a certain period of time, the client triggers the timeout retransmission mechanism and retransmits the data message that did not receive a response.

When the number of retransmissions reaches a certain threshold, the kernel will determine that there is a problem with the TCP connection, and then tell the application through the Socket interface that there is a problem with the TCP connection, so the client's TCP connection will be disconnected.

How many times does TCP retransmit data packets?

In Linux, there is a configuration item called tcp_retries2 (exposed as /proc/sys/net/ipv4/tcp_retries2), and its default value is 15.

This kernel parameter controls the maximum number of timeout retransmissions for a TCP connection that is already established.

However, setting tcp_retries2 to 15 does not mean that TCP always waits for exactly 15 timeout retransmissions before it notifies the application and terminates the TCP connection. The kernel calculates a timeout from the value of tcp_retries2 (if tcp_retries2 = 15, the calculated timeout is 924600 ms). If the total time spent retransmitting exceeds this timeout, the threshold is considered exceeded, so retransmission stops and the TCP connection is disconnected.
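The 924600 ms figure can be reproduced with a few lines of arithmetic. This is a simplified model rather than the kernel's code: it assumes the backoff starts from the minimum RTO of 200 ms, doubles every round, and is capped at the maximum RTO of 120 seconds. The function name model_timeout_ms is made up here.

```python
TCP_RTO_MIN_MS = 200        # assumed lower bound of RTO
TCP_RTO_MAX_MS = 120_000    # assumed upper bound of RTO


def model_timeout_ms(retries: int, rto_base_ms: int = TCP_RTO_MIN_MS) -> int:
    """Sum the waits for the first transmission plus `retries` retransmissions."""
    total = 0
    rto = rto_base_ms
    for _ in range(retries + 1):
        total += min(rto, TCP_RTO_MAX_MS)   # each wait is capped at the maximum RTO
        rto *= 2                            # exponential backoff
    return total


print(model_timeout_ms(15))   # 924600
```

The first ten waits (200 ms up to 102400 ms) add up to 204600 ms, and the remaining six waits are capped at 120000 ms each, which gives 204600 + 720000 = 924600 ms.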

During the timeout retransmission process, the timeout period (RTO) of each round increases exponentially. For example, if the first round RTO is 200 milliseconds, the second round RTO is 400 milliseconds, the third round RTO is 800 milliseconds, and so on.

RTO is calculated based on RTT (round-trip time of a packet). If the RTT is larger, the calculated RTO will be larger. After several rounds of retransmission, the above timeout value will be reached quickly.

For example, with tcp_retries2 = 15 the calculated timeout is 924600 ms. Once the total time spent retransmitting reaches this timeout, retransmission stops and the TCP connection is disconnected:

  • If the RTT is relatively small, the initial RTO is approximately the 200 ms lower limit, so the first round times out after 200 milliseconds. Because the total timeout is 924600 ms, what you observe is that the message is retransmitted 15 times before the timeout is exceeded and the TCP connection is disconnected.
  • If the RTT is relatively large, say the initial RTO is calculated to be 1000 ms so that the first round times out after 1 second, the total time spent retransmitting exceeds 924600 ms before 15 retransmissions have been made.

The minimum RTO and maximum RTO are defined in the Linux kernel:

#define TCP_RTO_MAX ((unsigned)(120*HZ))
#define TCP_RTO_MIN ((unsigned)(HZ/5))

HZ is the number of kernel timer ticks per second, and these constants are expressed in ticks, so TCP_RTO_MIN (HZ/5 ticks) is 200 milliseconds and TCP_RTO_MAX (120*HZ ticks) is 120 seconds.

If tcp_retries2 is set to 15 and the RTT is relatively small, the initial RTO is approximately the 200 ms lower limit, which means it takes about 924.6 seconds before the upper layer (that is, the application) is notified that the TCP connection is broken. In each round the RTO doubles, from 200 ms up to 102.4 seconds, and from then on it stays at the 120-second upper limit.
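The round-by-round growth depends only on the doubling rule and the two limits, so it can be listed with a short script. The sketch below is a model rather than a kernel trace: it assumes a give-up threshold of 924600 ms and a 120-second RTO cap, and rto_schedule is a hypothetical helper. Round 1 is the wait after the original transmission, and each later round follows one retransmission.

```python
TCP_RTO_MAX_MS = 120_000
TIMEOUT_MS = 924_600          # threshold derived from tcp_retries2 = 15


def rto_schedule(initial_rto_ms: int, timeout_ms: int = TIMEOUT_MS):
    """Return (round, rto_ms, elapsed_ms) rows until elapsed time reaches timeout_ms."""
    rows = []
    elapsed = 0
    rto = initial_rto_ms
    round_no = 0
    while elapsed < timeout_ms:
        round_no += 1
        wait = min(rto, TCP_RTO_MAX_MS)   # cap each wait at the maximum RTO
        elapsed += wait
        rows.append((round_no, wait, elapsed))
        rto *= 2                          # exponential backoff
    return rows


for round_no, rto_ms, elapsed_ms in rto_schedule(200):
    print(f"{round_no:>2}  RTO={rto_ms:>6} ms  elapsed={elapsed_ms:>6} ms")
```

Calling rto_schedule(1000) instead shows the case of a larger RTT: under this model the threshold is crossed in fewer rounds than with the 200 ms starting value.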

After the server host crashes, if the client does not send data

After the server host crashes, if the client does not send data, it depends on whether the TCP keepalive mechanism is enabled.

If the TCP keepalive mechanism is not enabled and the client does not send data after the server host crashes, the client's TCP connection stays in place. So when TCP keepalive is not used and neither side transmits data, the fact that one side's TCP connection is in the ESTABLISHED state does not mean that the other side's TCP connection is still healthy.

If the TCP keepalive mechanism is enabled, then after the server host crashes, even if the client does not send data, TCP will after a period of time send a probe message to check whether the server is alive:

  • If the peer is working normally, it responds to the TCP keepalive probe, so the TCP keepalive timer is reset and waits for the next keepalive time to arrive.
  • If the peer host has crashed, or the peer is unreachable for some other reason, the TCP keepalive probe gets no response. After several consecutive probes go unanswered, TCP reports that the TCP connection is dead.

Therefore, the TCP keepalive mechanism can determine whether the other party's TCP connection is alive through detection messages when there is no data exchange between the two parties.

What exactly does the TCP keepalive mechanism do?

The principle of TCP keepalive mechanism is as follows:

Define a time period. If there is no connection-related activity during this period, the TCP keepalive mechanism starts to work and sends a probe message at a fixed interval. The probe message contains very little data. If several consecutive probe messages get no response, the current TCP connection is considered dead, and the kernel reports the error to the upper-level application.

In the Linux kernel, there are corresponding parameters to set the keep-alive time, the number of keep-alive detections, and the time interval of keep-alive detections. The following are the default values:

net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9

The meaning of each parameter is as follows:

  • tcp_keepalive_time=7200: means the keepalive time is 7200 seconds (2 hours), that is, if there is no connection-related activity within 2 hours, the keepalive mechanism is activated;
  • tcp_keepalive_intvl=75: means each detection interval is 75 seconds;
  • tcp_keepalive_probes=9: means that if there is no response after 9 detections, the other party is considered unreachable and the connection is terminated.

That is to say, in Linux it takes at least 2 hours, 11 minutes and 15 seconds (7200 + 75 × 9 = 7875 seconds) to discover a "dead" connection.

Note that if an application wants to use the TCP keepalive mechanism, it needs to set the SO_KEEPALIVE option through the socket interface for it to take effect. If it is not set, the TCP keepalive mechanism cannot be used.
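As a rough illustration of setting the option, the sketch below enables SO_KEEPALIVE on one socket and returns the worst-case detection time. The per-socket options TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are not mentioned in this article; they are Linux socket options that override the three system-wide defaults for a single socket, and the sketch skips them on platforms where Python does not expose them. The function name enable_keepalive is made up for illustration.

```python
import socket


def enable_keepalive(sock: socket.socket, idle: int = 7200,
                     interval: int = 75, probes: int = 9) -> int:
    """Turn on TCP keepalive for one socket and return the worst-case
    detection time in seconds (idle + interval * probes)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket overrides below are Linux-specific; skip them elsewhere.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return idle + interval * probes


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print(enable_keepalive(sock))   # 7875 seconds = 2 h 11 min 15 s
sock.close()
```

Passing smaller values, for example enable_keepalive(sock, idle=60, interval=10, probes=3), shortens detection for that one socket without touching the system-wide settings.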

Isn't the TCP keepalive mechanism detection time too long?

Yes, it is a bit long.

TCP keepalive is implemented at the TCP layer (kernel mode). It is a fallback solution for all programs based on the TCP transport protocol.

In fact, our application layer can implement a detection mechanism by itself, which can detect whether the other party is alive in a relatively short time.

For example, web service software generally provides a keepalive_timeout parameter to specify the timeout of HTTP persistent connections. If the timeout of HTTP persistent connections is set to 60 seconds, the web service software will start a timer. If the client does not make a new request within 60 seconds after completing the last HTTP request, the callback function will be triggered to release the connection when the timer expires.
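The idle-timeout idea can be sketched at the application layer with a receive timeout. This is only an illustration under simple assumptions (one blocking connection, an echo handler), not how any particular web server implements keepalive_timeout; the function name serve_until_idle is hypothetical.

```python
import socket


def serve_until_idle(conn: socket.socket, idle_timeout: float) -> str:
    """Echo data back until the peer closes or stays silent for idle_timeout seconds."""
    conn.settimeout(idle_timeout)
    try:
        while True:
            data = conn.recv(4096)   # the timeout restarts on every recv() call
            if not data:
                return "peer closed"
            conn.sendall(data)
    except socket.timeout:
        return "idle timeout"        # no request arrived in time: release the connection
    finally:
        conn.close()


a, b = socket.socketpair()
b.sendall(b"ping")
print(serve_until_idle(a, 0.2))      # idle timeout
b.close()
```

Here the peer sends one message and then goes quiet, so the handler echoes it and gives up after 0.2 seconds of silence. A real service would use a much longer value, such as the 60 seconds mentioned above.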

Summary

If "the server is down" means "the server process has crashed", then when the server process crashes, the kernel sends a FIN message and completes the four-way close (four waves) with the client.

However, if "the server is down" means "the server host is down", there is no four-way close. What happens next depends on whether the client sends data.

  • If the client sends data, since the server no longer exists, the client's data packet will time out and be retransmitted. When the total retransmission interval reaches a certain threshold (the kernel will calculate a threshold based on the value set by tcp_retries2), the TCP connection will be disconnected;
  • If the client does not send data, check whether the client has enabled the TCP keepalive mechanism.

If it is enabled, when the client does not interact with data for a period of time, the TCP keepalive mechanism will be triggered to detect whether the other party exists. If it is detected that the other party has disappeared, its own TCP connection will be disconnected;

If it is not enabled, the client's TCP connection will always exist and remain in the ESTABLISHED state.
