Soul-searching question for TCP: Are you going to surrender?


TCP three-way handshake packet loss

What happens if the first handshake is lost?

When the client wants to establish a TCP connection with the server, the first thing it sends is a SYN message, and then enters the SYN_SENT state.

After that, if the client does not receive the SYN-ACK message from the server (the second handshake), the timeout retransmission mechanism is triggered and the client retransmits the SYN message. The retransmitted SYN message carries the same sequence number as the original.

The initial timeout differs between operating systems and versions: some use 1 second, others 3 seconds. This timeout is hard-coded in the kernel, so changing it means recompiling the kernel, which is troublesome.

If the client has not received the SYN-ACK message from the server after 1 second, it resends the SYN message. How many times will it resend?

In Linux, the maximum number of retransmissions of the client's SYN message is controlled by the tcp_syn_retries kernel parameter. This parameter is tunable, and the default value is generally 5 (newer kernels ship with 6). The calculations that follow assume 5.

 # cat /proc/sys/net/ipv4/tcp_syn_retries
5

Usually, the first timeout retransmission is after 1 second, the second timeout retransmission is after 2 seconds, the third timeout retransmission is after 4 seconds, the fourth timeout retransmission is after 8 seconds, and the fifth timeout retransmission is after 16 seconds. That's right, each timeout is twice as long as the previous one.

After the fifth timeout retransmission, the client waits another 32 seconds. If the server still has not responded with a SYN-ACK, the client stops sending SYN messages and abandons the connection attempt.

Therefore, the total time is 1+2+4+8+16+32=63 seconds, about 1 minute.
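
The arithmetic above can be checked with a short sketch. It assumes an initial timeout of 1 second that doubles after every retransmission, as described above. The function name syn_timeouts is made up for illustration, and real kernels may adjust or cap the timers.

```python
def syn_timeouts(retries: int, initial_rto: int = 1) -> list:
    """Timeouts (in seconds) the client waits through before giving up.

    One timeout precedes each retransmission, and one final wait follows
    the last retransmission. Assumes the timeout doubles every time.
    """
    return [initial_rto * 2 ** i for i in range(retries + 1)]


waits = syn_timeouts(5)
print(waits)       # [1, 2, 4, 8, 16, 32]
print(sum(waits))  # 63
```

With tcp_syn_retries set to 3, the same function gives 1+2+4+8 = 15 seconds.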

For example, assuming that the tcp_syn_retries parameter value is 3, when the client's SYN message is always lost in the network, the following process will occur:

Specific process:

  • When the client retransmits the SYN message three times after the timeout, since tcp_syn_retries is 3, the maximum number of retransmissions has been reached, so it waits for a while (twice the last timeout period). If it still fails to receive the second handshake (SYN-ACK message) from the server, the client will disconnect.

What happens if the second handshake is lost?

When the server receives the first handshake from the client, it will return a SYN-ACK message to the client. This is the second handshake. At this time, the server will enter the SYN_RCVD state.

The SYN-ACK message of the second handshake actually has two purposes:

  • The ACK in the second handshake is a confirmation message for the first handshake;
  • The SYN in the second handshake is the message initiated by the server to establish a TCP connection;

So, if the second handshake is lost, something interesting will happen. What exactly will happen?

Because the second handshake carries the ACK for the client's first handshake, a client that does not receive it for a while will assume that its own SYN message (the first handshake) was lost. The client therefore triggers the timeout retransmission mechanism and retransmits the SYN message.

The second handshake also carries the server's SYN message. When the client receives it, the client must send an ACK message to the server (the third handshake), and only then does the server consider its SYN message received.

So if the second handshake is lost, the server never receives the third handshake, and the server also triggers the timeout retransmission mechanism and retransmits the SYN-ACK message.

Under Linux, the maximum number of retransmissions of a SYN-ACK packet is determined by the tcp_synack_retries kernel parameter, and the default value is 5.

 # cat /proc/sys/net/ipv4/tcp_synack_retries
5
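
Both limits can also be read programmatically. The following is a minimal sketch, assuming a Linux host that exposes the parameters under /proc/sys/net/ipv4. The helper names parse_sysctl and read_sysctl are illustrative only, and on systems without that directory the function simply returns None.

```python
from pathlib import Path

SYSCTL_ROOT = Path("/proc/sys/net/ipv4")  # Linux-specific location


def parse_sysctl(text: str) -> int:
    """Parse the contents of a single-integer sysctl file, e.g. '5'."""
    return int(text.strip())


def read_sysctl(name: str, root: Path = SYSCTL_ROOT):
    """Return the integer value of a parameter, or None if it is unavailable."""
    try:
        return parse_sysctl((root / name).read_text())
    except (OSError, ValueError):
        return None


for name in ("tcp_syn_retries", "tcp_synack_retries"):
    print(name, read_sysctl(name))
```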

Therefore, when the second handshake is lost, both the client and the server will retransmit:

  • The client will retransmit the SYN message, which is the first handshake. The maximum number of retransmissions is determined by the tcp_syn_retries kernel parameter.
  • The server will retransmit the SYN-ACK message, which is the second handshake. The maximum number of retransmissions is determined by the tcp_synack_retries kernel parameter.

For example, assuming that the tcp_syn_retries parameter value is 1 and the tcp_synack_retries parameter value is 2, then when the second handshake is always lost, the process that occurs is as follows:

Specific process:

  • When the client retransmits the SYN message once after a timeout, since tcp_syn_retries is 1, the maximum number of retransmissions has been reached, so it waits for a while (twice the last timeout period). If it still fails to receive the second handshake (SYN-ACK message) from the server, the client will disconnect.
  • When the server retransmits the SYN-ACK message twice after the timeout, since tcp_synack_retries is 2, the maximum number of retransmissions has been reached, so it waits for a while (twice the last timeout period). If it still fails to receive the third handshake (ACK message) from the client, the server will disconnect.
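
The example above can be sketched numerically. This is a simplification under stated assumptions: both sides start from a 1-second timeout and double it each time, the server's timer is not affected by the duplicate SYN messages it receives, and the function name give_up_after is hypothetical.

```python
def give_up_after(retries: int, initial_rto: int = 1) -> int:
    """Seconds from the first transmission until the sender gives up.

    The sender waits through one timeout per retransmission plus a final
    wait of twice the last timeout, with the timeout doubling each time.
    """
    return sum(initial_rto * 2 ** i for i in range(retries + 1))


client = give_up_after(1)  # tcp_syn_retries = 1    -> 1 + 2
server = give_up_after(2)  # tcp_synack_retries = 2 -> 1 + 2 + 4
print(f"client abandons the attempt after {client}s")
print(f"server drops the half-open connection {server}s after its first SYN-ACK")
```

Under these assumptions the client gives up first, while the server keeps its half-open connection a few seconds longer.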

What happens if the third handshake is lost?

After the client receives the SYN-ACK message from the server, it returns an ACK message to the server, which is the third handshake. At this point, the client enters the ESTABLISHED state.

Because the ACK of the third handshake confirms the SYN of the second handshake, when the third handshake is lost and the server does not receive the confirmation in time, the server triggers the timeout retransmission mechanism and retransmits the SYN-ACK message until it receives the third handshake or reaches the maximum number of retransmissions.

Note that a pure ACK message is never retransmitted. When an ACK is lost, the other party retransmits the message that the ACK was meant to confirm.

For example, assuming that the tcp_synack_retries parameter value is 2, then when the third handshake is lost, the process that occurs is as follows:

Specific process:

  • When the server retransmits the SYN-ACK message twice after the timeout, since tcp_synack_retries is 2, the maximum number of retransmissions has been reached, so it waits for a while (twice the last timeout period). If it still fails to receive the third handshake (ACK message) from the client, the server will disconnect.
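
The three loss cases can be summarized with a small sketch that reports which state each side is left in right after a single handshake attempt. It is a teaching model only: retransmissions and timers are deliberately not modelled, and the function name handshake_states is an assumption for illustration.

```python
def handshake_states(lost=None):
    """Return (client_state, server_state) after one handshake attempt.

    `lost` is the number (1, 2 or 3) of the handshake segment dropped by
    the network, or None if every segment arrives.
    """
    client, server = "CLOSED", "LISTEN"

    client = "SYN_SENT"      # client sends SYN (first handshake)
    if lost == 1:
        return client, server
    server = "SYN_RCVD"      # server receives SYN, sends SYN-ACK (second handshake)
    if lost == 2:
        return client, server
    client = "ESTABLISHED"   # client receives SYN-ACK, sends ACK (third handshake)
    if lost == 3:
        return client, server
    server = "ESTABLISHED"   # server receives the ACK
    return client, server


for lost in (1, 2, 3, None):
    print(lost, handshake_states(lost))
```

Note the asymmetry when the third handshake is lost: the client already considers the connection established while the server is still in SYN_RCVD.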

TCP four-wave packet loss

What happens if the first wave is lost?

When the client (the active closing party) calls the close function, it sends a FIN message to the server, trying to disconnect from the server. At this time, the client connection enters the FIN_WAIT_1 state.

Under normal circumstances, if the ACK from the server (the passive closing party) arrives in time, the state quickly changes to FIN_WAIT_2.

If the first wave is lost and the client does not receive the ACK from the passive party, the timeout retransmission mechanism is triggered and the FIN message is retransmitted. The number of retransmissions is controlled by the tcp_orphan_retries parameter.

Once the client has retransmitted the FIN message tcp_orphan_retries times, it sends no more FIN messages and waits for a period of time (twice the last timeout period). If it still has not received the second wave, it enters the CLOSED state directly.

For example, assuming the tcp_orphan_retries parameter value is 3, when the first wave is always lost, the process that occurs is as follows:

Specific process:

  • When the client retransmits the FIN message three times after the timeout, since tcp_orphan_retries is 3, the maximum number of retransmissions has been reached, so it waits for a while (twice the last timeout period). If it still fails to receive the second wave (ACK message) from the server, the client will disconnect.
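
The example above can be laid out as a timeline. This is a sketch under assumptions: the first timeout is taken to be 1000 ms to match the earlier examples, whereas on an established connection the real timeout is derived from the measured round-trip time, so it is left as a parameter. The function name fin_timeline is hypothetical.

```python
def fin_timeline(orphan_retries: int, rto_ms: int = 1000) -> list:
    """List (time_ms, event) pairs for a FIN that is never acknowledged."""
    events = [(0, "send FIN, enter FIN_WAIT_1")]
    now, timeout = 0, rto_ms
    for attempt in range(1, orphan_retries + 1):
        now += timeout
        events.append((now, f"retransmit FIN #{attempt}"))
        timeout *= 2  # each timeout is twice the previous one
    now += timeout    # final wait after the last retransmission
    events.append((now, "give up, enter CLOSED"))
    return events


for time_ms, event in fin_timeline(3):
    print(f"{time_ms:>6} ms  {event}")
```

Passing a smaller rto_ms shows how much sooner the client gives up on a low-latency path.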

What happens if the second wave is lost?

When the server receives the first wave from the client, it will first send an ACK confirmation message, and the server's connection will enter the CLOSE_WAIT state.

As we mentioned earlier, ACK messages will not be retransmitted, so if the server's second wave is lost, the client will trigger the timeout retransmission mechanism and retransmit the FIN message until it receives the server's second wave or reaches the maximum number of retransmissions.

For example, assuming the tcp_orphan_retries parameter value is 2, when the second wave is always lost, the process that occurs is as follows:

Specific process:

  • When the client retransmits the FIN message twice after the timeout, because tcp_orphan_retries is 2, the maximum number of retransmissions has been reached, so it waits for a while (twice the last timeout period). If it still fails to receive the second wave (ACK message) from the server, the client will disconnect.

It should be mentioned here that once the client receives the second wave, that is, the ACK message sent by the server, it enters the FIN_WAIT_2 state. In this state, it waits for the server to send the third wave, that is, the server's FIN message.

For a connection closed with the close function, data can no longer be sent or received, so the FIN_WAIT_2 state cannot last too long. The tcp_fin_timeout parameter controls how long the connection stays in this state. The default value is 60 seconds.

This means that for a connection closed by calling close, if no FIN message is received within 60 seconds, the client (the active closing party) closes the connection directly.

But please note that if the active closing party uses the shutdown function and closes only the sending direction, not the receiving direction, it can still receive data.

In that case, if the active closing party never receives the third wave, its connection remains in the FIN_WAIT_2 state indefinitely (tcp_fin_timeout does not apply to a connection closed with shutdown).
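
The difference between close and shutdown can be observed on a loopback connection. The sketch below uses Python's standard socket module, whose shutdown(SHUT_WR) call closes only the sending direction. It assumes the loopback interface is available, and the function name half_close_demo and the payload are made up for illustration.

```python
import socket


def half_close_demo(payload: bytes = b"late data"):
    """Half-close a loopback TCP connection and keep receiving on it.

    Returns (bytes the server read after the client's FIN,
             bytes the client read after its own shutdown).
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.settimeout(5)
    listener.bind(("127.0.0.1", 0))       # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname(), timeout=5)
    server, _ = listener.accept()
    server.settimeout(5)
    try:
        client.shutdown(socket.SHUT_WR)   # send FIN, keep the receive direction open
        seen_by_server = server.recv(16)  # b"" means end of stream: the FIN arrived
        server.sendall(payload)           # the passive side may still send data
        seen_by_client = b""
        while len(seen_by_client) < len(payload):
            chunk = client.recv(16)
            if not chunk:
                break
            seen_by_client += chunk
        return seen_by_server, seen_by_client
    finally:
        client.close()
        server.close()
        listener.close()


print(half_close_demo())
```

The client reads the server's data after sending its own FIN, which is exactly what a connection closed with close cannot do.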

What happens if the third wave is lost?

When the server (passive closing party) receives the FIN message from the client (active closing party), the kernel will automatically reply with ACK, and the connection is in the CLOSE_WAIT state. As the name suggests, it means waiting for the application process to call the close function to close the connection.

At this time, the kernel has no right to close the connection on behalf of the process. The process must actively call the close function to trigger the server to send a FIN message.

When the server is in the CLOSE_WAIT state and the close function is called, the kernel will send a FIN message and the connection will enter the LAST_ACK state, waiting for the client to return ACK to confirm the connection is closed.

If the ACK is not received for a long time, the server will resend the FIN message. The number of retransmissions is still controlled by the tcp_orphan_retries parameter, which is the same as the way the client resends the FIN message.

For example, assuming tcp_orphan_retries = 3, when the third wave is lost, the following process occurs:

Specific process:

  • When the server retransmits the third wave message three times, since tcp_orphan_retries is 3, the maximum number of retransmissions has been reached, so it waits for a while (twice the last timeout period). If it still fails to receive the fourth wave (ACK message) from the client, the server will disconnect.
  • Because the client closes the connection through the close function, there is a time limit for the client to be in the FIN_WAIT_2 state. If the client still fails to receive the third wave (FIN message) from the server within the tcp_fin_timeout time, the client will be disconnected.

What happens if the fourth wave is lost?

When the client receives the FIN message of the third wave from the server, it will return an ACK message, which is the fourth wave. At this time, the client connection enters the TIME_WAIT state.

In Linux, the TIME_WAIT state lasts for 2MSL (fixed at 60 seconds) before the connection enters the CLOSED state.

Meanwhile, the server (the passive closing party) remains in the LAST_ACK state until it receives the ACK message.

If the ACK message of the fourth wave does not reach the server, the server will resend the FIN message. The number of retries is still controlled by the tcp_orphan_retries parameter introduced earlier.

For example, assuming tcp_orphan_retries is 2, when the fourth wave is always lost, the following process occurs:

Specific process:

  • When the server retransmits the third wave message 2 times, since tcp_orphan_retries is 2, the maximum number of retransmissions is reached, so it waits for a while (twice the last timeout period). If it still fails to receive the fourth wave (ACK message) from the client, the server will disconnect.
  • After receiving the third wave, the client enters the TIME_WAIT state and starts a timer of 2MSL. If the client receives the third wave (FIN message) again during the process, the timer will be reset. After waiting for 2MSL, the client will disconnect.
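
The four loss cases can be summarized with a small sketch that reports which state each side is left in right after a single teardown attempt. It is a teaching model only: it assumes the server application calls close as soon as it has acknowledged the client's FIN, it does not model retransmissions or timers, and the function name teardown_states is hypothetical.

```python
def teardown_states(lost=None):
    """Return (client_state, server_state) after one four-wave attempt.

    `lost` is the number (1 to 4) of the wave dropped by the network,
    or None if every wave arrives. The client is the active closer.
    """
    client, server = "ESTABLISHED", "ESTABLISHED"

    client = "FIN_WAIT_1"    # client calls close, sends FIN (first wave)
    if lost == 1:
        return client, server
    server = "CLOSE_WAIT"    # server receives FIN, sends ACK (second wave)
    if lost == 2:
        return client, server
    client = "FIN_WAIT_2"    # client receives the ACK
    server = "LAST_ACK"      # server calls close, sends FIN (third wave)
    if lost == 3:
        return client, server
    client = "TIME_WAIT"     # client receives FIN, sends ACK (fourth wave)
    if lost == 4:
        return client, server
    server = "CLOSED"        # server receives the final ACK
    return client, server


for lost in (1, 2, 3, 4, None):
    print(lost, teardown_states(lost))
```

Even when nothing is lost, the client stays in TIME_WAIT for 2MSL before it reaches CLOSED.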
