4000 words on TCP timeout and retransmission. If you don't learn anything from reading this, I'll lose.

The previous article introducing TCP, "TCP three-way handshake, four waves and some details", received good feedback, which made me quite happy. This time I will continue with timeout and retransmission.

We all know that the TCP protocol has a retransmission mechanism, that is, if the sender believes that packet loss has occurred, it will resend the data packets. Obviously, we need a way to "guess" whether packet loss has occurred. The simplest idea is that the receiver returns an ACK to the sender every time it receives a packet, indicating that it has received the data. Conversely, if the sender does not receive an ACK within a period of time, it knows that the data packet is likely lost, and then resends the data packet until it receives an ACK.

You may notice that I used the word "guess". Even if the timer expires, the packet may not be lost: it may just have taken a long detour and arrived late. TCP is a transport layer protocol, so it cannot know exactly what happened in the layers below it (network, data link, physical). This does not break the timeout retransmission mechanism, because the receiver uses sequence numbers to recognize duplicate data and simply discards it.

The concepts of timeout and retransmission really are that simple, but there are many details inside. The first question that comes to mind is: how long should the sender wait before treating a packet as timed out?

1. How is the timeout determined?

A one-size-fits-all approach is to set the timeout to a fixed value, such as 200 ms, but this is clearly problematic. Our computers talk to many servers located all over the world, both at home and abroad, and their delays vary greatly. For example:

  • My personal blog is hosted in China, with a one-way delay of about 30 ms, so under normal circumstances the ACK for a packet comes back after about 60 ms. With a fixed 200 ms timeout, however, the sender waits 200 ms before declaring a loss, when something like 90 to 120 ms would be enough. That is really inefficient.
  • Suppose you visit a foreign website with a one-way delay of 130 ms, i.e. an RTT of about 260 ms. Now even packets that arrive normally are treated as timed out, because their ACKs come back after the 200 ms deadline, and a large number of packets are resent. The resent packets are just as easily misjudged as timed out, so the retransmissions snowball into an avalanche.

Therefore, setting a fixed value is very unreliable. We need to dynamically adjust the timeout period according to network delay. The greater the delay, the longer the timeout period.
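To make the two examples concrete, here is a tiny Python sketch. It assumes the delays quoted above are one-way (so RTT is roughly twice the delay) and uses the fixed 200 ms timeout; the function names are made up for illustration.

```python
# Assumption: the delays in the examples are one-way, so RTT ~= 2 * delay.
FIXED_RTO_MS = 200  # the fixed timeout from the example


def rtt_ms(one_way_delay_ms: int) -> int:
    """Round trip time on a symmetric path."""
    return 2 * one_way_delay_ms


def always_times_out(one_way_delay_ms: int, rto_ms: int = FIXED_RTO_MS) -> bool:
    """True if an ACK that arrives after one RTT still misses the deadline."""
    return rtt_ms(one_way_delay_ms) > rto_ms


def extra_wait_ms(one_way_delay_ms: int, rto_ms: int = FIXED_RTO_MS) -> int:
    """How long the sender keeps waiting after the ACK would normally have arrived."""
    return max(0, rto_ms - rtt_ms(one_way_delay_ms))


for site, delay in [("blog in China", 30), ("foreign site", 130)]:
    print(site, rtt_ms(delay), always_times_out(delay), extra_wait_ms(delay))
```

For the blog, the sender wastes 140 ms beyond the normal 60 ms ACK time before it can react to a loss; for the foreign site, the 260 ms RTT exceeds the timeout, so every packet would be retransmitted.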

Here we introduce two concepts:

  • RTT (Round Trip Time): the time from sending a data packet to receiving the corresponding ACK. RTT is a per-connection quantity: each connection has its own independent RTT.
  • RTO (Retransmission Time Out): Retransmission timeout, which is the timeout mentioned above.

A more formal definition of RTT, from RFC 793:

Measure the elapsed time between sending a data octet with a particular sequence number and receiving an acknowledgment that covers that sequence number (segments sent do not have to match segments received). This measured elapsed time is the Round Trip Time (RTT).
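The definition translates fairly directly into code: remember when each segment went out, and when an ACK covers the end of that segment, take the elapsed time as a sample. The sketch below is a hypothetical helper, assuming byte sequence numbers, cumulative ACKs and integer millisecond timestamps; it ignores retransmissions, which the next section shows are a problem.

```python
from dataclasses import dataclass, field


@dataclass
class RttSampler:
    """Hypothetical helper: timestamps outgoing segments and turns ACKs into RTT samples."""

    # sequence number just past each segment's last byte -> time it was sent (ms)
    pending: dict[int, int] = field(default_factory=dict)

    def on_send(self, seq: int, length: int, now: int) -> None:
        self.pending[seq + length] = now

    def on_ack(self, ack: int, now: int) -> list[int]:
        """An ACK covers every byte below `ack`: one sample per newly covered segment."""
        covered = sorted(end for end in self.pending if end <= ack)
        return [now - self.pending.pop(end) for end in covered]


sampler = RttSampler()
sampler.on_send(1000, 500, now=0)
sampler.on_send(1500, 500, now=10)
print(sampler.on_ack(2000, now=70))  # one ACK covers both segments: [70, 60]
```

Note how "segments sent do not have to match segments received" shows up here: a single ACK can produce samples for several segments.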

1. Classical method

The original specification "RFC0793" uses the following formula to obtain a smoothed RTT estimate (called SRTT):

$$\mathrm{SRTT} \leftarrow \alpha \cdot \mathrm{SRTT} + (1 - \alpha) \cdot \mathrm{RTT}$$

Here RTT is the latest sample and α is a smoothing factor (RFC 793 suggests 0.8 to 0.9). This estimation method is called an "exponentially weighted moving average". The name sounds fancy, but the formula is easy to understand: it takes a weighted average of the existing SRTT and the latest RTT sample.
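A minimal sketch of the update in Python; alpha = 0.875 is my assumption, picked from inside RFC 793's suggested range, and the starting estimate of 100 ms is arbitrary.

```python
def update_srtt(srtt: float, rtt: float, alpha: float = 0.875) -> float:
    """Classical smoothing: SRTT <- alpha * SRTT + (1 - alpha) * RTT.

    alpha = 0.875 is an assumed value inside RFC 793's suggested 0.8-0.9 range.
    """
    return alpha * srtt + (1 - alpha) * rtt


srtt = 100.0  # assumed starting estimate, in ms
for sample in [100, 120, 80, 100]:
    srtt = update_srtt(srtt, sample)
    print(f"sample={sample} ms -> SRTT={srtt:.2f} ms")
```

Each new sample only moves the estimate by 1 - alpha of the difference: the 120 ms sample pulls SRTT from 100 ms to just 102.5 ms.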

With SRTT, it is time to set the corresponding RTO value. "RFC0793" calculates it like this:

$$\mathrm{RTO} = \min\bigl(\mathrm{ubound},\ \max(\mathrm{lbound},\ \beta \cdot \mathrm{SRTT})\bigr)$$

Here ubound is the upper bound of the RTO and lbound is the lower bound (RFC 793 gives 1 minute and 1 second as examples), and β is called the delay variance factor, with a suggested value of 1.3 to 2.0. The formula takes β·SRTT as the RTO, but clamps it between lbound and ubound.
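As a sketch, the clamped formula is a one-liner. The values beta = 2.0, lbound = 1 s and ubound = 60 s are assumptions taken from the example ranges RFC 793 mentions, not required constants.

```python
def classic_rto(srtt: float, beta: float = 2.0,
                lbound: float = 1.0, ubound: float = 60.0) -> float:
    """RTO = min(ubound, max(lbound, beta * SRTT)), in seconds.

    beta = 2.0 (top of RFC 793's 1.3-2.0 range) and the 1 s / 60 s bounds
    (RFC 793's example values) are assumptions chosen for this sketch.
    """
    return min(ubound, max(lbound, beta * srtt))


for srtt in [0.06, 0.8, 45.0]:
    print(srtt, "->", classic_rto(srtt))
```

With these bounds, the blog from the earlier example (SRTT about 0.06 s) gets an RTO of 1 s because the lower bound wins, and a very slow path is capped at 60 s.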

This calculation method seems to be fine at first glance (at least that's how I feel), but in actual application, there are two flaws:

There were two known problems with the RTO calculations specified in RFC-793. First, the accurate measurement of RTTs is difficult when there are retransmissions. Second, the algorithm to compute the smoothed round-trip time is inadequate [TCP:7], because it incorrectly assumed that the variance in RTT values would be small and constant. These problems were solved by Karn's and Jacobson's algorithm, respectively.

This passage is taken from "RFC1122". Let me explain.

When packet retransmission occurs, RTT calculation becomes "troublesome". I drew a picture to illustrate these situations:

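The figure shows two cases, and the RTT is computed differently in each (this is the so-called retransmission ambiguity). Here t0 is when the segment was first sent, t1 when it was retransmitted, and t2 when the ACK arrived:

  • Case 1: the ACK acknowledges the original transmission, so RTT = t2 - t0
  • Case 2: the ACK acknowledges the retransmission, so RTT = t2 - t1

However, the client cannot tell which case occurred. If it picks the wrong one, the RTT sample is too large or too small, which distorts the RTO. The simplest and crudest solution is to ignore retransmitted packets and take samples only from packets that were never retransmitted. On its own, this causes another problem: if the RTT suddenly increases, every segment may time out and be retransmitted, so no new samples are taken and the RTO never adapts. Karn's algorithm fixes this by also backing off (doubling) the RTO on each timeout and keeping the backed-off value until a valid sample arrives.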
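Karn's two rules fit in a few lines. This is a minimal sketch with hypothetical names; the 60 s cap on the backed-off RTO is an assumption (RFC 6298 allows a maximum as long as it is at least 60 seconds).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    first_sent: float           # t0: time of the original transmission
    retransmitted: bool = False


def rtt_sample(seg: Segment, ack_time: float) -> Optional[float]:
    """Karn's rule: never sample a retransmitted segment.

    If it was retransmitted, the ACK at ack_time (t2) could belong to either
    transmission, so neither t2 - t0 nor t2 - t1 can be trusted.
    """
    if seg.retransmitted:
        return None
    return ack_time - seg.first_sent


def on_timeout(seg: Segment, rto: float, max_rto: float = 60.0) -> float:
    """Retransmit and back off: the RTO doubles (capped at an assumed 60 s)
    and is kept until a valid sample arrives."""
    seg.retransmitted = True
    return min(2 * rto, max_rto)


seg = Segment(first_sent=0.0)
rto = on_timeout(seg, rto=1.0)        # timer fires at t1; RTO becomes 2.0 s
print(rtt_sample(seg, ack_time=1.3))  # None: ambiguous, so no sample is taken
```

The backoff is what rescues the "ignore retransmissions" idea: even without samples, the RTO keeps growing until it exceeds the new RTT and clean samples start flowing again.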

Another problem is that this algorithm assumes the RTT fluctuates only slightly: the weighted average acts as a low-pass filter and reacts slowly to sudden changes in the network. If the network delay suddenly increases so that the actual RTT is much larger than the estimate, the RTO will be too small, causing unnecessary retransmissions and adding to the network load. (A rising RTT already suggests that the network is overloaded, and these unnecessary retransmissions make it worse.)
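To put a number on "reacts slowly", the sketch below (my own illustration, not from any RFC) keeps feeding the classical EWMA a new, constant RTT and counts how many samples SRTT needs to reach 90% of it. Both alpha = 0.875 and the 90% threshold are assumptions.

```python
def samples_to_catch_up(old_rtt: float, new_rtt: float,
                        alpha: float = 0.875, fraction: float = 0.9) -> int:
    """Count how many samples the classical EWMA needs before SRTT reaches
    `fraction` of a new, persistently higher RTT (alpha and fraction assumed)."""
    srtt, samples = old_rtt, 0
    while srtt < fraction * new_rtt:
        srtt = alpha * srtt + (1 - alpha) * new_rtt
        samples += 1
    return samples


print(samples_to_catch_up(100, 300))
```

If the RTT jumps from 100 ms to 300 ms, this returns 15: the estimate needs fifteen samples to get within 10% of reality, and the gap in the meantime is exactly when spurious timeouts happen.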

2. Standard method

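To be honest, this standard method is quite... cumbersome, so I will just post the formulas. To avoid clashing with the α and β of the classical method, the gains are written g and h here (RFC 6298 calls them α = 1/8 and β = 1/4):

  • $\mathrm{rttvar} \leftarrow (1 - h) \cdot \mathrm{rttvar} + h \cdot \lvert \mathrm{RTT} - \mathrm{SRTT} \rvert$: a weighted average of the absolute error |Err| between the sample and the estimate. It uses the SRTT value from before this round's update.
  • $\mathrm{SRTT} \leftarrow (1 - g) \cdot \mathrm{SRTT} + g \cdot \mathrm{RTT}$: the same weighted average as the classical method, with g playing the role of 1 − α.
  • $\mathrm{RTO} = \mathrm{SRTT} + 4 \cdot \mathrm{rttvar}$: the new RTO; the factor 4 on rttvar was chosen empirically.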

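The overall idea of this algorithm is to combine the mean (what the classical method computes) with the mean deviation, and with some empirical parameter tuning it works well. RFC 6298 also specifies details these formulas leave out, such as initializing SRTT to the first sample and rttvar to half of it, adding a clock-granularity term, and rounding the RTO up to at least 1 second. If you want to learn more about this algorithm, refer to "RFC6298".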
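Here is a minimal sketch of the three formulas, with the first-sample initialization and the 1-second floor from RFC 6298 added. The class name and the 1 ms clock granularity are assumptions for illustration; real TCP stacks differ in details (some, for example, use a lower minimum RTO).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rfc6298Rto:
    """Sketch of the RFC 6298 RTO estimator (all times in seconds)."""

    g: float = 1 / 8                  # SRTT gain (RFC 6298's alpha)
    h: float = 1 / 4                  # rttvar gain (RFC 6298's beta)
    k: float = 4                      # multiplier on rttvar
    clock_granularity: float = 0.001  # assumed 1 ms clock tick (RFC 6298's G)
    min_rto: float = 1.0              # RFC 6298: round RTO up to 1 s
    srtt: Optional[float] = None
    rttvar: float = 0.0
    rto: float = 1.0                  # initial RTO before any measurement

    def on_sample(self, rtt: float) -> float:
        if self.srtt is None:
            # first measurement initializes both estimators
            self.srtt, self.rttvar = rtt, rtt / 2
        else:
            # rttvar uses the SRTT from *before* this update, so it goes first
            self.rttvar = (1 - self.h) * self.rttvar + self.h * abs(self.srtt - rtt)
            self.srtt = (1 - self.g) * self.srtt + self.g * rtt
        self.rto = max(self.min_rto,
                       self.srtt + max(self.clock_granularity, self.k * self.rttvar))
        return self.rto


est = Rfc6298Rto(min_rto=0.0)  # floor disabled here to expose the raw formula
for rtt in [0.100, 0.300]:
    print(round(est.on_sample(rtt), 3))
```

Compare this with the classical EWMA: a jump from 100 ms to 300 ms barely moves SRTT (to 125 ms), but the deviation term immediately pushes the RTO to 475 ms, safely above the new RTT.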

2. Retransmission - Important Events of TCP

1. Timer-based retransmission

Under this mechanism, each data packet is conceptually guarded by a timer. If no ACK arrives within the RTO, the packet is resent. Packets that have not been acknowledged are kept in the retransmission buffer and removed once their ACK arrives. (In practice, RFC 6298 recommends a single timer per connection, covering the oldest unacknowledged data, rather than one timer per packet.)
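A minimal sketch of that buffer, following the one-timer-per-packet picture for simplicity. The class and method names are hypothetical; packets are numbered 1, 2, 3, ... as in the examples, and an ACK carries the number of the next packet the receiver expects.

```python
from dataclasses import dataclass, field


@dataclass
class RetransmissionBuffer:
    """Hypothetical sender buffer with one deadline per unacknowledged packet.

    Packets are numbered 1, 2, 3, ...; an ACK carries the number of the next
    packet the receiver expects. Times are in ms.
    """

    rto: int
    deadlines: dict[int, int] = field(default_factory=dict)  # packet -> deadline

    def on_send(self, pkt: int, now: int) -> None:
        """(Re)transmit a packet and (re)arm its timer."""
        self.deadlines[pkt] = now + self.rto

    def on_ack(self, ack: int) -> None:
        """Everything below `ack` has arrived: drop it from the buffer."""
        for pkt in [p for p in self.deadlines if p < ack]:
            del self.deadlines[pkt]

    def on_tick(self, now: int) -> list[int]:
        """Retransmit every packet whose timer has expired; return their numbers."""
        expired = sorted(p for p, deadline in self.deadlines.items() if deadline <= now)
        for pkt in expired:
            self.on_send(pkt, now)
        return expired
```

Usage: send packets 1 to 3 at time 0 with an RTO of 200 ms, receive ACK = 2, and at time 200 `on_tick` resends packets 2 and 3 and re-arms their timers.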

First, it should be made clear that for TCP, a timeout retransmission is a very significant event: the RTO is often more than twice the RTT, so a timeout usually signals congestion. When it happens, TCP not only retransmits the corresponding segment but also reduces its sending rate, because it assumes the network is congested.
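"Reduces its sending rate" is fairly drastic in practice. Below is a sketch of the reaction described in RFC 5681 (congestion window and slow-start threshold) and RFC 6298 (timer backoff). The 1460-byte SMSS and the 60 s cap are assumptions for this example.

```python
def on_retransmission_timeout(flight_size: int, rto: float, smss: int = 1460,
                              max_rto: float = 60.0) -> tuple[int, int, float]:
    """Sender's reaction to a retransmission timeout, sketched from RFC 5681 / RFC 6298.

    Returns (cwnd, ssthresh, rto): cwnd and ssthresh in bytes, rto in seconds.
    smss = 1460 bytes and the 60 s RTO cap are assumptions for this sketch.
    """
    ssthresh = max(flight_size // 2, 2 * smss)  # RFC 5681: max(FlightSize/2, 2*SMSS)
    cwnd = smss                                  # loss window: one full-sized segment
    new_rto = min(2 * rto, max_rto)              # RFC 6298: back off the timer
    return cwnd, ssthresh, new_rto


print(on_retransmission_timeout(flight_size=10 * 1460, rto=1.0))  # (1460, 7300, 2.0)
```

A sender that had ten segments in flight drops to one, and has to climb back through slow start, which is why a timeout is so much more expensive than the fast retransmit described below.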

A simple timeout retransmission mechanism is often inefficient, as in the following situation:

Assume packet 5 is lost while packets 6, 7, 8 and 9 have reached the receiver. The client can only wait for the server's ACK, and the server cannot acknowledge packets 6, 7, 8 and 9: acknowledgments in TCP's sliding window are cumulative, so an ACK can only say "everything before packet 5 has arrived". As a result, the client has no idea how many packets were lost. It may pessimistically assume that everything from packet 5 onward was lost and retransmit all five packets (5 to 9), which is a waste.
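The receiver's side of this is easy to simulate. The sketch below assumes packet-level numbering (as in the example, rather than byte sequence numbers) and one ACK per arriving packet, with no delayed ACKs.

```python
def cumulative_acks(arrivals: list[int], first: int = 1) -> list[int]:
    """For each arriving packet, return the ACK the receiver sends:
    the number of the next packet it still expects (cumulative ACK)."""
    received: set[int] = set()
    expected = first
    acks = []
    for pkt in arrivals:
        received.add(pkt)
        while expected in received:  # advance past every contiguous packet
            expected += 1
        acks.append(expected)
    return acks


print(cumulative_acks([1, 2, 3, 4, 6, 7, 8, 9]))  # packet 5 lost: [2, 3, 4, 5, 5, 5, 5, 5]
```

Packets 6 to 9 all produce "ACK = 5", so from the sender's point of view they are indistinguishable from lost packets. The moment packet 5 arrives, the ACK jumps straight to 10.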

2. Fast retransmit

Fast retransmit (described in "RFC5681") triggers retransmission based on feedback from the receiver rather than on the expiration of the retransmission timer.

As mentioned earlier, timer-based retransmission often means a long wait. Fast retransmit solves this cleverly: when the server receives an out-of-order packet, it still replies with an ACK, but that ACK repeats the previous acknowledgment number (a duplicate ACK). In the example above, when the out-of-order packets 6, 7, 8 and 9 arrive, the server replies ACK = 5 to each of them, so the client learns that packet 5 is missing. Typically, once the client receives three duplicate ACKs, it retransmits the missing packet without waiting for the timer to expire.
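On the sender side, fast retransmit boils down to counting duplicate ACKs. This is a simplified sketch: RFC 5681 attaches extra conditions to what counts as a duplicate ACK (for example, the segment carries no data and does not change the advertised window), which are ignored here, and packet-level numbering is assumed.

```python
DUPACK_THRESHOLD = 3  # the third duplicate ACK triggers fast retransmit


def fast_retransmit_points(acks: list[int]) -> list[int]:
    """Return the packet numbers the sender fast-retransmits, given the ACK stream."""
    retransmit = []
    last_ack, dups = None, 0
    for ack in acks:
        if ack == last_ack:
            dups += 1
            if dups == DUPACK_THRESHOLD:
                retransmit.append(ack)  # the repeated ACK number is the missing packet
        else:
            last_ack, dups = ack, 0     # new data acknowledged: reset the counter
    return retransmit


print(fast_retransmit_points([2, 3, 4, 5, 5, 5, 5, 5]))  # [5]
```

With the ACK stream from the previous example, the first "ACK = 5" is a normal ACK and the next three are duplicates, so packet 5 is resent after the third duplicate, long before its timer would fire.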

But fast retransmit still does not solve the second problem: how many packets should be retransmitted?

3. Retransmission with Selective Acknowledgement

The improved method is SACK (Selective Acknowledgment). Simply put, on top of the duplicate ACKs used by fast retransmit, the receiver attaches the sequence-number ranges of the out-of-order segments it has received, so the client knows exactly which data has reached the server.

Here are a few simple examples:

Case 1: The first packet is lost and the remaining 7 packets are received.

Whenever any of these 7 packets arrives, the receiver returns an ACK with a SACK option telling the sender which out-of-order data it holds. Note: Left Edge is the first sequence number of the block, and Right Edge is the sequence number immediately after its last byte.

| Triggering Segment | ACK | Left Edge | Right Edge |
|---|---|---|---|
| 5000 (lost) | | | |
| 5500 | 5000 | 5500 | 6000 |
| 6000 | 5000 | 5500 | 6500 |
| 6500 | 5000 | 5500 | 7000 |
| 7000 | 5000 | 5500 | 7500 |
| 7500 | 5000 | 5500 | 8000 |
| 8000 | 5000 | 5500 | 8500 |
| 8500 | 5000 | 5500 | 9000 |

Case 2: The 2nd, 4th, 6th, and 8th packets are lost.

  • When the 1st packet arrives, nothing is out of order and the receiver replies with a normal ACK.
  • When the 3rd, 5th and 7th packets arrive, data is out of order, so the receiver replies with ACKs carrying SACK blocks.

Because this case has several gaps, the ACK carries several SACK blocks. The number of blocks is limited by the 40 bytes of TCP option space: at most 4 blocks, or 3 when the timestamp option is also in use.

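| Triggering Segment | ACK | 1st Block Left Edge | 1st Block Right Edge | 2nd Block Left Edge | 2nd Block Right Edge | 3rd Block Left Edge | 3rd Block Right Edge |
|---|---|---|---|---|---|---|---|
| 5000 | 5500 | | | | | | |
| 5500 (lost) | | | | | | | |
| 6000 | 5500 | 6000 | 6500 | | | | |
| 6500 (lost) | | | | | | | |
| 7000 | 5500 | 7000 | 7500 | 6000 | 6500 | | |
| 7500 (lost) | | | | | | | |
| 8000 | 5500 | 8000 | 8500 | 7000 | 7500 | 6000 | 6500 |
| 8500 (lost) | | | | | | | |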
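To check the two tables, here is a sketch of how a receiver could build the ACK and SACK blocks. It is an approximation of the RFC 2018 rules, not an implementation of them: the block containing the newest segment goes first and the rest follow by recency. The function name is hypothetical, and max_blocks = 3 assumes the timestamp option is in use.

```python
def sack_blocks(isn: int, arrivals: list[tuple[int, int]],
                max_blocks: int = 3) -> tuple[int, list[tuple[int, int]]]:
    """Return (ACK, SACK blocks) sent after the last segment in `arrivals`.

    `isn` is the first sequence number the receiver expects; each arrival is a
    (start, end) byte range, with end exclusive (the Right Edge convention).
    """
    # Merge arrivals into contiguous ranges, remembering the newest arrival in each.
    ranges: list[list[int]] = []  # [left, right, index of newest arrival]
    for idx, (start, end) in sorted(enumerate(arrivals), key=lambda p: p[1]):
        if ranges and start <= ranges[-1][1]:
            ranges[-1][1] = max(ranges[-1][1], end)
            ranges[-1][2] = max(ranges[-1][2], idx)
        else:
            ranges.append([start, end, idx])
    # Data contiguous with isn is covered by the cumulative ACK, not by SACK.
    ack = isn
    if ranges and ranges[0][0] <= isn:
        ack = ranges.pop(0)[1]
    # Block holding the most recently received data first, then by recency.
    ranges.sort(key=lambda r: r[2], reverse=True)
    return ack, [(left, right) for left, right, _ in ranges[:max_blocks]]


case2 = [(5000, 5500), (6000, 6500), (7000, 7500), (8000, 8500)]
print(sack_blocks(5000, case2))  # last row of the Case 2 table
```

Feeding it the Case 1 arrivals (segments 5500 to 8500) gives ACK 5000 with a single growing block, and the Case 2 arrivals reproduce the three blocks of the last row, newest first.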

However, the SACK specification "RFC2018" has a tricky point. After reporting out-of-order data in a SACK, the receiver may "go back on its word": it is allowed to discard that data later. The following is excerpted from "RFC2018":

Note that the data receiver is permitted to discard data in its queue that has not been acknowledged to the data sender, even if the data has already been reported in a SACK option. Such discarding of SACKed packets is discouraged, but may be used if the receiver runs out of buffer space.

The last sentence means that the receiver may do this when it runs out of buffer space, but the behavior is discouraged.

Because of this, the sender cannot free data from its retransmission buffer just because it was SACKed. The data can be freed only once the receiver's cumulative ACK number moves past the end of that data. The retransmission timer is affected too: it should ignore SACK information, because data discarded by the receiver is no different from a lost packet.
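A sketch of the sender-side bookkeeping this implies: SACK only marks segments, the cumulative ACK is the only thing that frees them, and (in the spirit of the paragraph above) a timeout throws all SACK marks away. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SackScoreboard:
    """Hypothetical sender bookkeeping (byte sequence numbers).

    A SACK only *marks* a segment; only the cumulative ACK removes it,
    because the receiver is allowed to discard SACKed data later.
    """

    segments: dict[int, int] = field(default_factory=dict)  # start -> end
    sacked: set[int] = field(default_factory=set)            # starts marked by SACK

    def on_send(self, start: int, end: int) -> None:
        self.segments[start] = end

    def on_ack(self, ack: int, blocks: list[tuple[int, int]]) -> None:
        # 1) free only what the cumulative ACK covers
        for start in [s for s, e in self.segments.items() if e <= ack]:
            del self.segments[start]
            self.sacked.discard(start)
        # 2) SACK blocks merely mark the remaining segments
        for start, end in self.segments.items():
            if any(left <= start and end <= right for left, right in blocks):
                self.sacked.add(start)

    def on_timeout(self) -> Optional[int]:
        """Forget all SACK marks (the receiver may have discarded that data) and
        return the left edge of the window, which is retransmitted first."""
        self.sacked.clear()
        return min(self.segments) if self.segments else None

    def not_known_received(self) -> list[int]:
        return sorted(s for s in self.segments if s not in self.sacked)
```

For example, after sending 5000 to 7000 in 500-byte segments and receiving ACK 5000 with SACK 5500-6500, all four segments stay buffered, but only 5000 and 6500 are "not known received". After a timeout, all four are candidates again.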

4. DSACK extension

DSACK (duplicate SACK) is an extension that uses the SACK option to tell the sender which segments the receiver has received more than once. It helps the sender determine whether packets were reordered, ACKs were lost, packets were duplicated by the network, or retransmissions were unnecessary. This lets TCP operate more robustly in those situations.

Regarding DSACK, "RFC2883" gives many examples. Interested readers can read it. I will not go into details here.
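Without going through the examples, the check a sender uses to recognize a D-SACK is short. Below is a simplified sketch of my reading of RFC 2883: the first SACK block reports duplicate data if it starts below the cumulative ACK, or if it lies inside the second SACK block. The function name is hypothetical.

```python
def is_dsack(ack: int, blocks: list[tuple[int, int]]) -> bool:
    """Simplified D-SACK check on the first SACK block of an incoming ACK."""
    if not blocks:
        return False
    left, right = blocks[0]
    if left < ack:
        # already covered by the cumulative ACK: the receiver saw it twice
        return True
    if len(blocks) > 1:
        left2, right2 = blocks[1]
        # first block nested inside the second: a duplicate of out-of-order data
        return left2 <= left and right <= right2
    return False
```

For instance, an ACK of 4000 carrying the block 3000-3500 can only mean the segment 3000-3500 arrived again, typically because an earlier ACK was lost or the retransmission was spurious.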
