Are you still worried about TCP retransmission, sliding window, flow control, and congestion control? You won’t have to worry after reading the diagrams.

Preface

The previous article, "Whether it's hard or not is up to you! Nearly 40 diagrams explaining the TCP three-way handshake and four-way teardown interview questions asked a thousand times," was well received by many readers. Thank you all for the recognition; it warms the heart.

Here I come! Today I am here to illustrate TCP again. Xiaolin may be late, but he will not be absent.

The main reason for the delay is that TCP is extremely complex. To ensure reliability, it relies on a huge number of mechanisms. It really is a "great" protocol, but the more I wrote, the more I realized just how complicated it is...

All the pictures in this article were drawn by Xiaolin. It was very hard and tiring. Without further ado, let’s get straight to the main text. Go!

Main Text

I believe everyone knows that TCP is a reliable transmission protocol, so how does it ensure reliability?

In order to achieve reliable transmission, many things need to be considered, such as data corruption, packet loss, duplication, and fragment order disorder. If these problems cannot be solved, there is no way to talk about reliable transmission.

Then, TCP achieves reliable transmission through mechanisms such as sequence numbers, confirmation responses, retransmission control, connection management, and window control.

Today, we will focus on TCP's retransmission mechanism, sliding window, flow control, and congestion control.

Outline

1. Retransmission Mechanism

One of the ways TCP achieves reliable transmission is through sequence numbers and confirmation responses.

In TCP, when the data from the sender reaches the receiving host, the receiving host returns a confirmation message to indicate that the message has been received.

Normal data transmission

But in a complex network, data transmission may not be as smooth as shown above. What if the data is lost during the transmission?

Therefore, TCP uses a retransmission mechanism to solve the problem of packet loss.

Next, let’s talk about the common retransmission mechanism:

  • Timeout retransmission
  • Fast Retransmit
  • SACK
  • D-SACK

1. Timeout retransmission

One of the retransmission mechanisms is to set a timer when sending data. If the ACK confirmation message from the other party is not received after the specified time, the data will be resent, which is what we often call timeout retransmission.

TCP will timeout and retransmit in the following two situations:

  • Packet Loss
  • Confirmation reply lost

Two cases of timeout retransmission

(1) What should the timeout be set to?

Let's first understand what RTT (Round-Trip Time) is. From the following figure we can know:

RTT

RTT is the time required for data to be transmitted from one end of the network to the other, that is, the round-trip time of the packet.

The retransmission timeout is expressed as RTO (Retransmission Timeout).

What happens if the timeout RTO is set "too long or too short"?

Longer vs. shorter timeouts

There are two situations with different timeout periods in the figure above:

  • When the timeout period RTO is large, retransmission is slow and it takes a long time to retransmit, which is inefficient and has poor performance.
  • When the timeout RTO is small, data may be retransmitted even though nothing was lost. Retransmitting this quickly increases network congestion, which causes more timeouts, and more timeouts lead to even more retransmissions.

It is very important to accurately measure the value of the timeout RTO, which can make our retransmission mechanism more efficient.

Based on the above two situations, we can know that the value of the timeout retransmission time RTO should be slightly larger than the value of the round-trip RTT of the message.

RTO should be slightly larger than RTT

At this point, you may think that the calculation of the retransmission timeout period RTO value is not very complicated.

It seems simple: the sender records t0 when it sends a packet and t1 when it receives the ACK, so RTT = t1 - t0. But it is not that simple: that is just one sample and cannot represent the general case.

In fact, the "round-trip RTT value of the message" often changes, because our network also changes frequently. Because the "round-trip RTT value of the message" often fluctuates, the "timeout retransmission time RTO value" should be a dynamically changing value.

Let's take a look at how Linux calculates RTO.

To estimate the round trip time, you usually need to sample the following two:

  • TCP needs to sample the RTT time and then perform weighted averaging to calculate a smoothed RTT value, and this value needs to change continuously because network conditions are constantly changing.
  • In addition to sampling the RTT itself, the fluctuation range of the RTT also needs to be sampled, so that large swings in the RTT do not go undetected.

RFC 6298 recommends using the following formula to calculate RTO:

RTO calculation recommended by RFC 6298

Where SRTT is the smoothed RTT, and DevRTT is the smoothed deviation between the latest RTT sample and SRTT.

Under Linux, α = 0.125, β = 0.25, μ = 1, ∂ = 4. Don't ask how they came about; they were simply obtained through extensive experiments.
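
The update rules can be sketched in a few lines of Python (the class name `RttEstimator` is illustrative, and this is a simplified model of the RFC 6298 computation, not the actual Linux implementation):

```python
class RttEstimator:
    """Simplified RFC 6298-style RTO estimator, using the coefficients
    cited above: alpha = 1/8, beta = 1/4, mu = 1, delta = 4."""
    ALPHA, BETA, MU, DELTA = 0.125, 0.25, 1, 4

    def __init__(self):
        self.srtt = None     # smoothed RTT
        self.devrtt = None   # smoothed deviation between RTT samples and SRTT

    def sample(self, rtt):
        if self.srtt is None:              # the first measurement seeds both values
            self.srtt, self.devrtt = rtt, rtt / 2
        else:
            # update the deviation first, using the old SRTT
            self.devrtt = (1 - self.BETA) * self.devrtt + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        return self.rto()

    def rto(self):
        # slightly larger than SRTT: add a multiple of the deviation
        return self.MU * self.srtt + self.DELTA * self.devrtt

est = RttEstimator()
print(est.sample(100))  # first sample: SRTT = 100, DevRTT = 50, so RTO = 300
```

Note that this computed RTO is the baseline; when a retransmitted segment itself times out, the interval is doubled rather than recomputed from fresh samples.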

If the data that has been retransmitted times out again and needs to be retransmitted, TCP's strategy is to double the timeout interval.

That is, every time a retransmission timeout occurs, the next timeout interval will be set to twice the previous value. Two timeouts indicate a poor network environment and frequent retransmissions are not recommended.

The problem with timeout-triggered retransmission is that the timeout period may be relatively long. Is there a faster way?

Therefore, the "fast retransmission" mechanism can be used to solve the waiting time for timeout retransmission.

2. Fast retransmit

TCP also has another fast retransmit mechanism, which is not time-driven but data-driven retransmission.

How does the fast retransmit mechanism work? It's actually very simple, a picture is worth a thousand words.

Fast retransmission mechanism

In the figure above, the sender sent 1, 2, 3, 4, and 5 copies of data:

  • The first packet, Seq1, is delivered first, so Ack returns 2;
  • As a result, Seq2 was not received for some reason, but Seq3 arrived, so Ack was still sent back to 2;
  • Seq4 and Seq5 have arrived, but Ack returns 2 because Seq2 has not been received.
  • The sender receives three acknowledgments with Ack = 2 and knows that Seq2 has not been received. It will retransmit the lost Seq2 before the timer expires.
  • Finally, Seq2 is received. At this time, because Seq3, Seq4, and Seq5 have all been received, Ack returns 6.

Therefore, the way fast retransmit works is that when three identical ACK messages are received, the lost segments will be retransmitted before the timer expires.
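
The sender-side detection can be sketched as a simple duplicate-ACK counter (the function name is hypothetical; a real stack tracks this state per connection, alongside the retransmission timer):

```python
DUP_ACK_THRESHOLD = 3  # retransmit early after three duplicate ACKs

def fast_retransmit_check(acks):
    """Return the sequence number to retransmit early, or None.
    `acks` is the stream of cumulative ACK numbers the sender receives."""
    last_ack, dup_count = None, 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == DUP_ACK_THRESHOLD:
                return ack       # the receiver is still waiting for this byte
        else:
            last_ack, dup_count = ack, 0
    return None

# The example from the figure: Seq2 is lost, so every later segment
# still produces "Ack 2".
print(fast_retransmit_check([2, 2, 2, 2]))  # → 2
```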

The fast retransmit mechanism solves only the timeout problem, but it still faces another question: when retransmitting, should only the first missing segment be retransmitted, or all of the segments after it?

For example, for the above example, should Seq2 be retransmitted? Or should Seq2, Seq3, Seq4, and Seq5 be retransmitted? This is because the sender does not know who sent back the three consecutive Ack 2s.

Depending on the TCP implementation, both of the above situations are possible. It can be seen that this is a double-edged sword.

In order to solve the problem of not knowing which TCP packets to retransmit, the SACK method was developed.

3. SACK method

There is another way to implement the retransmission mechanism called SACK (Selective Acknowledgment).

This method requires adding a SACK in the "Option" field of the TCP header, which can send the cached map to the sender so that the sender can know which data has been received and which data has not been received. Knowing this information, only the lost data can be retransmitted.

As shown in the figure below, the sender receives the same ACK confirmation message three times, which triggers the fast retransmit mechanism. Through the SACK information, it discovers that only the data segment 200~299 was lost, so only that TCP segment is retransmitted.

Selective Acknowledgment (SACK)

If SACK is to be supported, both parties must support it. In Linux, this feature can be enabled via the net.ipv4.tcp_sack parameter (enabled by default after Linux 2.4).

4. Duplicate SACK

Duplicate SACK, also known as D-SACK, mainly uses SACK to tell the "sender" which data has been received repeatedly.

The following two examples are used to illustrate the role of D-SACK.

Example 1: ACK packet loss

ACK packet loss

  • Both ACKs sent by the receiver to the sender are lost, so the sender times out and retransmits the first packet (3000 ~ 3499)
  • The "receiver" then finds that the data is received repeatedly, so it returns a SACK = 3000~3500, telling the "sender" that the data from 3000 to 3500 has already been received. Because the ACK has reached 4000, it means that all the data before 4000 has been received, so this SACK represents D-SACK.
  • In this way, the "sender" will know that the data is not lost, but the ACK confirmation message from the "receiver" is lost.

Example 2: Network Delay

Network delay

  • The data packets (1000~1499) were delayed by the network, resulting in the "sender" not receiving the confirmation message Ack 1500.
  • The three identical ACK confirmation messages that arrived later triggered the fast retransmission mechanism, but after the retransmission, the delayed data packets (1000~1499) arrived at the "receiver" again;
  • So the "receiver" sends back a SACK=1000~1500. Since the ACK has reached 3000, this SACK is a D-SACK, indicating that a duplicate packet has been received.
  • In this way, the sender knows that the reason for triggering fast retransmission is not because the sent packet is lost, nor because the response ACK packet is lost, but because of network delay.

It can be seen that D-SACK has the following advantages:

  • It can let the "sender" know whether the sent packet is lost or the ACK packet responded by the receiver is lost;
  • You can know whether the data packet of the "sender" is delayed by the network;
  • You can know whether the data packet of the "sender" is copied in the network;

In Linux, this feature can be enabled/disabled via the net.ipv4.tcp_dsack parameter (enabled by default since Linux 2.4).
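
The rule the sender uses to recognize a D-SACK (per RFC 2883) is roughly: the first SACK block reports duplicate data if it is already covered by the cumulative ACK, or if it is contained inside the second SACK block. A sketch (function name hypothetical, blocks given as half-open `(left, right)` ranges):

```python
def is_dsack(ack, sack_blocks):
    """Sketch of the RFC 2883 rule for spotting a D-SACK: the first
    SACK block reports old data if the cumulative ACK already covers
    it, or if the second SACK block contains it."""
    if not sack_blocks:
        return False
    first_left, first_right = sack_blocks[0]
    if first_right <= ack:                 # below what the ACK already covers
        return True
    if len(sack_blocks) >= 2:
        second_left, second_right = sack_blocks[1]
        return second_left <= first_left and first_right <= second_right
    return False

# Example 1 from the article: the ACK has reached 4000, yet the receiver
# reports SACK 3000~3500, so that block must describe a duplicate.
print(is_dsack(4000, [(3000, 3500)]))  # → True
```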

2. Sliding Window

1. Reasons for introducing the window concept

We all know that TCP requires a confirmation response for each data packet sent. When the previous data packet receives the response, the next one is sent.

This mode is a bit like a face-to-face chat between you and me, you talk a few words and I talk a few words. But the disadvantage of this method is that it is relatively inefficient.

If you finish a sentence and I am dealing with other things and don't reply to you in time, then you have to wait for me to finish other things and reply to you before you can say the next sentence. Obviously, this is unrealistic.

Confirmation response by data packet

Therefore, this transmission method has a disadvantage: the longer the round-trip time of the data packet, the lower the efficiency of communication.

To solve this problem, TCP introduced the concept of window, which will not reduce the efficiency of network communication even when the round-trip time is long.

Then, with a window, you can specify the window size. The window size refers to the maximum value of data that can be sent without waiting for a confirmation response.

The window is actually a buffer space opened by the operating system. The sending host must keep the sent data in the buffer before waiting for the confirmation reply to return. If the confirmation reply is received on time, the data can be cleared from the buffer.

Assuming the window size is 3 TCP segments, the sender can "continuously send" 3 TCP segments, and if ACK is lost in the middle, it can be confirmed by "the next confirmation response". As shown in the following figure:

Parallel processing using sliding windows

It doesn't matter if the ACK 600 message is lost, because it can be covered by the next ACK. As long as the sender receives ACK 700, it knows the receiver has received all the data before 700. This mode is called cumulative acknowledgment.

(1) Who determines the window size?

There is a field in the TCP header called Window, which is the window size.

This field is used by the receiver to tell the sender how much buffer space it has available to receive data. The sender can then send data based on the receiver's processing capability without causing the receiver to be unable to process the data.

Therefore, usually the window size is determined by the receiver.

The data size sent by the sender cannot exceed the window size of the receiver, otherwise the receiver will not be able to receive the data normally.

(2) Sender’s sliding window

Let's first look at the sender's window. The following figure shows the data cached by the sender. It is divided into four parts according to the processing situation. The dark blue box is the send window, and the purple box is the available window:

  • #1 is the data that has been sent and received ACK confirmation: 1~31 bytes
  • #2 is the data that has been sent but no ACK confirmation has been received: 32~45 bytes
  • #3 is not sent but the total size is within the receiver's processing range (the receiver still has space): 46~51 bytes
  • #4 is not sent but the total size exceeds the receiver's processing range (the receiver has no space): after 52 bytes

In the figure below, when the sender sends "all" the data at once, the size of the available window becomes 0, indicating that the available window is exhausted and no more data can be sent before receiving ACK confirmation.

Available window exhausted

In the figure below, after receiving the ACK confirmation response for the previously sent data 32~36 bytes, if the size of the sending window has not changed, the sliding window moves 5 bytes to the right, because 5 bytes of data have been acknowledged, and then 52~56 bytes become the available window again, so the 5 bytes of data 52~56 can be sent subsequently.

32 ~ 36 bytes confirmed

(3) How does the program represent the four parts of the sender?

The TCP sliding window scheme uses three pointers to track bytes in each of the four transmission categories. Two of the pointers are absolute pointers (referring to a specific sequence number) and one is a relative pointer (requires an offset).

SND.WND, SND.UNA, SND.NXT

  • SND.WND: indicates the size of the send window (the size is specified by the receiver);
  • SND.UNA: is an absolute pointer that points to the sequence number of the first byte that has been sent but not confirmed, that is, the first byte of #2.
  • SND.NXT: It is also an absolute pointer, which points to the sequence number of the first byte of the unsent but sendable range, that is, the first byte of #3.
  • The first byte pointing to #4 is a relative pointer, which requires the SND.NXT pointer plus an offset of the size of SND.WND to point to the first byte of #4.

Then the calculation of the available window size can be: Available window size = SND.WND - (SND.NXT - SND.UNA)
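
This formula can be checked directly against the byte ranges above (bytes 1~31 ACKed, 32~45 in flight, a 20-byte send window covering 32~51):

```python
def usable_window(snd_una, snd_nxt, snd_wnd):
    """Available window = SND.WND - (SND.NXT - SND.UNA):
    the window size minus the bytes already in flight."""
    return snd_wnd - (snd_nxt - snd_una)

# SND.UNA = 32 (first unacked byte), SND.NXT = 46 (first unsent byte)
print(usable_window(snd_una=32, snd_nxt=46, snd_wnd=20))  # → 6, i.e. bytes 46~51
```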

(4) Receiver’s Sliding Window

Next, let's look at the receiver's window. The receiver window is relatively simple and is divided into three parts based on the processing situation:

  • #1 + #2 is the data that has been successfully received and confirmed (waiting for the application process to read);
  • #3 is data that was not received but can be received;
  • #4 Data not received and data that cannot be received;

Receive Window

The three receiving parts are divided using two variables:

  • RCV.WND: Indicates the size of the receive window, which will be notified to the sender.
  • RCV.NXT: a pointer to the sequence number of the next byte of data expected from the sender, that is, the first byte of #3.
  • The first byte pointing to #4 is a relative pointer, which requires the RCV.NXT pointer plus an offset of the size of RCV.WND to point to the first byte of #4.

(5) Are the sizes of the receive window and the send window equal?

They are not completely equal. The size of the receive window is approximately equal to the size of the send window.

Because the sliding window is not static. For example, when the receiving application reads data very quickly, the receive window frees up quickly, and the new receive window size is told to the sender through the Window field in the TCP header. Since that notification takes time to travel, the receive window and the send window are only approximately equal.

3. Flow Control

The sender cannot send data to the receiver mindlessly; the receiver's processing capabilities must be considered.

If you keep sending data to the other party without thinking, but the other party cannot process it, it will trigger the retransmission mechanism, resulting in unnecessary waste of network traffic.

To solve this problem, TCP provides a mechanism that allows the "sender" to control the amount of data sent according to the actual receiving capacity of the "receiver". This is called flow control.

For simplicity, let's assume the following scenario:

  • The client is the receiver and the server is the sender
  • Assume that the receiving window and the sending window are the same, both are 200
  • Assume that both devices maintain the same window size throughout the transmission process and are not affected by external factors.

Flow Control

According to the flow control in the figure above, each process is described below:

  • The client sends a request datagram to the server. It should be noted that this example uses the server as the sender, so the server's receiving window is not drawn.
  • After receiving the request message, the server sends a confirmation message and 80 bytes of data, so the available window Usable is reduced to 120 bytes. At the same time, the SND.NXT pointer also shifts 80 bytes to the right and points to 321, which means that the next time data is sent, the sequence number is 321.
  • After the client receives 80 bytes of data, the receiving window moves 80 bytes to the right, and RCV.NXT points to 321, which means that the sequence number of the next message expected by the client is 321, and then a confirmation message is sent to the server.
  • The server sends 120 bytes of data again, so the available window is exhausted to 0, and the server can no longer send data.
  • After the client receives 120 bytes of data, the receiving window moves 120 bytes to the right, RCV.NXT points to 441, and then sends a confirmation message to the server.
  • After the server receives the confirmation message for the 80 bytes of data, the SND.UNA pointer shifts rightward to point to 321, so the available window Usable increases to 80.
  • After the server receives the confirmation message for the 120-byte data, the SND.UNA pointer shifts rightward to point to 441, so the available window Usable increases to 200.
  • The server can continue to send, so after sending 160 bytes of data, SND.NXT points to 601, so the available window Usable is reduced to 40.
  • After the client receives 160 bytes, the receiving window moves 160 bytes to the right, and RCV.NXT points to 601, and then sends a confirmation message to the server.
  • After the server receives the confirmation message for the 160-byte data, the send window moves 160 bytes to the right, so the SND.UNA pointer shifts by 160 and points to 601, and the available window Usable increases to 200.
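
The pointer movements in the walk-through above can be reproduced with a toy sender model (the class name is hypothetical; the sequence numbers come from the figure, with the first unsent byte at 241 and a 200-byte window):

```python
class SenderWindow:
    """Toy model of the sender side of the flow-control walk-through:
    SND.UNA and SND.NXT move across a fixed-size window."""
    def __init__(self, isn, wnd):
        self.una = self.nxt = isn   # nothing sent yet
        self.wnd = wnd

    @property
    def usable(self):
        return self.wnd - (self.nxt - self.una)

    def send(self, n):
        assert n <= self.usable, "cannot exceed the receiver's window"
        self.nxt += n

    def ack(self, ack_no):
        self.una = max(self.una, ack_no)

s = SenderWindow(isn=241, wnd=200)
s.send(80)       # server sends 80 bytes, SND.NXT -> 321
print(s.usable)  # → 120
s.send(120)      # another 120 bytes, window exhausted
print(s.usable)  # → 0
s.ack(321)       # the ACK for the first 80 bytes arrives
print(s.usable)  # → 80
```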

1. Relationship between operating system buffer and sliding window

In the previous flow control example, we assumed that the send window and the receive window stay constant. In fact, the bytes held in the send window and receive window live in operating system memory buffers, and those buffers are adjusted by the operating system.

When the application process cannot read the contents of the buffer in time, it will also affect our buffer.

(1) How does the operating system buffer affect the send window and receive window?

Let’s take a look at the first example.

Changes in the send window and receive window when the application does not read the cache in time.

Consider the following scenario:

  • The client acts as the sender and the server acts as the receiver. The initial sizes of the send window and the receive window are 360;
  • The server is very busy. When receiving data from the client, the application layer cannot read the data in time.

According to the flow control in the figure above, each process is described below:

  • After the client sends 140 bytes of data, the available window becomes 220 (360 - 140).
  • The server receives 140 bytes of data, but the server is very busy, and the application process only reads 40 bytes, and there are 100 bytes occupying the buffer, so the receive window shrinks to 260 (360 - 100), and finally sends the confirmation message, passing the window size to the client.
  • After the client receives the confirmation and window notification messages, the sending window is reduced to 260.
  • The client sends 180 bytes of data, and the available window is reduced to 80.
  • The server receives 180 bytes of data, but the application does not read any data, so the 180 bytes remain in the buffer. The receive window shrinks to 80 (260 - 180), and the window size is sent to the client when the confirmation message is sent.
  • After the client receives the confirmation and window notification messages, the sending window is reduced to 80.
  • After the client sends 80 bytes of data, the available window is exhausted.
  • The server receives 80 bytes of data, but the application still does not read any data. These 80 bytes remain in the buffer, so the receive window shrinks to 0 and sends the window size to the client when sending the confirmation message.
  • After the client receives the confirmation and window notification messages, the sending window is reduced to 0.

It can be seen that the final window is shrunk to 0, that is, the window is closed. When the sender's available window becomes 0, the sender will actually send a window detection message at regular intervals to know whether the receiver's window has changed. This content will be discussed later, so I will briefly mention it here.

Let’s look at the second example first.

When the server's system resources are very tight, the operating system may directly reduce the size of the receive buffer. If the application then cannot read the cached data in time, something serious happens: data packets are lost.

Describe each process:

  • The client sends 140 bytes of data, and the available window is reduced to 220.
  • Because the server is very busy, the operating system reduces the receive buffer by 100 bytes. After receiving the confirmation message for 140 data, because the application did not read any data, 140 bytes remained in the buffer, so the receive window size shrunk from 360 to 100. Finally, when sending the confirmation message, the window size is notified to the other party.
  • At this time, the client has not received the notification window message from the server, so it does not know that the receive window has shrunk to 100. The client only sees that its available window is 220, so the client sends 180 bytes of data, and the available window is reduced to 40.
  • When the server receives 180 bytes of data, it finds that the data size exceeds the size of the receive window, so it loses the data packet.
  • When the client receives the confirmation and window notification messages that the server sent in step 2, it tries to reduce the send window to 100, shrinking the right edge of the window 80 bytes to the left. At this point, the available window size becomes a strange negative value.

Therefore, if the cache is reduced first and then the window is shrunk, packet loss will occur.

To prevent this from happening, TCP regulations do not allow reducing the cache and shrinking the window at the same time. Instead, the window is shrunk first, and then the cache is reduced after a while, thus avoiding packet loss.

2. Window closed

As we have seen before, TCP performs flow control by allowing the receiver to indicate the amount of data (window size) it wants to receive from the sender.

If the window size is 0, the sender will be prevented from passing data to the receiver until the window becomes non-zero, which means the window is closed.

(1) Potential dangers of window closing

When the receiver notifies the sender of the window size, it does so through an ACK message.

Then, when the window is closed, after processing the data, the receiver will notify the sender of an ACK message with a window value other than 0. If the ACK message notifying the window is lost in the network, it will be a big trouble.

Window closing potential danger

This will cause the sender to wait for the receiver's non-zero window notification, and the receiver will also wait for the sender's data. If no measures are taken, this mutual waiting process will cause a deadlock.

(2) How does TCP solve the potential deadlock problem when the window is closed?

To solve this problem, TCP sets a persistence timer for each connection. As long as one party of the TCP connection receives a zero window notification from the other party, the persistence timer is started.

If the persistence timer times out, a window probe message will be sent, and when the other party confirms the probe message, it will give its current receive window size.

Window detection

  • If the receive window is still 0, the party that receives this message will restart the persistence timer;
  • If the receive window is not 0, the deadlock situation can be broken.

The number of window probes is generally 3 times, each time about 30-60 seconds (different implementations may be different). If the receive window is still 0 after 3 times, some TCP implementations will send a RST message to terminate the connection.
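
The probe loop can be sketched as follows (`probe_window` is a stand-in for sending a probe segment and reading the advertised window from the peer's reply; the 3-probe limit and the RST behavior are implementation-dependent, as noted above):

```python
import itertools

def zero_window_probe(probe_window, max_probes=3):
    """Persist-timer sketch: keep probing a zero window until the peer
    advertises space again, giving up after `max_probes` attempts."""
    for attempt in itertools.count(1):
        wnd = probe_window()
        if wnd > 0:
            return wnd                    # deadlock broken, resume sending
        if attempt >= max_probes:
            # some implementations reset the connection at this point
            raise ConnectionResetError("receive window still 0 after probes")

replies = iter([0, 0, 120])               # the peer frees space on the third probe
print(zero_window_probe(lambda: next(replies)))  # → 120
```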

3. Silly Window Syndrome

If the receiver is too busy and does not have time to remove the data in the receiving window, the sender's sending window will become smaller and smaller.

In the end, if the receiver frees up a few bytes and tells the sender that there is now a window of a few bytes, the sender will send these bytes without hesitation. This is the silly window syndrome.

You know, our TCP + IP header has 40 bytes. It is not economical to incur such a large overhead just to transmit those few bytes of data.

It's like a bus that can carry 50 people. Every time one or two people come, it will just leave. Only bus drivers who have a mine at home dare to do this, otherwise they will go bankrupt sooner or later. It's not difficult to solve this problem. The bus driver will wait until the number of passengers exceeds 25 before he decides that the bus can leave.

As an example of silly window syndrome, consider the following scenario:

The receiver's window size is 360 bytes, but the receiver is stuck for some reason. Assume that the receiver's application layer has the following reading capabilities:

  • For every 3 bytes the receiver receives, the application can only read 1 byte of data from the buffer;
  • Before the next TCP segment from the sender arrives, the application reads an additional 40 bytes from the buffer;

Silly Window Syndrome

The changes in the window size of each process are clearly described in the figure. It can be found that the window is constantly decreasing and the data sent is relatively small.

So, the phenomenon of silly window syndrome can occur on both the sender and the receiver:

  • The receiver can advertise a small window
  • The sender can send small data

So, to solve silly window syndrome, we just need to solve these two problems:

  • Let the receiver not notify the sender of the small window
  • Let the sender avoid sending small data

(1) How do we prevent the receiver from advertising a small window?

The usual strategy for the receiver is as follows:

When the "window size" is less than min(MSS, cache space/2), that is, less than the smaller of the MSS and half the cache size, the receiver advertises a window of 0 to the sender, which prevents the sender from sending any more data.

After the receiver has processed some data, the window size >= MSS, or half of the receiver's buffer space is available, the window can be opened to allow the sender to send data.
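
The receiver-side rule can be expressed directly (function name hypothetical):

```python
def advertised_window(free_bytes, mss, buf_size):
    """Receiver-side silly-window avoidance as described above:
    advertise 0 until at least min(MSS, buffer/2) bytes are free."""
    if free_bytes < min(mss, buf_size // 2):
        return 0
    return free_bytes

print(advertised_window(free_bytes=100, mss=1460, buf_size=4096))   # → 0
print(advertised_window(free_bytes=2048, mss=1460, buf_size=4096))  # → 2048
```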

(2) How can the sender avoid sending small data?

The sender's usual strategy:

The Nagle algorithm is used. The idea of the algorithm is delayed processing: data can be sent only when one of the following two conditions is met:

  • Wait until the window size >= MSS and the data size >= MSS
  • Receive the ACK reply packet of the previously sent data

As long as one of the above conditions is not met, the sender will continue to store data until the above sending condition is met.
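
Nagle's decision can be sketched as a single predicate (names hypothetical; note that the first condition requires both a full MSS of data and a window that allows at least one MSS):

```python
def nagle_can_send(data_len, mss, window, unacked_data):
    """Nagle's rule: send immediately if a full MSS can go out;
    otherwise send small data only once everything previously sent
    has been acknowledged."""
    if data_len >= mss and window >= mss:
        return True
    return not unacked_data   # small data waits for the outstanding ACK

print(nagle_can_send(data_len=1460, mss=1460, window=6000, unacked_data=True))  # → True
print(nagle_can_send(data_len=1,    mss=1460, window=200,  unacked_data=True))  # → False
```

This is why the algorithm hurts highly interactive programs: a 1-byte keystroke sits in the buffer until the previous keystroke's ACK comes back.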

In addition, the Nagle algorithm is turned on by default. If you are using programs that require small data packets to interact, such as telnet or ssh, which are highly interactive programs, you need to turn off the Nagle algorithm.

You can set the TCP_NODELAY option in the Socket to disable this algorithm (there is no global parameter for disabling the Nagle algorithm, and it needs to be disabled according to the characteristics of each application)

    setsockopt(sock_fd, IPPROTO_TCP, TCP_NODELAY, (char *)&value, sizeof(int));

4. Congestion Control

1. Why do we need congestion control? Isn't there flow control?

The previous flow control is to prevent the "sender"'s data from filling up the "receiver"'s cache, but it does not know what happened in the network.

Generally speaking, computer networks are in a shared environment, so it is possible that the network will be congested due to communications between other hosts.

When the network is congested, if a large number of data packets continue to be sent, it may cause data packet delays and loss. At this time, TCP will retransmit the data, but retransmission will cause a heavier burden on the network, which will lead to greater delays and more packet loss. This situation will enter a vicious cycle and be continuously amplified...

Therefore, TCP cannot ignore what happens on the network. It is designed as a selfless protocol. When the network is congested, TCP will sacrifice itself and reduce the amount of data sent.

So, there is congestion control, the purpose of which is to prevent the "sender"'s data from filling up the entire network.

In order to regulate the amount of data to be sent on the "sender side", a concept called "congestion window" is defined.

2. What is the congestion window? How does it relate to the send window?

The congestion window cwnd is a state variable maintained by the sender, which changes dynamically according to the degree of network congestion.

We mentioned earlier that the sending window swnd and the receiving window rwnd are approximately equal. Now, after the concept of congestion window is introduced, the value of the sending window is swnd = min(cwnd, rwnd), which is the minimum value of the congestion window and the receiving window.
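
In code form the effective send window is simply the smaller of the two limits (window sizes here are in units of MSS for simplicity):

```python
def send_window(cwnd, rwnd):
    """Effective send window: limited by both the network's capacity
    (congestion window) and the receiver's capacity (receive window)."""
    return min(cwnd, rwnd)

# Early in a connection, the congestion window is usually the bottleneck:
print(send_window(cwnd=4, rwnd=10))  # → 4
```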

The rules for changing the congestion window cwnd are as follows:

  • As long as there is no congestion in the network, cwnd will increase;
  • But when congestion occurs in the network, cwnd decreases;

3. So how do you know if the current network is congested?

In fact, as long as the "sender" does not receive the ACK response message within the specified time, that is, a timeout retransmission occurs, it will be considered that the network is congested.

4. What are the control algorithms for congestion control?

Congestion control mainly consists of four algorithms:

  • Slow start
  • Congestion avoidance
  • Congestion occurrence
  • Fast recovery

(1) Slow start

After TCP has just established a connection, it first has a slow start process. This slow start means increasing the number of data packets sent little by little. If a large amount of data is sent right away, wouldn't this cause congestion to the network?

The slow start algorithm only requires one rule to be remembered: every time the sender receives an ACK, the size of the congestion window cwnd increases by 1.

Here it is assumed that the congestion window cwnd and the send window swnd are equal. Here is an example:

  • After the connection is established, cwnd is initialized to 1, indicating that data of an MSS size can be transmitted.
  • When an ACK is received, cwnd is increased by 1, so 2 packets can be sent at a time.
  • After receiving 2 ACKs, cwnd increases by 2, so 2 more packets can be sent than before, so 4 packets can be sent this time.
  • When these 4 ACK confirmations arrive, each acknowledgment increases cwnd by 1, so the 4 acknowledgments increase cwnd by 4 in total. 4 more packets can be sent than before, so 8 can be sent this time.

Slow start algorithm

It can be seen that with the slow start algorithm, the number of packets sent increases exponentially.

So when will the slow start increase end?

There is a state variable called ssthresh (slow start threshold).

  • When cwnd < ssthresh, the slow start algorithm is used.
  • When cwnd >= ssthresh, the "congestion avoidance algorithm" will be used.

(2) Congestion Avoidance Algorithm

As mentioned earlier, when the congestion window cwnd "reaches" the slow start threshold ssthresh, the congestion avoidance algorithm takes over.

Generally speaking, the size of ssthresh is 65535 bytes.

Then after entering the congestion avoidance algorithm, its rule is: every time an ACK is received, cwnd increases by 1/cwnd.

Continuing with the previous slow start example, let's assume that ssthresh is 8:

When 8 ACK acknowledgments arrive, each acknowledgment increases cwnd by 1/8, so the 8 acknowledgments together increase cwnd by 1. Therefore, 9 MSS-sized segments can be sent this time: the growth has become linear.

Congestion Avoidance

Therefore, we can find that the congestion avoidance algorithm changes the exponential growth of the original slow start algorithm into linear growth. It is still in the growth stage, but the growth rate is slower.

As the traffic keeps growing, the network will gradually become congested, causing packet loss. At this time, the lost data packets will need to be retransmitted.

When the retransmission mechanism is triggered, the "congestion occurrence algorithm" is entered.

(3) Congestion occurs

When the network is congested, data packets will be retransmitted. There are two main retransmission mechanisms:

  • Timeout retransmission
  • Fast Retransmit

The congestion occurrence algorithms used in these two cases are different, and we will discuss them separately below.

a. Congestion occurrence algorithm for timeout retransmission

When a "timeout retransmission" occurs, the congestion occurrence algorithm will be used.

At this time, the values of ssthresh and cwnd change as follows:

  • ssthresh is set to cwnd/2;
  • cwnd is reset to 1.

Congested sending - timeout retransmission

Then slow start is restarted, which abruptly reduces the data flow. Once a "timeout retransmission" occurs, it really feels like being set back to square one. However, this reaction is too drastic, and the sudden drop can make the network feel laggy.

It's like drifting at high speed on Akina Mountain, and suddenly you have to brake urgently. Can the tires withstand it? . . .

b. Congestion occurrence algorithm with fast retransmit

There is a better way. We have talked about the "fast retransmit algorithm" before. When the receiver finds that an intermediate packet is lost, it sends the ACK of the previous packet three times, so the sender will retransmit quickly without waiting for a timeout.

TCP considers this situation not serious because most of the packets are not lost, only a small part is lost. Then the changes of ssthresh and cwnd are as follows:

  • cwnd = cwnd/2, which means it is set to half of the original value;
  • ssthresh = cwnd;
  • Enter the fast recovery algorithm

(4) Quick recovery

Fast retransmit and fast recovery are usually used together. The fast recovery algorithm reasons that if the sender can still receive 3 duplicate ACKs, the network is not in terrible shape, so there is no need to react as drastically as to an RTO timeout.

As mentioned before, before entering fast recovery, cwnd and ssthresh have been updated:

  • cwnd = cwnd/2, which means it is set to half of the original value;
  • ssthresh = cwnd;

Then, enter the fast recovery algorithm as follows:

  • Congestion window cwnd = ssthresh + 3 (3 means that 3 packets are confirmed to have been received)
  • Retransmit lost packets
  • If a duplicate ACK is received, cwnd increases by 1.
  • If an ACK for new data is received, cwnd is set to ssthresh, and then the congestion avoidance algorithm is entered.

Fast retransmit and fast recovery

In other words, cwnd does not collapse back to square one overnight as with "timeout retransmission"; it stays at a relatively high value and then grows linearly.
