TCP Sliding Window Principle Analysis

1. Overview

A few days ago, when I was sharing an article about network programming knowledge, a netizen sent me a private message asking "Can you write an article about the principle of TCP sliding window?"

I didn't respond immediately at the time, but after consulting various sources, I discovered that TCP is extremely complex, like a stream that looks shallow until you step in and find it unfathomably deep.

Although I have summarized some technical knowledge related to network programming before and made some introductions to the TCP protocol stack, the descriptions were generally simple and I did not have a deep understanding. This article has largely filled the gap in my knowledge of computer networks.

Without further ado, let’s get straight to the point!

2. TCP Data Transmission

In the previous article, we learned that the TCP protocol can ensure reliable and error-free data transmission between computers on the network. For example, uploading files, downloading files, browsing web pages, etc. all benefit from it, and the actual application scenarios are very wide.

Alongside TCP, UDP is the other dominant transport protocol. Although UDP transmits more efficiently, it does not guarantee that data arrives correctly, so in terms of reliability it falls short of TCP.

In fact, after years of development, the TCP protocol has become the standard for reliable data transmission. Reliability means ensuring that data reaches its destination accurately and in order, without loss or duplication. How does the TCP protocol achieve this?

In fact, it is not easy to achieve reliable data transmission, because there are many abnormal situations to consider, such as data loss, disordered data order, network congestion, etc. If these problems cannot be solved, there is no way to talk about reliable transmission.

In general, the TCP protocol achieves stable and reliable data transmission through mechanisms such as sequence numbers, confirmation responses, retransmission control, connection management, and window control.

The following is the message format of TCP protocol.

[Figure: TCP segment format]

The TCP segment consists of two parts: the header and the data. The fixed part of the header is 20 bytes, and the optional options field follows it.

The following are the meanings of the various fields in the segment header:

  • Source port number and destination port number: each occupies 2 bytes. Ports are the service interface between the transport layer and the application layer, used to locate the sending and receiving processes. A TCP connection is uniquely identified by the pair of IP addresses and port numbers; in network programming this combination is usually called a socket.
  • Sequence number: Seq Sequence number, occupies 4 bytes. It is used to identify the sequence number of the data byte stream sent from the TCP sender to the TCP receiver. The initiator marks this when sending data.
  • Confirmation number: Ack number, occupies 4 bytes; contains the next sequence number that the receiver expects to receive. The confirmation number field is valid only when the ACK flag bit is 1. It equals the sequence number of the last successfully received data byte plus 1.
  • Data offset: occupies 4 bits and indicates the length of the TCP header in 32-bit words.
  • Reserved field: occupies 6 bits, can be ignored temporarily, and the value is all 0.
  • Six 1-bit flags, with the following meanings:

URG (urgent): When it is 1, it indicates that the urgent pointer field is valid

ACK (acknowledgement): When it is 1, it indicates that the confirmation number field is valid

PSH (Push): When it is 1, the receiver should hand over this message segment to the application layer as soon as possible.

RST (reset): When it is 1, it indicates a serious error in the connection, which must be released and reestablished

SYN (synchronization): used to synchronize sequence numbers when a connection is established

FIN (termination): When it is 1, it indicates that the sender has finished sending data and requires to release the connection.

  • Window: occupies 2 bytes, used for flow control and congestion control, indicating the size of the current receive buffer.
  • Checksum: occupies 2 bytes, including the header and data
  • Urgent pointer: indicates the location of the end of the urgent data in the message segment, used in conjunction with URG
  • Options and padding: optional; the variable-length options field is padded with zeros to a 4-byte boundary.
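As a concrete illustration of the field layout above, here is a minimal Python sketch that unpacks the fixed 20-byte header. Option parsing is omitted, and the sample SYN segment with its port and sequence values is made up for demonstration:

```python
import struct

def parse_tcp_header(segment: bytes) -> dict:
    """Unpack the fixed 20-byte TCP header (options not parsed)."""
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    data_offset = offset_flags >> 12          # header length in 32-bit words
    flags = offset_flags & 0x3F               # low 6 bits: URG ACK PSH RST SYN FIN
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack,
        "header_len": data_offset * 4,        # in bytes; 5 words -> 20 bytes
        "URG": bool(flags & 0x20), "ACK": bool(flags & 0x10),
        "PSH": bool(flags & 0x08), "RST": bool(flags & 0x04),
        "SYN": bool(flags & 0x02), "FIN": bool(flags & 0x01),
        "window": window, "checksum": checksum, "urgent_ptr": urgent,
    }

# A made-up SYN segment: port 12345 -> 80, Seq 1000, offset 5 words, SYN set
sample = struct.pack("!HHIIHHHH", 12345, 80, 1000, 0, (5 << 12) | 0x02, 65535, 0, 0)
hdr = parse_tcp_header(sample)
print(hdr["SYN"], hdr["header_len"], hdr["window"])   # -> True 20 65535
```

Note how the data offset and the six flags share one 16-bit field, matching the layout described above.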

When TCP protocol is used to transmit data between computers, each connection needs to go through three stages: creating a connection, transmitting data, and releasing the connection. That is, before transmitting data, a logical connection is established between the sender and the receiver, then the data is transmitted, and finally the connection is disconnected. It ensures relatively reliable data transmission between the two computers.

2.1. Create a connection

Before two devices are ready to transmit data, TCP will establish a connection. The stage of creating a connection requires a three-way handshake. The process is as follows:

[Figure: three-way handshake]

The detailed process is as follows:

  • First handshake: The client sends a connection request to the server and waits for the server to confirm
  • Second handshake: After receiving the request, the server sends a confirmation back to the client to notify the client that it has received the connection request
  • The third handshake: The client sends a confirmation message to the server again to confirm the connection

After completing the above three handshakes, a reliable connection is established and data transmission can begin.
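The three handshakes above can be sketched as a simple message trace. The initial sequence numbers 100 and 300 are made-up values; a real TCP stack chooses them randomly:

```python
def three_way_handshake(client_isn=100, server_isn=300):
    """Return the handshake as (direction, flags, seq, ack) tuples."""
    return [
        # 1st handshake: client sends SYN with its initial sequence number
        ("client->server", "SYN", client_isn, None),
        # 2nd handshake: server confirms with SYN+ACK, expecting client_isn + 1
        ("server->client", "SYN+ACK", server_isn, client_isn + 1),
        # 3rd handshake: client confirms, expecting server_isn + 1
        ("client->server", "ACK", client_isn + 1, server_isn + 1),
    ]

for step in three_way_handshake():
    print(step)
```

Each side acknowledges the other's sequence number plus one, which is how the sequence numbers are synchronized before any data flows.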

2.2. Release the connection

When the data transmission is completed, TCP releases the connection. Releasing the connection requires four waves. The process is as follows:

[Figure: four-way connection release]

  • First wave: The client sends a request to the server to disconnect and waits for the server to confirm
  • Second wave: After receiving the request, the server sends a confirmation message to the client and agrees to close the request
  • The third wave: The server sends its own disconnect request to the client and waits for the client to confirm
  • Fourth wave: After receiving the request, the client sends a confirmation message to the server and agrees to close the request

After completing the above 4 waves, the connection is released.

2.3 Data transmission process

Through the above introduction, we can depict a simplified version of the TCP data transmission process, as shown in the figure below.

[Figure: simplified TCP data transmission]

The sequence number and confirmation response mechanism is one of the ways TCP achieves reliable data transmission and is also the most important cornerstone.

However, in a complex network environment, data transmission may not be as smooth as described in the above figure. For example, if data packets are lost, TCP uses a retransmission mechanism to solve this problem.

3. Introduction to retransmission mechanism

When the network is unstable, data packet loss is likely to occur. What retransmission methods does TCP use to solve the problem of data packet loss?

Common retransmission methods are as follows:

  • Timeout retransmission
  • Fast Retransmit
  • SACK
  • D-SACK

3.1 Timeout Retransmission

Timeout retransmission, as the name suggests, means setting a timer when sending data. When the specified time expires and no ACK confirmation message is received from the other party, the data will be resent.

TCP will timeout and retransmit in the following two situations:

  • Sent packets are lost
  • Confirmation reply lost

The key issue is how to set the timeout retransmission time.

Let's first take a look at the normal data transmission process.

[Figure: normal transmission and RTT]

RTT (Round-Trip Time) refers to the time from when a data packet is sent until its acknowledgment returns, that is, the round-trip time of a packet across the network.

The timeout retransmission time is represented by RTO (Retransmission Timeout).

What will happen if the timeout retransmission time is set too long? As shown in the figure below

[Figure: RTO set too long]

What will happen if the timeout retransmission time is set too small? As shown in the following figure

[Figure: RTO set too short]

From this analysis, we can draw the following conclusions:

  • When the timeout retransmission time RTO is set to a large value, data transmission efficiency will be poor. For example, after a data packet is lost, it takes a long time to retransmit, resulting in poor performance.
  • When the timeout retransmission time RTO is set to a small value, packets may be retransmitted even if they are not lost. Multiple retransmissions will cause network congestion and lead to more timeouts. More timeouts mean more retransmissions.

Therefore, we can conclude that the timeout retransmission time can be set neither too large nor too small; it must be calculated dynamically from measured round-trip times.

Taking the Linux operating system as an example, the RTO calculation process is as follows!

  • First, the round-trip time required for TCP data transmission, that is, the RTT value, is sampled, and then a weighted average is performed to calculate a smooth RTT value. At the same time, this value will continue to change with the network status.
  • In addition to sampling the RTT value, it is also necessary to track how much RTT fluctuates, so that sudden swings in RTT are not missed.

SRTT = SRTT + α × (RTT − SRTT)

DevRTT = (1 − β) × DevRTT + β × |RTT − SRTT|

RTO = μ × SRTT + ∂ × DevRTT

Where SRTT is the smoothed RTT, and DevRTT is the smoothed mean deviation of the RTT samples from SRTT. Under Linux, α = 0.125, β = 0.25, μ = 1, and ∂ = 4 are typically used.

The actual calculated retransmission timeout RTO value should be slightly larger than the round-trip RTT value of the message.

If the data that has been retransmitted times out again and needs to be retransmitted, TCP's strategy is to double the timeout interval.

That is to say, every time a timeout retransmission occurs, the next timeout interval will be set to twice the previous value. Multiple timeouts indicate a poor network environment and frequent retransmissions are not recommended.
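The RTT smoothing, the RTO formula, and the doubling-on-timeout strategy described above can be sketched together. This is a simplified model using the Linux constants (α = 0.125, β = 0.25, μ = 1, ∂ = 4); the initial DevRTT of RTT/2 follows common convention, and the sample RTT values are made up:

```python
class RtoEstimator:
    """Sketch of the smoothed-RTT / RTO calculation described above."""

    def __init__(self, first_rtt):
        self.srtt = first_rtt          # smoothed RTT starts at the first sample
        self.devrtt = first_rtt / 2    # conventional initial deviation
        self.backoff = 1               # doubled on every timeout

    def on_rtt_sample(self, rtt):
        # DevRTT = (1 - beta) * DevRTT + beta * |RTT - SRTT|, beta = 0.25
        self.devrtt = 0.75 * self.devrtt + 0.25 * abs(rtt - self.srtt)
        # SRTT = SRTT + alpha * (RTT - SRTT), alpha = 0.125
        self.srtt = 0.875 * self.srtt + 0.125 * rtt
        self.backoff = 1               # a fresh sample resets the backoff

    def on_timeout(self):
        self.backoff *= 2              # exponential backoff after each timeout

    @property
    def rto(self):
        # RTO = mu * SRTT + delta * DevRTT, mu = 1, delta = 4
        return (self.srtt + 4 * self.devrtt) * self.backoff

est = RtoEstimator(first_rtt=100.0)    # milliseconds
est.on_rtt_sample(120.0)
print(round(est.rto, 1))               # -> 272.5
```

After a timeout, `on_timeout()` doubles the interval, so the next RTO would be 545.0 ms, matching the doubling strategy described above.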

3.2 Fast Retransmit

Although timeout retransmission can solve the problem of data packet loss, the timeout retransmission time may sometimes be long. Is there a faster retransmission method?

Fast retransmission is used to make up for the problem of too long time in the timeout retransmission mechanism.

Simply put, fast retransmit is not driven by time, as timeout retransmission is, but by the number of duplicate ACKs received.

When the number of duplicate ACKs received for a message reaches a certain threshold (usually 3), TCP will check for lost segments and retransmit them before the timer expires.

The general working method can be described as follows!

[Figure: fast retransmit triggered by duplicate ACKs]

In the figure above, the sender sends five segments, Seq1 through Seq5, to the receiver. The process is roughly as follows:

  • The first packet, Seq1, is sent first, and the receiver responds with Ack 2, indicating that Seq 1 has been received and is ready to receive the next packet with sequence number 2.
  • Seq2 is lost in transit, and Seq3 arrives next. Because Seq2 is missing, the receiver still returns Ack 2.
  • Seq4 and Seq5 then arrive, but because Seq2 has still not been received, the receiver keeps returning Ack 2.
  • The sender receives three ACKs with Ack = 2 and knows that Seq2 has not been received. It will retransmit the lost Seq2 before the timer expires.
  • Finally, the receiver receives Seq2. At this time, because Seq3, Seq4, and Seq5 have all been received, Ack returns 6.

Therefore, the working method of fast retransmission is that when the number of identical ACK messages received reaches a threshold, which is 3 by default, the lost segments will be retransmitted before the timer expires.

The fast retransmit mechanism makes up for the problem of too long time in the timeout retransmission mechanism, but it still faces another problem, that is, when retransmitting, should it retransmit the previous one or retransmit all packets?

For example, in the above example, should Seq2 be retransmitted, or should Seq2, Seq3, Seq4, and Seq5 be retransmitted?

Depending on the TCP implementation, both of the above situations are possible.
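The duplicate-ACK rule above can be sketched as follows. This is a simplified model: `fast_retransmit_trigger` is a hypothetical helper that scans an ACK trace and reports the sequence number to retransmit once three duplicates have arrived:

```python
def fast_retransmit_trigger(acks, threshold=3):
    """Return the sequence number to retransmit once `threshold`
    duplicate ACKs arrive, or None if fast retransmit never fires."""
    dup_count = 0
    last_ack = None
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == threshold:
                return ack      # the segment the peer keeps asking for
        else:
            last_ack, dup_count = ack, 0
    return None

# The trace from the figure: Seq2 is lost, so Seq3/4/5 each elicit Ack 2
print(fast_retransmit_trigger([2, 2, 2, 2]))   # -> 2
```

With the trace from the figure, the third duplicate of Ack 2 triggers retransmission of Seq2 before the retransmission timer expires.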

3.3 SACK Method

To solve the problem of not knowing which TCP packets to retransmit, engineers devised the SACK method, short for Selective Acknowledgment.

The specific implementation adds a SACK field to the TCP header options. The receiver uses it to send the sender a map of the data ranges it has cached, so the sender knows which data has arrived and which has not, and can retransmit only the lost data.

As shown in the figure below, when the sender receives the same ACK confirmation message three times, the fast retransmission mechanism will be triggered. Through the SACK information, it is found that only the data segment 200~299 is lost, and the lost segment will be retransmitted to improve the reliability and efficiency of data transmission.

[Figure: SACK-based retransmission]

It is important to note that if you want to support the SACK mechanism, both the sender and the receiver must support it. In the Linux operating system, developers can enable this feature through the net.ipv4.tcp_sack parameter (enabled by default after Linux 2.4).
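To illustrate how a sender could use SACK information, here is a hedged sketch; the helper `missing_ranges` and its byte-range representation are illustrative, not a real kernel API:

```python
def missing_ranges(cum_ack, sack_blocks, send_next):
    """Given the cumulative ACK, the receiver's SACK blocks as
    (left, right) byte ranges, and the highest byte sent, return
    the ranges the sender still needs to retransmit."""
    holes = []
    expected = cum_ack
    for left, right in sorted(sack_blocks):
        if left > expected:
            holes.append((expected, left))   # bytes [expected, left) never SACKed
        expected = max(expected, right)
    if expected < send_next:
        holes.append((expected, send_next))  # tail beyond the last SACK block
    return holes

# Figure scenario: bytes up to 200 cumulatively ACKed, 300~599 SACKed,
# and 600 is the next new byte; only 200~299 needs retransmission
print(missing_ranges(200, [(300, 600)], 600))   # -> [(200, 300)]
```

This mirrors the figure: the SACK block tells the sender that only the 200~299 segment is missing, so nothing else is retransmitted.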

3.4 Duplicate SACK Method

Finally, let’s talk about the Duplicate SACK method, also known as D-SACK. This method mainly uses SACK and ACK to tell the sender which data has been received repeatedly to prevent TCP from repeatedly retransmitting.

We use a case to introduce the role of D-SACK, such as the scenario of ACK packet loss, as shown in the figure below!

[Figure: D-SACK after lost ACKs]

Process analysis:

  • The sender successfully sent two data packets to the receiver, but the two ACK confirmation responses sent by the receiver to the sender were lost. After the sender checked the timeout, it retransmitted the first data packet (100 ~ 199)
  • The receiver finds that the data was received twice, so it replies with Ack 300 plus SACK 100~199, telling the sender that all data up to 299 has already arrived. Because the SACK range 100~199 lies below the cumulative Ack of 300, this SACK is a D-SACK.
  • When the sender knows that the data is not lost, but the receiver's ACK confirmation message is lost, it will not continue to resend the data packet.

The benefit of using the D-SACK method is that it allows the sender to know whether the sent packet is lost or the ACK packet responded by the receiver is lost, and then decide whether to continue to resend the packet.

In Linux operating system, you can use the net.ipv4.tcp_dsack parameter to enable/disable this feature (enabled by default after Linux 2.4).
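The D-SACK rule above (a SACK block lying below the cumulative ACK signals a duplicate) can be sketched like this; the helper and its return strings are purely illustrative:

```python
def classify_dsack(cum_ack, sack_blocks):
    """If the first SACK block lies entirely below the cumulative ACK,
    it is a D-SACK: the receiver reports a duplicate, so the original
    data arrived and only the ACK was lost."""
    if sack_blocks:
        left, right = sack_blocks[0]
        if right <= cum_ack:
            return "duplicate received: ACK was lost, do not resend"
    return "no duplicate reported"

# Figure scenario: retransmitted 100~199 arrives a second time; the
# receiver answers Ack 300 with SACK 100~199, below the cumulative ACK
print(classify_dsack(300, [(100, 200)]))
```

Seeing the D-SACK, the sender knows its data was not lost and stops retransmitting, which is exactly the benefit described above.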

4. Introduction to Sliding Window

In the sections above, we introduced TCP's data transmission mechanism: after a connection is established between two computers, data can be transmitted. Originally, TCP required a confirmation response for every packet sent; only after the previous packet was acknowledged could the next one be sent, which guarantees reliable transmission.

[Figure: send one packet, wait for its ACK]

Although this transmission method is reliable, it has an obvious disadvantage: data transmission efficiency is very low. It is like a phone call in which you say one sentence and may only say the next after the other party replies. This is obviously inefficient.

To solve this problem, TCP introduced the sliding window, which allows multiple packets within the window to be sent at once without waiting for each confirmation in turn, so even a long round-trip time does not throttle transmission efficiency.

So what is a sliding window? Let's take a toll booth on a highway as an example to make an analogy.

Anyone who has been on a highway should know that there is an entrance toll booth and an exit toll booth on the highway. TCP is the same, except that there is a sender sliding window at the entrance and a receiver sliding window at the exit.

[Figure: toll-booth analogy for sliding windows]

For the sender sliding window, we can regard the data packets as vehicles and classify their states:

  • Vehicles that have not yet entered the entrance toll station: This corresponds to the Not Sent, Recipient Not Ready to Receive section in the figure above. These are data that the sender has not sent and the receiver is not ready to receive.
  • Vehicles that have entered the toll booth but not the highway: This corresponds to the Not Sent, Recipient Ready to Receive section in the figure above. These are data that have not been sent by the sender but have been notified to the receiver. In fact, they are already in the window (sender cache) and are waiting to be sent.
  • Vehicles traveling on the highway: These correspond to the Send But Not Yet Acknowledged part in the above figure. These are data that have been sent by the sender and are waiting to be received by the receiver. They are data within the window.
  • Vehicles arriving at the exit toll station: This corresponds to the Sent and Acknowledged part in the above figure. These are data that have been successfully sent and accepted, and these data have left the window.

Similarly, for the receiver sliding window, we can also regard the data packets as vehicles and classify their states:

  • Vehicles that have not yet arrived at the exit toll station: The status is Not Received, indicating that no data has been received.
  • Vehicles that arrive at the exit toll booth but have not completed payment: The status is Received Not ACK, which means that the message has been accepted but no ACK has been sent.
  • Vehicles that have paid the fee and left the exit toll booth: The status is Received and ACK, indicating that the message has been accepted and ACK has been sent.

Through the above description, I believe everyone has a preliminary understanding of the sliding window. In the entire data transmission process, the transmission link is like the highway and the sliding window is like the toll station: the toll station applies appropriate flow control to vehicles to prevent congestion on the highway, and the sliding window has the same effect.

4.1. Sender’s Sliding Window

The figure below is an example of the sliding window of the sender, which is divided into four parts according to the processing situation. The dark blue box is the sending window and the purple box is the available window.

[Figure: sender's sliding window, four parts]

Meaning:

  • #1 indicates the data that has been sent and received ACK confirmation: 1~31 bytes
  • #2 indicates data that has been sent but no ACK confirmation has been received: 32~45 bytes
  • #3 indicates data not yet sent but within the receiver's processing range: 46~51 bytes
  • #4 indicates data not yet sent and beyond the receiver's processing range: byte 52 onward

When the sender has sent all the data in the available window at once, the available window size becomes 0, meaning it is exhausted and no more data can be sent until an ACK confirmation arrives from the receiver.

[Figure: available window exhausted]

After the sender receives the ACK confirmation for the previously sent bytes 32~36, if the send-window size has not changed, the sliding window moves 5 bytes to the right, because those 5 bytes have been acknowledged. Bytes 52~56 then enter the available window, so those 5 bytes can be sent next.

[Figure: window slides right after bytes 32~36 are acknowledged]

How does the program accurately control the sender's window data?

The TCP sliding window scheme uses three pointers to track bytes in each of the four transmission categories. Two of the pointers are absolute pointers (referring to specific sequence numbers) and one is a relative pointer (requires an offset).

[Figure: sender window pointers SND.UNA, SND.NXT, SND.WND]

Meaning:

  • SND.WND: indicates the size of the send window (the size is specified by the receiver)
  • SND.UNA: is an absolute pointer that points to the sequence number of the first byte that has been sent but not confirmed, that is, the first byte of #2
  • SND.NXT: It is also an absolute pointer, which points to the sequence number of the first byte of the unsent but sendable range, that is, the first byte of #3
  • Available window size: is a relative pointer, calculated by the formula SND.WND - (SND.NXT - SND.UNA)
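The pointer arithmetic above can be checked with a short sketch using the byte values from the figure (1~31 acknowledged, 32~45 in flight, window size 20):

```python
def usable_window(snd_una, snd_nxt, snd_wnd):
    """Available window = SND.WND - (SND.NXT - SND.UNA), the formula above."""
    return snd_wnd - (snd_nxt - snd_una)

# Bytes 1~31 are ACKed, so SND.UNA = 32; bytes 32~45 are in flight,
# so SND.NXT = 46; with a 20-byte window, 6 bytes (46~51) may still be sent
print(usable_window(snd_una=32, snd_nxt=46, snd_wnd=20))   # -> 6
```

The 6-byte result matches part #3 of the figure, bytes 46~51.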

4.2. Receiver's Sliding Window

Next, let's look at the receiving side's sliding window. The receiving window is relatively simple and is divided into three parts based on the processing situation.

[Figure: receiver's sliding window]

Meaning:

  • #1 and #2 indicate data that has been successfully received and confirmed, waiting to be read by the application process
  • #3 indicates data not received but data that can be received
  • #4 indicates that no data was received and data cannot be received

The three receiving parts are divided using two pointers:

  • RCV.WND: indicates the size of the receive window, which is notified to the sender
  • RCV.NXT: is an absolute pointer that points to the sequence number of the next data byte expected from the sender, that is, the first byte of #3
  • The upper bound of acceptable data: a relative value calculated as RCV.NXT + RCV.WND, which points to the first byte of #4
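The receiver-side pointers can be sketched the same way; the helper names and the byte values are illustrative:

```python
def receiver_window(rcv_nxt, rcv_wnd):
    """The receiver accepts bytes in [RCV.NXT, RCV.NXT + RCV.WND)."""
    return rcv_nxt, rcv_nxt + rcv_wnd

def acceptable(seq, rcv_nxt, rcv_wnd):
    """True if a byte with this sequence number falls inside the window."""
    lo, hi = receiver_window(rcv_nxt, rcv_wnd)
    return lo <= seq < hi

# Hypothetical values: next expected byte 32, window 20 -> accepts 32..51
print(acceptable(46, rcv_nxt=32, rcv_wnd=20))   # -> True
print(acceptable(52, rcv_nxt=32, rcv_wnd=20))   # -> False
```

Byte 52 falls in part #4 (beyond RCV.NXT + RCV.WND), so the receiver cannot accept it until the window slides forward.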

5. Summary

Compared with the traditional data transmission model of sending a packet, waiting for confirmation and then sending the packet again, the sliding window transmission method of sending batches of packets at one time and then waiting for confirmation can significantly improve data transmission efficiency. The entire transmission process can be described by the following figure.

[Figure: batched sends with cumulative acknowledgment]

Even if the ACK 600 confirmation message in the above figure is lost, it will not affect data transmission, because it can be confirmed by the next confirmation response. As long as the sender receives the ACK 700 confirmation response, it means that the receiver has received all the data before 700. This confirmation response mode is called cumulative confirmation or cumulative response.
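Cumulative acknowledgment means only the largest ACK received matters; a tiny sketch:

```python
def highest_acked(acks):
    """With cumulative acknowledgment, ACK n confirms every byte before n,
    so a lost ACK 600 is covered by a later ACK 700."""
    confirmed = 0
    for ack in acks:
        confirmed = max(confirmed, ack)   # each ACK confirms all earlier bytes
    return confirmed

# ACK 600 was lost in transit, but ACK 700 still confirms all bytes < 700
print(highest_acked([500, 700]))   # -> 700
```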

In the above, we mentioned that the sliding window has a very important parameter, which is the window size.

Usually, the window size is determined by the receiver. The receiver tells the sender how much buffer space it has available for receiving data. This prevents the sender from sending more data than the receiver can process, which would trigger the sender's retransmission mechanism and waste network traffic unnecessarily.

By controlling the window size, we can prevent the sender's data from exceeding the receiver's available window, which is often called flow control.

In addition, computer networks are in a shared environment, and network congestion is inevitable. When network congestion occurs, the means of flow control are very limited.

If the network is congested, the sender continues to send a large number of data packets, which may cause data packet delays and loss. At this time, TCP will retransmit the data, which will cause a heavier burden on the network, resulting in greater delays and more packet loss, which may enter a vicious cycle.

Therefore, TCP cannot ignore what happens on the network. When the network is congested, TCP needs to reduce the amount of data sent to prevent the sender's data from filling the entire network. We call this behavior congestion control.

Regarding the implementation of flow control and congestion control, since the article is too long, we will explain it in detail in the next article.

This article organizes the knowledge shared by some excellent netizens. Special thanks to the author Xiaolin Coding for sharing the illustrated TCP sliding window article, which provided great knowledge help. At the same time, combined with my own understanding, I discussed the principle of TCP sliding window in a more comprehensive way. I hope it will be helpful to everyone.
