TCP is a very complex protocol because it has to solve many problems, and those problems bring with them many sub-problems and dark corners. Learning TCP is therefore a painful process, but one that pays off. For the details of the protocol, I still recommend W. Richard Stevens's "TCP/IP Illustrated, Volume 1: The Protocols" (you can of course also read RFC 793 and the many RFCs that followed it). I will use the English terms throughout this article so that you can find the relevant technical documents with those keywords. I want to write this article for three reasons:
Therefore, this article will not cover everything; it only gives a general introduction to the TCP protocol, its algorithms, and its principles.

Without further ado: TCP sits at the fourth layer (the Transport layer) of the seven-layer OSI model, IP at the third layer (the Network layer), and ARP at the second layer (the Data Link layer). The unit of data at the second layer is called a Frame, at the third layer a Packet, and at the fourth layer a Segment.

The data from our program is first encapsulated in a TCP Segment, the TCP Segment is then encapsulated in an IP Packet, and the IP Packet in an Ethernet Frame. After transmission to the other end, each layer parses its own protocol and hands the payload up to the next higher layer.

TCP header format

Next, let's take a look at the format of the TCP header:

[Figure: TCP header format (image source)]

You need to pay attention to the following points:
For the other fields, please refer to the following diagram:

[Figure (image source)]

TCP state machine

In fact, transmission over the network is connectionless, and that includes TCP. The so-called "connection" in TCP is really just "connection state" maintained by the two communicating parties, which makes it look as if a connection exists. This is why TCP's state transitions are so important.

Below are the "TCP protocol state machine" diagram (image source) and a sequence chart covering "TCP connection establishment", "TCP connection teardown", and "data transmission". I put the two pictures side by side so that you can compare them. These two pictures are very, very important; you must remember them. (A complaint: one look at a state machine this complex tells you how complex the protocol is. Complex things always have plenty of pitfalls, so TCP is actually quite tricky.)

Many people ask: why does it take a three-way handshake to establish a connection but a four-way handshake to close one?
[Figure: both ends disconnecting at the same time (image source)]

In addition, there are a few things to note:
Again, using tcp_tw_reuse and tcp_tw_recycle to solve the TIME_WAIT problem is very, very dangerous, because these two parameters violate the TCP protocol (RFC 1122).

In fact, TIME_WAIT means that you were the side that actively closed the connection, so this is a case of "if you don't ask for trouble, you won't get it". If you let the other end close the connection, the problem becomes the other end's, haha. In addition, if your server is an HTTP server, it is very important to enable HTTP KeepAlive (so that the browser reuses one TCP connection for multiple HTTP requests) and then let the client close the connection (be careful, though: browsers can be very greedy and will not actively disconnect unless they have no choice).

Sequence Number in Data Transmission

The following is a screenshot I took in Wireshark of the data transmission when I visited coolshell.cn, to show you how SeqNum changes (use Statistics -> Flow Graph... in the Wireshark menu).

You can see that SeqNum increases by the number of bytes transmitted. In the figure above, after the three-way handshake, two packets with Len: 1440 arrive, and the SeqNum of the second packet is 1441. The ACK that follows is 1441, indicating that the first 1440 bytes have been received.

Note: if you watch the three-way handshake in Wireshark, you will find that SeqNum always starts at 0. That is not the real value. To make the display friendlier, Wireshark shows a Relative SeqNum (relative sequence number). Turn this off in the protocol preferences in the right-click menu to see the "Absolute SeqNum".

TCP retransmission mechanism

TCP must ensure that all data packets arrive, so a retransmission mechanism is necessary.

Note that the Ack sent from the receiver to the sender only confirms the last contiguous packet. For example, the sender sent a total of five data packets, 1, 2, 3, 4, and 5.
The receiver received 1 and 2, so it replied ack 3, and then received 4 (note that 3 had not been received at this point). What will TCP do now? As mentioned earlier, SeqNum and Ack count bytes, so an ack cannot skip ahead: the receiver can only acknowledge the highest contiguous data it has received, otherwise the sender would think that everything before that point had arrived.

Timeout retransmission mechanism

One approach is for the receiver to send no further ack and simply wait for 3. When the sender finds that the timer has expired without an ack covering 3, it retransmits 3. Once the receiver gets 3, it replies ack 5, which means that both 3 and 4 have been received.

However, this approach has a serious problem: while the receiver waits for 3, the sender has no idea what happened to 4 and 5, even if they have already arrived. Because no ack was received for them, the sender may pessimistically assume that they were lost too, which may cause 4 and 5 to be retransmitted as well.

There are two options for this:
These two methods have their pros and cons. The first saves bandwidth but is slow; the second is faster but wastes bandwidth and may retransmit data needlessly. In general, neither is good, because both wait for the timeout, and the timeout may be very long (the next article will explain how TCP dynamically calculates the timeout).

Fast retransmission mechanism

TCP therefore introduced an algorithm called Fast Retransmit, which is driven by data rather than by time. If packets arrive out of order, the receiver keeps acking the packet that may have been lost, and if the sender receives the same ack three times in a row, it retransmits. The advantage of Fast Retransmit is that it does not wait for the timeout before retransmitting.

For example, the sender sends segments 1, 2, 3, 4, and 5. Segment 1 arrives first, so the receiver sends ack 2. Segment 2 is not received for some reason, and 3 arrives, so the receiver sends ack 2 again. Then 4 and 5 arrive, and each time the receiver still sends ack 2, because 2 has not been received. The sender therefore receives three duplicate confirmations of ack=2, knows that 2 has not arrived, and immediately retransmits it. The receiver then gets 2 and, because 3, 4, and 5 have all been received, sends ack 6. The schematic diagram is as follows:

Fast Retransmit only solves one problem, the timeout problem. It still faces a difficult choice: retransmit only the one missing segment, or everything after it as well? In the example above, should the sender retransmit #2 only, or #2, #3, #4, and #5? The sender cannot tell which segments triggered the three duplicate ack=2 replies. If the sender had sent 20 segments, the duplicates might have been triggered by #6, #10, and #20. In that case the sender is likely to retransmit everything from #2 to #20 (this is what some TCP implementations actually do). Obviously, this is a double-edged sword.

SACK Method

A better approach is Selective Acknowledgment (SACK) (see RFC 2018).
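Before going into the details of SACK, the duplicate-ack behaviour in the 1-to-5 example above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: segments are numbered 1, 2, 3, ... instead of being counted in bytes as real TCP does, and `receiver_acks` and `fast_retransmit_target` are hypothetical helper names, not functions from any TCP stack.

```python
def receiver_acks(arrivals):
    """Return the cumulative ack sent after each arriving segment.

    An ack of N means "everything before N has arrived; N is expected next".
    """
    received = set()
    next_expected = 1
    acks = []
    for seg in arrivals:
        received.add(seg)
        # Advance over every segment that is now contiguous.
        while next_expected in received:
            next_expected += 1
        acks.append(next_expected)
    return acks


def fast_retransmit_target(acks, dup_threshold=3):
    """Return the segment to retransmit once dup_threshold duplicate
    acks have been seen, or None if that never happens."""
    last_ack = None
    dup_count = 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == dup_threshold:
                return ack
        else:
            last_ack = ack
            dup_count = 0
    return None


# Segment 2 is lost: 1, 3, 4 and 5 arrive.
acks = receiver_acks([1, 3, 4, 5])
print(acks)                          # [2, 2, 2, 2]
print(fast_retransmit_target(acks))  # 2

# After the retransmitted 2 arrives, the ack jumps to 6.
print(receiver_acks([1, 3, 4, 5, 2])[-1])  # 6
```

Note that the acks alone never tell the sender that 3, 4, and 5 did arrive; that missing information is what SACK supplies.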
SACK requires adding a SACK option to the TCP header. The ACK is still the Fast Retransmit ACK, and the SACK reports the fragments of data that have been received. See the figure below:

In this way, the sender can tell from the returned SACK which data has arrived and which has not, so the Fast Retransmit algorithm is improved. Of course, this requires support on both sides. In Linux, the feature is controlled by the tcp_sack parameter (enabled by default since Linux 2.4).

Another issue that needs attention here is receiver reneging. Reneging means that the receiver has the right to discard data it has already reported to the sender in a SACK. This is discouraged because it complicates matters, but the receiver may do it in extreme cases, for example when it has to give the memory to something more important. Therefore, the sender cannot rely entirely on SACK; it still has to rely on the ACK and maintain the Time-Out. If subsequent ACKs do not advance, the data reported in the SACK still has to be retransmitted. In addition, data that has only been reported by SACK must never be marked as Acked.

Note: SACK consumes the sender's resources. Imagine an attacker sending a flood of SACK options to the data sender: this makes the sender retransmit, or at least traverse, data it has already sent, which consumes a lot of sender resources. For details, please refer to "TCP SACK Performance Tradeoffs".

Duplicate SACK: the problem of receiving duplicate data

Duplicate SACK is also called D-SACK. It uses SACK to tell the sender which data has been received more than once. RFC 2883 has detailed descriptions and examples. Here are a few examples (from RFC 2883).

D-SACK uses the first block of the SACK option as its marker.
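Both SACK and D-SACK reach the sender as a cumulative ACK plus a list of blocks. As a rough sketch of the ordinary SACK case described above, the following Python shows how a sender might work out which byte ranges are still missing. The function name `missing_ranges`, its parameters, and the byte numbers are all hypothetical; blocks are written as (left edge, right edge) with the right edge exclusive, and sequence-number wraparound is ignored for simplicity.

```python
def missing_ranges(ack, sack_blocks, highest_sent):
    """Return the [start, end) byte ranges the receiver has not reported.

    ack          -- cumulative ACK: every byte below it has arrived
    sack_blocks  -- list of (left_edge, right_edge), right edge exclusive
    highest_sent -- one past the last byte the sender has transmitted
    """
    holes = []
    cursor = ack
    for left, right in sorted(sack_blocks):
        if right <= cursor:
            continue  # block is already covered by what we know
        if left > cursor:
            holes.append((cursor, left))  # gap before this block
        cursor = max(cursor, right)
    if cursor < highest_sent:
        holes.append((cursor, highest_sent))  # unreported tail
    return holes


# Hypothetical numbers: bytes 0-2999 sent, ACK=1000,
# receiver reports 1500-1999 and 2500-2999 via SACK.
print(missing_ranges(1000, [(1500, 2000), (2500, 3000)], 3000))
# [(1000, 1500), (2000, 2500)]
```

Because of reneging, a list like this is only a hint about what to retransmit first; the sender would still keep the SACKed data until the cumulative ACK covers it.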
Example 1: ACK packet loss

In the example below, two ACKs are lost, so the sender retransmits the first data packet (3000-3499). The receiver finds that it has received a duplicate and sends back SACK=3000-3500. Because the ACK has already reached 4000, all data before 4000 has been received, so this SACK is a D-SACK: it tells the sender that duplicate data was received, and the sender in turn learns that it was the ACKs, not the data packet, that were lost.
Example 2: Network delay

In the example below, the packet (1000-1499) is delayed by the network, so the sender does not receive an ACK for it. The three packets that arrive after it trigger the Fast Retransmit algorithm, so the packet is retransmitted. While the retransmission is in progress, the delayed packet finally arrives, so a SACK=1000-1500 is sent back. Because the ACK has already reached 3000, this SACK is a D-SACK, indicating that a duplicate packet was received. In this case, the sender knows that the retransmission triggered by Fast Retransmit was caused neither by loss of the data packet nor by loss of the ACK, but by network delay.
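The check the sender performs in both examples can be sketched as follows. This is a minimal illustration based on my reading of the rule in RFC 2883 (the first SACK block is a D-SACK if it starts below the cumulative ACK, or if it is contained in the second SACK block). The function name `is_dsack` is hypothetical, blocks are (left edge, right edge) pairs with the right edge exclusive, and sequence-number wraparound is ignored.

```python
def is_dsack(ack, sack_blocks):
    """Return True if the first SACK block reports duplicate data."""
    if not sack_blocks:
        return False
    first_left, first_right = sack_blocks[0]

    # Case 1: the first block starts below the cumulative ACK, so it
    # describes data that had already been acknowledged.
    if first_left < ack:
        return True

    # Case 2: the first block lies inside the second block, so it
    # describes data that had already been SACKed.
    if len(sack_blocks) > 1:
        second_left, second_right = sack_blocks[1]
        if second_left <= first_left and first_right <= second_right:
            return True

    return False


print(is_dsack(4000, [(3000, 3500)]))  # Example 1 -> True
print(is_dsack(3000, [(1000, 1500)]))  # Example 2 -> True

# Hypothetical numbers, not taken from the examples above:
print(is_dsack(4000, [(5000, 5500), (4500, 6000)]))  # True (case 2)
print(is_dsack(4000, [(4500, 5000)]))                # False, ordinary SACK
```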
As you can see, D-SACK brings the following benefits:

1) It lets the sender know whether the data packet was lost or the returned ACK was lost.
2) It shows whether the timeout is too small and causing unnecessary retransmissions.
3) It reveals cases where packets sent earlier arrive later (also known as reordering).
4) It shows whether data packets are being duplicated by the network.

Knowing these things helps TCP understand the state of the network and therefore do a better job of flow control. In Linux, the tcp_dsack parameter enables this feature (enabled by default since Linux 2.4).
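To see whether these two features are switched on for a given machine, the tcp_sack and tcp_dsack parameters can be read from /proc/sys/net/ipv4 on Linux. The snippet below is a small sketch; `read_tcp_flag` is a hypothetical helper name, and the function returns None when the file cannot be read (for example on a non-Linux system).

```python
from pathlib import Path


def read_tcp_flag(name, base="/proc/sys/net/ipv4"):
    """Return the integer value of a net.ipv4 setting, or None if it
    cannot be read or parsed."""
    try:
        return int((Path(base) / name).read_text().strip())
    except (OSError, ValueError):
        return None


for flag in ("tcp_sack", "tcp_dsack"):
    value = read_tcp_flag(flag)
    if value is None:
        state = "unavailable"
    else:
        state = "enabled" if value else "disabled"
    print(f"{flag}: {state}")
```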