First, the mind map of this article: (figure omitted)
TCP, as a transport-layer protocol, is a good measure of a software engineer's grasp of fundamentals and a frequent interview topic. Here I have collected some questions about the TCP core, hoping they help you.
001. Can you explain the difference between TCP and UDP?
First, the basic difference: TCP is a connection-oriented, reliable, byte-stream-based transport-layer protocol, while UDP is a connectionless transport-layer protocol. (Yes, UDP's definition really is that short — it has none of TCP's extra guarantees.) Specifically, compared with UDP, TCP has three core features:
1. Stateful. TCP precisely records which data has been sent, which has been received by the peer, and which has not, and it guarantees that packets arrive in order and error-free.
2. Controllable. When TCP detects packet loss or poor network conditions, it adjusts its behavior accordingly, slowing its sending rate or retransmitting. UDP, by contrast, is stateless and uncontrollable.
3. Byte-stream oriented. UDP transmits data in datagrams, because it simply inherits the characteristics of the IP layer, while TCP, in order to maintain state, turns IP packets into a byte stream.
002: What is the process of the TCP three-way handshake? Why three times instead of two or four?
Dating simulator. Taking dating as an example, the most important thing for two people getting together is to first confirm each other's ability to love and to be loved. Let's use this to simulate the three-way handshake.
First time: Man: I love you. The woman receives it. This proves the man has the ability to love.
Second time: Woman: I received your love, and I love you too. The man receives it. The situation now shows the woman has the ability both to love and to be loved.
Third time: Man: I received your love. The woman receives it. Now it is certain that the man also has the ability to be loved.
This fully confirms both parties' ability to love and be loved, and the two begin a sweet romance.
The real handshake. Of course, the paragraph above is just a joke and does not represent my values; its purpose is to convey the meaning of the whole handshake process, because the two are very similar. In TCP's three-way handshake, what must likewise be confirmed are two capabilities of each side: the ability to send and the ability to receive.
So we get the following three-way handshake process: At the beginning, both parties are in the CLOSED state. The server starts listening on a port and enters the LISTEN state. The client then actively initiates a connection, sends a SYN, and moves itself to the SYN-SENT state. The server receives it, replies with SYN and ACK (acknowledging the client's SYN), and moves itself to SYN-RCVD. The client then sends an ACK to the server and enters the ESTABLISHED state; once the server receives that ACK, it enters ESTABLISHED as well. One more reminder: as the figure shows, a SYN consumes a sequence number, so the corresponding ACK must acknowledge that sequence number plus 1. Why? Just remember one rule: anything that requires acknowledgment from the other end consumes a TCP sequence number. A SYN requires acknowledgment; an ACK does not. Therefore SYN consumes a sequence number while ACK does not.
Why not two times? Root cause: the client's receiving capability cannot be confirmed. The analysis: with two handshakes, suppose you send a SYN, but the packet lingers in the network and does not arrive. TCP assumes it is lost and retransmits, and after two handshakes the connection is established. That seems fine — but after this connection is closed, what if the stranded SYN finally reaches the server? With a two-way handshake, as soon as the server receives it and sends back its reply, a connection is established by default — yet the client has long since disconnected. See the problem? This wastes connection resources on the server.
Why not four times? The purpose of the three-way handshake is to confirm both parties' sending and receiving capabilities. Would a four-way handshake work?
Of course — you could even do it 100 times. But three are enough to solve the problem; any more adds nothing useful.
Can data be carried during the three-way handshake? The third handshake can carry data, but the first two cannot. If the first two could carry data, then anyone wanting to attack the server would only need to stuff a large payload into the SYN of the first handshake; the server would inevitably spend extra time and memory processing it, increasing its exposure to attack. By the third handshake, the client is already in the ESTABLISHED state and has confirmed that the server's sending and receiving work; at that point it is relatively safe to carry data.
What happens if both sides open at the same time? If both parties send SYN packets simultaneously, how do the states change? This is a scenario that can happen. The transitions are as follows: as each side's SYN travels toward the other, both sides, having sent a SYN, are in the SYN-SENT state. When each receives the other's SYN, both move to SYN-RCVD and reply with SYN + ACK. When each receives that reply, both move to ESTABLISHED. That is the state transition for a simultaneous open.
003: Talk about the process of TCP's four waves
Process breakdown. At the beginning, both parties are in the ESTABLISHED state. The client, about to disconnect, sends a FIN to the server (the FIN flag's position in the TCP header is shown below). After sending it, the client moves to the FIN-WAIT-1 state. Note that the client is now also half-closed: it can no longer send data to the server, only receive.
After receiving it, the server acknowledges the client and moves to the CLOSE-WAIT state. On receiving the server's acknowledgment, the client moves to FIN-WAIT-2. Later, when the server has finished sending its remaining data, it sends its own FIN to the client and enters LAST-ACK. When the client receives the server's FIN, it moves to TIME-WAIT and sends a final ACK to the server. Note that the client must now wait long enough — specifically, 2 MSL (Maximum Segment Lifetime). If during this period the client receives no retransmitted FIN from the server, the ACK is taken to have arrived and the waving ends; otherwise the client resends the ACK.
The significance of waiting 2 MSL. What happens without the wait? The client simply runs off. If the server still has packets on their way to the client, and the client's port is meanwhile taken over by a new application, that application will receive useless packets and end up with garbled data. The safest course is therefore to wait until every packet the server sent on this connection has died out in the network before a new application takes the port. So if one MSL is enough for a packet to die, why wait 2 MSL?
Consider the worst case: the client's final ACK is lost. The server then retransmits its FIN. The lost ACK can survive at most 1 MSL, and the retransmitted FIN at most another 1 MSL; so if the client hears nothing within 2 MSL, it can be sure the ACK arrived and that no segment of this connection is still alive in the network. This is the meaning of waiting 2 MSL.
Why four waves instead of three? Because after receiving the client's FIN, the server cannot return its own FIN immediately — it must first finish sending whatever data it still has. It therefore first sends an ACK to confirm the client's FIN, and sends its FIN later. That makes four waves.
What would go wrong with three waves? Three waves would mean the server merges its ACK and FIN into one message. A long delay before that combined message could then lead the client to believe its FIN never reached the server, causing it to retransmit the FIN over and over.
What happens if both close at the same time? If client and server send FIN simultaneously, the states change as shown in the figure.
004: Talk about the relationship between the half-connection queue and the SYN Flood attack
Before the three-way handshake begins, the server's state goes from CLOSED to LISTEN, and two queues are created internally: the half-connection queue and the full-connection queue, i.e. the SYN queue and the ACCEPT queue.
Half-connection queue. When the client sends a SYN, the server replies with SYN + ACK and moves from LISTEN to SYN_RCVD; at this point the connection is pushed into the SYN queue — the half-connection queue.
Full-connection queue. When the client's ACK comes back and the server receives it, the three-way handshake is complete. The connection then waits to be taken by a specific application; until it is taken, it sits in another queue maintained by TCP — the full-connection queue (accept queue).
SYN Flood attack principle. SYN Flood is a typical DoS/DDoS attack. The principle is simple: in a short time, the attacker forges a large number of non-existent IP addresses and floods the server with SYNs. For the server, this has two dangerous consequences:
1. The half-connection queue fills up with bogus entries, so SYNs from legitimate clients are dropped and normal users cannot connect.
2. For every fake SYN, the server keeps retransmitting SYN + ACK to an address that will never answer (on Linux, 5 times by default, with roughly doubling intervals), wasting CPU and memory the whole while.
How to deal with a SYN Flood attack?
1. Increase the SYN queue size (on Linux, net.ipv4.tcp_max_syn_backlog), so more half connections can be held.
2. Reduce the number of SYN + ACK retries (net.ipv4.tcp_synack_retries), so fake half connections are discarded sooner.
3. Enable SYN Cookies (net.ipv4.tcp_syncookies): instead of allocating a half-connection entry when a SYN arrives, the server encodes the connection's information into the initial sequence number of its SYN + ACK — the "cookie" — and only allocates resources when the final ACK returns with a valid cookie. Forged SYNs then cost the server almost nothing.
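One widely used defense is the SYN Cookie. The essence can be sketched in a few lines of Python — a toy model only: real kernels also pack a timestamp counter and an MSS index into the 32-bit value, and `make_syn_cookie` / `check_syn_cookie` are hypothetical helper names, not a real API:

```python
import hashlib

SECRET = b"per-server-secret"  # hypothetical; real kernels use a random key


def make_syn_cookie(src_ip, src_port, dst_ip, dst_port, client_isn, t_slot):
    """Fold the connection's identity into a 32-bit value usable as the
    server's ISN, so no half-connection entry needs to be stored.
    t_slot is a coarse time counter so that old cookies expire."""
    msg = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}:{client_isn}:{t_slot}".encode()
    digest = hashlib.sha256(SECRET + msg).digest()
    return int.from_bytes(digest[:4], "big")


def check_syn_cookie(cookie, src_ip, src_port, dst_ip, dst_port, client_isn, t_slot):
    # When the handshake's final ACK arrives, the server recomputes the
    # cookie from the packet's own fields and compares: a match proves the
    # client really completed the handshake, without any queue entry.
    return cookie == make_syn_cookie(src_ip, src_port, dst_ip, dst_port,
                                     client_isn, t_slot)
```

A forged SYN never produces a matching ACK, so it never makes the server allocate connection state.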
005: Introduce the fields in the TCP message header
The header structure is as follows (in bytes) — please remember this picture!
Source port, destination port. How is a connection uniquely identified? By the TCP four-tuple: source IP, source port, destination IP, and destination port. Why does the TCP header carry no source or destination IP? Because IPs are handled at the IP layer; TCP only needs to record the two ports.
Sequence number. The sequence number of the first byte of this segment. As the figure shows, it is a 4-byte, 32-bit unsigned integer ranging from 0 to 2^32 - 1; on reaching the maximum it wraps back to 0. Sequence numbers serve two purposes in TCP communication:
1. On the sending side, they label every byte of the stream, so the peer can acknowledge exactly how much it has received.
2. On the receiving side, they let the receiver put out-of-order segments back in order and discard duplicates.
ISN. The Initial Sequence Number. During the three-way handshake, the two sides exchange their ISNs in the SYN packets. The ISN is not a fixed value: per RFC 793 it is driven by a clock that ticks roughly every 4 microseconds, wrapping to 0 on overflow, which makes the ISN hard to guess. Why bother? Note that source IP and source port are trivial to forge; if an attacker could also predict the ISN, he could forge a RST and force the connection closed, which is very dangerous. A dynamically growing ISN greatly raises the difficulty of guessing it.
Acknowledgment number. That is, ACK (acknowledgment number). It tells the peer the next sequence number expected to be received; every byte below this number has been received.
Flag bits. Common flags are SYN, ACK, FIN, RST, and PSH. SYN and ACK were covered above; the other three:
FIN: Finish — the sender is ready to close the connection.
RST: Reset — forcibly tear the connection down.
PSH: Push — tells the peer to hand these bytes to the upper-layer application immediately rather than buffer them.
Window size. It occupies two bytes, i.e. 16 bits, for a maximum of 65,535 — often not enough. TCP therefore introduced the window scale option, which carries a shift count for scaling the window. The factor n ranges from 0 to 14 and multiplies the window value by 2^n.
Checksum. Two bytes, guarding against corruption of the packet in transit. A segment whose checksum fails is simply discarded and left to retransmission.
Options. Each option is laid out as kind (type) + length + info. Commonly used options include:
- MSS (kind = 2): the largest segment size the option's sender is willing to receive, announced during the handshake;
- Window Scale (kind = 3): the window scaling factor described above;
- SACK-Permitted / SACK (kind = 4 / 5): negotiates and carries selective acknowledgments;
- Timestamps (kind = 8): covered in question 007.
006: Talk about the principle of TCP Fast Open (TFO)
In the first section we covered the TCP three-way handshake. Some may complain: a three-way handshake on every connection is a hassle — can it be optimized? Of course. Today let's look at the optimized handshake process, i.e. the principle of TCP Fast Open (TFO). Remember the SYN Cookie from the SYN Flood discussion? That cookie is not a browser cookie, and TFO is built on the same idea.
TFO process — the first three-way handshake. First, the client sends a SYN to the server. Note: the server does not reply SYN + ACK right away; it first computes a SYN Cookie, places it in the Fast Open option of its TCP segment, and returns it to the client. The client caches the cookie's value and then completes the three-way handshake normally. That is the first handshake.
The subsequent handshakes are different. In later three-way handshakes, the client sends the previously cached cookie, the SYN, and the HTTP request (yes, you read that right) all together. The server checks the cookie's validity: if invalid, it is simply discarded; if valid, the server returns SYN + ACK normally. Here is the point: the server can now send the HTTP response to the client — the most significant change of all. The three-way handshake is not yet complete, yet after merely validating the cookie the HTTP response can already go out. Of course, the client's final ACK must still be delivered as usual, otherwise it would not be a three-way handshake. Note: the client's final ACK does not have to wait for the server's HTTP response; the two are independent.
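The cookie-caching flow above can be modeled in a few lines of Python. This is a toy simulation, not real networking: `TfoServer` and `tfo_cookie` are hypothetical names, and a real implementation lives in the kernel per RFC 7413:

```python
import hashlib
import hmac

SECRET = b"tfo-server-secret"  # hypothetical per-server key


def tfo_cookie(server_secret: bytes, client_ip: str) -> bytes:
    # RFC 7413: the cookie is essentially a MAC of the client's IP
    # under a server-held secret.
    return hmac.new(server_secret, client_ip.encode(), hashlib.sha256).digest()[:8]


class TfoServer:
    def handle_syn(self, client_ip, cookie=None, data=None):
        if cookie is None:
            # First connection: no cookie yet, so just issue one in the
            # Fast Open option of the SYN+ACK; no early data accepted.
            return {"cookie": tfo_cookie(SECRET, client_ip), "response": None}
        if hmac.compare_digest(cookie, tfo_cookie(SECRET, client_ip)):
            # Valid cookie: accept the data carried on the SYN and
            # respond before the handshake's final ACK even arrives.
            return {"cookie": None, "response": f"echo:{data}"}
        # Invalid cookie: ignore the early data, issue a fresh cookie.
        return {"cookie": tfo_cookie(SECRET, client_ip), "response": None}
```

On the second connection the request rides on the SYN, which is exactly where the 1-RTT saving comes from.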
Advantages of TFO. TFO's advantage lies not in the first handshake but in all subsequent ones: once the cached cookie is presented and verified, the HTTP response can be returned straight away, saving a full round trip (1 RTT) by carrying data ahead of schedule — a considerable gain once it accumulates.
007: Can you explain the role of timestamps in TCP packets?
Timestamp is an optional field in the TCP header, occupying 10 bytes in total, with the following format:
Among them, kind = 8, length = 10, info consists of two parts: timestamp and timestamp echo, each occupying 4 bytes. So what are these fields for? What problems do they solve? Next, let's sort them out one by one. TCP timestamps mainly solve two major problems:
Calculating the round-trip time (RTT). Without timestamps, RTT measurement runs into the problem shown in the figure: when a segment is retransmitted and an ACK eventually arrives, which send do you time from? Timing from the first transmission (left figure) makes the RTT obviously too large when the ACK actually answers the retransmission — the clock should have started at the second send. Timing from the retransmission (right figure) makes the RTT obviously too small when the ACK actually answers the first send — the clock should have started at the first. Either way it is inaccurate, and introducing timestamps solves this cleanly. Say a sends segment s1 to b, and b replies with an ACK segment s2. Then:
- when a sends s1, it writes its current kernel time t1 into the timestamp field (TSval);
- when b replies with s2, it copies t1 into the timestamp echo field (TSecr);
- when s2 arrives at time t2, a computes RTT = t2 - t1, with no ambiguity about which transmission was answered.
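In code, this timestamp-echo measurement looks roughly like the following — a toy model with hypothetical helper names, not a real TCP stack:

```python
def make_segment(payload, tsval):
    # The sender stamps every outgoing segment with its current clock (TSval).
    return {"payload": payload, "tsval": tsval}


def make_ack(segment, receiver_clock):
    # The receiver copies the segment's TSval into the ACK's echo field
    # (TSecr); its own clock goes into the ACK's TSval.
    return {"tsecr": segment["tsval"], "tsval": receiver_clock}


def rtt_from_ack(ack, sender_clock_now):
    # The sender computes RTT as the ACK's arrival time minus the echoed
    # send time -- no need to remember which (re)transmission is answered,
    # because the echo pins the ACK to one concrete send.
    return sender_clock_now - ack["tsecr"]
```

The sender needs no per-segment bookkeeping at all; everything it needs comes back in the echo.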
Preventing sequence-number wraparound. Let's simulate the problem. Sequence numbers actually range from 0 to 2^32 - 1; for ease of demonstration we shrink the range to 0 through 4, so after 4 the number wraps back to 0.
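Using this shrunken space, the ambiguity and its timestamp-based fix can be sketched in Python (a toy model; `wrapped_segment` and `is_old_duplicate` are hypothetical helpers):

```python
SEQ_SPACE = 5  # toy sequence space 0..4 for the demo (real TCP uses 2**32)


def wrapped_segment(raw_seq, tsval):
    # The on-wire sequence number wraps around the (tiny) space.
    return {"seq": raw_seq % SEQ_SPACE, "tsval": tsval}


def is_old_duplicate(segment, newest_tsval_seen):
    # The key idea: a segment stamped earlier than the newest timestamp we
    # have already accepted must be a stale copy, even though its wrapped
    # sequence number looks current.
    return segment["tsval"] < newest_tsval_seen


# Same wrapped sequence number (1 and 6 % 5 == 1), different send times:
stale = wrapped_segment(raw_seq=1, tsval=100)   # stuck in the network early on
fresh = wrapped_segment(raw_seq=6, tsval=900)   # from a later round of sending
```

The sequence numbers alone cannot tell the two apart; the monotonically growing timestamps can.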
Now suppose a packet that got stuck in the network earlier finally arrives during the sixth round of sending: there are now two in-flight packets for each of the sequence numbers 1 and 2 — how do you tell which is which? This is the sequence-number wraparound problem, and timestamps solve it nicely: every outgoing segment records the sending machine's kernel time, so even when two segments carry the same sequence number, their timestamps differ, and the stale one can be told from the fresh one. (This mechanism is known as PAWS — Protection Against Wrapped Sequence numbers.)
008: How is the TCP timeout retransmission time calculated?
TCP has a timeout retransmission mechanism: if a data packet gets no reply within a certain period, it is retransmitted. So how is that interval computed? This retransmission interval is called the retransmission timeout (RTO), and its calculation is closely tied to the RTT of the previous section. Two main methods are introduced here: the classic method and the standard method.
Classic method. The classic method introduces a new quantity — SRTT (smoothed round-trip time). Each time a new RTT sample arrives, SRTT is updated by a fixed rule. Specifically (SRTT starts at 0):
SRTT = α * SRTT + (1 - α) * RTT

Here α is the smoothing factor, with a recommended value of 0.8 and a range of 0.8 to 0.9. With SRTT in hand, the RTO is computed as:
RTO = min(ubound, max(lbound, β * SRTT))

β is a weighting factor, generally 1.3 to 2.0; lbound is the lower bound and ubound the upper bound. The algorithm is simple, but it has a limitation: it performs well when RTT is stable and poorly when RTT varies a lot, because with α between 0.8 and 0.9 a new RTT sample influences the RTO too little.
Standard method. To fix the classic method's insensitivity to RTT changes, the standard method — also known as the Jacobson/Karels algorithm — was introduced later. It has three steps.
Step 1: Compute SRTT with the formula:
SRTT = (1 - α) * SRTT + α * RTT

Note that α here differs from the classic method: its recommended value is 1/8, i.e. 0.125.
Step 2: Compute the intermediate variable RTTVAR (round-trip time variation):
RTTVAR = (1 - β) * RTTVAR + β * |RTT - SRTT|

The recommended β is 0.25. This variable is the highlight of the algorithm: it records how far the latest RTT deviates from the current SRTT, giving us a handle on how strongly RTT is fluctuating.
Step 3: Compute the final RTO:
RTO = µ * SRTT + ∂ * RTTVAR

The recommended µ is 1 and the recommended ∂ is 4. By adding the RTT's recent deviation on top of SRTT, this formula tracks changes in RTT far more closely.
009: Can you talk about TCP flow control?
For sender and receiver, TCP places outgoing data in a send buffer and incoming data in a receive buffer. Flow control means throttling the sender by the size of the receiver's buffer: when the peer's receive buffer is full, nothing more may be sent. To understand flow control you first need the concept of the sliding window.
TCP sliding window. TCP sliding windows come in two kinds: the send window and the receive window.
Send window. The sender's sliding window structure is as follows. It contains four parts:
1. sent and acknowledged;
2. sent but not yet acknowledged;
3. not yet sent, but within the window (the receiver still has room);
4. not yet sent, and outside the window (the receiver has no room).
Some important markers are labeled in the figure. The send window is the range framed in the figure; SND stands for send, WND for window, UNA for unacknowledged, and NXT for next, i.e. the next position to send from.
Receive window. The receiver's window structure is as follows: RCV stands for receive, NXT marks the next position expected, and WND is the receive window size.
Flow-control process. Rather than a complicated example, let's simulate flow control with the simplest possible back-and-forth, to make it easier to follow. First, the two sides shake hands and initialize their respective window sizes, both 200 bytes. Now the sender sends 100 bytes to the receiver; on the sender's side, SND.NXT shifts right by 100 bytes, i.e. the current usable window shrinks by 100 bytes — easy to understand. These 100 bytes reach the receiver and are placed in its buffer queue. Being heavily loaded, however, the receiver can only process 40 bytes, leaving the other 60 in the buffer queue. Note what this signals: the receiver's processing power is insufficient — "please send me less." So its receive window should shrink by 60 bytes, from 200 to 140, since 60 buffered bytes have not yet been taken by the application. The receiver therefore advertises the reduced window of 140 bytes in the header of its ACK, and the sender adjusts its send window to 140 bytes accordingly. At this point, on the sender's side, the sent-and-acknowledged portion grows by 40 bytes — SND.UNA shifts right by 40 — and the send window shrinks to 140 bytes.
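The window accounting in this walkthrough can be sketched as a small class — a toy model of the sender's bookkeeping only, with `SendWindow` a hypothetical name:

```python
class SendWindow:
    """Sender-side accounting with SND.UNA, SND.NXT and the advertised
    window, mirroring the 200/100/40-byte walkthrough above."""

    def __init__(self, advertised_wnd):
        self.snd_una = 0                 # oldest byte sent but not yet ACKed
        self.snd_nxt = 0                 # next byte to send
        self.snd_wnd = advertised_wnd    # window the receiver advertised

    def usable(self):
        # Bytes we may still send without overrunning the peer's buffer.
        return self.snd_wnd - (self.snd_nxt - self.snd_una)

    def send(self, nbytes):
        nbytes = min(nbytes, self.usable())
        self.snd_nxt += nbytes
        return nbytes

    def on_ack(self, ack_no, advertised_wnd):
        # An ACK both slides SND.UNA forward and re-advertises the
        # receiver's (possibly shrunken) window.
        self.snd_una = ack_no
        self.snd_wnd = advertised_wnd
```

Replaying the example: after sending 100 bytes of a 200-byte window, 100 remain usable; when the ACK for 40 bytes arrives advertising 140, the usable window becomes 140 - (100 - 40) = 80 bytes.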
This is the process of flow control; however many rounds follow, the control process and principle stay the same.
010: Can you talk about TCP congestion control?
The flow control of the previous section happens between sender and receiver and takes no account of the network as a whole. If the current network is particularly bad and drops packets readily, the sender must take notice — and that is exactly the problem congestion control addresses. For congestion control, TCP maintains two core states per connection:
1. the congestion window (cwnd);
2. the slow-start threshold (ssthresh).
The algorithms involved are:
1. slow start;
2. congestion avoidance;
3. fast retransmit and fast recovery.
Next, let's break these states and algorithms down one by one, starting with the congestion window.
Congestion window. The congestion window (cwnd) is the amount of data the sender may still inject into the network. We met the receive window (rwnd) earlier — so what is the difference between the two?
The receive window (rwnd) is the receiver's limit, reflecting how much buffer space it still has; the congestion window (cwnd) is the sender's limit, reflecting how much the network can currently bear. And who do they restrict? The size of the send window. Given the two windows, the send window is computed as:
Send window size = min(rwnd, cwnd)
i.e. the smaller of the two. Congestion control is about steering the change of cwnd.
Slow start. When transmission first begins, you have no idea whether the network is stable or congested; charging ahead and blasting out packets can cause massive loss and an avalanche of network disaster. So congestion control opens with a conservative algorithm that slowly feels out the network, called slow start. It works as follows:
1. when the connection is established, initialize cwnd to a small value (classically 1 MSS; modern Linux kernels default to 10 MSS);
2. for every ACK received, increase cwnd by 1 MSS — so cwnd doubles each RTT, growing exponentially.
Will it keep doubling forever? Of course not: its ceiling is the slow-start threshold (ssthresh). When cwnd reaches this threshold it is like stepping on the brakes — easy there, my friend, don't grow so fast! How is cwnd controlled past the threshold? That is the job of congestion avoidance.
Congestion avoidance. Previously cwnd grew by 1 for each ACK received; once the threshold is reached, cwnd may only grow by 1/cwnd per ACK. Do the arithmetic: over one round-trip, cwnd ACKs come back, so the congestion window grows by just 1 in total. In other words, where cwnd used to double per RTT, it now grows by 1 per RTT. Slow start and congestion avoidance work together as one integrated mechanism.
Fast retransmit and fast recovery.
Fast retransmit. During TCP transmission, when a packet is lost — i.e. the receiver notices segments arriving out of order — the receiver re-sends its previous ACK. For example, if the 5th packet is lost, then even as the 6th and 7th arrive, the receiver keeps returning the ACK for the 4th. When the sender sees 3 duplicate ACKs, it realizes the packet is lost and retransmits immediately, without waiting for an RTO to expire. That is fast retransmit, which answers the question of whether to retransmit.
Selective acknowledgment. You may ask: when retransmitting, should only the 5th packet go, or the 5th, 6th, and 7th all together? The 6th and 7th have already arrived, and TCP's designers are not fools — why transmit again what has been delivered? It suffices to record which packets have arrived and which have not, and retransmit only what is missing. After receiving data from the sender, the receiver replies with an ACK segment.
Then, in the options field of the header, a SACK attribute can be added to tell the sender, via left and right edges, which intervals of data have been received. So even with the 5th packet lost, when the 6th and 7th arrive, the receiver still informs the sender that those two are in; the sender, seeing the 5th has not arrived, retransmits just that packet. This mechanism is selective acknowledgment (SACK), which answers the question of what to retransmit.
Fast recovery. When the sender receives three duplicate ACKs, it concludes a packet was lost and the network is somewhat congested, and it enters the fast recovery phase. In this phase the sender changes as follows:
1. ssthresh is set to cwnd / 2;
2. cwnd is set to ssthresh + 3 MSS (the 3 accounts for the three segments that triggered the duplicate ACKs and have safely left the network);
3. the lost segment is retransmitted, and each further duplicate ACK grows cwnd by 1 MSS;
4. when an ACK for new data finally arrives, cwnd is reset to ssthresh and the connection enters congestion avoidance.
These are the classic TCP congestion-control algorithms: slow start, congestion avoidance, fast retransmit, and fast recovery.
011: Can you talk about Nagle's algorithm and delayed acknowledgment?
Nagle's algorithm. Imagine a sender that keeps emitting tiny packets, 1 byte at a time, so that sending 1,000 bytes takes 1,000 sends. Such frequent sending is a problem: beyond the transmission delay itself, sending and acknowledging both take time, and the constant back-and-forth piles up into large delays. Avoiding a stream of tiny packets is exactly what Nagle's algorithm does. Specifically, its rules are:
1. the first packet may be sent immediately, whatever its size;
2. after that, while any previously sent data remains unacknowledged, small fragments are held back and accumulated, and are only sent once the accumulated data reaches the MSS or the outstanding data is acknowledged.
Delayed acknowledgment. Imagine receiving a packet from the sender and then a second one very shortly after. Should each be ACKed separately, or should the receiver wait a moment and merge the two ACKs into one reply? Delayed ACK does the latter: delay briefly, merge the ACKs, then reply to the sender. TCP requires this delay to be under 500 ms, and most operating systems keep it under 200 ms. Note, however, that in some situations the ACK cannot be delayed and must be returned immediately:
1. a segment larger than one frame is received and the window size needs adjusting;
2. the connection is in quick-ACK mode (TCP_QUICKACK);
3. an out-of-order segment is detected.
What happens when the two are used together? The former delays sending, the latter delays acknowledging; combined, they compound into much larger delays and real performance problems.
012. How to understand TCP keep-alive?
Everyone has heard of HTTP keep-alive, but the TCP layer has a keep-alive mechanism too, and it differs somewhat from the application layer's. Imagine one side dropping off because of a network failure or a crash: since TCP is not a polling protocol, the other side has no idea the connection is dead until the next packet is sent. This is where keep-alive comes in — its job is to probe whether the connection's far end is still alive. Under Linux, you can view the relevant configuration like this:
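The relevant sysctls and their usual kernel defaults are shown below (a sketch — exact default values can differ per distribution and kernel version):

```shell
# Keep-alive knobs on Linux, with the commonly seen defaults:
sysctl net.ipv4.tcp_keepalive_time     # = 7200  seconds of idle before the first probe
sysctl net.ipv4.tcp_keepalive_intvl    # = 75    seconds between unanswered probes
sysctl net.ipv4.tcp_keepalive_probes   # = 9     failed probes before the connection is declared dead
```

Note these are system-wide settings; a single application can also override them per socket via setsockopt.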
However, the current situation is that most applications do not enable the TCP keep-alive option by default. Why? From the application perspective:
1. The defaults are far too sluggish: with the standard settings, a dead peer is only noticed after more than two hours of idle time plus a full round of probes.
2. Even a successful probe proves only that the peer's kernel is answering; the application above it may be hung or deadlocked, which TCP keep-alive cannot detect. For both reasons, most applications implement their own heartbeat at the application layer, where interval and semantics are under their control.
This makes TCP keep-alive a rather awkward design.
Finally
This article was first published on the author's blog (https://github.com/sanyuan0704/my_blog). If you find it helpful, I would appreciate a star. Thank you very much!
References:
- "Detailed Explanation of Web Protocols and Practical Packet Capture" - Tao Hui
- "Interesting Talk on Network Protocols" - Liu Chao
- Nuggets booklet "In-depth Understanding of the TCP Protocol: From Principle to Practice"
- The BBR congestion control algorithm paper