Munich, 2013

Nowadays, many programmers are "Internet programmers". In theory, they should have a firm grasp of the basic protocols of the Internet. Unfortunately, at least in my interviewing experience, many people have skipped too many lessons in this area. Simply asking about TCP/IP protocol layering stumps many candidates. Very few can describe the TCP "three-way handshake", and if you ask why it takes three steps, almost no one can answer. The usual reply is "this is too difficult" or "I graduated too long ago and have forgotten."

If you cram before the interview, you can memorize the three steps of the TCP handshake. But memorizing is not enough to explain why TCP uses three steps rather than two or four. If you don't believe it, search the Internet: you will find all kinds of conflicting answers, and many of the people asking end up more confused than before.

Is knowledge of TCP important? I think it is very important. However much the Internet has changed over the years, TCP itself has been able to carry it all. If you study it carefully, you will find that its design is genuinely clever and full of ideas worth learning from.

So is TCP really that difficult? Why do so many people find it painful to memorize the handshake and struggle to repeat it? I think the reason is that everyone treats it as an "established fact", like memorizing history and politics in middle school. But TCP is not arbitrary. Once you understand the ideas and logic behind the design, it is not hard to understand. So today I will give a simple explanation.

First, a word about how "three-way handshake" is usually translated, which reads literally as "handshake three times". I do think this translation is wrong (having translated and published more than one million words of technical material, I feel entitled to that much confidence).
I used to have trouble remembering the process because I always read the term as "shaking hands three times", and a handshake is something both parties do together, which obviously does not match the way a connection is established. Later I realized that the problem probably lay in the translation. The original English term is "three-way handshake". A more appropriate rendering of "three-way" is probably "three steps", so the whole term means "a handshake that takes three steps to establish". The advantage of this reading is that "step" is more vivid: each side takes one step in turn. In fact, RFC 793 states that the process can also be called a three-message handshake, that is, a handshake established through three messages.

So why does it take three steps to establish the handshake? Let us set this question aside for now and think about what we would do if we designed the handshake mechanism ourselves.

We all know that TCP is a reliable communication protocol. Its "reliability" lies in the fact that after either party sends something to the other (a SYN, for example), it must receive a confirmation (ACK) in return. TCP is also a two-way communication protocol, so both parties can actively send messages.

One thing to clarify here: for many "Internet programmers", TCP is hidden underneath HTTP. The classic HTTP communication pattern that everyone knows is "one question, one answer", with no response unless there is a request. However, this is a feature of HTTP, not of TCP. In TCP, the client and the server can each send data to the other at any time. This is precisely why, after switching to HTTP/2, the server can push information to the client without any change to TCP.
Back to TCP. Since it is a two-way, reliable protocol, establishing a connection means confirming that communication is reliable in both directions. A naive design therefore takes four steps and four messages: the client sends a SYN, the server replies with an ACK, the server sends its own SYN, and the client replies with an ACK.

It would be great if software design were that simple. Unfortunately, nothing is. Looking closely at this four-step scheme, we can see several problems.

First, network communication is expensive, and the delay is often unpredictable. Sending even one message fewer can greatly reduce cost and improve efficiency. So the upper limit for establishing a connection is four steps, the lower limit is two, and the fewer the better.

Second, the two SYN/ACK rounds must be associated with each other. Their functions are relatively independent, since each confirms that communication in one direction is reliable, but they belong to the same logical operation, "establishing a connection". If the two rounds were completely independent, an extremely long interval could separate them; that would not be a normal connection attempt at all, yet the program could not recognize it, which is obviously unacceptable. Therefore, the second SYN/ACK round must be tied to the first.

Looking more closely still, both the second and the third step send a message from the server to the client, so can they be combined? That would save at least one network round.

So do we simply merge the ACK and the SYN into the second step, and the problem is solved? According to the analysis above, reducing the number of messages is only one consideration. The other is that the second SYN/ACK round must be linked to the first.

To see how this is done, look at the TCP segment header, which contains many control bits to identify the status of the connection.
The most common control bits are SYN, ACK, and FIN: SYN stands for synchronize and is used when establishing a connection; ACK stands for acknowledge and confirms that a message has been received; FIN stands for finish and is used when closing the connection.

Two other fields to note are SEQ NO and ACK NO. SEQ NO is the Sequence Number. The server and the client each maintain their own SEQ NO, which indicates how much data has been sent, counted in bytes. ACK NO is the Acknowledgment Number, which is used in replies to confirm that the data corresponding to a given SEQ NO has been received.

Taken separately, these concepts are easy to understand, but be careful not to confuse the control bit ACK with ACK NO: ACK is a Boolean flag that marks the segment as carrying an acknowledgment, while ACK NO is a number that says which data has been received.

Based on this, we know that at the start of connection establishment the SYN control bit in the segment should be set to 1, meaning "new connection", and the segment should also carry a SEQ NO. At this point the SEQ NO has a special name, ISN, for Initial Sequence Number (note that ISN only refers to this particular SEQ NO; there is no separate ISN field).

When the server receives the first SYN, it must respond with an ACK. But how can it be sure that the SEQ NO is the ISN of a new connection rather than a delayed segment from an old one? It has to confirm this with the client. Precisely because the second step is a single response that combines ACK and SYN, the client knows when it receives it that it must both verify the ACK and respond to the SYN. (If you have read RFC 793 carefully, you will know that it has a paragraph devoted to this: "A three way handshake is necessary because...")
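The fields described above, and the first two steps of the handshake, can be modeled in a few lines of Python. This is only an illustrative sketch: the `Segment` class and the helper names `make_syn`, `make_syn_ack`, and `client_accepts` are made up for this example, and a real TCP header carries many more fields.

```python
from dataclasses import dataclass

MOD = 2 ** 32  # sequence numbers are 32-bit and wrap around


@dataclass(frozen=True)
class Segment:
    """A drastically simplified TCP segment (hypothetical model)."""
    syn: bool = False  # control bit: opens a connection
    ack: bool = False  # control bit: ack_no is meaningful
    seq_no: int = 0    # sender's own sequence number
    ack_no: int = 0    # next sequence number the sender expects


def make_syn(client_isn: int) -> Segment:
    # Step 1: the client announces its ISN with SYN set.
    return Segment(syn=True, seq_no=client_isn)


def make_syn_ack(syn: Segment, server_isn: int) -> Segment:
    # Step 2: one segment both acknowledges the client's ISN
    # and announces the server's own ISN.
    return Segment(syn=True, ack=True,
                   seq_no=server_isn,
                   ack_no=(syn.seq_no + 1) % MOD)


def client_accepts(syn_ack: Segment, client_isn: int) -> bool:
    # The client checks that the ACK NO really refers to the ISN it just sent,
    # which is how a stale or unrelated reply gets rejected.
    return (syn_ack.syn and syn_ack.ack
            and syn_ack.ack_no == (client_isn + 1) % MOD)
```

For example, `client_accepts(make_syn_ack(make_syn(1000), 5000), 1000)` evaluates to `True`, while the same reply checked against a different ISN is rejected. Note that the flag `ack` and the number `ack_no` are separate fields, matching the distinction drawn above.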
In the third step, the message returned by the client carries an ACK for the server's SYN (with ACK NO set to the server's ISN + 1), indicating that the server's message has been received, and it sets SEQ NO to the client's own ISN + 1, which confirms the ISN. When the server receives this message, it knows that the client really does want to establish a new connection. At this point, the connection is established.

That is the overall process, and it is not hard to understand. However, if you think about it carefully, there are still many issues to consider.

Take state, for example. Since TCP runs over a network, there will be delays. So there must be a well-defined state for "the message has been sent, but the confirmation has not yet arrived"; otherwise the two sides would lose track of where they are. TCP does exactly this. Behind it is a complete state machine that keeps the state fully determined at every moment and after every action. Everything is under control, with no "isolated points" or "dead ends".

The TCP state transition diagram in RFC 793 includes the states involved in establishing a connection. Interested readers can walk through that part of the diagram on their own. (As an aside, "simulating a walk on the diagram yourself" may seem low-tech, but it is quite common in high-tech fields. When the Boeing 737 was being designed, no one knew at first where to place the engines. Designer Joe Sutter drew the fuselage and the engines on paper, cut out the engines, tried them in various positions on the aircraft, and finally found that hanging them under the wings worked best.)

I have mentioned state diagrams and state transition functions several times in my earlier articles on software design. Whether the problem is the user life cycle or an order workflow, this tool can be used to solve it. Unfortunately, I find that many designers still do not know how to use it or are not in the habit of using it, which is a pity.
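To make the state-machine idea concrete, here is a minimal sketch of the connection-establishment states as a transition table. The state names follow RFC 793; the event names (`active_open`, `recv_syn`, and so on) and the functions `step` and `run` are invented for this illustration, and the table deliberately leaves out simultaneous open, resets, timeouts, and connection teardown.

```python
from enum import Enum, auto


class State(Enum):
    CLOSED = auto()
    LISTEN = auto()
    SYN_SENT = auto()
    SYN_RECEIVED = auto()
    ESTABLISHED = auto()


# (current state, event) -> next state; event names are hypothetical.
TRANSITIONS = {
    (State.CLOSED, "active_open"): State.SYN_SENT,        # client sends SYN
    (State.CLOSED, "passive_open"): State.LISTEN,         # server waits for SYN
    (State.LISTEN, "recv_syn"): State.SYN_RECEIVED,       # server sends SYN+ACK
    (State.SYN_SENT, "recv_syn_ack"): State.ESTABLISHED,  # client sends ACK
    (State.SYN_RECEIVED, "recv_ack"): State.ESTABLISHED,  # handshake complete
}


def step(state: State, event: str) -> State:
    """Apply one event; anything not in the table is rejected explicitly."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(
            f"event {event!r} is not valid in state {state.name}"
        ) from None


def run(events, start: State = State.CLOSED) -> State:
    """Walk a sequence of events from the start state and return the result."""
    state = start
    for event in events:
        state = step(state, event)
    return state
```

Walking the client side, `run(["active_open", "recv_syn_ack"])` ends in `State.ESTABLISHED`; the server side gets there via `run(["passive_open", "recv_syn", "recv_ack"])`. An event that makes no sense in the current state, such as an ACK arriving in `LISTEN`, raises an error instead of silently leaving the state ambiguous, which is the "no dead ends" property in miniature.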
Back to connection establishment: we also need to pay attention to the ISN. When establishing a connection, each side must first determine its ISN, which the client and the server use to align their counts.

Textbooks usually say that the ISN is randomly generated, and that this ensures uniqueness. The purpose of the randomness is indeed uniqueness, but do not assume that "random means never repeated". Simply "taking a random number" collides very easily. The traditional "random" scheme is therefore to maintain a clock-driven 32-bit counter that increments by 1 roughly every 4 microseconds. A full cycle of 2^32 ticks takes several hours (RFC 793 quotes about 4.55 hours), which is far longer than the MSL (Maximum Segment Lifetime), the longest time any segment is assumed to survive in the network, so such an ISN can be treated as unique. But this scheme has its own risk: because these ISNs increase predictably, an attacker may be able to work out the generation pattern and forge an ISN... In short, generating ISNs is an interesting design problem. I will not expand on it here; if you are interested, you can look up the material and read it yourself.

I have met many programmers who, whenever they need to avoid duplicates, reach for "generate a random number" without considering that random numbers can collide. Worse, in situations like the ISN they simply set the initial value to 0, which is really heartbreaking (have you ever thought about why the ISN cannot simply be 0? Feel free to leave a comment to discuss).

Having covered the handshake that establishes a connection, let us look at how a connection is terminated. As everyone says, TCP is "three handshakes, four waves" (I do not agree with counting in "times", but since the phrase has become conventional, I will keep using it here). So why does waving goodbye take four messages? More people can answer this than can explain the "three-way handshake".
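Before getting to the usual answer, a quick aside on the clock-driven ISN mentioned above. The sketch below checks the wrap-around arithmetic, using the roughly 4-microsecond tick and the 2-minute MSL that RFC 793 specifies. The function names are invented for this example, and this is only the classic clock scheme, not how modern stacks choose ISNs.

```python
TICK_MICROSECONDS = 4  # RFC 793: counter advances roughly every 4 microseconds
COUNTER_BITS = 32      # the sequence number field is 32 bits wide
MSL_SECONDS = 120      # RFC 793 takes the MSL to be 2 minutes


def wrap_seconds(tick_us: int = TICK_MICROSECONDS,
                 bits: int = COUNTER_BITS) -> float:
    """Seconds until the clock-driven counter returns to the same value."""
    return (2 ** bits) * tick_us / 1_000_000


def clock_isn(now_us: int,
              tick_us: int = TICK_MICROSECONDS,
              bits: int = COUNTER_BITS) -> int:
    """ISN derived purely from a microsecond clock (hypothetical helper)."""
    return (now_us // tick_us) % (2 ** bits)


if __name__ == "__main__":
    hours = wrap_seconds() / 3600
    print(f"counter wraps after about {hours:.2f} hours")
    print(f"that is {wrap_seconds() / MSL_SECONDS:.0f} times the MSL")
```

With these exact figures the cycle comes out a little longer than the 4.55 hours quoted in RFC 793, but the conclusion is the same: the counter cannot revisit a value while an old segment with that value could still be alive. The sketch also makes the weakness visible, since `clock_isn` depends on nothing but the time and is therefore easy to predict.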
The usual answer is: TCP is a two-way communication protocol. To end the connection, each party must send a termination signal telling the other that it has no more data to send, and then wait for the other party's confirmation, so 2 + 2 = 4 messages are required in total.

If you have followed the connection-establishment process above, you may ask: since we saved a step when establishing the connection by combining the server's SYN and ACK, can we also save a step when ending the connection by combining the server's ACK and FIN?

Congratulations if you thought of this question, because it means you are not satisfied with merely "knowing the fact" but want to "know the reason". However, we should also reason as follows: connection establishment and termination were defined by the same group of people, and if they thought of saving a step when establishing the connection, they had no reason not to save one when terminating it. There must be a reason why nothing was saved.

There is indeed a reason, and it is easy to understand: the two scenarios are different. Before a connection is established, neither the client nor the server has sent the other any data, so when the server returns an ACK together with its SYN, the client knows this is the first segment it has received from the server. When the connection is being terminated, the client sends a FIN to the server, meaning "I will not send any more data", and the server responds with an ACK, which is fine. At that moment, however, the server may not have finished sending its own data to the client and may still be transmitting. If the server combined FIN and ACK, the following would happen: the client has not yet received all the data, but suddenly receives a message from the server saying "no more data, terminate the connection".
Obviously, this situation should not occur, so ACK and FIN cannot be combined, and terminating the connection requires four steps.

Recently, I was chatting with interns about various problems encountered in development and the models behind them. Everyone was fascinated. Afterwards, someone asked me: why don't we run into such interesting problems in our own work? I know this is a typical question, and the answer is just as typical: because you do not dig into the underlying model behind the problem. Once you understand the model, you gain the ability to infer the unknown from the known and the eye to recognize the known within the unknown.

When I talk about development with friends, we share one conclusion: the TCP handshake and the four-way wave look simple, but if today's developers were actually asked to design them, probably more than half could not come up with a process that is stable, reliable, and efficient. As a result, business-level communication in many systems is extremely unreliable and the protocol designs are full of errors and omissions, which is a regrettable outcome.

One more thing. I once interviewed a candidate who had not graduated from a prestigious university but already had five years of work experience. Besides answering questions on popular frameworks and hot topics fluently, he was also at home with database theory, networking fundamentals, data structures, and algorithms. This shows that not everyone throws away what they learned at university once they start working, and it also shows that such candidates really can take on important responsibilities.
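As a closing illustration of the termination argument made earlier, the small sketch below lists the messages exchanged when the client closes first while the server still has data queued. The function `close_sequence` and the message strings are invented for this example; it models only the ordering of messages, not real sockets.

```python
from typing import List


def close_sequence(server_pending: List[str]) -> List[str]:
    """Messages exchanged when the client sends FIN first (toy model)."""
    log = ["client -> server: FIN"]       # "I will not send any more data"
    log.append("server -> client: ACK")   # acknowledged right away
    for chunk in server_pending:
        # The server is still allowed to send; the client must keep reading.
        log.append(f"server -> client: DATA {chunk}")
    log.append("server -> client: FIN")   # only now is the server done
    log.append("client -> server: ACK")   # final confirmation
    return log


if __name__ == "__main__":
    for line in close_sequence(["chunk-1", "chunk-2"]):
        print(line)
```

The point to notice is the gap between the server's ACK and the server's FIN: the remaining data has to go out in between, which is why the two cannot be folded into one message in this scenario.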