Today we continue our discussion of TCP anomalies. The previous article analyzed the anomalies that can occur while a connection is being established. This article focuses on the anomalies that can occur during data transmission, and on what happens to the TCP connection afterwards.

To make this article easier to follow, the first half of the previous article, which reviews the protocol itself, is repeated here.

The following figure shows a common architecture in network communication, known as the client/server (CS) architecture. The program consists of two parts, the client and the server. Real environments are much more complicated: there may be many devices of different types between the client and the server. These devices make network communication more complex and, in turn, make fault tolerance harder to build into programs.

Figure 1 Basic architecture

1. Basic process of TCP

Before analyzing the abnormal situations, let's recall the basic logic of the TCP protocol. Before the client and server can exchange data, a connection must be established. At the protocol level the connection is set up by exchanging packets; at the user level the client simply calls the connect function. The connection process is commonly known as the "three-way handshake", shown in Figure 2.

Figure 2 TCP three-way handshake process

Closing a TCP connection is also involved and requires the so-called "four-way wave". The reason is that TCP is full-duplex, so each direction of the connection has to be closed separately, once by the client and once by the server.

Figure 3 TCP's four waves

Another important topic is the state transition of the TCP protocol.
Only by understanding the state transitions can we make sense of the packets seen in the various abnormal situations.

Figure 4 TCP state transition diagram

This is only a brief review of the basic TCP process. For details, see the previous article of this account, "From TCP to Socket, a thorough understanding of network programming".

2. Analysis of abnormal situations

The analysis in this article assumes that the connection has been established and data is being sent and received. In this state, various exceptions may occur, such as the server host going down, the process crashing, or the process being killed. Below we look at how each of these situations shows up in TCP communication.

1. Service process crashes

A crash of the service process is probably the most common situation in a production environment. How does the client react in this case? Can it detect the crash?

We write a client program and a server program. The client keeps sending data and the server receives it. Simulating the exception is simple: we trigger an invalid pointer access on the server, which crashes the server program, and then observe the client.

The results are shown in the figure below: the client's connection has been reset. The packets captured by Wireshark at this moment confirm that the server sent a RST packet.

Let's recall when a server sends a RST. This scenario is similar to the case where no server is listening. Because the server program has crashed, the socket data structure in the operating system has been released. When the protocol layer then receives a data packet, it cannot find a matching socket to handle it, so it replies with a RST.

2. Manually kill the server application

This is also a common operation in production.
When a new version of a module is released, operations staff typically kill the old process first and then start the new one. What happens to the TCP connection during this process? Will it be reset with a RST as in the previous case?

Again, let's look at the results first. The following shows the situation on the client. The error code indicates a broken pipe, which means the connection has been closed.

Now let's look at the Wireshark capture. The server sends a FIN, which means the server has initiated the close. The next packet is the client's ACK of that FIN. From the client error code and the capture together, we can conclude that TCP is aware of the process being killed and sends a FIN.

Let's think about it further. Why does killing the process produce a FIN? How is this different from the previous crash? The kill command asks the kernel to deliver SIGTERM (or SIGKILL) to the process. When the process terminates, the kernel performs the corresponding cleanup, including closing its sockets, which is why the server sends a FIN.

3. The host where the Server process is located is shut down

A host shutdown (here meaning a manual shutdown) is similar to the process being killed. During shutdown, the init process sends SIGTERM to all processes, waits for a period of time (5 to 20 seconds), and then sends SIGKILL to any process that is still running. When the server process dies, all of its file descriptors are closed, so the effect is the same as killing the server process above.

4. The host where the Server process is located is down

This is another common situation in production. Although a crash of any single host is a low-probability event, among thousands of servers it is common for one or two to be down. There are two types of crash here: kernel panic and power outage.
Unlike a shutdown, a kernel panic does not terminate the server process in advance; it happens suddenly. Suppose the client then sends a request to the server: write hands the data to the kernel and TCP sends it as a segment, but because the host has crashed, no ACK comes back. The client's TCP keeps retransmitting the segment, hoping for an ACK, but the server cannot respond. After retransmitting for several minutes without success, TCP gives up and returns an ETIMEDOUT error.

In this case, if we call the synchronous sending interface, the call blocks once the send buffer is full, and the program blocks with it. This wait is long, and for some applications such a stall is unacceptable. It can be bounded by setting the SO_SNDTIMEO option on the socket. There is a trade-off, however: once the option is set, a send may time out, after which the client may retry and deliver duplicate data to the server. The server then needs to deduplicate.

5. The host where the server process is located crashes and then restarts

Before the client sends a request, the server host crashes and restarts. When the client's TCP sends a segment to the server host, the server's TCP has lost all connection information from before the crash. In other words, it receives a segment for a connection that does not exist (the socket data structure cannot be found, as mentioned earlier), so it responds with a RST.

This concludes the introduction to abnormal situations in TCP. A detailed understanding of them will help greatly when analyzing and solving problems in production. There may be other abnormal situations not covered in this article; you are welcome to leave a message to discuss them.
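As a supplement to the SO_SNDTIMEO discussion in section 4, here is a minimal sketch of how a client might bound a blocking send and tell the failure cases apart by error code. It is an illustration under stated assumptions, not the code used for the experiments in this article: the helper names (set_send_timeout, classify_send_error, send_with_timeout) are made up for this example, and packing the option value as two native longs assumes Linux, where the value is a struct timeval.

```python
import errno
import socket
import struct


def set_send_timeout(sock, seconds):
    """Ask the kernel to fail a blocked send after `seconds` (SO_SNDTIMEO).

    Assumption: Linux, where the option value is a struct timeval
    laid out as two native longs (seconds, microseconds).
    """
    whole = int(seconds)
    micros = int((seconds - whole) * 1_000_000)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDTIMEO,
                    struct.pack("ll", whole, micros))


def classify_send_error(exc):
    """Map a send-side errno to the failure cases discussed in this article."""
    if exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
        # SO_SNDTIMEO expired while the send buffer was still full.
        return "send timed out"
    if exc.errno == errno.ECONNRESET:
        # The peer replied with RST (process crash, or host restarted).
        return "connection reset by peer"
    if exc.errno == errno.EPIPE:
        # We wrote to a connection that had already been closed.
        return "broken pipe"
    if exc.errno == errno.ETIMEDOUT:
        # TCP gave up retransmitting (host down, no ACK ever arrived).
        return "retransmission timed out"
    return "other error"


def send_with_timeout(sock, payload):
    """Send `payload`; return (bytes accepted by the kernel, status string)."""
    sent = 0
    try:
        while sent < len(payload):
            sent += sock.send(payload[sent:])
    except OSError as exc:
        return sent, classify_send_error(exc)
    return sent, "ok"
```

A caller would run set_send_timeout(sock, 3.0) once after connecting and then use send_with_timeout in place of a bare send. Note that the returned byte count only says how much the kernel accepted, not how much the server received. That is why a retry after "send timed out" can deliver duplicates, and why the server-side deduplication mentioned above is still needed.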