For anyone who writes software, whether Web front ends, Web back ends, search engines or big data systems, almost every field of development involves network programming. Take a Web server as an example. The Web protocol itself relies on the network, the server usually needs to connect to a database, and that connection normally goes over the network to a database server or database cluster. If the load is too high, a cache cluster is added as well.
Most of us studied network programming and network protocols at school, but the relationship between the two can be confusing. This article focuses on two concepts: network programming and protocols.

Network protocols form a layered protocol family: a group of protocols stacked from bottom to top, each responsible for its own function. So what is a protocol? Literally, the word means something that parties discuss and agree on together. Put simply, it is a set of rules that lets multiple parties communicate, and a network protocol is a set of rules that lets multiple computing nodes interact and communicate over a network. In everyday terms, a protocol is like a language, such as Mandarin Chinese. If two people both speak Mandarin, they can understand each other's intentions. If one speaks the Sichuan dialect and the other the Zhejiang dialect, communication is almost impossible. The same is true for network protocols: by standardizing the data format, computers can clearly understand each other's intentions.

Network programming is also called socket programming. The word socket is usually translated literally, but its meaning is closer to "interface": the API that the operating system provides to developers for network development. By adjusting parameters, this interface can support multiple protocols, including TCP, UDP and IP. The rest of this article introduces socket programming and protocols in detail.

Network Programming

To make things concrete, this article starts with an example that shows what network programming is, and uses the TCP protocol to illustrate the relationship between network programming and protocols.
For simplicity, this article uses Python. If you do not know Python, it does not matter much, because the code is easy to follow. In network communication, whether the architecture is browser/server (B/S) or client/server (C/S), there is always a server and a client; in the B/S architecture the browser is simply the client. The example in this article therefore contains code for both the server and the client. Its function is very simple: sending strings between the client and the server.

Figure 1 Client-server communication model

The server side comes first. Its purpose is to listen on a port on the server and wait for a client to establish a connection. Once the connection is established, it waits for the client to send data and sends the same data back to the client.
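The server described above can be sketched as follows. This is a minimal sketch, not the article's original listing: the address `127.0.0.1`, the port `50007`, and the helper names `create_listener` and `echo_one_client` are assumptions chosen for illustration. It uses `sendall` rather than a bare `send` so that the whole buffer is written.

```python
import socket

HOST = "127.0.0.1"  # assumed address, for illustration only
PORT = 50007        # assumed port, for illustration only


def create_listener(host=HOST, port=PORT):
    """socket -> bind -> listen: return a socket that is listening on host:port."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # TCP over IPv4
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(5)  # allow up to 5 pending connections in the backlog
    return srv


def echo_one_client(srv):
    """accept one connection and echo its data back until the client closes."""
    conn, addr = srv.accept()  # blocks until a client connects
    with conn:
        while True:
            data = conn.recv(1024)  # b"" means the client closed its side
            if not data:
                break
            conn.sendall(data)
    return addr
```

To run it, call `srv = create_listener()` followed by `echo_one_client(srv)`; the second call blocks until a client has connected and disconnected.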
The server mainly uses the calls socket, bind, listen, accept, recv and send. Of these, listen and accept deserve attention: they are used to listen on the port and to accept a client's connection request, respectively. The client implementation is similar. What is special about it is the connect function, which is used to establish a connection with the server.
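A matching client can be sketched as follows. Again this is a minimal sketch under assumptions: the function name `send_and_receive` and the default address and port are hypothetical and only need to match whatever server is running.

```python
import socket


def send_and_receive(message, host="127.0.0.1", port=50007):
    """Connect to the server, send one string, and return the server's reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
        cli.connect((host, port))            # actively initiates the connection
        cli.sendall(message.encode("utf-8"))
        reply = cli.recv(1024)               # short messages only in this sketch
    return reply.decode("utf-8")
```

With an echo server listening on the assumed address, `send_and_receive("hello")` returns the string the server sent back.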
From the example we can see that the server is usually passive, while the client is active. The server program listens on a certain port and waits for connection requests. The client sends a connection request to the server. If nothing unexpected happens, the connection is established, and the client and the server can then send data to each other. Of course, unexpected events are common in real production environments, so they have to be handled at both the protocol level and the interface level. This article covers them in the protocol section. In addition, the example here is a basic, synchronous client-server program of a kind that is rarely used in production. To improve the efficiency of data transmission and processing, production systems usually adopt an asynchronous mode. That topic is beyond the scope of this article and will be introduced in later articles.

TCP protocol detailed explanation

As mentioned earlier, a network protocol is the language that different computers in a network use to communicate with each other. To make interaction possible, this language needs a defined format. This article takes the TCP protocol as an example. TCP is a reliable transmission protocol, and its reliability shows in two ways. On the one hand, it ensures that data is delivered in the order in which it was sent. On the other hand, it ensures a certain degree of correctness of the data (why only a certain degree is explained later). Its reliability is built on two techniques.
One is a 16-bit checksum in the header of every segment, so that corrupted data in a packet can be detected. The other is that each data packet carries a sequence number, so that the order of the packets can be guaranteed and a missing or misplaced packet can be retransmitted.

Since we are talking about format, let us first look at the layout of a TCP data packet. The following figure shows the format, including the source port, destination port, sequence number and flag bits. There is a lot in it, and it may look overwhelming. Seen from a distance, however, the packet contains only two parts: the header and the data to be transmitted. The header plays the key role in TCP's control logic. It is the basis of TCP features such as establishing a connection, closing a connection, retransmission and error checking.

Figure 2 TCP packet format

The meanings of most header fields are fairly clear, so this article only introduces the flag bits (URG, ACK, PSH, RST, SYN and FIN). URG indicates that the urgent pointer field is valid. ACK indicates that the acknowledgment number field is valid. PSH asks the receiver to pass the data to the application promptly instead of waiting for more. RST resets the connection. SYN synchronizes sequence numbers and is used to request a connection. FIN indicates that the sender has no more data to send and is used to close the connection.
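To make the header layout and the flag bits concrete, here is a minimal sketch that unpacks the fixed 20-byte part of a TCP header with Python's `struct` module. The bit positions are those of the standard TCP header; the names `TCP_FLAGS` and `parse_tcp_header` are assumptions for illustration, and header options are ignored.

```python
import struct

# Bit masks of the six flag bits inside the flags byte of the TCP header.
TCP_FLAGS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte TCP header at the start of `segment`."""
    (src_port, dst_port, seq, ack, offset_byte, flag_byte,
     window, checksum, urgent) = struct.unpack("!HHIIBBHHH", segment[:20])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        # The upper 4 bits give the header length in 32-bit words.
        "header_len": (offset_byte >> 4) * 4,
        "flags": [name for name, bit in TCP_FLAGS.items() if flag_byte & bit],
        "window": window,
        "checksum": checksum,
        "urgent_pointer": urgent,
    }
```

Everything after the first `header_len` bytes of a segment is the data being transmitted.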
Connection establishment

TCP needs to establish a connection before transmitting data. This connection is not a physical one. The physical link is provided by the underlying protocols, and TCP simply assumes that it works. A TCP connection is a virtual, logical connection. Put simply, the client and the server each record the sequence numbers of the packets they exchange and put themselves into a certain state. In TCP, establishing a connection is usually called the three-way handshake. As the name suggests, it requires three confirmation steps.

Figure 3 Three-way handshake to establish a connection

The three-way handshake is shown in the figure. Initially both the client and the server are in the closed state, and the server starts listening before any client connects. The process has three steps. First, the client sends a SYN segment carrying its initial sequence number and enters the SYN_SENT state. Second, the server replies with a SYN+ACK segment that carries its own initial sequence number and acknowledges the client's sequence number plus one, and enters the SYN_RCVD state. Third, the client sends an ACK segment that acknowledges the server's sequence number plus one and enters the ESTABLISHED state; the server enters ESTABLISHED when it receives this ACK.
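The sequence and acknowledgment numbers exchanged during the handshake can be traced with a small simulation. This is only a sketch of the bookkeeping, not a network implementation: no packets are sent, the function name `three_way_handshake` is hypothetical, and the state names follow the common `netstat` spelling.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def three_way_handshake(client_isn, server_isn):
    """Return one tuple per segment:
    (sender, flags, seq, ack, client_state, server_state).
    States are those reached once the segment has been sent; for the
    last segment they are the states after the server has received it.
    """
    trace = []

    # Step 1: the client sends SYN with its initial sequence number.
    trace.append(("client", "SYN", client_isn, None, "SYN_SENT", "LISTEN"))

    # Step 2: the server acknowledges client_isn + 1 and sends its own SYN.
    ack_of_syn = (client_isn + 1) % SEQ_SPACE
    trace.append(("server", "SYN+ACK", server_isn, ack_of_syn,
                  "SYN_SENT", "SYN_RCVD"))

    # Step 3: the client acknowledges server_isn + 1; both sides are connected.
    ack_of_synack = (server_isn + 1) % SEQ_SPACE
    trace.append(("client", "ACK", ack_of_syn, ack_of_synack,
                  "ESTABLISHED", "ESTABLISHED"))
    return trace


for step in three_way_handshake(client_isn=100, server_isn=300):
    print(step)
```

Each acknowledgment number is the peer's sequence number plus one, because the SYN itself consumes one sequence number.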
As the process above shows, establishing a connection takes several round trips, which is why it is considered an expensive operation. Production systems deal with this by establishing connections less often. The usual practice is to keep a connection pool and take an existing connection from the pool when data needs to be sent, rather than creating a new one.

Some people may think that the handshake could be simplified, for example by dropping the client's final acknowledgment, because it looks redundant. In normal situations that step indeed adds little. It exists mainly to deal with abnormal situations. Network topology is very complex, especially in wide area networks with countless nodes, so all kinds of abnormal situations occur, and TCP has to remain reliable in them by design.

Take a connection request that times out, and suppose the final acknowledgment did not exist. The client sends a connection request to the server. For various reasons the request does not reach the server, so the server sends no confirmation. The client's attempt times out, and the client sends a new connection request. This one goes smoothly, arrives quickly, and the connection is established. Later, the first request finally reaches the server after its long journey, and the server replies to it as well. The server now believes that a second connection has been established and keeps it open. The client, however, considers that request timed out, so it will never use or close that connection.
This leaves stale connections on the server and wastes server resources. Over time the server may have no resources left for new connections.

Note also that both the client socket and the server socket have a state, and the state changes as the connection goes through its stages. The initial state is CLOSED, and the state after the connection has been established is ESTABLISHED. The transitions are shown in Figure 3, and the state changes are described in more detail later in this article.

Transferring Data

After the connection is established, the client and the server can start transmitting data. TCP is a reliable transport, so how is that reliability guaranteed? Mainly through three header fields: the checksum, the sequence number and the acknowledgment number (see Figure 2).

The correctness of the data is protected by the checksum. When sending data, TCP computes a checksum over the segment and stores it in the checksum field of the header. The receiver performs the calculation by the same rules to confirm that the received data is correct. The sender computes the checksum as follows: it sets the checksum field to zero, treats the segment as a sequence of 16-bit words, adds the words together while folding any carry out of the high bits back into the low bits, and stores the bitwise complement of the sum in the checksum field.
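The checksum arithmetic, both the sender's computation and the receiver's check, can be sketched as follows. This is a minimal sketch of the 16-bit one's-complement Internet checksum applied to an arbitrary byte string. It is an assumption of this sketch that the caller supplies the bytes to be covered; real TCP also includes a pseudo-header with the IP addresses, which is left out here. The function names are hypothetical.

```python
def ones_complement_sum(data: bytes) -> int:
    """Add the data as 16-bit big-endian words, folding carries back in."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with one zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries out of the high bits into the low bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def internet_checksum(data: bytes) -> int:
    """Sender side: the bitwise complement of the one's-complement sum."""
    return ~ones_complement_sum(data) & 0xFFFF


def verify(data: bytes, checksum: int) -> bool:
    """Receiver side: sum of data plus checksum must be all 1s (0xFFFF)."""
    total = ones_complement_sum(data) + checksum
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total == 0xFFFF


payload = bytes.fromhex("0001f203f4f5f6f7")
csum = internet_checksum(payload)
print(hex(csum), verify(payload, csum))
```

Because the check is a simple sum, some combinations of errors cancel out and go undetected, which is why the checksum only guarantees a certain degree of correctness.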
The receiver adds all the 16-bit words together, including the checksum field, and folds the carries from the high bits back in. If every bit of the result is 1, the data is considered correct; otherwise it is considered corrupted.

The ordering of TCP data is guaranteed by the sequence number and the acknowledgment number. Every segment that carries data has a sequence number, and the receiver sends back an acknowledgment number after receiving it. The sender thus knows whether the data was received, and the receiver knows whether the data arrived out of order, which together ensure the order of the data.

Disconnect

TCP closes a connection in four steps, often called the four-way wave. The close does not have to be initiated by the client; the server can initiate it as well. The process is as follows. First, the initiator sends a FIN segment to say that it has no more data to send. Second, the receiver acknowledges that FIN with an ACK. Third, once the receiver has finished sending its own data, it sends its own FIN. Fourth, the initiator acknowledges that FIN with an ACK, and the connection is closed in both directions.
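Closing each direction separately can be observed from the socket API with `shutdown`. The sketch below is a minimal illustration over the loopback interface: the function name `half_close_demo` and the messages are assumptions, and the FIN and ACK segments themselves are sent by the operating system, not by this code.

```python
import socket


def half_close_demo():
    """Close each direction of one TCP connection separately.

    Returns (bytes the server received, bytes the client received).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        srv.listen(1)
        with socket.create_connection(srv.getsockname()) as cli:
            conn, _ = srv.accept()
            with conn:
                cli.sendall(b"last request")
                cli.shutdown(socket.SHUT_WR)  # client is done sending

                received = b""
                while True:  # server reads until it sees end-of-stream
                    chunk = conn.recv(1024)
                    if not chunk:
                        break
                    received += chunk

                # The server-to-client direction is still open.
                conn.sendall(b"remaining reply")
                conn.shutdown(socket.SHUT_WR)  # server is done sending

                reply = b""
                while True:  # client reads until it sees end-of-stream
                    chunk = cli.recv(1024)
                    if not chunk:
                        break
                    reply += chunk
    return received, reply


print(half_close_demo())
```

The server can still send its remaining reply after the client has shut down its sending side, which is why the two directions need separate close steps.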
Figure 4 Schematic diagram of closing connection process

TCP is full-duplex, so a connection has to be closed in both directions. First, the initiator closes its own sending direction. The receiver, after receiving the initiator's close request, not only acknowledges it but also sends its own close request once it has finished transmitting its data, so that both directions end up closed.

This completes the introduction to network programming based on the TCP protocol. Of course, it is only an entry-level introduction. Truly understanding TCP and network programming requires learning a lot more, and later articles will cover those topics.