From TCP to Socket, a thorough understanding of network programming

Almost every area of software development involves network programming, whether it is Web front-end development, Web back-end development, search engines, or big data. Take a Web server as an example: besides the Web protocol itself, which runs over the network, the server usually needs to connect to a database, and that connection typically goes over the network to a database server or database cluster. If the load is too high, a cache cluster must be set up as well.

Most of us learned network programming and network protocols in school, but the relationship between the two can still be a little confusing. Here we first focus on two concepts: network programming and protocols.

Network protocols form a layered protocol family: a group of protocols, each responsible for its own function, stacked from bottom to top. So what is a protocol? Literally, the word refers to something that parties discuss and agree on together. Put simply, it is a set of rules that multiple parties follow in order to communicate, and a network protocol is a set of rules that computing nodes follow to interact and communicate over a network. In everyday terms, a protocol can be thought of as a language, such as Mandarin Chinese. If two people both speak Mandarin, they can understand each other's intentions. If one speaks Sichuan dialect and the other speaks Zhejiang dialect, communication is almost impossible. The same is true for network protocols: by standardizing the data format, computers can clearly understand each other's intentions.

The rest of this article introduces network programming, which is also called socket programming. A socket is best thought of as an interface: the API that the operating system provides to developers for network development. By adjusting parameters, this interface can support multiple protocols, including TCP, UDP, and IP. The sections below introduce socket programming and the protocol in detail.
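As a small illustration of how parameters select the protocol, the sketch below creates one TCP socket and one UDP socket with Python's standard socket module. The helper name make_socket is made up for this example and is not part of any library.

```python
import socket

def make_socket(protocol):
    """Return a new IPv4 socket for 'tcp' or 'udp' (hypothetical helper)."""
    kinds = {
        "tcp": socket.SOCK_STREAM,  # connection-oriented byte stream
        "udp": socket.SOCK_DGRAM,   # connectionless datagrams
    }
    return socket.socket(socket.AF_INET, kinds[protocol])

if __name__ == "__main__":
    for name in ("tcp", "udp"):
        s = make_socket(name)
        print(name, s.type)
        s.close()
```

The same socket call is used in both cases; only the second argument changes.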

Network Programming

To make things concrete, this article starts with an example that shows what network programming looks like.

This article uses TCP as an example to introduce the relationship between network programming and protocols. For simplicity, the examples are written in Python; if you do not know Python, that matters little, because the code is easy to follow. In network communication, whether the architecture is browser/server (B/S) or client/server (C/S), there is a server and a client; in the B/S architecture the browser is simply the client. The example here therefore contains code for both a server and a client. Its function is very simple: the client and the server send strings to each other.

Figure 1 Client-server communication model

The following listing is the server code. It listens on a port and waits for a client to establish a connection. Once the connection is established, it waits for the client to send data and sends that data back to the client, prefixed with a timestamp.

    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-
    from socket import *
    from time import ctime

    host = ''          # empty string: accept connections on all local interfaces
    port = 12345
    buffsize = 2048
    ADDR = (host, port)

    # Create a socket based on the TCP protocol
    tctime = socket(AF_INET, SOCK_STREAM)
    tctime.bind(ADDR)
    # Listen on the specified address and port
    tctime.listen(3)

    while True:
        print('Wait for connection...')
        # Block until a client connects
        tctimeClient, addr = tctime.accept()
        print("Connection from :", addr)
        while True:
            data = tctimeClient.recv(buffsize).decode()
            if not data:
                # An empty read means the client closed the connection
                break
            # Send the data back, prefixed with the current time
            tctimeClient.send(('[%s] %s' % (ctime(), data)).encode())
        tctimeClient.close()
    # Only reached if the loop above is ever exited
    tctime.close()

Reading the server code, we can see that it mainly uses socket, bind, listen, accept, recv, and send. Of these, listen and accept deserve attention: listen starts listening on the port, and accept accepts a client's connection request.

The following listing is the client implementation. What is special here is the connect function, which establishes a connection with the server.

    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-
    from socket import *

    HOST = 'localhost'
    PORT = 12345
    BUFFSIZE = 2048
    ADDR = (HOST, PORT)

    tctimeClient = socket(AF_INET, SOCK_STREAM)
    # Actively establish a connection with the server
    tctimeClient.connect(ADDR)

    while True:
        data = input(" > ")
        if not data:
            # Empty input ends the session
            break
        tctimeClient.send(data.encode())
        data = tctimeClient.recv(BUFFSIZE).decode()
        if not data:
            # An empty read means the server closed the connection
            break
        print(data)
    tctimeClient.close()

From the sample code above, we can see that the server is usually passive, while the client is active. The server program listens on a port and waits for connection requests; the client sends a connection request to the server. If nothing goes wrong, the connection is established, and the client and the server can then send data to each other. In real production environments, of course, failures are common, so they have to be handled at both the protocol level and the interface level. The protocol section of this article describes them in more detail.
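At the interface level, one common way to handle a failed connection attempt is to set a timeout and catch the resulting exception. The following is a minimal sketch under that assumption; the function name try_connect and the 3-second default are chosen for illustration only.

```python
import socket

def try_connect(host, port, timeout=3.0):
    """Return a connected TCP socket, or None if the attempt fails.

    Hypothetical helper: the name and the default timeout are assumptions.
    """
    try:
        # create_connection resolves the address and applies the timeout
        return socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        # OSError covers refused connections, timeouts, and lookup failures
        print("connect failed:", exc)
        return None

if __name__ == "__main__":
    conn = try_connect("localhost", 12345)
    if conn is not None:
        conn.close()
```

Returning None keeps the sketch short; a real program might retry or report the error to the caller instead.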

The example implemented here is a basic client-server communication program of a kind that is rarely used in actual production. To improve the efficiency of data transmission and processing, production systems usually adopt an asynchronous mode. That topic is beyond the scope of this article and will be introduced in subsequent articles.

The TCP Protocol in Detail

As mentioned earlier, network protocols are the language used by different computers in a network to communicate with each other. In order to achieve interaction, this language needs to have a certain format. This article takes the TCP protocol as an example to introduce it.

TCP is a reliable transport protocol. Its reliability shows in two ways: data is delivered in the order in which it was sent, and the correctness of the data is ensured to a certain degree (why only to a certain degree will be explained later). Two mechanisms provide this. First, every data packet carries a checksum (a 16-bit ones' complement sum, not a CRC), so that corrupted data in the packet can be detected. Second, every data packet carries a sequence number, so that the original order can be restored and a lost or misplaced data packet can be sent again.

Since we are talking about format, let's first look at the format of a TCP data packet. The following figure shows it, including the source port, destination port, sequence number, and flag bits. There is a lot in it, and it may look a bit overwhelming. Seen from a distance, though, the data packet contains only two parts: the header, and the data to be transmitted. In TCP's control logic, the header plays the most critical role. It is the basis of TCP's features, such as establishing a connection, closing a connection, retransmission, and error checking.

Figure 2 TCP packet format

The meanings of the other header fields are relatively clear. This article only introduces the meanings of several flag bits (URG, ACK, PSH, RST, SYN, and FIN). The specific meanings are as follows:

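  • URG: The urgent pointer field is valid
  • ACK: The acknowledgment number field is valid
  • PSH: The receiver should pass the data to the application promptly instead of buffering it
  • RST: Reset the connection
  • SYN: Initiate a new connection
  • FIN: Release a connection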
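To make the header layout more tangible, here is a minimal sketch that decodes the fixed 20-byte part of a TCP header with Python's struct module. The function name parse_tcp_header and the dictionary keys are chosen for this example; header options and the newer ECN flag bits are ignored.

```python
import struct

# Bit positions of the six classic flags in the 16-bit offset/flags field
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}

def parse_tcp_header(segment):
    """Decode the fixed 20-byte TCP header from raw bytes (illustrative helper)."""
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urgent) = struct.unpack(
        "!HHIIHHHH", segment[:20])   # "!" means network (big-endian) byte order
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len": (offset_flags >> 12) * 4,  # data offset counts 32-bit words
        "flags": [name for name, bit in FLAG_BITS.items() if offset_flags & bit],
        "window": window,
        "checksum": checksum,
        "urgent": urgent,
    }
```

Feeding it the first 20 bytes of a captured data packet yields the ports, the sequence and acknowledgment numbers, and the list of flags that are set.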

Connection establishment

TCP needs to establish a connection before transmitting data. The connection here is not a physical one; physical connectivity is provided by the underlying protocols, and TCP assumes that it is already in place. A TCP connection is a virtual, logical connection. Put simply, the client and the server each record the sequence numbers of the data packets they exchange and place themselves in a particular state. Establishing a TCP connection is usually called the three-way handshake; as the name suggests, it takes three confirmation steps.

Figure 3 Three-way handshake to establish a connection

The process of the TCP protocol three-way handshake is shown in the figure. In the initial state, both the client and the server are in a closed state. The main process is divided into three steps:

  • The client sends a connection request: The TCP connection is actively initiated by the client, which sends a data packet (message) to the server. The SYN flag in this data packet is 1; as mentioned above, SYN set to 1 marks a data packet that establishes a connection. The data packet also carries the client's initial sequence number, which is the basis for establishing the connection.
  • The server replies with a connection confirmation: If the server can accept the connection (it may not be able to, because the number of sockets in the system is limited), it sends a response packet to the client. Both the SYN and ACK flags are set to 1 in the response packet. The response carries the server's own initial sequence number and an acknowledgment number equal to the client's sequence number plus 1, as shown in Figure 3.
  • The client replies with a connection confirmation: Finally, the client sends a confirmation packet with the ACK flag set to tell the server that the connection has been established.
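The three steps above can be sketched as a small simulation of the sequence and acknowledgment numbers involved. This is not real network code: the function name and the tuple layout are made up for illustration, and the initial sequence numbers are passed in rather than generated.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three packets as (sender, flags, seq, ack) tuples.

    Illustrative only: client_isn and server_isn stand for the initial
    sequence numbers each side would normally choose for itself.
    """
    wrap = 2 ** 32  # sequence numbers are 32-bit and wrap around
    syn = ("client", "SYN", client_isn, None)
    # The SYN consumes one sequence number, so the server acknowledges isn + 1
    syn_ack = ("server", "SYN+ACK", server_isn, (client_isn + 1) % wrap)
    ack = ("client", "ACK", (client_isn + 1) % wrap, (server_isn + 1) % wrap)
    return [syn, syn_ack, ack]

if __name__ == "__main__":
    for packet in three_way_handshake(100, 300):
        print(packet)
```

Each side acknowledges the other's initial sequence number plus one, which is how both ends agree on where the data stream starts.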

As the process above shows, establishing a connection requires several round trips, which is why it is called a high-cost operation. In production environments, this cost is handled by establishing connections less often. The usual practice is to set up a connection pool and take an existing connection from the pool when data needs to be transmitted, rather than creating a new connection each time.
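A connection pool can be sketched in a few lines. The version below is deliberately minimal and makes several assumptions: the class name ConnectionPool is invented here, the pool has a fixed size, and it does no health checking or reconnection. The factory argument is any callable that opens one connection.

```python
import queue

class ConnectionPool:
    """A tiny fixed-size pool (illustrative sketch, not production code)."""

    def __init__(self, factory, size):
        self._idle = queue.Queue()
        for _ in range(size):
            # Pay the connection-establishment cost once, up front
            self._idle.put(factory())

    def acquire(self, timeout=None):
        """Take an idle connection; blocks until one is free.

        Raises queue.Empty if a timeout is given and it expires.
        """
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        """Hand a connection back so that it can be reused."""
        self._idle.put(conn)
```

With the server from the earlier listing running, a pool could be created with a factory such as lambda: socket.create_connection(("localhost", 12345)), and each request would then call acquire and release instead of connect and close.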

Some people may think that the process of establishing a connection could be optimized, for example by dropping the client's final confirmation, on the grounds that it is useless. It does indeed have little effect in normal situations; its main purpose is to deal with abnormal ones. Network topologies are very complex, especially in wide area networks with countless nodes, so all kinds of abnormal situations occur. TCP is therefore designed to remain reliable when they do.

Consider a connection request that is delayed, and suppose for the moment that the client's final confirmation were not required. The client sends a connection request to the server, but for various reasons the request is held up in the network, so the server does not reply with a confirmation. The client's attempt times out, and the client sends a new connection request. This one arrives quickly, and the connection is established. Later, the earlier data packet finally reaches the server after its long journey, and the server replies to it as well. The server now believes that another connection has been established and keeps it open. The client, however, considers that request to have timed out, so it never uses the connection and never closes it. The result is leftover state on the server that wastes resources, and over time the server may run out of resources for new connections. With the final confirmation in place, the server treats a connection as established only after the client confirms it, so the stale request does no harm.

Another thing to note is that the client and server sockets each have a state, and the state changes with the stage of the connection. The initial state is CLOSED, and the final state after the connection is successfully established is ESTABLISHED. The specific transitions are shown in Figure 3. The state changes will be described in detail later in this article.
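The state changes during connection establishment can be written down as a small transition table. The state names follow the usual TCP state names; the event names and the run helper are made up for this sketch, and only the establishment phase is covered.

```python
# (current state, event) -> next state, for connection establishment only.
# Event names are illustrative labels, not part of any API.
TRANSITIONS = {
    ("CLOSED", "send_syn"): "SYN_SENT",                # client: active open
    ("SYN_SENT", "recv_syn_ack"): "ESTABLISHED",       # client: replies with ACK
    ("CLOSED", "listen"): "LISTEN",                    # server: passive open
    ("LISTEN", "recv_syn"): "SYN_RECEIVED",            # server: replies with SYN+ACK
    ("SYN_RECEIVED", "recv_ack"): "ESTABLISHED",       # server: handshake complete
}

def run(events, state="CLOSED"):
    """Apply a list of events in order and return the final state."""
    for event in events:
        state = TRANSITIONS[(state, event)]
    return state

if __name__ == "__main__":
    print("client:", run(["send_syn", "recv_syn_ack"]))
    print("server:", run(["listen", "recv_syn", "recv_ack"]))
```

Both sides start in CLOSED and end in ESTABLISHED, but they pass through different intermediate states because one side is active and the other passive.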

Transferring Data

After the connection is established, the client and server can start transmitting data. We know that TCP is reliable, so how is that reliability guaranteed? Mainly through three fields in the packet header: the checksum, the sequence number, and the acknowledgment number (refer to Figure 2).

The correctness of TCP data content is protected by the checksum. When sending data, TCP calculates a checksum over the entire data packet and stores it in the checksum field of the packet header. The receiver performs the same kind of calculation to confirm whether the received data is correct. The sender calculates the checksum as follows:

  • Divide the pseudo header, TCP header, and TCP data into 16-bit words, and set the checksum field in the TCP header to 0
  • Add all the 16-bit words using ones' complement addition (any carry out of the high bit is added back into the low bit)
  • Invert the bits of the result and fill it into the checksum field of the TCP packet header

The receiver adds up all the 16-bit words in the same way, this time including the received checksum field, and folds any carry back into the low bits. If every bit of the result is 1 (0xFFFF), no error was detected; otherwise the data is wrong.
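The calculation can be sketched in Python as follows. The function names are chosen for this example, and the input is treated as one opaque byte string; in real TCP it would be the pseudo header, the TCP header with a zeroed checksum field, and the data.

```python
def ones_complement_sum(data):
    """Sum 16-bit big-endian words, adding each carry back into the low bits."""
    if len(data) % 2:
        data += b"\x00"                  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry
    return total

def internet_checksum(data):
    """Sender side: invert the ones' complement sum."""
    return ~ones_complement_sum(data) & 0xFFFF

def verify(data_with_checksum):
    """Receiver side: the sum over data plus checksum must be all ones."""
    return ones_complement_sum(data_with_checksum) == 0xFFFF
```

For simplicity, a test of this sketch can append the checksum to an even-length message instead of placing it inside a header; the arithmetic is the same. Note that a sum like this cannot detect every error, for example two 16-bit words swapping places.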

The ordering of TCP data is guaranteed by the sequence number and the acknowledgment number. Every data packet carries a sequence number, and after receiving data the receiver replies with an acknowledgment number. The sender can thus tell whether the data was received, and the receiver can tell whether data packets have arrived out of order and put them back in order.
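The receiver's side of this mechanism can be sketched as a small class that buffers out-of-order data and returns a cumulative acknowledgment number. The class name Receiver is made up for this example, and the sketch ignores partially overlapping data packets and sequence number wraparound.

```python
class Receiver:
    """Reassembles a byte stream from data packets that may arrive out of order.

    Simplified sketch: no overlap handling, no 32-bit wraparound.
    """

    def __init__(self, initial_seq):
        self.next_seq = initial_seq   # sequence number of the next expected byte
        self.pending = {}             # seq -> payload, held until the gap is filled
        self.stream = b""             # data delivered in order so far

    def receive(self, seq, payload):
        """Accept one data packet and return the acknowledgment number to send."""
        if seq >= self.next_seq:      # older data is a duplicate and is dropped
            self.pending[seq] = payload
        # Deliver everything that is now contiguous with the stream
        while self.next_seq in self.pending:
            chunk = self.pending.pop(self.next_seq)
            self.stream += chunk
            self.next_seq += len(chunk)
        return self.next_seq
```

If the second data packet arrives first, the acknowledgment number stays where it was, which tells the sender that the earlier data is still missing; once the gap is filled, the acknowledgment jumps past both.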

Disconnect

TCP closes a connection in four steps, often called the four-way handshake (or "four waves"). Closing does not have to be initiated by the client; the server can initiate it as well. The process of closing a connection is as follows:

  • The initiator sends a data packet with the FIN flag set to 1, requesting that the connection from the initiator to the receiver be closed.
  • The receiver replies with a data packet whose ACK flag is set to 1 to confirm the closure. At this point, the direction from the initiator to the receiver is closed: the initiator can no longer send data to the receiver, but the receiver can still send data to the initiator.
  • After it has finished sending its data, the receiver sends a data packet with the FIN flag set to 1 to the initiator, requesting that the other direction be closed.
  • The initiator replies with an ACK packet to confirm, and the connection is closed.

Figure 4 Schematic diagram of closing connection process

TCP is full-duplex, so a connection has to be closed in both directions. The initiator first closes its own sending direction. The receiver, on receiving the close request, replies with a confirmation and then, once its own data transmission is complete, sends a close request of its own, so that both directions end up closed.
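The two-direction nature of closing can be observed from the socket interface. The sketch below runs both ends of a loopback connection in one process and uses shutdown(SHUT_WR) to close only the initiator's sending direction. The function name half_close_demo and the message contents are made up for this example.

```python
import socket

def half_close_demo():
    """Close one direction first, then the other (illustrative sketch)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
    listener.listen(1)
    initiator = socket.create_connection(listener.getsockname())
    receiver, _ = listener.accept()

    initiator.sendall(b"last request")
    initiator.shutdown(socket.SHUT_WR)     # sends FIN: no more data from this side

    received = b""
    while True:
        chunk = receiver.recv(1024)
        if not chunk:                      # empty read: the initiator's FIN arrived
            break
        received += chunk

    receiver.sendall(b"final reply")       # the reverse direction is still open
    receiver.close()                       # the receiver now sends its own FIN

    reply = b""
    while True:
        chunk = initiator.recv(1024)
        if not chunk:                      # empty read: the receiver's FIN arrived
            break
        reply += chunk
    initiator.close()
    listener.close()
    return received, reply

if __name__ == "__main__":
    print(half_close_demo())
```

The receiver can still send its reply after it has seen the end of the initiator's data, which matches the second step of the closing process.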

So far, this article has introduced the main contents of network programming based on the TCP protocol. Of course, this is just an entry-level introduction. To truly understand the TCP protocol and network programming, there is still a lot more to learn, which future articles will cover.
