TCP state transition and production problem practice


The previous article introduced the main processes of the TCP protocol: establishing a connection, transmitting data, and closing the connection. If you looked closely at the accompanying figures, you will have noticed that the socket's state keeps changing throughout each process, and each state indicates which stage the socket is in.

Figure 1 shows the complete TCP state transition diagram, including every socket state and the conditions that trigger each transition. Some may ask: what is the use of knowing these states, when we never use them in everyday programming?

Figure 1 TCP state transition diagram

To answer this question, we look at it from three angles: what each state means, how to query states at the system level, and how this knowledge applies in production.

1. Meaning of each state

Before answering the question, let's look at what each state means.

  • CLOSED: The initial state of a socket, indicating that the TCP connection has just been created and is not yet open, or that it has already been closed.
  • LISTEN: A state that exists only on the server side, indicating that a server socket is listening and can accept client connections.
  • SYN_RCVD: Indicates that the server has received a SYN message from a client requesting a connection. Normally this state is hard to observe, because it is a very short intermediate state of the server-side socket during the three-way handshake that establishes a TCP connection.
  • SYN_SENT: The counterpart of SYN_RCVD. When a client socket calls connect(), it first sends a SYN message, immediately enters SYN_SENT, and waits for the server's reply, the second message (SYN+ACK) of the three-way handshake. SYN_SENT therefore means the client has sent its SYN.
  • ESTABLISHED: Indicates that the TCP connection has been successfully established.
  • FIN_WAIT_1: This state needs some explanation. Both FIN_WAIT_1 and FIN_WAIT_2 really mean waiting for the other party's FIN message; they differ as follows. When a socket in the ESTABLISHED state wants to actively close the connection, it sends a FIN message to the other party and enters FIN_WAIT_1. When the other party responds with an ACK, the socket enters FIN_WAIT_2. In normal operation the other party should respond with an ACK immediately, whatever state it is in, so FIN_WAIT_1 is rarely seen, while FIN_WAIT_2 can still sometimes be seen with netstat.
  • FIN_WAIT_2: As explained above, a socket in FIN_WAIT_2 represents a half-closed connection: one party has called close() to actively close the connection and is waiting for the other party's FIN. Unlike TIME_WAIT, the protocol defines no timeout for this state, so if the other party never closes its side (never completes the four-way handshake), the connection can stay in FIN_WAIT_2 indefinitely. On Linux, a connection that the application has already closed (an orphaned connection) is aborted after net.ipv4.tcp_fin_timeout seconds (60 by default), but a socket the application still holds open, for example after shutdown(), can remain in FIN_WAIT_2 for as long as it is held. A growing number of FIN_WAIT_2 connections wastes kernel memory and connection resources.
  • TIME_WAIT: Indicates that the FIN message from the other party has been received and the final ACK has been sent. A connection in TIME_WAIT waits for 2*MSL (Maximum Segment Lifetime, the longest time a TCP segment can live on the network) and then returns to the CLOSED state. On Linux this wait is fixed at 60 seconds in the kernel (TCP_TIMEWAIT_LEN) and cannot be tuned through sysctl; /proc/sys/net/ipv4/tcp_fin_timeout, which you can read with cat, controls the FIN_WAIT_2 timeout described above rather than TIME_WAIT.
  • CLOSING: A rare, exceptional state in practice. Normally, after one party sends a FIN, it first receives the other party's ACK (possibly together with its FIN) and then the other party's FIN. CLOSING means that after sending its FIN, a party receives the other party's FIN instead of the ACK for its own. When does this happen? When both parties close() the socket at almost the same moment, both FIN messages are in flight at once, and the CLOSING state appears, indicating that both sides are closing the connection.
  • CLOSE_WAIT: Indicates waiting to be closed. How should this be understood? When the other party calls close() and sends you a FIN, your system responds with an ACK, and the connection enters CLOSE_WAIT. Next, check whether you still have data to send to the other party. If not, call close() on the socket, which sends a FIN and closes the connection in your direction. If you do, whether to keep sending or discard the data is up to the application. In short, a connection in CLOSE_WAIT is waiting for your own application to close it, so a large number of CLOSE_WAIT connections usually means the application is not closing its sockets.
  • LAST_ACK: The passive closing party is in LAST_ACK after it has sent its own FIN and is waiting for the other party's ACK. Once that ACK arrives, the connection enters the CLOSED state.
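To make the transitions above concrete, here is a minimal Python sketch of the state machine as a lookup table. The event names (connect, recv_syn, close, and so on) are hypothetical labels chosen for this illustration, not a real API, and the model is simplified: it treats the server's connection as a single socket, whereas on a real server the listening socket stays in LISTEN and each accepted connection gets its own socket.

```python
# Simplified TCP state machine: (current state, event) -> next state.
# Event names are illustrative labels only, not a real API.
TRANSITIONS = {
    ("CLOSED", "listen"): "LISTEN",              # passive open (server)
    ("CLOSED", "connect"): "SYN_SENT",           # active open: send SYN
    ("LISTEN", "recv_syn"): "SYN_RCVD",          # reply with SYN+ACK
    ("SYN_SENT", "recv_syn_ack"): "ESTABLISHED", # reply with ACK
    ("SYN_RCVD", "recv_ack"): "ESTABLISHED",
    ("ESTABLISHED", "close"): "FIN_WAIT_1",      # active close: send FIN
    ("ESTABLISHED", "recv_fin"): "CLOSE_WAIT",   # passive close: send ACK
    ("FIN_WAIT_1", "recv_ack"): "FIN_WAIT_2",
    ("FIN_WAIT_1", "recv_fin"): "CLOSING",       # simultaneous close: send ACK
    ("FIN_WAIT_1", "recv_fin_ack"): "TIME_WAIT", # ACK and FIN arrive together
    ("FIN_WAIT_2", "recv_fin"): "TIME_WAIT",     # send the final ACK
    ("CLOSING", "recv_ack"): "TIME_WAIT",
    ("TIME_WAIT", "timeout_2msl"): "CLOSED",
    ("CLOSE_WAIT", "close"): "LAST_ACK",         # send our own FIN
    ("LAST_ACK", "recv_ack"): "CLOSED",
}


def run(events, state="CLOSED"):
    """Apply events in order and return every state visited."""
    path = [state]
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} is not valid in state {state}")
        state = TRANSITIONS[key]
        path.append(state)
    return path


if __name__ == "__main__":
    # Active closer: connects, then calls close() first.
    print(run(["connect", "recv_syn_ack", "close", "recv_ack",
               "recv_fin", "timeout_2msl"]))
    # Passive closer: accepts, then the peer closes first.
    print(run(["listen", "recv_syn", "recv_ack", "recv_fin",
               "close", "recv_ack"]))
```

The first run passes through FIN_WAIT_1, FIN_WAIT_2 and TIME_WAIT; the second passes through CLOSE_WAIT and LAST_ACK, matching the two closing paths described in the list.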

2. How to monitor states

As mentioned above, the netstat command shows the state of TCP connections. Figure 2 is a simple example of running it without any options.

Figure 2 netstat execution results

As the figure shows, netstat displays the state of each TCP connection as well as UDP sockets, together with detailed IP address information. The command has many options, and different options retrieve different information. Here are a few specific examples.

1. Display all port information

Use the -a option to list all ports; add -t to list only TCP ports, or -u to list only UDP ports.

  [root@itworld123 ~]# netstat -a    # List all ports
  [root@itworld123 ~]# netstat -at   # List all TCP ports
  [root@itworld123 ~]# netstat -au   # List all UDP ports
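On Linux, netstat (from net-tools) reads TCP socket information from /proc/net/tcp. The following minimal Python sketch does the same for IPv4 sockets, roughly what netstat -atn prints. It assumes the usual little-endian host (x86 and most ARM systems), because the kernel writes IPv4 addresses into that file in host byte order; IPv6 sockets are listed in /proc/net/tcp6 and are not handled here. The kernel's state names differ slightly from those used above (for example SYN_RECV rather than SYN_RCVD, FIN_WAIT1 rather than FIN_WAIT_1).

```python
import os
import socket
import struct

# Kernel state codes (include/net/tcp_states.h) as they appear, in hex,
# in the "st" column of /proc/net/tcp. Names follow netstat's spelling.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def decode_ipv4(field):
    """Decode 'HEXADDR:HEXPORT' from /proc/net/tcp (little-endian host assumed)."""
    hex_addr, hex_port = field.split(":")
    addr = socket.inet_ntoa(struct.pack("<I", int(hex_addr, 16)))
    return addr, int(hex_port, 16)  # the port is printed big-endian


def parse_proc_net_tcp(text):
    """Return (local, remote, state) tuples, one per socket line."""
    rows = []
    for line in text.splitlines()[1:]:  # the first line is a column header
        fields = line.split()
        if len(fields) < 4:
            continue
        local, remote, st = fields[1], fields[2], fields[3].upper()
        rows.append((decode_ipv4(local), decode_ipv4(remote), TCP_STATES.get(st, st)))
    return rows


if __name__ == "__main__" and os.path.exists("/proc/net/tcp"):
    with open("/proc/net/tcp") as f:
        for (laddr, lport), (raddr, rport), state in parse_proc_net_tcp(f.read()):
            print(f"tcp {laddr}:{lport:<6} {raddr}:{rport:<6} {state}")
```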

2. Display all listening sockets

You can use the -l option to list all sockets in the listening state, and again combine it with -t or -u to narrow the results. The following lists TCP sockets in the listening state:

  root@itworld123:~# netstat -lt

Figure 3 Listening state list
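Filtering for listening sockets is straightforward, because LISTEN is state code 0A in /proc/net/tcp. This minimal, Linux-only Python sketch returns the local ports of listening sockets, similar to what netstat -lnt shows, reading both /proc/net/tcp and /proc/net/tcp6 when they exist. When run as a script it also opens its own listening socket on a port chosen by the OS, to show that calling listen() is what puts a socket into LISTEN.

```python
import os
import socket

LISTEN = "0A"  # TCP_LISTEN (10) as it appears in the hex "st" column


def listening_ports(proc_text):
    """Return the sorted local ports of sockets in LISTEN state."""
    ports = set()
    for line in proc_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) >= 4 and fields[3].upper() == LISTEN:
            # fields[1] is "LOCALADDR:PORT"; the port is big-endian hex.
            ports.add(int(fields[1].rsplit(":", 1)[1], 16))
    return sorted(ports)


def listening_ports_on_host(paths=("/proc/net/tcp", "/proc/net/tcp6")):
    """Collect listening ports from every /proc file that exists."""
    ports = set()
    for path in paths:
        if os.path.exists(path):
            with open(path) as f:
                ports.update(listening_ports(f.read()))
    return sorted(ports)


if __name__ == "__main__":
    with socket.create_server(("127.0.0.1", 0)) as srv:  # bind() + listen()
        port = srv.getsockname()[1]
        print("listening ports:", listening_ports_on_host())
        print(f"port {port} is in LISTEN:", port in listening_ports_on_host())
```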

3. Check service status

You can also check the listening sockets and connections of a specific service. For example, the following command shows the status of the ssh service:

  root@itworld123:~# netstat -antp | grep ssh

Figure 4 SSH status results
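The -p option in the command above works by matching sockets to processes: every socket has an inode number (the inode column of /proc/net/tcp), and each process's open file descriptors under /proc/<pid>/fd are symbolic links such as socket:[12345]. The minimal Python sketch below performs that lookup on Linux. The function names are hypothetical; without root privileges it can only inspect the current user's processes, which is also why netstat -p shows "-" for other users' sockets unless run as root.

```python
import os
import re
import socket

_SOCKET_LINK = re.compile(r"socket:\[(\d+)\]")


def socket_inode(link_target):
    """Return the inode in a 'socket:[12345]' fd link, or None for other files."""
    match = _SOCKET_LINK.fullmatch(link_target)
    return int(match.group(1)) if match else None


def pids_owning(inode, proc="/proc"):
    """List PIDs holding an open file descriptor for the given socket inode."""
    owners = []
    for pid in filter(str.isdigit, os.listdir(proc)):
        fd_dir = os.path.join(proc, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:  # permission denied, or the process has exited
            continue
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue
            if socket_inode(target) == inode:
                owners.append(int(pid))
                break
    return owners


if __name__ == "__main__":
    with socket.socket() as s:
        inode = os.fstat(s.fileno()).st_ino  # the socket's inode number
        print(f"socket inode {inode} owned by PIDs", pids_owning(inode))
```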

4. Others

You can also combine netstat with shell tools for more complex queries. For example, the following counts the TCP connections in each state (ESTABLISHED, TIME_WAIT, and so on):

  netstat -n | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'
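For readers who prefer a script to an awk one-liner, here is a Python sketch of the same counting logic. Like /^tcp/ in the awk program, it takes every line that starts with tcp (which includes tcp6) and counts its last field, the state. Running netstat through subprocess assumes netstat is installed on the machine.

```python
import shutil
import subprocess
from collections import Counter


def count_tcp_states(netstat_text):
    """Count TCP connections by state, like awk '/^tcp/ {++S[$NF]}'."""
    counts = Counter()
    for line in netstat_text.splitlines():
        if line.startswith("tcp"):         # also matches "tcp6", as in awk
            counts[line.split()[-1]] += 1  # $NF: the last field is the state
    return counts


if __name__ == "__main__":
    if shutil.which("netstat"):
        result = subprocess.run(["netstat", "-n"], capture_output=True, text=True)
        for state, n in count_tcp_states(result.stdout).most_common():
            print(state, n)
    else:
        print("netstat not found")
```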

netstat is very powerful. For reasons of space, this article can only offer a few ideas; for its other features, see man netstat.

3. Significance in production environments

After all that, back to the point: what is the use of understanding these states? Linux limits the total number of file handles, and since sockets are file handles, they are limited too. Knowing the state of each socket helps us judge whether a server has hidden risks or performance bottlenecks.

Some readers may still find this abstract, so here is a simple example. Suppose a server can hold at most 60,000 handles. If some business scenario produces a large number of TIME_WAIT connections on the server, those sockets cannot be released and reused immediately, yet they still occupy part of that quota of 60,000. Over time the quota may be exhausted, and the server can no longer handle new connections. Strictly speaking, once the application has closed a socket, the connection left in TIME_WAIT no longer holds a file handle; what it keeps occupied is its address and port pair plus a little kernel memory, so on a host that opens many outbound connections it is usually the local port range that runs out first.
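The reasoning in this example can be turned into a rough estimate. If every closed connection occupies one slot for the whole TIME_WAIT period (60 seconds on Linux), the sustainable rate of new connections is at most the number of slots divided by that period, and at a steady rate R about R × 60 connections sit in TIME_WAIT at any moment. For outbound connections a slot is typically one local port per destination address and port. The sketch below uses the 60,000 figure from the example; 32768 to 60999 is the documented Linux default for net.ipv4.ip_local_port_range, which may be configured differently on your system.

```python
TIME_WAIT_SECONDS = 60  # fixed on Linux (TCP_TIMEWAIT_LEN)


def max_connection_rate(slots, time_wait_s=TIME_WAIT_SECONDS):
    """Upper bound on new connections per second before slots run out."""
    return slots / time_wait_s


def time_wait_count(rate_per_s, time_wait_s=TIME_WAIT_SECONDS):
    """Steady-state number of connections in TIME_WAIT (Little's law)."""
    return rate_per_s * time_wait_s


def port_range_size(low, high):
    """Number of ports in an inclusive range such as ip_local_port_range."""
    return high - low + 1


if __name__ == "__main__":
    print(max_connection_rate(60_000))                  # 1000.0 per second
    ports = port_range_size(32768, 60999)
    print(ports, round(max_connection_rate(ports), 1))  # 28232 470.5
```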

To show what these states mean in practice, here are a few problems we encountered in production.

1. A large number of TIME_WAIT connections on the server

(1) Problem description

The monitoring system of an object storage service showed a large number of TIME_WAIT connections on one server. It turned out to be a newly added server, and repeated checks confirmed that the other servers in the same cluster, performing the same role, were working normally without a TIME_WAIT buildup.

(2) Problem Analysis

Per the protocol, the side that actively closes a connection enters TIME_WAIT and stays there for 2*MSL. We therefore checked the related kernel settings (cat /proc/sys/net/ipv4/tcp_fin_timeout), found them at their default values, and concluded that closed sockets were being held too long to be reused.

(3) Problem Solving

We solved it by tuning kernel parameters: open /etc/sysctl.conf and add the following lines:

  net.ipv4.tcp_syncookies = 1
  net.ipv4.tcp_tw_reuse = 1
  # tcp_tw_recycle was removed in Linux 4.12; omit this line on newer kernels
  net.ipv4.tcp_tw_recycle = 1
  net.ipv4.tcp_fin_timeout = 30

Then run /sbin/sysctl -p to apply the settings.

These settings mean the following:

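  • net.ipv4.tcp_syncookies = 1 enables SYN cookies: when the SYN queue overflows, the kernel encodes the connection information in the SYN+ACK it sends instead of storing per-connection state, which helps defend against SYN flood attacks. It guards against SYN floods rather than reducing TIME_WAIT. 0 disables it; many current systems already enable it by default.
  • net.ipv4.tcp_tw_reuse = 1 allows sockets in TIME_WAIT to be reused for new outgoing TCP connections when this is safe from the protocol's point of view (it relies on TCP timestamps). 0 disables it.
  • net.ipv4.tcp_tw_recycle = 1 enables fast recycling of TIME_WAIT sockets; 0, the default, disables it. Use it with great caution: it breaks connections from clients behind NAT, and the option was removed in Linux 4.12.
  • net.ipv4.tcp_fin_timeout = 30 shortens how long an orphaned connection stays in FIN_WAIT_2 from the default of 60 seconds to 30; it does not change the 60-second TIME_WAIT period.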
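To check whether settings like these actually took effect, it helps to know that each sysctl key maps to a file under /proc/sys, with dots replaced by slashes (net.ipv4.tcp_fin_timeout is /proc/sys/net/ipv4/tcp_fin_timeout). The following minimal Python sketch parses sysctl.conf-style lines (comments start with # or ;) and compares each wanted value with what the running kernel reports. The function names are hypothetical, and it is a simplified illustration rather than a replacement for sysctl -p; for example, it does not handle keys whose components themselves contain dots, such as some interface names.

```python
import os


def parse_sysctl_conf(text):
    """Parse 'key = value' lines; blank lines and # or ; comments are skipped."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def proc_path(key):
    """Map a sysctl key to its /proc/sys file (dots become slashes)."""
    return os.path.join("/proc/sys", *key.split("."))


def compare_with_kernel(settings):
    """Return {key: (wanted, current)}; current is None if the key does not exist."""
    report = {}
    for key, wanted in settings.items():
        try:
            with open(proc_path(key)) as f:
                current = f.read().strip()
        except FileNotFoundError:
            current = None  # e.g. tcp_tw_recycle on Linux 4.12 and later
        report[key] = (wanted, current)
    return report


if __name__ == "__main__" and os.path.exists("/etc/sysctl.conf"):
    with open("/etc/sysctl.conf") as f:
        report = compare_with_kernel(parse_sysctl_conf(f.read()))
    for key, (wanted, current) in report.items():
        print(f"{key}: wanted={wanted} current={current}")
```

A key that reports current=None is one the running kernel does not know about, which is exactly what happens to tcp_tw_recycle on newer kernels.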

2. A large number of ESTABLISHED connections on the server

(1) Problem description

A large number of ESTABLISHED connections appear on a Tomcat server.

(2) Problem Analysis

Based on the TCP state transitions, our initial inference was that Tomcat was not releasing connections promptly, which is usually related to the server's timeout settings.

We checked Tomcat's configuration file, server.xml:

  <Connector port="8080" protocol="HTTP/1.1"
             connectionTimeout="20000"
             redirectPort="8443" URIEncoding="UTF-8" />
  *****

Focus on connectionTimeout. With this setting, once a socket connection is established, if the client sends neither data nor a FIN, Tomcat waits 20 seconds (the value is in milliseconds) before releasing the connection. Because the server handles a large number of concurrent connections and the timeout is long, connections are released far too late, and ESTABLISHED connections pile up.
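Little's law (the average number of items in a system equals the arrival rate times the average time each item spends in it) gives a rough feel for why the timeout matters. If connections that send nothing arrive at some rate and each is held for the full connectionTimeout, the number of such connections sitting in ESTABLISHED at any moment is roughly the rate times the timeout. This is a back-of-the-envelope sketch only; the function name is hypothetical and the arrival rate in the example is an assumed figure, not a measurement from this incident.

```python
def idle_established(arrivals_per_s, connection_timeout_ms):
    """Little's law estimate of idle connections held open by the timeout.

    connection_timeout_ms uses the same unit as Tomcat's connectionTimeout.
    """
    return arrivals_per_s * connection_timeout_ms / 1000


if __name__ == "__main__":
    rate = 500  # hypothetical idle connections arriving per second
    for timeout_ms in (20000, 100):
        print(timeout_ms, idle_established(rate, timeout_ms))
```

With the assumed rate of 500 per second, a 20000 ms timeout leaves about 10,000 idle connections open at once, while 100 ms leaves about 50.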

(3) Problem Solving

Based on this analysis, we made the following targeted changes:

  connectionTimeout="20000" changed to connectionTimeout="100"
  acceptCount="100" changed to acceptCount="5000"

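The problem was solved after these changes. Note that connectionTimeout is in milliseconds, so the new value is 100 ms, and acceptCount sets the length of the queue for incoming connection requests that Tomcat cannot yet accept.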
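The mechanism behind connectionTimeout is not specific to Tomcat: a server gives up on a connection that stays silent too long and closes it, which sends a FIN and frees the connection instead of leaving it in ESTABLISHED. This minimal Python sketch shows the idea with a socket timeout; the function names and the 0.2-second timeout are illustrative choices, not Tomcat's implementation.

```python
import socket
import threading


def serve_one(listener, idle_timeout_s):
    """Accept one connection and close it if the client stays silent too long."""
    conn, _addr = listener.accept()
    with conn:  # leaving the block calls close(), which sends our FIN
        conn.settimeout(idle_timeout_s)
        try:
            if conn.recv(1024):
                conn.sendall(b"ok\n")
        except socket.timeout:
            pass  # idle for too long: give up on this client


def idle_client_result(idle_timeout_s=0.2):
    """Connect, send nothing, and return what the client eventually reads."""
    with socket.create_server(("127.0.0.1", 0)) as listener:
        port = listener.getsockname()[1]
        worker = threading.Thread(target=serve_one, args=(listener, idle_timeout_s))
        worker.start()
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.settimeout(5)      # safety net so the demo cannot hang
            data = client.recv(1024)  # b"" means the server closed the connection
        worker.join()
        return data


if __name__ == "__main__":
    print(idle_client_result())  # b''
```

The idle client reads an empty result because the server closed the connection after the timeout, rather than holding it open indefinitely.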

There are many more real-world cases, but the essence is the same: if we know the TCP protocol and its state transitions well, we can analyze production problems rationally and solve them with ease.
