Detailed explanation of TCP connection states, closing methods, and TCP parameter optimization on Windows Server

To be honest, tuning TCP connections on Windows is much more troublesome than on Linux. Still, it is a problem that servers running on Windows Server have to face. The following introduces the TCP optimization parameters on Windows.

TCP connection states and closing

1. TCP connection states

First, let's introduce the states a TCP connection passes through as it is established and closed. A TCP connection moves through a sequence of state transitions, which are triggered by user calls, the arrival of specific segments, and timeouts. The states are as follows:

  • CLOSED: Initial state, indicating that there is no connection.
  • LISTEN: A socket on the server side is listening for connection requests from a remote TCP port.
  • SYN_SENT: Waiting for a reply after sending a connection request. When a client socket connects, it first sends a SYN packet, enters the SYN_SENT state, and waits for the server to send the second packet of the three-way handshake.
  • SYN_RECEIVED: Having received a connection request, the socket has sent back an acknowledgment together with its own connection request (SYN+ACK) and is waiting for the final acknowledgment. It is usually a short-lived intermediate state of the three-way handshake, indicating that the server-side socket has received the client's SYN and responded.
  • ESTABLISHED: Indicates that the connection has been established and data transmission is possible.
  • FIN_WAIT_1: The actively closing side is waiting for the peer's ACK. When a socket in the ESTABLISHED state actively closes the connection, it sends a FIN packet (meaning it has no more data to send), enters FIN_WAIT_1, and waits for the peer to acknowledge it. From then on it can still read data but can no longer send. Normally the peer acknowledges a FIN immediately whatever state it is in, so FIN_WAIT_1 is rarely observed.
  • FIN_WAIT_2: The actively closing side has received the ACK for its FIN and is waiting for the peer's FIN. A socket in FIN_WAIT_1 moves to FIN_WAIT_2 when that ACK arrives. Because it then has to wait until the peer decides to send its own FIN, FIN_WAIT_2 is commonly observed. If a segment carrying both FIN and ACK arrives while the socket is in FIN_WAIT_1, it goes directly to TIME_WAIT without passing through FIN_WAIT_2.
  • TIME_WAIT: After receiving the peer's FIN (meaning the peer has no more data to send), the actively closing side returns an ACK; from this point it can neither read nor send data. It then waits long enough (2MSL, twice the maximum segment lifetime) to make sure the peer received the ACK, allowing for the ACK being lost and for delayed duplicate segments of this connection to expire, and finally returns to CLOSED and releases network resources.
  • CLOSE_WAIT: The passively closing side is waiting to close the connection. After receiving the peer's FIN (meaning the peer has no more data to send), it returns an ACK and enters CLOSE_WAIT. In this state it can still send any remaining data to the peer, but there is nothing more to read, because the peer has finished sending.
  • LAST_ACK: After the party that passively closes the connection completes sending data in the CLOSE_WAIT state, it can send a FIN packet to the other party (indicating that it has no more data to send), and then wait for the other party to return an ACK packet. After receiving the ACK packet, it returns to the CLOSED state and releases network resources.
  • CLOSING: A relatively rare state. Normally, after sending a FIN, a socket receives the peer's ACK first (or together with the peer's FIN) and only then the peer's FIN. CLOSING means that after sending a FIN, the socket has received the peer's FIN before the ACK for its own FIN. Two situations can lead to this: first, both sides close the connection at almost the same time, so both send FIN packets simultaneously; second, the ACK is lost or delayed and the peer's FIN, sent shortly afterward, arrives first.

[Figure: TCP connection state transition diagram]
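The transitions listed above can also be written down as a lookup table. The following Python sketch models only the states and transitions described in this article; the event names (such as `recv_FIN/send_ACK`) are informal labels invented for the example, not an OS or RFC API, and resets and other edge cases are ignored.

```python
# Simplified TCP state machine covering only the transitions described above.
# Keys are (current state, event); event labels are hypothetical shorthand.
TRANSITIONS = {
    ("CLOSED", "passive_open"): "LISTEN",
    ("CLOSED", "active_open/send_SYN"): "SYN_SENT",
    ("LISTEN", "recv_SYN/send_SYN_ACK"): "SYN_RECEIVED",
    ("SYN_SENT", "recv_SYN_ACK/send_ACK"): "ESTABLISHED",
    ("SYN_RECEIVED", "recv_ACK"): "ESTABLISHED",
    ("ESTABLISHED", "close/send_FIN"): "FIN_WAIT_1",        # active close
    ("ESTABLISHED", "recv_FIN/send_ACK"): "CLOSE_WAIT",     # passive close
    ("FIN_WAIT_1", "recv_ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_1", "recv_FIN_ACK/send_ACK"): "TIME_WAIT",   # FIN+ACK together
    ("FIN_WAIT_1", "recv_FIN/send_ACK"): "CLOSING",         # simultaneous close
    ("FIN_WAIT_2", "recv_FIN/send_ACK"): "TIME_WAIT",
    ("CLOSING", "recv_ACK"): "TIME_WAIT",
    ("TIME_WAIT", "2MSL_timeout"): "CLOSED",
    ("CLOSE_WAIT", "close/send_FIN"): "LAST_ACK",
    ("LAST_ACK", "recv_ACK"): "CLOSED",
}


def run(events, state="CLOSED"):
    """Apply events in order and return every state visited."""
    path = [state]
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"no transition from {state} on {event!r}")
        state = TRANSITIONS[key]
        path.append(state)
    return path


if __name__ == "__main__":
    # Client side of a connection that the client closes first.
    print(run(["active_open/send_SYN", "recv_SYN_ACK/send_ACK",
               "close/send_FIN", "recv_ACK", "recv_FIN/send_ACK",
               "2MSL_timeout"]))
```

Keeping the table as data makes it easy to check that every path ends in CLOSED and that TIME_WAIT is only ever reached by the side that sent the first FIN.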

2. How to close a TCP connection

Establishing a TCP connection takes a three-way handshake, while closing it takes a four-way handshake, and closes are divided into active and passive closes. This is because a TCP connection is full-duplex: if I close my direction of the connection, that does not close yours, so each side must close its own direction separately. When one side has finished sending data, it sends a FIN packet to terminate the connection in its direction, indicating that it has no more data to send; the side that receives the FIN will read no further data from that direction, but can still send data. Take a client that actively closes the connection as an example:

  • The client sends a FIN packet to the server, indicating that the client actively closes the connection, and then enters the FIN_WAIT_1 state, waiting for the server to return an ACK packet. After that, the client can no longer send data to the server, but can read data.
  • After receiving the FIN packet, the server sends an ACK packet to the client and then enters the CLOSE_WAIT state. After that, the server can no longer read data, but can continue to send data to the client. After receiving the ACK packet returned by the server, the client enters the FIN_WAIT_2 state and waits for the server to send a FIN packet.
  • After the server completes sending the data, it sends a FIN packet to the client, and then enters the LAST_ACK state, waiting for the client to return an ACK packet. After that, the server can neither read nor send data.
  • After receiving the FIN packet, the client sends an ACK packet to the server, then enters the TIME_WAIT state, waits for a long enough time (2MSL) to ensure that the server receives the ACK packet, and finally returns to the CLOSED state to release network resources. After receiving the ACK packet returned by the client, the server returns to the CLOSED state and releases network resources.
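The half-close in these steps can be observed with ordinary sockets. The following Python sketch uses only the standard `socket` module on the loopback interface: the client calls `shutdown(SHUT_WR)`, which sends its FIN; the server then sees end-of-stream (`recv` returns an empty bytes object) yet can still send a reply before closing its own side. Both ends run in one thread, which works here only because the messages are tiny; the function name is illustrative.

```python
import socket


def half_close_demo() -> tuple[bytes, bytes]:
    """Return (bytes the server read, bytes the client read after its FIN)."""
    with socket.create_server(("127.0.0.1", 0)) as listener:
        port = listener.getsockname()[1]
        client = socket.create_connection(("127.0.0.1", port))
        server, _ = listener.accept()
        with client, server:
            client.sendall(b"request")
            # Client closes its sending direction: a FIN is sent and the
            # client moves to FIN_WAIT_1/FIN_WAIT_2, but it can still read.
            client.shutdown(socket.SHUT_WR)

            # Server reads until end-of-stream: recv() returns b"" once the
            # client's FIN has arrived (the server is now in CLOSE_WAIT).
            received = b""
            while chunk := server.recv(1024):
                received += chunk

            # In CLOSE_WAIT the server may still send data to the client.
            server.sendall(b"reply")
            server.close()  # server's FIN: LAST_ACK until the client ACKs it

            reply = b""
            while chunk := client.recv(1024):
                reply += chunk
            # The client has now seen the server's FIN and enters TIME_WAIT.
    return received, reply


if __name__ == "__main__":
    print(half_close_demo())  # (b'request', b'reply')
```

Because the client is the side that sent the first FIN, it is the one left in TIME_WAIT afterwards, which is exactly the situation discussed in the next section.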

3. Impact on Server and Client

As the above shows, TIME_WAIT is the state that is hardest to deal with. After the actively closing side sends the last ACK, it enters TIME_WAIT regardless of whether the peer receives that ACK, and waits 2MSL before releasing network resources.

For the client, each connection occupies a local port, but the number of ports the system makes available is below 65,000 (and reaches that level only after TCP parameters are tuned). Therefore, if a client opens and actively closes many connections (assuming ports are not reused and all connections go to the same server), a large number of closed connections remain in TIME_WAIT for 2MSL before their network resources, including ports, are released, and the client can run out of ports for new connections.
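The effect of TIME_WAIT on a client's port supply can be estimated with simple arithmetic. If the client closes every connection to the same server first, each local port stays tied up for the whole TIME_WAIT period, so the size of the port pool bounds the sustainable rate of new connections. This sketch ignores the time ports spend in use while connections are open, so it gives an upper bound; the port counts in the example are illustrative round numbers.

```python
def max_new_connections_per_second(port_count: int, time_wait_seconds: float) -> float:
    """Upper bound on new outbound connections per second to one server when
    the client closes first: each port is unusable for time_wait_seconds."""
    if port_count <= 0 or time_wait_seconds <= 0:
        raise ValueError("port_count and time_wait_seconds must be positive")
    return port_count / time_wait_seconds


if __name__ == "__main__":
    # About 4,000 ephemeral ports with a 240 s TIME_WAIT.
    print(round(max_new_connections_per_second(4_000, 240), 1))  # 16.7
    # About 60,000 ports with a 30 s TIME_WAIT.
    print(max_new_connections_per_second(60_000, 30))            # 2000.0
```

Both levers discussed later (a larger port range and a shorter TIME_WAIT) raise this bound multiplicatively.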

For the server (especially one handling high-concurrency short connections), every connection it accepts uses the same local port, the listening port, so ports are not what runs out. Instead, each connection on that port is recorded in a hash table, and the total is limited by the maximum number of open file descriptors (handles). Therefore, if the server actively closes connections, it too accumulates a large number of connections in TIME_WAIT, each holding resources for 2MSL before they are released.

There are three ways to deal with this situation:

  • Where possible, let the client actively close the connection. Because each client has relatively low concurrency, the resulting TIME_WAIT connections will not become a performance bottleneck.
  • Tune the server's TCP parameters to balance the total amount of network resources against how quickly they are consumed and recovered.
  • Modify the TCP implementation itself and rewrite the underlying code; this is very difficult and may affect the stability and security of the system.

TCP parameter optimization under Windows system

Windows TCP parameters are tuned through the registry: run the regedit command, then create or modify the values described below. After making changes, restart the system for them to take effect. All parameter values below are in decimal.
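Editing values by hand in regedit is error-prone when several machines need the same settings. One alternative is to generate a `.reg` file and import it. The Python sketch below only builds the file text; the helper names are hypothetical, and the values in the example come from the sections that follow rather than being recommendations for every system.

```python
# Key path as written in this article; registry key names are case-insensitive.
TCPIP_PARAMS = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\TCPIP\Parameters"


def dword_line(name: str, value: int) -> str:
    """One REG_DWORD entry; .reg files write DWORDs as 8 hex digits."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError(f"{name}={value} does not fit in a REG_DWORD")
    return f'"{name}"=dword:{value:08x}'


def build_reg_file(key: str, values: dict[str, int]) -> str:
    """Header, blank line, [key] section, then one line per value (CRLF)."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    lines += [dword_line(name, value) for name, value in values.items()]
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    text = build_reg_file(TCPIP_PARAMS, {"TcpTimedWaitDelay": 30, "MaxUserPort": 65534})
    # newline="" keeps the CRLFs as written; regedit itself exports UTF-16 files.
    with open("tcp-tuning.reg", "w", encoding="utf-16", newline="") as f:
        f.write(text)
```

The resulting file can be imported from an elevated prompt with `reg import tcp-tuning.reg` (or by double-clicking it), followed by the restart mentioned above.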

1. TCPWindowSize

The value of TCPWindowSize sets the TCP receive window size. The TCP Receive Window (the TCP receive buffer) defines the maximum number of bytes a sender can transmit without receiving an acknowledgment from the receiver. The larger the value, the more data can be in flight before the sender has to stop and wait for an acknowledgment, and the more efficiently the link between sender and receiver is used. A smaller value can reduce the chance of the sender timing out while waiting for acknowledgments, but it increases network traffic and lowers effective throughput. TCP adjusts the receive window to an integer multiple of the maximum segment size (MSS), which is negotiated when the connection is established. Because the window is a whole number of MSS-sized segments, a larger share of the transmitted segments are full-length, which improves network throughput.

By default, TCP tries to optimize the window size based on the MSS, starting at 16KB and growing to at most 64KB. Without window scaling, the maximum TCPWindowSize is 65535 bytes (64KB). On Ethernet the MSS is 1460 bytes, and the largest integer multiple of 1460 that does not exceed 65535 is 64240 bytes (44 × 1460). TCPWindowSize can therefore be set to 64240 in the registry as a performance-oriented value for high-bandwidth networks. The steps are as follows:
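The window arithmetic can be checked directly. This sketch assumes the standard 1500-byte Ethernet MTU and 20-byte IPv4 and TCP headers with no options; the function names are illustrative.

```python
def ethernet_mss(mtu: int = 1500, ip_header: int = 20, tcp_header: int = 20) -> int:
    """MSS = MTU minus the IPv4 and TCP headers (no options assumed)."""
    return mtu - ip_header - tcp_header


def largest_mss_multiple(mss: int, limit: int = 65535) -> int:
    """Largest whole number of full segments, in bytes, that fits in limit."""
    if mss <= 0 or limit < mss:
        raise ValueError("need 0 < mss <= limit")
    return (limit // mss) * mss


if __name__ == "__main__":
    mss = ethernet_mss()
    window = largest_mss_multiple(mss)
    print(mss, window, window // mss)  # 1460 64240 44
```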

Navigate to the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\TCPIP\Parameters registry subkey. Under the Parameters subkey, create or modify a REG_DWORD value named TCPWindowSize. The value range is from 0 to 65535. Set the value to 64240.

2. TCP1323Opts

To make more efficient use of high-bandwidth networks, a TCP window much larger than the one described above can be used. This feature, introduced in Windows 2000 and also present in Windows Server 2003, is called TCP Window Scaling; it raises the previous limit of 65535 bytes (64KB) to 1073741824 bytes (1GB). On connections with a large bandwidth-delay product (such as satellite links), the window may need to be larger than 64KB. With TCP Window Scaling, more data can be transmitted between acknowledgments, which increases network throughput and performance. TCP Window Scaling only takes effect when both ends of the TCP connection enable it.

The time it takes for data to travel from sender to receiver and back is called the round-trip time (RTT). TCP has a timestamp option that improves the RTT estimate by measuring it more often. This is particularly helpful for long-distance WAN connections, where it lets TCP set its retransmission timeout more accurately. The option adds two fields to the TCP header: TSval, the sender's current timestamp, and TSecr, which echoes the most recent timestamp received from the peer. Timestamps are especially useful together with window scaling, where large amounts of data are sent before an acknowledgment arrives. Enabling timestamps adds only 12 bytes to the header of each packet, so the effect on network traffic is small.

Whether data integrity or maximum throughput matters more has to be evaluated case by case. In some environments, such as video streaming, a larger TCP window matters most and data integrity is secondary; there, TCP Window Scaling can be enabled without timestamps.

Window scaling and timestamps each take effect only when both the sender and the receiver enable them. Be aware of one pitfall with timestamps behind NAT: if a server has recently seen a larger timestamp from the same address and port than the one carried in a new connection's SYN (for example, because several clients with unrelated clocks share one NAT address), it may discard that SYN, so the client cannot complete the TCP three-way handshake. The connection starts with a small window, which then grows according to TCP's internal algorithm. The steps are as follows:

Browse to the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\TCPIP\Parameters registry subkey. Under the Parameters subkey, create or modify a REG_DWORD value named TCP1323Opts.

The specific meaning of this value is:

  • 0 (default value) disables TCP Window Scaling and timestamps;
  • 1 means only TCP Window Scaling is enabled;
  • 2 means only timestamps are enabled;
  • 3 means enabling both TCP Window Scaling and timestamps.
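The four values behave like two independent bit flags (bit 0 for window scaling, bit 1 for timestamps), which is consistent with the list above. A small Python sketch for reading or composing the value; the function names are illustrative:

```python
def decode_tcp1323opts(value: int) -> dict[str, bool]:
    """Bit 0 enables window scaling, bit 1 enables timestamps."""
    if value not in (0, 1, 2, 3):
        raise ValueError("TCP1323Opts must be 0, 1, 2 or 3")
    return {"window_scaling": bool(value & 1), "timestamps": bool(value & 2)}


def encode_tcp1323opts(window_scaling: bool, timestamps: bool) -> int:
    """Inverse of decode_tcp1323opts."""
    return (1 if window_scaling else 0) | (2 if timestamps else 0)


if __name__ == "__main__":
    print(decode_tcp1323opts(3))  # {'window_scaling': True, 'timestamps': True}
    # Scaling without timestamps, e.g. for the streaming case described above.
    print(encode_tcp1323opts(window_scaling=True, timestamps=False))  # 1
```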

After TCP1323Opts is set to enable TCP Window Scaling, the TCPWindowSize value described above can be raised to as much as 1GB. For best performance, set it to a multiple of the MSS; the recommended value is 256960 bytes (176 × 1460).
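To size the window for a specific link, a common approach is to compute the bandwidth-delay product (bandwidth × RTT), round it up to a whole number of segments, and check which window-scale shift is needed to represent it. This sketch assumes a 1460-byte MSS and the RFC 1323 limit of a 14-bit shift; the link figures in the example (100 Mbit/s, 20 ms RTT) are hypothetical.

```python
MAX_UNSCALED_WINDOW = 65535  # largest value of the 16-bit window field
MAX_WINDOW_SHIFT = 14        # RFC 1323 caps the scale shift at 14 (about 1 GB)


def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bytes in flight needed to keep the link full: bandwidth x RTT."""
    return bandwidth_bps * rtt_ms // (8 * 1000)


def round_up_to_mss(n: int, mss: int = 1460) -> int:
    """Round n up to a whole number of MSS-sized segments."""
    return -(-n // mss) * mss  # ceiling division, then back to bytes


def window_scale_shift(window: int) -> int:
    """Smallest shift s such that 65535 << s can represent window."""
    shift = 0
    while (MAX_UNSCALED_WINDOW << shift) < window:
        shift += 1
        if shift > MAX_WINDOW_SHIFT:
            raise ValueError("window larger than TCP window scaling allows")
    return shift


if __name__ == "__main__":
    window = round_up_to_mss(bdp_bytes(100_000_000, 20))
    print(window, window_scale_shift(window))  # 251120 2
    print(window_scale_shift(256960))          # 2
```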

3. TCP Control Block Table

For each TCP connection, the control variables are stored in a memory block called the TCP Control Block (TCB). The size of the TCB hash table is controlled by the registry value MaxHashTableSize. On systems with many active connections, a larger table reduces the time the system spends locating a connection's TCB. Partitioning the TCB table reduces contention for access to it, so increasing the number of partitions can improve TCP performance, especially on multiprocessor systems. The registry value NumTcbTablePartitions controls the number of partitions; by default it is the square of the number of processors.

TCBs are normally pre-allocated in memory so that time is not wasted repeatedly allocating and freeing them as TCP connections are opened and closed. This caching improves memory management, but it also limits the number of TCP connections allowed at the same time. The registry value MaxFreeTcbs determines how many idle, pre-allocated TCBs are kept available for reuse; on NT-based systems it was often set higher than the default to make sure enough pre-allocated TCBs exist. Windows 2000 added a feature that reduces the chance of running out of pre-allocated TCBs: if more connections are in TIME_WAIT than the MaxFreeTWTcbs setting allows, all connections that have been in TIME_WAIT for more than 60 seconds are forcibly closed so their TCBs can be reused. With this feature in Windows 2000 Server and Windows Server 2003, MaxFreeTcbs is no longer used for performance tuning. The steps are as follows:

Browse to the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\TCPIP\Parameters registry subkey and create or modify the REG_DWORD value named MaxHashTableSize under the Parameters subkey. The value range is 1 to 65536 and it must be a power of 2; the default is 512, and 8192 is recommended. Then create or modify the REG_DWORD value named NumTcbTablePartitions under the Parameters subkey. Its range is also 1 to 65536 and it must be a power of 2; the default is the square of the number of processors, and 4 times the number of processor cores is recommended.
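Why partitioning helps can be illustrated with a toy model: connections are spread across several independently locked sub-tables by hashing the connection's address/port four-tuple, so threads working on different partitions do not contend for a single lock. This Python sketch is a conceptual illustration only, not how the Windows TCP/IP stack is implemented; the class name is hypothetical, and the idea that a power-of-two count exists so a bit mask can replace a modulo is an assumption.

```python
import threading


class PartitionedTcbTable:
    """Toy TCB table split into independently locked partitions.

    Illustrative only: this is not the Windows TCP/IP implementation.
    """

    def __init__(self, partitions: int) -> None:
        # Same rule as NumTcbTablePartitions: a positive power of 2.
        if partitions < 1 or partitions & (partitions - 1):
            raise ValueError("partitions must be a power of 2")
        self._mask = partitions - 1
        # One dict plus one lock per partition.
        self._parts = [({}, threading.Lock()) for _ in range(partitions)]

    def partition_of(self, four_tuple: tuple) -> int:
        # With a power-of-2 count, masking the hash selects a partition
        # without a division (assumed motivation for the power-of-2 rule).
        return hash(four_tuple) & self._mask

    def insert(self, four_tuple: tuple, tcb: dict) -> None:
        table, lock = self._parts[self.partition_of(four_tuple)]
        with lock:  # only this partition is locked; the others stay free
            table[four_tuple] = tcb

    def lookup(self, four_tuple: tuple):
        table, lock = self._parts[self.partition_of(four_tuple)]
        with lock:
            return table.get(four_tuple)


if __name__ == "__main__":
    table = PartitionedTcbTable(16)
    key = ("10.0.0.1", 50000, "10.0.0.2", 80)
    table.insert(key, {"state": "ESTABLISHED"})
    print(table.lookup(key))
```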

4. TcpTimedWaitDelay

The value of TcpTimedWaitDelay is the time the system must wait before releasing a closed TCP connection and reusing its resources. This interval is the TIME_WAIT state described above (2MSL, twice the maximum segment lifetime). If the system has a large number of connections in TIME_WAIT, concurrency and throughput can drop sharply. Reducing this value lets the system release closed connections sooner and frees resources for new connections, which particularly benefits servers handling high-concurrency short connections.

The default value of this item is 240, which means that the resource will be released after waiting for 4 minutes; the minimum value supported by the system is 30, which means the waiting time is 30 seconds.

Browse to the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\TCPIP\Parameters registry subkey. Under the Parameters subkey, create or modify a REG_DWORD value named TcpTimedWaitDelay. The value range is 30 to 300 (seconds); setting it to 30 is recommended.
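How much TcpTimedWaitDelay matters can be estimated with Little's law: the steady-state number of connections sitting in TIME_WAIT equals the rate at which connections are actively closed multiplied by the time each one stays there. The rate of 500 closes per second below is a hypothetical example.

```python
def time_wait_entries(closes_per_second: float, time_wait_seconds: float) -> float:
    """Little's law (L = lambda * W) applied to the TIME_WAIT state."""
    return closes_per_second * time_wait_seconds


if __name__ == "__main__":
    for delay in (240, 30):  # default vs. minimum TcpTimedWaitDelay
        print(delay, time_wait_entries(500, delay))
    # 240 120000
    # 30 15000
```

Cutting the delay from 240 to 30 seconds shrinks the number of lingering connections (and the TCBs they hold) by a factor of eight at the same close rate.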

5. MaxUserPort

The value of MaxUserPort sets the highest port number TCP/IP can assign when an application requests an available port from the system. If connection attempts fail with errors, the cause may be a shortage of ephemeral (temporary) ports, especially when the system opens many ports to connect to web services, databases, or other remote resources.

The default value is 5000 (decimal), which is also the minimum the system allows. By default, Windows reserves ports 1025 through 5000 as ephemeral (temporary) ports. For higher concurrency, set this value to at least 32768, or even to the maximum of 65534; this is especially useful for clients that simulate high-concurrency load in test environments. The steps are as follows:

Browse to the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\TCPIP\Parameters registry subkey. Under the Parameters subkey, create or modify a REG_DWORD value named MaxUserPort. The value range is 5000 to 65534 and the default is 5000. Setting it to 65534 is recommended.
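The number of ephemeral ports a given MaxUserPort setting yields is simple arithmetic. This sketch checks the allowed range of 5000 to 65534 stated above and takes the first port of the range as an explicit parameter; the example uses 1025, the start of the older default range 1025 to 5000 as Microsoft describes it. The function name is illustrative.

```python
def ephemeral_port_count(max_user_port: int, first_port: int) -> int:
    """Ports available from first_port through MaxUserPort, inclusive."""
    if not 5000 <= max_user_port <= 65534:
        raise ValueError("MaxUserPort must be between 5000 and 65534")
    return max_user_port - first_port + 1


if __name__ == "__main__":
    for value in (5000, 32768, 65534):
        print(value, ephemeral_port_count(value, first_port=1025))
    # 5000 3976
    # 32768 31744
    # 65534 64510
```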

6. Dynamic Backlog

Dynamic backlog lets the system adjust its configuration automatically to accept sudden bursts of connection requests. When a large number of connection requests arrive at once and exceed what the system can handle, dynamic backlog automatically increases the number of pending connections the system supports (connections that clients have requested but the server has not yet accepted; the total number of TCP connections includes both established and pending connections), thereby reducing connection failures. If both the processing capacity and the supported number of pending connections are exhausted, further client connection requests are rejected outright.

By default, Windows does not enable dynamic backlog; it can be enabled and configured as follows:

Navigate to the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\AFD\Parameters registry subkey and create or modify the following REG_DWORD values under the Parameters subkey:

  • EnableDynamicBacklog: a value of 1 enables dynamic backlog.
  • MinimumDynamicBacklog: a value of 128 sets the minimum number of pending connections supported to 128.
  • MaximumDynamicBacklog: a value of 2048 sets the maximum number of pending connections supported to 2048. For servers handling high-concurrency short connections, a maximum of 1024 or above is recommended.
  • DynamicBacklogGrowthDelta: a value of 128 means that whenever the supported number of pending connections runs short, it grows by 128 until the configured maximum (such as 2048) is reached.
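The growth rule in the list can be modelled as a simple sequence of limits. This Python sketch assumes, as the list describes, that the limit grows by exactly DynamicBacklogGrowthDelta each time it runs short and is capped at MaximumDynamicBacklog; when the driver decides to grow is not modelled, and the function name is illustrative.

```python
def backlog_growth(minimum: int, maximum: int, delta: int) -> list[int]:
    """Successive pending-connection limits, from MinimumDynamicBacklog
    up to MaximumDynamicBacklog in steps of DynamicBacklogGrowthDelta."""
    if minimum <= 0 or delta <= 0 or maximum < minimum:
        raise ValueError("invalid backlog settings")
    limits = [minimum]
    while limits[-1] < maximum:
        # Grow by delta, but never past the configured maximum.
        limits.append(min(limits[-1] + delta, maximum))
    return limits


if __name__ == "__main__":
    steps = backlog_growth(128, 2048, 128)
    print(len(steps) - 1, steps[:3], steps[-1])  # 15 [128, 256, 384] 2048
```

With the example values, the backlog can grow 15 times before reaching its cap.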

7. KeepAliveTime

The value of KeepAliveTime controls how often the system checks that an idle connection is still intact. If there has been no activity on the connection for this period, the system sends a keep-alive packet; if the network is working and the receiver is alive, it responds. If you need to detect a lost receiver more quickly, consider reducing this value. If there are many connections that stay idle for long periods but receivers are rarely lost, you may want to increase this value to reduce overhead.

By default, if there is no activity on an idle connection for 7200000 milliseconds (2 hours), the system will send a keep-alive message. It is generally recommended to set this value to 1800000 milliseconds so that a lost connection will be detected within 30 minutes.

Browse to the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\TCPIP\Parameters registry subkey, create or modify a REG_DWORD value named KeepAliveTime under the Parameters subkey, and set it to the desired number of milliseconds.

8. KeepAliveInterval

The value of KeepAliveInterval indicates how often the system repeatedly sends the "keep-alive" signal when no response is received from the other party. If the number of consecutive "keep-alive" signals exceeds the value of TcpMaxDataRetransmissions (described below) without any response, the connection will be abandoned. If the network environment is poor and a longer response time is allowed, consider increasing this value to reduce overhead; if you need to verify whether the recipient has been lost as soon as possible, consider reducing this value or the TcpMaxDataRetransmissions value.

By default, the system will wait for 1000 milliseconds (1 second) before resending the "keep alive" signal if no response is received. This can be modified according to specific needs. For specific operations:

Browse to the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\TCPIP\Parameters registry subkey, create or modify a REG_DWORD value named KeepAliveInterval under the Parameters subkey, and set it to the desired number of milliseconds.
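Taken together, KeepAliveTime, KeepAliveInterval, and TcpMaxDataRetransmissions determine roughly how long a dead peer can go unnoticed on an idle connection. This sketch follows the description above (an idle period, then probes spaced KeepAliveInterval apart, abandoned after TcpMaxDataRetransmissions unanswered attempts); the exact probe count is an approximation, so treat the result as an estimate. The function name is illustrative.

```python
def keepalive_detection_ms(keep_alive_time_ms: int,
                           keep_alive_interval_ms: int,
                           max_data_retransmissions: int) -> int:
    """Approximate time from the last activity until the connection is dropped."""
    return keep_alive_time_ms + max_data_retransmissions * keep_alive_interval_ms


if __name__ == "__main__":
    print(keepalive_detection_ms(7_200_000, 1_000, 5))  # 7205000 (defaults, about 2 h)
    print(keepalive_detection_ms(1_800_000, 1_000, 5))  # 1805000 (about 30 min)
```

The idle period dominates: shortening KeepAliveTime has far more effect than tuning the probe interval.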

9. TcpMaxDataRetransmissions

TcpMaxDataRetransmissions sets the number of times the system retransmits an unacknowledged data segment on an existing connection. If the network environment is very poor, you may need to increase this value to keep communication working and ensure that the receiver gets the data; if the network is very good, or lost data is usually caused by the receiver having disappeared, you can reduce this value to cut the time and overhead spent determining whether the receiver is gone.

By default, the system will resend the data segments that have not received a response 5 times. You can modify it according to specific needs. The specific operation is:

Browse to the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\TCPIP\Parameters registry subkey and create or modify a REG_DWORD value named TcpMaxDataRetransmissions under the Parameters subkey. The value range is 0 to 4294967295 and the default is 5; set it according to your environment.

10. TcpMaxConnectRetransmissions

The value of TcpMaxConnectRetransmissions sets the number of times TCP retransmits an unacknowledged connection request (SYN) before giving up. The retransmission timeout doubles with each successive retransmission. In Windows Server 2003, the default number of retransmissions is 2, and the initial timeout is 3 seconds (set by the registry value TCPInitialRTT). On slower WAN links the timeout can be increased accordingly. The best settings differ between environments and should be determined by testing in the actual environment. Do not set the timeout too large, or failed connection attempts will take a very long time to time out. The steps are as follows:

Browse to the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\TCPIP\Parameters registry subkey and create or modify a REG_DWORD value named TcpMaxConnectRetransmissions under the Parameters subkey. The value range is 0 to 255 and the default is 2; set it according to your environment. Then create or modify a REG_DWORD value named TCPInitialRTT under the Parameters subkey, also set according to your environment.
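With the timeout doubling on every retransmission, the time before a connection attempt fails is a geometric sum. This sketch follows the description above (initial timeout TCPInitialRTT, TcpMaxConnectRetransmissions retries); with the defaults of 3 seconds and 2 retransmissions it gives 3 + 6 + 12 = 21 seconds. The function names are illustrative.

```python
def syn_send_times(initial_rtt_s: float, max_connect_retransmissions: int) -> list[float]:
    """Times (seconds after the first SYN) at which SYNs are sent."""
    times, t, timeout = [], 0.0, initial_rtt_s
    for _ in range(max_connect_retransmissions + 1):
        times.append(t)
        t += timeout   # wait for this attempt's timeout ...
        timeout *= 2   # ... and double it for the next one
    return times


def connect_give_up_time(initial_rtt_s: float, max_connect_retransmissions: int) -> float:
    """Time at which the attempt fails: sum of all (doubling) timeouts."""
    return initial_rtt_s * (2 ** (max_connect_retransmissions + 1) - 1)


if __name__ == "__main__":
    print(syn_send_times(3, 2))        # [0.0, 3.0, 9.0]
    print(connect_give_up_time(3, 2))  # 21
```

Each extra retransmission roughly doubles the worst-case wait, which is why large values make failed connections slow to report.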

11. TcpAckFrequency

The value of TcpAckFrequency controls how often the system sends acknowledgments (ACKs). With a value of 2, the system sends an ACK after receiving 2 segments, or after receiving 1 segment and no further segment within 200 milliseconds; with a value of 3, it sends an ACK after receiving 3 segments, or after receiving 1 or 2 segments and no further segment within 200 milliseconds, and so on. To shorten response times by eliminating the acknowledgment delay, set this value to 1, so that every segment is acknowledged immediately. If the connection mainly carries large amounts of data and the 200 millisecond delay does not matter, you can increase this value to reduce acknowledgment overhead.

By default, the system sets this value to 2, which means that the reply is sent every other segment. The valid range of this value is 0 to 255, where 0 means the default value 2 is used. You can modify it according to your specific needs. The specific operation is:

Browse to the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\TCPIP\Parameters\Interfaces\xx registry subkey (xx is the subkey for the network adapter, named by the adapter's GUID) and create or modify a REG_DWORD value named TcpAckFrequency under it. The value range is 1 to 13 and the default is 2. Set it to the number of segments that should be received before an ACK is sent. A value of 5 is recommended for a 100 Mbit/s network and 13 for a 1 Gbit/s network.
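The delayed-acknowledgment rule can be simulated to see when ACKs go out for a given pattern of arriving segments. This is a simplified model: it assumes the 200 ms timer starts at the first unacknowledged segment, treats a value of 0 as the default of 2 (as described above), and ignores other events, such as outgoing data, that can carry an ACK early. Arrival times are in milliseconds and made up for the example; the function name is illustrative.

```python
def ack_times(arrivals_ms: list[int], ack_frequency: int, delay_ms: int = 200) -> list[int]:
    """Return the times (ms) at which ACKs are sent for the given arrivals."""
    if ack_frequency == 0:
        ack_frequency = 2  # 0 means "use the default of 2"
    acks: list[int] = []
    pending = 0      # segments received but not yet acknowledged
    deadline = None  # when the delayed-ACK timer fires, if it is running
    for t in sorted(arrivals_ms):
        if deadline is not None and deadline <= t:
            acks.append(deadline)  # timer expired before this segment arrived
            pending, deadline = 0, None
        pending += 1
        if pending >= ack_frequency:
            acks.append(t)  # enough segments received: acknowledge now
            pending, deadline = 0, None
        elif deadline is None:
            deadline = t + delay_ms  # start the timer at the first unacked segment
    if deadline is not None:
        acks.append(deadline)  # final timer expiry for leftover segments
    return acks


if __name__ == "__main__":
    print(ack_times([0, 10, 20, 30], 2))  # [10, 30]
    print(ack_times([0], 2))              # [200]
    print(ack_times([0, 10, 20], 1))      # [0, 10, 20]
```

The second case shows the latency cost of delayed ACKs on request/response traffic: a lone segment waits the full 200 ms, which a value of 1 removes.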
