You decide whether it is hard or not! Nearly 40 diagrams to explain the TCP three-way handshake and four-way handshake interview questions that are asked thousands of times

Preface

Regardless of whether you are interviewing for a Java, C/C++, Python, or other development position, TCP is a topic you are almost guaranteed to be asked about.

No matter how many times TCP abuses me, I will still treat TCP like my first love.

I recall that during campus recruitment, Xiaolin was often turned down because of TCP interview questions. It was really a love-hate relationship...

The past doesn't matter. Let's eliminate this fear today and face TCP bravely, with a smile!

So Xiaolin has sorted out the interview questions about the TCP three-way handshake and four-way wave (connection teardown) to discuss with everyone.

  • TCP Basics

  • TCP connection establishment

  • TCP connection disconnection

  • Socket Programming

PS: This article does not cover TCP flow control, congestion control, reliable transmission and other aspects. These will be left for the next article!


1. Basic knowledge of TCP

Look at the TCP header format

Let's first look at the format of the TCP header. The color-coded fields are those that are most relevant to this article, and the other fields are not described in detail.

(1) Sequence number: a random number generated when the connection is established serves as its initial value, and it is carried to the receiving host in the SYN packet. Each time data is sent, the sequence number is increased by the number of data bytes sent. It is used to solve the problem of out-of-order network packets.

(2) Acknowledgment number: the sequence number of the data that is "expected" to be received next. Once the sender receives this acknowledgment, it can assume that all data before this sequence number has been received normally. It is used to solve the problem of packet loss.

(3) Control bit:

  • ACK: When this bit is 1, the "acknowledgment number" field becomes valid. TCP stipulates that this bit must be set to 1 in every segment except the initial SYN that establishes the connection.
  • RST: When this bit is 1, it indicates that an abnormality has occurred in the TCP connection and the connection must be forcibly torn down.
  • SYN: When this bit is 1, it indicates the wish to establish a connection, and the initial sequence number is placed in the "sequence number" field.
  • FIN: When this bit is 1, it means that no more data will be sent and the sender wants to close the connection. When communication ends and both sides want to disconnect, the hosts on each side can exchange TCP segments with the FIN bit set to 1.
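
To make the layout more concrete, here is a small sketch of these header fields as a C struct. It is purely illustrative (field packing, exact bit layout, and byte order are glossed over) and is not the kernel's struct tcphdr:

    /* Simplified sketch of the TCP header fields discussed above.
       Illustrative only: real code uses the kernel's struct tcphdr and
       handles network byte order and the exact bit layout. */
    #include <stdint.h>

    struct tcp_header_sketch {
        uint16_t source_port;
        uint16_t dest_port;
        uint32_t sequence_number;   /* starts at a random ISN, grows by the
                                       number of data bytes sent            */
        uint32_t ack_number;        /* next sequence number expected        */
        uint8_t  data_offset;       /* header length in 32-bit words        */
        uint8_t  flags;             /* control bits: FIN/SYN/RST/ACK ...    */
        uint16_t window_size;       /* receive window, used for flow control */
        uint16_t checksum;
        uint16_t urgent_pointer;
    };

    /* Control-bit masks for the flags byte above. */
    #define TCP_FLAG_FIN 0x01
    #define TCP_FLAG_SYN 0x02
    #define TCP_FLAG_RST 0x04
    #define TCP_FLAG_ACK 0x10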

2. Why do we need TCP protocol? At which layer does TCP work?

The IP layer is "unreliable". It does not guarantee the delivery of network packets, the in-order delivery of network packets, or the integrity of the data in the network packets.

Relationship between OSI reference model and TCP/IP

If the reliability of network data packets needs to be guaranteed, the upper layer (transport layer) TCP protocol needs to be responsible for it.

Because TCP is a reliable data transmission service working at the transport layer, it can ensure that the network packets received by the receiver are damage-free, gap-free, non-redundant and in-order.

3. What is TCP?

TCP is a connection-oriented, reliable, byte stream-based transport layer communication protocol.

  • Connection-oriented: the connection must be "one-to-one". Unlike UDP, where one host can send messages to multiple hosts at the same time, TCP cannot do one-to-many;
  • Reliable: no matter what changes occur in the network link, TCP can ensure that a segment reaches the receiving end;
  • Byte stream: messages have "no boundaries", so a message of any size can be transmitted. Messages are also "ordered": if an earlier piece of data has not yet been received, later bytes that have already arrived cannot be handed to the application layer, and "duplicate" segments are automatically discarded.

4. What is a TCP connection?

Let's take a look at how RFC 793 defines "connection":

Connections:

The reliability and flow control mechanisms described above require that TCPs initialize and maintain certain status information for each data stream.

The combination of this information, including sockets, sequence numbers, and window sizes, is called a connection.

Simply put, a connection is the combination of the state information maintained to guarantee reliability and flow control, including the Socket, the sequence numbers, and the window sizes.

So we can know that establishing a TCP connection requires the client and the server to reach a consensus on the above three pieces of information.

  • Socket: consists of IP address and port number
  • Sequence number: used to solve disorder problems, etc.
  • Window size: used for flow control

5. How to uniquely identify a TCP connection?

The TCP four-tuple can uniquely identify a connection. The four-tuple includes the following:

  • Source Address
  • Source Port
  • Destination Address
  • Destination Port

The source address and destination address fields (32 bits) are in the IP header and are used to send messages to the other host via the IP protocol.

The source port and destination port fields (16 bits) are in the TCP header and their function is to tell the TCP protocol to which process the message should be sent.
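
As a tiny illustration, the four-tuple can be pictured as a plain C struct (IPv4 case); this is just a sketch for readability, not a real kernel data structure:

    #include <stdint.h>

    /* Sketch of the four-tuple that uniquely identifies a TCP connection. */
    struct tcp_four_tuple {
        uint32_t source_address;      /* 32 bits, from the IP header  */
        uint32_t destination_address; /* 32 bits, from the IP header  */
        uint16_t source_port;         /* 16 bits, from the TCP header */
        uint16_t destination_port;    /* 16 bits, from the TCP header */
    };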

6. A server with an IP address listens to a port. What is the maximum number of TCP connections?

The server usually listens on a fixed local port, waiting for the client's connection request.

Therefore, the client IP and port are variable, and their theoretical value calculation formula is as follows:

For IPv4, there are at most 2 to the power of 32 client IP addresses and at most 2 to the power of 16 client ports, so the theoretical maximum number of TCP connections a single server can hold is their product, about 2 to the power of 48.

Of course, the maximum number of concurrent TCP connections on the server side is far from reaching the theoretical upper limit.

  • The first and main one is the file descriptor limit. A socket is a file, so the number of open file descriptors is capped and can be configured with ulimit (see the sketch after this list);
  • The other is the memory limit. Each TCP connection takes up a certain amount of memory, and the operating system's memory is finite.
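
Assuming a Linux environment, a process can check and raise its own file descriptor limit with getrlimit/setrlimit, the programmatic counterpart of ulimit -n. A minimal sketch (the hard limit still caps what an unprivileged process may request):

    #include <stdio.h>
    #include <sys/resource.h>

    int main(void) {
        struct rlimit rl;

        /* RLIMIT_NOFILE bounds how many file descriptors (and thus sockets,
           i.e. TCP connections) this process can hold open at once. */
        if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
            perror("getrlimit");
            return 1;
        }
        printf("soft limit: %llu, hard limit: %llu\n",
               (unsigned long long)rl.rlim_cur,
               (unsigned long long)rl.rlim_max);

        rl.rlim_cur = rl.rlim_max;   /* raise the soft limit up to the hard limit */
        if (setrlimit(RLIMIT_NOFILE, &rl) != 0) {
            perror("setrlimit");
            return 1;
        }
        return 0;
    }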

7. What is the difference between UDP and TCP? What are their respective application scenarios?

UDP does not provide complex control mechanisms and uses IP to provide "connectionless" communication services.

The UDP protocol is really very simple, with only 8 bytes (64 bits) in the header. The UDP header format is as follows:

  • Destination and source ports: mainly tell the UDP protocol to which process the message should be sent.
  • Packet length: This field stores the sum of the length of the UDP header and the length of the data.
  • Checksum: the checksum is there to verify that the UDP header and data arrived intact.

Differences between TCP and UDP:

(1) Connection:

  • TCP is a connection-oriented transport layer protocol, and a connection must be established before data is transmitted.
  • UDP does not require a connection and can transmit data immediately.

(2) Service Targets:

  • TCP is a one-to-one two-point service, that is, a connection has only two endpoints.
  • UDP supports one-to-one, one-to-many, and many-to-many interactive communications

(3) Reliability:

  • TCP delivers data reliably: data arrives in order, without errors, loss, or duplication.
  • UDP is a best-effort delivery method and does not guarantee reliable delivery of data.

(4) Congestion control and flow control:

  • TCP has congestion control and flow control mechanisms to ensure the security of data transmission.
  • UDP does not have this feature. Even if the network is very congested, it will not affect the sending rate of UDP.

(5) Header overhead:

  • The TCP header is relatively long and has a certain amount of overhead. The header is 20 bytes when the "option" field is not used. If the "option" field is used, it will become longer.
  • The UDP header is only 8 bytes and is fixed, with low overhead.

TCP and UDP application scenarios:

Since TCP is connection-oriented and can ensure reliable data delivery, it is often used for:

  • FTP file transfer
  • HTTP/HTTPS

Since UDP is connectionless, it can send data at any time. In addition, UDP processing is simple and efficient, so it is often used for:

  • Communications with a small amount of packets, such as DNS, SNMP, etc.
  • Multimedia communications such as video and audio
  • Broadcast Communications

8. Why does the UDP header not have a "Header Length" field, while the TCP header has a "Header Length" field?

The reason is that TCP has a variable-length "option" field, while the UDP header length does not change, so there is no need for an extra field to record the UDP header length.

9. Why does the UDP header have a "packet length" field, but the TCP header does not have a "packet length" field?

Let's first talk about how TCP calculates the payload data length:

The IP total length and the IP header length are known from the IP header, and the TCP header length is known from the TCP header, so the length of the TCP data can be derived.
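
Written out as a simple formula, that is:

    TCP data length = IP total length - IP header length - TCP header length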

You might be curious: "UDP is also carried by the IP layer, so couldn't UDP's data length be calculated with the same formula? Why does it still need a 'packet length' field?"

Put that way, the UDP "packet length" does indeed look redundant.

One possible reason is that, to make network device hardware design and processing easier, a header length should be an integer multiple of 4 bytes.

If the UDP "packet length" field were removed, the UDP header would no longer be an integer multiple of 4 bytes. So Xiaolin suspects the "packet length" field exists to pad the UDP header out to a multiple of 4 bytes.

2. TCP connection establishment

1. TCP three-way handshake process and state transition

TCP is a connection-oriented protocol, so a connection must be established before using TCP, and the connection is established through a three-way handshake.

TCP three-way handshake

  • At the beginning, both the client and the server are in the CLOSED state. First, the server actively listens to a port and is in the LISTEN state.

The first message - SYN message

  • The client will randomly initialize the sequence number (client_isn), put this sequence number in the "sequence number" field of the TCP header, and set the SYN flag to 1, indicating a SYN message. Then the first SYN message is sent to the server, indicating that a connection is initiated to the server. This message does not contain application layer data, and the client is then in the SYN-SENT state.

The second message - SYN + ACK message

  • After receiving the SYN message from the client, the server first randomly initializes its own sequence number (server_isn), fills this sequence number into the "sequence number" field of the TCP header, then fills the "acknowledgment number" field of the TCP header with client_isn + 1, and sets both the SYN and ACK flags to 1. Finally, the message is sent to the client; it does not contain application layer data, and the server is then in the SYN-RCVD state.

The third message - ACK message

  • After the client receives the message from the server, it still has to respond with a final acknowledgment. First, the ACK flag in the TCP header of this response is set to 1, then the "acknowledgment number" field is filled with server_isn + 1, and finally the message is sent to the server. This message may carry data from the client to the server; the client then enters the ESTABLISHED state.
  • After receiving the response message from the client, the server also enters the ESTABLISHED state.
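
Putting the three messages together, the exchange above can be sketched like this (client_isn and server_isn are the two sides' initial sequence numbers):

    Client                                               Server (LISTEN)
      | --- SYN, seq = client_isn ----------------------> |   client: SYN-SENT
      | <-- SYN + ACK, seq = server_isn,                  |
      |                ack = client_isn + 1 ------------- |   server: SYN-RCVD
      | --- ACK, ack = server_isn + 1 ------------------> |   client: ESTABLISHED
      |          (may already carry data)                 |   server: ESTABLISHED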

From the above process, we can find that the third handshake can carry data, while the first two handshakes cannot carry data. This is also a frequently asked question in interviews.

Once the three-way handshake is completed, both parties are in the ESTABLISHED state, and the connection is established. The client and server can send data to each other.

2. How to check TCP status in Linux?

To view the TCP connection status, you can use the netstat -napt command in Linux.

3. Why is it a three-way handshake? Not two or four?

I believe the most common answer is: "Because the three-way handshake can ensure that both parties have the ability to receive and send."

There is nothing wrong with this answer, but it is one-sided and does not state the main reason.

Earlier we learned what a TCP connection is:

  • Certain state information maintained to guarantee reliability and flow control; the combination of this information, including the Socket, sequence numbers, and window sizes, is called a connection.

Therefore, it is important to understand why a three-way handshake is required to initialize the Socket, sequence number, and window size and establish a TCP connection.

Next, we analyze the reasons for the three-way handshake from three aspects:

  • The three-way handshake can prevent the initialization of repeated historical connections (main reason)
  • Three-way handshake is required to synchronize the initial sequence numbers of both parties.
  • Three-way handshake can avoid resource waste

(1) Reason 1: Avoid historical connections

Let's look at the primary reason why TCP connections use a three-way handshake, as stated in RFC 793:

The principle reason for the three-way handshake is to prevent old duplicate connection initiations from causing confusion.

In short, the primary reason for the three-way handshake is to prevent confusion caused by old duplicate connection initializations.

The network environment is complicated, and the first data packet sent is not necessarily the first to reach the target host. Due to network congestion and all sorts of other reasons, an old data packet may arrive at the target host first. How does the TCP three-way handshake avoid trouble in this case?

Suppose the client sends multiple SYN packets to establish a connection. Under network congestion, the following can happen:

  • An "old SVN message" arrives at the server earlier than the "latest SYN" message;
  • Then the server will return a SYN + ACK message to the client;
  • After receiving the message, the client can determine that this is a historical connection (the sequence number has expired or timed out) based on its own context, and then the client will send a RST message to the server to indicate that the connection is terminated.

With a two-way handshake, there is no way to determine whether the current connection is a historical one. A three-way handshake lets the client (the sender) decide, because it has enough context by the time it is ready to send the third message:

  • If it is a historical connection (sequence number expired or timed out), the message sent in the third handshake is a RST, terminating the historical connection;
  • If it is not a historical connection, the third message sent is an ACK message, and the two communicating parties will successfully establish a connection;

Therefore, the main reason why TCP uses three-way handshake to establish a connection is to prevent historical connections from initializing the connection.

(2) Reason 2: Synchronizing the initial sequence numbers of both parties

Both parties in the TCP protocol must maintain a "sequence number". The sequence number is a key factor in reliable transmission. Its functions are:

  • The receiver can remove duplicate data;
  • The receiver can receive the packets in order according to their sequence numbers;
  • It can identify which of the sent data packets have been received by the other party.

It can be seen that the sequence number plays a very important role in a TCP connection. So when the client sends a SYN message carrying its "initial sequence number", the server needs to reply with an ACK to indicate that the client's SYN message has been received; and when the server sends its own "initial sequence number" to the client, it likewise needs an acknowledgment from the client. This one round trip in each direction ensures that both parties' initial sequence numbers are reliably synchronized.

Four-way handshake and three-way handshake

A four-way handshake could also reliably synchronize both parties' initial sequence numbers, but the second and third steps can be merged into one, which turns it into a "three-way handshake".

The two-way handshake only ensures that the initial sequence number of one party can be successfully received by the other party, but there is no way to ensure that the initial sequence numbers of both parties can be confirmed and received.

(3) Reason 3: Avoiding waste of resources

If there were only a "two-way handshake", then when the client's SYN is blocked in the network and the client receives no acknowledgment, it will resend the SYN. Since there is no third handshake, the server has no way to know whether the client ever received the ACK it sent for establishing the connection, so every time it receives a SYN it can only go ahead and establish a connection. What does this cause?

If the client's SYN is blocked and SYN messages are sent repeatedly, then after receiving these requests the server will establish multiple redundant, useless connections, causing unnecessary waste of resources.

Two handshakes will cause a waste of resources

In other words, with a two-way handshake, stale SYN messages lingering in the network cause the server to repeatedly accept useless connection requests and allocate resources for them again and again.

(4) Summary

When TCP establishes a connection, the three-way handshake can prevent the establishment of a historical connection, reduce unnecessary resource consumption for both parties, and help both parties synchronize and initialize the sequence number. The sequence number can ensure that data packets are not repeated, discarded, and transmitted in order.

Reasons for not using "two-way handshake" and "four-way handshake":

  • "Two-way handshake": It cannot prevent the establishment of historical connections, which will cause waste of resources on both sides, and it is also impossible to reliably synchronize the sequence numbers of both sides;
  • "Four-way handshake": Three handshakes are enough to establish a reliable connection in theory, so there is no need to use more communications.

4. Why are the initial sequence numbers (ISNs) of the client and server different?

Because messages in the network may be delayed, copied and resent, or lost, this may cause different connections to affect each other. Therefore, in order to avoid mutual impact, the initial sequence numbers of the client and the server are random and different.

5. How is the Initial Sequence Number (ISN) randomly generated?

In the original scheme, the starting ISN is based on a clock that is incremented by 1 roughly every 4 microseconds, making one full revolution in about 4.55 hours.

RFC1948 proposes a better random generation algorithm for the initialization sequence number ISN.

ISN = M + F (localhost, localport, remotehost, remoteport)

  • M is a timer that is incremented every 4 microseconds.
  • F is a hash algorithm that generates a random value based on the source IP, destination IP, source port, and destination port. To ensure that the hash algorithm cannot be easily deduced by the outside, the MD5 algorithm is a better choice.
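
Below is a hedged C sketch of this style of ISN generation. The helpers hash32() and timer_4us() are hypothetical stand-ins: a real implementation uses a keyed MD5 (or stronger) hash and the kernel's own clock, so treat this only as an illustration of ISN = M + F(...):

    #include <stdint.h>
    #include <time.h>

    /* Placeholder for F(): a keyed hash over the four-tuple.
       Real stacks use MD5 (or better) with a secret key; this simple
       mixing is NOT cryptographic. */
    static uint32_t hash32(uint32_t laddr, uint16_t lport,
                           uint32_t raddr, uint16_t rport) {
        uint32_t h = laddr ^ raddr;
        h ^= ((uint32_t)lport << 16) | rport;
        return h * 2654435761u;
    }

    /* M: a timer that advances once every 4 microseconds. */
    static uint32_t timer_4us(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        uint64_t us = (uint64_t)ts.tv_sec * 1000000ull + (uint64_t)ts.tv_nsec / 1000;
        return (uint32_t)(us / 4);
    }

    /* ISN = M + F(localhost, localport, remotehost, remoteport) */
    uint32_t generate_isn(uint32_t laddr, uint16_t lport,
                          uint32_t raddr, uint16_t rport) {
        return timer_4us() + hash32(laddr, lport, raddr, rport);
    }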

6. Since the IP layer can fragment, why does the TCP layer still need MSS?

Let's first understand MTU and MSS

  • MTU: The maximum length of a network packet, usually 1500 bytes in Ethernet;
  • MSS: The maximum length of TCP data that a network packet can contain after removing the IP and TCP headers.
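
As a concrete example, assuming Ethernet and no IP or TCP options:

    MSS = MTU - IP header - TCP header
        = 1500 - 20 - 20
        = 1460 bytes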

If the entire TCP message (header + data) is handed over to the IP layer for fragmentation, what abnormality will occur?

When the IP layer has data (TCP header + TCP data) that exceeds the MTU size to be sent, the IP layer will fragment the data into several pieces to ensure that each piece is smaller than the MTU. After an IP datagram is fragmented, it is reassembled by the IP layer of the target host and then handed over to the upper TCP transport layer.

This seems to be in good order, but there is a hidden danger. If an IP fragment is lost, all fragments of the entire IP message must be retransmitted.

Because the IP layer itself does not have a timeout retransmission mechanism, the TCP of the transport layer is responsible for timeout and retransmission.

When the receiver finds that a piece of the TCP message (header + data) is lost, it will not respond to the other party with an ACK. Then the sender's TCP will resend the "entire TCP message (header + data)" after timeout.

Therefore, it can be concluded that fragmented transmission at the IP layer is very inefficient.

Therefore, to achieve the best transmission efficiency, the TCP protocol usually negotiates both sides' MSS values when the connection is established. When the TCP layer finds that the data exceeds the MSS, it splits the data itself, so the resulting IP packets are never larger than the MTU and IP fragmentation is naturally unnecessary.

After the data has been split at the TCP layer, if a TCP segment is lost, retransmission happens in units of MSS rather than retransmitting all of the fragments, which greatly improves retransmission efficiency.

7. What is a SYN attack? How to avoid SYN attacks?

(1) SYN attack

We all know that establishing a TCP connection requires a three-way handshake. Suppose an attacker forges SYN messages with different source IP addresses in a short period of time. Every time the server receives a SYN message, it enters the SYN_RCVD state, but the SYN + ACK messages it sends back never get ACK responses from those unknown IP hosts. Over time, the server's SYN receive queue (half-open connection queue) fills up, leaving the server unable to serve normal users.

(2) Method 1 to avoid SYN attacks

One solution is to modify the Linux kernel parameters to control the queue size and what to do when the queue is full.

  • When the network card receives packets faster than the kernel can process them, a queue is used to hold these packets; its maximum size is controlled by:
    net.core.netdev_max_backlog
  • Maximum number of connections in the SYN_RCVD state:
    net.ipv4.tcp_max_syn_backlog
  • When the processing capacity is exceeded, return a RST directly to new SYNs and discard the connection:
    net.ipv4.tcp_abort_on_overflow

(3) Method 2 to avoid SYN attacks

Let's first look at how the Linux kernel's SYN queue (for connections not yet fully established) and Accept queue (for fully established connections) work.

Normal process:

  • When the server receives the SYN message from the client, it will add it to the kernel's "SYN queue";
  • Then send SYN + ACK to the client and wait for the client to respond with an ACK message;
  • After receiving the ACK message, the server removes the entry from the "SYN queue" and puts it into the "Accept queue";
  • The application calls the accept() socket interface to take the connection out of the "Accept queue".

Application is too slow:

  • If the application is too slow, the "Accept Queue" will be full.

Under SYN attack:

  • If the system is continuously under SYN attack, the "SYN queue" will fill up.

The tcp_syncookies method can be used to deal with SYN attacks:

    net.ipv4.tcp_syncookies = 1

  • When the "SYN queue" is full, the subsequent servers receive SYN packets and do not enter the "SYN queue";
  • Instead, a cookie value is computed and returned to the client in the "sequence number" field of the SYN + ACK;
  • When the server receives the client's response message, it checks the legitimacy of the ACK packet; if it is legal, the connection is placed directly into the "Accept queue";
  • Finally, the application calls the accept() socket interface to take the connection out of the "Accept queue".

3. TCP connection disconnection

1. TCP four-wave process and state transition

All good things must come to an end, and this is also true for TCP connections. TCP disconnects by waving four times.

Both parties can actively disconnect, and after disconnection, the "resources" in the host will be released.

The client actively closes the connection - TCP wave four times

  • The client intends to close the connection and sends a message with the FIN flag in the TCP header set to 1, which is also called a FIN message. The client then enters the FIN_WAIT_1 state.
  • After receiving the FIN, the server sends an ACK response message to the client, and the server then enters the CLOSE_WAIT state.
  • After the client receives the ACK response message from the server, it enters the FIN_WAIT_2 state.
  • After the server has processed the data, it also sends a FIN message to the client, and then the server enters the LAST_ACK state.
  • After receiving the FIN message from the server, the client returns an ACK response message and then enters the TIME_WAIT state.
  • After the server receives the ACK response message, it enters the CLOSED state; the server side has now finished closing the connection.
  • After 2MSL, the client automatically enters the CLOSED state; the client side has also finished closing the connection.
  • You can see that a FIN and an ACK are required in each direction, so it is often called four waves.
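
The exchange described above can be sketched like this (here the client is the active closer):

    Client (active close)                            Server (passive close)
      | --- FIN ------------------------------------> |   client: FIN_WAIT_1
      | <-- ACK -------------------------------------- |   server: CLOSE_WAIT, client: FIN_WAIT_2
      |        (server finishes sending its data)      |
      | <-- FIN -------------------------------------- |   server: LAST_ACK
      | --- ACK ------------------------------------> |   client: TIME_WAIT
      |        (client waits 2MSL, then CLOSED)        |   server: CLOSED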

One thing to note here is that only when the connection is closed actively will there be a TIME_WAIT state.

2. Why does it take four waves?

Reviewing the process above, in which each side sends its own FIN, we can understand why four messages are needed.

  • When closing the connection, when the client sends a FIN to the server, it only means that the client will no longer send data but can still receive data.
  • When the server receives the FIN message from the client, it first returns an ACK response message. The server may still have data to process and send. When the server no longer sends data, it sends a FIN message to the client to indicate that it agrees to close the connection now.

From the above process we can see that the server usually needs to wait until it has finished sending and processing its data, so the server's ACK and FIN are generally sent separately, which results in one more message than the three-way handshake.

3. Why is the waiting time of TIME_WAIT 2MSL?

MSL stands for Maximum Segment Lifetime, the longest time a segment can survive in the network; any segment that exceeds this time is discarded. TCP segments are carried by the IP protocol, and the IP header has a TTL field, which is the maximum number of routers an IP datagram can pass through. Each router that processes the datagram decrements this value by 1; when it reaches 0, the datagram is discarded and an ICMP message is sent to notify the source host.

The difference between MSL and TTL: MSL is measured in time, while TTL is measured in routing hops. Therefore, MSL should be greater than or equal to the time it takes for the TTL to be decremented to 0, to ensure that the segment has naturally died out.

A reasonable explanation for waiting 2 MSL in TIME_WAIT is that packets from the sender may still be in the network; when the receiver processes such a packet it may send a response back to the other side, so you have to allow time for a full round trip, i.e. twice the MSL.

For example, if the passive closer does not receive the final ACK of the teardown, it times out and resends its FIN. When the other side receives this FIN, it resends the ACK to the passive closer. One such round trip is exactly 2 MSL.

The 2MSL timer starts when the client sends its ACK after receiving the FIN. If, during TIME-WAIT, the client receives a FIN resent by the server because the client's ACK never reached the server, the 2MSL timer restarts.

In Linux, 2MSL defaults to 60 seconds, so 1MSL is 30 seconds. The Linux system stays in TIME_WAIT for a fixed 60 seconds.

Its name defined in the Linux kernel code is TCP_TIMEWAIT_LEN:

    #define TCP_TIMEWAIT_LEN (60*HZ) /* how long to wait to destroy TIME-WAIT state, about 60 seconds */

If you want to change the length of TIME_WAIT, you can only modify the value of TCP_TIMEWAIT_LEN in the Linux kernel code and recompile the Linux kernel.

4. Why is TIME_WAIT state needed?

The TIME-WAIT state will only occur on the party that actively initiates the closing of the connection.

The TIME-WAIT state is needed mainly for two reasons:

  • Prevent "old" packets with the same "quadruple" from being received;
  • Ensure that the "passively closed connection" party can be closed correctly, that is, ensure that the final ACK can be received by the passive closing party, thereby helping it to close normally.

(1) Reason 1: Preventing data packets from old connections

Assuming that TIME-WAIT has no waiting time or the time is too short, what will happen after the delayed data packet arrives?

Exception in receiving historical data

  • As shown in the yellow box in the figure above, a SEQ = 301 message sent by the server before closing the connection is delayed in the network.
  • If a new TCP connection then reuses the same four-tuple, the delayed SEQ = 301 packet may arrive at the client and be accepted as normal data, causing serious problems such as data corruption.

Therefore, TCP has designed such a mechanism. After 2MSL, the data packets in both directions are discarded, so that the data packets of the original connection disappear naturally in the network, and the data packets that appear again must be generated by the newly established connection.

(2) Reason 2: Ensure the connection is closed correctly

RFC 793 points out that another important role of TIME-WAIT is:

TIME-WAIT - represents waiting for enough time to pass to be sure the remote TCP received the acknowledgment of its connection termination request.

That is to say, the role of TIME-WAIT is to wait for enough time to ensure that the final ACK can be received by the passive closing party, thereby helping it to close normally.

Assuming that TIME-WAIT has no waiting time or the time is too short, what problems will the disconnection cause?

Exceptions that do not ensure a normal disconnect

  • As shown in the red box in the figure above, if the final ACK of the client's four waves is lost in the network and the client's TIME-WAIT is too short or absent, the client goes straight to the CLOSED state while the server remains stuck in the LAST-ACK state.
  • When the client later sends a SYN to establish a new connection, the server (still in LAST-ACK) replies with a RST, and the connection establishment is aborted.

If TIME-WAIT waits long enough, two situations will occur:

  • If the server receives the last ACK message of four waves normally, the server closes the connection normally.
  • If the server does not receive the last ACK message after four waves, it will resend the FIN connection closing message and wait for a new ACK message.

Therefore, after the client waits for 2MSL time in the TIME-WAIT state, it can be guaranteed that the connection between both parties can be closed normally.

5. What are the dangers of too much TIME_WAIT?

If the server has TCP in TIME-WAIT state, it means that the disconnection request was actively initiated by the server.

There are two main hazards of too much TIME-WAIT state:

  • The first is memory resource usage;
  • The second is the occupation of port resources. A TCP connection consumes at least one local port.

The second hazard can have serious consequences. Port resources are also limited: the range of ephemeral ports that can be used is generally 32768~61000, and it can also be specified with the following parameter:

    net.ipv4.ip_local_port_range

If there are too many TIME_WAIT states on the server side and all port resources are occupied, new connections cannot be created.

6. How to optimize TIME_WAIT?

Here are several ways to optimize TIME-WAIT, all with advantages and disadvantages:

  • Turn on net.ipv4.tcp_tw_reuse and net.ipv4.tcp_timestamps options;
  • net.ipv4.tcp_max_tw_buckets
  • Use SO_LINGER in the program, forcing the connection to be closed with a RST.

(1) Method 1: net.ipv4.tcp_tw_reuse and tcp_timestamps

When the following Linux kernel parameters are enabled, the socket in TIME_WAIT can be reused for new connections.

    net.ipv4.tcp_tw_reuse = 1

To use this option, there is another prerequisite, which is to enable support for TCP timestamps, namely:

    net.ipv4.tcp_timestamps = 1 (default is 1)

This timestamp field is in the "options" of the TCP header and is used to record the current timestamp of the TCP sender and the latest timestamp received from the peer.

With the introduction of timestamps, the 2MSL problem we mentioned earlier no longer exists, because duplicate data packets will be naturally discarded due to expired timestamps.

A friendly reminder: net.ipv4.tcp_tw_reuse should be used with caution, because it requires timestamp support (net.ipv4.tcp_timestamps). When the client and server clocks are not synchronized, the client's messages may be rejected outright. Xiaolin ran into this at work... it took a long time to troubleshoot.

(2) Method 2: net.ipv4.tcp_max_tw_buckets

The default value is 18000. Once the number of connections in the TIME_WAIT state exceeds this value, the system will reset the subsequent TIME_WAIT connections.

This method is too violent and only treats the symptoms rather than the root cause. It creates far more problems than it solves, so it is not recommended.

(3) Method 3: Using SO_LINGER in the program

We can set the behavior of calling close to close the connection by setting the socket options.

    struct linger so_linger;
    so_linger.l_onoff = 1;    /* turn the linger option on                  */
    so_linger.l_linger = 0;   /* linger time 0: close() sends a RST at once */
    setsockopt(s, SOL_SOCKET, SO_LINGER, &so_linger, sizeof(so_linger));

If l_onoff is non-zero and l_linger is 0, a RST flag will be sent to the peer immediately after close is called. The TCP connection will skip four waves, i.e. the TIME_WAIT state, and will be closed directly.

However, while this offers a way to skip the TIME_WAIT state, it is a very dangerous behavior and not worth promoting.

7. What if the connection is established but the client suddenly fails?

TCP has a keep-alive mechanism. The principle of this mechanism is as follows:

Define a time period. If there is no connection-related activity within this period, the TCP keep-alive mechanism kicks in: a probe message carrying very little data is sent at fixed intervals. If several consecutive probes get no response, the TCP connection is considered dead, and the kernel notifies the application of the error.

In the Linux kernel, there are corresponding parameters to set the keep-alive time, the number of keep-alive detections, and the time interval of keep-alive detections. The following are the default values:

    net.ipv4.tcp_keepalive_time = 7200
    net.ipv4.tcp_keepalive_intvl = 75
    net.ipv4.tcp_keepalive_probes = 9
  • tcp_keepalive_time=7200: indicates that the keepalive time is 7200 seconds (2 hours), that is, if there is no connection-related activity within 2 hours, the keepalive mechanism will be activated
  • tcp_keepalive_intvl=75: means each detection interval is 75 seconds;
  • tcp_keepalive_probes=9: means that if there is no response after 9 detections, the other party is considered unreachable and the connection is terminated.

That is to say, in Linux system, it takes at least 2 hours, 11 minutes and 15 seconds to find a "dead" connection.

This time is a bit long. We can also set the above keep-alive related parameters according to actual needs.
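
These defaults can also be overridden per socket. Below is a hedged sketch (Linux-specific TCP socket options, error handling trimmed, timing values are just examples) of enabling keep-alive on a connected socket and shortening its timers:

    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>

    /* Enable TCP keep-alive on one socket and tighten its timers.
       TCP_KEEPIDLE / TCP_KEEPINTVL / TCP_KEEPCNT override the
       net.ipv4.tcp_keepalive_* defaults for this socket only. */
    int enable_keepalive(int fd) {
        int on = 1;
        int idle = 60;    /* start probing after 60 s of idleness */
        int intvl = 10;   /* send a probe every 10 s               */
        int cnt = 5;      /* give up after 5 unanswered probes     */

        if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
            return -1;
        if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0)
            return -1;
        if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) < 0)
            return -1;
        if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0)
            return -1;
        return 0;
    }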

If TCP keepalive is enabled, the following situations need to be considered:

  • The first is that the peer program is working normally. When the TCP keep-alive detection message is sent to the peer, the peer will respond normally, so the TCP keep-alive time will be reset and wait for the next TCP keep-alive time to arrive.
  • The second is that the peer program has crashed and restarted. When the TCP keep-alive probe reaches the peer, the peer can respond, but since it no longer has any state for this connection, it replies with a RST, and the sender soon discovers that the TCP connection has been reset.
  • The third is that the peer program has crashed, or the peer is unreachable for other reasons. The keep-alive probes get no response, and after the configured number of consecutive probes, TCP reports that the connection is dead.

4. Socket Programming

1. How to program socket for TCP?

  • The server and client initialize the socket and obtain the file descriptor;
  • The server calls bind and binds to the IP address and port;
  • The server calls listen to listen on the port;
  • The server calls accept and waits for the client to connect;
  • The client calls connect to initiate a connection request to the server's address and port;
  • The server accept returns the file descriptor of the socket used for transmission;
  • The client calls write to write data; the server calls read to read data;
  • When the client disconnects, it calls close; when the server then reads, it reads EOF. After processing its remaining data, the server calls close as well, indicating that the connection is closed.

It should be noted here that when the server calls accept, if the connection is successful, a connected socket will be returned, which will be used to transmit data later.

Therefore, the listening socket and the socket actually used to transmit data are "two" sockets, one is called the listening socket and the other is called the completed connection socket.

After a successful connection is established, both parties begin to read and write data through the read and write functions, just like writing something to a file stream.
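
Here is a minimal, hedged sketch of the server side of this flow in C (IPv4, blocking I/O, most error handling omitted; port 8888 is just an example). The client side mirrors it with socket, connect, write/read, and close:

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    int main(void) {
        /* listening socket */
        int listen_fd = socket(AF_INET, SOCK_STREAM, 0);

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(8888);                 /* example port */

        bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr));
        listen(listen_fd, 128);                      /* backlog = 128 */

        /* accept returns a NEW, connected socket used for data transfer */
        int conn_fd = accept(listen_fd, NULL, NULL);

        char buf[1024];
        ssize_t n;
        while ((n = read(conn_fd, buf, sizeof(buf))) > 0)   /* 0 means EOF: peer closed */
            write(conn_fd, buf, n);                         /* echo the data back */

        close(conn_fd);      /* close the connected socket */
        close(listen_fd);    /* close the listening socket */
        return 0;
    }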

2. What is the meaning of the backlog parameter when listening?

Two queues are maintained in the Linux kernel:

  • Unfinished connection queue (SYN queue): holds connections for which a SYN has been received and which are in the SYN_RCVD state;
  • Completed connection queue (Accept queue): holds connections that have finished the TCP three-way handshake and are in the ESTABLISHED state.

SYN Queue and Accept Queue

    int listen(int socketfd, int backlog)

  • Parameter 1, socketfd, is the socket file descriptor;
  • Parameter 2, backlog, has changed its meaning quite a bit over time.

In early Linux kernels, backlog was the size of the SYN queue, i.e. the queue of not-yet-completed connections.

Since Linux kernel 2.2, backlog has meant the size of the accept queue, i.e. the queue of connections that have completed establishment, so nowadays backlog is generally taken to mean the accept queue size.

3. At which step of the three-way handshake does accept return?

Let's first look at what the client sends when it connects to the server.

Client connects to server

  • The client's protocol stack sends a SYN packet to the server, telling the server its current sending sequence number client_isn; the client enters the SYN_SENT state.
  • After receiving this packet, the server's protocol stack replies with an ACK whose value is client_isn + 1, acknowledging the client's SYN. At the same time, the server sends its own SYN to tell the client that its current sending sequence number is server_isn; the server enters the SYN_RCVD state.
  • After the client's protocol stack receives the ACK, the application returns from the connect call, indicating that the client-to-server direction of the connection is established; the client's state is ESTABLISHED. At the same time, the client's protocol stack acknowledges the server's SYN with an acknowledgment value of server_isn + 1;
  • After the response packet arrives at the server, the server protocol stack causes the accept blocking call to return. At this time, the one-way connection from the server to the client is successfully established, and the server enters the ESTABLISHED state.

From the above description, we can know that the client connect successfully returns in the second handshake, and the server accept successfully returns after the three-way handshake is successful.

4. The client calls close, what is the process of disconnecting the connection?

Let's see what happens when the client actively calls close?

The client calls the close procedure

  • The client calls close, indicating that the client has no data to send, and then sends a FIN message to the server, entering the FIN_WAIT_1 state;
  • When the server receives the FIN packet, the TCP protocol stack will insert an EOF into the receiving buffer for the FIN packet. The application can sense this FIN packet through the read call. This EOF will be placed after other received data that is already queued, which means that the server needs to handle this abnormal situation, because EOF means that no additional data will arrive on the connection. At this point, the server enters the CLOSE_WAIT state;
  • Then, after the server has processed its remaining data and read the EOF, it calls close to close its socket, which causes a FIN packet to be sent; the server socket is then in the LAST_ACK state.
  • The client receives the FIN packet from the server and sends an ACK confirmation packet to the server. At this time, the client will enter the TIME_WAIT state;
  • After the server receives the ACK confirmation packet, it enters the final CLOSED state;
  • After waiting 2MSL, the client also enters the CLOSED state.
