Let us say goodbye to TCP together!

PS: This article does not cover TCP flow control, congestion control, reliable transmission, and so on. Those topics are covered in this article: Are you still worried about TCP retransmission, sliding window, flow control, and congestion control? You won't worry after reading the diagram.

1. Basic knowledge of TCP

Look at the TCP header format

Let's first look at the format of the TCP header. The color-coded fields are those that are most relevant to this article, and the other fields are not described in detail.

TCP header format

Sequence number: a random number generated when the connection is established, used as the initial value; it is transmitted to the receiving host through the SYN packet. Each time data is sent, the sequence number is incremented by the number of data bytes sent. It is used to solve the problem of out-of-order network packets.

Acknowledgment number: the sequence number of the data that is "expected" to be received next. After receiving this acknowledgment, the sender can assume that all data before this sequence number has been received normally. It is used to solve the problem of packet loss.

Control bits:

  • ACK: When this bit is 1, the "acknowledgment number" field becomes valid. TCP stipulates that this bit must be set to 1 in every segment except the initial SYN packet when the connection is established.
  • RST: When this bit is 1, it indicates that an abnormality occurs in the TCP connection and the connection must be forcibly disconnected.
  • SYN: When this bit is 1, it indicates that a connection is to be established and the initial value of the sequence number is set in the "Sequence Number" field.
  • FIN: When this bit is 1, it means that no more data will be sent in the future and the connection is expected to be disconnected. When the communication is over and the connection is expected to be disconnected, the hosts on both sides of the communication can exchange TCP segments with the FIN bit set to 1.
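Putting the fields above together, a simplified C sketch of the header layout might look like this (the struct and field grouping are illustrative only; a real implementation such as Linux's struct tcphdr handles bit ordering and endianness explicitly):

 #include <stdint.h>

 /* Simplified sketch of the TCP header described above (illustrative only).
  * The reserved bits are folded into data_offset here for brevity. */
 struct tcp_header_sketch {
     uint16_t source_port;       /* 16 bits */
     uint16_t dest_port;         /* 16 bits */
     uint32_t sequence_number;   /* random ISN at setup, then grows by bytes sent */
     uint32_t ack_number;        /* next sequence number the receiver expects */
     uint8_t  data_offset;       /* upper 4 bits: header length in 32-bit words */
     uint8_t  flags;             /* control bits: CWR/ECE/URG/ACK/PSH/RST/SYN/FIN */
     uint16_t window_size;       /* receive window, used for flow control */
     uint16_t checksum;
     uint16_t urgent_pointer;
     /* a variable-length "options" field (0-40 bytes) may follow */
 };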

Why is TCP protocol needed? At which layer does TCP work?

The IP layer is "unreliable". It does not guarantee the delivery of network packets, the in-order delivery of network packets, or the integrity of the data in the network packets.

Relationship between OSI reference model and TCP/IP

If the reliability of network data packets needs to be guaranteed, the upper layer (transport layer) TCP protocol needs to be responsible for it.

Because TCP is a reliable data transmission service working at the transport layer, it can ensure that the network packets received by the receiver are damage-free, gap-free, non-redundant and in-order.

What is TCP?

TCP is a connection-oriented, reliable, byte stream-based transport layer communication protocol.

  • Connection-oriented: the connection must be one-to-one. Unlike UDP, where one host can send messages to multiple hosts at the same time, TCP cannot do one-to-many communication.
  • Reliable: No matter what kind of link changes occur in the network link, TCP can ensure that a message will reach the receiving end;
  • Byte stream: messages have no "boundaries", so they can be transmitted no matter how large they are. Messages are also "ordered": when a "previous" message has not been received, even if later bytes arrive first, they cannot be handed to the application layer for processing, and "duplicate" messages are automatically discarded.

What is a TCP connection?

Let's take a look at how RFC 793 defines "connection":

Connections: The reliability and flow control mechanisms described above require that TCPs initialize and maintain certain status information for each data stream. The combination of this information, including sockets, sequence numbers, and window sizes, is called a connection.

Simply put, a connection is the combination of the state information used to ensure reliability and flow control, including the socket, sequence numbers, and window sizes.

So we can know that establishing a TCP connection requires the client and the server to reach a consensus on the above three pieces of information.

  • Socket: consists of IP address and port number
  • Sequence number: used to solve disorder problems, etc.
  • Window size: used for flow control

How to uniquely identify a TCP connection?

The TCP four-tuple can uniquely identify a connection. The four-tuple consists of the following:

  • Source Address
  • Source Port
  • Destination Address
  • Destination Port

TCP four-tuple

The source address and destination address fields (32 bits each) are in the IP header and are used to deliver the message to the other host via the IP protocol.

The source port and destination port fields (16 bits each) are in the TCP header; their function is to tell the TCP protocol to which process the message should be delivered.
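As a rough illustration, the four-tuple could be grouped into a struct like the one below (the struct and field names are hypothetical, not taken from any kernel header):

 #include <stdint.h>

 /* Hypothetical grouping of the four fields that identify a TCP connection.
  * The addresses live in the IP header, the ports in the TCP header. */
 struct tcp_four_tuple {
     uint32_t src_addr;   /* source IPv4 address (IP header) */
     uint32_t dst_addr;   /* destination IPv4 address (IP header) */
     uint16_t src_port;   /* source port (TCP header) */
     uint16_t dst_port;   /* destination port (TCP header) */
 };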

There is a server with an IP listening on a port. What is the maximum number of TCP connections?

The server usually listens on a fixed local port, waiting for the client's connection request.

Therefore, the client IP and port are variable, and their theoretical value calculation formula is as follows:

For IPv4, there are at most 2^32 client IP addresses and at most 2^16 client ports, so the theoretical maximum number of TCP connections on a single server is about 2^32 × 2^16 = 2^48.

Of course, the maximum number of concurrent TCP connections on the server side is far from reaching the theoretical upper limit and is affected by the following factors:

  • File descriptor limit: Each TCP connection is a file. If the file descriptors are exhausted, a "too many open files" error will occur. Linux has three levels of limits on the number of open file descriptors:

System level: The maximum number of files that can be opened by the current system can be viewed through cat /proc/sys/fs/file-max;

User level: specifies the maximum number of files that a user can open, which can be viewed by running cat /etc/security/limits.conf;

Process level: the maximum number of files that a single process can open, which can be viewed through cat /proc/sys/fs/nr_open (a process can also query its own descriptor limit programmatically, as sketched after this list);

  • Memory limitation: Each TCP connection takes up a certain amount of memory. The operating system's memory is limited. If the memory resources are full, OOM will occur.
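As noted above for the process-level limit, a process can also query its own file descriptor limit programmatically. A minimal sketch using getrlimit:

 #include <stdio.h>
 #include <sys/resource.h>

 /* Minimal sketch: print the per-process open file descriptor limits
  * (RLIMIT_NOFILE), one of the limits a busy TCP server can run into
  * ("too many open files"). */
 int main(void)
 {
     struct rlimit rl;

     if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
         perror("getrlimit");
         return 1;
     }
     printf("soft fd limit: %llu, hard fd limit: %llu\n",
            (unsigned long long)rl.rlim_cur,
            (unsigned long long)rl.rlim_max);
     return 0;
 }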

What is the difference between UDP and TCP? What are their respective application scenarios?

UDP does not provide complex control mechanisms and uses IP to provide "connectionless" communication services.

The UDP protocol is really very simple, with only 8 bytes (64 bits) in the header. The UDP header format is as follows:

UDP header format

  • Destination and source ports: mainly tell the UDP protocol to which process the message should be sent.
  • Packet length: This field stores the sum of the length of the UDP header and the length of the data.
  • Checksum: the checksum is used to verify that the UDP header and data have not been corrupted during network transmission, so that damaged UDP packets are not accepted.

Differences between TCP and UDP:

1. Connect

  • TCP is a connection-oriented transport layer protocol, and a connection must be established before data is transmitted.
  • UDP does not require a connection and transmits data instantly.

2. Service Target

  • TCP is a one-to-one two-point service, that is, a connection has only two endpoints.
  • UDP supports one-to-one, one-to-many, and many-to-many interactive communication.

3. Reliability

  • TCP delivers data reliably; data arrives in order, without errors, loss, or duplication.
  • UDP is a best-effort delivery method and does not guarantee reliable delivery of data.

4. Congestion control and flow control

  • TCP has congestion control and flow control mechanisms to ensure the security of data transmission.
  • UDP does not have this feature. Even if the network is very congested, it will not affect the sending rate of UDP.

5. Header Overhead

  • The TCP header is relatively long and has a certain amount of overhead. The header is 20 bytes when the "option" field is not used. If the "option" field is used, it will become longer.
  • The UDP header is only 8 bytes and is fixed, with low overhead.

6. Transmission method

  • TCP is a streaming transmission with no boundaries, but guarantees order and reliability.
  • UDP sends packets one by one, which has boundaries, but packet loss and disorder may occur.

7. Different Sharding

  • If the TCP data size is larger than the MSS size, it will be fragmented at the transport layer. After the target host receives it, it will also reassemble the TCP data packet at the transport layer. If a fragment is lost in the middle, only the lost fragment needs to be retransmitted.
  • If the UDP data size is larger than the MTU size, it will be fragmented at the IP layer. After the target host receives it, it will assemble the data at the IP layer and then pass it to the transport layer.

TCP and UDP application scenarios:

Since TCP is connection-oriented and can ensure reliable data delivery, it is often used for:

  • FTP file transfer;
  • HTTP/HTTPS;

Since UDP is connectionless, it can send data at any time. In addition, UDP processing is simple and efficient, so it is often used for:

  • Communications with a small amount of packets, such as DNS, SNMP, etc.
  • Multimedia communications such as video and audio;
  • Broadcast communications;

Why does the UDP header not have a "header length" field, while the TCP header has a "header length" field?

The reason is that TCP has a variable-length "option" field, while the UDP header length does not change, so there is no need for an extra field to record the UDP header length.

Why does the UDP header have a "packet length" field, but the TCP header does not have a "packet length" field?

Let's first talk about how TCP calculates the payload data length:

The IP total length and IP header length are known from the IP header, and the TCP header length is known from the TCP header, so the length of the TCP data can be calculated.
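Written out, the calculation is simply:

 TCP data length = IP total length - IP header length - TCP header length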

You may be curious and ask: "UDP is also based on the IP layer, so can't the UDP data length be calculated with the same formula? Why is there still a 'packet length' field?"

Asking this question, it does seem that the UDP "packet length" is redundant.

One explanation is that, for the convenience of network device hardware design and processing, the header length needs to be an integer multiple of 4 bytes.

If the UDP "packet length" field were removed, the UDP header would no longer be an integer multiple of 4 bytes. So Xiaolin thinks the "packet length" field may exist simply to pad the UDP header out to a multiple of 4 bytes.

2. TCP connection establishment

TCP three-way handshake process and state transition

TCP is a connection-oriented protocol, so a connection must be established before using TCP, and the connection is established through a three-way handshake. The three-way handshake process is as follows (PS: the SYS_SENT state in the figure is SYN_SENT, I won’t redraw it for laziness):

TCP three-way handshake

  • At the beginning, both the client and the server are in the CLOSED state. First, the server actively listens to a port and is in the LISTEN state.

The first message - SYN message

The client will randomly initialize the sequence number (client_isn), put this sequence number in the "sequence number" field of the TCP header, and set the SYN flag to 1, indicating a SYN message. Then the first SYN message is sent to the server, indicating that a connection is initiated to the server. This message does not contain application layer data, and the client is then in the SYN-SENT state.

The second message - SYN + ACK message

  • After receiving the SYN message from the client, the server first randomly initializes its own sequence number (server_isn) and fills it into the "sequence number" field of the TCP header, then fills the "acknowledgment number" field with client_isn + 1, and sets both the SYN and ACK flags to 1. Finally, the message is sent to the client; it does not contain application layer data, and the server is then in the SYN-RCVD state.

The third message - ACK message

  • After the client receives the message from the server, it must respond to the server with one last message. First, the ACK flag in the TCP header of the response message is set to 1, and then the "acknowledgment number" field is filled with server_isn + 1. Finally, the message is sent to the server. This message can carry data from the client to the server, and then the client is in the ESTABLISHED state.
  • After receiving the response message from the client, the server also enters the ESTABLISHED state.

From the above process, we can find that the third handshake can carry data, while the first two handshakes cannot carry data. This is also a frequently asked question in interviews.

Once the three-way handshake is completed, both parties are in the ESTABLISHED state. At this time, the connection has been established and the client and server can send data to each other.

How to view TCP status in Linux?

To view the TCP connection status, you can use the netstat -napt command in Linux.

TCP connection status check

Why is it a three-way handshake? Not two or four?

I believe the most common answer is: "Because the three-way handshake can ensure that both parties have the ability to receive and send."

There is nothing wrong with this answer, but it is one-sided and does not state the main reason.

Earlier we learned what a TCP connection is:

  • Certain state information is used to ensure reliability and flow control maintenance. The combination of this information, including socket, sequence number and window size, is called a connection.

Therefore, it is important to understand why a three-way handshake is required to initialize the Socket, sequence number, and window size and establish a TCP connection.

Next, we analyze the reasons for the three-way handshake from three aspects:

  • The three-way handshake can prevent the initialization of repeated historical connections (main reason)
  • Three-way handshake is required to synchronize the initial sequence numbers of both parties.
  • Three-way handshake can avoid resource waste

Reason 1: Avoid historical connections

Let's look at the primary reason why TCP connections use a three-way handshake, as stated in RFC 793:

The principle reason for the three-way handshake is to prevent old duplicate connection initiations from causing confusion.

In short, the primary reason for the three-way handshake is to prevent confusion caused by old duplicate connection initializations.

Let's consider a scenario where the client first sends a SYN (seq = 90) message, but it is blocked by the network and the server does not receive it. Then the client sends a new SYN (seq = 100) message. Note that this is not a retransmission of the first SYN; a retransmitted SYN carries the same sequence number. Let's see how the three-way handshake blocks historical connections:

Three-way handshake to avoid historical connections

The client sends multiple SYN packets to establish a connection in succession. In the case of network congestion:

  • An "old SYN message" arrives at the server earlier than the "latest SYN" message;
  • Then the server will return a SYN + ACK message to the client;
  • After receiving the message, the client can determine that this is a historical connection (the sequence number has expired or timed out) based on its own context, and then the client will send a RST message to the server to indicate that the connection is terminated.

If it is a two-way handshake connection, the historical connection cannot be blocked. Then why can't the TCP two-way handshake block the historical connection?

Let me first state the conclusion. This is mainly because, with only two handshakes, the "passive initiator" has no intermediate state in which the "active initiator" can block a historical connection, so the "passive initiator" may establish a historical connection, resulting in wasted resources.

Think about it: with two handshakes, the "passive initiator" enters the ESTABLISHED state as soon as it receives the SYN message, which means it can already send data to the other party, while the "active initiator" has not yet entered the ESTABLISHED state. Now suppose this is a historical connection. The active initiator determines that the connection is historical and returns a RST message to tear it down, but the "passive initiator" entered the ESTABLISHED state at the first handshake and can already send data; it does not know the connection is historical and only disconnects after receiving the RST message.

Two-way handshake cannot prevent historical connections

It can be seen that in the above scenario, the "passive initiator" did not block the historical connection before sending data to the "active initiator", causing the "passive initiator" to establish a historical connection and send data in vain, which completely wasted the resources of the "passive initiator".

Therefore, to solve this problem, it is best to block the historical connection before the "passive initiator" sends data, that is, before establishing a connection, so as not to waste resources. To achieve this function, a three-way handshake is required.

Therefore, the main reason why TCP uses a three-way handshake to establish a connection is to prevent "historical connections" from initializing the connection.

Reason 2: Synchronize the initial sequence numbers of both parties

Both parties in the TCP protocol must maintain a "sequence number". The sequence number is a key factor in reliable transmission. Its functions are:

  • The receiver can remove duplicate data;
  • The receiver can receive the packets in order according to their sequence numbers;
  • It can identify which of the sent data packets have been received by the other party (known through the sequence number in the ACK message);

It can be seen that the sequence number plays a very important role in the TCP connection. Therefore, when the client sends a SYN message carrying the "initial sequence number", the server needs to send an ACK response message to indicate that the client's SYN message has been successfully received by the server. When the server sends the "initial sequence number" to the client, it still needs to get an ACK response from the client. This back and forth can ensure that the initial sequence numbers of both parties can be reliably synchronized.

Four-way handshake and three-way handshake

The four-way handshake can actually reliably synchronize the initialization sequence numbers of both parties, but because the second and third steps can be optimized into one step, it becomes a "three-way handshake".

The two-way handshake only ensures that the initial sequence number of one party can be successfully received by the other party, but there is no way to ensure that the initial sequence numbers of both parties can be confirmed and received.

Reason 3: Avoid waste of resources

If there is only a "two-way handshake", then when the client's SYN connection request is blocked in the network and the client receives no ACK message, it will resend the SYN. Since there is no third handshake, the server has no way of knowing whether the client actually received the ACK it sent back for establishing the connection, so every time it receives a SYN it can only go ahead and establish a connection. What does this cause?

If the client's SYN is blocked and SYN messages are sent repeatedly, the server will establish multiple redundant, invalid connections after receiving them, resulting in unnecessary waste of resources.

Two handshakes will cause a waste of resources

That is, the two-way handshake will cause message retention, and the server will repeatedly accept useless connection request SYN messages, resulting in repeated allocation of resources.

Summary

When TCP establishes a connection, the three-way handshake can prevent the establishment of a historical connection, reduce unnecessary resource consumption for both parties, and help both parties synchronize and initialize the sequence number. The sequence number can ensure that data packets are not repeated, discarded, and transmitted in order.

Reasons for not using "two-way handshake" and "four-way handshake":

  • "Two-way handshake": cannot prevent the establishment of historical connections, resulting in a waste of resources on both sides, and cannot reliably synchronize the sequence numbers of both sides;
  • "Four-way handshake": Three handshakes are enough to establish a reliable connection in theory, so there is no need to use more communications.

Why is the initialization sequence number required to be different each time a TCP connection is established?

There are two main reasons:

  • To prevent historical messages from being received by the next connection with the same quadruple (main aspect);
  • For security reasons, it is necessary to prevent hackers from forging TCP packets with the same sequence number from being received by the other party.

Next, let’s talk about the first point in detail.

Assume that each time a connection is established, the initialization sequence numbers of the client and server start from 0:

The process is as follows:

  • The client and server establish a TCP connection. A data packet sent by the client is blocked in the network; meanwhile, the server process restarts, so the server sends a RST message to disconnect the connection.
  • Then, the client establishes another connection with the server using the same quadruple as the previous connection;
  • After the new connection is established, the data packet that was blocked by the network in the previous connection arrives at the server. The sequence number of the data packet happens to be within the server's receiving window, so the data packet will be received normally by the server, which will cause data confusion.

It can be seen that if the initialization sequence numbers of the client and the server are the same each time a connection is established, it is easy for the historical message to be received by the next connection with the same four-tuple.

If the initialization sequence numbers of the client and server are "different" each time a connection is established, there is a high probability that the sequence numbers of historical messages are "not" in the receiving window of the other party, thus avoiding historical messages to a large extent, as shown in the following figure:

On the contrary, if the initialization sequence numbers of the client and server are the same each time a connection is established, there is a high probability that the sequence number of the historical message is just within the receiving window of the other party, resulting in the successful reception of the historical message by the new connection.

Therefore, each time the initialization sequence number is different, it can largely avoid the historical message being received by the next connection with the same four-tuple. Note that it can be avoided to a large extent, not completely (because the sequence number will have the problem of wrapping, so it is necessary to use the timestamp mechanism to judge the historical message. For details, see the article: How does TCP avoid historical messages?).

How is the Initial Sequence Number (ISN) randomly generated?

The starting ISN is based on a clock that increments by 1 roughly every 4 microseconds, wrapping around about every 4.55 hours.

RFC793 mentions the random generation algorithm for the initialization sequence number ISN: ISN = M + F(localhost, localport, remotehost, remoteport).

  • M is a timer that increments by 1 every 4 microseconds.
  • F is a hash algorithm that generates a random value based on the source IP, destination IP, source port, and destination port. To ensure that the hash algorithm cannot be easily deduced by the outside, the MD5 algorithm is a better choice.

As you can see, the random number will increase based on the clock timer, and it is basically impossible to randomly generate the same initialization sequence number.
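As a rough sketch of the ISN = M + F(localhost, localport, remotehost, remoteport) idea (not the actual Linux implementation, which uses a keyed cryptographic hash such as MD5 or SipHash over the four-tuple plus a secret), it might look like this:

 #include <stdint.h>

 /* Hypothetical hash over the four-tuple; a real stack would use a keyed
  * cryptographic hash with a secret so outsiders cannot predict the result. */
 static uint32_t four_tuple_hash(uint32_t laddr, uint16_t lport,
                                 uint32_t raddr, uint16_t rport,
                                 uint32_t secret)
 {
     uint32_t h = secret;
     h = h * 31 + laddr;
     h = h * 31 + lport;
     h = h * 31 + raddr;
     h = h * 31 + rport;
     return h;
 }

 /* ISN = M + F(...): M is a timer value that ticks every 4 microseconds,
  * so the ISN keeps moving forward even for the same four-tuple. */
 uint32_t generate_isn(uint32_t microsecond_clock,
                       uint32_t laddr, uint16_t lport,
                       uint32_t raddr, uint16_t rport,
                       uint32_t secret)
 {
     uint32_t m = microsecond_clock / 4;  /* +1 every 4 microseconds */
     return m + four_tuple_hash(laddr, lport, raddr, rport, secret);
 }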

Since the IP layer can fragment, why does the TCP layer still need MSS?

Let's first get to know MTU and MSS:

MTU and MSS

  • MTU: The maximum length of a network packet, usually 1500 bytes in Ethernet;
  • MSS: The maximum length of TCP data that a network packet can contain after removing the IP and TCP headers;

If the entire TCP message (header + data) is handed over to the IP layer for fragmentation, what abnormality will occur?

When the IP layer has data (TCP header + TCP data) that exceeds the MTU size to be sent, the IP layer must fragment the data into several pieces to ensure that each piece is smaller than the MTU. After an IP datagram is fragmented, it is reassembled by the IP layer of the target host and then handed over to the upper TCP transport layer.

This seems to be in good order, but there is a hidden danger. If an IP fragment is lost, all fragments of the entire IP message must be retransmitted.

Because the IP layer itself does not have a timeout retransmission mechanism, the TCP of the transport layer is responsible for timeout and retransmission.

When the receiver finds that a piece of the TCP message (header + data) is lost, it will not respond to the other party with an ACK. Then the sender's TCP will resend the "entire TCP message (header + data)" after timeout.

Therefore, it can be concluded that fragmented transmission at the IP layer is very inefficient.

Therefore, in order to achieve the best transmission efficiency, the TCP protocol usually negotiates the MSS value of both parties when establishing a connection. When the TCP layer finds that the data exceeds the MSS, it will first fragment it. Of course, the length of the IP packet formed by it will not be greater than the MTU, so IP fragmentation is naturally not needed.

Negotiate MSS during the handshake phase

After TCP layer fragmentation, if a TCP fragment is lost, it is retransmitted in units of MSS instead of retransmitting all fragments, which greatly increases the efficiency of retransmission.
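For illustration, an application can also cap the MSS that will be advertised during the handshake with the TCP_MAXSEG socket option before connecting. This is only a sketch; most programs simply leave MSS negotiation to the kernel:

 #include <stdio.h>
 #include <netinet/in.h>
 #include <netinet/tcp.h>
 #include <sys/socket.h>

 /* Sketch: cap the MSS advertised for a socket (call before connect()).
  * The kernel still clamps the value to what the interface MTU allows. */
 int cap_mss(int sockfd, int mss)
 {
     if (setsockopt(sockfd, IPPROTO_TCP, TCP_MAXSEG, &mss, sizeof(mss)) < 0) {
         perror("setsockopt(TCP_MAXSEG)");
         return -1;
     }
     return 0;
 }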

What happens if the first handshake is lost?

When the client wants to establish a TCP connection with the server, the first thing it sends is a SYN message, and then enters the SYN_SENT state.

After that, if the client fails to receive the SYN-ACK message from the server (the second handshake), the "timeout retransmission" mechanism will be triggered and the SYN message will be retransmitted.

Different versions of operating systems may have different timeout periods, some are 1 second, and some are 3 seconds. This timeout period is hard-coded in the kernel. If you want to change it, you need to recompile the kernel, which is troublesome.

When the client does not receive the SYN-ACK message from the server after 1 second, the client will resend the SYN message. How many times should it resend?

In Linux, the maximum number of retransmissions of the client's SYN message is controlled by the tcp_syn_retries kernel parameter. This parameter is customizable and the default value is generally 5.

Usually, the first timeout retransmission is after 1 second, the second timeout retransmission is after 2 seconds, the third timeout retransmission is after 4 seconds, the fourth timeout retransmission is after 8 seconds, and the fifth timeout retransmission is after 16 seconds. That's right, each timeout is twice as long as the previous one.

After the fifth timeout retransmission, the client will continue to wait for 32 seconds. If the server still does not respond with an ACK, the client will stop sending SYN packets and disconnect the TCP connection.

Therefore, the total time is 1+2+4+8+16+32=63 seconds, about 1 minute.

What happens if the second handshake is lost?

When the server receives the first handshake from the client, it will return a SYN-ACK message to the client. This is the second handshake. At this time, the server will enter the SYN_RCVD state.

The SYN-ACK message of the second handshake actually has two purposes:

  • The ACK in the second handshake is a confirmation message for the first handshake;
  • The SYN in the second handshake is the message initiated by the server to establish a TCP connection;

So, if the second handshake is lost, some interesting things happen. What are they?

Because the second handshake message contains the ACK confirmation message of the first handshake to the client, if the client does not receive the second handshake for a long time, then the client thinks that its SYN message (first handshake) may be lost, so the client will trigger the timeout retransmission mechanism and retransmit the SYN message.

Then, because the second handshake contains the server's SYN message, when the client receives it, it needs to send an ACK confirmation message to the server (the third handshake), and the server will consider that the SYN message has been received by the client.

Then, if the second handshake is lost, the server will not receive the third handshake, so the server will trigger the timeout retransmission mechanism and retransmit the SYN-ACK message.

Under Linux, the maximum number of retransmissions of a SYN-ACK packet is determined by the tcp_synack_retries kernel parameter, and the default value is 5.

Therefore, when the second handshake is lost, both the client and the server will retransmit:

  • The client will retransmit the SYN message, which is the first handshake. The maximum number of retransmissions is determined by the tcp_syn_retries kernel parameter;
  • The server will retransmit the SYN-ACK message, which is the second handshake. The maximum number of retransmissions is determined by the tcp_synack_retries kernel parameter.

What happens if the third handshake is lost?

After the client receives the SYN-ACK message from the server, it returns an ACK message to the server, which is the third handshake. At this time, the client enters the ESTABLISHED state.

Because the ACK of the third handshake is the confirmation of the SYN of the second handshake, if the third handshake is lost and the server does not receive that confirmation for a long time, it will trigger the timeout retransmission mechanism and retransmit the SYN-ACK message until it receives the third handshake or reaches the maximum number of retransmissions.

Note that ACK messages will not be retransmitted. When ACK is lost, the other party will retransmit the corresponding message.

What is a SYN attack? How to avoid a SYN attack?

SYN Attack

We all know that establishing a TCP connection requires a three-way handshake. Suppose an attacker forges SYN packets with different source IP addresses in a short period of time. Each time the server receives a SYN packet, it enters the SYN_RCVD state, but the SYN + ACK packets it sends never get an ACK response from the unknown IP hosts. Over time, the server's semi-connection queue fills up, making the server unable to serve normal users.

SYN Attack

Avoid SYN attack method 1:

One solution is to modify the Linux kernel parameters to control the queue size and what to do when the queue is full.

  • When the speed at which the network card receives data packets is faster than the speed at which the kernel processes them, a queue will be created to store these data packets. The maximum value of this queue is controlled by the following parameters:
 net.core.netdev_max_backlog
  • Maximum number of connections in SYN_RCVD state:
 net.ipv4.tcp_max_syn_backlog
  • When the processing capacity is exceeded, a RST is directly returned to the new SYN and the connection is discarded:
 net.ipv4.tcp_abort_on_overflow

Method 2 to avoid SYN attacks:

Let's first look at how the Linux kernel's SYN queue (semi-connection queue) and Accept queue (full connection queue) work.

Normal process

Normal process:

  • When the server receives the SYN message from the client, it adds it to the kernel's "SYN queue";
  • Then send SYN + ACK to the client and wait for the client to respond with an ACK message;
  • After the server receives the ACK message, it removes the connection from the "SYN queue" and puts it into the "Accept queue";
  • The application takes the connection out of the "Accept queue" by calling the accept() socket interface.

Application is too slow

Application is too slow:

If the application is too slow, the "Accept Queue" will be full.


Under SYN attack

Under SYN attack:

  • If the server is continuously attacked by SYN, the SYN queue (semi-connection queue) will be full, making it impossible to establish new connections.

The tcp_syncookies method can be used to deal with SYN attacks:

 net.ipv4.tcp_syncookies = 1

tcp_syncookies to deal with SYN attacks

  • When the "SYN queue" is full, SYN packets subsequently received by the server are not added to the "SYN queue";
  • Instead, the server calculates a cookie value and returns it to the client as the "sequence number" in the SYN + ACK;
  • When the server receives the client's response message, it checks the legitimacy of the ACK packet. If it is legal, the connection is put directly into the "Accept queue";
  • Finally, the application calls the accept() socket interface to retrieve the connection from the "Accept queue".

3. TCP connection disconnection

TCP four-wave process and state transition

All good things must come to an end, and this is also true for TCP connections. TCP disconnects by waving four times.

Both parties can actively disconnect. After disconnection, the "resources" in the host will be released. The process of four waves is as follows:

The client actively closes the connection - TCP wave four times

  • The client intends to close the connection and sends a message with the FIN flag in the TCP header set to 1, which is also called a FIN message. The client then enters the FIN_WAIT_1 state.
  • After receiving the message, the server sends an ACK response message to the client, and then the server enters the CLOSE_WAIT state.
  • After the client receives the ACK response message from the server, it enters the FIN_WAIT_2 state.
  • After the server has processed the data, it also sends a FIN message to the client, and then the server enters the LAST_ACK state.
  • After receiving the FIN message from the server, the client returns an ACK response message and then enters the TIME_WAIT state.
  • After the server receives the ACK response message, it enters the CLOSED state, and the server has completed closing the connection.
  • After 2MSL, the client automatically enters the CLOSED state, and the client also completes the connection closure.

You can see that a FIN and an ACK are required in each direction, so it is often called four waves.

One thing to note here is that only when the connection is closed actively will there be a TIME_WAIT state.

Why does it take four times to wave?

Let's review the process in which both parties send FIN packets during the four-way wave; then we can understand why four steps are needed.

  • When closing the connection, when the client sends a FIN to the server, it only means that the client will no longer send data but can still receive data.
  • When the server receives the FIN message from the client, it first returns an ACK response message. The server may still have data to process and send. When the server no longer sends data, it sends a FIN message to the client to indicate that it agrees to close the connection now.

From the above process, we can see that the server usually needs to wait for the data to be sent and processed, so the server's ACK and FIN are generally sent separately, resulting in one more handshake than the three-way handshake.

What happens if the first wave is lost?

When the client (the active closing party) calls the close function, it sends a FIN message to the server, trying to disconnect from the server. At this time, the client's connection enters the FIN_WAIT_1 state.

Under normal circumstances, if the ACK from the server (passive closing party) is received in time, it will quickly change to the FIN_WAIT2 state.

If the first wave is lost, and the client fails to receive the ACK from the passive party, the timeout retransmission mechanism will be triggered and the FIN message will be retransmitted. The number of retransmissions is controlled by the tcp_orphan_retries parameter.

When the client retransmits the FIN message for more than tcp_orphan_retries times, it will no longer send FIN messages and directly enter the close state.

What happens if the second wave is lost?

When the server receives the first wave from the client, it will first return an ACK confirmation message, and the server connection will enter the CLOSE_WAIT state.

As we mentioned earlier, ACK messages will not be retransmitted, so if the server's second wave is lost, the client will trigger the timeout retransmission mechanism and retransmit the FIN message until it receives the server's second wave or reaches the maximum number of retransmissions.

It should be mentioned here that when the client receives the second wave, that is, after receiving the ACK message sent by the server, the client will be in the FIN_WAIT2 state. In this state, it needs to wait for the server to send the third wave, that is, the FIN message from the server.

For the connection closed by the close function, since data can no longer be sent and received, the FIN_WAIT2 state cannot last too long. The tcp_fin_timeout controls how long the connection lasts in this state. The default value is 60 seconds.

This means that for a connection closed by calling close, if no FIN message is received after 60 seconds, the client (the active closing party) will directly close the connection.

What happens if the third wave is lost?

When the server (passive closing party) receives the FIN message from the client (active closing party), the kernel will automatically reply with ACK, and the connection is in the CLOSE_WAIT state. As the name suggests, it means waiting for the application process to call the close function to close the connection.

At this time, the kernel has no right to close the connection on behalf of the process. The process must actively call the close function to trigger the server to send a FIN message.

When the server is in the CLOSE_WAIT state and the close function is called, the kernel will send a FIN message and the connection will enter the LAST_ACK state, waiting for the client to return ACK to confirm the connection is closed.

If the ACK is not received for a long time, the server will resend the FIN message. The number of retransmissions is still controlled by the tcp_orphan_retries parameter, which is the same as the way the client resends the FIN message.

What happens if the fourth wave is lost?

When the client receives the FIN message of the third wave from the server, it will return an ACK message, which is the fourth wave. At this time, the client connection enters the TIME_WAIT state.

In Linux systems, the TIME_WAIT state will last for 2MSL before entering the closed state.

Then, the server (passive closing party) remains in the LAST_ACK state until it receives the ACK message.

If the ACK message of the fourth wave does not reach the server, the server will resend the FIN message. The number of retransmissions is still controlled by the tcp_orphan_retries parameter introduced earlier.

Why is the TIME_WAIT waiting time 2MSL?

MSL is Maximum Segment Lifetime, the maximum survival time of a message. It is the longest time that any message can exist on the network. If it exceeds this time, the message will be discarded. Because TCP messages are based on the IP protocol, and there is a TTL field in the IP header, which is the maximum number of routes that an IP datagram can pass through. This value decreases by 1 for each router that processes it. When this value is 0, the datagram will be discarded, and an ICMP message will be sent to notify the source host.

The difference between MSL and TTL: The unit of MSL is time, while TTL is the number of routing hops. Therefore, MSL should be greater than or equal to the time when TTL is 0 to ensure that the message has been naturally destroyed.

The TTL value is usually 64. Linux sets MSL to 30 seconds, which means that Linux believes that the time for a data packet to pass through 64 routers will not exceed 30 seconds. If it exceeds this time, it is considered that the packet has disappeared in the network.

TIME_WAIT waits for 2 times of MSL. A more reasonable explanation is that there may be data packets from the sender in the network. When these data packets from the sender are processed by the receiver, they will send a response to the other party, so it takes 2 times the time to wait for a round trip.

For example, if the passive closing party does not receive the last ACK message of the disconnection, it will trigger a timeout and resend the FIN message. After the other party receives the FIN, it will resend ACK to the passive closing party. The total time is exactly 2 MSLs.

It can be seen that the 2MSL duration is equivalent to allowing at least one message loss. For example, if the ACK is lost within one MSL, the FIN resent by the passive party will arrive within the second MSL, and the connection in the TIME_WAIT state can cope with it.

Why not 4 or 8 MSLs? You can imagine a bad network with a packet loss rate of 1%. The probability of losing packets twice in a row is only 1 in 10,000. This probability is so small that it is more cost-effective to ignore it than to solve it.

The 2MSL time starts from the time the client sends ACK after receiving FIN. If the client receives a FIN message resent by the server during the TIME-WAIT time because the client's ACK is not transmitted to the server, the 2MSL time will be restarted.

In Linux, 2MSL defaults to 60 seconds, so 1MSL is 30 seconds. The Linux system stays in TIME_WAIT for a fixed 60 seconds.

Its name defined in the Linux kernel code is TCP_TIMEWAIT_LEN:

 #define TCP_TIMEWAIT_LEN (60 * HZ) /* how long to wait to destroy TIME-WAIT state, about 60 seconds */

If you want to change the length of TIME_WAIT, you can only modify the value of TCP_TIMEWAIT_LEN in the Linux kernel code and recompile the Linux kernel.

Why is TIME_WAIT state needed?

The TIME-WAIT state will only occur on the party that actively initiates the closing of the connection.

The TIME-WAIT state is needed mainly for two reasons:

  • Prevent data in historical connections from being incorrectly received by subsequent connections with the same four-tuple;
  • Ensure that the party that "passively closes the connection" can be closed correctly;

Reason 1: Prevent data in historical connections from being erroneously received by subsequent connections with the same quadruple.

To better understand this reason, let us first understand the sequence number (SEQ) and the initial sequence number (ISN).

  • The sequence number is a TCP header field that identifies a byte of the data stream from the TCP sender to the TCP receiver. Because TCP is a reliable protocol for byte streams, in order to ensure the order and reliability of messages, TCP assigns a number to each byte in each transmission direction to facilitate confirmation after successful transmission, retransmission after loss, and to ensure that there is no disorder at the receiving end. The sequence number is a 32-bit unsigned number, so it loops back to 0 after reaching 4G.
  • Initial sequence number, when TCP establishes a connection, the client and server will each generate an initial sequence number, which is a random number generated based on the clock to ensure that each connection has a different initial sequence number. The initial sequence number can be regarded as a 32-bit counter, the value of which increases by 1 every 4 microseconds, and a cycle takes 4.55 hours.

I captured a packet for you. The Seq in the figure below is the sequence number, and the red boxes are the initial sequence numbers generated by the client and the server respectively.

TCP packet capture

As we know above, the sequence number and the initialization sequence number do not increase infinitely, and they will wrap around to the initial value, which means that it is impossible to judge the new and old data based on the sequence number.

Assuming that TIME-WAIT has no waiting time or the time is too short, what will happen after the delayed data packet arrives?

The TIME-WAIT time is too short and the datagram of the old connection is received.

As shown above:

  • The SEQ = 301 message sent by the server before closing the connection was delayed by the network.
  • Then, the server reopened a new connection with the same four-tuple. The previously delayed SEQ = 301 arrived at the client at this time, and the sequence number of the data packet was just within the client's receiving window. Therefore, the client will receive this data packet normally, but this data packet is left over from the previous connection, which will cause serious problems such as data confusion.

In order to prevent data in historical connections from being received incorrectly by the connection of the same quadruple, TCP designed the TIME_WAIT state, which will last for 2MSL duration, which is enough for packets in both directions to be discarded, so that the original connected packets will naturally disappear in the network, and the packets that appear again must be generated by the newly established connection.

Reason 2: Ensure that the party that "passively closes the connection" can be closed correctly.

In RFC 793, another important role of TIME-WAIT is:

TIME-WAIT - represents waiting for enough time to pass to be sure the remote TCP received the acknowledgment of its connection termination request.

That is, the function of TIME-WAIT is to wait enough time to ensure that the last ACK can be received by the passive shutdown party, thereby helping it shut down normally.

If the last ACK message (the fourth wave) of the client (the active closing party) is lost in the network, then according to the TCP reliability principle, the server (the passive closing party) will resend the FIN message.

Suppose the client has no TIME_WAIT state and enters the CLOSED state directly after sending the last ACK message. If that ACK message is lost, the server will retransmit the FIN message, but the client is already closed, so when it receives the retransmitted FIN it responds with a RST message.

TIME-WAIT The time is too short and the connection is not closed properly

As shown in the picture above:

  • If the client's last ACK of the four-way wave (shown in the red box) is lost, and the client's TIME-WAIT is too short or missing, the client goes directly to the CLOSED state, while the server stays stuck in the LAST_ACK state.
  • When the client later initiates a SYN request to establish a new connection, the server will reply with a RST message, and the connection establishment process is terminated.

The server receives this RST and interprets it as an error (Connection reset by peer), which is not an elegant way to terminate for a reliable protocol.

To prevent this from happening, the client must wait long enough to ensure that the peer receives the ACK. If the peer does not receive the ACK, the TCP retransmission mechanism will be triggered, and the server will resend a FIN, which will take just two MSLs.

TIME-WAIT time is normal, ensuring that the connection is closed normally

But you may say that the retransmitted ACK could also be lost. True, but after waiting that long, TCP is considered to have done its utmost.

What are the dangers of too many TIME_WAIT connections?

Too many connections in the TIME-WAIT state cause two main kinds of harm:

  • The first is memory resource usage;
  • The second is the occupation of port resources, and a TCP connection consumes at least one local port;

The second hazard can have serious consequences. Port resources are also limited: the ephemeral port range is generally 32768~61000, and it can be changed with the following parameter:

 net.ipv4.ip_local_port_range

If the "connection-initiating party" has too many connections in the TIME_WAIT state and all port resources are occupied, new connections cannot be created.

The client (initiating the connection party) is restricted by port resources:

  • If the client has too many TIME_WAIT connections, port resources will be used up; since there are only 65536 ports, exhausting them means new connections cannot be created.

The server (the passive party of the connection) is restricted by system resources:

  • Since a four-tuple identifies a TCP connection, the server can theoretically accept a very large number of connections even though it only listens on one port. However, too many connections will occupy system resources, such as file descriptors, memory, CPU, and threads.

How to optimize TIME_WAIT?

Here are several ways to optimize TIME-WAIT, all with advantages and disadvantages:

  • Turn on the net.ipv4.tcp_tw_reuse and net.ipv4.tcp_timestamps options;
  • net.ipv4.tcp_max_tw_buckets;
  • Use SO_LINGER in the program to force the connection to be closed with a RST.

Method 1: net.ipv4.tcp_tw_reuse and tcp_timestamps

After the following Linux kernel parameters are enabled, the socket in TIME_WAIT can be reused for the new connection.

One thing to note is that tcp_tw_reuse only helps the client (the connection initiator): when it is enabled and connect() is called, the kernel will pick a connection that has been in the TIME_WAIT state for more than 1 second and reuse it for the new connection.

 net.ipv4.tcp_tw_reuse = 1

There is another prerequisite for using this option, which requires the support for TCP timestamps to be turned on, that is:

 net.ipv4.tcp_timestamps = 1 (default is 1)

The timestamp field is in the "option" of the TCP header. It consists of a total of 8 bytes to represent the timestamp. The first 4-byte field is used to save the time when the data packet is sent, and the second 4-byte field is used to save the time when the data packet was last received.

Due to the introduction of timestamps, the 2MSL problem we mentioned earlier no longer exists because duplicate packets will be discarded naturally due to the expiration of timestamps.

Method 2: net.ipv4.tcp_max_tw_buckets

The default value is 18000. Once the number of connections in TIME_WAIT in the system exceeds this value, the system will reset the subsequent TIME_WAIT connection status. This method is relatively violent.

Method 3: Use SO_LINGER in the program

We can set the behavior of calling close to close the connection by setting the socket options.

 struct linger so_linger;
 so_linger.l_onoff = 1;
 so_linger.l_linger = 0;
 setsockopt(s, SOL_SOCKET, SO_LINGER, &so_linger, sizeof(so_linger));

If l_onoff is non-zero and l_linger is 0, a RST flag will be sent to the peer immediately after close is called. The TCP connection will skip four waves, i.e. the TIME_WAIT state, and will be closed directly.

However, this provides a possibility to cross the TIME_WAIT state, but it is a very dangerous behavior and is not worth promoting.

The methods introduced above are all trying to get past the TIME_WAIT state, which is actually not good. Although the TIME_WAIT state lasts a little long and seems unfriendly, it is designed to avoid messy things.

However, the book "UNIX Network Programming" says: TIME_WAIT is our friend, it is helpful to us, don't try to avoid this state, but should figure it out.

If the server wants to avoid too many connections in TIME_WAIT state, it should never actively disconnect the connection and let the client disconnect. The clients distributed in different locations will bear the TIME_WAIT.

What if the connection has been established but the client suddenly fails?

TCP has a mechanism that keeps alive. The principle of this mechanism is as follows:

Define a time period. If there is no connection-related activity within this period, the TCP keep-alive mechanism kicks in: at each fixed interval it sends a probe message that contains very little data. If several consecutive probe messages get no response, the current TCP connection is considered dead, and the kernel notifies the upper-layer application of the error.

In the Linux kernel, there are corresponding parameters to set the keep-alive time, the number of keep-alive detections, and the time interval of keep-alive detections. The following are the default values:

 net.ipv4.tcp_keepalive_time = 7200
 net.ipv4.tcp_keepalive_intvl = 75
 net.ipv4.tcp_keepalive_probes = 9
  • tcp_keepalive_time=7200: means that the keep-alive time is 7200 seconds (2 hours), which means that if there is no connection-related activity within 2 hours, the keep-alive mechanism will be activated;
  • tcp_keepalive_intvl=75: means each detection interval is 75 seconds;
  • tcp_keepalive_probes=9: means that if there is no response after 9 detections, the other party is considered unreachable and the connection is terminated.

That is to say, in Linux system, it takes at least 2 hours, 11 minutes and 15 seconds to find a "dead" connection.

Note that if an application wants to use the TCP keep-alive mechanism, it needs to set the SO_KEEPALIVE option through the socket interface to take effect. If it is not set, it cannot use the TCP keep-alive mechanism.
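A minimal sketch of enabling keep-alive on a connected socket is shown below. The per-socket TCP_KEEPIDLE / TCP_KEEPINTVL / TCP_KEEPCNT options are Linux-specific and override the system-wide defaults above; the values chosen here are only examples:

 #include <stdio.h>
 #include <netinet/in.h>
 #include <netinet/tcp.h>
 #include <sys/socket.h>

 /* Sketch: turn on TCP keep-alive for a connected socket and tighten
  * the probing parameters for this socket only (Linux-specific options). */
 int enable_keepalive(int sockfd)
 {
     int on = 1;
     int idle = 60;    /* start probing after 60 s of inactivity */
     int intvl = 10;   /* probe every 10 s */
     int probes = 5;   /* declare the peer dead after 5 unanswered probes */

     if (setsockopt(sockfd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0 ||
         setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0 ||
         setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) < 0 ||
         setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof(probes)) < 0) {
         perror("setsockopt");
         return -1;
     }
     return 0;
 }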

If TCP keep alive is enabled, the following situations need to be considered:

  • The first type is that the peer program works normally. When the TCP keep-alive probe message is sent to the peer, the peer will respond normally, so that the TCP keep-alive time will be reset, waiting for the next TCP keep-alive time to arrive.
  • The second type is that the peer program crashed and restarted. When the TCP keep-alive probe message reaches the peer, the peer can respond, but because it has no information about this connection, it generates a RST message, so you will soon find that the TCP connection has been reset.
  • The third type is that the peer host crashed, or the probe messages cannot reach the peer for other reasons. In this case, the keep-alive probes get no response at all; after the configured number of probes is reached, TCP reports that the connection is dead.

The detection time of the TCP keep-alive mechanism is rather long, so we can also implement a heartbeat mechanism ourselves at the application layer.

For example, web server software generally provides a keepalive_timeout parameter to specify the timeout of an HTTP persistent connection. If the timeout is set to 60 seconds, the web server starts a timer; if the client does not initiate a new request within 60 seconds after the last HTTP request, the timer fires and the connection is released.

Heartbeat mechanism of web services

What happens if the connection has been established but the client's process crashes?

I did an experiment myself and used kill -9 to simulate a process crash. I found that after the process is killed, the kernel of the crashed side sends a FIN message on its behalf and completes the four-way wave with the other end.

4. How should Socket programming be done for TCP?

Client and server work based on TCP protocol

  • The server and client initialize the socket to get the file descriptor;
  • The server calls bind to bind the socket to an IP address and port;
  • The server calls listen to listen;
  • The server calls accept and waits for the client to connect;
  • The client calls connect and initiates a connection request to the server address and port;
  • Server accept returns the file descriptor of the socket used for transmission;
  • The client calls write to send data; the server calls read to read data;
  • When the client disconnects, it calls close. When the server reads the data, it will read EOF. After the data is processed, the server also calls close, indicating that the connection is closed.

It should be noted here that when the server calls accept, if the connection is successful, a socket that has completed the connection will be returned, which will be used to transmit data in the future.

Therefore, the listener socket and the socket that is actually used to transmit data are "two" sockets, one is called the listener socket and the other is called the completed connection socket.

After a successful connection is established, both parties begin to read and write data through read and write functions, just like writing something into a file stream.
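As a rough sketch of the server-side sequence described above (error handling trimmed for brevity; the port 8080 and backlog 128 are arbitrary illustrative choices):

 #include <string.h>
 #include <unistd.h>
 #include <netinet/in.h>
 #include <arpa/inet.h>
 #include <sys/socket.h>

 int main(void)
 {
     /* 1. initialize the listening socket */
     int listen_fd = socket(AF_INET, SOCK_STREAM, 0);

     /* 2. bind it to an IP address and port (0.0.0.0:8080 here) */
     struct sockaddr_in addr;
     memset(&addr, 0, sizeof(addr));
     addr.sin_family = AF_INET;
     addr.sin_addr.s_addr = htonl(INADDR_ANY);
     addr.sin_port = htons(8080);
     bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr));

     /* 3. listen: the backlog bounds the accept (full connection) queue */
     listen(listen_fd, 128);

     /* 4. accept returns a *new* socket for the established connection */
     int conn_fd = accept(listen_fd, NULL, NULL);

     /* 5. read/write on the connected socket; read() == 0 means EOF (peer sent FIN) */
     char buf[1024];
     ssize_t n;
     while ((n = read(conn_fd, buf, sizeof(buf))) > 0)
         write(conn_fd, buf, n);   /* echo the data back */

     /* 6. close both sockets when done */
     close(conn_fd);
     close(listen_fd);
     return 0;
 }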

What is the meaning of parameter backlog when listening?

Two queues are maintained in the Linux kernel:

  • Semi-connection queue (SYN queue): connections for which a SYN has been received and that are in the SYN_RCVD state;
  • Full connection queue (Accept queue): connections that have completed the TCP three-way handshake and are in the ESTABLISHED state;

SYN queue and Accept queue

 int listen(int socketfd, int backlog)
  • Parameter 1: socketfd, the socket file descriptor.
  • Parameter 2: backlog; the meaning of this parameter has changed across historical versions.

In the early days, the backlog of the Linux kernel meant the SYN queue size, that is, the size of the queue of unfinished connections.

Since Linux kernel 2.2, backlog has meant the accept queue size, that is, the length of the queue of established connections, so nowadays backlog is usually understood as the accept queue.

However, the upper limit is the value of the kernel parameter somaxconn, which means that the accept queue length = min(backlog, somaxconn).

If you want to learn more about TCP semi-connection queues and full connection queues, you can read this article: What happens if the TCP semi-connection queues and full connection queues are full? How to deal with it?

At which step of the three-way handshake does accept return?

Let’s first look at what is sent when the client connects to the server?

Client connection to the server

  • The client's protocol stack sends a SYN packet to the server, telling the server that the client's current initial sequence number is client_isn, and the client enters the SYN_SENT state;
  • After receiving this packet, the server's protocol stack replies with an ACK whose value is client_isn + 1, confirming the SYN packet with sequence number client_isn. At the same time, the server also sends its own SYN packet, telling the client that its initial sequence number is server_isn, and the server enters the SYN_RCVD state;
  • After the client's protocol stack receives the ACK, the application returns from the connect call, indicating that the one-way connection from the client to the server has been established, and the client's state is ESTABLISHED. At the same time, the client's protocol stack also responds to the server's SYN packet, replying with an acknowledgment value of server_isn + 1;
  • When this reply packet arrives at the server, the server's protocol stack makes the blocking accept call return. At this point, the one-way connection from the server to the client is also established, and the server also enters the ESTABLISHED state.

From the process described above, we can see that the client's connect returns successfully at the second handshake, while the server's accept returns successfully after the third handshake.

The client called close. What is the process of disconnecting the connection?

Let’s see what happens when the client actively calls close?

The client calls the close process

  • The client calls close, indicating that it has no more data to send. It sends a FIN message to the server and enters the FIN_WAIT_1 state;
  • When the server receives the FIN message, its TCP protocol stack inserts a file-end marker EOF into the receive buffer for the FIN packet. The application can sense the FIN packet through a read call. This EOF is placed after any other received data already queued, which means the server needs to handle this situation, because EOF means no more data will arrive on this connection. At this point, the server enters the CLOSE_WAIT state;
  • Then, when the server has finished processing the data, it will naturally read the EOF, so it also calls close to close its socket, which causes the server to send a FIN packet, after which it is in the LAST_ACK state;
  • The client receives the FIN packet from the server and sends an ACK confirmation packet to the server. At this time, the client enters the TIME_WAIT state;
  • After the server receives the ACK confirmation packet, it enters the final CLOSED state;
  • After 2MSL, the client also enters the CLOSED state.
