Let's talk about TCP

In day-to-day development, most of us deal with network transmission to some degree.

This article summarizes some key points of TCP. Plenty of infrastructure (frameworks, components) shields developers from these details, but I still think it is worth understanding the basic principles. When you run into a hard problem in a distributed environment, that knowledge can help you find the answer quickly.

1. Origin

TCP is a transport layer protocol. Its full name is Transmission Control Protocol. It was originally defined in RFC 793 and is now specified by IETF RFC 9293.

Before the Internet, computers were isolated from one another: each machine ran its own operating system and worked on its own.

TCP was created to connect these computers so that data and resources could be exchanged over a shared "channel". It grew out of Vint Cerf and Bob Kahn's DARPA-funded research in the 1970s and is now maintained as a standard by the IETF.

So what is the IETF? It is the Internet Engineering Task Force, a highly respected technical standards organization.

It is an open organization founded in 1986. Important network protocols we rely on today, such as HTTP, TCP, and IP, are standardized and maintained through it.

It is fair to say that the IETF has played a central role in building the Internet; without its open standards, today's thriving Internet would be hard to imagine.

It is worth mentioning that the IETF is not an official body backed by a government or a company. It is a self-organized, self-managed community that grew "from the people" and places great value on openness and equality.

The Internet's underlying mechanisms consist of a set of standard network protocols. To make them easier to understand, people defined the so-called "layered network model".

Computer networking courses usually introduce two network models:

  • The OSI model (Open Systems Interconnection) was proposed by the International Organization for Standardization (ISO), mainly because network vendors at the time could not agree on common protocols. It divides the network architecture into 7 layers, from the physical layer and data link layer at the bottom up to the application layer at the top.

Because the acronyms are so similar, many people used to confuse OSI with ISO.

  • TCP/IP, or TCP/IP Protocol Suite, is a communication model based on TCP and IP protocols. This model uses a protocol stack to implement many communication protocols and abstracts the communication system into four layers. The TCP/IP model originated from the ARPAnet project of the U.S. Department of Defense (abbreviated as DoD) and has since been maintained by the IETF organization.

As the figure above shows, TCP/IP can be viewed as a simplified version of the OSI model, which also makes it easier to understand.

Below the network layer, the techniques and concepts of the physical and data link layers, such as optical cables, repeaters, and switches, are relatively obscure and require some specialist background to understand fully.

For most software applications, it is undoubtedly simpler to refer to the parts below the network layer as the "network interface layer".

Therefore, although the OSI model is complete and comprehensive, in practice it has been displaced by the TCP/IP model and is rarely discussed in today's world of Internet applications.

Figure - TCP/IP Network Model

2. TCP Protocol

TCP is the most important transport layer protocol in the entire TCP/IP protocol suite. It defines a connection-oriented, reliable, stream-based transmission method.

HTTP runs on top of TCP (up to HTTP/2; HTTP/3 uses QUIC over UDP), so it is no exaggeration to say that TCP is one of the cornerstone protocols of the Internet.

At the same time, when we use HTTP to connect application systems, we often end up dealing with TCP directly, so it is worth understanding its basic mechanisms.

1. What are the characteristics of TCP?

  • First, TCP is connection-oriented: before transmitting data, the client and the server (the two communicating parties) must establish a reliable connection. After the transfer is complete, the connection is closed through the protocol and both sides release their resources. This involves the so-called "three-way handshake" and "four-way handshake".
  • Second, TCP is reliable. It defines a "timeout retransmission mechanism": after a segment is sent, the sender waits for an acknowledgment, and if none arrives within the timeout, it retransmits the segment (up to a limited number of times) to ensure reliable delivery.
  • Finally, TCP is stream-based: the application layer does not need to care about packet boundaries when transmitting data. TCP automatically buffers, splits, and merges data according to network conditions.

This is completely different from a message-based protocol such as UDP. Stream-based transmission also guarantees that data is received in the order it was sent, so every segment carries a sequence number within the current connection (TCP numbers bytes, not packets).
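Because TCP delivers a byte stream rather than messages, an application that needs message boundaries has to add its own framing on top. The sketch below assumes a simple length-prefix scheme (a 4-byte big-endian length before each message); `encode_frame` and `FrameDecoder` are hypothetical names for illustration, not a library API. It shows that messages come out intact however the stream is split or merged in transit.

```python
from __future__ import annotations

import struct

HEADER = struct.Struct("!I")  # assumed framing: 4-byte big-endian length prefix


def encode_frame(message: bytes) -> bytes:
    """Prefix a message with its length so the receiver can find its boundary."""
    return HEADER.pack(len(message)) + message


class FrameDecoder:
    """Reassembles length-prefixed messages from arbitrarily split stream chunks."""

    def __init__(self) -> None:
        self.buffer = bytearray()

    def feed(self, chunk: bytes) -> list[bytes]:
        """Add newly received bytes; return every message that is now complete."""
        self.buffer += chunk
        messages = []
        while len(self.buffer) >= HEADER.size:
            (length,) = HEADER.unpack_from(self.buffer)
            end = HEADER.size + length
            if len(self.buffer) < end:
                break  # the rest of this message has not arrived yet
            messages.append(bytes(self.buffer[HEADER.size:end]))
            del self.buffer[:end]
        return messages


# Two messages sent back to back, received in chunks that ignore their boundaries.
stream = encode_frame(b"hello") + encode_frame(b"world")
decoder = FrameDecoder()
received = []
for chunk in (stream[:3], stream[3:11], stream[11:]):
    received.extend(decoder.feed(chunk))
print(received)  # [b'hello', b'world']
```

The decoder keeps unconsumed bytes in a buffer, which is exactly the work a stream protocol leaves to the application.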

2. How to understand full-duplex?

Full-duplex is a term in communications and is not often mentioned in the field of software development.

Full-duplex means that data can be transmitted in both directions at the same time, and TCP is a full-duplex, reliable transmission protocol.

UDP can also be used for full-duplex communication. Unlike UDP, however, TCP supports only point-to-point transmission and cannot do broadcast or multicast.

Blackboard: Half-duplex differs in that data can flow in only one direction at a time.
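Full-duplex can be observed on a single loopback connection. In this minimal sketch (Python's standard `socket` module, one thread, an OS-chosen port on 127.0.0.1; `recv_exact` and `full_duplex_demo` are illustrative helpers), both ends write before either one reads. Each direction has its own byte stream and buffers, so neither transfer waits for the other.

```python
from __future__ import annotations

import socket


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; recv() may legally return fewer than requested."""
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        data += chunk
    return data


def full_duplex_demo() -> tuple[bytes, bytes]:
    # Listen on loopback; port 0 lets the OS choose a free port.
    with socket.create_server(("127.0.0.1", 0)) as server:
        server.settimeout(2)
        port = server.getsockname()[1]
        with socket.create_connection(("127.0.0.1", port), timeout=2) as client:
            conn, _ = server.accept()
            with conn:
                conn.settimeout(2)
                # Both ends send first; the two directions do not interfere.
                client.sendall(b"ping")
                conn.sendall(b"pong")
                at_server = recv_exact(conn, 4)
                at_client = recv_exact(client, 4)
    return at_server, at_client


print(full_duplex_demo())  # (b'ping', b'pong')
```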

3. How are TCP packets organized?

The most direct way to understand a protocol is to look at its packets. The format of a TCP segment is as follows:

The fields here include:

(1) Source port: the sender's port number; the target host sends its responses back to this port.

(2) Destination port: indicates the port number of the target host to be connected.

(3) Sequence number: the sequence number of the first data byte in this segment. It advances by the number of bytes sent, so it is usually the previous segment's sequence number plus that segment's payload length (a SYN or FIN consumes one sequence number).

If the segment is the first one in the TCP connection (the SYN), this value is the initial sequence number, which is randomly generated.

(4) Acknowledgment number: indicates how much data the local TCP has received. Its value is the sequence number of the next byte it expects to receive from the other end.

In effect, it tells the other party that all bytes up to this sequence number minus 1 have been received correctly.

If the segment is the first one in the TCP connection (the SYN), the ACK flag is not set, so this field is ignored and is generally 0.

(5) Data offset: Indicates the total length of the TCP packet header (header length) in units of 32 bits (4 bytes), which is used to determine the starting position of the user data area.

In the absence of variable content, the TCP header size is 20 bytes, corresponding to a value of 5.

(6) Flags:

  • Urgent flag (URG): when set, indicates that this segment carries urgent data that should be processed first.
  • Acknowledgment flag (ACK): when set, indicates that the acknowledgment number is valid; otherwise the acknowledgment number is ignored.
  • Push flag (PSH): when set, asks the receiver to deliver the data to the application process as soon as possible instead of waiting for the buffer to fill, as in telnet.
  • Reset flag (RST): when set, indicates that the connection must be reset immediately, for example because of an error or because a segment was rejected.
  • Synchronize flag (SYN): when set, indicates a connection request; it synchronizes sequence numbers while the connection is being established.
  • Finish flag (FIN): when set, indicates that the sender is releasing the connection and has no more data to send.

(7) Window size: the number of bytes the sender of this segment is currently willing to receive; it is used for flow control.

(8) Checksum: covers the TCP header, the data, and a pseudo-header taken from the IP header.

(9) Urgent pointer: when URG is set, an offset from the sequence number that points to the end of the urgent data.

(10) Options (variable length): support extra features, such as the maximum segment size (MSS).

(11) Padding: ensures that the header, including options, is a multiple of 32 bits.

Blackboard: A TCP header is usually 20 bytes; together with the 20-byte IPv4 header, every packet carries at least 40 bytes of headers.
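To make the layout concrete, here is a minimal parser for the fixed 20-byte header using Python's `struct` module. It decodes only the six flags listed above and treats everything between byte 20 and the data offset as options (including padding). `parse_tcp_header` and the hand-built SYN are illustrative; the checksum is left at 0, so the sample would not be valid on a real network.

```python
from __future__ import annotations

import struct

# The six flags described above, with their bit values in the header's flags field.
FLAG_BITS = {"URG": 0x20, "ACK": 0x10, "PSH": 0x08, "RST": 0x04, "SYN": 0x02, "FIN": 0x01}

# Fixed header in network byte order: ports (2 x 16 bits), sequence and
# acknowledgment numbers (2 x 32 bits), then data offset + flags, window,
# checksum, and urgent pointer (4 x 16 bits): 20 bytes in total.
FIXED_HEADER = struct.Struct("!HHIIHHHH")


def parse_tcp_header(segment: bytes) -> dict:
    if len(segment) < FIXED_HEADER.size:
        raise ValueError("a TCP header is at least 20 bytes")
    (src_port, dst_port, seq, ack, offset_flags,
     window, checksum, urgent_ptr) = FIXED_HEADER.unpack_from(segment)
    header_len = (offset_flags >> 12) * 4  # data offset counts 32-bit words
    if header_len < FIXED_HEADER.size or header_len > len(segment):
        raise ValueError("invalid data offset")
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len": header_len,
        "flags": {name for name, bit in FLAG_BITS.items() if offset_flags & bit},
        "window": window,
        "checksum": checksum,
        "urgent_ptr": urgent_ptr,
        "options": segment[FIXED_HEADER.size:header_len],  # includes any padding
        "payload": segment[header_len:],
    }


# A hand-built SYN: data offset 5 (20 bytes, no options), only the SYN bit set.
syn = FIXED_HEADER.pack(54321, 80, 1000, 0, (5 << 12) | FLAG_BITS["SYN"], 65535, 0, 0)
header = parse_tcp_header(syn)
print(header["flags"], header["header_len"], header["seq"])  # {'SYN'} 20 1000
```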

3. TCP workflow

Strictly speaking, a "link" is a lower-layer concept (the physical and data link layers), such as an optical cable or a wireless channel.

Here, however, "link" means a network connection, a concept above the IP layer.

A normal TCP communication therefore consists of connection establishment, data transfer, and connection teardown.

As shown in the following figure:

(Picture from the Internet)

As shown in the figure above, every TCP data transfer goes through these three stages:

  • A three-way handshake to establish the connection
  • Data transfer, with both sides reading and writing
  • A four-way wave to release the connection

Next, we focus on connection establishment and teardown.

4. Three-way handshake

When establishing a TCP connection, three interactions are required, also known as a three-way handshake.

  • The client initiates a connection request by sending a SYN segment with sequence number seq=i to the server, and enters the SYN-SENT state, waiting for the server's confirmation.
  • After receiving the SYN, the server acknowledges it (ack=i+1) and sends its own SYN (seq=k) in the same segment, i.e. a SYN+ACK. The server then enters the SYN-RECV state.
  • The client receives the SYN+ACK, sends an ACK (ack=k+1) to the server, and enters the ESTABLISHED state; the server enters ESTABLISHED when that ACK arrives. Both sides can then start transmitting data.
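The sequence and acknowledgment arithmetic of the three steps can be traced with a small simulation. This is a sketch, not a protocol implementation: `Segment` and `three_way_handshake` are hypothetical names, no packets are sent, and the states follow the list above (plus the server's initial LISTEN state). The key detail is that a SYN consumes one sequence number, which is why each side acknowledges the other's ISN plus one.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    flags: frozenset[str]
    seq: int
    ack: int = 0  # only meaningful when "ACK" is in flags


def three_way_handshake(i: int, k: int) -> list[tuple[str, Segment, str, str]]:
    """Return (sender, segment, client_state, server_state) after each step.

    i and k are the client's and server's initial sequence numbers.
    """
    steps = []
    # 1. Client -> server: SYN, seq=i. The client waits in SYN-SENT.
    syn = Segment(frozenset({"SYN"}), seq=i)
    steps.append(("client", syn, "SYN-SENT", "LISTEN"))
    # 2. Server -> client: SYN+ACK, seq=k, ack=i+1 (the SYN consumed one number).
    syn_ack = Segment(frozenset({"SYN", "ACK"}), seq=k, ack=syn.seq + 1)
    steps.append(("server", syn_ack, "SYN-SENT", "SYN-RECV"))
    # 3. Client -> server: ACK, seq=i+1, ack=k+1. Both sides are now ESTABLISHED.
    ack = Segment(frozenset({"ACK"}), seq=syn_ack.ack, ack=syn_ack.seq + 1)
    steps.append(("client", ack, "ESTABLISHED", "ESTABLISHED"))
    return steps


for sender, seg, client_state, server_state in three_way_handshake(i=100, k=300):
    print(f"{sender:6} {sorted(seg.flags)} seq={seg.seq} ack={seg.ack} "
          f"client={client_state} server={server_state}")
```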

When talking about the three-way handshake, there are several issues that need attention:

Question 1. Why is it a three-way handshake?

This is a classic interview question, usually phrased as: could the handshake use two steps, or four?

The answer is that TCP provides reliable transmission, so when a connection is established, each end's initial sequence number must be confirmed by the other end, as in the process above.

Only with three handshakes has each side both sent its own SYN and received an acknowledgment for it, so only then is the connection considered trustworthy.

In addition, with only two handshakes, a delayed or duplicated SYN from an earlier attempt (caused, for example, by network instability and retransmission) would make the server open a connection the client no longer wants, wasting resources.

Question 2. What is a syn flood attack?

SYN flood is a classic DoS/DDoS attack that exploits a weakness in the TCP three-way handshake.

In the figure above, you can see that when the server receives a SYN, it enters the SYN-RECV state. Such a connection is called a half-open connection, and the server records it in its half-connection queue.

Imagine an attacker who sends a large number of SYN packets to the server in a short time but never completes the handshake: the server's half-connection queue quickly fills up, and the server can no longer accept legitimate connections.

One way to carry out a SYN flood is to forge the source IP, so the server's SYN+ACK never reaches the attacker and the handshake cannot complete.

The same effect can be achieved with firewall rules on the attacking client that drop the server's SYN+ACK responses.

SYN floods are hard to block. Enabling SYN cookies (the net.ipv4.tcp_syncookies kernel parameter on Linux) mitigates them, but this is usually not the best solution on its own.

The most effective approach is a dedicated anti-DDoS firewall; essentially all major cloud providers offer this capability.
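The attack boils down to exhausting a bounded table. The toy model below is a deliberate simplification, assuming no timeouts, no SYN+ACK retransmissions, and no SYN cookies; `SynQueue` is a hypothetical class, not how any kernel stores half-open connections. It shows forged SYNs filling the queue so that a legitimate client's SYN is dropped.

```python
from __future__ import annotations


class SynQueue:
    """Toy model of a listener's half-connection (SYN) queue."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.half_open: dict[tuple[str, int], int] = {}  # peer -> server ISN

    def on_syn(self, peer: tuple[str, int], server_isn: int) -> bool:
        """Record a half-open connection; return False if the SYN is dropped."""
        if len(self.half_open) >= self.capacity:
            return False
        self.half_open[peer] = server_isn
        return True

    def on_ack(self, peer: tuple[str, int], ack: int) -> bool:
        """Complete the handshake if the ACK matches a pending SYN+ACK."""
        isn = self.half_open.get(peer)
        if isn is None or ack != isn + 1:
            return False
        del self.half_open[peer]  # a real listener would move it to the accept queue
        return True


queue = SynQueue(capacity=128)
# The attacker sends SYNs from forged addresses and never answers the SYN+ACKs.
for n in range(128):
    queue.on_syn((f"10.0.0.{n}", 40000), server_isn=n)
# A legitimate client's SYN now finds the queue full and is dropped.
print(queue.on_syn(("192.0.2.10", 51000), server_isn=999))  # False
```

Real mitigations attack the "bounded table" part: SYN cookies avoid storing state for the half-open connection at all.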

Question 3. How to tune the half-connection queue and the full-connection queue

Besides the "half-connection queue" (SYN queue) mentioned above, there is a corresponding "full-connection queue" (accept queue).

The former holds connections whose handshake has not yet completed; the latter holds connections that have been fully established and are waiting for the application to accept them.

The size of the half-connection queue can be adjusted via a kernel parameter:

  echo 4096 > /proc/sys/net/ipv4/tcp_max_syn_backlog

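Blackboard: tcp_max_syn_backlog interacts with SYN cookies. When tcp_syncookies is enabled, the kernel answers SYNs with cookies once the queue overflows instead of dropping them, so tcp_max_syn_backlog no longer acts as a hard limit on new connections.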

For the full-connection queue, if the server does not take connections off it with accept() quickly enough, the queue overflows and new connections fail.

To tune the size of the full-connection queue:

  echo 4096 > /proc/sys/net/core/somaxconn

So, is kernel tuning the only method that can affect these two parameters? The answer is no.

In fact, the application's listen() call also takes a backlog argument. On Linux, the parameters relate roughly as follows (the exact rules for the half-connection queue vary by kernel version):

  1. Half-connection queue length = min(backlog, net.core.somaxconn, net.ipv4.tcp_max_syn_backlog)
  2. Full-connection queue length = min(backlog, net.core.somaxconn)
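The two relationships above can be written as a small helper to see which setting is the bottleneck. `effective_queue_lengths` is a hypothetical name, and the function only encodes the approximate formulas listed; it does not read the kernel or model version-specific details. The numbers in the usage lines are example values, not kernel defaults.

```python
from __future__ import annotations


def effective_queue_lengths(backlog: int, somaxconn: int,
                            tcp_max_syn_backlog: int) -> tuple[int, int]:
    """Approximate (half-connection, full-connection) queue lengths."""
    syn_queue = min(backlog, somaxconn, tcp_max_syn_backlog)
    accept_queue = min(backlog, somaxconn)
    return syn_queue, accept_queue


# The application asks for listen(backlog=1024), but somaxconn is set to 128:
# the kernel silently caps both queues.
print(effective_queue_lengths(backlog=1024, somaxconn=128, tcp_max_syn_backlog=4096))  # (128, 128)
```

This is why raising the backlog in an application server alone often changes nothing: the smallest value in the chain wins.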

Blackboard: Common application servers such as Netty and Tomcat let you set the backlog, but when tuning you must also consider the kernel parameters, since they cap the value.
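The accept queue can be observed from user space: connect() returns as soon as the kernel has completed the handshake, even if the application has not called accept() yet. This sketch uses Python's standard `socket` module on 127.0.0.1 with an OS-chosen port; `accept_queue_demo` and its default values are illustrative. The finished connections wait in the accept queue until accept() takes them off.

```python
import socket


def accept_queue_demo(backlog: int = 8, clients: int = 3) -> int:
    """Connect several clients *before* calling accept(); return how many were accepted."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen(backlog)         # the kernel caps this at net.core.somaxconn
        server.settimeout(2)
        port = server.getsockname()[1]
        # Each connect() returns once the kernel finishes the three-way handshake;
        # the established connections wait in the accept queue.
        pending = [socket.create_connection(("127.0.0.1", port), timeout=2)
                   for _ in range(clients)]
        accepted = [server.accept()[0] for _ in range(clients)]
        for sock in pending + accepted:
            sock.close()
    return len(accepted)


print(accept_queue_demo())  # 3
```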

5. Four Waves

When releasing the connection, since TCP is full-duplex, both ends must close it separately. The process is as follows:

  • The client sends a FIN to close the client-to-server direction and enters the FIN_WAIT_1 state.
  • After receiving the FIN, the server sends an ACK whose acknowledgment number is the FIN's sequence number + 1 (like a SYN, a FIN consumes one sequence number). The server enters the CLOSE_WAIT state, and when the ACK arrives the client enters the FIN_WAIT_2 state.
  • The server sends its own FIN to close the server-to-client direction and enters the LAST_ACK state.
  • After receiving that FIN, the client sends an ACK whose acknowledgment number is the FIN's sequence number + 1 and enters the TIME_WAIT state. When the server receives this ACK, it enters the CLOSED state and the release is complete.
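The four steps can be traced the same way as the handshake. This sketch is a simulation only, under stated assumptions: `four_way_close` is a hypothetical helper, u and v are the next sequence numbers of the client and server, the server sends no more data before its FIN, and the ACK flag that real stacks also set on FIN segments is omitted. States are shown after each step, in the order of the list above.

```python
from __future__ import annotations


def four_way_close(u: int, v: int) -> list[dict]:
    """Trace the four-way wave; u and v are the client's and server's next sequence numbers."""
    trace = []
    # 1. Client -> server: FIN, seq=u.
    trace.append({"from": "client", "flags": "FIN", "seq": u,
                  "client": "FIN_WAIT_1", "server": "ESTABLISHED"})
    # 2. Server -> client: ACK, ack=u+1 (the FIN consumed one sequence number).
    trace.append({"from": "server", "flags": "ACK", "seq": v, "ack": u + 1,
                  "client": "FIN_WAIT_2", "server": "CLOSE_WAIT"})
    # 3. Server -> client: FIN, seq=v (the bare ACK consumed no sequence number).
    trace.append({"from": "server", "flags": "FIN", "seq": v,
                  "client": "FIN_WAIT_2", "server": "LAST_ACK"})
    # 4. Client -> server: ACK, seq=u+1, ack=v+1. The server can now close.
    trace.append({"from": "client", "flags": "ACK", "seq": u + 1, "ack": v + 1,
                  "client": "TIME_WAIT", "server": "CLOSED"})
    return trace


for step in four_way_close(u=5000, v=9000):
    print(step)
```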

There are two ways to close a connection: active closing and passive closing. To simplify the understanding, we take the client as the active closing party and the server as the passive closing party.

Issues that require attention during the four waves:

Question 1. Why four waves?

The party that sends FIN is actively closing (client), while the other party is passively closing (server).

When one party sends a FIN, it means that no more data will be sent on this side.

When the passive closing party receives the other side's FIN, it may still have data to send, so it usually cannot send its own FIN right away (that is, it cannot combine the FIN with the ACK).

Instead, it waits for its own data to be sent before sending a FIN separately, so the whole process requires four interactions.

Question 2. What is half-closed?

After receiving the ACK for its FIN, the client enters the FIN_WAIT_2 state while the server is in the CLOSE_WAIT state. This is called half-closed: the client-to-server direction is closed, but the server can still send data.

To go from half-closed to fully closed, the second FIN must be sent and acknowledged: the client waits for the server's FIN and then enters TIME_WAIT.

If the other side does not send its FIN for a long time, the client eventually times out. On Linux this is controlled by the kernel parameter net.ipv4.tcp_fin_timeout (default 60 s), which applies to connections the application has already closed.
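An application can create a half-closed connection deliberately. In Python, `socket.shutdown(socket.SHUT_WR)` wraps the POSIX shutdown() call and sends a FIN for one direction only. In this loopback sketch (`read_until_eof` and `half_close_demo` are illustrative helpers), the server sees end-of-file, sits in CLOSE_WAIT, and can still send its reply before closing its own side.

```python
import socket


def read_until_eof(sock: socket.socket) -> bytes:
    """Read until recv() returns b"", which means the peer's FIN has arrived."""
    data = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            return data
        data += chunk


def half_close_demo() -> bytes:
    with socket.create_server(("127.0.0.1", 0)) as server:
        server.settimeout(2)
        port = server.getsockname()[1]
        with socket.create_connection(("127.0.0.1", port), timeout=2) as client:
            conn, _ = server.accept()
            with conn:
                conn.settimeout(2)
                client.sendall(b"request")
                # Send FIN for the client -> server direction only (half-close).
                client.shutdown(socket.SHUT_WR)
                request = read_until_eof(conn)  # the server is now in CLOSE_WAIT
                # The server -> client direction still works while half-closed.
                conn.sendall(b"reply to " + request)
                conn.shutdown(socket.SHUT_WR)  # the server's own FIN
            return read_until_eof(client)


print(half_close_demo())  # b'reply to request'
```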

Question 3. Why does the server have a large number of CLOSE_WAIT connections?

A half-closed server connection stays in the CLOSE_WAIT state until the server sends its FIN.

At the application layer, it is calling close() on the socket that sends the FIN. If the server has a large number of connections in the CLOSE_WAIT state, possible reasons are:

  • The server is so overloaded that it does not get around to calling close()
  • There is a connection leak (a bug): the server never closes some connections
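To confirm a CLOSE_WAIT (or TIME_WAIT) buildup, you can count sockets by state. On Linux, `ss -tan` shows this, and the same data is in /proc/net/tcp (IPv4 only; IPv6 sockets are in /proc/net/tcp6), whose "st" column holds a hex state code. The sketch below parses that text; `count_tcp_states` is a hypothetical helper, and the sample input is made up for illustration.

```python
from collections import Counter

# State codes used in the "st" column of Linux's /proc/net/tcp
# (they follow the kernel's internal TCP state numbering).
TCP_STATES = {
    0x01: "ESTABLISHED", 0x02: "SYN_SENT", 0x03: "SYN_RECV", 0x04: "FIN_WAIT1",
    0x05: "FIN_WAIT2", 0x06: "TIME_WAIT", 0x07: "CLOSE", 0x08: "CLOSE_WAIT",
    0x09: "LAST_ACK", 0x0A: "LISTEN", 0x0B: "CLOSING",
}


def count_tcp_states(proc_net_tcp: str) -> Counter:
    """Count sockets per state from the text of /proc/net/tcp."""
    counts = Counter()
    for line in proc_net_tcp.splitlines()[1:]:  # the first line is a header
        fields = line.split()
        if len(fields) > 3:
            counts[TCP_STATES.get(int(fields[3], 16), "UNKNOWN")] += 1
    return counts


# On a Linux host:
#   with open("/proc/net/tcp") as f:
#       print(count_tcp_states(f.read())["CLOSE_WAIT"])
# A made-up sample: one listener and two connections stuck in CLOSE_WAIT (08).
sample = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:1F90 00000000:0000 0A 00000000:00000000 00:00000000 00000000  1000        0 101
   1: 0100007F:1F90 0100007F:C350 08 00000000:00000000 00:00000000 00000000  1000        0 102
   2: 0100007F:1F90 0100007F:C351 08 00000000:00000000 00:00000000 00000000  1000        0 103
"""
print(count_tcp_states(sample))
```

A CLOSE_WAIT count that keeps growing over time is the typical signature of the connection leak described above.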

Question 4. What problems does TIME_WAIT cause?

When the client receives the FIN from the other party, it enters the TIME_WAIT state and remains there for a while before moving to CLOSED.

The main reason is to close the connection reliably. TCP was designed with many kinds of network instability in mind, for example:

The final ACK may be lost. The other side will then retransmit its FIN; if this side had already moved to CLOSED, it would answer with RST instead of ACK, disrupting the closing process.

Therefore, the TIME_WAIT state lasts for a fixed period, and the connection is closed safely only once no more retransmitted packets can arrive. This also lets stray segments from the old connection expire, so they cannot be mistaken for data on a new connection that reuses the same addresses and ports.

Blackboard: TIME_WAIT lasts 2*MSL, where MSL (Maximum Segment Lifetime) is the assumed maximum time a segment can survive in the network. RFC 793 took the MSL to be 2 minutes, which shows how poor network conditions were assumed to be at the time. Linux fixes TIME_WAIT at 60 seconds (an effective MSL of 30 s), and modern networks could tolerate a much smaller value.

So what problems does TIME_WAIT cause?

If your side frequently closes connections actively, a large number of TIME_WAIT connections may accumulate. Each one holds a small amount of kernel memory and keeps its local address and port tied up, which can affect the establishment of new connections, for example:

Running out of local ports, which shows up as errors such as "Cannot assign requested address".
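A back-of-the-envelope calculation shows how quickly this happens. By Little's law, the number of sockets in TIME_WAIT in steady state is roughly the close rate times the TIME_WAIT duration. The sketch assumes a 60-second TIME_WAIT, Linux's usual ip_local_port_range of 32768 to 60999 (28,232 ports), and connections to a single destination IP and port; `time_wait_estimate` is a hypothetical helper.

```python
from __future__ import annotations


def time_wait_estimate(closes_per_second: float, time_wait_seconds: float = 60.0,
                       local_ports: int = 60999 - 32768 + 1) -> tuple[float, float]:
    """Return (sockets in TIME_WAIT, fraction of local ports they occupy).

    A fraction above 1.0 means the actively closing side runs out of local ports.
    """
    in_time_wait = closes_per_second * time_wait_seconds
    return in_time_wait, in_time_wait / local_ports


count, usage = time_wait_estimate(closes_per_second=1000)
print(f"{count:.0f} sockets in TIME_WAIT, {usage:.2f}x the available local ports")
```

At 1,000 short-lived connections per second to one server, about 60,000 sockets sit in TIME_WAIT, roughly twice the ports available, which is why connection reuse matters.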

How to solve it:

  • Reuse connections to avoid closing them frequently, for example with a connection pool
  • Tune kernel parameters, for example enabling net.ipv4.tcp_tw_reuse, which lets new outgoing connections reuse TIME_WAIT sockets when it is safe (it relies on TCP timestamps).

Blackboard: The TIME_WAIT problem became very visible with early HTTP, which opened a new connection for every request; HTTP/1.1 therefore made persistent (keep-alive) connections the default so that connections can be reused.

Question 5. What is RST and why does it appear?

RST is a special flag indicating that the connection should be terminated immediately. RST is generated in situations such as:

  • A segment arrives for a port that no one is listening on
  • Data is sent to a peer that has already called close() on the connection
  • An application closes a connection while unread data remains in its receive buffer; the kernel sends RST instead of a normal FIN to force the close
  • Some timeout handling, for example when an application aborts a connection after a request times out

RST behavior is sometimes exploited for port scanning:

-> Port open: the target answers the SYN with SYN+ACK

-> Port closed: the target answers with RST
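The same distinction is visible through the ordinary sockets API: a refused connect() is the RST case. This sketch is a plain "connect scan", unlike the raw-packet SYN scans real scanners often use; `probe` and `demo` are illustrative helpers, and it should only be pointed at hosts you control. `demo` checks one listening port and one port with nothing listening, both on 127.0.0.1.

```python
from __future__ import annotations

import socket


def probe(host: str, port: int, timeout: float = 1.0) -> str:
    """Classify a TCP port with a plain connect()."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"     # SYN answered with SYN+ACK; the handshake completed
    except ConnectionRefusedError:
        return "closed"       # SYN answered with RST
    except OSError:
        return "no answer"    # e.g. timeout or unreachable: possibly filtered


def demo() -> tuple[str, str]:
    with socket.create_server(("127.0.0.1", 0)) as listener:
        open_port = listener.getsockname()[1]
        # Grab a free port number, then release it so nothing listens there.
        with socket.socket() as tmp:
            tmp.bind(("127.0.0.1", 0))
            closed_port = tmp.getsockname()[1]
        return probe("127.0.0.1", open_port), probe("127.0.0.1", closed_port)


print(demo())
```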

Summary

I originally intended only to summarize some details of TCP parameter tuning, but I did not expect TCP to involve so much: even the simple handshake and wave processes hide many details and pitfalls.

It is fair to say that, to guarantee reliable transmission, TCP's early designers considered a great many things, and this in turn laid the groundwork for the applications built on top of it.
