Detailed explanation of TCP packet sticking, packet unpacking and communication protocol

In TCP programming, we use application-level protocols to solve the sticky-packet and half-packet problems. This article explains in detail why TCP sticky packets and half packets occur, and how a well-designed protocol resolves them.


1 TCP Packet Sticking and Unpacking Diagram

Since the TCP transmission protocol is stream-oriented and has no message protection boundary, multiple messages sent by one party may be combined into a large message for transmission, which is called packet sticking; or a message sent may be split into multiple small messages, which is called packet unpacking.

Suppose the client sends two data packets, D1 and D2, to the server. Because the number of bytes the server reads in a single call is not fixed, the following situations may occur:

  1. The server reads two independent data packets, D1 and D2, in two separate reads; no sticking or unpacking occurs.
  2. The server receives both packets in a single read, with D1 and D2 stuck together. This is TCP packet sticking.
  3. The server reads twice: the first read returns the complete D1 plus part of D2, and the second read returns the rest of D2. This is TCP packet unpacking.
  4. The server reads twice: the first read returns part of D1 (D1_1), and the second read returns the remainder of D1 (D1_2) together with the complete D2. This is also unpacking.

Since the data sent may arrive stuck together or split apart, the receiver cannot tell the messages apart on its own. A well-defined mechanism is therefore needed to resolve sticking and unpacking, and that is exactly the role of a protocol.

Before introducing the protocol, let us first understand the causes of packet sticking and packet unpacking.

2 Causes of sticking and unpacking

The author summarizes the causes of sticking and unpacking problems into the following three types:

  • Socket buffer and sliding window
  • MSS/MTU Limitation
  • Nagle's algorithm

2.1 Socket Buffer and Sliding Window

Each TCP socket has a send buffer (SO_SNDBUF) and a receive buffer (SO_RCVBUF) in the kernel. TCP's full-duplex working mode and TCP's sliding window depend on the filling status of these two independent buffers.

SO_SNDBUF:

Suppose a process calls a send method. In the simplest (and most common) case, the data is copied into the socket's kernel send buffer, and then send returns to the caller. In other words, when send returns, the data may not yet have been delivered to the other end (similar to writing to a file); send merely copies data from the application-layer buffer into the socket's kernel send buffer.

SO_RCVBUF:

The kernel caches received data. If the application process has not yet called read, the data stays in the corresponding socket's receive buffer. More precisely, regardless of whether the process reads the socket, data sent by the other end is received by the kernel and cached in the socket's kernel receive buffer. All read does is copy data from the kernel buffer into the application-layer user buffer, nothing more.

Sliding Window:

During the three-way handshake, the TCP connection will send its own window size to the other party, which is actually the value specified by SO_RCVBUF. When sending data afterwards, the sender must first confirm that the receiver's window is not full. If it is not full, it can send.

After each data transmission, the sender reduces the window size of the other party that it maintains, indicating that the available space of the other party's SO_RCVBUF becomes smaller.

When the receiver starts to process the data in SO_RCVBUF, it reads the data from the socket's receive buffer in the kernel. At this time, the available space of the receiver's SO_RCVBUF becomes larger, that is, the window size becomes larger. The receiver returns its latest window size to the sender in the form of an ack message. At this time, the sender sets the receiver's window size that it maintains to the window size returned by the ack message.

In addition, the sender may keep sending messages as long as the peer's SO_RCVBUF has space to buffer data, that is, as long as window size > 0. When the receiver's SO_RCVBUF is full, window size = 0, and the sender must stop sending and wait for an ack carrying a new, non-zero window size. It is these buffers that give rise to sticking and unpacking: several small writes may accumulate in SO_SNDBUF and go out in one segment, while a write larger than the available window must be split across several segments.
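
In Java, these two kernel buffers are exposed through socket options. The following is a minimal sketch (not from the original article; the class and method names are ours) of configuring SO_SNDBUF and SO_RCVBUF. Note that the kernel treats the requested sizes as hints and may round them up or down.

```java
import java.net.Socket;
import java.net.SocketException;

// Sketch: SO_SNDBUF and SO_RCVBUF map to set/getSendBufferSize and
// set/getReceiveBufferSize on java.net.Socket.
public class SocketBuffers {
    // Set both buffer sizes and return what the kernel actually granted.
    public static int[] configure(Socket socket, int sndBuf, int rcvBuf) throws SocketException {
        socket.setSendBufferSize(sndBuf);      // hint for SO_SNDBUF
        socket.setReceiveBufferSize(rcvBuf);   // hint for SO_RCVBUF (basis of the advertised window)
        return new int[] { socket.getSendBufferSize(), socket.getReceiveBufferSize() };
    }

    public static void main(String[] args) throws Exception {
        // An unconnected socket is enough to demonstrate the options.
        try (Socket s = new Socket()) {
            int[] sizes = configure(s, 64 * 1024, 64 * 1024);
            System.out.println(sizes[0] > 0 && sizes[1] > 0);
        }
    }
}
```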

2.2 MSS/MTU Fragmentation

MTU (Maximum Transmission Unit) is the link layer's limit on the maximum data that can be sent at one time. MSS (Maximum Segment Size) is the maximum length of the data portion of a TCP message and is the transport layer's limit on the maximum data that can be sent at one time.

To understand MSS/MTU, you first need to review the TCP/IP five-layer network model.

When data is transmitted, some additional information is added to each layer:

  • Application layer: only cares about the data DATA to be sent, writes the data into the socket buffer SO_SNDBUF in the kernel and returns, and the operating system takes the data in SO_SNDBUF for sending.
  • Transport layer: TCP Header (20 bytes) is added before DATA
  • Network layer: Add an IP Header to the TCP message, that is, add its own network address to the message. The length of the IP Header in IPv4 is 20 bytes, and the length of the IP Header in IPV6 is 40 bytes.
  • Link layer: Add Datalink Header and CRC. Add SMAC (Source Machine, MAC address of the data sender), DMAC (Destination Machine, MAC address of the data receiver) and Type fields. The total length of SMAC+DMAC+Type+CRC is 18 bytes.
  • Physical layer: transmission

After reviewing this basic content, let's look at MTU and MSS. MTU is the limitation of Ethernet data transmission. Each Ethernet frame cannot exceed 1518 bytes. Excluding the 14-byte header (DMAC+SMAC+Type field) and the 4-byte tail (CRC check) of the Ethernet frame, the maximum size of the remaining data field that carries the upper layer protocol can only be 1500 bytes. We call it MTU.

MSS is the result of subtracting the IP Header of the network layer and the TCP Header of the transport layer from the MTU. This is the maximum size of actual application data that the TCP protocol can send at one time.

  1. MSS = MTU(1500) - IP Header(20 or 40) - TCP Header(20)

Because the IPv4 and IPv6 headers differ in length, the Ethernet MSS is 1460 bytes under IPv4 and 1440 bytes under IPv6.
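
The formula above can be checked with a trivial calculation (the class name is ours, for illustration only):

```java
// Trivial check of the MSS formula: MSS = MTU - IP Header - TCP Header.
public class MssCalc {
    public static int mss(int mtu, int ipHeaderBytes, int tcpHeaderBytes) {
        return mtu - ipHeaderBytes - tcpHeaderBytes;
    }

    public static void main(String[] args) {
        System.out.println(mss(1500, 20, 20)); // IPv4: 1460
        System.out.println(mss(1500, 40, 20)); // IPv6: 1440
    }
}
```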

When the sender sends data, when the amount of data in SO_SNDBUF is greater than the MSS, the operating system will split the data so that each part is smaller than the MSS, which also forms unpacking. Then, the TCP Header is added to each part to form multiple complete TCP messages for sending. Of course, when passing through the network layer and data link layer, the corresponding content will be added respectively.

Another thing to note is that the local loopback interface does not go through Ethernet, so it is not limited by the Ethernet MTU of 1500. Running the ifconfig command on a Linux server shows the MTU of each network interface:

A typical output shows two network interfaces:

  • eth0 needs to use Ethernet, so the MTU is 1500;
  • lo is a local loopback, which does not need to go through Ethernet and is therefore not subject to the 1500 limit.

2.3 Nagle's Algorithm

In the TCP/IP protocol, no matter how much data is sent, a protocol header (TCP Header + IP Header) must always be added in front of the data (DATA). At the same time, when the other party receives the data, it also needs to send ACK to confirm.

Even a single character typed at a keyboard, occupying one byte, may be transmitted as a 41-byte packet: 1 byte of useful data plus 40 bytes of headers. That is 4000% overhead, which is unacceptable on a heavily loaded network. This is the small-packet problem, often discussed alongside the "silly window syndrome".

In order to make the best use of network bandwidth, TCP always wants to send as much data as possible. (A connection will set the MSS parameter, so TCP/IP hopes to send data in MSS-sized data blocks each time.) The Nagle algorithm is to send as large a block of data as possible to avoid the network being filled with many small data blocks.

The basic definition of the Nagle algorithm is that at any time, there can be at most one unconfirmed small segment. The so-called "small segment" refers to a data block smaller than the MSS size, and the so-called "unconfirmed" means that after a data block is sent out, no ACK is received from the other party to confirm that the data has been received.

Rules of Nagle's algorithm:

  1. If the data length in SO_SNDBUF reaches MSS, sending is allowed;
  2. If the SO_SNDBUF contains FIN, indicating a request to close the connection, the remaining data in the SO_SNDBUF is sent first and then closed;
  3. If the TCP_NODELAY=true option is set, sending is allowed: TCP_NODELAY disables the Nagle algorithm so that small segments are sent immediately. A related but distinct mechanism is delayed acknowledgment: normally, when the server receives data, it does not send an ACK to the client right away but delays the ACK for a while (typically up to 40ms), hoping to produce response data within that window so the ACK can piggyback on the response. The delay is not fixed at 40ms: it is generally initialized to a 40ms minimum and then adjusted continuously based on parameters such as the connection's retransmission timeout (RTO) and the interval between successive received packets. Delayed acknowledgment can be disabled by setting the TCP_QUICKACK option.
  4. When the TCP_CORK option is not set, if all small data packets (packet length less than MSS) sent out are confirmed, they are allowed to be sent;
  5. If none of the above conditions are met, but a timeout occurs (usually 200ms), it will be sent immediately.
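
In Java, rule 3 corresponds to Socket#setTcpNoDelay. A minimal sketch (the class and method names are ours):

```java
import java.net.Socket;
import java.net.SocketException;

// Sketch: disabling Nagle's algorithm in Java. TCP_NODELAY maps to
// Socket#setTcpNoDelay; once set, small writes go out immediately instead
// of being coalesced while an earlier small segment is unacknowledged.
public class NagleOff {
    public static boolean disableNagle(Socket socket) throws SocketException {
        socket.setTcpNoDelay(true);
        return socket.getTcpNoDelay();
    }

    public static void main(String[] args) throws Exception {
        // An unconnected socket is enough to demonstrate the option.
        try (Socket s = new Socket()) {
            System.out.println(disableNagle(s)); // true
        }
    }
}
```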

3 Communication Protocol

After understanding the causes of packet sticking and unpacking, let's look at how the receiver can tell messages apart. The principle is simple: if the data received so far is incomplete (a half packet), keep waiting until enough data has arrived to form a complete request or response.

By defining a communication protocol, we can solve the problem of packet sticking and unpacking. The role of the protocol is to define the format of the transmitted data. In this way, when receiving data:

If packets are stuck together, the format lets us split them into individual messages;

If a packet was split, we wait until enough data arrives to form a complete message before processing it.

3.1 Fixed-Length Protocol

Fixed-length protocol: as the name implies, every message has a fixed length. For example, suppose we stipulate that every 3 bytes form one valid message, and the following 9 bytes arrive in 4 reads:

  1. + ---+----+------+----+  
  2. | A | BC | DEFG | HI |
  3. + ---+----+------+----+  

According to the protocol, we can determine that there are three valid request messages, as follows:

  1. + -----+-----+-----+  
  2. | ABC | DEF | GHI |
  3. + -----+-----+-----+  

In a fixed-length protocol:

  • The sender must ensure that every message has the fixed length. If the content is shorter than required, say the fixed length is 1024 bytes but only 900 bytes need to be sent, the remainder is padded with zeros. Fixed-length protocols can therefore waste bandwidth.
  • The receiver considers that a complete message has been read each time a fixed-length content is read.

Tip: Netty provides FixedLengthFrameDecoder, which supports decoding a fixed length of bytes as a complete message.
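
The idea can be sketched with a tiny decoder mirroring the 3-byte example above. This is an illustration only (operating on strings for readability, not Netty's FixedLengthFrameDecoder; the class name is ours):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of fixed-length framing: buffer incoming chunks and emit one
// message for every frameLength bytes accumulated.
public class FixedLengthDecoder {
    private final int frameLength;
    private final StringBuilder buffer = new StringBuilder();

    public FixedLengthDecoder(int frameLength) {
        this.frameLength = frameLength;
    }

    // Feed an arbitrary chunk (possibly stuck or split) and return all
    // complete frames extractable so far.
    public List<String> feed(String chunk) {
        buffer.append(chunk);
        List<String> frames = new ArrayList<>();
        while (buffer.length() >= frameLength) {
            frames.add(buffer.substring(0, frameLength));
            buffer.delete(0, frameLength);
        }
        return frames;
    }

    public static void main(String[] args) {
        FixedLengthDecoder dec = new FixedLengthDecoder(3);
        // Simulate the 4 reads from the example: A, BC, DEFG, HI
        System.out.println(dec.feed("A"));    // []      -- half packet, keep waiting
        System.out.println(dec.feed("BC"));   // [ABC]
        System.out.println(dec.feed("DEFG")); // [DEF]   -- G is buffered
        System.out.println(dec.feed("HI"));   // [GHI]
    }
}
```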

3.2 Special Character Separator Protocol

Add special characters such as carriage return or space at the end of the packet to split it. For example, when parsing by line, when encountering characters \n, \r\n, it is considered a complete data packet. For the following binary byte stream:

  1. + --------------+  
  2. | ABC\nDEF\r\n |
  3. + --------------+  

Then, according to the protocol, we can determine that there are 2 valid request messages:

  1. + -----+-----+  
  2. | ABC | DEF |
  3. + -----+-----+  

In the special character delimiter protocol:

  • The sender needs to add a special delimiter at the end of a message when sending it;
  • The receiver, when receiving a message, needs to detect the special delimiter and can only process it when a complete message is detected.

When using the special character delimiter protocol, note that the chosen delimiter must not appear in the message body, otherwise the stream will be split incorrectly. For example, if the sender intends "12\r\n34" to be one complete message but the receiver splits by line, it will wrongly be split into 2 messages. One solution is for the sender to base64-encode the content first. Since base64 output uses only 64 characters: 0-9, a-z, A-Z, + and /, any character outside that set can safely serve as the delimiter.

Tip: Netty provides DelimiterBasedFrameDecoder to decode based on special characters. In fact, the cache server redis we are familiar with also uses line breaks to distinguish a complete message.
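
A line-based splitter can be sketched as follows. This is an illustration only (not Netty's DelimiterBasedFrameDecoder; the class name is ours), treating '\n' as the delimiter and stripping an optional preceding '\r':

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of delimiter-based framing: accumulate chunks and split on '\n'.
public class LineDecoder {
    private final StringBuilder buffer = new StringBuilder();

    // Feed an arbitrary chunk and return all complete lines found so far.
    public List<String> feed(String chunk) {
        buffer.append(chunk);
        List<String> messages = new ArrayList<>();
        int idx;
        while ((idx = buffer.indexOf("\n")) >= 0) {
            String msg = buffer.substring(0, idx);
            if (msg.endsWith("\r")) {                 // tolerate \r\n line endings
                msg = msg.substring(0, msg.length() - 1);
            }
            messages.add(msg);
            buffer.delete(0, idx + 1);
        }
        return messages;
    }

    public static void main(String[] args) {
        LineDecoder dec = new LineDecoder();
        System.out.println(dec.feed("ABC\nDEF\r")); // [ABC] -- DEF still awaits its delimiter
        System.out.println(dec.feed("\n"));         // [DEF]
    }
}
```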

3.3 Variable-length protocol

The message is divided into a message header and a message body. In the message header, we use an integer, such as an int, to indicate the length of the message body. The message body is actually the binary data bytes to be sent. The following is a basic format:

  1. header body
  2. + --------+----------+  
  3. | Length | Content |
  4. + --------+----------+  

In the variable length protocol:

  • Before sending data, the sender needs to obtain the binary byte size of the content to be sent, and then add an integer in front of the content to be sent to indicate the length of the binary byte of the message body.
  • When parsing, the receiver first reads the content length Length, whose value is the number of bytes occupied by the actual message body content (Content). After that, it must read this many bytes of content to consider it a complete data message.

Tip: Netty provides LengthFieldPrepender to encode the actual content Content and add the Length field. The recipient uses LengthFieldBasedFrameDecoder to decode.
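
The Length+Content scheme can be sketched end to end. This is an illustration only (not Netty's LengthFieldPrepender/LengthFieldBasedFrameDecoder; the class name and 4-byte big-endian length are our assumptions):

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch of the variable-length protocol: a 4-byte int Length followed by
// Length bytes of Content.
public class LengthFieldCodec {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    // Encoder: prepend the body length as a 4-byte big-endian int.
    public static byte[] encode(String content) {
        byte[] body = content.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + body.length).putInt(body.length).put(body).array();
    }

    // Decoder: read Length, then wait until Length bytes of Content arrive.
    public List<String> feed(byte[] chunk) {
        buffer.write(chunk, 0, chunk.length);
        List<String> messages = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(buffer.toByteArray());
        while (buf.remaining() >= 4) {
            buf.mark();
            int length = buf.getInt();
            if (buf.remaining() < length) {  // half packet: rewind and wait for more data
                buf.reset();
                break;
            }
            byte[] body = new byte[length];
            buf.get(body);
            messages.add(new String(body, StandardCharsets.UTF_8));
        }
        buffer.reset();                      // keep only the unconsumed tail
        buffer.write(buf.array(), buf.position(), buf.remaining());
        return messages;
    }

    public static void main(String[] args) {
        LengthFieldCodec codec = new LengthFieldCodec();
        byte[] frame = encode("hello");
        // Deliver the frame split in two, simulating unpacking.
        System.out.println(codec.feed(java.util.Arrays.copyOfRange(frame, 0, 3)));            // []
        System.out.println(codec.feed(java.util.Arrays.copyOfRange(frame, 3, frame.length))); // [hello]
    }
}
```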

3.4 Serialization

Serialization is not essentially about solving the problem of sticking and unpacking packets, but about making network development more convenient. In the variable-length protocol, we can add a length field before the actual data to be sent, indicating the length of the data to be sent. This actually gives us a good idea that we can convert an object into binary bytes for communication, such as using a Request object to represent a request and a Response object to represent a response.

There are many serialization frameworks. When choosing one, we mainly consider serialization/deserialization speed, the size of the serialized output, and multi-language support.

Tip: XML and JSON also belong to the category of serialization formats, although they are usually discussed separately from binary frameworks.

Some network communication (RPC) frameworks support multiple serialization methods. For example, dubbo supports hessian, json, kryo, fst, etc. When multiple serializers are supported, the protocol usually needs a field to indicate which one was used. For example, we can extend the variable-length protocol format above into:

  1. + --------+-------------+------------+  
  2. | Length | serializer | Content |
  3. + --------+-------------+------------+  

Here, 1 byte represents the serializer field, with different values denoting different frameworks.

The sender, after choosing a serialization framework and encoding the content, sets the serializer field accordingly.

The receiver, when decoding, reads the serializer value and uses the corresponding framework to deserialize.

3.5 Compression

Usually, in order to save network overhead, you can consider compressing data during network communication. Common compression algorithms include lz4, snappy, gzip, etc. When choosing a compression algorithm, we mainly consider the compression ratio and decompression efficiency.

We can add a compress field in the network communication protocol to indicate the compression algorithm used:

  1. + --------+-----------+----------------+----------------+  
  2. | Length | serializer| compress | Content |
  3. + --------+-----------+----------------+----------------+  

Usually, we do not even need a full byte for the compress field. One byte can distinguish 256 cases, but only a handful of compression algorithms are in common use, so 2 to 3 bits are enough to indicate the algorithm.

In addition, since the compression ratio is poor when the payload is small, there is no need to compress everything; compression is worthwhile only above a certain size. For example, RocketMQ's producer by default compresses a message only when its body exceeds 4KB. The compress field should therefore reserve a value, such as 0, meaning no compression.
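
The threshold idea can be sketched as follows. This is an illustration under our own assumptions (the class name, the field values NONE/GZIP, and the use of gzip are ours; the 4KB threshold echoes RocketMQ's default):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

// Sketch: compress the body only above a size threshold, and record the
// choice in a compress-field value so the receiver knows how to decode.
public class CompressIfLarge {
    static final int THRESHOLD = 4 * 1024;   // e.g. RocketMQ's default 4KB
    static final byte NONE = 0, GZIP = 1;    // hypothetical compress-field values

    // Writes the (possibly compressed) body to out; returns the field value.
    public static byte maybeCompress(byte[] body, ByteArrayOutputStream out) throws IOException {
        if (body.length < THRESHOLD) {
            out.write(body);                 // small payload: not worth compressing
            return NONE;
        }
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(body);
        }
        return GZIP;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream small = new ByteArrayOutputStream();
        System.out.println(maybeCompress(new byte[100], small));  // 0: left as-is
        ByteArrayOutputStream large = new ByteArrayOutputStream();
        System.out.println(maybeCompress(new byte[8192], large)); // 1: gzipped
        System.out.println(large.size() < 8192);                  // zeros compress well
    }
}
```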

3.6 Error Check Code

Some communication protocols also include error checking codes in the data transmitted. Typical algorithms include CRC32 and Adler32. Java supports both of these methods, java.util.zip.Adler32 and java.util.zip.CRC32.

  1. + --------+-----------+----------------+----------------+---------+  
  2. | Length | serializer| compress | Content | CRC32 |
  3. + --------+-----------+----------------+----------------+---------+  

We will not explain CRC32 and Adler32 in detail here; the main question is why a checksum is needed at all.

Some say it is for security, but that reason does not hold up: transport security is already provided by the TLS layer, and CRC32 and Adler32 are not cryptographic checksums.

A better explanation: during the transmission of binary data, electromagnetic interference may flip a high level to a low level or vice versa, corrupting the data. A checksum such as CRC32 lets the receiver verify the data's integrity.

Also, the checksum is usually optional in a communication protocol. It helps ensure data correctness, but computing it costs some performance. For example, in MySQL master-slave replication, recent versions enable CRC32 checksums by default, but they can be disabled through configuration.

3.7 Summary

This section used basic examples to explain how protocols solve the sticking and unpacking problems in TCP programming. Real-world protocols are usually richer: some RPC frameworks add an ID field to uniquely identify each request, and frameworks that support two-way communication, such as sofa-bolt, also add a direction field. The added complexity is really just extra fields serving specific purposes; once you understand what each field means, the protocol is not complicated.
