Three pictures tell you the principles of Linux TCP/IP protocol stack

It is no exaggeration to say that today's Internet is built on TCP/IP. Understanding how the protocol stack works is very helpful for tuning network I/O performance and troubleshooting network problems. This article walks through how the kernel moves network data.

TCP Features

TCP was designed to provide reliable, ordered, and error-checked data transmission. Its main characteristics are:

  • Connection-oriented: a connection is identified by a five-tuple (remote IP, remote port, local IP, local port, transport layer protocol).
  • Full-duplex: data can flow in both directions at the same time.
  • Ordered: data is delivered to the receiver in the order in which it was sent.
  • Flow control: the sender adjusts how much data it sends according to the sliding window size advertised by the receiver.
  • Congestion control: the sender computes a congestion window from the ACKs it receives and the congestion control algorithm in use.
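
Flow control and congestion control combine into one rule on the sender side: the amount of unacknowledged data in flight may not exceed the smaller of the receiver's advertised window and the congestion window. The following is a minimal sketch of that rule. The function name `usable_window` and its parameters are illustrative and are not kernel identifiers.

```python
def usable_window(rwnd: int, cwnd: int, bytes_in_flight: int) -> int:
    """Return how many more bytes the sender may transmit right now.

    rwnd            -- window advertised by the receiver (flow control)
    cwnd            -- congestion window computed by the sender (congestion control)
    bytes_in_flight -- bytes already sent but not yet acknowledged
    """
    effective = min(rwnd, cwnd)          # the tighter of the two limits wins
    return max(0, effective - bytes_in_flight)


# The receiver allows 65535 bytes, but the congestion window is only 14600.
print(usable_window(rwnd=65535, cwnd=14600, bytes_in_flight=10000))
```

Here the congestion window is the limiting factor; if the receiver's buffer were nearly full, `rwnd` would be the limit instead.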

After understanding the characteristics of TCP, let's take a look at the actual process of data transmission.

Data transmission

Let’s first look at the picture:

The figure above shows the process of data flow in hardware, and the figure below shows the process of data in the protocol stack:

The whole process spans three major areas: user space, kernel space, and the device. The device here is the network card (NIC). The process is as follows:

  1. The user application calls the write system call.
  2. The kernel validates the file descriptor.
  3. The data is copied into the socket buffer.
  4. The TCP layer creates a TCP segment and calculates its checksum.
  5. The IP layer adds the IP header, performs routing, and calculates the header checksum.
  6. The link layer adds the Ethernet header, using ARP to resolve the destination MAC address.
  7. The driver tells the NIC to send the data.
  8. The NIC fetches the data from main memory and transmits it. When transmission completes, it raises an interrupt to notify the CPU.
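
Steps 4 and 5 both compute a checksum. TCP and IPv4 use the same 16-bit one's-complement "Internet checksum" (RFC 1071), sketched below in Python for illustration. The function name `internet_checksum` is made up for this example, and the sample bytes are a commonly used 20-byte IPv4 header. Note that the real TCP checksum also covers a pseudo-header, and that in practice the work is often offloaded to the NIC.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum as described in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


# IPv4 header with the checksum field (bytes 10-11) set to zero before computing.
header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
print(hex(internet_checksum(header)))  # 0xb861
```

The receiver runs the same sum over the header with the checksum field filled in; a result of 0 means the header arrived intact.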

Data Reception

Look directly at the hardware data flow diagram:

First, the NIC writes the received packet into its own memory. It then validates the packet and transfers it to the host's main memory. The buffers in main memory are allocated by the driver, which passes the buffer descriptors to the NIC. If there are not enough buffers to hold an incoming packet, the NIC discards it. Once the packet has been copied into main memory, the NIC notifies the host OS with an interrupt.
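
The buffer handshake between driver and NIC can be pictured with a toy model. This is only a sketch: the class `RxRing` and its methods are hypothetical names, and a real driver works with DMA descriptors and interrupts rather than Python objects.

```python
from collections import deque


class RxRing:
    """Toy model of the receive buffers a driver posts to the NIC."""

    def __init__(self, num_buffers: int):
        self.free = num_buffers   # buffers posted by the driver, still empty
        self.filled = deque()     # packets copied to main memory, not yet processed
        self.dropped = 0          # packets discarded for lack of buffers

    def nic_receive(self, packet: bytes) -> bool:
        """Called when a packet arrives; returns False if it was dropped."""
        if self.free == 0:
            self.dropped += 1     # no buffer available: the NIC discards the packet
            return False
        self.free -= 1
        self.filled.append(packet)
        return True               # real hardware would now raise an interrupt

    def driver_poll(self) -> list:
        """Driver takes the filled buffers and hands fresh ones back to the NIC."""
        packets = list(self.filled)
        self.filled.clear()
        self.free += len(packets)
        return packets


ring = RxRing(num_buffers=2)
results = [ring.nic_receive(p) for p in (b"a", b"b", b"c")]
print(results, ring.dropped)   # [True, True, False] 1
```

The third packet is lost because the driver has not yet drained the ring, which is why receive drops tend to appear when the host cannot process packets as fast as they arrive.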

The driver then checks whether it can handle the new packet. If it can, it wraps the packet in a structure the OS understands (sk_buff in Linux) and pushes it to the upper layer. The link layer validates the received frame, strips the framing according to the protocol, and passes the payload up to the IP layer.

After parsing the header, the IP layer decides, based on the IP information in the packet, whether to pass it up to the upper layer or forward it to another host. If the packet is to be passed up, the IP header is stripped and the payload is handed to the TCP layer.
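
The deliver-or-forward decision can be reduced to a few lines. The sketch below is a deliberate simplification: the function `ip_input_decision` and its arguments are hypothetical, and a real kernel makes this choice with a routing table lookup rather than a set membership test.

```python
import ipaddress


def ip_input_decision(dst: str, local_addrs: set, forwarding_enabled: bool) -> str:
    """Decide what to do with a received IP packet based on its destination."""
    addr = ipaddress.ip_address(dst)
    if addr in local_addrs:
        return "deliver"   # strip the IP header and hand the payload to TCP/UDP
    if forwarding_enabled:
        return "forward"   # act as a router and send it toward the next hop
    return "drop"          # not for this host and forwarding is disabled


local = {ipaddress.ip_address("192.168.0.10"), ipaddress.ip_address("127.0.0.1")}
print(ip_input_decision("192.168.0.10", local, forwarding_enabled=False))  # deliver
print(ip_input_decision("10.0.0.5", local, forwarding_enabled=True))       # forward
```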

After parsing the segment, TCP looks up the corresponding transmission control block (TCB) by its four-tuple and processes the segment according to the TCP protocol. The received data is added to the receive queue, and an ACK is sent to the peer according to the TCP state.
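
The TCB lookup is essentially a table keyed by the four-tuple. The sketch below assumes a much simpler model than a real stack: `FourTuple`, `TCB`, and `tcp_receive` are illustrative names, only in-order data is accepted, and listening sockets, out-of-order queues, and RST handling are left out.

```python
from typing import NamedTuple, Optional


class FourTuple(NamedTuple):
    local_ip: str
    local_port: int
    remote_ip: str
    remote_port: int


class TCB:
    """Minimal per-connection state."""

    def __init__(self, rcv_nxt: int = 0):
        self.rcv_nxt = rcv_nxt          # next sequence number expected from the peer
        self.recv_queue = bytearray()   # data waiting for the application to read


def tcp_receive(table: dict, key: FourTuple, seq: int, payload: bytes) -> Optional[int]:
    """Process one segment; return the ACK number to send, or None if no TCB matches."""
    tcb = table.get(key)
    if tcb is None:
        return None                     # no matching connection
    if seq == tcb.rcv_nxt:              # in-order data: queue it and advance
        tcb.recv_queue += payload
        tcb.rcv_nxt += len(payload)
    return tcb.rcv_nxt                  # otherwise re-ACK what we still expect


key = FourTuple("192.168.0.10", 80, "192.168.0.20", 51000)
table = {key: TCB()}
print(tcp_receive(table, key, seq=0, payload=b"hello"))  # 5
```

A segment that arrives with an unexpected sequence number leaves the queue unchanged and produces the same ACK number again, which is how the peer learns that data is missing.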

Of course, the process above is also affected by NAT and other Netfilter hooks, which we will not discuss here, as we have not studied them in depth. For the sake of performance, a lot of work has also been done in both hardware and software, such as RDMA, DPDK, zero-copy, and checksum offload.

Summary

Modern hardware and software TCP/IP protocol stacks have no trouble sustaining 1~2 GiB/s over a single link (tested). If you want better performance, you can try technologies such as RDMA, which bypass the kernel to reduce copying and other overhead, although this may depend on hardware support.
