This article is reprinted from the WeChat public account "Kaida Neigong Xiuxian", written by Zhang Yanfei (allen). To reprint this article, please contact the WeChat public account "Kaida Neigong Xiuxian".

In the world of network development there is one model that is especially easy for developers to use: synchronous blocking network IO (usually called BIO in Java). For example, if we want to request a piece of data from a server, a C demo might look something like this:
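(The snippet below is a minimal sketch; the address, port, and buffer size are made-up examples, and error handling is kept to a minimum.)

```c
#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    /* create a TCP socket (AF_INET + SOCK_STREAM) */
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(8080);                      /* example port */
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);  /* example address */

    connect(fd, (struct sockaddr *)&addr, sizeof(addr));

    char buf[4096];
    /* recv blocks here until the kernel has data queued on this socket */
    ssize_t n = recv(fd, buf, sizeof(buf), 0);
    if (n > 0)
        printf("received %zd bytes\n", n);

    close(fd);
    return 0;
}
```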
However, in high-concurrency server development, the performance of this kind of network IO is extremely poor:

1. The process is very likely to block in recv, causing a process switch.
2. When data arrives on the connection, the process is woken up again: another process switch.
3. A process can only wait on one connection at a time, so handling many concurrent connections requires many processes.

To sum it up in one sentence: synchronous blocking network IO is a stumbling block on the road to high-performance network development! As the saying goes, know yourself and know your enemy and you will win every battle. So today we will not talk about optimization; instead we will dig deep into the internal implementation of synchronous blocking network IO.

Although the demo above is only two or three lines of code, the user process and the kernel actually do a lot of work together. First, the user process issues the call to create a socket and switches into kernel mode to initialize the related kernel objects. Later, when Linux receives packets, the hard interrupt handler and the ksoftirqd thread process them; once ksoftirqd finishes, it notifies the relevant user process. From the moment a user process creates a socket to the moment a network packet arrives at the network card and is received by that process, the overall flow chart is as follows:

Today we will use diagrams and source code analysis to break down each of these steps in detail and see how they are implemented in the kernel. After reading this article, you will have a clear understanding of why synchronous blocking network IO performs so poorly!

1. Create a socket

After the socket function call in the source code at the beginning is executed, the kernel creates a series of socket-related kernel objects (yes, not just one). Their relationship is shown in the figure. The real objects are more complicated than the figure suggests; only the parts relevant to today's topic are drawn. Let's go through the source code and see how this structure is created.
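The entry point is the socket system call. This and the following kernel excerpts are condensed from an older (roughly 3.x era) kernel; unrelated lines and error handling are elided, and exact names and signatures vary between kernel versions.

```c
SYSCALL_DEFINE3(socket, int, family, int, type, int, protocol)
{
    int retval;
    struct socket *sock;
    int flags;
    /* ... */
    flags = type & ~SOCK_TYPE_MASK;
    type &= SOCK_TYPE_MASK;
    /* ... */
    /* allocate and initialize the socket kernel objects */
    retval = sock_create(family, type, protocol, &sock);
    /* ... */
    /* wrap the socket in a file and return an fd to user space */
    retval = sock_map_fd(sock, flags & (O_CLOEXEC | O_NONBLOCK));
    /* ... */
}
```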
sock_create is the main place to create a socket, and sock_create calls __sock_create.
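A condensed excerpt of __sock_create (abridged as above):

```c
int __sock_create(struct net *net, int family, int type, int protocol,
                  struct socket **res, int kern)
{
    int err;
    struct socket *sock;
    const struct net_proto_family *pf;
    /* ... */
    sock = sock_alloc();                          /* allocate the struct socket object */
    sock->type = type;
    /* ... */
    pf = rcu_dereference(net_families[family]);   /* AF_INET -> inet_family_ops */
    /* ... */
    err = pf->create(net, sock, protocol, kern);  /* for AF_INET this is inet_create */
    /* ... */
}
```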
In __sock_create, sock_alloc is called first to allocate a struct socket object. Then the protocol family's operation table is fetched and its create method is called; for the AF_INET protocol family, that method is inet_create.
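An abridged excerpt of inet_create, keeping only the lines relevant here:

```c
static int inet_create(struct net *net, struct socket *sock, int protocol, int kern)
{
    struct sock *sk;
    struct inet_protosw *answer;
    struct proto *answer_prot;
    /* ... */
    /* look up the SOCK_STREAM entry in inetsw: ops = inet_stream_ops, prot = tcp_prot */
    list_for_each_entry_rcu(answer, &inetsw[sock->type], list) {
        /* ... */
    }
    /* ... */
    sock->ops = answer->ops;
    answer_prot = answer->prot;
    /* ... */
    sk = sk_alloc(net, PF_INET, GFP_KERNEL, answer_prot);
    /* ... */
    sock_init_data(sock, sk);   /* initializes sk, including sk->sk_data_ready */
    /* ... */
}
```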
In inet_create, the operation set inet_stream_ops and the protocol handler tcp_prot defined for TCP are looked up according to the type SOCK_STREAM and assigned to socket->ops and sock->sk_prot respectively. Further down we see sock_init_data. In this method, the sk_data_ready function pointer of the sock is initialized and set to the default sock_def_readable().
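The relevant assignments inside sock_init_data (abridged):

```c
void sock_init_data(struct socket *sock, struct sock *sk)
{
    /* ... */
    sk->sk_data_ready   = sock_def_readable;    /* called when data is ready on this sock */
    sk->sk_write_space  = sock_def_write_space;
    sk->sk_error_report = sock_def_error_report;
    /* ... */
}
```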
When a data packet arrives and is processed in the soft interrupt, the process waiting on the sock is woken up by a call through the sk_data_ready function pointer (which, as set above, actually points to sock_def_readable()). We will come back to this when we discuss soft interrupts; just keep it in mind for now. At this point, a TCP object, or more precisely a SOCK_STREAM socket under the AF_INET protocol family, has been created, at the cost of one socket system call.

2. Waiting to receive messages

Next, let's look at the underlying implementation that the recv function depends on. Tracing with the strace command shows that the C library function recv ends up issuing the recvfrom system call. After entering the system call, the user process runs in kernel mode, executes a series of kernel protocol-stack functions, and then checks whether there is data in the socket's receive queue. If there is none, it adds itself to the wait queue attached to the socket and finally releases the CPU, letting the operating system pick the next ready process to run. The whole flow chart is as follows:

With the flowchart in mind, let's look at the source code in more detail. The focus of today's study is how recvfrom ends up blocking its own process (assuming we do not use the O_NONBLOCK flag).
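A condensed view of the recvfrom system call entry (older kernel; setup and error handling elided):

```c
SYSCALL_DEFINE6(recvfrom, int, fd, void __user *, ubuf, size_t, size,
                unsigned int, flags, struct sockaddr __user *, addr,
                int __user *, addr_len)
{
    struct socket *sock;
    struct iovec iov;
    struct msghdr msg;
    int err, fput_needed;
    /* ... */
    sock = sockfd_lookup_light(fd, &err, &fput_needed);
    /* ... build the msghdr pointing at the user buffer ... */
    err = sock_recvmsg(sock, &msg, size, flags);
    /* ... */
}
```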
sock_recvmsg => __sock_recvmsg => __sock_recvmsg_nosec
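The last hop of that chain simply dispatches through the ops table (abridged, older kernel):

```c
static inline int __sock_recvmsg_nosec(struct kiocb *iocb, struct socket *sock,
                                       struct msghdr *msg, size_t size, int flags)
{
    /* ... */
    /* dispatch through socket->ops; for our TCP socket this is inet_stream_ops */
    return sock->ops->recvmsg(iocb, sock, msg, size, flags);
}
```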
Call recvmsg in the socket object ops. Recall the socket object diagram above. You can see from the diagram that recvmsg points to the inet_recvmsg method.
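An abridged excerpt of inet_recvmsg (older kernel signature):

```c
int inet_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
                 size_t size, int flags)
{
    struct sock *sk = sock->sk;
    int addr_len = 0;
    int err;
    /* ... */
    /* sk->sk_prot is tcp_prot, so this calls tcp_recvmsg */
    err = sk->sk_prot->recvmsg(iocb, sk, msg, size, flags & MSG_DONTWAIT,
                               flags & ~MSG_DONTWAIT, &addr_len);
    /* ... */
    return err;
}
```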
Here we encounter another function pointer, this time calling the recvmsg method under sk_prot in the socket object. As above, we can conclude that this recvmsg method corresponds to the tcp_recvmsg method.
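A heavily condensed fragment of the receive loop inside tcp_recvmsg; the real function also handles sequence numbers, urgent data, and many other cases, and the local variables (skb, copied, target, timeo, len) belong to the surrounding function:

```c
/* inside tcp_recvmsg(), condensed */
do {
    /* ... */
    /* walk the skbs already queued on this socket's receive queue */
    skb_queue_walk(&sk->sk_receive_queue, skb) {
        /* ... copy what we can out to user space ... */
    }
    /* ... */
    if (copied >= target) {
        release_sock(sk);
        lock_sock(sk);
    } else {
        /* not enough data yet: block the current process */
        sk_wait_data(sk, &timeo);
    }
    /* ... */
} while (len > 0);
```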
Finally we see what we want to see. skb_queue_walk is accessing the receive queue under the sock object. If no data is received, or not enough data is received, sk_wait_data is called to block the current process.
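In an older kernel, sk_wait_data looks roughly like this:

```c
int sk_wait_data(struct sock *sk, long *timeo)
{
    int rc;
    DEFINE_WAIT(wait);   /* define a wait queue item for the current process */

    prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
    set_bit(SOCK_ASYNC_WAITDATA, &sk->sk_socket->flags);
    /* give up the CPU until data arrives (or the timeout expires) */
    rc = sk_wait_event(sk, timeo, !skb_queue_empty(&sk->sk_receive_queue));
    clear_bit(SOCK_ASYNC_WAITDATA, &sk->sk_socket->flags);
    finish_wait(sk_sleep(sk), &wait);
    return rc;
}
```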
Let's take a closer look at how sk_wait_data blocks the current process. First, under the DEFINE_WAIT macro, a wait queue item wait is defined. On this new wait queue item, the callback function autoremove_wake_function is registered, and the current process descriptor current is associated with its .private member.
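The DEFINE_WAIT macro (pre-4.13 naming, where the wait item type is wait_queue_t):

```c
#define DEFINE_WAIT_FUNC(name, function)                        \
    wait_queue_t name = {                                       \
        .private    = current,                                  \
        .func       = function,                                 \
        .task_list  = LIST_HEAD_INIT((name).task_list),         \
    }

#define DEFINE_WAIT(name) DEFINE_WAIT_FUNC(name, autoremove_wake_function)
```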
Then, sk_sleep is called in sk_wait_data to obtain the wait queue list head wait_queue_head_t under the sock object. The source code of sk_sleep is as follows:
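(Excerpt from an older kernel; the helper simply returns the wait queue head embedded in sock->sk_wq.)

```c
static inline wait_queue_head_t *sk_sleep(struct sock *sk)
{
    BUILD_BUG_ON(offsetof(struct socket_wq, wait) != 0);
    return &rcu_dereference_raw(sk->sk_wq)->wait;
}
```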
Then call prepare_to_wait to insert the newly defined wait queue item wait into the wait queue of the sock object.
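prepare_to_wait, abridged from older kernel source:

```c
void prepare_to_wait(wait_queue_head_t *q, wait_queue_t *wait, int state)
{
    unsigned long flags;

    wait->flags &= ~WQ_FLAG_EXCLUSIVE;
    spin_lock_irqsave(&q->lock, flags);
    if (list_empty(&wait->task_list))
        __add_wait_queue(q, wait);    /* hang the wait item on the socket's wait queue */
    set_current_state(state);         /* mark the process as (interruptibly) sleeping */
    spin_unlock_irqrestore(&q->lock, flags);
}
```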
This way, when the kernel later receives the data and generates a ready event, it can find the wait item on the socket's wait queue, and from it the callback function and the process waiting for the event. Finally, sk_wait_event is called to give up the CPU and the process goes to sleep, which costs one process context switch. In the next section we will see how the process is woken up.

3. Soft interrupt module

Now let's change perspective and look at the soft interrupt side, which is responsible for receiving and processing packets. I won't repeat how a network packet travels from the network card to the soft interrupt; if you are interested, see the earlier article "Illustrated Linux Network Packet Receiving Process". Today we start directly from the TCP receive function tcp_v4_rcv. When the soft interrupt (that is, the ksoftirqd thread in Linux) receives a packet and it is a TCP packet, it executes the tcp_v4_rcv function. For a connection in the ESTABLISHED state, the data is eventually unpacked and placed into the receive queue of the corresponding socket, and then sk_data_ready is called to wake up the user process. Let's look at the code in more detail.
In tcp_v4_rcv, the kernel first looks up the corresponding socket on the local machine based on the source and destination information in the header of the received packet. Once the socket is found, processing goes straight into the main receive function tcp_v4_do_rcv.
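A condensed excerpt of tcp_v4_rcv (older kernel, details elided):

```c
int tcp_v4_rcv(struct sk_buff *skb)
{
    const struct tcphdr *th;
    struct sock *sk;
    int ret;
    /* ... */
    th = tcp_hdr(skb);
    /* look up the owning socket by source/destination address and port */
    sk = __inet_lookup_skb(&tcp_hashinfo, skb, th->source, th->dest);
    if (!sk)
        goto no_tcp_socket;
    /* ... */
    ret = tcp_v4_do_rcv(sk, skb);
    /* ... */
}
```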
We assume that the packet being processed is in the ESTABLISH state, so it enters the tcp_rcv_established function for processing.
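Abridged tcp_v4_do_rcv:

```c
int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb)
{
    /* ... */
    if (sk->sk_state == TCP_ESTABLISHED) {
        /* ... */
        if (tcp_rcv_established(sk, skb, tcp_hdr(skb), skb->len)) {
            /* ... */
        }
        return 0;
    }
    /* ... other states go through tcp_rcv_state_process ... */
}
```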
In tcp_rcv_established, the received data is placed on the socket's receive queue by calling the tcp_queue_rcv function, as shown in the following source code:
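(Condensed from an older kernel; eaten, tcp_header_len, and fragstolen are locals of tcp_rcv_established, and lines not relevant here are elided.)

```c
/* inside tcp_rcv_established(), condensed */
eaten = tcp_queue_rcv(sk, skb, tcp_header_len, &fragstolen);
/* ... */
sk->sk_data_ready(sk, 0);   /* i.e. sock_def_readable(): wake whoever waits on this socket */

/* tcp_queue_rcv(): put the data on the socket's receive queue */
static int __must_check tcp_queue_rcv(struct sock *sk, struct sk_buff *skb,
                                      int hdrlen, bool *fragstolen)
{
    int eaten;
    struct sk_buff *tail = skb_peek_tail(&sk->sk_receive_queue);

    __skb_pull(skb, hdrlen);
    /* try to merge into the tail skb; otherwise queue the skb itself */
    eaten = (tail &&
             tcp_try_coalesce(sk, tail, skb, fragstolen)) ? 1 : 0;
    /* ... */
    if (!eaten) {
        __skb_queue_tail(&sk->sk_receive_queue, skb);
        skb_set_owner_r(skb, sk);
    }
    return eaten;
}
```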
Once tcp_queue_rcv has finished queuing the data, sk_data_ready is called to wake up the user process waiting on the socket. This is another function pointer. Recall the sock_init_data function executed during socket creation above: in that function, sk_data_ready was set to sock_def_readable (you can press ctrl + f to search the earlier text). It is the default data-ready handler.
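sock_def_readable in an older kernel (its exact signature has changed across versions):

```c
static void sock_def_readable(struct sock *sk, int len)
{
    struct socket_wq *wq;

    rcu_read_lock();
    wq = rcu_dereference(sk->sk_wq);
    /* is anyone sleeping on this socket's wait queue? */
    if (wq_has_sleeper(wq))
        wake_up_interruptible_sync_poll(&wq->wait, POLLIN | POLLPRI |
                                        POLLRDNORM | POLLRDBAND);
    sk_wake_async(sk, SOCK_WAKE_WAITD, POLL_IN);
    rcu_read_unlock();
}
```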
In sock_def_readable, we again access the wait queue under sock->sk_wq. Recall that earlier, near the end of the recvfrom call, the wait queue item associated with the current process was added, via DEFINE_WAIT(wait), to the wait queue under sock->sk_wq. The next step is to call wake_up_interruptible_sync_poll to wake up the process that is blocked on the socket waiting for data.
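wake_up_interruptible_sync_poll is a thin wrapper (abridged, older kernel):

```c
#define wake_up_interruptible_sync_poll(x, m)                   \
    __wake_up_sync_key((x), TASK_INTERRUPTIBLE, 1, (void *) (m))

void __wake_up_sync_key(wait_queue_head_t *q, unsigned int mode,
                        int nr_exclusive, void *key)
{
    unsigned long flags;
    int wake_flags = WF_SYNC;
    /* ... */
    spin_lock_irqsave(&q->lock, flags);
    __wake_up_common(q, mode, nr_exclusive, wake_flags, key);
    spin_unlock_irqrestore(&q->lock, flags);
}
```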
__wake_up_common implements the wakeup. Note that the nr_exclusive argument passed into this call is 1, which means that even if multiple processes are blocked on the same socket, only one of them will be woken up. Its purpose is to avoid the thundering herd problem.
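__wake_up_common (older kernel naming, where wait items are wait_queue_t and the list field is task_list):

```c
static void __wake_up_common(wait_queue_head_t *q, unsigned int mode,
                             int nr_exclusive, int wake_flags, void *key)
{
    wait_queue_t *curr, *next;

    list_for_each_entry_safe(curr, next, &q->task_list, task_list) {
        unsigned flags = curr->flags;

        /* curr->func is autoremove_wake_function for items created by DEFINE_WAIT */
        if (curr->func(curr, mode, wake_flags, key) &&
                (flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
            break;
    }
}
```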
In __wake_up_common, find a waiting queue item curr, and then call its curr->func. Recall that when we executed the recv function earlier, we used DEFINE_WAIT() to define the details of the waiting queue item, and the kernel set curr->func to autoremove_wake_function.
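autoremove_wake_function (older kernel source):

```c
int autoremove_wake_function(wait_queue_t *wait, unsigned mode, int sync, void *key)
{
    int ret = default_wake_function(wait, mode, sync, key);

    if (ret)
        /* on a successful wakeup, remove the item from the wait queue */
        list_del_init(&wait->task_list);
    return ret;
}
```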
In autoremove_wake_function, default_wake_function is called.
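default_wake_function (older kernel source):

```c
int default_wake_function(wait_queue_t *curr, unsigned mode, int wake_flags,
                          void *key)
{
    return try_to_wake_up(curr->private, mode, wake_flags);
}
```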
The task_struct passed to try_to_wake_up is curr->private, which is the process that was blocked waiting. When this function runs, the process blocked waiting on the socket is pushed back into the run queue, which costs another process context switch.

Summary

Let's summarize the whole process. The work the kernel does to receive a network packet and notify the user process falls into two parts.

The first part is the process our own code runs in. The socket() call enters the kernel and creates the necessary kernel objects. The recv() call, after entering the kernel, checks the receive queue and, when there is no data to process, blocks the current process and gives up the CPU.

The second part is the hard interrupt and soft interrupt context (the kernel thread ksoftirqd). After these components finish processing a packet, they place it in the socket's receive queue, then use the socket kernel object to find the process blocked in its wait queue and wake it up.

Every time a process has to wait for data on a socket, it is taken off the CPU and another process is switched in; when the data is ready, the sleeping process is woken up again. That is two process context switches in total, and according to earlier measurements each switch costs roughly 3-5 us (microseconds). For a network IO-intensive application, the CPU ends up doing a lot of useless work shuffling processes around.

This model is completely unusable in the server role, because in this simple model a socket and a process are bound one-to-one. A single machine now has to carry tens of thousands or even millions of user connections; with the approach above we would have to create a process for every user request, and I doubt you have ever seen anyone do that, even in the most primitive server network programming. If I had to give it a name, it would be single-channel non-multiplexing (a term coined by Fei Ge himself). So is there a more efficient network IO model? Of course: the select, poll and epoll that you are familiar with. Next time, Fei Ge will start to dissect the implementation source code of epoll, so stay tuned!

This mode is still fine in the client role, because your process may simply have to wait for MySQL to return data before it can render the page and respond to the user; there is nothing else for it to do in the meantime. Please note that I am talking about roles, not specific machines. For example, your php/java/golang API machine is in the server role when it handles user requests, but becomes a client when it calls redis. That said, well-encapsulated network frameworks such as Sogou Workflow and Golang's net package have already abandoned this inefficient model even in the network client role!