What role does network communication play in RPC calls? RPC is a way to solve inter-process communication. An RPC call is essentially an exchange of network messages between a service consumer and a service provider: the consumer sends a request message over network IO; the provider receives and parses it, runs the relevant business logic, and sends back a response message; the consumer receives and parses the response, runs the relevant response logic, and the call is complete. Network communication is therefore the foundation of the entire RPC call process.

1 Common Network I/O Models

Network communication between two machines is network IO performed by both machines. The common models are synchronous blocking IO (BIO), synchronous non-blocking IO (NIO), IO multiplexing, and asynchronous non-blocking IO (AIO). Only AIO is asynchronous IO; the others are all synchronous.

1.1 Synchronous blocking I/O (BIO)

By default, all sockets in Linux are blocking. After the application process issues an IO system call, it blocks and control transfers to kernel space. The kernel waits for the data; once the data arrives, it copies it from kernel space into user memory, and only when the entire IO operation is complete does the call return, unblocking the application process so the business logic can run. The system kernel processes an IO operation in two stages: waiting for the data to be ready, and copying the data from kernel space to user space.
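The blocking behavior described above can be sketched with plain JDK sockets. This is a minimal illustrative demo (class and variable names are mine, not from the article): a background thread runs a tiny echo server, and the client's readLine() blocks its thread through both kernel stages until the response arrives.

```java
import java.io.*;
import java.net.*;

// Minimal sketch of synchronous blocking IO (BIO): the client's readLine()
// blocks the calling thread until the kernel has received the data and
// copied it into user space.
public class BioEchoDemo {
    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(0); // ephemeral port
        Thread serverThread = new Thread(() -> {
            try (Socket s = server.accept()) {
                // Echo one line back to the client.
                BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
                PrintWriter out = new PrintWriter(s.getOutputStream(), true);
                out.println(in.readLine());
            } catch (IOException ignored) { }
        });
        serverThread.start();

        try (Socket client = new Socket("127.0.0.1", server.getLocalPort())) {
            PrintWriter out = new PrintWriter(client.getOutputStream(), true);
            out.println("ping");
            BufferedReader in = new BufferedReader(new InputStreamReader(client.getInputStream()));
            // readLine() blocks this thread until the echoed line arrives.
            System.out.println("echoed: " + in.readLine());
        }
        serverThread.join();
        server.close();
    }
}
```

Note that the client thread does nothing while it waits: under BIO, serving many connections requires one thread per connection.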
In both stages, the application's IO thread remains blocked. With Java multi-threading, each IO operation therefore occupies a thread until it completes. After the user thread issues a read call, it blocks and yields the CPU. The kernel waits for the network card data to arrive, copies the data from the network card into kernel space, then copies it into user space, and finally wakes up the user thread.

1.2 IO multiplexing

This is the most widely used IO model in high-concurrency scenarios; Java's NIO, Redis, and Nginx are all built on this type of IO model:
The IO of multiple network connections can be registered with a multiplexer (select). When the user process calls select, the whole process blocks while the kernel "monitors" all the sockets select is responsible for; as soon as the data in any socket is ready, select returns. The user process then issues a read to copy the data from the kernel into the user process. So a select call blocks until some monitored socket has data ready, and only then is a read issued. The whole flow is more complicated than blocking IO and may even look less efficient, but its biggest advantage is that one thread can serve IO requests on many sockets at once: the user registers multiple sockets, then repeatedly calls select to find the activated ones and reads them, handling many IO requests in the same thread. Under the synchronous blocking model, the same effect requires multi-threading. It is like going to a restaurant with a group of friends: one person stays to queue for a table while the others go shopping, and once the person queuing tells us a table is ready, everyone comes back and eats. In essence, multiplexing is still synchronous blocking.

1.3 Why are blocking IO and IO multiplexing the most commonly used?

Network IO needs support from both the system kernel and the programming language. Most kernels support blocking IO, non-blocking IO, and IO multiplexing, but signal-driven IO and asynchronous IO are only supported by newer Linux kernels. In both C++ and Java, high-performance network programming frameworks are built on the Reactor model, Netty being a prime example, and the Reactor model is in turn based on IO multiplexing.
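The select flow described above can be sketched with Java NIO's Selector. In this illustrative demo (names are mine), a single background thread multiplexes both the accept event and the read event; the main thread plays the client:

```java
import java.io.*;
import java.net.*;
import java.nio.*;
import java.nio.channels.*;
import java.util.Iterator;

// Sketch of IO multiplexing: one thread calls select() over many channels
// and only touches the ones whose data is ready.
public class SelectorDemo {
    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
        int port = ((InetSocketAddress) server.getLocalAddress()).getPort();

        Thread eventLoop = new Thread(() -> {
            try {
                while (selector.select() > 0) {          // blocks until any channel is ready
                    Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                    while (it.hasNext()) {
                        SelectionKey key = it.next();
                        it.remove();
                        if (key.isAcceptable()) {        // new connection: register it for reads
                            SocketChannel c = server.accept();
                            c.configureBlocking(false);
                            c.register(selector, SelectionKey.OP_READ);
                        } else if (key.isReadable()) {   // data is ready: read and echo it back
                            SocketChannel c = (SocketChannel) key.channel();
                            ByteBuffer buf = ByteBuffer.allocate(64);
                            int n = c.read(buf);
                            if (n > 0) { buf.flip(); c.write(buf); }
                            c.close();
                            return;                      // demo handles a single message
                        }
                    }
                }
            } catch (IOException ignored) { }
        });
        eventLoop.start();

        try (Socket client = new Socket("127.0.0.1", port)) {
            client.getOutputStream().write("hello".getBytes());
            client.getOutputStream().flush();
            byte[] reply = client.getInputStream().readNBytes(5);
            System.out.println("reply: " + new String(reply));
        }
        eventLoop.join();
        server.close();
    }
}
```

The same event loop could register thousands of sockets with the one selector, which is exactly why this model scales where one-thread-per-connection BIO does not.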
In non-high-concurrency scenarios, synchronous blocking IO is the most common. Overall, blocking IO and IO multiplexing are the most widely used and the most fully supported by system kernels and programming languages, and they cover the needs of most network IO scenarios.

1.4 Which network IO model should an RPC framework choose?

IO multiplexing suits high concurrency, handling more socket IO requests with fewer processes (threads), but it is harder to use. Blocking IO blocks a process (thread) for every socket IO request, but it is easier to use. In low-concurrency scenarios where the business logic only performs IO synchronously, blocking IO is sufficient and, having no select call, has lower overhead than IO multiplexing. Most RPC calls, however, are high-concurrency calls, so after weighing the options an RPC framework should choose IO multiplexing. The best choice of framework is Netty, which is based on the Reactor model; on Linux, epoll should also be enabled to improve system performance.

2 Zero-copy

2.1 The network IO read/write process

On each write, the application process writes data into a buffer in user space, the CPU copies it into the kernel buffer, and DMA then copies it to the network card, which sends it out. So for one write, the data is copied twice before it leaves through the network card; a read is the reverse, also requiring two copies before the application can see the data. A complete read or write by an application process thus copies data back and forth between user space and kernel space, and each copy involves a CPU context switch (from the user process into the kernel, or from the kernel back to the user process). Isn't that a waste of CPU and performance?
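The conventional copy path can be made concrete with an ordinary buffered file copy. In this illustrative sketch (temp files stand in for real data), every chunk crosses the user/kernel boundary twice, once on the read and once on the write:

```java
import java.io.*;
import java.nio.file.*;

// Sketch of a conventional copy: each read() pulls a chunk from the kernel
// into a user-space buffer, and each write() pushes it from the user-space
// buffer back into the kernel, so every chunk is copied twice.
public class BufferedCopyDemo {
    public static void main(String[] args) throws Exception {
        Path src = Files.createTempFile("copy-src", ".bin");
        Path dst = Files.createTempFile("copy-dst", ".bin");
        Files.write(src, "some payload".getBytes());

        try (InputStream in = Files.newInputStream(src);
             OutputStream out = Files.newOutputStream(dst)) {
            byte[] userBuffer = new byte[8192];          // lives in user space
            int n;
            while ((n = in.read(userBuffer)) != -1) {    // kernel -> user copy
                out.write(userBuffer, 0, n);             // user -> kernel copy
            }
        }
        System.out.println("copied: " + new String(Files.readAllBytes(dst)));
    }
}
```

Each of those crossings also implies a mode switch, which is the overhead zero copy sets out to remove.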
Is there a way to reduce the data copying between processes and improve transmission efficiency? That is what zero copy does: it eliminates the data copy between user space and kernel space. Every read or write by the application process then behaves as if it were reading or writing kernel space directly, with DMA copying the data from the kernel to the network card, or from the network card into the kernel.

2.2 Implementation

If user space and kernel space both write data to the same place, is a copy still needed? No, and virtual memory makes exactly this possible: a user-space address and a kernel-space address can be mapped to the same physical memory. There are two common zero-copy implementations: mmap + write, which relies on this virtual-memory mapping, and sendfile, which is what Nginx uses.

3 Netty Zero Copy

For its network communication layer, an RPC framework should pick a Reactor-based framework; for Java, Netty is the first choice. Does Netty have a zero-copy mechanism, and how does it differ from the zero copy discussed before? The zero copy in the previous section is OS-level zero copy: it avoids data copies between user space and kernel space and improves CPU utilization. Netty's zero copy is a little different. It operates entirely in user space, that is, inside the JVM, and focuses on optimizing data operations.

Why Netty does this: during transmission, RPC does not send all the binary data of the request parameters to the peer machine at once. The data may be split into several packets or merged with packets from other requests, so messages must have boundaries. After receiving data, the peer must process the packets, splitting and merging them along those boundaries to recover complete messages.
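In Java, the sendfile mechanism is exposed through FileChannel.transferTo(). This illustrative sketch (temp files stand in for a real source and sink) moves the data inside the kernel without it ever passing through a user-space buffer:

```java
import java.nio.channels.FileChannel;
import java.nio.file.*;

// Sketch of OS-level zero copy via FileChannel.transferTo(), which maps to
// sendfile() on Linux: the kernel moves the bytes directly between the two
// channels, with no intermediate user-space buffer.
public class TransferToDemo {
    public static void main(String[] args) throws Exception {
        Path src = Files.createTempFile("zc-src", ".bin");
        Path dst = Files.createTempFile("zc-dst", ".bin");
        Files.write(src, "zero copy payload".getBytes());

        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE)) {
            long transferred = in.transferTo(0, in.size(), out);
            System.out.println("transferred " + transferred + " bytes");
        }
        System.out.println("dst: " + new String(Files.readAllBytes(dst)));
    }
}
```

The destination here is a file channel for the sake of a self-contained demo; in a server, the target would typically be a SocketChannel, which is where sendfile pays off most.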
Is that splitting and merging of packets done in user space or in kernel space? In user space, of course, because packet processing is handled by the application. Could data copies occur here? They could, though not copies between user space and kernel space; rather, copies within the application's own memory in user space. Netty's zero copy exists to solve exactly this problem by optimizing data operations in user space. So how does Netty optimize data operations?
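Before looking at Netty's answer, the core idea, creating views over one backing buffer instead of copying bytes, can be sketched with plain JDK ByteBuffers (Netty itself is not used here; the "HDR:" framing is an invented example):

```java
import java.nio.ByteBuffer;

// Sketch of user-space "zero copy": wrap() and slice() create views over
// the same backing array, so splitting a packet into header and body copies
// no bytes. Netty's slice/wrap (and CompositeByteBuf, which composes
// buffers the same way) follow this idea.
public class SliceDemo {
    public static void main(String[] args) {
        byte[] packet = "HDR:payload".getBytes();
        ByteBuffer whole = ByteBuffer.wrap(packet);  // a view, not a copy

        ByteBuffer headerView = whole.duplicate();
        headerView.limit(4);
        ByteBuffer header = headerView.slice();      // first 4 bytes, shared storage

        ByteBuffer bodyView = whole.duplicate();
        bodyView.position(4);
        ByteBuffer body = bodyView.slice();          // the rest, shared storage

        byte[] h = new byte[header.remaining()];
        header.get(h);
        byte[] b = new byte[body.remaining()];
        body.get(b);
        System.out.println(new String(h) + " | " + new String(b));

        // Proof of sharing: mutating the original array is visible through the views.
        packet[4] = 'P';
        System.out.println("shared: " + (char) whole.get(4));
    }
}
```

Splitting a received packet or stitching several together this way touches only buffer indices, never the payload bytes themselves.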
Many of Netty's internal ChannelHandler implementation classes handle the packet-splitting and packet-sticking problems of TCP transmission through CompositeByteBuf, slice, and wrap operations, which combine and split buffers without copying bytes.

Netty also addresses the data copy between user space and kernel space: its ByteBuf can use Direct Buffers, performing socket reads and writes through off-heap direct memory, which achieves the same effect as the virtual-memory technique explained above. Netty further provides FileRegion, which wraps NIO's FileChannel.transferTo() to implement zero copy on the same principle as sendfile in Linux.

4 Conclusion

The benefit of zero copy is that it avoids unnecessary CPU copies, freeing the CPU for other work, and it reduces context switching between user space and kernel space, thereby improving network communication efficiency and the overall performance of the application. Netty's zero copy is different from OS zero copy: it optimizes data operations in user space, which matters greatly for handling the packet-splitting and packet-sticking problems of TCP transmission, and also for how applications process request and response data.