IO multiplexing is an important topic, both for interviews and for everyday technical growth, and many high-performance frameworks rely on it. So what is IO multiplexing, and what problems does it solve? Let's analyze it together today.

1. Common IO models

There are four common network IO models: synchronous blocking IO (Blocking IO, BIO), synchronous non-blocking IO (NIO), IO multiplexing, and asynchronous non-blocking IO (Async IO, AIO). AIO is asynchronous IO; the other three are synchronous IO.

1. Synchronous blocking IO (BIO)

Synchronous blocking IO: when a thread performs an IO operation, it is blocked until the operation completes, and only then continues with the rest of its work.

As shown in the figure below, the server assigns a new thread to each client socket. Each thread's business processing is divided into two steps. After step 1 finishes, if an IO operation (such as loading a file) is encountered, the thread blocks until that IO operation completes, and only then proceeds to step 2.

Actual usage scenario: accessing a database from a Java thread pool (JDBC calls block the calling thread) uses the synchronous blocking IO model.

Disadvantage of the model: because every client needs its own thread, threads are frequently blocked and switched, and this adds overhead.

2. Synchronous non-blocking IO (NIO, "New IO" in Java)

Synchronous non-blocking IO: when a thread performs an IO operation, it is not blocked. It goes on executing other business code and, after a while, checks whether the IO has completed.
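At the system-call level, the "check back later" behavior described above corresponds to putting a socket into non-blocking mode. The following C sketch is our own illustration, assuming a POSIX system; the helpers set_nonblocking, try_read and read_when_ready and the WOULD_BLOCK sentinel are hypothetical names. A blocking read() would park the thread until data arrives, whereas here read() returns at once with EAGAIN (or EWOULDBLOCK) when nothing is available, and the caller runs other work before trying again.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical sentinel meaning "no data yet, try again later". */
#define WOULD_BLOCK (-2)

/* Put fd into non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * One non-blocking read attempt. On a blocking socket read() would park the
 * thread until data arrived; on a non-blocking socket it returns immediately.
 * Returns bytes read, 0 at end of stream, WOULD_BLOCK if no data is available
 * yet, or -1 on a real error.
 */
ssize_t try_read(int fd, char *buf, size_t cap) {
    ssize_t n = read(fd, buf, cap);
    if (n >= 0) return n;
    if (errno == EAGAIN || errno == EWOULDBLOCK) return WOULD_BLOCK;
    return -1;
}

/* Synchronous non-blocking style: keep checking, doing other work in between. */
ssize_t read_when_ready(int fd, char *buf, size_t cap, void (*other_work)(void)) {
    for (;;) {
        ssize_t n = try_read(fd, buf, cap);
        if (n != WOULD_BLOCK) return n;
        other_work();   /* business code runs instead of the thread blocking */
    }
}
```

The thread never sleeps inside read(), but it has to keep asking whether the data is ready. That repeated checking is the cost that IO multiplexing is designed to remove.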
As shown in the following figure, Buffer is a buffer that caches the data being read or written; Channel carries the IO data of the underlying connection; and the main job of Selector is to actively query which channels are ready. With a Selector, a single thread can query all the ready channels, which greatly reduces the overhead of frequent thread switching caused by IO interaction.

Actual usage scenarios: Java NIO is built on this IO model. It lets business code use a synchronous non-blocking design for IO, reducing the overhead of the frequent thread blocking and switching found in the traditional synchronous blocking model. The classic example of NIO is the Netty framework, and Elasticsearch also uses this mechanism underneath (its network layer is built on Netty).

3. IO multiplexing

IO multiplexing is explained in detail in the section "What is IO multiplexing?" below.

4. Asynchronous non-blocking IO (AIO)

AIO is short for asynchronous IO. Instead of notifying the thread when the IO is ready, AIO notifies it after the IO operation has completed, so AIO never blocks. The business logic becomes a callback function that the system triggers automatically once the IO operation completes. Netty 5 used AIO, but despite great effort its performance was not a significant leap over Netty 4, and Netty 5 was eventually discontinued.

Next, today's protagonist, IO multiplexing, comes on stage.

2. What is IO multiplexing?

When we learn a new technology or concept, the biggest question is usually the concept itself, and IO multiplexing is no exception. To figure out what IO multiplexing is, we can start with the "path" in the name (the Chinese term for IO multiplexing literally means "multi-path reuse").

A path, in its original sense, is a road: the asphalt roads of the city and the dirt roads of the countryside are familiar to everyone. So what does a "path" refer to in IO? Don't rush; let's first look at what IO is.
In computing, IO stands for input and output; information is exchanged through the underlying IO devices. Depending on what is being operated on, IO can be divided into disk I/O, network I/O, memory-mapped I/O, and so on. Any system that performs input and output can be regarded as an I/O system.

Finally, let's look at "path" and "multiple paths". In socket programming, the five-tuple [ClientIp, ClientPort, ServerIp, ServerPort, Protocol] uniquely identifies a socket connection. On this basis, one port of a service can hold socket connections with n clients at the same time, as roughly illustrated in the following figure:

Therefore, each socket connection between a client and the server can be regarded as "one path", and the socket connections between many clients and the server are "multiple paths". In "IO multiplexing", the "multiple paths" are the input and output streams on multiple socket connections, and "multiplexing" means that all of these streams are handled by a single thread. IO multiplexing can therefore be defined as follows: in Linux, IO multiplexing means one thread handling multiple IO streams.

3. What are the implementation mechanisms of IO multiplexing?

Let's first look at the basic socket model, to contrast it with the IO multiplexing mechanisms below. The pseudo code is as follows:

The network communication process is as follows:

The basic socket model can implement communication between a server and a client, but each call to accept handles only one client connection, and while the server is blocked in accept() or read() for one connection it cannot serve any other. When there are many client connections, this model performs poorly. Linux therefore provides high-performance IO multiplexing mechanisms to solve this problem; the operating system offers three of them: select, poll, and epoll.
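As a concrete counterpart to the basic socket model above, here is a minimal C sketch of the socket, bind, listen, accept, read/write sequence, assuming Linux/POSIX sockets and an echo-style reply. make_listen_socket, serve_one_client and run_basic_server are hypothetical names chosen for illustration. Because both accept() and read() block, the server can only serve clients strictly one after another.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* socket() -> bind() -> listen(): returns a listening fd on 127.0.0.1:port, or -1. */
int make_listen_socket(uint16_t port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    int on = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 || listen(fd, SOMAXCONN) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* accept() -> read() -> write() -> close() for exactly one client.
 * Returns the number of bytes echoed, 0 if the client sent nothing, -1 on error. */
ssize_t serve_one_client(int listen_fd) {
    int conn = accept(listen_fd, NULL, NULL);   /* blocks until a client connects */
    if (conn < 0) return -1;
    char buf[1024];
    ssize_t n = read(conn, buf, sizeof buf);     /* blocks until this client sends data */
    if (n > 0 && write(conn, buf, (size_t)n) != n) n = -1;   /* echo the data back */
    close(conn);
    return n;
}

/* The basic model: connections are handled strictly one after another,
 * so a slow client delays every client queued behind it. */
void run_basic_server(int listen_fd) {
    for (;;) serve_one_client(listen_fd);
}
```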
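We will analyze the principles of the three multiplexing mechanisms from the following four aspects:

(1) How many sockets can it monitor?
(2) Which socket events can it monitor?
(3) How does it find the ready fds?
(4) How does it implement network communication?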
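1. select mechanism

The core of the select mechanism is the select() function, which takes 5 parameters and returns an integer. Its prototype is essentially int select(int __nfds, fd_set *__readfds, fd_set *__writefds, fd_set *__exceptfds, struct timeval *__timeout); where __nfds is the highest-numbered descriptor in any of the sets plus one, the three fd_set pointers are described below, and __timeout limits how long select() waits (NULL means wait indefinitely). It returns the number of ready descriptors, 0 on timeout, or -1 on error.

(1) How many sockets can select monitor?
Answer: 1024 by default. An fd_set is a fixed-size bitmap of FD_SETSIZE bits (1024 by default), so only descriptors numbered below FD_SETSIZE can be monitored.

(2) Which socket events can select monitor?
Answer: select() takes three fd_set sets, representing the three types of events to monitor: readable events (the __readfds set), writable events (the __writefds set), and exception events (the __exceptfds set). Passing NULL for a set means that type of event is not monitored.

(3) How does select find the ready fds?
Answer: after select() returns, you must traverse the fd sets to find the ready descriptors.

(4) How does the select mechanism implement network communication?
Code implementation:

The network communication process implemented with select is shown below:

The shortcomings of select:
① A single process can monitor at most 1024 sockets by default.
② After every call, the program must traverse the fd sets to find the ready descriptors; and because select() modifies the sets in place, they must be rebuilt before each call.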
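To make the four answers above concrete, here is a minimal sketch of one round of a select()-based echo server in C, assuming Linux/POSIX. The function select_round and the clients array used for bookkeeping are hypothetical, not part of any library. It shows the fd_set being rebuilt before every call, the first argument set to the highest fd plus one, the FD_SETSIZE limit, and the linear scan needed to find ready descriptors.

```c
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * One round of a select()-based echo server.
 * clients / nclients: hypothetical bookkeeping for the connected sockets
 * being watched; max_clients is the capacity of the clients array.
 * Returns select()'s result: ready count, 0 on timeout, -1 on error.
 */
int select_round(int listen_fd, int *clients, int *nclients, int max_clients) {
    /* select() overwrites the sets it is given, so rebuild them every round. */
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(listen_fd, &readfds);
    int maxfd = listen_fd;
    for (int i = 0; i < *nclients; i++) {
        FD_SET(clients[i], &readfds);
        if (clients[i] > maxfd) maxfd = clients[i];
    }

    /* Five arguments: highest fd + 1, read/write/exception sets, timeout.
     * Only read events are watched here, so the other two sets are NULL. */
    struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };
    int ready = select(maxfd + 1, &readfds, NULL, NULL, &tv);
    if (ready <= 0) return ready;

    /* A readable listening socket means a new connection is waiting. */
    if (FD_ISSET(listen_fd, &readfds)) {
        int conn = accept(listen_fd, NULL, NULL);
        if (conn >= FD_SETSIZE || (conn >= 0 && *nclients >= max_clients)) {
            close(conn);   /* fd_set cannot hold fds >= FD_SETSIZE (1024 by default) */
        } else if (conn >= 0) {
            clients[(*nclients)++] = conn;
        }
    }

    /* select() marks the ready fds inside the set, but the program still
     * has to test every watched fd to find them. */
    for (int i = 0; i < *nclients; ) {
        int fd = clients[i];
        if (!FD_ISSET(fd, &readfds)) { i++; continue; }
        char buf[1024];
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0 && write(fd, buf, (size_t)n) == n) {
            i++;                                  /* echoed; keep watching */
        } else {
            close(fd);                            /* peer closed, or an error */
            clients[i] = clients[--*nclients];    /* swap-remove from the list */
        }
    }
    return ready;
}
```

A server would call select_round in an endless loop; each call pays for rebuilding the set and scanning every watched client, which is why the cost grows with the number of connections.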
2. poll mechanism

The core of the poll mechanism is the poll() function, whose prototype is int poll(struct pollfd *fds, nfds_t nfds, int timeout); The pollfd structure has three members, fd, events, and revents, which hold the file descriptor to monitor, the event types to monitor, and the event types that actually occurred.

(1) How many sockets can poll monitor?
Answer: as many as the caller places in the pollfd array. There is no fixed upper limit; it is bounded only by what system resources (such as the per-process open file limit) can bear.

(2) Which socket events can poll monitor?
Answer: the monitored and actually occurring event types in the pollfd structure are expressed with macros; the three used here are POLLRDNORM, POLLWRNORM, and POLLERR, which represent readable, writable, and error events respectively.

(3) How does poll find the ready fds?
Answer: as with select, you must traverse the fd array to find the ready descriptors.

(4) How does the poll mechanism implement network communication?
poll implementation code:

The network communication process implemented with poll is as follows:

poll removes select's limit of at most 1024 sockets per process, but it still has to poll through all fds to find the ready ones.

3. epoll mechanism

(1) The structure of epoll
epoll was introduced with the Linux 2.6 kernel. It uses the epoll_event structure to record each fd to be monitored and the event types it monitors. The structure is struct epoll_event { uint32_t events; epoll_data_t data; }, where events is a bit mask of event types (EPOLLIN, EPOLLOUT, EPOLLET, and so on) and data is the epoll_data union (void *ptr, int fd, uint32_t u32, uint64_t u64) in which the application stores, typically, the fd itself.

The epoll interface is fairly simple and consists of three functions:

① int epoll_create(int size);
Creates an epoll instance and returns a handle to it. size originally told the kernel roughly how many fds would be monitored; since Linux 2.6.8 it is ignored, but it must still be greater than 0. The epoll instance maintains two internal structures: one records the fds being monitored (a red-black tree in the Linux implementation), and the other holds the fds that are ready (a ready list). The ready file descriptors are returned to the user program for processing.
② int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
This is epoll's event registration function. epoll_ctl adds (EPOLL_CTL_ADD), modifies (EPOLL_CTL_MOD), or deletes (EPOLL_CTL_DEL) the events of interest for an fd on the epoll object. It returns 0 on success and -1 on failure, in which case errno indicates the error type. Unlike select(), where the event types of interest are passed in on every call, epoll registers them here in advance. The events returned by epoll_wait always belong to fds that were added through epoll_ctl.

③ int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
Waits for events, similar to a select() call. The events parameter receives the set of ready events from the kernel, and maxevents is the capacity of that array; it must be greater than 0 and is unrelated to the size passed to epoll_create(). timeout is in milliseconds: 0 returns immediately, and -1 blocks indefinitely until an event arrives. The function returns the number of events to process, 0 on timeout, and -1 on error, in which case errno indicates the error type.

(2) The two working modes of epoll: ET and LT
epoll has two working modes: LT (level-triggered) mode and ET (edge-triggered) mode. By default epoll works in LT mode, which supports both blocking and non-blocking sockets. Setting the EPOLLET flag in the events field of epoll_event switches that fd to ET mode. ET mode can be more efficient than LT mode because each readiness change is reported only once, but it should be used only with non-blocking sockets: the program must keep reading until the buffer is empty, and a blocking read on an empty socket would hang.

(3) Differences between ET mode and LT mode
When a new event arrives, it can be obtained from epoll_wait in ET mode. However, if the socket buffer for that event is not fully drained this time, and no new event arrives on the socket, epoll_wait will not report that event again in ET mode.
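In LT mode, by contrast, as long as there is data in the socket buffer for an event, epoll_wait keeps reporting that event. Developing epoll-based applications in LT mode is therefore simpler and less error-prone. In ET mode, if the buffered data is not fully processed when an event occurs, the remaining requests in the buffer will not be responded to.

(4) FAQ
How many sockets can epoll monitor?
Answer: there is no fixed upper limit; it is bounded only by what system resources (such as the open file limit) can bear.

How does epoll get the ready fds?
Answer: the epoll instance maintains two internal structures, one recording the monitored fds and one recording the ready fds. epoll_wait returns only the ready ones, so there is no need to traverse every monitored fd.

How does epoll implement network communication?
The following code implements it:

The epoll network communication process is as follows:

4. Differences among the three

The differences between select, poll, and epoll can be summarized in the following table:

Mechanism | Max monitored sockets | How ready fds are found | How events are registered
select | 1024 by default (FD_SETSIZE) | traverse the fd sets | passed in on every call
poll | no fixed limit (system resources) | traverse the pollfd array | passed in on every call
epoll | no fixed limit (system resources) | epoll_wait returns only ready fds | registered once with epoll_ctl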
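Putting epoll_create, epoll_ctl, epoll_wait, and the ET-mode rule together, here is a minimal C sketch of one round of an epoll-based echo server, assuming Linux. epoll_setup, epoll_round, and drain_and_echo are hypothetical helper names. The listening socket stays in the default LT mode, while each client socket is made non-blocking and registered with EPOLLIN | EPOLLET, so every notification is followed by reading until read() fails with EAGAIN.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_EVENTS 64   /* capacity of the events array passed to epoll_wait */

/* epoll_create + epoll_ctl(ADD): watch the listening socket in the default LT mode. */
int epoll_setup(int listen_fd) {
    int epfd = epoll_create(1);   /* size must be > 0; ignored since Linux 2.6.8 */
    if (epfd < 0) return -1;
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev) < 0) {
        close(epfd);
        return -1;
    }
    return epfd;
}

/* ET mode reports new data only once, so keep reading until EAGAIN.
 * Returns 0 when the buffer is drained, -1 if the connection should close. */
static int drain_and_echo(int fd) {
    char buf[4096];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            /* Simplification: a short write is treated as an error. */
            if (write(fd, buf, (size_t)n) != n) return -1;
        } else if (n == 0) {
            return -1;                         /* peer closed the connection */
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return 0;                          /* drained; wait for the next edge */
        } else if (errno != EINTR) {
            return -1;                         /* real read error */
        }
    }
}

/* One round of the event loop. Returns epoll_wait's result. */
int epoll_round(int epfd, int listen_fd, int timeout_ms) {
    struct epoll_event events[MAX_EVENTS];
    /* Only ready fds come back: no scan over every registered fd. */
    int n = epoll_wait(epfd, events, MAX_EVENTS, timeout_ms);
    for (int i = 0; i < n; i++) {
        int fd = events[i].data.fd;
        if (fd == listen_fd) {
            int conn = accept(listen_fd, NULL, NULL);
            if (conn < 0) continue;
            int flags = fcntl(conn, F_GETFL, 0);
            struct epoll_event ev = { .events = EPOLLIN | EPOLLET, .data.fd = conn };
            /* ET mode must be paired with a non-blocking socket. */
            if (flags < 0 || fcntl(conn, F_SETFL, flags | O_NONBLOCK) < 0 ||
                epoll_ctl(epfd, EPOLL_CTL_ADD, conn, &ev) < 0) {
                close(conn);
            }
        } else if (drain_and_echo(fd) < 0) {
            epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);   /* stop watching, then close */
            close(fd);
        }
    }
    return n;
}
```

Unlike select, there is no fd set to rebuild and no pass over every registered fd: interest is registered once with epoll_ctl, and epoll_wait hands back only the ready entries. Partial writes are treated as errors here for brevity; a production server would buffer unsent data and watch EPOLLOUT.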
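The following chart compares how the three implement network communication, so the differences are easy to see:

Technical frameworks using IO multiplexing

Many high-performance frameworks are built on IO multiplexing, including Redis, Nginx, and Netty, all of which are mentioned in this article.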
Summary

This article analyzed several IO models, focusing on the principles of IO multiplexing and the source code of each mechanism. Because the IO multiplexing model is very helpful for understanding high-performance frameworks such as Redis and Nginx, we recommend reading their source code and studying it further.