From the network I/O model to Netty, let’s first take a deeper look at I/O multiplexing

In the previous article, we walked through the five network I/O models defined by the Unix standard, along with their core differences and respective strengths and weaknesses. The I/O multiplexing model in particular shines in high-concurrency scenarios, and it is the model Netty uses.

So how does Netty implement I/O multiplexing?

Netty is essentially a wrapper framework: under the hood it is built on the Java NIO package ("New I/O" — not to be confused with the Nonblocking I/O model from the previous article), which in turn relies on I/O multiplexing.

Therefore, this article serves both as prerequisite knowledge and as a tour of a high-frequency interview topic: let's take a deeper look at the I/O multiplexing model.

This article is expected to take 5 minutes to read and will focus on answering the following two questions:

  • What are the implementations of the I/O multiplexing model? (select/poll/epoll)
  • What are the differences between select, poll, and epoll?

1. Implementations of the I/O multiplexing model

This is the diagram we used in the previous article on I/O multiplexing. Let’s review the I/O multiplexing model.


The I/O streams of multiple processes can be registered with a multiplexer (selector); a single process then calls select, which monitors all of the registered I/O streams.

An analogy:

In BIO mode, one teacher (an application process/thread) can only handle one student's (I/O stream's) question at a time. With 10 students, you need 10 teachers giving one-on-one explanations.

In the I/O multiplexing model, we assign the teacher a class monitor (the multiplexer, i.e. the Selector). The monitor watches which of the 10 students in the class wants to ask a question; as soon as a student raises a hand, the monitor reports it to the teacher, who then handles that student's question.

This way only one teacher is needed, and the teacher only has to watch for the monitor's reports to deal with the relevant students' questions in a timely manner.

Let's take a closer look at three implementations of I/O multiplexing: select, poll, and epoll.

  • Note that select, poll, and epoll are all implementations of I/O multiplexing and are essentially synchronous I/O: after being notified that a read or write event is ready, the process must still perform the read or write itself, and that read/write step blocks.
  • With asynchronous I/O, the process is not responsible for the read/write at all; the asynchronous I/O implementation copies the data from kernel space to user space on its behalf.

2. select

The Linux system provides the select function as the developer-facing entry point to the select multiplexing mechanism.
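Its prototype (as declared via <sys/select.h>; see man 2 select) is:

```c
#include <sys/select.h>

int select(int nfds, fd_set *readfds, fd_set *writefds,
           fd_set *exceptfds, struct timeval *timeout);
```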


What this function does is:

By polling, it monitors multiple file descriptors simultaneously, checking whether any of three kinds of I/O events has occurred: readable, writable, or an exceptional condition.

Finally, it returns the number of file descriptors on which I/O events occurred, and reports which descriptors saw read, write, or exception events through the three set parameters (readfds, writefds, and errorfds respectively).

  • A file descriptor is the abstract handle a program uses to refer to a file.
  • In Linux everything is a file, including I/O devices, so operating on a device means opening its device file. Opening a file yields that file's file descriptor (fd), a small non-negative integer.

Let's use the teacher/monitor/student analogy to understand this process.

  • The teacher gives the student list (the fd sets) to the class monitor and asks him to watch every student in the class.
  • The monitor repeatedly checks each student's status (scans all the fd sets) until a timeout occurs or a student raises a hand.
  • Once a student raises a hand, the monitor marks the changed students' names on the student list and returns the total number of changed students to the teacher.
  • The teacher thus learns how many students raised their hands, and can see on the student list (the fd sets) which students had events (read, write, exception).
  • With the list in hand, the teacher checks each student's status and performs the corresponding I/O handling for the specific read, write, or exception event.

Note that under select, the teacher only knows that some students have changed; to find out which ones, he must scan the whole class list (the fd sets), locate the students who raised their hands, and then deal with them one by one.
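As a concrete illustration, here is a minimal sketch of one select round (sock_fds and wait_readable are illustrative names of our own, not from any library):

```c
#include <sys/select.h>

/* Waits until at least one of the given sockets is readable.
 * Returns the count of ready descriptors (0 on timeout, -1 on error). */
int wait_readable(const int *sock_fds, int count) {
    fd_set readfds;
    FD_ZERO(&readfds);
    int maxfd = -1;
    for (int i = 0; i < count; i++) {   /* the set must be rebuilt on every call */
        FD_SET(sock_fds[i], &readfds);
        if (sock_fds[i] > maxfd) maxfd = sock_fds[i];
    }
    /* The whole set is copied into the kernel; blocks until something is ready. */
    int ready = select(maxfd + 1, &readfds, NULL, NULL, NULL);
    if (ready <= 0) return ready;
    for (int i = 0; i < count; i++) {   /* O(n) scan to find who "raised a hand" */
        if (FD_ISSET(sock_fds[i], &readfds)) {
            /* sock_fds[i] is readable: read from it here */
        }
    }
    return ready;
}
```

The sketch makes the two costs listed below visible: the full copy of the set on every call, and the linear scan afterwards.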

The disadvantages of select are obvious:

  • It polls indiscriminately, with O(n) time complexity: every call scans the entire fd_set, so the more descriptors it monitors, the longer each scan takes.
  • Every call to select must copy all the fd sets from user space to kernel space before polling; when the fd sets are large, this copy noticeably hurts performance.
  • In addition, the size of an fd_set is capped at FD_SETSIZE (typically 1024), which limits how many descriptors a single select call can monitor.

3. poll

The implementation of poll is very similar to select's, so we will not repeat it and will only cover the differences. The poll function is as follows:
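Its prototype and the pollfd structure (from <poll.h>; see man 2 poll) are:

```c
#include <poll.h>

int poll(struct pollfd *fds, nfds_t nfds, int timeout);

struct pollfd {
    int   fd;       /* file descriptor to monitor */
    short events;   /* events the caller is interested in */
    short revents;  /* events that actually occurred, filled in by the kernel */
};
```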


The main difference is how the fd set is described: poll takes an array of pollfd structures together with an explicit count (nfds) instead of the fixed-size fd_set bitmap, so the number of monitored descriptors is not capped by FD_SETSIZE. However, performance still degrades as the number of pollfd entries grows.

Otherwise the two work on the same principle: multiple descriptors are polled and each is handled according to its status.

Consequently, they share essentially the same drawbacks.

4. epoll

The full name of epoll is event poll: it is built around events, and it is a Linux-specific I/O multiplexing facility.

It differs greatly from select/poll in both implementation and usage:

  • epoll does its work through a set of functions rather than a single function.
  • epoll keeps the file descriptors the user cares about in a kernel event table, instead of passing the full set of file descriptors back and forth on every call like select/poll.
  • epoll uses an additional file descriptor of its own to refer to this event table.

Unlike select, which uses three fd_sets for read/write/exception I/O changes, epoll defines a dedicated epoll_event structure as the logical encapsulation of a read/write/exception I/O change, called an event.
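For reference, a simplified version of what <sys/epoll.h> declares:

```c
#include <stdint.h>

/* Simplified from <sys/epoll.h>. */
typedef union epoll_data {
    void     *ptr;
    int       fd;
    uint32_t  u32;
    uint64_t  u64;
} epoll_data_t;

struct epoll_event {
    uint32_t     events;  /* event mask: EPOLLIN, EPOLLOUT, EPOLLERR, ... */
    epoll_data_t data;    /* user data handed back unchanged with the event */
};
```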


4.1 Three core functions of epoll

epoll divides the original single select/poll call into three functions; we will look at each in turn and then put them together in a short sketch.
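The three prototypes (from <sys/epoll.h>) are:

```c
#include <sys/epoll.h>

int epoll_create(int size);
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
int epoll_wait(int epfd, struct epoll_event *events,
               int maxevents, int timeout);
```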


  • Call int epoll_create(int size) to create an epoll handle object; it returns a file descriptor fd that refers to the event table. On Linux you can see this fd under /proc/<pid>/fd/, so after you are done with epoll you must call close() on it, otherwise fds may be exhausted.
  • The size parameter does not limit the maximum number of descriptors epoll can monitor; it is only a hint for the kernel's initial allocation of internal data structures.

  • Call epoll_ctl to add a connected socket to the epoll object.
  • epfd is the event-table fd returned by epoll_create.
  • op is the specific operation: EPOLL_CTL_ADD (register monitoring events for fd), EPOLL_CTL_DEL (remove them), or EPOLL_CTL_MOD (modify them).
  • fd is the file descriptor to be monitored.
  • event tells the kernel which events to monitor.

  • Call epoll_wait to collect the connections on which events have occurred.
  • The return value is the total number of file descriptors that are ready.
  • epfd is the event-table fd.
  • events is an array for ready events: when epoll_wait detects events, it copies the ready events from the event table into this array. (More efficient than select/poll!)
  • maxevents is the maximum number of events to return at once.
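Putting the three calls together, here is a minimal sketch of an epoll-based accept/read loop (listen_fd is assumed to be an already-listening socket; error handling is pared down):

```c
#include <sys/epoll.h>
#include <sys/socket.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define MAX_EVENTS 64

void event_loop(int listen_fd) {
    int epfd = epoll_create(1);                 /* the size argument is only a hint */
    if (epfd == -1) { perror("epoll_create"); exit(EXIT_FAILURE); }

    struct epoll_event ev = {0};
    ev.events = EPOLLIN;                        /* watch for "readable" */
    ev.data.fd = listen_fd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

    struct epoll_event events[MAX_EVENTS];
    for (;;) {
        /* Blocks until at least one registered fd has a ready event. */
        int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
        if (n == -1) { perror("epoll_wait"); break; }

        for (int i = 0; i < n; i++) {           /* only ready fds: no full scan */
            int fd = events[i].data.fd;
            if (fd == listen_fd) {              /* new connection: register it too */
                int conn = accept(listen_fd, NULL, NULL);
                if (conn == -1) continue;
                ev.events = EPOLLIN;
                ev.data.fd = conn;
                epoll_ctl(epfd, EPOLL_CTL_ADD, conn, &ev);
            } else {                            /* an existing connection is readable */
                char buf[4096];
                ssize_t r = read(fd, buf, sizeof(buf));
                if (r <= 0) {                   /* peer closed, or read error */
                    epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
                    close(fd);
                } /* else: handle the r bytes in buf */
            }
        }
    }
    close(epfd);                                /* release the event-table fd */
}
```

Note that each fd is registered with epoll_ctl exactly once, while epoll_wait hands back only the fds that are actually ready.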

4.2 Implementation principle of epoll

When a process calls epoll_create(), the kernel creates an eventpoll structure in kernel space. Two member variables of this structure are central to how epoll works. The structure is as follows:
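Here is a simplified excerpt (member names follow fs/eventpoll.c; other fields are omitted and exact definitions vary across kernel versions):

```c
/* Simplified from the kernel's struct eventpoll in fs/eventpoll.c. */
struct eventpoll {
    /* ... locks, wait queues, and other members omitted ... */
    struct rb_root_cached rbr;     /* red-black tree of all monitored fds */
    struct list_head      rdllist; /* list of ready events for epoll_wait */
    /* ... */
};
```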


  • Red-black tree root rbr: the root of a red-black tree that stores every monitored event added to this epoll instance.
  • Ready list rdllist: a linked list that stores the events that have become ready; these are what epoll_wait returns to the user.

epoll_ctl() inserts the newly added monitored event into the red-black tree rbr. It also registers a callback with the kernel's interrupt handler for that descriptor, telling the kernel: when an interrupt for this handle arrives, put its event on the ready list.

Once a file descriptor becomes ready, the kernel activates it through this callback mechanism, and the callback appends the triggered event to the eventpoll ready list rdllist.

When epoll_wait() is called to check whether any event has occurred, it only needs to check whether the rdllist in the eventpoll object contains any elements. If the list is non-empty, the ready events are copied into the events array passed to epoll_wait(), where the user can pick them up.

  • Compared with select/poll, notice that there is no need here to traverse all the monitored file descriptors: that is the charm of epoll.

This makes epoll_wait very efficient: calling it neither copies every connection's handle data into the kernel nor requires the kernel to traverse all connections.

4.3 Is shared memory used in epoll?

Many blogs have mentioned this:

  • When epoll_wait returns the ready events, epoll uses shared memory: user space and kernel space both point at the ready list, so the cost of a memory copy is avoided.

But is this really the case?

"The source code hides no secrets" — so let's go straight to the source.

Refer to the source code of eventpoll.c.

https://github.com/torvalds/linux/blob/master/fs/eventpoll.c

The call chain of epoll_wait in the source is roughly: the epoll_wait syscall → ep_poll() → ep_send_events(), which is where ready events are handed back to user space.


Following that chain down, we can see that the copy itself is done with __put_user.
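A simplified excerpt (exact code varies by kernel version) of how each ready event reaches the user's array:

```c
/* Simplified from ep_send_events in fs/eventpoll.c: each ready event
 * is written into the user-supplied array with __put_user(), i.e. a
 * plain kernel-to-user copy. */
if (__put_user(revents, &uevent->events) ||
    __put_user(epi->event.data, &uevent->data))
    return -EFAULT;
```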


So events really are copied from kernel space to user space; no shared memory is involved.

5. Comparison of three implementations

Through the above analysis, I believe everyone has understood the implementation of select/poll/epoll.

Their main differences are summarized below.
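  • select: describes interest with fd_set bitmaps; capped at FD_SETSIZE (typically 1024) descriptors; copies the sets between user and kernel space on every call; finds ready fds by an O(n) scan.
  • poll: same per-call copy and O(n) scan, but uses a pollfd array with an explicit count, so there is no FD_SETSIZE cap.
  • epoll: descriptors are registered once via epoll_ctl into a kernel event table (a red-black tree); readiness is collected through a callback-maintained ready list, so epoll_wait returns only ready fds without scanning or re-copying the full set.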

Overall, the implementation performance of epoll is better than that of select/poll.

Of course, if most connections are active most of the time, epoll_wait is not necessarily more efficient, because its readiness callbacks fire too frequently.

Therefore, epoll is most suitable for scenarios where there are a large number of connections but a small number of active connections.

References:

《Linux High Performance Server Programming》
