An Analysis of the Go Network Library gnet



Introduction

Previously, we analyzed Go's native network model and some of its source code. In the vast majority of scenarios, the native netpoller is sufficient.

However, with massive numbers of concurrent connections, the native model starts one goroutine per connection: 10 million connections means 10 million goroutines.

This provides room for optimization in these special scenarios, which is probably one of the reasons why tools like gnet and cloudwego/netpoll were created.

In essence, their underlying cores are the same: both are built on epoll (on Linux). They differ mainly in how each library handles an event once it occurs.

This article focuses on gnet. I won't cover basic usage here; gnet ships with a demo repository you can try out yourself.

Architecture

The diagram below is taken directly from the gnet official website:

gnet uses the "master-slave multi-reactor" model: a main reactor thread listens on the port for new connections, and when a client connects, a load-balancing algorithm assigns the connection to one of the sub reactor threads. That sub thread then handles all of the connection's read and write events and its eventual teardown.

The picture below makes it clearer.

Core Structure

Let's first explain some core structures of gnet.

The engine is the top-level structure of the program; its key fields are described below, followed by a simplified sketch.

  • ln is the listener bound to the service's listening port once the server starts.
  • lb is the load balancer: when a client connects, it selects a sub thread and hands the connection over to that thread for processing.
  • mainLoop is the main thread, and its structure is an eventloop. The sub threads use the same eventloop structure, but their responsibilities differ: the main thread listens for client connection events on the port and lets the load balancer assign each new connection to a sub thread, while each sub thread registers the connections assigned to it (usually more than one), waits for their subsequent read and write events, and processes them.
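Here is a minimal, illustrative sketch of how these fields fit together. The names follow the article's description rather than gnet's exact source, and net.Listener stands in for gnet's own listener type.

```go
package sketch

import "net"

// eventloop is sketched in the next section; an empty placeholder keeps
// this snippet compilable on its own.
type eventloop struct{}

// loadBalancer picks a sub event loop for each incoming connection.
type loadBalancer interface {
	register(el *eventloop)
	next(remote net.Addr) *eventloop
}

// engine is a simplified, illustrative view of gnet's top-level structure.
type engine struct {
	ln       net.Listener // listener bound to the service port
	lb       loadBalancer // hands each new connection to a sub event loop
	mainLoop *eventloop   // main reactor: accepts connections, nothing else
}
```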

Next, look at the eventloop structure; again, a simplified sketch follows the list.

  • netpoll.Poller: each eventloop owns one epoll (Linux) or kqueue (macOS/BSD) instance.
  • buffer is the scratch buffer used when reading inbound messages.
  • connCount records the number of TCP connections currently held by the eventloop.
  • udpSockets and connections manage all UDP sockets and TCP connections owned by this eventloop, respectively. Note that both are maps whose int keys are file descriptors (fd -> conn).
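A hedged sketch of those fields; the poller type here is only a placeholder for gnet's internal netpoll.Poller, and conn is stubbed out until the next section.

```go
package sketch

import "sync/atomic"

// conn is sketched in the next section; a placeholder keeps this compilable.
type conn struct{}

// poller stands in for gnet's netpoll.Poller, which wraps one epoll
// (Linux) or kqueue (macOS/BSD) instance.
type poller struct{ fd int }

// eventloop is a simplified, illustrative view of a single reactor loop.
type eventloop struct {
	poller      *poller       // one epoll/kqueue instance per loop
	buffer      []byte        // scratch buffer reused when reading inbound messages
	connCount   atomic.Int32  // number of TCP connections owned by this loop
	udpSockets  map[int]*conn // fd -> UDP socket
	connections map[int]*conn // fd -> TCP connection
}
```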

Then there is the conn structure.

There are several fields here:

  • buffer: stores the most recent data sent by this conn's peer (the client). For example, if the peer sends three times in a row, buffer holds only the third chunk at that point, as the source comments note.
  • inboundBuffer: stores earlier data from the peer that the user has not yet consumed; it is implemented as a ring buffer.
  • outboundBuffer: stores data that has not yet been sent to the peer, such as the server's response to the client. Since the conn's fd is non-blocking, when a write call reports that it cannot write any more, the remaining data is parked here first.

In effect, each connection has its own independent buffer space, which avoids the locking that centralized memory management would require, and the ring buffers increase space reuse.
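A hedged sketch of the conn fields described above; ringBuffer here is only a placeholder for gnet's real elastic ring buffer.

```go
package sketch

// ringBuffer stands in for gnet's elastic ring-buffer implementation.
type ringBuffer struct{ buf []byte }

// conn is a simplified, illustrative view of a single TCP connection.
type conn struct {
	fd             int        // underlying socket file descriptor
	buffer         []byte     // bytes from the most recent read on this connection
	inboundBuffer  ringBuffer // earlier bytes the application has not consumed yet
	outboundBuffer ringBuffer // bytes queued because the socket was not writable
}
```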

That’s the overall structure.

Core Logic

When the program starts, the number of eventloops, i.e. the number of sub threads, is determined by the options the user sets. On Linux, that is also how many epoll instances will be created.

The total number of epoll instances in the program is therefore count(sub) + 1 (the main listener's).
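A minimal sketch of that calculation, assuming the common pattern of "use the configured value, otherwise fall back to the CPU count"; numEventLoops is an illustrative helper, not gnet's actual function.

```go
package sketch

import "runtime"

// numEventLoops sketches how the sub-reactor count could be derived: use
// the user-configured value when set, otherwise the number of CPU cores.
// The total number of pollers is then this value plus one for the main
// listener's poller.
func numEventLoops(configured int) int {
	if configured > 0 {
		return configured
	}
	return runtime.NumCPU()
}
```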

As described above, the configured number of eventloops is created, and each of them is registered with the load balancer.

When a new connection arrives, one of the eventloops is selected for it according to the configured algorithm (gnet provides round-robin, least-connections, and source-address hashing).
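As an illustration, here is a minimal round-robin picker; roundRobinBalancer and pick are hypothetical names, not gnet's implementation.

```go
package sketch

import "sync/atomic"

type eventloop struct{}

// roundRobinBalancer sketches the simplest strategy: hand connections to
// the registered sub loops in turn.
type roundRobinBalancer struct {
	loops []*eventloop
	next  atomic.Uint64
}

func (b *roundRobinBalancer) register(el *eventloop) {
	b.loops = append(b.loops, el)
}

func (b *roundRobinBalancer) pick() *eventloop {
	// Advance the counter and wrap around the registered loops.
	n := b.next.Add(1) - 1
	return b.loops[int(n%uint64(len(b.loops)))]
}
```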

Let's look at the main thread first. (Since I'm using a Mac, the implementation of IO multiplexing later is kqueue code, but the principle is the same.)

Polling waits for network events to arrive. It takes a closure parameter, or more precisely a callback that runs when an event arrives; as its name suggests, the main reactor's callback handles new connections.

As for the Polling function itself, the logic is simple: a for loop waits for events to arrive and then processes them.
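A minimal Linux epoll version of that loop (the article's environment is kqueue, but the shape is the same); the polling function and its callback signature are illustrative, not gnet's actual API.

```go
//go:build linux

package sketch

import "golang.org/x/sys/unix"

// polling blocks until events arrive, then invokes the callback for each
// ready fd. gnet's real Polling also handles the wake-up event, timeouts,
// and event-list resizing.
func polling(epfd int, callback func(fd int, events uint32) error) error {
	events := make([]unix.EpollEvent, 128)
	for {
		n, err := unix.EpollWait(epfd, events, -1)
		if err != nil {
			if err == unix.EINTR {
				continue // interrupted by a signal; wait again
			}
			return err
		}
		for i := 0; i < n; i++ {
			if err := callback(int(events[i].Fd), events[i].Events); err != nil {
				return err
			}
		}
	}
}
```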

The main thread sees two types of events:

One is an ordinary network event on an fd (for example, a new connection on the listener).

The other is an event triggered explicitly via NOTE_TRIGGER.

A NOTE_TRIGGER event signals that there are tasks waiting in the loop's task queue, so the loop goes and executes them.
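A hedged sketch of that idea: asyncTaskQueue is an illustrative stand-in (gnet uses a lock-free queue internally), showing only enqueue plus drain. NOTE_TRIGGER is the kqueue mechanism; on Linux an eventfd plays the same role.

```go
package sketch

import "sync"

// asyncTaskQueue sketches the wake-up pattern: other goroutines enqueue
// tasks and poke the loop's wake-up event; the loop drains and runs them
// on its own goroutine.
type asyncTaskQueue struct {
	mu    sync.Mutex
	tasks []func() error
}

// enqueue adds a task; the caller would then trigger the loop's wake-up
// event so a blocked EpollWait/Kevent call returns promptly.
func (q *asyncTaskQueue) enqueue(task func() error) {
	q.mu.Lock()
	q.tasks = append(q.tasks, task)
	q.mu.Unlock()
}

// drain is what the event loop runs when the wake-up event fires.
func (q *asyncTaskQueue) drain() error {
	q.mu.Lock()
	tasks := q.tasks
	q.tasks = nil
	q.mu.Unlock()
	for _, task := range tasks {
		if err := task(); err != nil {
			return err
		}
	}
	return nil
}
```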

If an ordinary network event arrives, the closure callback runs; for the main thread, that is the accept-connection logic mentioned above.

The accept logic is very simple: obtain the connection's fd, set the fd to non-blocking mode (think about what would happen to the loop if the fd were left blocking), then select a sub thread via the load-balancing algorithm and hand the connection over to it through the register function.
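A hedged sketch of that accept path on Linux; acceptConn, pick, and the register stub are illustrative names, not gnet's code.

```go
//go:build linux

package sketch

import "golang.org/x/sys/unix"

// eventloop and its register method are sketched in the next snippet;
// this stub keeps the snippet compilable on its own.
type eventloop struct{}

func (el *eventloop) register(fd int) error { return nil }

// acceptConn sketches the main reactor's callback when the listener fd is
// readable: accept the socket, make it non-blocking (a blocking fd would
// stall the whole loop on a later read or write), then hand it to the sub
// loop chosen by the load balancer.
func acceptConn(listenerFD int, pick func() *eventloop) error {
	nfd, _, err := unix.Accept(listenerFD)
	if err != nil {
		if err == unix.EAGAIN {
			return nil // nothing to accept right now
		}
		return err
	}
	if err := unix.SetNonblock(nfd, true); err != nil {
		unix.Close(nfd)
		return err
	}
	el := pick()            // load balancer chooses a sub event loop
	return el.register(nfd) // hand the new fd over to that loop
}
```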

Register does two things. First, it adds the connection to the sub thread's epoll or kqueue instance with a read flag, so the loop is notified of readable events.

Second, it puts the connection into the connections map (fd -> conn).

This way, when an event later arrives on that sub thread, the fd carried by the event is enough to look up the corresponding connection and process it.
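A minimal sketch of those two steps on Linux, with illustrative types; gnet's real register also handles UDP sockets, updates the connection counter, and fires the user's on-open callback.

```go
//go:build linux

package sketch

import "golang.org/x/sys/unix"

type conn struct{ fd int }

type eventloop struct {
	epfd        int           // this sub loop's epoll instance
	connections map[int]*conn // fd -> conn, consulted when events fire
}

// register sketches what happens when a sub loop adopts a new connection:
// watch the fd for readable events and remember the fd -> conn mapping.
func (el *eventloop) register(fd int) error {
	ev := unix.EpollEvent{Events: unix.EPOLLIN, Fd: int32(fd)}
	if err := unix.EpollCtl(el.epfd, unix.EPOLL_CTL_ADD, fd, &ev); err != nil {
		return err
	}
	el.connections[fd] = &conn{fd: fd}
	return nil
}
```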

If it is a readable event, the sub thread reads the socket into the eventloop's scratch buffer, points the conn's buffer at the newly read bytes, and invokes the user's event handler; whatever the handler does not consume is moved into inboundBuffer for later.
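A hedged sketch of that read path; onReadable and the handle callback are illustrative, and a plain byte slice stands in for the real ring buffer.

```go
//go:build linux

package sketch

import (
	"io"

	"golang.org/x/sys/unix"
)

type conn struct {
	fd            int
	buffer        []byte // most recent bytes read from the peer
	inboundBuffer []byte // earlier, still-unconsumed bytes (a ring buffer in gnet)
}

// onReadable reads once from the non-blocking fd into the loop's scratch
// buffer, exposes the bytes to the user's handler, and keeps whatever the
// handler did not consume.
func onReadable(c *conn, scratch []byte, handle func(c *conn) (consumed int)) error {
	n, err := unix.Read(c.fd, scratch)
	if err != nil {
		if err == unix.EAGAIN {
			return nil // nothing to read right now
		}
		return err
	}
	if n == 0 {
		return io.EOF // peer closed; gnet tears the connection down here
	}
	c.buffer = scratch[:n]
	consumed := handle(c)
	// Anything left over is stashed in inboundBuffer for the next round.
	c.inboundBuffer = append(c.inboundBuffer, c.buffer[consumed:]...)
	c.buffer = nil
	return nil
}
```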

That more or less wraps up the analysis.

Summary

In gnet, you can see that basically all operations are lock-free.

That is because event handling uses non-blocking operations and each fd (conn) is processed serially by the loop that owns it. Each conn operates only on its own buffer space, and all events triggered in one round are fully processed before the next wait begins, which resolves the concurrency problem at this level.

Of course, users also need to pay attention to a few issues. For example, if you want to process logic asynchronously in a custom EventHandler, you must not spawn a goroutine and then read the connection's current data inside it, as in the first (wrong) sketch below.

Instead, you should copy the data out first and only then process it asynchronously, as in the second sketch.
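A sketch of the wrong and right patterns; Conn here is a stand-in for gnet's connection interface (the exact method names depend on the gnet version), and process is a hypothetical worker function.

```go
package sketch

// Conn is a stand-in for gnet's connection interface; only the one method
// relevant to this pitfall is sketched.
type Conn interface {
	Next(n int) ([]byte, error) // returns bytes still owned by the event loop
}

// wrong captures the connection and reads it from another goroutine; by the
// time that goroutine runs, the event loop may have reused or released the
// underlying buffers, so this races and can observe someone else's data.
func wrong(c Conn, process func([]byte)) {
	go func() {
		data, _ := c.Next(-1)
		process(data)
	}()
}

// right copies the bytes out while still on the event-loop goroutine and
// only hands the private copy to the worker goroutine.
func right(c Conn, process func([]byte)) {
	data, _ := c.Next(-1)
	buf := make([]byte, len(data))
	copy(buf, data)
	go process(buf)
}
```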

As mentioned in the project's issues, connections are stored in a map[int]*conn. gnet's target scenario is massive numbers of concurrent connections, so this requires a lot of memory; moreover, storing pointers in a huge map places a heavy burden on the GC, since unlike an array it is not a contiguous block of memory that is cheap to scan.

Another point: when buffered data is handed to the user, as shown above, it is essentially copied, so there is noticeable copy overhead. ByteDance's netpoll (cloudwego/netpoll) implements a Nocopy Buffer to address this, which I will study another day.
