What is the appropriate number of Goroutines? Will it affect GC and scheduling?

This article is reprinted from the WeChat public account "脑进煎鱼了", written by Chen Jianyu. To reprint this article, please contact the public account "脑进煎鱼了".

Hello everyone, I am Jianyu.

A few days ago, a member of my reader group asked a killer question: "What is the appropriate number of goroutines for a single machine?"

Your first reaction may be the same as that of the other group members: "There is no definite answer for how many to allow."

Then came a follow-up question: "Too many goroutines will affect GC and scheduling, so how can we budget this number reasonably?"

That is the topic of this article. We will cover the basics first, then work step by step toward an answer.

What is a Goroutine?

Go is a relatively young programming language, and goroutines are one of its most popular features. A goroutine is a lightweight thread managed by the Go runtime, often described as a "coroutine". It is started with the go keyword:

    go f(x, y, z)

The operating system is not aware of goroutines. Running and switching goroutines happens entirely in user space.

The Go scheduler multiplexes goroutines onto the small number of system threads that the operating system gives the Go program.

Creating a goroutine is also cheap: its stack initially needs only 2-4 KB. The stack grows and shrinks with actual usage, which keeps goroutines very lightweight.

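    package main

    import (
        "fmt"
        "time"
    )

    // say prints s once every 100 ms.
    func say(s string) {
        for i := 0; i < 9999999; i++ {
            time.Sleep(100 * time.Millisecond)
            fmt.Println(s)
        }
    }

    func main() {
        go say("fry fish") // runs concurrently in a new goroutine
        say("Hello")       // runs in the main goroutine
    }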

Go is known as the little overlord of coroutines: a single program can start hundreds of thousands or even millions of goroutines, which is one of the language's proudest achievements.
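To give a feel for how cheap it is to start many goroutines, here is a minimal sketch. The function name sumSquares and the goroutine count are illustrative assumptions, not part of the original article. It starts one goroutine per value, waits for all of them with sync.WaitGroup, and protects the shared sum with a mutex.

```go
package main

import (
	"fmt"
	"sync"
)

// sumSquares (hypothetical helper) starts one goroutine per value in 1..n,
// waits for all of them, and returns the sum of the squares.
func sumSquares(n int) int {
	var (
		wg  sync.WaitGroup
		mu  sync.Mutex
		sum int
	)
	for i := 1; i <= n; i++ {
		wg.Add(1)
		go func(v int) {
			defer wg.Done()
			mu.Lock() // sum is shared, so guard it
			sum += v * v
			mu.Unlock()
		}(i)
	}
	wg.Wait() // block until every goroutine has called Done
	return sum
}

func main() {
	// 10,000 goroutines is an arbitrary illustrative count.
	fmt.Println(sumSquares(10000))
}
```

Unlike the say example, this version does not rely on the main goroutine staying busy: wg.Wait() keeps main alive until every goroutine has finished.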

What is Scheduling?

Since goroutines live in user space and the operating system cannot see them, something has to manage them so that they run well.

That something is the Go scheduler, better known as the GMP model, a favorite interview topic. The following sections introduce the basics of Go scheduling and how it works.

The following content is excerpted from "Go Language Programming Tour", written by Jianyu and P Shen.

Scheduling Basics

The main job of the Go scheduler is to distribute runnable goroutines across the OS threads running on the processors. Three abbreviations come up whenever schedulers are discussed:

  • G: Goroutine. Every go func() statement creates a G.
  • P: Processor. By default the number of Ps equals the number of processor cores; it can be changed through GOMAXPROCS.
  • M: Machine, a system thread.

The interaction among the three comes from Go's M:N scheduling model. An M must be bound to a P, and it then loops continuously, looking for a runnable G to execute.
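The three quantities can be observed roughly from inside a running program. The sketch below is only an approximation: the gmpSnapshot type and snapshot function are hypothetical names, and the M figure comes from the "threadcreate" profile, which counts threads the runtime has created so far rather than threads that are currently busy.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/pprof"
)

// gmpSnapshot is a hypothetical helper type for this example.
type gmpSnapshot struct {
	G int // goroutines that currently exist
	M int // OS threads the runtime has created so far
	P int // current GOMAXPROCS value
}

func snapshot() gmpSnapshot {
	return gmpSnapshot{
		G: runtime.NumGoroutine(),
		M: pprof.Lookup("threadcreate").Count(),
		P: runtime.GOMAXPROCS(0), // an argument of 0 only queries the value
	}
}

func main() {
	fmt.Printf("%+v\n", snapshot())
}
```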

Scheduling Process

We conduct a simple analysis based on the workflow diagram of the GMP model. The official diagram is as follows:

  1. When we execute go func(), we create a brand new goroutine, which we call G.
  2. The new G is put into the local queue (Local Queue) of a P or into the global queue (Global Queue), ready for the next step. Note that the P here is the P that created the G.
  3. An M is woken up or created to execute the G.
  4. The M runs the scheduling loop continuously.
  5. It looks for a runnable G and executes its task.
  6. When the G finishes, the M cleans up and re-enters the loop.

The steps mention two kinds of queues, global and local. Both store Gs that are waiting to run; the difference is that a local queue has limited capacity and holds at most 256 Gs.

A new G goes to P's local queue first. If the local queue is full, half of the Gs in that local queue are moved to the global queue.

This can be understood as the sharing and rebalancing of scheduling resources.
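The overflow rule can be illustrated with a toy model. This is not the runtime's code: toyP, toySched, and put are hypothetical names, plain slices stand in for the real queues, and the sketch assumes the new G travels to the global queue together with the moved half.

```go
package main

import "fmt"

const localQueueCap = 256 // capacity of a P's local queue, per the text

// toyP and toySched are simplified stand-ins for a P and the scheduler.
type toyP struct{ local []int }
type toySched struct{ global []int }

// put enqueues g on p's local queue. When the local queue is full, the
// older half of it plus g (an assumption of this sketch) go to the
// global queue instead.
func (s *toySched) put(p *toyP, g int) {
	if len(p.local) < localQueueCap {
		p.local = append(p.local, g)
		return
	}
	half := len(p.local) / 2
	s.global = append(s.global, p.local[:half]...)
	s.global = append(s.global, g)
	p.local = append(p.local[:0], p.local[half:]...)
}

func main() {
	var s toySched
	var p toyP
	for g := 0; g < 257; g++ {
		s.put(&p, g)
	}
	// The 257th put overflows the local queue.
	fmt.Println(len(p.local), len(s.global)) // prints "128 129"
}
```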

Stealing

The figure also shows a steal action. What is it for? As we know, when a new G is created or a G becomes runnable, it is pushed onto the local queue of the current P.

When a P finishes executing a G, it goes looking for more work: it pops the next G from its local queue. If the local queue is empty, it picks another P at random and tries to steal half of the runnable Gs from that P's local queue.

The official picture is as follows:

In this example, P2 cannot find any executable G in the local queue, so it executes the work-stealing scheduling algorithm, randomly selects another processor P1, and steals three Gs from P1's local queue to its own local queue.

At this point, both P1 and P2 have executable G, and the extra G of P1 will not be wasted, and scheduling resources will flow more evenly among multiple processors.
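The stealing step can be sketched as a toy model as well. Again, this is an illustration under assumptions rather than the runtime's implementation: toyP, steal, and stealWork are hypothetical names, slices replace the real lock-free queues, and "half" is rounded up here.

```go
package main

import (
	"fmt"
	"math/rand"
)

// toyP is a simplified stand-in for a P with a local run queue.
type toyP struct{ local []int }

// steal moves half of victim's Gs (rounded up) to thief and reports
// how many were moved.
func steal(thief, victim *toyP) int {
	n := len(victim.local) - len(victim.local)/2
	if n == 0 {
		return 0
	}
	thief.local = append(thief.local, victim.local[:n]...)
	victim.local = victim.local[n:]
	return n
}

// stealWork visits the other Ps in random order and steals from the
// first one that has runnable Gs.
func stealWork(ps []*toyP, self int) int {
	for _, i := range rand.Perm(len(ps)) {
		if i == self {
			continue
		}
		if n := steal(ps[self], ps[i]); n > 0 {
			return n
		}
	}
	return 0
}

func main() {
	p1 := &toyP{local: []int{1, 2, 3, 4, 5, 6}}
	p2 := &toyP{} // P2 has nothing to run
	stolen := stealWork([]*toyP{p1, p2}, 1)
	fmt.Println(stolen, len(p1.local), len(p2.local)) // prints "3 3 3"
}
```

With six Gs on P1, P2 ends up with three of them, which matches the example in the text.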

Are there any restrictions?

So far we have given a basic introduction to goroutines and Go's scheduling model.

Now let's return to the question: "Is there any impact if there are too many goroutines?"

With the basics of GMP in place, we need to know what constraints G, M, and P, which do the actual work, are subject to while goroutines run.

Let's analyze it step by step from the GMP perspective.

Limitations of M

First, we need to know which of G, M, and P actually does the work while a goroutine executes.

It must be M (the system thread): G lives in user space, so it ultimately has to be mapped onto an M, a system thread, in order to run.

So is there any limit to M?

The answer is: yes. In Go, the default limit on the number of Ms is 10000. If the program exceeds it, the runtime reports an error and crashes:

    runtime: program exceeds 10000-thread limit

This usually happens only when a large number of goroutines are blocked (for example, in system calls), each one tying up a system thread. It may also indicate a problem in your program.

If you really need that many threads, you can raise the limit with debug.SetMaxThreads.
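A minimal sketch of adjusting the limit follows. The wrapper name setThreadLimit and the value 20000 are illustrative assumptions; debug.SetMaxThreads itself lives in the runtime/debug package and returns the previous setting.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// setThreadLimit (hypothetical wrapper) changes the maximum number of OS
// threads the program may use and returns the previous limit.
func setThreadLimit(n int) int {
	return debug.SetMaxThreads(n)
}

func main() {
	prev := setThreadLimit(20000) // 20000 is an arbitrary example value
	fmt.Println("previous limit:", prev)
	setThreadLimit(prev) // restore the earlier limit
}
```

Raising the limit only hides the symptom. If a program really hits 10000 threads, it is usually worth finding out what is blocking first.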

Limitations of G

Second, what about G? Is there a limit to the number of Goroutines that can be created?

The answer is: no. In theory, though, the number is bounded by memory. Assuming each goroutine needs 4 KB when created (via @GoWKH):

  • 4 KB * 80,000 = 320,000 KB ≈ 0.3 GB of memory
  • 4 KB * 1,000,000 = 4,000,000 KB ≈ 4 GB of memory

This gives a rough estimate of how many goroutines a single machine can create under normal circumstances.

Note: the 2-4 KB needed to create a goroutine must be a contiguous block of memory.
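The estimate above can be reproduced and compared against a rough measurement. In this sketch, estimateKB and measureStackBytes are hypothetical helpers, and the goroutine count of 10,000 is arbitrary. The measurement reads runtime.MemStats.StackInuse before and after parking n goroutines, so it only covers stack memory and will vary by Go version and platform.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// estimateKB is the back-of-the-envelope estimate from the text:
// n goroutines times an assumed per-goroutine cost in KB.
func estimateKB(n, perGoroutineKB int) int {
	return n * perGoroutineKB
}

// measureStackBytes parks n goroutines on a channel and reports how much
// StackInuse grew while they were alive.
func measureStackBytes(n int) uint64 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)

	var started, done sync.WaitGroup
	release := make(chan struct{})
	for i := 0; i < n; i++ {
		started.Add(1)
		done.Add(1)
		go func() {
			defer done.Done()
			started.Done()
			<-release // stay parked until the measurement is taken
		}()
	}
	started.Wait()
	runtime.ReadMemStats(&after)
	close(release)
	done.Wait()

	if after.StackInuse < before.StackInuse {
		return 0
	}
	return after.StackInuse - before.StackInuse
}

func main() {
	fmt.Println("estimate for 80,000 goroutines (KB):", estimateKB(80000, 4))
	const n = 10000
	fmt.Println("measured stack bytes per goroutine:", measureStackBytes(n)/n)
}
```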

Limitations of P

Third, what about P? Is there a limit to the number of P? What affects it?

The answer is: there is a limit. The number of Ps is determined directly by the environment variable GOMAXPROCS.

What is the environment variable GOMAXPROCS? In Go, setting GOMAXPROCS lets users adjust the number of Ps (Processors) used in scheduling.

Another important point is that an M (system thread) must be bound to a P before it can run any G, so the number of Ps affects the performance of a Go program.

The number of Ps normally follows the number of cores on the machine, so there is no need to worry about it too much.

Does the number of Ps affect the number of Goroutines created?

The answer is: no. Whether there are many goroutines or few, the Ps keep doing their job, and nothing catastrophic happens.
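The following sketch illustrates that point: even with a single P, a thousand goroutines all run to completion. The helper runWithProcs is a hypothetical name; runtime.GOMAXPROCS is the programmatic counterpart of the environment variable and returns the previous setting.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// runWithProcs temporarily sets the number of Ps, starts the given number
// of goroutines, and returns how many of them finished.
func runWithProcs(procs, goroutines int) int64 {
	prev := runtime.GOMAXPROCS(procs)
	defer runtime.GOMAXPROCS(prev) // restore the earlier setting

	var finished int64
	var wg sync.WaitGroup
	for i := 0; i < goroutines; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			atomic.AddInt64(&finished, 1)
		}()
	}
	wg.Wait()
	return finished
}

func main() {
	fmt.Println("logical CPUs:", runtime.NumCPU())
	fmt.Println("current P count:", runtime.GOMAXPROCS(0))
	fmt.Println("finished with 1 P:", runWithProcs(1, 1000))
}
```

Fewer Ps means less parallelism, so the work may take longer, but the number of goroutines that can be created is unaffected.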

What is reasonable

Having covered the limits of G, M, and P, let's return to the key question: "How do we budget the number of goroutines reasonably?"

What counts as "reasonable" has to be defined for a specific scenario, combined with what we have just learned about GMP:

  • M: Limited. The default limit is 10,000 and can be adjusted.
  • G: No limit, but bounded by memory.
  • P: Follows the number of cores on the machine; it can be set larger or smaller and does not affect how many Gs can be created.

The number of goroutines just needs to stay within what M and G can sustain. A few dozen more or fewer makes no real difference, and that can be called "reasonable".

Reality

Real applications cannot be defined so simply. Consider what your goroutines do:

  • If they frequently issue HTTP requests, query MySQL, open files, and so on, then running hundreds of thousands of them in a short period is definitely not reasonable (it may result in "too many open files" errors).
  • A common goroutine leak drives up CPU and memory usage, and how much depends on what the leaked goroutines are running.

In the end, it depends on what is running in the goroutine.
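One common way to keep resource-heavy goroutines in check is to cap how many run at once with a buffered channel used as a semaphore. The sketch below is one possible approach, not something the original article prescribes; runLimited, the limit of 8, and the simulated work are all illustrative assumptions.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runLimited runs work(id) for ids 0..tasks-1 with at most limit
// goroutines active at a time, and returns the peak number observed.
func runLimited(tasks, limit int, work func(id int)) int {
	sem := make(chan struct{}, limit) // one slot per allowed goroutine
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		inFlight int
		peak     int
	)
	for id := 0; id < tasks; id++ {
		sem <- struct{}{} // blocks while limit goroutines are active
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when finished

			mu.Lock()
			inFlight++
			if inFlight > peak {
				peak = inFlight
			}
			mu.Unlock()

			work(id)

			mu.Lock()
			inFlight--
			mu.Unlock()
		}(id)
	}
	wg.Wait()
	return peak
}

func main() {
	peak := runLimited(100, 8, func(id int) {
		time.Sleep(10 * time.Millisecond) // stands in for an HTTP or DB call
	})
	fmt.Println("peak concurrent goroutines:", peak)
}
```

The right limit depends on the resource being protected, such as file descriptors or database connections, which is exactly the "it depends on what is running" point above.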

Summary

In this article, we introduced the basics of goroutines, GMP, and the scheduling model, and explored the following questions:

  • What is the appropriate number of goroutines for a single machine?
  • Too many goroutines will affect GC and scheduling, so how can we budget this number reasonably?

As long as the number of goroutines on a single machine stays below the limits, it can be considered "reasonable".

In practice, it depends on what the goroutines are running. If the work is a "resource monster", even a handful of goroutines can bring the program down.

So if you want to set a "budget", you have to look at what you are running.
