"If you don't believe it, run a test?" Is it a gimmick or real strength?

"If you don't believe it, run a test?" Is it a gimmick or real strength?

1. Background: The Battle of Performance

"If you don't agree, run a test" has become a joke in the mobile phone industry, but to be honest, "running a test" is indeed one of the most important evaluation methods in the field of operating systems. For example, the Linux kernel community often uses the score of the running software to evaluate the value of an optimization patch. There are even media such as phoronix that focus on Linux running. And today I want to say one more thing, making the software run high is a manifestation of strength and is based on a deep understanding of the kernel. The story of this article originated from a daily performance optimization analysis. When we evaluated the automated performance tuning software tuned, we found that it made some minor changes to the parameters related to the Linux kernel scheduler in the server scenario, but these changes greatly improved the performance of the hackbench running software. Isn't it interesting? Let's find out together.

This article covers the following topics, with the focus on the latter two:

Introduction to the relevant background
How hackbench works
The sources of hackbench's performance variation: dual-parameter optimization
Reflections and extensions

2. Introduction to relevant knowledge

2.1 CFS Scheduler

Most threads and processes in Linux (roughly speaking, everything except real-time tasks) are scheduled by the Completely Fair Scheduler (CFS), one of the core components of the kernel. (In Linux, threads and processes differ only slightly, so "process" is used below for both.)

At the heart of CFS is a red-black tree that orders processes by the virtual runtime (vruntime) they have accumulated, and that serves as the basis for picking the next process to run. On top of that, CFS supports priorities, group scheduling (built on the well-known cgroup mechanism), bandwidth throttling, and other features to meet more advanced needs.
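To make the "pick the leftmost task" idea concrete, here is a toy Python model (not kernel code; the kernel uses a red-black tree, while a heap gives the same smallest-vruntime-first behaviour for this sketch): the task with the least virtual runtime always runs next, and its vruntime advances inversely to its weight.

```python
import heapq

def simulate_cfs(tasks, ticks, slice_ns=1_000_000):
    """Toy CFS model. tasks: {name: weight}; higher weight => vruntime
    grows more slowly, so the task gets picked more often."""
    rq = [(0.0, name) for name in tasks]   # runqueue ordered by vruntime
    heapq.heapify(rq)
    runtime = {name: 0 for name in tasks}  # real CPU time consumed, in ns
    for _ in range(ticks):
        vruntime, name = heapq.heappop(rq)     # "leftmost" = smallest vruntime
        runtime[name] += slice_ns              # task runs for one slice
        vruntime += slice_ns / tasks[name]     # weighted vruntime advance
        heapq.heappush(rq, (vruntime, name))
    return runtime

# Two equal-weight tasks plus one double-weight task: over time the
# double-weight task accumulates twice the CPU time of each of the others.
runtime = simulate_cfs({"A": 1, "B": 1, "C": 2}, ticks=400)
```

This illustrates why CFS is called "completely fair": CPU time converges to each task's weighted share, because whichever task has received the least (weighted) service is always chosen next.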

2.2 Hackbench

Hackbench is a stress testing tool for the Linux kernel scheduler. Its main task is to create a specified number of scheduling entity pairs (threads/processes), let them transmit data through sockets/pipes, and finally calculate the time overhead of the entire running process.

2.3 CFS Scheduler Parameters

This article focuses on the following two parameters, which are also important factors affecting the performance of hackbench. System administrators can use the sysctl command to set them.

Minimum granularity time: kernel.sched_min_granularity_ns

Modifying kernel.sched_min_granularity_ns affects the length of the CFS scheduling period. If kernel.sched_min_granularity_ns is set to m, then once the system has more runnable processes than the latency target can accommodate, the scheduling period stretches to nr_running * m, so every process is still guaranteed its minimum slice: the larger m is, the longer the CFS scheduling period becomes.
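That rule can be sketched in a few lines of Python, modelled on __sched_period() in kernel/sched/fair.c of pre-EEVDF kernels (the 24 ms latency target below is illustrative; the real kernel.sched_latency_ns default is scaled by CPU count):

```python
SCHED_LATENCY_NS = 24_000_000  # illustrative kernel.sched_latency_ns

def sched_period(nr_running, min_granularity_ns):
    """Length of one CFS scheduling period, mirroring __sched_period():
    with many runnable tasks the period stretches so that every task
    still gets at least min_granularity_ns of CPU time."""
    if nr_running > SCHED_LATENCY_NS // min_granularity_ns:
        return nr_running * min_granularity_ns
    return SCHED_LATENCY_NS
```

For example, with m = 3 ms, 8 runnable tasks still fit in the 24 ms latency target, but 16 tasks stretch the period to 48 ms; raising m to 10 ms stretches the same 16 tasks to 160 ms.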

As shown in Figure 1, each process may run on the CPU for a different amount of time, but sched_min_granularity_ns guarantees a minimum running time for each process (at equal priority). The larger sched_min_granularity_ns is, the longer each process can run in one stretch.

Figure 1: sched_min_granularity_ns diagram

Wake-up preemption granularity: kernel.sched_wakeup_granularity_ns

kernel.sched_wakeup_granularity_ns keeps newly woken processes from preempting the running process too frequently: the larger it is, the less often a woken process preempts.

As shown in Figure 2, three processes, process-{1,2,3}, are woken up. Process-3's virtual runtime is greater than that of curr (the process currently on the CPU), so it cannot preempt. Process-2's virtual runtime is smaller than curr's, but the difference is less than sched_wakeup_granularity_ns, so it cannot preempt either. Only process-1, whose virtual runtime lags curr's by more than the granularity, can preempt curr. Thus the smaller sched_wakeup_granularity_ns is, the faster a process can respond after being woken (the shorter its wait).
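The decision in Figure 2 can be sketched as follows, modelled on wakeup_preempt_entity() in kernel/sched/fair.c (the kernel additionally scales the granularity by the woken task's weight, omitted here for clarity):

```python
def should_preempt(curr_vruntime, wakee_vruntime, wakeup_gran_ns):
    """Decide whether a freshly woken task may preempt curr."""
    vdiff = curr_vruntime - wakee_vruntime
    if vdiff <= 0:
        return False   # wakee not behind curr at all: no preemption (process-3)
    if vdiff > wakeup_gran_ns:
        return True    # far enough behind to justify preemption (process-1)
    return False       # behind, but within the granularity (process-2)

gran = 4_000_000  # illustrative 4 ms wakeup granularity
```

With curr at 10 ms of virtual runtime: a wakee at 2 ms preempts (8 ms gap > 4 ms), one at 8 ms does not (2 ms gap), and one at 12 ms does not (ahead of curr).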

Figure 2: sched_wakeup_granularity_ns diagram

3. Introduction to hackbench working mode

Hackbench has two working modes, process mode and thread mode; the only real difference is whether the test entities are created as processes or as threads. The description below uses thread mode.

Hackbench creates an even number of threads, divided into two roles: senders and receivers.
The threads are divided into n groups, each containing m sender/receiver pairs.
Each sender's task is to send loops packets of datasize bytes, in turn, to every receiver in its group.
Receivers do nothing but receive packets.
Senders and receivers in the same group communicate via either pipes or local sockets (a single test uses only one of the two). Threads in different groups do not interact.
From this model we can see that threads/processes within the same group are mainly I/O-bound, while across groups the load is mainly CPU-bound.
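A single sender/receiver pair of this model can be sketched in Python (illustrative only, not hackbench itself; names like DATASIZE and LOOPS mirror hackbench's parameters):

```python
import socket
import threading

DATASIZE, LOOPS = 100, 1000  # packet size in bytes, packets per sender

def sender(sock):
    """Push LOOPS packets of DATASIZE bytes; blocks when the buffer is full."""
    payload = b"x" * DATASIZE
    for _ in range(LOOPS):
        sock.sendall(payload)
    sock.close()  # signals EOF to the receiver

def receiver(sock):
    """Drain the socket until EOF; blocks when the buffer is empty."""
    received = 0
    while True:
        chunk = sock.recv(DATASIZE)
        if not chunk:
            break
        received += len(chunk)
    return received

s_sock, r_sock = socket.socketpair()  # local socket, as in hackbench's socket mode
t = threading.Thread(target=sender, args=(s_sock,))
t.start()
total = receiver(r_sock)
t.join()
```

The blocking sendall/recv calls are exactly where the "active context switches" described below come from: whichever side finds the buffer full or empty sleeps until its peer makes progress.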

Figure 3: Hackbench working mode

Active context switches:

When the buffer is empty, a receiver blocks and voluntarily yields the CPU, going to sleep.
When the buffer has no room left for its data, a sender blocks and voluntarily yields the CPU.
The system therefore sees a large number of "active context switches", but "passive context switches" (preemptions) occur as well; the latter are what the two parameters introduced above influence.

4. Sources of Hackbench Performance Impact

In the hackbench-socket test, tuned modified the two CFS parameters sched_min_granularity_ns and sched_wakeup_granularity_ns, producing significant performance differences. The details are as follows:

Next, we adjust these two scheduling parameters for further in-depth analysis.

5. Dual Parameter Optimization

Note: For simplicity, m will be used to represent kernel.sched_min_granularity_ns and w will be used to represent kernel.sched_wakeup_granularity_ns.

To explore the impact of dual parameters on the scheduler, we choose to fix one parameter at a time, study the impact of the other parameter change on performance, and use system knowledge to explain the principle behind this phenomenon.

5.1 Fixed sched_wakeup_granularity_ns

Figure 4: Fix w and adjust m

In the figure above, w is held fixed and the curves are divided into three regions according to how performance changes with m: region A (1 ms~4 ms), region B (4 ms~17 ms), and region C (17 ms~30 ms). In region A all four curves drop rapidly; in region B they oscillate with large fluctuations; finally, in region C they level off.

From the related knowledge in Section 2, we can know that m affects the running time of the process, which also means that it affects the "passive context switching" of the process.

In region A, preemption is too frequent, and most preemptions are pointless because the peer has no data to write or no buffer space available, producing a large number of redundant "active context switches". Here a larger w gives the sender/receiver more time to write or consume data, reducing the peer's pointless "active context switches".
In region B, as m grows, senders/receivers get enough time to do their work, and enough data can be written to or read from the buffer. A smaller w is then preferable: it raises the preemption probability of the woken process, so the peer can respond to and process data sooner, cutting down "active context switches" in the next round of scheduling.
In region C, m is large enough that there is almost no "passive context switching": a process finishes its work, switches out voluntarily, and waits for the peer. At this point m has very little impact on performance.

5.2 Fixed sched_min_granularity_ns

Figure 5: Fix m and adjust w

In the above figure, we fixed the parameter m and divided it into three areas:

In region A, the same pattern as in Figure 4 appears: curves with larger m are barely affected by w, while those with smaller m improve as w increases.
In region B, processes at medium m (8 ms/12 ms) still experience many "passive context switches"; having already processed a fair amount of data, they want the peer to respond as soon as possible, so a larger w seriously hurts performance at medium m.
In region C, Figures 4 and 5 show the same flattening. Once w is large enough, wake-up preemption almost never happens, so varying w alone has little effect on performance; an excessively large w does, however, still cause problems at medium m (for the same reason as above).

5.3 Performance Trend Overview

Below is a heat-map overview of the experimental data, showing the interplay between m and w at a glance for readers who want to analyze it further. Its three regions differ slightly from those in Figures 4 and 5.

Figure 6: Overview

5.4 Optimal Dual Parameters (for hackbench)

From the analysis in the two sections above, for workloads dominated by "active context switches" such as hackbench, a larger m (for example, 15~20 ms) is a good choice.
In bidirectional pipe/socket communication, the peer's response time affects the process's next round of work. To let the peer respond promptly, a medium w (for example, 6~8 ms) yields higher performance.
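As a concrete example, these findings could be applied with sysctl as follows. The values are illustrative starting points taken from the ranges above, not universal recommendations; note also that on kernels 5.13 and later these knobs moved from sysctl to /sys/kernel/debug/sched/.

```shell
# Illustrative values for a hackbench-like workload; units are nanoseconds.
# Validate against your own workload before adopting.
sysctl -w kernel.sched_min_granularity_ns=15000000    # m = 15 ms
sysctl -w kernel.sched_wakeup_granularity_ns=6000000  # w = 6 ms
```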

6. Reflections and Extensions

In desktop scenarios, applications tend to be interactive, and their quality of service shows up mainly in how quickly they respond to user input, so a smaller sched_wakeup_granularity_ns can be chosen to improve interactivity.
In server scenarios, applications lean toward computation and need longer stretches of CPU time for intensive processing, so a larger sched_min_granularity_ns fits. However, to keep a single process from monopolizing the CPU for too long and to respond to client requests promptly, a medium sched_wakeup_granularity_ns should be chosen.
The mainline Linux kernel's default values of m and w are tuned for the desktop. Anolis OS users should pick kernel parameters according to whether their deployed applications are desktop- or server-like, or use tuned's recommended profiles. Hackbench, as a workload sitting between desktop and server, can also serve as a reference point for configuration.
