MESI protocol, JMM, common thread methods, etc.

This article is reproduced from the WeChat public account "Little Sister Learning Java". Please contact that account for permission to reprint.

Preface

When we are looking for a job, we often see a line like this in the posting: experience with multithreaded, concurrent programming required. Whether you are a junior, intermediate, or senior programmer, and whether the company is large or small, concurrent programming is simply indispensable.

However, many blog posts on the Internet jump straight into JUC without starting from the basics, so this article explains the foundations of concurrency, mainly computer fundamentals, common thread methods, and the Java Memory Model, to pave the way for the learning that follows. Without further ado, let's get started.

Cache consistency - MESI protocol

CPU multi-level cache: the official concept

Guided by Moore's Law, CPUs have been developing at a rate that roughly doubles performance every 18 months. Memory and hard disks, however, have improved far more slowly, so the concept of a cache was introduced. As the figure below shows, a cache sits between the CPU and main memory to speed up their interaction.

As CPUs got faster and expectations for computer performance grew, a single cache could no longer keep up, so multi-level caches were introduced: the level 1, level 2, and level 3 caches, as shown in the figure.

Level 1 cache: it is built into the CPU and runs at the same speed as the CPU, which effectively improves the CPU's efficiency. In principle, the more the better, but the CPU's internal structure limits its size, so the L1 cache holds relatively little data.

Level 2 cache: its main job is to bridge the gap between the L1 cache and memory. The CPU uses the L1 cache first; as CPU speeds kept increasing, the L1 cache was no longer enough, so the L2 cache was added.

Level 3 cache: its relationship to the L1 and L2 caches is similar; it is designed for when the L2 cache is no longer enough. With an L3 cache, only about 5% of data requests need to go all the way to memory, which greatly improves efficiency and lets the CPU run at high speed.

We can take a look at the cache status of this machine.

CPU multi-level cache in plain language

With only a single level of cache:

We can regard the CPU as ourselves, the cache area as a supermarket, and the main memory as a factory. If we want to buy something (get data), we go to the supermarket (cache area) to buy (get it) first. If the supermarket (cache area) does not have it, we go to the factory (main memory) to buy (get it).

With multiple levels of cache:

We can think of the CPU as ourselves, the L1 cache as the store in the neighborhood downstairs, the L2 cache as an ordinary supermarket, the L3 cache as a large supermarket, and the main memory as a factory. If we want to buy something, we go to the store downstairs (L1 cache) first. If the store (L1 cache) doesn't have it, we go to the ordinary supermarket (L2 cache). If the ordinary supermarket (L2 cache) doesn't have it, we go to the large supermarket (L3 cache). If the large supermarket (L3 cache) doesn't have it, we go directly to the factory (main memory) to get it. The emergence of these caches means that we don't have to go to the factory (main memory) every time to buy things (get data), which saves time and improves speed.

Why do we need CPU cache?

The CPU is so fast that memory cannot keep up. Within a processor cycle, the CPU often has to wait for memory, which wastes resources.

The significance of cache

Temporal locality: if a piece of data is accessed, it is likely to be accessed again in the near future. (If I buy potato chips today, I may well buy potato chips again in the future; after all, I am a foodie O(∩_∩)O)

Spatial locality: if a piece of data is accessed, data adjacent to it is likely to be accessed as well. (In plain language: if I buy potato chips today, I may also buy other puffed snacks, since they sit right next to each other.)
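To make the two kinds of locality concrete, here is a minimal Java sketch I added purely for illustration (the array size is arbitrary): traversing a two-dimensional array row by row reads adjacent memory and benefits from spatial locality, while traversing the same data column by column usually runs noticeably slower.

    // Illustration only: row-major traversal is cache friendly, column-major is not.
    int n = 4096;
    int[][] matrix = new int[n][n];
    long sum = 0;

    long t1 = System.nanoTime();
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            sum += matrix[i][j];        // neighbouring elements: spatial locality helps
        }
    }
    long rowMajorNs = System.nanoTime() - t1;

    long t2 = System.nanoTime();
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < n; i++) {
            sum += matrix[i][j];        // jumps to a different row array each step: many more cache misses
        }
    }
    long colMajorNs = System.nanoTime() - t2;

    System.out.println("row-major: " + rowMajorNs + " ns, column-major: " + colMajorNs + " ns (sum=" + sum + ")");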

The problem

In a multi-core system, the caches of different cores can hold inconsistent copies of the same data.

Solution 1: Bus Lock (Performance is Too Low)

A CPU reads data from main memory into its cache and locks that data on the bus; other CPUs can neither read nor write it until the first CPU has finished using the data and releases the lock. For example, I want to buy a packet of spicy strips at the supermarket, but Zhang San also wants one. While I am buying it, the spicy strips are locked and Zhang San cannot touch them at all; my purchase is very slow, so Zhang San gets very anxious.

Solution 2: MESI protocol (key point)

To address the cache-inconsistency problem above, the MESI protocol was proposed to keep shared data consistent across multiple CPU caches. It defines four cache-line states: M (Modified), E (Exclusive), S (Shared), and I (Invalid).

  • M (Modified): the data in this cache line is valid, but it has been modified and is inconsistent with main memory; the data exists only in this cache.
  • E (Exclusive): the data in this cache line is valid and consistent with main memory, and it exists only in this cache.
  • S (Shared): the data in this cache line is valid and consistent with main memory, and copies of it exist in multiple caches.
  • I (Invalid): the data in this cache line is invalid.

Migration between MESI states:

This diagram may look confusing at first glance. Let's go through it carefully and work through the transitions one by one.

The current status is Modified

  • The local core reads the value in its own cache (local read): the data is read from the cache and the state stays Modified (M).
  • The local core writes the value in its own cache (local write): the data is modified in the cache and the state stays Modified (M).
  • Another core reads the value (remote read): the data is first written back to main memory so the other core reads the latest value, and this cache line's state becomes Shared (S).
  • Another core writes the value (remote write): the data is written back to main memory, the other core reads the latest value and modifies it, and this cache line's state becomes Invalid (I).

Current status is Exclusive

  • The local core reads the value in its own cache (local read): the data is read from the cache and the state stays Exclusive (E).
  • The local core writes the value in its own cache (local write): the data is modified in the cache and the state becomes Modified (M).
  • Another core reads the value (remote read): the other core reads the same data, and this cache line's state becomes Shared (S).
  • Another core writes the value (remote write): the other core modifies the data and writes it back, and this cache line's state becomes Invalid (I).

Current status is Shared

  • The local core reads the value in its own cache (local read): the data is read from the cache and the state stays Shared (S).
  • The local core writes the value in its own cache (local write): the data is modified in the cache and the state becomes Modified (M).
  • Another core reads the value (remote read): the other core reads the same data and the state stays Shared (S).
  • Another core writes the value (remote write): the other core modifies the data and writes it back, and this cache line's state becomes Invalid (I).

The current status is Invalid

  • The local core reads the value (local read): if no other cache holds the data, the state becomes Exclusive (E); if other caches hold it, the state becomes Shared (S).
  • The local core writes the value (local write): the data is modified in the cache and the state becomes Modified (M).
  • Another core reads the value (remote read): this has nothing to do with this cache line, which stays Invalid (I).
  • Another core writes the value (remote write): this has nothing to do with this cache line, which stays Invalid (I).
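To tie the four lists above together, here is a small state-machine sketch I added purely as a reading aid. MESI lives in hardware, not in Java, so this is only a toy model: given the current state of a cache line and one of the four events, it returns the next state described above.

    // Toy model of the MESI transitions described above; not a real implementation.
    enum MesiState { MODIFIED, EXCLUSIVE, SHARED, INVALID }
    enum CacheEvent { LOCAL_READ, LOCAL_WRITE, REMOTE_READ, REMOTE_WRITE }

    static MesiState next(MesiState state, CacheEvent event, boolean otherCachesHoldLine) {
        switch (event) {
            case LOCAL_READ:
                if (state == MesiState.INVALID) {
                    // read miss: reload from memory, E if we are the only holder, S otherwise
                    return otherCachesHoldLine ? MesiState.SHARED : MesiState.EXCLUSIVE;
                }
                return state;                       // M, E, S are unchanged by a local read
            case LOCAL_WRITE:
                return MesiState.MODIFIED;          // any local write leaves the line Modified
            case REMOTE_READ:
                // an M or E line is downgraded to Shared (M is written back first); I stays I
                return state == MesiState.INVALID ? MesiState.INVALID : MesiState.SHARED;
            case REMOTE_WRITE:
                // another core takes ownership, so our copy becomes stale
                return MesiState.INVALID;
            default:
                return state;
        }
    }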

The difference between parallelism and concurrency

Concurrency: at any single instant only one instruction executes, but the CPU rotates rapidly among multiple instructions; because each time slice is so short, it creates the illusion that they execute simultaneously.

Parallelism: multiple instructions execute at the same time on multiple processors; they truly run simultaneously, whether viewed at the micro or macro level.

For example, concurrency is when a housewife has to cook, take care of the baby, and clean the room. If she only does each task for one minute and then rotates, from a macro perspective, it will create the illusion of simultaneous execution. Parallelism is when the housewife hires two nannies, one for cooking and one for taking care of the baby, and she is responsible for cleaning. Whether from a macro or micro perspective, they are all executing at the same time.

A certain big shot once summed up the difference between the two: concurrency is the ability to deal with multiple things at once, while parallelism is the ability to do multiple things at once. As an engineering student I don't have more refined words of praise; I can only shout 666.

The relationship between processes and threads

Processes are used to load instructions, manage memory, and execute statements.

A thread is a part of a process, and a process can be divided into one or more threads.

Opening NetEase Cloud Music starts a process, while playing, searching, commenting, etc. are all threads.

Communication between threads

Communication between threads is relatively simple and can be done through their shared memory. For details, see the Java Memory Model section below.

Communication between processes

Communication between processes is more complicated. On the same machine it is called IPC (inter-process communication); between different machines it goes over the network and follows mutually agreed protocols such as HTTP. This part is closer to the underlying system and networking, so I will not say much about it here.

The state of the thread (from the operating-system level)

New (initial) state: the thread object has been created but start() has not been called, so it is not yet associated with an operating-system thread.

Runnable (ready) state: once the start method is called, the thread enters the runnable (ready) state; it has not yet been given a time slice, and when it actually runs depends on CPU scheduling.

Running state: When the CPU allocates a time slice to a thread, the thread enters the running state.

Blocking state: When a thread calls a blocking API, the thread does not use the CPU and enters a blocking state.

Termination state: When a thread finishes running, it enters the termination state.
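For comparison, Java exposes its own view of a thread's state through Thread.getState(). Note that java.lang.Thread.State collapses the ready and running states above into a single RUNNABLE, so the mapping is not one to one. A small sketch I added:

    Thread t = new Thread(new Runnable() {
        @Override
        public void run() {
            try {
                Thread.sleep(1000L);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    });
    System.out.println(t.getState());   // NEW: created, start() not yet called
    t.start();
    System.out.println(t.getState());   // typically RUNNABLE (ready or running)
    Thread.sleep(100L);
    System.out.println(t.getState());   // TIMED_WAITING: sitting inside Thread.sleep
    t.join();
    System.out.println(t.getState());   // TERMINATED: run() has finished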

Some common thread operations

Three ways to create threads

Thread and task merging

    Thread thread = new Thread() {
        @Override
        public void run() {
            System.out.println("Start");
        }
    };

Separation of threads and tasks

    Runnable runnable = new Runnable() {
        @Override
        public void run() {
            System.out.println("Start");
        }
    };
    Thread thread = new Thread(runnable);

FutureTask returns the execution result

    FutureTask<String> futureTask = new FutureTask<String>(new Callable<String>() {
        @Override
        public String call() throws Exception {
            return "The thread's return value";
        }
    });
    Thread thread = new Thread(futureTask);
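FutureTask and Callable come from java.util.concurrent. To actually obtain the result, start the thread and then call futureTask.get(), which blocks until call() has returned. A minimal usage sketch:

    thread.start();
    // get() blocks until call() finishes; it declares InterruptedException and ExecutionException
    String result = futureTask.get();
    System.out.println(result);          // prints: The thread's return value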

Thread start

    thread.start();

Here, start only moves the thread into the ready (runnable) state, not straight into the running state; when it actually runs depends on CPU scheduling.

Waiting for a thread to finish: join

Without join:

    Runnable runnable = new Runnable() {
        @Override
        public void run() {
            System.out.println("thread started");
            try {
                Thread.sleep(4000L);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            System.out.println("thread ended");
        }
    };
    // Create the thread
    Thread thread = new Thread(runnable);
    // Start the thread
    System.out.println("Main thread starts");
    thread.start();
    System.out.println("main thread ends");

Running results:

With join:

    Runnable runnable = new Runnable() {
        @Override
        public void run() {
            System.out.println("thread started");
            try {
                Thread.sleep(4000L);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            System.out.println("thread ended");
        }
    };
    // Create the thread
    Thread thread = new Thread(runnable);
    // Start the thread
    System.out.println("Main thread starts");
    thread.start();
    thread.join();
    System.out.println("main thread ends");

Running results:

In the first case, without join, "Main thread starts" and "main thread ends" are printed first and right next to each other, while "thread started" and "thread ended" come later, because the two threads run independently and do not wait for each other. In the second case, with join, "main thread ends" is printed on the last line, because join makes the main thread wait for the child thread to finish before it continues with the code that follows.

Get thread id, name, priority

    // Create a thread
    Thread thread = new Thread() {
        @Override
        public void run() {
            System.out.println("thread started");
        }
    };
    // Start the thread
    thread.start();
    System.out.println("id: " + thread.getId());
    System.out.println("name: " + thread.getName());
    System.out.println("priority: " + thread.getPriority());

Running results:

Java Memory Model - JMM

Memory Model

It is similar to the multi-level cache: each thread has its own working memory that holds a copy of data from main memory, as shown in the figure below. Suppose main memory holds a variable a=1, and threads A, B, and C each keep a copy of a=1. Thread A adds 1 to it and flushes the result back to main memory, but threads B and C know nothing about this, so a problem appears. How do we solve it? I will explain step by step below; no rush.

8 atomic operations (concepts)

Here are the 8 atomic operations. Take a quick look at them first; each one is illustrated in detail below.

  • read: read the value of a variable from main memory
  • load: put the value read from main memory into the working memory
  • use: read the value from working memory in order to perform a calculation
  • assign: assign the calculated result back to the working memory
  • store: transfer the value in working memory to main memory
  • write: write the value transferred by store into the variable in main memory
  • lock: lock the variable in main memory, marking it as exclusively held by one thread
  • unlock: unlock the variable in main memory; after unlocking, other threads can lock it

8 atomic operations (examples)

Let's draw a picture based on the example above (please forgive me, the drawing is a bit ugly).

1. read: read a=1 from main memory.

2. load: load a=1 from main memory into thread A's working memory.

3. use: read a=1 from thread A's working memory and perform the increment.

4. assign: assign the result a=2 back to thread A's working memory.

5. store: transfer a=2 from thread A's working memory to main memory.

6. write: write a=2 into the variable a in main memory.

7. lock: as in solution 1 for CPU cache inconsistency above, while thread A is operating, the variable a in main memory is locked, and thread B cannot read a at all.

8. unlock: after thread A finishes, the variable a in main memory is unlocked, and thread B can then read a and operate on it.

Note: lock and unlock have a performance problem. The code we write looks like a multi-threaded, concurrent operation, yet underneath everything is still serialized, so true concurrency is not achieved.
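These operations also explain why something as simple as a++ from two threads can lose updates: both threads may read/load the same old value, each use/assign/store/write its own result, and one increment is overwritten. A small sketch I added for illustration (the loop count and number of threads are arbitrary):

    // Two threads each increment the shared field 10,000 times.
    // Because read -> use -> assign -> write is not atomic, the result is usually less than 20,000.
    class Counter {
        int a = 0;
    }
    final Counter counter = new Counter();
    Runnable task = new Runnable() {
        @Override
        public void run() {
            for (int i = 0; i < 10_000; i++) {
                counter.a++;
            }
        }
    };
    Thread t1 = new Thread(task);
    Thread t2 = new Thread(task);
    t1.start();
    t2.start();
    t1.join();        // join declares InterruptedException
    t2.join();
    System.out.println(counter.a);   // often less than 20000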

Visibility Principle

The MESI protocol mentioned above is implemented over the bus. Threads A and B can both obtain the value of a from main memory. After A increments a, the write (step 6, write) goes across the bus. Thread B is constantly sniffing the bus for the variable a that it cares about; as soon as it sees that a has been modified, it marks its own copy of a in working memory as invalid (per the MESI protocol) and re-reads a from main memory. At that moment the new value of a on the bus has not yet reached memory, so there is a brief lock; once a has been written to memory, unlock is performed and thread B can read the new value of a.

Although this process also has lock and unlock operations, the granularity of the lock is reduced.
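This sniffing-and-invalidating mechanism is essentially what Java's volatile keyword builds on to guarantee visibility. Here is a minimal sketch of the visibility problem, added for illustration (the class name VisibilityDemo is made up, and whether the non-volatile version actually hangs depends on the JIT and the platform):

    public class VisibilityDemo {
        // With volatile, the write to 'running' is made visible to the reader thread promptly.
        // Without it, the reader may keep looping on a stale copy in its working memory.
        static volatile boolean running = true;   // try removing volatile to provoke the problem

        public static void main(String[] args) throws InterruptedException {
            Thread reader = new Thread(new Runnable() {
                @Override
                public void run() {
                    while (running) {
                        // busy loop
                    }
                    System.out.println("reader saw running = false");
                }
            });
            reader.start();
            Thread.sleep(1000L);
            running = false;   // the main thread flips the flag after one second
        }
    }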

Risks and advantages of concurrency

Advantages:

  • Speed: multiple requests can be processed simultaneously, the response is faster, and complex operations can be divided into multiple processes and performed simultaneously.
  • Design: Programming is simpler in some cases and may have more options.
  • Resource utilization: The CPU can do other things while waiting for IO.

Risks:

  • Safety: multiple threads sharing data may produce unexpected results.
  • Liveness: when an operation cannot make progress, liveness problems occur, such as deadlock and livelock.
  • Performance: too many threads cause frequent context switching and longer scheduling time, overhead from synchronization mechanisms, and excessive memory consumption.

Conclusion

This article lays the foundation for the concurrency series, covering the hardware-level MESI protocol, the eight atomic operations, the relationship between processes and threads, some basic thread operations, and the basics of the JMM.
