How to use gdb to accurately locate deadlock problems in multithreading

This article is reprinted from the WeChat public account "Program Cat Adult", the author is Program Cat Adult. Please contact Program Cat Adult's public account to reprint this article.

Deadlocks are a common problem in multi-threaded development and a frequent interview topic. This article shows how to debug deadlocks in C++ with gdb and a Python script, and how to detect deadlocks while a program is running.

First, what is a deadlock? Wikipedia defines it as follows:

Deadlock is a term in computer science. When two or more computing units are each waiting for the other to stop running so that they can obtain system resources, and none of them gives up first, the situation is called a deadlock. In a multitasking operating system, the operating system must solve this problem in order to coordinate processes, decide whether they can obtain system resources, and keep the system running.

Wikipedia describes deadlock between processes. Deadlock between threads follows the same principle, so it is not covered separately here.

Four conditions for deadlock

  • No preemption: A resource cannot be forcibly taken away from the process (thread) that holds it; it is released only when the holder has finished using it.
  • Hold and wait: A process (thread) that is blocked waiting for a resource keeps holding the resources it has already obtained.
  • Mutual exclusion: A resource can be allocated to only one process (thread) at a time and cannot be shared.
  • Circular wait: A set of processes (threads) forms a cycle in which each one holds a resource that the next one needs.

A deadlock can occur only if all four conditions hold at the same time. To eliminate deadlock, it is enough to break any one of them.
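As a rough illustration of breaking one condition, the sketch below removes hold and wait: a thread that cannot get its second lock gives the first one back and retries, so it never blocks while holding a lock. LockBoth and RunOppositeOrders are hypothetical names made up for this example, and this is only one possible approach (the standard library's std::lock is a ready-made way to acquire several mutexes without deadlocking).

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical helper: acquire both mutexes without ever blocking while
// holding one of them. If the second lock is busy, release the first and
// retry. On return the caller owns both and must unlock both.
void LockBoth(std::mutex& first, std::mutex& second) {
    assert(&first != &second);  // locking the same std::mutex twice is undefined
    for (;;) {
        first.lock();
        if (second.try_lock()) {
            return;
        }
        first.unlock();             // give up what we hold instead of waiting
        std::this_thread::yield();  // let the other thread make progress
    }
}

// Hypothetical workload: two threads take the same two mutexes in opposite
// orders, which would risk deadlock with plain lock() calls.
int RunOppositeOrders(int iterations) {
    std::mutex a;
    std::mutex b;
    int counter = 0;  // only touched while both mutexes are held
    auto worker = [&counter, iterations](std::mutex& x, std::mutex& y) {
        for (int i = 0; i < iterations; ++i) {
            LockBoth(x, y);
            ++counter;
            y.unlock();
            x.unlock();
        }
    };
    std::thread t1(worker, std::ref(a), std::ref(b));
    std::thread t2(worker, std::ref(b), std::ref(a));
    t1.join();
    t2.join();
    return counter;
}
```

Calling RunOppositeOrders(1000) should finish with every increment counted. The price of this approach is busy retrying under contention, so it suits short critical sections better than long ones.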

How to debug multithreaded deadlock issues

Most multi-threaded deadlocks are caused by threads acquiring locks in inconsistent orders. The following code deadlocks:

    #include <chrono>
    #include <iostream>
    #include <mutex>
    #include <thread>

    using std::cout;

    std::mutex mutex1;
    std::mutex mutex2;
    std::mutex mutex3;

    // A locks mutex1 then mutex2, B locks mutex2 then mutex3, and C locks
    // mutex3 then mutex1. The sleeps give every thread time to take its first
    // lock before any thread asks for its second one.
    void FuncA() {
        std::lock_guard<std::mutex> guard1(mutex1);
        std::this_thread::sleep_for(std::chrono::seconds(1));
        std::lock_guard<std::mutex> guard2(mutex2);
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }

    void FuncB() {
        std::lock_guard<std::mutex> guard2(mutex2);
        std::this_thread::sleep_for(std::chrono::seconds(1));
        std::lock_guard<std::mutex> guard3(mutex3);
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }

    void FuncC() {
        std::lock_guard<std::mutex> guard3(mutex3);
        std::this_thread::sleep_for(std::chrono::seconds(1));
        std::lock_guard<std::mutex> guard1(mutex1);
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }

    int main() {
        std::thread A(FuncA);
        std::thread B(FuncB);
        std::thread C(FuncC);

        std::this_thread::sleep_for(std::chrono::seconds(5));

        if (A.joinable()) {
            A.join();
        }
        if (B.joinable()) {
            B.join();
        }
        if (C.joinable()) {
            C.join();
        }
        cout << "hello\n";  // not reached once the threads have deadlocked
        return 0;
    }

The resulting wait relationships are:

  • Thread A holds mutex1 and requests mutex2. It releases both only after it obtains mutex2, but mutex2 is held by thread B.
  • Thread B holds mutex2 and requests mutex3. It releases both only after it obtains mutex3, but mutex3 is held by thread C.
  • Thread C holds mutex3 and requests mutex1. It releases both only after it obtains mutex1, but mutex1 is held by thread A.

None of the three threads gives way, so the program deadlocks.
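One way to break this particular cycle is to make every thread take its locks in the same global order. The sketch below orders the two mutexes by address. OrderedLockPair and RunOrdered are hypothetical names invented for this illustration, and address ordering is just one possible convention.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical RAII helper: locks two distinct mutexes in ascending address
// order, so all threads agree on one global order and no wait cycle can form.
class OrderedLockPair {
public:
    OrderedLockPair(std::mutex& a, std::mutex& b)
        : first_(Lower(a, b)), second_(Higher(a, b)) {}

private:
    static std::mutex& Lower(std::mutex& a, std::mutex& b) {
        assert(&a != &b);  // the two mutexes must be different objects
        return std::less<std::mutex*>()(&a, &b) ? a : b;
    }
    static std::mutex& Higher(std::mutex& a, std::mutex& b) {
        return std::less<std::mutex*>()(&a, &b) ? b : a;
    }

    // Members are constructed in this order and destroyed in reverse, so the
    // lower address is locked first and unlocked last.
    std::unique_lock<std::mutex> first_;
    std::unique_lock<std::mutex> second_;
};

// Same pairing as FuncA, FuncB and FuncC in the example, but every thread
// goes through OrderedLockPair. Returns how many threads finished.
int RunOrdered() {
    std::mutex mutex1;
    std::mutex mutex2;
    std::mutex mutex3;
    std::atomic<int> finished{0};
    auto work = [&finished](std::mutex& x, std::mutex& y) {
        OrderedLockPair guard(x, y);
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
        ++finished;
    };
    std::thread A(work, std::ref(mutex1), std::ref(mutex2));
    std::thread B(work, std::ref(mutex2), std::ref(mutex3));
    std::thread C(work, std::ref(mutex3), std::ref(mutex1));
    A.join();
    B.join();
    C.join();
    return finished.load();
}
```

Because both locks are requested together, this only helps when a thread knows up front which mutexes it needs. Code that takes its second lock much later still has to follow the same ordering rule by hand.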

The traditional way to debug a multi-threaded deadlock with gdb

(1) attach <pid>: attach to the process in which the deadlock occurred

    (gdb) attach 109
    Attaching to process 109
    [New LWP 110]
    [New LWP 111]
    [New LWP 112]
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
    0x00007fa33f9e8d2d in __GI___pthread_timedjoin_ex (threadid=140339109693184, thread_return=0x0, abstime=0x0,
        block=<optimized out>) at pthread_join_common.c:89
    89      pthread_join_common.c: No such file or directory.

(2) info threads: list all threads in the current process, along with the top stack frame of each

    (gdb) info threads
      Id   Target Id                              Frame
    * 1    Thread 0x7fa33ff10740 (LWP 109) "out" 0x00007fa33f9e8d2d in __GI___pthread_timedjoin_ex (
        threadid=140339109693184, thread_return=0x0, abstime=0x0, block=<optimized out>) at pthread_join_common.c:89
      2    Thread 0x7fa33ec80700 (LWP 110) "out" __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
      3    Thread 0x7fa33e470700 (LWP 111) "out" __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
      4    Thread 0x7fa33dc60700 (LWP 112) "out" __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135

Threads 2, 3, and 4 are all in __lll_lock_wait. That already suggests a problem, but it is not conclusive. Run info threads several times and check whether these threads change; if they stay in the same state every time, a deadlock has almost certainly occurred.

(3) thread <id>: switch to a specific thread

    (gdb) thread 2
    [Switching to thread 2 (Thread 0x7fa33ec80700 (LWP 110))]
    #0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
    135     ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.

(4) bt: show the stack of the current thread

    (gdb) bt
    #0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
    #1  0x00007fa33f9ea023 in __GI___pthread_mutex_lock (mutex=0x7fa340204180 <mutex2>) at ../nptl/pthread_mutex_lock.c:78
    #2  0x00007fa340000fff in __gthread_mutex_lock(pthread_mutex_t*) ()
    #3  0x00007fa3400015b2 in std::mutex::lock() ()
    #4  0x00007fa3400016d8 in std::lock_guard<std::mutex>::lock_guard(std::mutex&) ()
    #5  0x00007fa34000109b in FuncA() ()
    #6  0x00007fa340001c07 in void std::__invoke_impl<void, void (*)()>(std::__invoke_other, void (*&&)()) ()

At this point the basic debugging is done. Frame #1 shows that this thread is blocked on mutex2. For a pthread_mutex_t, gdb can print which thread holds it, so repeating steps 3 and 4 for each thread tells you which threads and locks are involved in the deadlock. For std::mutex, gdb cannot print the mutex details directly, so you cannot immediately see which thread holds it; you have to switch into the threads one by one and read their stacks.
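To make "which thread holds it" more concrete, the sketch below reads the owner field of the pthread_mutex_t that sits underneath a std::mutex. It assumes Linux with glibc and libstdc++, where std::mutex::native_handle() returns a pthread_mutex_t* and the internal field __data.__owner stores the LWP id of the holder. That field is an implementation detail rather than a public API, and the function names are hypothetical. In a gdb session the same field can usually be read by printing the pthread_mutex_t at the address shown in frame #1, provided glibc debug information is available.

```cpp
#include <cassert>
#include <mutex>
#include <pthread.h>
#include <sys/syscall.h>
#include <unistd.h>

// Kernel thread id of the caller. This is the "LWP" number that gdb prints.
long CurrentLwp() {
    return syscall(SYS_gettid);
}

// ASSUMPTION: Linux + glibc + libstdc++. Reads glibc's internal owner field,
// which holds the LWP of the holding thread, or 0 when the mutex is free.
// Not portable and not a public API; for inspection only.
long OwnerLwp(std::mutex& m) {
    return m.native_handle()->__data.__owner;
}

// Small demonstration on a single thread.
void DemoOwnerField() {
    std::mutex m;
    assert(OwnerLwp(m) == 0);             // nobody holds it yet
    m.lock();
    assert(OwnerLwp(m) == CurrentLwp());  // we are recorded as the owner
    m.unlock();
    assert(OwnerLwp(m) == 0);             // cleared again on unlock
}
```

Matching the owner LWP against the LWP column of info threads gives one "thread X waits for a lock held by thread Y" edge per blocked thread.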

Is there a better way to locate deadlocks on std::mutex in C++11?

Yes.

Treat it as a fifth step and continue:

(5) source: load the deadlock.py script

    (gdb) source -v deadlock.py
    Type "deadlock" to detect deadlocks.

(6) deadlock: run the deadlock detection

    (gdb) deadlock
    [Switching to thread 3 (Thread 0x7f5585670700 (LWP 123))]
    #0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
    135     in ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S
    [Switching to thread 4 (Thread 0x7f5584e60700 (LWP 124))]
    #0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
    135     in ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S
    [Switching to thread 2 (Thread 0x7f5585e80700 (LWP 122))]
    #0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
    135     in ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S
    #1  0x00007f5586bea023 in __GI___pthread_mutex_lock (mutex=0x7f5587404180 <mutex2>) at ../nptl/pthread_mutex_lock.c:78
    [Switching to thread 3 (Thread 0x7f5585670700 (LWP 123))]
    #0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
    #1  0x00007f5586bea023 in __GI___pthread_mutex_lock (mutex=0x7f55874041c0 <mutex3>) at ../nptl/pthread_mutex_lock.c:78
    [Switching to thread 4 (Thread 0x7f5584e60700 (LWP 124))]
    #0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
    #1  0x00007f5586bea023 in __GI___pthread_mutex_lock (mutex=0x7f5587404140 <mutex1>) at ../nptl/pthread_mutex_lock.c:78
    Found deadlock!
    Thread 2 (LWP 122) is waiting on pthread_mutex_t (0x00007f5587404180) held by Thread 3 (LWP 123)
    Thread 3 (LWP 123) is waiting on pthread_mutex_t (0x00007f55874041c0) held by Thread 4 (LWP 124)
    Thread 4 (LWP 124) is waiting on pthread_mutex_t (0x00007f5587404140) held by Thread 2 (LWP 122)

The script detected the deadlock and reported which threads are involved. The output shows clearly that the threads and locks form a cycle, which is what causes the deadlock. Once you know which threads are in the cycle, inspect their stacks to see where each lock is being requested.

How the deadlock detection script works:

Take the example above:

  • Thread A holds mutex1 and requests mutex2. It releases both only after it obtains mutex2, but mutex2 is held by thread B.
  • Thread B holds mutex2 and requests mutex3. It releases both only after it obtains mutex3, but mutex3 is held by thread C.
  • Thread C holds mutex3 and requests mutex1. It releases both only after it obtains mutex1, but mutex1 is held by thread A.

The three threads form a cycle, and deadlock detection amounts to checking whether the wait relationships between threads contain a cycle. Finding some cycle is relatively easy. What matters here is finding a simple cycle, because a cycle found naively may be a large one rather than the one with the fewest vertices, and the more vertices a reported cycle has, the more work it takes to locate the problem. The script therefore looks for simple cycles, using a strongly connected components algorithm and a simple-cycle algorithm. These are fairly involved and are not covered in detail here.

The script comes from Facebook's folly library (Google's abseil and Facebook's folly are both worth a look). The code is too long to list in this article; if you need it, you can download it yourself.
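The cycle check itself can be sketched in a few lines. The code below is not the folly script's algorithm (which, as described, works with strongly connected components and simple cycles); it is a simplified version that relies on an assumption: each blocked thread waits on exactly one mutex and each mutex has at most one owner, so every thread has at most one outgoing "waits for" edge. The type and function names are hypothetical, and the data in CheckArticleExample is copied from the script output shown earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

using ThreadId = int;
using MutexAddr = std::uint64_t;

// waits_on: blocked thread -> mutex it is waiting for.
// held_by:  mutex -> thread that currently holds it.
// Returns the threads of one wait cycle in wait order, or an empty vector.
std::vector<ThreadId> FindWaitCycle(
    const std::map<ThreadId, MutexAddr>& waits_on,
    const std::map<MutexAddr, ThreadId>& held_by) {
    std::set<ThreadId> done;  // threads already walked without finding a cycle
    for (const auto& entry : waits_on) {
        std::vector<ThreadId> path;
        std::map<ThreadId, std::size_t> position;  // thread -> index in path
        ThreadId current = entry.first;
        while (done.count(current) == 0) {
            auto seen = position.find(current);
            if (seen != position.end()) {
                // We came back to a thread on the current path: a cycle.
                auto start = path.begin() + static_cast<std::ptrdiff_t>(seen->second);
                return std::vector<ThreadId>(start, path.end());
            }
            position[current] = path.size();
            path.push_back(current);
            auto wait = waits_on.find(current);
            if (wait == waits_on.end()) break;   // this thread is not blocked
            auto owner = held_by.find(wait->second);
            if (owner == held_by.end()) break;   // owner of the mutex unknown
            current = owner->second;
        }
        done.insert(path.begin(), path.end());
    }
    return {};
}

// The situation reported by the script: threads 2, 3 and 4 wait in a ring.
void CheckArticleExample() {
    std::map<ThreadId, MutexAddr> waits_on = {
        {2, 0x7f5587404180}, {3, 0x7f55874041c0}, {4, 0x7f5587404140}};
    std::map<MutexAddr, ThreadId> held_by = {
        {0x7f5587404180, 3}, {0x7f55874041c0, 4}, {0x7f5587404140, 2}};
    assert((FindWaitCycle(waits_on, held_by) == std::vector<ThreadId>{2, 3, 4}));
}
```

With one outgoing edge per thread, any cycle found this way is already a simple cycle. Lock types that can have several owners at once, such as reader-writer locks, would need a more general graph algorithm.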

How to detect deadlock in your code

The principle is the same as above. While threads acquire locks, the program maintains a graph that records the wait relationships between threads, for example:

A->B, B->C, C->A
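A minimal sketch of this idea follows, under stated assumptions: CheckedMutex is a hypothetical wrapper written for this illustration, not something from folly or the standard library. It records who owns and who waits for each lock in a shared table, and throws instead of blocking when a lock request would close a cycle. A real implementation would also need to consider try_lock, condition variables, and the cost of the shared table.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <stdexcept>
#include <thread>

// Hypothetical mutex wrapper that maintains a wait-for graph at runtime.
class CheckedMutex {
public:
    void lock() {
        const std::thread::id self = std::this_thread::get_id();
        {
            std::lock_guard<std::mutex> g(state().guard);
            if (ClosesCycle(self)) {
                throw std::runtime_error("deadlock: lock would close a wait cycle");
            }
            state().waiting[self] = this;  // publish the edge before blocking
        }
        inner_.lock();  // may block; the wait edge stays visible meanwhile
        std::lock_guard<std::mutex> g(state().guard);
        state().waiting.erase(self);
        state().owner[this] = self;
    }

    void unlock() {
        {
            std::lock_guard<std::mutex> g(state().guard);
            state().owner.erase(this);
        }
        inner_.unlock();
    }

private:
    struct State {
        std::mutex guard;  // protects both maps
        std::map<std::thread::id, CheckedMutex*> waiting;
        std::map<CheckedMutex*, std::thread::id> owner;
    };
    static State& state() {
        static State s;
        return s;
    }

    // Caller holds state().guard. Follows: wanted mutex -> its owner ->
    // the mutex that owner waits for -> ... until the chain ends or reaches self.
    bool ClosesCycle(std::thread::id self) {
        CheckedMutex* wanted = this;
        for (std::size_t steps = 0; steps <= state().owner.size(); ++steps) {
            auto held = state().owner.find(wanted);
            if (held == state().owner.end()) return false;
            if (held->second == self) return true;
            auto blocked = state().waiting.find(held->second);
            if (blocked == state().waiting.end()) return false;
            wanted = blocked->second;
        }
        return false;
    }

    std::mutex inner_;
};

// Hypothetical demo: two threads take two mutexes in opposite orders.
// Returns how many lock requests were rejected.
int CountDetectedDeadlocks() {
    CheckedMutex a;
    CheckedMutex b;
    std::atomic<bool> worker_has_a{false};
    std::atomic<int> detected{0};
    std::unique_lock<CheckedMutex> hold_b(b);  // this thread holds b
    std::thread worker([&] {
        std::lock_guard<CheckedMutex> hold_a(a);
        worker_has_a = true;
        try {
            std::lock_guard<CheckedMutex> want_b(b);
        } catch (const std::runtime_error&) {
            ++detected;
        }
    });
    while (!worker_has_a) std::this_thread::yield();
    try {
        std::lock_guard<CheckedMutex> want_a(a);
    } catch (const std::runtime_error&) {
        ++detected;
    }
    hold_b.unlock();
    worker.join();
    return detected.load();
}
```

In the demo, whichever thread asks for its second lock last is the one that closes the cycle and receives the exception. It then releases what it holds, which lets the other thread continue.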
