April 23, 2007
Multi-threaded Debugging Techniques
Putting It All Together
Let's stop for a minute and take a look at applying the previously discussed principles to a simplified real-world example. Assume that you are writing a data acquisition application. Your design calls for a producer thread that samples data from a device every second and stores the reading in a global variable for subsequent processing. A consumer thread periodically runs and processes the data from the producer. In order to prevent data corruption, the global variable shared by the producer and consumer is protected with a Critical Section. An example of a simple implementation of the producer and consumer threads is shown in Listing 1.6. Note that error handling is omitted for readability.
 1  static int m_global = 0;
 2  static CRITICAL_SECTION hLock; // protect m_global
 3
 4  // Simple simulation of data acquisition
 5  void sample_data()
 6  {
 7      EnterCriticalSection(&hLock);
 8      m_global = rand();
 9      LeaveCriticalSection(&hLock);
10  }
11
12  // This function is an example
13  // of what can be done to data
14  // after collection
15  // In this case, you update the display
16  // in real time
17  void process_data()
18  {
19      EnterCriticalSection(&hLock);
20      printf("m_global = 0x%x\n", m_global);
21      LeaveCriticalSection(&hLock);
22  }
23
24  // Producer thread to simulate real time
25  // data acquisition. Collect 30 s
26  // worth of data
27  unsigned __stdcall Thread1(void *)
28  {
29      int count = 0;
30      SetThreadName(-1, "Producer");
31      while (1)
32      {
33          // update the data
34          sample_data();
35
36          Sleep(1000);
37          count++;
38          if (count > 30)
39              break;
40      }
41      return 0;
42  }
43
44  // Consumer thread
45  // Collect data when scheduled and
46  // process it. Read 30 s worth of data
47  unsigned __stdcall Thread2(void *)
48  {
49      int count = 0;
50      SetThreadName(-1, "Consumer");
51      while (1)
52      {
53          process_data();
54
55          Sleep(1000);
56          count++;
57          if (count > 30)
58              break;
59      }
60      return 0;
61  }
Listing 1.6: Simple Data Acquisition Device
The producer samples data on line 34 and the consumer processes the data on line 53. Given this relatively simple situation, it is easy to verify that the program is correct and free of race conditions and deadlocks. Now assume that the programmer wants to take advantage of an error detection mechanism on the data acquisition device that indicates to the user that the data sample collected has a problem. The changes made to the producer's sample_data() function are shown in Listing 1.7.
void sample_data()
{
    EnterCriticalSection(&hLock);
    m_global = rand();
    if ((m_global % 0xC5F) == 0)
    {
        // handle error
        return;
    }
    LeaveCriticalSection(&hLock);
}
Listing 1.7: Sampling Data with Error Checking
After making these changes and rebuilding, the application becomes unstable. In most instances, the application runs without any problems. However, in certain circumstances, the application stops printing data. How do you determine what's going on?
The key to isolating the problem is capturing a trace of the sequence of events that occurred prior to the system hanging. This can be done with a custom trace buffer manager or with tracepoints. This example uses the trace buffer implemented in Listing 1.1.
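Listing 1.1 is not reproduced in this section. For orientation only, the following is a minimal sketch of what such a trace buffer might look like, written in portable C++ rather than taken from the book. The names AddTrace, TraceEntry, kTraceSize, g_trace, and g_next are hypothetical. PrintTraceBuffer() is the name used in this article, but its signature and return value here are assumptions.

```cpp
#include <atomic>
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for the trace buffer of Listing 1.1 (not the book's code).
constexpr unsigned kTraceSize = 64;  // number of entries kept
static_assert((kTraceSize & (kTraceSize - 1)) == 0,
              "kTraceSize must be a power of two");

struct TraceEntry {
    const char *who;   // thread label, for example "Producer"
    const char *what;  // event label, for example "write"
    int value;         // data value involved in the event
};

static TraceEntry g_trace[kTraceSize];
static std::atomic<unsigned> g_next{0};  // total number of events recorded

// Record one event. The atomic increment hands each caller its own slot,
// so threads can log without taking a lock.
void AddTrace(const char *who, const char *what, int value)
{
    assert(who != nullptr && what != nullptr);
    unsigned slot = g_next.fetch_add(1) % kTraceSize;
    g_trace[slot] = TraceEntry{who, what, value};
}

// Print the surviving entries, oldest first, and return how many were
// printed. Meant to be called while the program is stopped in the debugger.
unsigned PrintTraceBuffer()
{
    unsigned total = g_next.load();
    unsigned first = (total > kTraceSize) ? total - kTraceSize : 0;
    for (unsigned i = first; i < total; ++i) {
        const TraceEntry &e = g_trace[i % kTraceSize];
        std::printf("%u: %s %s 0x%x\n", i, e.who, e.what,
                    static_cast<unsigned>(e.value));
    }
    return total - first;
}
```

In this sketch, the producer would call something like AddTrace("Producer", "write", m_global) after storing a sample, and the consumer would make a matching call after reading it. The buffer keeps only the most recent kTraceSize events and makes no attempt to handle a writer that is lapped while it is still filling its slot, which is acceptable for a debugging aid but not for production logging.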
Now armed with a logging mechanism, you are ready to run the program until the error case is triggered. Once the application hangs, break into it with the debugger to halt execution and examine the state of the system. At this point, you'll be able to bring up the Threads window to see the state information for each thread, such as that shown in Figure 1.
Figure 1: Examining Thread State Information Using Visual Studio 2005
When you examine the state of the application, you can see that the consumer thread is blocked, waiting for the process_data() call to return. To see what occurred prior to this failure, access the trace buffer. With the application stopped, call the PrintTraceBuffer() method directly from Visual Studio's debugger. The output of this call in this sample run is shown in Figure 2.
Figure 2: Output from Trace Buffer after Error Condition Occurs
Examination of the trace buffer log shows that the producer thread is still making forward progress. However, no data values after the first two make it to the consumer. This, coupled with the fact that the thread state for the consumer thread indicates that the thread is stuck, points to an error where the critical section is not properly released. Upon closer inspection, it appears that the data value in line 7 of the trace buffer log is an error value. This leads you back to your new error-handling code, which handles the error but returns without releasing the critical section. Because a Win32 critical section can be re-entered by the thread that already owns it, the producer keeps running, while the consumer thread blocks indefinitely and is starved. Technically this isn't a deadlock situation, as the producer thread is not waiting on a resource that the consumer thread holds.
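In the Win32 code of Listing 1.7, the smallest fix is to call LeaveCriticalSection(&hLock) before the early return. A more defensive option is to tie the release to scope exit so that no return path can skip it. The following is a sketch of that idea under stated assumptions, not the book's fix: it uses the portable std::mutex and std::lock_guard as stand-ins for the CRITICAL_SECTION, the name g_lock is hypothetical, and the reading is passed in as a parameter (instead of coming from rand()) and a bool is returned so that the error path is easy to exercise.

```cpp
#include <cassert>
#include <mutex>

static int m_global = 0;
static std::mutex g_lock;  // portable stand-in for the CRITICAL_SECTION hLock

// Store a reading and report whether it passed the error check.
// Assumption: 'reading' replaces the rand() call of Listing 1.7.
bool sample_data(int reading)
{
    // The guard unlocks g_lock in its destructor, so the lock is
    // released on every return path, including the early one below.
    std::lock_guard<std::mutex> guard(g_lock);
    m_global = reading;
    if ((m_global % 0xC5F) == 0)
    {
        // handle error
        return false;  // lock is released here as the guard goes out of scope
    }
    return true;
}
```

With this version, a call such as sample_data(0xC5F) takes the error path and returns false, and a later call from any thread can still acquire the lock. If the release were skipped as in Listing 1.7, that later call would block forever, because std::mutex, unlike a Win32 critical section, cannot be re-entered by its owner.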
The complete data acquisition sample application is provided on this book's Web site, www.intel.com/intelpress/mcp.
Multi-threaded Debugging Using GDB
For POSIX threads, debugging is generally accomplished using the GNU Project Debugger (GDB). GDB provides a number of capabilities for debugging threads, including:
Not all GDB implementations support all of the features outlined here. Please refer to your system's manual pages for a complete list of supported features.