October 23, 2009
Performance Analysis Tools for Linux Developers: Part 2
Mark Gray and Julien Carreno
Setting performance profiling and analysis goals
Mark Gray is a software development engineer at Intel working on real-time embedded systems for telephony. Julien Carreno is a software architect and senior software developer specializing in embedded real-time applications on Linux.
In Part 1 of this article, we summarized some of the performance tools available to Linux developers on Intel architecture. In Part 2, we cover a set of standard performance profiling and analysis goals and scenarios that demonstrate which tool or combination of tools to select for each scenario. In some scenarios, the depth of analysis is also a determining factor in selecting the tool required. As the investigation goes deeper, we need to change tools to get more detail and a tighter focus. This is similar to using a microscope with lenses of different magnification: we start at the lowest magnification and gradually increase it as we focus on a specific area.
Methodologies
In any performance analysis or profiling exercise, it is the authors' experience that there are two critical pieces of information that need to be present from the start:
When items 1 and 2 above are clear, you have effectively determined "where you are" and "where you want to be". For the purposes of this article, we focus on scenarios in which the system is not behaving according to specifications rather than measurement on a working system.
From experience, it is critical to apply a structured method at the start of any performance analysis, since any activity with an inappropriate tool can be a complete waste of time. Performance can be broadly affected by issues in three distinct areas: CPU occupancy, memory usage and IO. It is essential to determine first which area your problem comes from, since most tools focus on only one of these three areas when providing detailed data. Hence, the first step is always to use general tools that provide a high-level view of all three areas simultaneously. Once this has been done, the developer can delve deeper into a specific area using tools with an increasing level of detail and, potentially, increasing invasiveness. We advise against assuming which category the problem falls under and skipping the first high-level analysis. Such assumptions have proven counter-productive on numerous occasions.
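As a rough illustration of a first pass across all three areas, the sketch below turns two snapshots of the aggregate "cpu" line from /proc/stat and the text of /proc/meminfo into a handful of percentages, then makes a very coarse guess at where to look next. The function names and the thresholds are assumptions made for this example and are not taken from the article; tools such as top and sar report the same kind of information in far more detail.

```python
def cpu_breakdown(before: str, after: str) -> dict:
    """Share of CPU time per category between two aggregate 'cpu' lines."""
    # Field order of the 'cpu' line in /proc/stat.
    names = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]

    def fields(line):
        # Drop the leading "cpu" label and keep the first eight counters.
        return [int(v) for v in line.split()[1:1 + len(names)]]

    deltas = [b - a for a, b in zip(fields(before), fields(after))]
    total = sum(deltas)
    if total <= 0:
        raise ValueError("snapshots must be taken at different times")
    return {n: 100.0 * d / total for n, d in zip(names, deltas)}


def memory_used_percent(meminfo: str) -> float:
    """Memory in use, excluding buffers and page cache, as a share of MemTotal."""
    values = {}
    for line in meminfo.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            values[key.strip()] = int(rest.split()[0])  # values are in kB
    used = (values["MemTotal"] - values["MemFree"]
            - values.get("Buffers", 0) - values.get("Cached", 0))
    return 100.0 * used / values["MemTotal"]


def first_suspect(cpu: dict, mem_used: float) -> str:
    """Coarse triage. The thresholds are illustrative assumptions only."""
    if cpu["iowait"] > 20.0:
        return "io"
    if cpu["user"] + cpu["nice"] + cpu["system"] > 80.0:
        return "cpu"
    if mem_used > 90.0:
        return "memory"
    return "unclear"
```

On a live Linux system the inputs would come from reading the first line of /proc/stat twice, a few seconds apart, and the whole of /proc/meminfo. The result is only a pointer to the area that deserves a closer look with a dedicated tool.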
When doing performance analysis on a working system to understand what makes it tick, avoid overkill. For example, if only a simple CPU performance measurement of a working system is required, it may be sufficient to use a non-invasive high-level analysis tool such as ps. The depth of analysis should be agreed a priori by all interested parties.
Start at the 10,000 ft View
As stated earlier, the starting point of any analysis should be a set of system-level measurements meant to provide an indication of the system state, most notably:
For our purposes here, we assume that the analysis is looking for a single problem area at a time, and that identifying this area is the goal of the first step. Scenarios covering a system with several simultaneous problems, for example both CPU occupancy and memory usage problems, are not covered here.
Figure 1: top View (Fully-Loaded Single Core System)
Figure 2: top View (Half-Loaded Dual-Core System)
Figure 3: sar System-Wide Increased Memory Usage View
Figure 4: sar IO Wait CPU Usage View
Figure 5: ps View (Loaded System)
Figure 6: iostat View (Loaded System)
Using some of the examples above, and having applied our methodology of performing a high-level analysis that includes CPU, IO and memory performance in all the scenarios below, we can see in Figure 1 that our CPU usage is approximately 90%. Our main problem here is CPU occupancy, as the vast majority of cycles are being spent in user space. Our next step should be to examine more closely the applications running on the system. Using ps, in Figure 5, we can see that a number of applications are running concurrently on the system and that our VoIPapp is by far the biggest CPU user. We should examine our VoIPapp in more detail; see "CPU Bottlenecks".
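The step from "the CPU is busy" to "which application is responsible" can be sketched as a small parser that ranks processes by CPU share. It assumes input with a header row and PID, %CPU and COMMAND columns, such as the output of `ps -eo pid,pcpu,comm`. The function name and the sample listing are invented for illustration; the figures in it are not measurements from the article.

```python
def top_cpu_processes(ps_output: str, count: int = 3) -> list:
    """Return (command, pid, cpu_percent) tuples, busiest first."""
    rows = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)  # command names may contain spaces
        if len(parts) != 3:
            continue
        pid, pcpu, command = parts
        rows.append((command.strip(), int(pid), float(pcpu)))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:count]


# Made-up listing in the assumed column layout, for illustration only.
SAMPLE = """\
  PID %CPU COMMAND
    1  0.0 init
  812  1.5 sshd
 1430 88.2 VoIPapp
 1502  3.1 logger
"""

if __name__ == "__main__":
    for command, pid, cpu in top_cpu_processes(SAMPLE):
        print(f"{command:<10} pid={pid:<6} cpu={cpu:.1f}%")
```

Note that ps reports CPU time used over the lifetime of each process, so a short-lived spike may be easier to spot in top.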
In Figure 2, we can see that our overall CPU occupancy is just under 50%; however, we are using 99% of one core and virtually nothing of the second available core. We should examine our threading model; see "Optimizing a Complete System" and "CPU Bottlenecks".
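The pattern described here, a moderate system-wide average hiding one saturated core, is easy to miss when only the overall figure is checked. A minimal sketch of such a check follows, assuming the per-core busy percentages have already been collected (for example from the per-CPU view in top). The function names and thresholds are assumptions chosen for this example.

```python
def saturated_cores(per_core_busy, threshold=90.0):
    """Indexes of cores whose busy percentage is at or above the threshold."""
    return [i for i, busy in enumerate(per_core_busy) if busy >= threshold]


def looks_serialized(per_core_busy, threshold=90.0, idle_below=10.0):
    """True when at least one core is saturated while another is almost idle."""
    busy = saturated_cores(per_core_busy, threshold)
    idle = [b for b in per_core_busy if b < idle_below]
    return bool(busy) and bool(idle)


if __name__ == "__main__":
    cores = [99.0, 0.7]  # illustrative values for a dual-core system
    print("average busy: %.1f%%" % (sum(cores) / len(cores)))
    print("saturated cores:", saturated_cores(cores))
    print("work looks serialized:", looks_serialized(cores))
```

A positive result suggests that the work is concentrated in a single thread or process, which is why the threading model is the next thing to examine.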
Comparing Part 1, Figure 7 with Figure 3, we can see that our memory usage is increasing over time. Further measurements may indicate that we have a memory leak that is affecting system behaviour; see "Investigating a Memory Issue". From Figure 4, we can see that the CPU is spending an inordinate amount of processing time waiting on IO. We should investigate the reason for the high number of IO waits; see "IO Bottleneck Issue". Optionally, we can use iostat to assess the loading of the block devices in the system and quickly determine whether they are a factor in the bottleneck. For instance, in Figure 6, it is apparent that during the file copy the bottleneck is the block device, which is highly loaded.
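Whether memory usage is "increasing over time" can be checked on a series of periodic readings rather than by eye. The sketch below fits a least-squares line through equally spaced samples, for example the memory-used figures reported by sar, and reports the growth per sample. The function names and the rule used to flag steady growth are assumptions for illustration. An upward trend is only a hint of a leak, not proof of one.

```python
def growth_per_sample(samples):
    """Least-squares slope of the samples against their index."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2.0
    mean_y = sum(samples) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def steadily_growing(samples, min_slope=0.0):
    """True if the trend is upward and usage never falls below its first value."""
    return growth_per_sample(samples) > min_slope and min(samples) >= samples[0]
```

Usage that rises and then falls back, as happens when caches are filled and later released, gives a slope near zero or fails the second condition, so it is not flagged.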