Performance Analysis Tools for Linux Developers: Part 1

Tools
  • Print
  • Email
Performance analysis and profiling for Intel Processor Architectures

Mark Gray is a software development engineer working at Intel on Real-Time embedded systems for Telephony. Julien Carreno is a software architect and senior software developer at specializing in embedded Real-time applications on Linux.


With the advent of the Intel Atom processor and multicore processors, Intel architecture processors are proliferating in a number of new market segments, most notably embedded systems where good performance is essential. In parallel with this trend, Linux is becoming an established operating system option for embedded designs. The two trends combined pose an interesting problem statement: "How to get the most out of my embedded application running on an Intel platform and a general purpose operating system?" During all kinds of application development, there comes a time when a certain level of performance analysis and profiling is required, either to fix an issue or to improve on current performance. Whether it is memory usage and leaks, CPU usage, or optimal cache usage, analysis and profiling would be almost impossible without the right tool set. This article seeks to help developers understand the more common tools available and select the most appropriate tools for their specific performance analysis needs.

In Part 1 of this article, we summarize some of the performance tools available to Linux developers on Intel architecture. In Part 2 we cover a set of standard performance profiling and analysis goals and scenarios that demonstrate what tool or combination of tools to select for each scenario. In some scenarios, the depth of analysis is also a determining factor in selecting the tool required. With increasingly deeper levels of investigation, we need to change tools to get the increased level of detail and focus from them. This is similar to using a microscope with different magnification lenses. We start from the smallest magnification and gradually increase magnification as we focus on a specific area.

top and ps

The top and ps commands are freely available on all Linux distributions and are generally installed by default. The ps command provides an instantaneous snapshot of system activity on a per-thread basis, whereas the top command provides mostly the same information as ps updated at defined intervals, which can be as small as hundredths of a second. They are frequently overlooked as tools for understanding process performance at a system level. For example, most users tend to use the ps -ef command only to check which processes are currently executing. However, ps can also print useful information such as resident set size or number of page faults for a process. A thorough examination of the ps man pages reveals these options. Likewise, top can also display all this information in various formats while updating it in real-time. The top command window also displays summary information at the top of the window on a per-CPU basis.

In Figure 1, top is showing information for all threads of a process on a multicore machine. Using this more detailed view, we can see total activity on CPU, all threads of the process "app" and on which CPU each thread is scheduled at that instance (P). You can also see memory usage for the process including resident set size (RES) and total virtual memory use (VIRT).

Figure 1: top View (Idle System)

In Figure 2, we can see similar information using ps. We can see the CPU usage on a per-thread basis with 1/10 % accuracy. This is the cumulative CPU percentage since the spawning of the thread.

Figure 2: ps View (Idle System)

As can be seen, top and ps provide a good general overview of system performance and the performance of each process running on the system.

PAGE 1 | 2 | 3 | 4 Next Page »

Real World Parallelism Webinar Series
  • November 17, 2009
    Visual Effects for Animation - presented by DreamWorks Animation
    Speaker: Ron Henderson (Bio)

    Ron Henderson manages the FX Tools group at DreamWorks Animation, where he is responsible for developing physical simulation and procedural modeling tools. These systems have been used for key visual effects in recent films such as Kung Fu Panda and Monsters vs. Aliens (March 2009).

    Prior to joining DreamWorks in 2002 he was a senior scientist at Caltech with a joint appointment to the Applied Math and Aeronautics departments, where he worked on efficient techniques for the direct numerical simulation of fluid turbulence.

    Abstract:
    In this webinar, Ron Henderson will show examples of visual effects, from hair and feathers to smoke and fire, from a variety of DreamWorks Animation feature films. He will discuss in general terms the kinds of techniques used to achieve particular visual effects. Finally, Henderson will show a detailed breakdown of the dam-breaking scene from Madagascar: Escape 2 Africa, demonstrating how different elements of key frame animation, simulation, and rendering are combined in a real production shot.

  • December 1, 2009
    A Quick and Easy Way to Parallelize a Legacy Codebase with Intel® Threading Building Blocks (TBBs)
    Speaker: Bernard Laberge, Avid, Senior Principal Engineer (Bio)

    Bernard Laberge is a senior principal engineer in the video editors division at Avid. During his seven years with the company he has been actively involved in the replacement of the legacy video processing engines used by Avid editors with a common hardware-abstracted, component-based video processing engine currently running on the CPU with SIMD optimized code, GPU, and dedicated hardware.

    Abstract:
    Learn how to overcome the limitations of a thread-based scheduler, including dealing with the absence of recursive parallelism support and the inefficient handling of unbalanced processing load. Bernard Laberge addresses how Avid resolved the expensive refactoring of their thread-based scheduler into a task-based solution by choosing Intel® Threading Building Blocks (TBBs). He explores how Avid was able to easily integrate the Intel TBBs into their video editor applications and more than 5 million lines of code.

  • December 15, 2009
    How to Use Intel® Parallel Studio to Streamline Code Development in a Multicore Environment
    Speaker: Matt Dunbar, Director for Performance Technology, SIMULIA (Bio)

    Matt Dunbar is the director for performance technology at SIMULIA. Since joining the company in 1993, he has worked on parallelization of the Abaqus suite of products, initially for shared memory architectures and more recently for distributed memory architectures. Dunbar has also been intimately involved in selecting both the hardware and software tools used in the development of the Abaqus product line.

    Abstract:
    Resolve elusive, costly multithreading errors quickly and efficiently with Intel® Parallel Studio. While many coding problems that lead to bugs in software applications are typically straightforward logic errors, errors in managing memory and in multithreading code can sometimes take weeks to months to diagnose and fix. Matt Dunbar explores how and why taking advantage of multicore processors through multithreaded code is critical for compute-intensive applications. While spotlighting his work on SIMULIA's Abaqus finite element solver, Dunbar addresses the need for multicore execution and shares his experiences using Intel Parallel Studio to streamline code development in a multicore environment.