Surviving Performance Fire Drills


August 2000 Feature

Acute performance problems are a regular part of life for software developers and their managers. It is easy to fall into the trap of seeing them as natural disasters: You can’t predict them, you can’t prevent them, and you can’t do much more than offer sacrifices to the local volcanoes to get rid of them.

Despite the best preparation, catastrophes will occur with little or no warning. However, there are better responses to them than trekking to the nearest crater. Indeed, techniques that you would never expect to use in a crisis can come in handy. Code inspection is one of them.

Code inspection is often seen as a measure favored by quality assurance engineers or methodology mavens. Many people think of it as part of a large, bureaucratic process in which code passes through many hands before it sees the light of day. Developers, and even managers, often consider it a waste of time imposed on them by some quality process (such as ISO-9000 or the Software Engineering Institute’s Capability Maturity Model) rather than as an effective tool for solving or preventing problems.

In fact, code inspection can be a powerful tool for solving and preventing problems, including software performance nightmares—and there is a reason why I focus on performance as an example. Performance problems often come to our attention as situations that demand immediate response or "fire drills." If I can convince you that code inspection is an effective technique in a product performance crisis, I shouldn’t have much trouble convincing you that it can help in less stressful situations.

Performance problems provide a good example for all kinds of unplanned fire drills. When a performance problem arrives, it arrives with a bang. There are tense phone calls, threats of corporate disaster and all the usual theater. Frequently, there are precious few data to back up an impressionistic problem characterization ("it’s too slow"). There is plenty of pressure to act immediately.

The first order of business is chaos-containment. In managing a fire drill, you may find yourself surrounded by people who are running around like chickens with their heads cut off. If you allow this to spread to those who must fix the problem, you are finished.

The second step is fostering ownership. In many development organizations, software performance belongs to everyone and no one. At best, a separate quality assurance group might run some performance regression tests. The main developers don’t think of themselves as performance experts or as responsible for solving or preventing performance problems. With these two steps in mind, let’s look at how to convert a crisis from a hot potato to a routine concern.

We’re in Trouble

One morning, you arrive at your office to find smoke coming out of your e-mail client: There is a performance problem. Perhaps the new version of the product is mysteriously slower than in the previous release, or it runs at a snail’s pace at a particular customer site or for a particular application. It could be that someone has deployed it on a full production configuration for the first time, and it is not performing adequately. Or maybe seemingly trivial changes in the load or environment have led to a significant drop in performance.

Whatever the problem, it has to be fixed "right this minute." As with many issues that people think have to be fixed "right this minute," maybe it does, and maybe it doesn’t. Perhaps the urgency will evaporate when you ask hard questions. Your first job is to push back as appropriate, but I don’t have to tell you that.

Finger-pointing

How do performance problems happen? While this question may not help you very much in fixing the current mess, it deserves attention after the dust settles. Code can be defect-free, meet its specifications and still yield a performance crisis the day it is deployed, for three main reasons: unclear requirements, code evolution, and buffering and caching effects.

In the first case, it can be hard to predict the configuration and load requirements for a deployed product—assuming, of course, that someone tried to state performance requirements at all. If you don’t see the answers to questions like "How many …" and "How fast …" and "With how many megabytes of memory in the machine …" in the specifications, then no one has thought through the requirements.

Second, seemingly trivial changes to the code to address functional requirements or defects can have alarmingly large effects on the performance of the code. A programmer repairing a defect may not know that the procedure she is adjusting is used in a performance-critical inner loop.

Finally, when code has buffers and caches, very small changes in the size of the input can produce large changes in performance.

Moving Targets

One of the biggest traps in dealing with performance issues is the lack of hard data. Problems are often reported in qualitative, impressionistic terms. And while the customers’ subjective perceptions make the difference between success and failure in the marketplace, you can’t solve a performance problem by chasing subjective targets. You, as the manager, must make sure that the performance problems are measurable and repeatable.

This is easiest when the code can be run as some sort of batch process that you can time. However, even if you can’t do that, you can still impose some rigor on the problem definition. For instance, you can write a script for people to follow in these situations, so that the people complaining and the people trying to fix the problem are observing the same thing.

In short, use the scientific method: define a problem, state a hypothesis, perform an experiment and record the results. A specific example might be:

Problem statement: When I edit a drawing with at least 10,000 elements, group at least 10 and try to rotate them, the program is unresponsive for at least 20 seconds.

Hypothesis: The background pagination process has gone haywire and is consuming all the time.

Experiment: Disable the pagination process and go through the exact script from the problem statement, and see what happens.

Results: Record exactly what happened. A scientist’s most important tool is the lab notebook, and people involved in performance wars tend to forget what they have already tried. It is essential to keep good records of all the data and all the experiments. The notebook doesn’t have to be a paper one; spreadsheets work very well for recording a series of experiments and data points.

Among the challenges in tracking down a performance bottleneck are instrumentation, unrepeatable testing conditions, complex modeling and tunnel vision. For instance, you can insert code (manually or with a tool) that measures time used in different parts of the code. It requires great care, however, to avoid spending so much time in the instrumentation code that it changes the program’s behavior beyond recognition, or, worse yet, beyond usability.
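
As a minimal sketch of what such hand-inserted instrumentation can look like (the section names and the now_ns() helper are illustrative, and the sketch assumes a POSIX monotonic clock is available), the idea is to accumulate elapsed time per section in plain counters and do all the reporting after the run, so each measurement costs only two clock reads and an addition:

    #include <stdio.h>
    #include <time.h>

    /* Accumulate elapsed real time per code section in plain counters.
       No I/O and no formatting happen inside the measured regions, which
       keeps the instrumentation overhead small. */

    enum { SEC_PARSE, SEC_SORT, SEC_OUTPUT, SEC_COUNT };  /* hypothetical sections */

    static long long total_ns[SEC_COUNT];
    static long      calls[SEC_COUNT];

    static long long now_ns(void)                 /* assumes clock_gettime exists */
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
    }

    #define SECTION_BEGIN(id) long long t0_##id = now_ns()
    #define SECTION_END(id)   (total_ns[id] += now_ns() - t0_##id, calls[id]++)

    void report_sections(void)                    /* call once, after the run */
    {
        int i;
        for (i = 0; i < SEC_COUNT; i++)
            printf("section %d: %ld calls, %.3f ms\n", i, calls[i], total_ns[i] / 1e6);
    }

A suspect region is then wrapped as SECTION_BEGIN(SEC_PARSE); ... SECTION_END(SEC_PARSE); and the summary is printed once at the end of the run.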

Also, if the production environment is a $0.5 million server, it can be a little difficult to get an identical one that you can use for experiments. Say you’d like to construct a model of the parts of the system and how they interact in order to predict the performance of the code. It doesn’t take much complexity for a system to become a very difficult modeling problem. Finally, snow blindness can strike developers: Once someone has been staring at the same code for long enough, he stops seeing it.

Grinding Through

In spite of the difficulties, performance problems are solved with a combination of all of the methods mentioned above—with the critical addition of a methodical process of experimentation. Code inspection is not a magic bullet that obviates all of that work. It’s primarily a remedy for snow blindness, both individual and organizational. When a new person reads code for the first time, he carries different preconceptions and assumptions. He asks questions and challenges what he sees. It can be quite amazing to see the light bulb go on over the head of the original developer.

In a sense, you can think of code inspection as a way of generating specific, relevant materials for brainstorming. When a new person reads old code, he nearly always ends up with a laundry list of questions about how it works—or whether it works at all.

Code inspection is also useful at different scales. If you have managed to isolate a problem to a relatively small region of code, but the owner can’t seem to find the smoking gun, code inspection can lead very rapidly to the critical problem.

If you are faced with a large mass of poorly understood code, I don’t know of any alternative to reverse-engineering it to some extent, and, again, code inspection is the only way I know to approach that problem.

For larger bodies of code, a few days of examination can prove quite illuminating. Original authors tend to work on individual problems in isolation. An inspector may notice, for example, that the code is doing the same work in three different places.

Note that these tactics apply to both focused and broad performance problems. If a particular operation in a particular case is too slow, careful reading of the implementation of that operation can yield results very quickly. If the problem is general slowness of an entire program, an examination that attempts to understand what work gets done where can turn up structural problems, such as bottlenecks or redundant computations.

So how do you define the problem, apply the right technical resources and choose management strategies to make a development group more functional in dealing with performance problems and less likely to have them in the first place? Code inspection holds many of the answers.

Defining the Problem

As discussed before, the most important thing you need to do in tackling a performance problem is to make sure that the problem is defined in terms of empirically measurable facts. This can be difficult. On the one hand, the people reporting the problem may present you with mushy, impressionistic complaints. You must convert those complaints into a specific characterization that you can measure. On the other hand, developers tend to respond to performance problems by spinning out hypothetical causes of the problem without any factual basis. I call these, after Rudyard Kipling, "Just So Stories." A Just So Story about a performance problem is a perfectly plausible explanation that usually comes with a proposed technical solution. "The object cache is too small, so we need to make it bigger;" or "the computation of intercept points is using floating point arithmetic, so we need to change it to scaled integers."

The only problem with Just So Stories is that there is no way of telling whether they are correct except to go ahead and implement the proposed solution. If the story happens to be correct, there is a happy ending. Even a blind pig finds an acorn once in a while, as the saying goes. (Of course, this isn’t entirely fair. These ideas are often informed by a developer’s deep understanding of the code. You would be surprised, however, at how often developers’ intuitions are off.) If the story is wrong, you have used up valuable time to no avail.

Just as you have to insist that the original problem be defined in terms of measurable, repeatable phenomena, you should insist that proposed solutions be justified by measurable data. When someone proposes an explanation for a performance problem, the first step is to use some kind of instrumentation to find out if it is really an explanation. In general, only after you have instrumentation should you proceed to coding any proposed fixes.

Of course, you have to use judgment in deciding when to take a flyer. Given a choice between spending three hours trying an unsupported hypothesis and three weeks implementing a measurement scheme to test it, you may want to go ahead with the first option. The problem is that, before you know it, you may have used up three weeks on a whole series of experiments of this kind, and you have neither fixed the problem nor implemented enough instrumentation to find the problem. The cliché about making haste slowly is quite relevant here. Write it in big letters and hang it on your door.

Data Acquisition

The Holy Grail of performance tuning is timing data from the production environment. If you know where the time is going, you know where to focus in fixing the problem. This information can be elusive, but it is hardly ever impossible to get.

The easiest case is when you can use a tool that automatically instruments your code. My favorite for this purpose is Quantify, by Pure Software (which has since been swallowed up by Rational). Quantify modifies your executable image to add code that captures timing data at a line-by-line level. You run the modified executable, then use a GUI to poke around in the call graph and at annotated source listings, looking at where the time went. You can even call upon it to calculate the difference between two different runs as a way of getting a clearer picture of the effect of a change in the code or the input data.

You don’t get something for nothing, however. An instrumented executable runs v-e-r-y s-l-o-w-l-y, so it can’t be put into full production. A full run of an interesting case may take a very long time, to the point of impracticality.

That said, if the program can run a relevant case, even if it takes several hours, the resulting data is a gold mine of real, hard facts: "Fifty percent of the CPU time is being spent formatting date-time strings for the log, even when the log is disabled?" (I kid you not. I was involved in a performance stew with this eventual explanation.)

Quantify is available for Windows and a number of Unix platforms. There are a variety of competing products. I confess that I don’t have experience with them and cannot offer a comparison. One that is ubiquitous on Unix is gprof. This is a much weaker tool that requires you to compile with special options and which attempts to capture function-level CPU time usage. Sometimes it can be very useful, but other times it’s not.

Even with the best of tools, the analysis of the data is rarely completely trivial. Here is an example of why. It is quite common for a function-level profile to report that a significant amount of time is consumed in a common library routine. This does not imply that the library routine is the problem, however. More likely it means that some part of the code has an algorithmic problem such that it is making too many calls to the library routine. It can be hard to separate these issues. In a specific case, it is very common to see a CPU profile that shows a great deal of CPU time in the standard C strcmp function. This does not imply that this simple string comparison function needs to be rewritten. Rather, you have to look at the callers of the function and see why these callers call for so many string comparisons.

Another source of difficulty is real time versus CPU time. If the application’s performance is bounded by some kind of I/O latency, nothing will be using much CPU time.

What do you do if you can’t use an automatic instrumentation tool? You capture your own real-time measurements. Every operating system in common use has a way to read a real-time clock. The trick is to find a way that is as inexpensive as possible. For example, on the RS/6000, you can create a tiny assembly procedure that reads a clock register.
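
On a system with POSIX clocks (an assumption; the cheapest available call varies by platform, hence tricks like the assembly read above), a thin wrapper around clock_gettime does the job, and reading a per-process CPU-time clock alongside it makes the real-time-versus-CPU-time contrast from the previous paragraph concrete. The workload below is a stand-in, chosen so that the two numbers visibly diverge:

    #include <stdio.h>
    #include <time.h>

    /* Time a piece of work with both a real-time (wall-clock) clock and a
       per-process CPU-time clock.  A real time much larger than the CPU time
       points at I/O waits, lock contention or scheduling delays. */

    static double wall_seconds(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);            /* real time */
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    static double cpu_seconds(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts);   /* CPU time used so far */
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    int main(void)
    {
        struct timespec nap = { 0, 200000000L };        /* 200 ms stand-in "I/O wait" */
        volatile double x = 0.0;
        long i;
        double w0 = wall_seconds(), c0 = cpu_seconds();

        for (i = 0; i < 10000000L; i++)                 /* stand-in computation */
            x += i * 0.5;
        nanosleep(&nap, NULL);

        printf("real %.3f s, cpu %.3f s\n", wall_seconds() - w0, cpu_seconds() - c0);
        return 0;
    }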

Once upon a time, real-time measurements were a mixed bag: if you were time-sharing a system, there was no telling when the scheduler would hand you a big real-time pause. These days, you can almost always run your tests on a dedicated system, and in such an environment real times are, by far, preferable to CPU times.

For example, if you are working on a multithreaded server, and you have too many threads (or the wrong thread scheduling policy), your application may suffer from thrashing as it tries to share the CPU among the threads. In this case, CPU measurement shows nothing out of the ordinary, but real-time measurements show suspicious delays.

A set of real-time measurement routines should be part of the utility toolkit of any development group. In the event of a performance problem, a developer should be able to take them off the shelf, apply them to the code and get some numbers.

Once you have numbers, you have to be careful about how you interpret them. In many cases, some basic statistics are required. Never trust one measurement. Take 100, or at least 10. Plot a histogram, or compute a standard deviation. Not all programs display wide variations, but you won’t know whether yours does until you measure.

One thing to look out for is the outlier: the single measurement that is much larger than the norm. If the original complaint is of the form, "Once in a while I click the mouse and it runs forever," then the outlier is the case you are seeking. Tracking these down may require additional instrumentation to correlate the long real times with other conditions in the program.
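
A minimal sketch of that kind of post-processing, assuming the per-run timings have already been collected into an array (the numbers below are made up for illustration):

    #include <stdio.h>
    #include <math.h>

    /* Summarize repeated timing measurements and flag outliers, here defined
       (arbitrarily) as runs more than two standard deviations above the mean. */

    static void summarize(const double *sample, int n)
    {
        double sum = 0.0, sumsq = 0.0, mean, sd;
        int i;

        for (i = 0; i < n; i++) {
            sum   += sample[i];
            sumsq += sample[i] * sample[i];
        }
        mean = sum / n;
        sd   = sqrt(sumsq / n - mean * mean);

        printf("n=%d  mean=%.3f s  stddev=%.3f s\n", n, mean, sd);
        for (i = 0; i < n; i++)
            if (sample[i] > mean + 2.0 * sd)
                printf("  outlier: run %d took %.3f s\n", i, sample[i]);
    }

    int main(void)
    {
        /* Hypothetical data: most runs near 0.2 s, one pathological run. */
        double runs[] = { 0.21, 0.19, 0.20, 0.22, 0.20, 0.21, 0.19, 0.20, 3.70, 0.21 };
        summarize(runs, (int)(sizeof runs / sizeof runs[0]));
        return 0;
    }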

In these cases, you need to augment your real-timing tools with event-logging tools. Once again, the crucial problem is to avoid perturbing the performance by measuring it. A common error is to call a library routine (such as printf) to record an event. These routines are often single-threaded—or just plain time-consuming—and can completely alter the code’s behavior.

A traditional technique is the in-memory log structure. Allocate a large buffer at the beginning of the run. Record the events by writing simple, binary information (numbers, for instance) into the log buffer. After the run is complete, format the log and write it to a file. Don’t do input/output, don’t format strings, and don’t add anything extra during the measurement interval. A log of events with real-time stamps can be a powerful tool for understanding the performance behavior of your code.
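
Here is a minimal sketch of such a log, kept single-threaded for brevity (a multithreaded server would need an atomic index or a buffer per thread); the event fields and the clock call are illustrative assumptions:

    #include <stdio.h>
    #include <time.h>

    /* In-memory event log: during the measurement interval we only write small
       binary records into a preallocated buffer; formatting and file output
       happen after the run is over. */

    struct event {
        long long t_ns;                       /* real-time stamp */
        int       code;                       /* numeric event id, e.g. QUERY_START */
        int       arg;                        /* optional detail, e.g. query size */
    };

    #define LOG_CAPACITY (1 << 20)
    static struct event log_buf[LOG_CAPACITY];
    static size_t       log_next;

    static long long now_ns(void)             /* assumes a POSIX monotonic clock */
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
    }

    void log_event(int code, int arg)         /* cheap: one clock read, three stores */
    {
        if (log_next < LOG_CAPACITY) {
            log_buf[log_next].t_ns = now_ns();
            log_buf[log_next].code = code;
            log_buf[log_next].arg  = arg;
            log_next++;
        }
    }

    void log_dump(const char *path)           /* called once, after the run */
    {
        FILE *f = fopen(path, "w");
        size_t i;

        if (f == NULL)
            return;
        for (i = 0; i < log_next; i++)
            fprintf(f, "%lld %d %d\n", log_buf[i].t_ns, log_buf[i].code, log_buf[i].arg);
        fclose(f);
    }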

Management Strategies

As a development manager, you are faced with helping a group productively apply the techniques described above.

The first thing you have to do is to model the appropriate attitude, which consists of the following truisms:

• If we run around like chickens with our heads cut off, we will end up as chicken salad.

• If we can’t measure it, it doesn’t exist.

• If we don’t engage in a systematic process, we will never accomplish anything.

• We’re here to hunt the problem, not the author of the problem.

• We need to keep records of what we do.

• We need to keep records of what we do.

• We need to keep records of what we do.

Strike a balance between the need to solve the problem quickly and the need to take enough time to solve the problem at all. Getting a group to engage (in the military sense) with a performance problem is a delicate balance between fostering individual expertise in using the tools and techniques described above and promoting an attitude of interest and responsibility in the entire group. I think of this as a circular process. You start with performance being everyone’s and no one’s problem. Then you make it someone’s problem in particular. Finally, you feed that person’s work back into the group to get everyone involved in a productive way.

If you don’t have anyone in the group with experience in measurement and instrumentation, you need to find one. Hiring an expert (at least a full-time one) is almost always impractical, so you have to figure out how to turn someone you already have into an expert.

The most important qualifications for a field promotion to "performance expert" are:

• unfamiliarity with the code;

• tenaciousness;

• ancestors from Missouri (the "Show Me" state);

• organization and record-keeping ability.

Unfamiliarity with the code qualifies the individual as a code inspector. It can be much more effective to have a new person show up and start asking questions about the code in order to design an instrumentation strategy than to have developers add instrumentation. They will tend to neglect those areas that have escaped from their conscious picture of the code.

A performance issue is a great opportunity to get a relatively inexperienced person to step up to the plate and take on more responsibility and a more central role. An important role of the instant expert is to foment, with your assistance, brainstorming. Once you have some data, gather the usual suspects in a room and do the brainstorming thing. Open the floor for suggestions, however strange, and allow no critical commentary. Collect them all, and then gather opinions about which ones to investigate first (with suitable measurement, of course).

The best thing about brainstorming is that even the most stressed-out developers usually relax and enjoy themselves. The whole process begins to look suspiciously like fun, and then your life is much easier.

A brainstorming session is your opportunity to ensure that the process finds the problem and not its author. (You may want to keep track of the author for future reference.) You want the author to see this entire process as a service, not as a cannibal feast in which he or she is the main course. The developer or developers may have been knocking themselves silly for an extended period of time, or been knocked silly by external parties, over performance issues. Giving them a process that allows them to make material progress on these issues should, with a little spin, improve their quality of life.

Solving the Problem Before it Arrives

Well, you’ve heard all this, taken it home and perhaps survived a fire drill with fewer casualties. Now what? I hardly have to tell you that all of these techniques can be looped back into your process before someone shows up with a problem. You can set quantitative performance targets, instrument and measure as part of the development process, and hunt down and fix performance problems before anyone knows you have them.

This process can be more fruitful than you might expect. Making relatively few simple changes can speed up almost any program significantly, and it is much less time-consuming to work on these issues within the development cycle than in response to an external crisis. So giving every release a basic performance once-over can pay.

Performance problems present themselves as fire drills. You have a choice. You and your development group can join the Keystone Kops, or you can stop, take a deep breath and apply an organized process to the problem.

Whether you use a Fagan-style inspection with a number of readers and reviewers, or just one other person reading the code and asking questions, code inspection, combined with an appropriate methodology for gathering data and performing experiments, can be a powerful technique. You and your group will learn to treat performance as a central part of the product’s quality—and you’ll apply those lessons to solving problems before they escape to the field and come back around to bite you.

A Performance War Story
Inspection without measurement won't always solve the problem.


Once upon a time, I was hired to have a look at a very prominent Internet search engine. The Internet is always growing, so much so that such a program is inevitably a constant source of performance issues.

I want to emphasize that, while some of the things I found will sound somewhat goofy, no one involved was incompetent or stupid. Collectively, they had tried to scale a solution and an approach until it no longer worked. At that point, they were so invested in what they had that it was hard to back off and see the problem from a fresh perspective. It happens to all of us.

The code in question was a multithreaded server written in ANSI C on Digital Unix. It ran on very large and expensive hardware—so large and expensive that the development group didn’t have a machine with the same configuration as the production systems. Because it was on DU, there was no Quantify tool to help. There is a DEC (Compaq) tool called atom, but it is very focused on CPU time and on the leaves of the call graph (such as memcpy), and it turned out to produce very misleading results in this case.

The hardware was so expensive, and the scale of the problem was growing so fast, that there was a great deal of management resistance to buying $0.5 million systems to increase capacity. Over the long run, the problems would have to be solved by a different architecture that could run a larger number of smaller systems instead of a few large ones. In the short term, the requirement was to "fix the code" to run significantly faster.

The development organization had an unhappy track record of making changes that were intended to improve the performance. These changes had proven disappointing when deployed, even when they appeared to improve performance in controlled test runs.

There was no instrumentation in the code. Some data had been collected with the DU atom tool, but it attributed the time to low-level data copying primitives. At best, this failed to tell us what callers were using them. At worst, it was misleading, as it was a CPU-time only measurement, and it seemed very likely that there were real-time effects at work.

I started two efforts in parallel. I inspected the code, and I persuaded the owner to add real-time measurements over all of it and to try to get the version that contained them into service as soon as possible.

Code inspection yielded four big issues that seemed as if they might explain the problem:

• The code worked by reading and processing large amounts of data. A particular thread working on a particular task would read, say, 50 MB of data into memory all at once and then traverse it. This required very large buffers, needless to say, and there was no I/O hardware that optimized such large reads. It seemed likely that the performance could be improved by reading in smaller chunks (a sketch of that idea, and of the locking issue in the next point, follows this list).

• The code was holding locks while reading the 50 MB chunks.

• The code was using a default thread-scheduling policy that used a round-robin rule. That meant that if the program ever started more threads than it could service with the available processors, the thread scheduler would thrash the CPUs among the threads. It would also give the CPU to new threads in preference to old ones.

• Several interesting pieces of code appeared to be duplicated.
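
Here is the sketch promised above for the first two points; the function names, chunk size and locking discipline are illustrative guesses, not code from the actual server. The idea is simply to read in modest chunks and to hold the shared-state lock only while installing each chunk, never across the read itself:

    #include <stdlib.h>
    #include <unistd.h>
    #include <pthread.h>
    #include <sys/types.h>

    #define CHUNK_SIZE (256 * 1024)                 /* hypothetical chunk size */

    static pthread_mutex_t index_lock = PTHREAD_MUTEX_INITIALIZER;

    static void add_to_index(const char *buf, ssize_t len)
    {
        (void)buf; (void)len;                       /* stand-in for the real work */
    }

    /* Consume a large input stream in chunks.  The blocking read happens with
       no lock held; the lock protects only the brief update of shared state. */
    void consume_stream(int fd)
    {
        char *chunk = malloc(CHUNK_SIZE);
        ssize_t n;

        if (chunk == NULL)
            return;
        while ((n = read(fd, chunk, CHUNK_SIZE)) > 0) {
            pthread_mutex_lock(&index_lock);
            add_to_index(chunk, n);
            pthread_mutex_unlock(&index_lock);
        }
        free(chunk);
    }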

The only one of these for which there was hard evidence was the thread-scheduling policy. The system was responding to extra load by mysteriously failing to complete queries. This was entirely consistent with a thread falling off the bottom of the scheduler’s agenda. Further, it was very hard to imagine any way to test this theory other than to add the two lines of code needed to change the policy and measure the results.
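
Here is a rough sketch of what that kind of change can look like with POSIX threads; the choice of SCHED_FIFO and the attribute calls are illustrative assumptions rather than a quotation of the actual fix, and on many systems a non-default policy requires special privileges:

    #include <stdio.h>
    #include <sched.h>
    #include <pthread.h>

    static void *worker(void *arg)
    {
        (void)arg;
        /* ... service queries ... */
        return NULL;
    }

    /* Create a worker thread with an explicitly chosen scheduling policy
       instead of inheriting the default round-robin one. */
    int start_worker(pthread_t *tid)
    {
        pthread_attr_t     attr;
        struct sched_param sp;

        pthread_attr_init(&attr);
        pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
        pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
        sp.sched_priority = sched_get_priority_min(SCHED_FIFO);
        pthread_attr_setschedparam(&attr, &sp);

        if (pthread_create(tid, &attr, worker, NULL) != 0) {
            perror("pthread_create");
            pthread_attr_destroy(&attr);
            return -1;
        }
        pthread_attr_destroy(&attr);
        return 0;
    }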

The process that ensued was interesting. The owner of the code moved from vehement denial that any of these issues could be problems to a state of mind something like, "Gee. I haven’t looked hard at the code in a while. Now that I look at it again, it sure looks peculiar."

Eventually, a version with real timers made it into service, and pointed at some computational code that hadn’t looked especially suspicious to me. This code, it turned out, could be put on a diet by moving some computations around and moving some other computations out of the run time altogether. Note that if we had acted on the inspection alone, without measurements, we would not have found this.

It is important to note that the cost of this code was not news to anyone involved, except perhaps me. I never did figure out why it wasn’t on the top of the list before I got involved. In this case, what I brought to the process was method: First collect the data, then look at it, and then choose a course of action. It sounds simple, but after a few too many fire drills, people get punchy and a little disorganized.

Long after I finished my work, the team succeeded in deploying a significantly faster version. Fixing the scheduling policy drastically reduced the incidence of mysterious timeouts, and fixing the CPU-intensive code reduced the cost of each query.

