Dr. Dobb's is part of the Informa Tech Division of Informa PLC

Prevention's the Cure


The term static analysis means different things to different people in the software industry. Broadly, there are two schools of thought: the program execution camp and the pattern-matching camp.

For program execution adherents, static analysis means trying to logically execute the program, sometimes symbolically, to uncover code problems such as memory corruption, leaks and exceptions. This type of analysis focuses largely on identifying code problems without creating test cases. However, it’s deficient on two counts: First, it’s difficult to perform on large amounts of code, and incredibly slow when the call structure being traced exceeds 1,000 lines of code. Second, all of the source code, including the code for every function and library that the program calls, must be available for the analysis to produce valid results; where code is missing, the results can become noisy. As a result, this type of static analysis works best for fairly simple programs with all source code available. Even so, you might want to tolerate the performance and noise issues when you consider the trade-off: You can deal with a slow and potentially noisy analysis now, or deal with the consequences of finding bugs after the software is on the market.

In the pattern-matching camp, static analysis means detecting patterns in the parse tree after the code is parsed. A tool parses the source code and represents it as a set of connected nodes in a parse tree; then, a rule-based system tries to find patterns in that tree. This type of analysis is incredibly flexible: The information it yields depends upon the types of patterns that are checked. If the patterns identify error-prone code constructs, static analysis helps prevent errors. If the patterns identify formatting problems, static analysis helps with code beautification. It is also orders of magnitude faster than execution-based analysis, and it can be applied to segments of code whether or not the entire code base is available.
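
To make the idea concrete, here is a minimal sketch of a rule matched against a parse tree. It is not how any particular tool works: the TreeNode and TreeRule types, the node kinds ("ForLoop", "StringPlusAssign") and the rule ID are all hypothetical, and the tree is built by hand rather than by a real parser.

```java
import java.util.ArrayList;
import java.util.List;

/** A node in a toy parse tree: a kind (e.g. "ForLoop"), its source text, and child nodes. */
final class TreeNode {
    final String kind;
    final String text;
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String kind, String text, TreeNode... kids) {
        this.kind = kind;
        this.text = text;
        for (TreeNode k : kids) children.add(k);
    }
}

/** A rule is a pattern over the tree; it records every node that matches. */
interface TreeRule {
    String id();
    void check(TreeNode root, List<String> findings);
}

/** Hypothetical rule: flag "+=" on a String anywhere inside a loop. */
final class StringConcatInLoopRule implements TreeRule {
    public String id() { return "PERF.STRING_CONCAT_IN_LOOP"; }

    public void check(TreeNode root, List<String> findings) {
        walk(root, false, findings);
    }

    // Depth-first walk that remembers whether an enclosing loop has been seen.
    private void walk(TreeNode n, boolean insideLoop, List<String> findings) {
        boolean loop = insideLoop || n.kind.equals("ForLoop") || n.kind.equals("WhileLoop");
        if (loop && n.kind.equals("StringPlusAssign")) {
            findings.add(id() + ": " + n.text);
        }
        for (TreeNode child : n.children) walk(child, loop, findings);
    }
}

final class PatternMatchDemo {
    /** Hand-built tree for: msg += "a"; for (...) { msg += word; } */
    static TreeNode sampleTree() {
        return new TreeNode("Method", "build",
            new TreeNode("StringPlusAssign", "msg += \"a\""),
            new TreeNode("ForLoop", "for",
                new TreeNode("Block", "{}",
                    new TreeNode("StringPlusAssign", "msg += word"))));
    }

    static List<String> run(TreeRule rule, TreeNode root) {
        List<String> findings = new ArrayList<>();
        rule.check(root, findings);
        return findings;
    }

    public static void main(String[] args) {
        for (String finding : run(new StringConcatInLoopRule(), sampleTree())) {
            System.out.println(finding);
        }
    }
}
```

Only the concatenation inside the loop is reported; the one outside it does not match the pattern. Because the rule needs nothing beyond the subtree it is handed, it can run on a single file or fragment.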

Both types are useful in appropriate situations, but I’ll focus on pattern-matching because it’s the technique that’s most generally applicable—and most misused.

Choosing Rules and Reducing Noise

When you use a pattern-matching static analysis tool, your results are entirely determined by the patterns that the tool is configured to find. In addition to code beautification, pattern-matching static analysis can be used to determine whether code follows “best practice” rules developed by industry experts such as Scott Meyers, Martin Klaus, Scott Ambler and Joshua Bloch. Any developer who uses a static analysis tool that matches patterns based on these rules can access experts’ knowledge and prevent scores of errors that have been made by other developers working with the same language.

Checking rules based on these experts’ knowledge dramatically reduces the level of noise produced by static analysis tools—remember the adage, “Garbage in, garbage out”? If you use a tool that checks rules you don’t want to follow, however, that tool’s results will be mostly noise to you.

Depending on how rules are matched, some noise filtering could be required even if you’re checking trusted coding rules. If the rule can be checked by matching an exact pattern, every reported violation will represent an actual rule violation. If you consider violations of a particular rule to be noise, you can eliminate that noise by configuring the static analysis tool to ignore that rule. However, when you’re checking rules that are matched based on “fuzzy logic” (for example, rules that check whether code follows design patterns), some noise is inevitable: The tool finds similar, but not exact, matches. In this case, each reported rule violation represents a potential rule violation, and you must review the results to determine whether an actual violation occurred.
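
The two kinds of noise handling described above can be sketched as follows. This is an illustration under assumptions, not a real tool's API: the Finding and NoiseFilter types and every rule ID are hypothetical. Exact matches are treated as definite violations, fuzzy matches as candidates for review, and an unwanted rule is silenced by suppressing its ID.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** One reported finding; the rule IDs used below are hypothetical. */
final class Finding {
    final String ruleId;
    final String location;
    final boolean exactMatch; // true: definite violation; false: fuzzy match, only a potential one

    Finding(String ruleId, String location, boolean exactMatch) {
        this.ruleId = ruleId;
        this.location = location;
        this.exactMatch = exactMatch;
    }
}

final class NoiseFilter {
    /** Drops every finding whose rule the team has configured the tool to ignore. */
    static List<Finding> withoutSuppressed(List<Finding> all, Set<String> suppressed) {
        List<Finding> kept = new ArrayList<>();
        for (Finding f : all) {
            if (!suppressed.contains(f.ruleId)) kept.add(f);
        }
        return kept;
    }

    /** Keeps only fuzzy matches, which a person must review before calling them violations. */
    static List<Finding> needingReview(List<Finding> findings) {
        List<Finding> fuzzy = new ArrayList<>();
        for (Finding f : findings) {
            if (!f.exactMatch) fuzzy.add(f);
        }
        return fuzzy;
    }

    public static void main(String[] args) {
        List<Finding> all = Arrays.asList(
            new Finding("FORMAT.BRACE_STYLE", "A.java:10", true),
            new Finding("PERF.STRING_CONCAT_IN_LOOP", "B.java:4", true),
            new Finding("DESIGN.SINGLETON_SHAPE", "C.java:22", false));
        Set<String> suppressed = new HashSet<>(Arrays.asList("FORMAT.BRACE_STYLE"));

        List<Finding> kept = withoutSuppressed(all, suppressed);
        System.out.println(kept.size() + " findings kept, "
            + needingReview(kept).size() + " flagged for manual review");
    }
}
```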

Focusing on Process, Not Bugs

To further maximize the benefits of static analysis, it’s critical that you see coding standards as a means of preventing errors—not detecting them. Many developers are disappointed if a coding standard violation doesn’t point them to an obvious bug. When they explore a violation and find an error-prone construct rather than an error, they think that the coding standards aren’t useful, eventually stop investigating violations, and later stop performing static analysis altogether. This speaks to a fundamental problem with the software industry: a focus on removing errors, not preventing them.

Error prevention involves correlating each error to the exact point in the development process that allowed it, then fixing that part of the process. This reduces the need to debug applications after the fact, and the quality gains compound because each process fix removes a whole class of future errors. Error prevention is very different from error detection. When development focuses only on error detection, the flawed process that generated those errors is left uncorrected, and the errors persist.

The coding standards promoted by industry experts are like an immunization that gives you resistance otherwise reserved for someone who has survived a disease—you get the error-prevention benefits derived from other developers’ mistakes without having to suffer the consequences of making them.

To learn how coding standards help you prevent—rather than detect—errors, consider these examples.

The following C++ code is technically valid, but contains a logical problem that could be prevented by following a common C++ coding standard:


The developer has written a class that contains a pointer member, but hasn’t defined a copy constructor. The compiler will copy the class using its implicitly generated copy constructor, which copies the pointer itself, so the new object points to the same memory as the original. The outcome will be correct only if the developer didn’t intend each copy to have its own memory location. In this case, because SomeObject creates the memory and frees the memory in its destructor, the intent that SomeObject owns the memory is clear. Therefore, two different instances of SomeObject shouldn’t point to the same address, because two different objects can’t own the same memory: when both are destroyed, the same memory is freed twice. If the developer had followed the C++ coding standard that says to write a copy constructor for classes with dynamically allocated memory, this logical problem (and the resulting errors) would have been prevented.

Frequently, a developer will intentionally omit the copy constructor, reasoning that it won’t be used. This is exactly the kind of early shortcut that can lead to problems later (for instance, when someone copies the object without checking that the copy constructor has been implemented correctly). This is what distinguishes error prevention from error detection: following the standard removes the hazard before any bug exists to be detected.

Here’s some Java code that’s technically correct, but causes a performance problem that could have been prevented by following a common Java coding standard:
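
A minimal sketch of such a loop, consistent with the description that follows. The class name, the method name and the words array are illustrative assumptions, not the original listing.

```java
final class ConcatWithString {
    /** Joins the words with "+=", which allocates a fresh String on every pass through the loop. */
    static String buildMessage(String[] words) {
        String message = "";
        for (int i = 0; i < words.length; i++) {
            // Looks like an append, but String is immutable: each += builds a new
            // String holding the old contents plus the new word, and the old one is discarded.
            message += words[i];
        }
        return message;
    }

    public static void main(String[] args) {
        System.out.println(buildMessage(new String[] {"static ", "analysis ", "helps"}));
    }
}
```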

The preceding segment of code gets a new word every time it passes through the loop. It appears that we keep expanding the message string and that each word is appended to it, but appearances are deceiving. In fact, the old String message object is discarded, and new memory is allocated every time the message is expanded. The old information from the message is copied to the new memory, and then the new word is added at the end. The new memory is one word longer than the old memory.

Because the message variable will correctly contain the appended values, you might think that the object was modified. Instead, a new string is created each time += is executed. We’re creating extra work for the virtual machine because the garbage collector must clean up whatever memory is left behind. If we travel through the loop 1,000 times, the garbage collector must identify and delete thousands of chunks of memory (multiple extra objects are created every time we go through the loop), which creates significant overhead for the program.

We can eliminate these extra objects (and the related overhead) by following the common Java coding standard that says to use the StringBuffer class instead of the String class for non-constant strings. StringBuffer is a mutable object. Because it can be modified, a StringBuffer can truly be appended to, rather than merely giving the appearance of being appended to. Use StringBuffer in performance-critical sections of code, such as loops. If we modify this code to follow that standard, we can modify the memory without creating new objects each time through the loop. Each pass through the loop writes into the same buffer, which grows as needed. This way, we don’t force the garbage collector to perform unnecessary work.
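
Applied to a word-appending loop like the one discussed, the standard might look like the sketch below. The class name, the method name and the words array are illustrative assumptions.

```java
final class ConcatWithStringBuffer {
    /** Joins the words by appending to a single mutable buffer. */
    static String buildMessage(String[] words) {
        StringBuffer message = new StringBuffer();
        for (int i = 0; i < words.length; i++) {
            // Writes into the buffer's internal array, which is enlarged
            // only when it runs out of capacity.
            message.append(words[i]);
        }
        // One String is created at the end, not one per iteration.
        return message.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildMessage(new String[] {"static ", "analysis ", "helps"}));
    }
}
```

On Java 5 and later, StringBuilder offers the same append and toString methods without StringBuffer's synchronization, and is the usual choice when the buffer is confined to one thread.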

Five Simple Steps

How can teams use the information gained from software measurement, monitoring and testing to improve the development process? It takes five simple steps—steps that should be automated whenever possible:

  1. Detect an error.
  2. Isolate the error’s cause.
  3. Locate the point in the production process where the error was created.
  4. Implement practices to prevent the error from recurring.
  5. Monitor the process for improvements.

Static analysis is far more effective when used across a whole group than when used by only a few developers on a team. As part of a group-wide automated error-prevention methodology, it helps you prevent common bugs as well as those that are most troubling to your team.

A Day in the Life

So how does error prevention happen in the real world? Simple: During the day, developers modify their code; before they check it back into source control, they verify that their work adheres to the coding standards using an agreed-upon static analysis tool. Configuration files stored in source control ensure that each developer’s tool accesses the rules and applies them identically. The architect determines the tools’ configuration by adding, suppressing or removing rules.

Once the code is checked into source control, the developer’s connection to that code is essentially over. During the subsequent automated builds, the same set of static analysis tools, with the same set of configuration files that the developers used during the day, runs on the code modified during the day. Ideally, the code base shouldn’t violate any standards at this point. If it does, reports of these violations should be sent to the architect and the developer, inviting both parties to review the offending code. Architect and developers then work together to determine why the violations have occurred and whether a rule should be suppressed (due to some unusual coding requirements for a particular project), or whether the code should be modified. This review is an important part of the process, forcing the architect and developer to reason through the code and the decisions behind building it a certain way. Almost always, additional errors will be discovered and fixed or prevented by close scrutiny of even one violation.

If the developers don’t adhere to the coding standards, the architect can institute zero-error thresholds to force the team to check in only perfectly clean code. Thus, static analysis acts as a filter for the source control repository check-in.
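
A zero-error threshold of this kind can be sketched as a small gate in front of check-in. Everything here is an assumption for illustration: the CheckInGate class, the rule IDs, and the idea that the tool hands over a list of violated rule IDs. It does not describe any particular tool or source control system.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Decides whether a change may be checked in, given the rule IDs the analysis reported. */
final class CheckInGate {
    private final Set<String> suppressedRules; // rules the architect has agreed to ignore
    private final int maxViolations;           // 0 means only perfectly clean code gets in

    CheckInGate(Set<String> suppressedRules, int maxViolations) {
        this.suppressedRules = suppressedRules;
        this.maxViolations = maxViolations;
    }

    /** Counts reported violations, skipping those that belong to suppressed rules. */
    int countActive(List<String> reportedRuleIds) {
        int count = 0;
        for (String id : reportedRuleIds) {
            if (!suppressedRules.contains(id)) count++;
        }
        return count;
    }

    boolean allowsCheckIn(List<String> reportedRuleIds) {
        return countActive(reportedRuleIds) <= maxViolations;
    }

    public static void main(String[] args) {
        Set<String> suppressed = new HashSet<>(Arrays.asList("FORMAT.TABS"));
        CheckInGate gate = new CheckInGate(suppressed, 0);

        List<String> clean = Collections.emptyList();
        List<String> dirty = Arrays.asList("FORMAT.TABS", "PERF.STRING_CONCAT_IN_LOOP");

        System.out.println("clean change allowed: " + gate.allowsCheckIn(clean));
        System.out.println("dirty change allowed: " + gate.allowsCheckIn(dirty));
    }
}
```

Keeping the suppressed-rule set and the threshold in one shared configuration means the developer's pre-check-in run and the automated build apply the same decision.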

The Right Direction

Static analysis not only steers you away from the errors that are historically most common and problematic for the development community as a whole, but also helps you avoid making mistakes that have caused problems for your team and/or project. And in my view, a step toward prevention is a step forward for the entire software industry.

Free Pattern-Matching Tools

Java junkies: Try dipping into these static analyzers first.

The classic open-source static analysis tool, Lint, was a program that analyzed your C code for a variety of portability and standard violations, many of which were quite serious. Because early C compilers didn’t do a whole lot of checking, and the language allowed all sorts of erroneous statements, this was a useful tool. Many detractors disliked the typically voluminous output, however, which often required sifting through hundreds of errors to find the “gems” indicating real problems. You could exert some control over the errors by adding comments to your code with special tags that the Lint tool recognized. For more information on Lint, refer to Ian F. Darwin’s Checking C Programs with Lint (O’Reilly, 1988). Similar open source or free static analysis tools for Java include:

—A. Kolawa


Adam Kolawa came to the U.S. from Poland in 1983 to pursue a Ph.D. at the California Institute of Technology. Four years later, having earned a doctorate in theoretical physics, he founded Parasoft, a software company based in Monrovia, Calif.

