Managing That Millstone

Changes in a system can be made in two primary ways. I like to call them Edit and Pray and Cover and Modify. Unfortunately, Edit and Pray is pretty much the industry standard. When you use Edit and Pray, you carefully plan the changes you are going to make, you make sure that you understand the code you are going to modify, and then you start to make the changes. When you're done, you run the system to see if the change was enabled, and then you poke around further to make sure that you didn't break anything. The poking around is essential. When you make your changes, you are hoping and praying that you'll get them right, and you take extra time when you are done to make sure that you did.

Superficially, Edit and Pray seems like "working with care," a very professional thing to do. The "care" that you take is right there at the forefront, and you expend extra care when the changes are very invasive because much more can go wrong. But safety isn't solely a function of care. I don't think any of us would choose a surgeon who operated with a butter knife just because he worked with care. Effective software change, like effective surgery, really involves deeper skills. Working with care doesn't do much for you if you don't use the right tools and techniques.

Cover and Modify is a different way of making changes. The idea behind it is that it is possible to work with a safety net when we change software. The safety net we use isn't something that we put underneath our tables to catch us if we fall out of our chairs. Instead, it's kind of like a cloak that we put over code we are working on to make sure that bad changes don't leak out and infect the rest of our software. Covering software means covering it with tests. When we have a good set of tests around a piece of code, we can make changes and find out very quickly whether the effects were good or bad. We still apply the same care, but with the feedback we get, we are able to make changes more carefully.

If you are not familiar with this use of tests, all of this is bound to sound a little bit odd. Traditionally, tests are written and executed after development. A group of programmers writes code and a team of testers runs tests against the code afterward to see if it meets some specification. In some very traditional development shops, this is just the way that software is developed. The team can get feedback, but the feedback loop is large. Work for a few weeks or months, and then people in another group will tell you whether you've gotten it right.

Regression Testing

Testing done this way is really "testing to attempt to show correctness." Although that is a good goal, tests can also be used in a very different way. We can do "testing to detect change."

In traditional terms, this is called regression testing. We periodically run tests that check for known good behavior to find out whether our software still works the way that it did in the past.

When you have tests around the areas in which you are going to make changes, they act as a software vise. You can keep most of the behavior fixed and know that you are changing only what you intend to.
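To make this concrete, here is a minimal sketch of a change-detecting test in the JUnit 3 style used later in this article. BillingCalculator is a hypothetical stand-in for any code whose current behavior we want to hold fixed; the expected value is simply what the system produced before our change.

 import junit.framework.TestCase;

 // BillingCalculator is a hypothetical class standing in for code whose
 // behavior we want to pin down before we modify anything around it.
 class BillingCalculator {
     double totalFor(int quantity, double unitPrice) {
         return quantity * unitPrice;
     }
 }

 public class BillingRegressionTest extends TestCase {
     public void testTotalForKnownOrder() {
         BillingCalculator calculator = new BillingCalculator();
         // 555.50 is the known-good value observed before our change; the
         // test detects change rather than proving the formula is correct.
         assertEquals(555.50, calculator.totalFor(5, 111.10), 0.001);
     }
 }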

Regression testing is a great idea. Why don't people do it more often? There is this little problem with regression testing. Often when people practice it, they do it at the application interface. It doesn't matter whether it is a Web application, a command-line application, or a GUI-based application; regression testing has traditionally been seen as an application-level testing style. But this is unfortunate. The feedback we can get from it is very useful. It pays to do it at a finer-grained level.

What Is Unit Testing?

The term unit test has a long history in software development. Common to most conceptions of unit tests is the idea that they are tests in isolation of individual components of software. What are components? The definition varies, but in unit testing, we are usually concerned with the most atomic behavioral units of a system. In procedural code, the units are often functions. In object-oriented code, the units are classes.
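As a sketch of what that means in practice, here is a unit test (again JUnit 3 style) for a hypothetical little Stack class; the test exercises that one class and nothing else.

 import java.util.ArrayList;
 import junit.framework.TestCase;

 // A tiny, hypothetical class under test.
 class Stack {
     private ArrayList elements = new ArrayList();
     void push(Object element) { elements.add(element); }
     Object pop() { return elements.remove(elements.size() - 1); }
     boolean isEmpty() { return elements.isEmpty(); }
 }

 public class StackTest extends TestCase {
     public void testNewStackIsEmpty() {
         assertTrue(new Stack().isEmpty());
     }

     public void testPopReturnsLastElementPushed() {
         Stack stack = new Stack();
         stack.push("first");
         stack.push("second");
         assertEquals("second", stack.pop());
     }
 }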

Can we ever test only one function or one class? In procedural systems, it is often hard to test functions in isolation. Top-level functions call other functions, which call other functions, all the way down to the machine level. In object-oriented systems, it is a little easier to test classes in isolation, but the fact is, classes don't generally live in isolation. Think about all of the classes you've ever written that don't use other classes. They are pretty rare, aren't they? Usually they are little data classes or data structure classes such as stacks and queues (and even these might use other classes).

Testing in isolation is an important part of the definition of a unit test, but why is it important? After all, many errors are possible when pieces of software are integrated. Shouldn't large tests that cover broad functional areas of code be more important? Well, they are important, I won't deny that, but there are a few problems with large tests:

  • Error localization. As tests get further from what they test, it is harder to determine what a test failure means. Often it takes considerable work to pinpoint the source of a failure. You have to determine where along the path from inputs to outputs the failure occurred. Yes, we have to do that for unit tests also, but often the work is trivial.
  • Execution time. Larger tests tend to take longer to execute, making test runs rather frustrating.
  • Coverage. It is hard to see the connection between a piece of code and the values that exercise it. We can usually find out whether a piece of code is exercised by a test using coverage tools, but adding new code means considerable work to create high-level tests that exercise the new code.

Unit tests fill in gaps that larger tests can't. We can test pieces of code independently; we can group tests so that we can run some under some conditions and others under other conditions. With them, we can localize errors quickly. If we think there is an error in some particular piece of code and we can use it in a test harness, we can usually code up a test quickly to see if the error really is there. Good unit tests share two qualities: They run fast, and they help us localize problems.

In the industry, people often go back and forth about whether particular tests are unit tests. Is a test really a unit test if it uses another production class? I go back to the two qualities: Does the test run fast? Can it help us localize errors quickly? Naturally, there is a continuum. Some tests are larger, and use several classes together. In fact, they may seem to be little integration tests. By themselves, they might seem to run fast, but what happens when you run them all together? When you have a test that exercises a class along with several of its collaborators, it tends to grow. If you haven't taken the time to make a class separately instantiable in a test harness, how easy will it be when you add more code? It never gets easier. People put it off. Over time, the test might end up taking as long as 1/10th of a second to execute. That's too slow.

Yes, I'm serious. At the time that I'm writing this, 1/10th of a second is an eon for a unit test. Let's do the math. If you have a project with 3,000 classes and there are about 10 tests apiece, that is 30,000 tests. How long will it take to run all of the tests for that project if they take 1/10th of a second apiece? Close to an hour. That's a long time to wait for feedback. You don't have 3,000 classes? Cut it in half. That is still a half an hour. On the other hand, what if the tests take 1/100th of a second apiece? Now we're talking about 5 to 10 minutes. When they take that long, I make sure that I use a subset to work with, but I don't mind running them all every couple of hours.

Unit tests run fast. If they don't run fast, they aren't unit tests. Other kinds of tests often masquerade as unit tests. A test is not a unit test if:
  1. It talks to a database.
  2. It communicates across a network.
  3. It touches the file system.
  4. You have to do special things to your environment (such as editing configuration files) to run it.
Tests that do these things aren't bad. Often, they are worth writing, and you generally will write them in unit test harnesses. However, it is important to be able to separate them from true unit tests so that you can keep a set of tests that you can run fast whenever you make changes.
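One way to keep that separation, sketched here under the assumption that we're using JUnit 3: collect the true unit tests into a suite that we run on every change, and leave the database, network, and file system tests in a separate, slower suite that we run less often.

 import junit.framework.Test;
 import junit.framework.TestSuite;

 // FastSuite collects only fast, in-memory unit tests, such as the two
 // example test classes sketched earlier. Tests that hit the database,
 // the network, or the file system would go into a separate SlowSuite.
 public class FastSuite {
     public static Test suite() {
         TestSuite suite = new TestSuite("Fast unit tests");
         suite.addTestSuite(StackTest.class);
         suite.addTestSuite(BillingRegressionTest.class);
         return suite;
     }
 }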
With the help of Moore's Law, I hope to see nearly instantaneous test feedback for even the largest systems in my lifetime. I suspect that working in those systems will be like working in code that can bite back. It will be capable of letting us know when it is being changed in a bad way.

Test Coverings

So how do we start making changes in a legacy project? The first thing to notice is that given a choice, it is always safer to have tests around the changes that we make. When we change code, we can introduce errors; after all, we're human. But when we cover our code with tests before we change it, we're more likely to catch any mistakes that we make.


[Figure: Invoice Update Classes]
Much legacy code work involves breaking dependencies so that change can be easier.

"Invoice Update Classes shows us a little set of classes. We want to make changes to the getResponseText method of InvoiceUpdateResponder and the getValue method of Invoice. Those methods are our change points. We can cover them by writing tests for the classes they reside in.

To write and run tests, we have to be able to create instances of InvoiceUpdateResponder and Invoice in a testing harness. Can we do that? Well, it looks like it should be easy enough to create an Invoice; it has a constructor that doesn't accept any arguments. InvoiceUpdateResponder might be tricky, though. It accepts a DBConnection, a real connection to a live database. How are we going to handle that in a test? Do we have to set up a database with data for our tests? That's a lot of work. Won't testing through the database be slow? We don't particularly care about the database right now anyway; we just want to cover our changes in InvoiceUpdateResponder and Invoice.

We also have a bigger problem. The constructor for InvoiceUpdateResponder needs an InvoiceUpdateServlet as an argument. How easy will it be to create one of those? We could change the code so that it doesn't take that servlet anymore. If the InvoiceUpdateResponder just needs a little bit of information from InvoiceUpdateServlet, we can pass that information along instead of passing the whole servlet in; but shouldn't we have a test in place to make sure that we've made that change correctly?
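To keep the picture concrete, the shape of the problem might look roughly like this. The exact signatures aren't shown in the article, so treat this as an assumed sketch:

 // Minimal stand-ins for the collaborators in the diagram (details elided).
 class DBConnection { /* a real connection to a live database */ }
 class InvoiceUpdateServlet { /* holds the invoice IDs from the request */ }

 // Hard to instantiate in a test harness: the constructor demands a live
 // database connection and a fully constructed servlet.
 public class InvoiceUpdateResponder {
     private DBConnection connection;
     private InvoiceUpdateServlet servlet;

     public InvoiceUpdateResponder(DBConnection connection,
                                   InvoiceUpdateServlet servlet) {
         this.connection = connection;
         this.servlet = servlet;
     }

     public String getResponseText() {
         // ... the change point we want to cover with tests ...
         return "";
     }
 }

 // Easy to instantiate: a no-argument constructor and no hard dependencies.
 class Invoice {
     public Invoice() {}

     public int getValue() {   // return type assumed
         // ... the other change point ...
         return 0;
     }
 }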

All of these problems are dependency problems. When classes depend directly on things that are hard to use in a test, they are hard to modify and hard to work with. Dependency is one of the most critical problems in software development. Much legacy code work involves breaking dependencies so that change can be easier.

So, how do we do it? How do we get tests in place without changing code? The sad fact is that, in many cases, it isn't very practical. In some cases, it might even be impossible. In the example we just saw, we could attempt to get past the DBConnection issue by using a real database, but what about the servlet issue? Do we have to create a full servlet and pass it to the constructor of InvoiceUpdateResponder? Can we get it into the right state? It might be possible. What would we do if we were working in a GUI desktop application? We might not have any programmatic interface. The logic could be tied right into the GUI classes. What do we do then?

The Legacy Code Dilemma

When we change code, we should have tests in place. To put tests in place, we often have to change code.


[Figure: Dependencies Broken]
We want to make changes to the getResponseText method of InvoiceUpdateResponder and the getValue method of Invoice.

In the Invoice example, we can try to test at a higher level. If it is hard to write tests without changing a particular class, sometimes testing a class that uses it is easier; regardless, we usually have to break dependencies between classes someplace. In this case, we can break the dependency on InvoiceUpdateServlet by passing the one thing that InvoiceUpdateResponder really needs. It needs the collection of invoice IDs that the InvoiceUpdateServlet holds. We can also break the dependency that InvoiceUpdateResponder has on DBConnection by introducing an interface (IDBConnection) and changing the InvoiceUpdateResponder so that it uses the interface instead. "Dependencies Broken" shows the state of these classes after the changes.

Is it safe to do these refactorings without tests? It can be. These refactorings are named Primitivize Parameter and Extract Interface, respectively. When we break dependencies, we can often write tests that make more invasive changes safer. The trick is to do these initial refactorings very conservatively.
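Here is a sketch of what those two refactorings might leave behind, again with assumed names and signatures: the servlet parameter becomes the primitive data the responder actually needs, and the database hides behind a new, narrow interface.

 import java.util.List;

 // Extract Interface: IDBConnection declares only the operations that
 // InvoiceUpdateResponder actually uses, so a fake can stand in for the
 // real database in a test.
 interface IDBConnection {
     // ... the handful of methods the responder calls ...
 }

 // Primitivize Parameter: instead of the whole InvoiceUpdateServlet, the
 // responder now receives just the invoice IDs it needs.
 public class InvoiceUpdateResponder {
     private IDBConnection connection;
     private List invoiceIDs;

     public InvoiceUpdateResponder(IDBConnection connection, List invoiceIDs) {
         this.connection = connection;
         this.invoiceIDs = invoiceIDs;
     }

     public String getResponseText() {
         // ... now coverable with fast tests ...
         return "";
     }
 }

 // In a test, a simple fake implements the interface.
 class FakeDBConnection implements IDBConnection {
     // ... records calls or returns canned data ...
 }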

Being conservative is the right thing to do when we can possibly introduce errors, but sometimes when we break dependencies to cover code, it doesn't turn out as nicely as what we did in the previous example. We might introduce parameters to methods that aren't strictly needed in production code, or we might break apart classes in odd ways just to be able to get tests in place. When we do that, we might end up making the code look a little poorer in that area. If we were being less conservative, we'd just fix it immediately. We can do that, but it depends upon how much risk is involved. When errors are a big deal, and they usually are, it pays to be conservative.

When you break dependencies in legacy code, you often have to suspend your sense of aesthetics a bit. Some dependencies break cleanly; others end up looking less than ideal from a design point of view. They are like the incision points in surgery: There might be a scar left in your code after your work, but everything beneath it can get better.

If later you can cover code around the point where you broke the dependencies, you can heal that scar, too.

The Legacy Code Change Algorithm

When you have to make a change in a legacy code base, here is an algorithm you can use:
  1. Identify change points.
  2. Find test points.
  3. Break dependencies.
  4. Write tests.
  5. Make changes and refactor.

The day-to-day goal in legacy code is to make changes, but not just any changes. We want to deliver value while bringing more of the system under test. At the end of each programming episode, we should be able to point not only to code that provides some new feature, but also to its tests. Slowly, tested areas of the code base surface like islands rising out of the ocean. Work in these islands becomes much easier. Over time, the islands become large landmasses. Eventually, you'll be able to work in continents of test-covered code.

The Case of the Undetectable Side Effect
Before you can test, you must separate responsibilities effectively.

In theory, writing a test for a piece of functionality shouldn't be too bad. We instantiate a class, call its methods, and check their results. What could go wrong? Well, it can be that easy if the object we create doesn't communicate with any other objects. If other objects use it and it doesn't use anything else, our tests can use it also and act just like the rest of our program would. But objects that don't use other objects are rare.

Programs build on themselves. Often we have objects with methods that don't return values. We call their methods, and they do some work, but we (the calling code) never get to know about it. The object calls methods on other objects, and we never have a clue how things turned out.

Let's imagine a class with this problem.

 public class AccountDetailFrame extends Frame 
             implements ActionListener, WindowListener
 {
     private TextField display = new TextField(10);
     ...
     public AccountDetailFrame(...) { ... }

     public void actionPerformed(ActionEvent event) {
         String source = (String)event.getActionCommand();
         if (source.equals("project activity")) {
             DetailFrame detailDisplay = new DetailFrame();
             detailDisplay.setDescription(
                     getDetailText() + " " + getProjectionText());
             detailDisplay.show();
             String accountDescription  
                     = detailDisplay.getAccountSymbol();
             accountDescription +=  ": ";
             ...
             display.setText(accountDescription);
             ...
         }
     }
     ...
 }

This old class in Java does it all. It creates GUI components, it receives notifications from them using its actionPerformed handler, and it calculates what it needs to display and displays it. It does all of this in a particularly strange way: It builds up detailed text and then creates and displays another window. When the window is done with its work, it grabs information from it directly, processes it a bit, and then sets it onto one of its own text fields.

We could try running this method in a test harness, but that would be pointless. It would create a window, show it to us, prompt us for input, and then go on to display something in another window. There is no decent place to sense what this code does.

What can we do? First, we can start to separate work that is independent of the GUI from work that is really dependent on the GUI. Because we are working in Java, we can take advantage of one of the available refactoring tools. Our first step is to perform a set of Extract Method refactorings to divide up the work in this method.

New Methods, New Names

It will help to make the detailDisplay frame an instance variable of the class. Then we can extract the code that uses that frame into a set of methods. What should we name the methods? To get ideas for names, we should take a look at what each piece of code does from the perspective of this class, or what it calculates for this class. In addition, we should not use names that deal with the display components. We can use display components in the code that we extract, but the names should hide that fact. With these things in mind, we can create either a command method or a query method for each chunk of code.

After we've extracted all of the code that deals with the detailDisplay frame, we can go through and extract the code that accesses GUI components on the AccountDetailFrame class. After those extractions, we can use Subclass and Override Method and test whatever code is left in the performCommand method. To do this, we'll have to create a class named TestingAccountDetailFrame and have it save the description passed to setDisplayText in a variable named displayText.

 public class AccountDetailFrame extends Frame 
             implements ActionListener, WindowListener
 {
     private TextField display = new TextField(10);
     private DetailFrame detailDisplay;
     ...
     public void actionPerformed(ActionEvent event) {
         String source = (String)event.getActionCommand();
         performCommand(source);
     }

     public void performCommand(String source) {
         if (source.equals("project activity")) {
             setDescription(getDetailText() + " " + 
                            getProjectionText());
             ...
             String accountDescription = getAccountSymbol();
             accountDescription +=  ": ";
             ...
             setDisplayText(accountDescription);
             ...
         }
     }

     void setDescription(String description) { ... }
     void setDisplayText(String description) { ... }
     String getAccountSymbol() { ... }
     ...
 }

Here is a test that exercises the performCommand method using the testing subclass:

 public void testPerformCommand() {
    TestingAccountDetailFrame frame = 
      new  TestingAccountDetailFrame();
    frame.accountSymbol = "SYM";
    frame.performCommand("project activity");
    assertEquals("SYM: basic account", frame.displayText);
 }
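
The testing subclass itself isn't shown above; a minimal sketch of it might look like this. The exact set of overrides is an assumption based on the test, and it presumes AccountDetailFrame has an accessible no-argument constructor.

 // Subclass and Override Method: the testing subclass feeds performCommand a
 // canned account symbol and records what it tries to display, so the test
 // can sense the result without showing any windows.
 public class TestingAccountDetailFrame extends AccountDetailFrame {
     String accountSymbol;
     String displayText;

     String getAccountSymbol() {
         return accountSymbol;       // supply the value the test set
     }

     void setDisplayText(String text) {
         displayText = text;         // capture instead of updating the GUI
     }

     void setDescription(String description) {
         // no-op: keep the test from popping up the detail window
     }
 }
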
When we conservatively separate out dependencies by doing automated extract method refactorings, we might end up with code that makes us flinch a bit. This is an extremely crude refactoring, but at least it separates responsibilities. The AccountDetailFrame is tied to the GUI (it is a subclass of Frame) and it still contains business logic. At the very least, now we can run the method that contains business logic in a test case. It is a positive step forward. Later, we can move some of these responsibilities to other classes.

A good refactoring tool will only allow you to do an automated extract method refactoring when it is safe. However, that just makes the editing that we do between uses of the tool the most hazardous part of the work. Remember that it is OK to extract methods with poor names or poor structure to get tests in place. Safety first! After the tests are in place, you can make the code much cleaner.

-MF


Prior to joining Object Mentor as a consultant, Michael Feathers had developed CppUnit, the initial port of JUnit to C++. He is a frequent conference speaker and has acted as the chair for the CodeFest event at the last three OOPSLA conferences. His passion is helping teams surmount problems in large legacy code bases, and have fun in the process. This article is abridged from Working Effectively with Legacy Code (Prentice Hall Technical Reference, 2005). Copyright 2005 Pearson Education. Reprinted with permission.

