Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Design

Automating Design-Pattern Identification


Automating Design-Pattern Identification

Automating Design-Pattern Identification

Dr. Dobb's Journal June 1998

By Jagdish Bansiya

Jagdish is a lecturer in the computer-science department at the University of Alabama in Huntsville. He can be contacted at [email protected].


For maintenance, reuse, reverse engineering, and reimplementation, software developers frequently need to examine source code to understand object-oriented software systems. The ability to learn and understand software systems from source code is greatly enhanced by visualizing the software systems at higher levels of abstraction, rather than seeing them as nebulous collections of classes and method implementations. Design patterns provide this medium-grained abstraction and can be an effective tool for understanding object-oriented systems.

Visualizing object-oriented programs as a system of interacting patterns requires detecting, identifying, and classifying groups of related classes (clusters) in program code. These visualizations represent either known (design and code) patterns, or clusters that perform an abstract task and are not necessarily a known pattern solution.

In this article, I present DP++, a tool that automates design-pattern detection, identification, and classification in C++ programs. DP++ currently identifies most of the structural and a few of the behavioral patterns described in Design Patterns: Elements of Reusable Object-Oriented Software (Addison-Wesley, 1995), by Erich Gamma, Richard Helm, John Vlissides, and Ralph Johnson (the Gang of Four, or "GoF"). An executable version of DP++, which runs on Windows 95/NT, is available electronically from DDJ (see "Resource Center," page 3) and at http:// indus.cs.uah.edu/.

Design Patterns in Software Systems

Pattern solutions are defined and represented by a general structure of the classes in the pattern, and an assignment of responsibilities to the participating classes. These pattern solutions can be customized for varying needs and situations. The use of design and implementation patterns is either consciously built into a system design, or pattern-like solutions get created as a result of problem solving and implementation by first principles (developing solutions from scratch). While several popular design patterns have been catalogued and described by the GoF in Design Patterns, Frank Buschmann's Pattern-Oriented Software Architecture: A System of Patterns (John Wiley & Sons, 1996), and other books, most designers are not aware of these pattern solutions. Consequently, designers and developers are independently rediscovering and implementing pattern solutions in constructing software systems. The pattern movement hopes to reduce this rediscovery of patterns and to promote the reuse of pattern solutions through cataloguing and creating solution repositories.

Typically, it is hard to recognize pattern use in real-world software systems -- unless you know what you are looking for, and then carefully and methodically search for the pattern. This is because the final forms of pattern implementations in real-world solutions are generally different from the simplistic and isolated examples we are used to seeing in textbooks. Textbook-represented clean structures of patterns get convoluted in real-world software systems. Why? Because influences, such as the domains in which the patterns are implemented, change factors such as the number of participant classes and their responsibilities, participant classes performing multiple roles, developer preferences and styles, programming language features, and performance issues.

The strategy I propose here, and have implemented in DP++, for identifying pattern use in object-oriented programs is based on the structural relationship between classes and objects. The approach also uses heuristics and empirical information derived from design and implementation metrics for pattern identification. The patterns that are most meaningfully amenable to identification using this approach are the structural patterns (and some of the behavioral patterns) described in the GoF's Design Patterns. Structural patterns deal with the composition of classes or objects. Behavioral patterns, on the other hand, characterize the ways in which classes or objects interact and distribute responsibility.

This approach to pattern detection and identification focuses on the ability to identify key structural and functional relationships between classes and objects. Most structural patterns can be reduced to requiring minimum key structures that can point to the usage of the patterns in larger systems. The key structural relationships of interest for pattern identification are inheritance, aggregation, and uses. While inheritance is easy to identify, aggregation can have several forms in implementation. Aggregation could be implemented either as a physical containment, in which case the component objects are completely encapsulated inside the aggregate object, or by references/pointers of the component type that are initialized during program run time. Aggregation of the latter form provides the greatest flexibility, since the references can be set and changed at run time to refer to a variety of related objects. Typically, the aggregation relationship is detected by looking for data-member declarations in classes that are of other user-defined class types. The "uses" relationship represents an arbitrary dependency between objects of two classes. This relationship is generally implemented through method parameters, and therefore detected by examining method parameter types.

Another powerful and effective feature of object orientation that is necessary for the realization of several patterns is polymorphism. While polymorphism allows parent-class-method overriding in child classes, objects of the child classes can be used and manipulated as objects of parent-class types. This concept offers great flexibility for programs to choose and select, at run time, the types of objects that are created and manipulated. Since the real value of polymorphism can only be seen when the program is executed, it is difficult to use polymorphism effectively in pattern detection and identification from static program implementations. However, polymorphic methods (marked virtual in C++) provide valuable information that can be used with pattern heuristics. This, however, requires parsing method implementations for detection of a method's uses, referenced by relationships. In particular, we need to track the usage patterns of a class's virtual method by other methods and instances in subclasses where the virtual methods are overridden.

Another component of object-oriented systems that deserves specialized treatment is the interface. Interfaces in a C++ implementation are abstract classes that have one or more member functions that are declared purely virtual (that is, methods with no implementation). These virtual member functions of an interface class are implemented by overriding in the derived classes. An interface helps define a contract of services that clients can expect to be present and used. Interfaces provide for a useful design and implementation technique, frequently referred to as "programming to an interface," which decouples users of services (clients) from implementation classes. Finally, template classes are also used in the solutions of several patterns, and therefore the identification of template-class declarations and the instances of their use is also important for pattern identification.

Thus, a list of automatically identifiable structural elements of an object-oriented system should include: interfaces (abstract classes), base classes and subclasses, template classes, inheritance relationships, aggregation by physical containment of instance variables, aggregation by references/pointer instance variables, and method parameter-based uses relationships.

To illustrate how pattern-detection algorithms can be developed, consider the Composite structural pattern. Composite patterns are useful to solutions that require individual objects (components) and their composition (collection) to be treated identically. Systems that can benefit from the use of this pattern solution include programs like file systems, which treat files (components) and directories (collections of files) uniformly, and drawing packages, which manipulate graphics primitives such as lines, rectangles, circles (components) and their groups (collections of graphics primitives) uniformly.

The basic structure of a Composite pattern requires a part-whole relationship between parent and child classes. This aggregation relationship is typically implemented as a reference from the composite child class to a parent class and has a cardinality of 1-to-N. In terms of structure, there is a loop back from a child class to a parent class. Therefore, the detection of a composite pattern hinges on the detection of this loop in branches of class hierarchies. Figure 1 shows the general form of this minimum necessary structure for detecting the presence of a Composite pattern.

The Decorator pattern, which provides the ability to dynamically add responsibilities to objects, requires a basic structure similar to the Composite pattern. The key to implementing this pattern is also an aggregation relationship from a child class to a parent class. However, it differs from the Composite pattern in the cardinality of the aggregation relationship from the child class to parent class, which is 1-to-1 for the Decorator pattern. Both Composite and Decorator patterns require a recursive structure that can be extended in an open-ended fashion.

The Adapter pattern, which addresses the conversion of the interface of a class to what is expected by clients, has two forms of solutions. The Class Adapter pattern uses multiple inheritance to adapt one interface to another; whereas the Object Adapter relies on object composition. The detection of the Class Adapter is triggered when multiple inheritance is detected, and the detection of the Object Adapter is triggered by the discovery of a composition, which is followed by an examination of the method implementations of the adapter (the composite class) for uses of methods declared in the "adaptee." Similar strategies have been developed and implemented for the identification of patterns such as the Bridge, Facade, Flyweight, Proxy, Template Method, and Chain of Responsibility.

DP++: An Automated Pattern Identification Tool

DP++ automates the detection, identification, and multiview visualization of design patterns and clusters in C++ programs. The overall architecture of DP++ consists of the three major subsystems shown in Figure 2. As a first step, the tool does a partial compilation of the C++ project source to generate the project's source browser database files. DP++ uses Microsoft Visual C++ 5.0 for project compilation and browser database generation. Alternatively, users can specify their own source browser files for DP++ to analyze directly. Information about the project's class structure and functionality is subsequently queried from the browser data files and stored in several internal data structures for fast access and manipulations. The pattern-detection algorithms work on these internal data structures.

The display components of the tool allow for viewing the program structure, patterns detected, and clusters of related classes. As Figure 3 shows, these components provide three ways to view project information:

  • A class-model window (graphical view) that shows the program structure and patterns as sets of class diagrams.
  • A tree-view window to display project class hierarchies.
  • A text-view window to display class content and information about patterns and clusters.

DP++ currently detects most of structural patterns primarily based on the detection of structural relationships. It also identifies clusters of functionally related classes, which may not necessarily represent any known pattern, but do represent an abstraction in the program. I've used DP++ to successfully identify patterns in several commercial and public-domain object-oriented packages, ranging in size from 30 to 400 classes.

New features are currently being added to DP++, including an expansion of the suite of patterns that can be detected unambiguously, a reduction of misclassifications, and more-effective visualization techniques. A pattern repository and a pattern-heuristic component to help in identifying pattern usage are also in the works.

Case Study: Drawing Tool Kit (DTK)

To illustrate how you can use DP++, I'll examine the Drawing Tool Kit (DTK) library I implemented several years ago. DTK was implemented in C++ for both Windows- and UNIX-based applications.

At the time I developed DTK, I didn't know anything about design patterns. The DTK library was designed and implemented entirely by first principle -- everything was designed from scratch using an iterative approach and then implemented in C++.

DTK is a component of a "retargetable architectural" solution used in the development of several commercial products that address the domain of printing and plotting from application software to a variety of output devices. DTK acts as a bridge between the various types of application output representations such as PostScript, HPGL, TIFF, GIF, and a variety of output devices, ranging from simple laser printers to large wide-body document production systems.

Figure 4 shows a major section of the DTK class inheritance hierarchy as viewed in the DP++ class-model display window. The DTK library was implemented using 44 classes, 30 of which are structured as in the class hierarchy in Figure 4. The remaining 12 classes of the system were found not to be involved in any of the patterns, and therefore are excluded from the analysis in this case study. The class header files of the DTK library were provided as input to DP++ for the study.

One instance of an Adapter, three instances of the Composite, and two instances of the Flyweight design patterns were detected in the DTK library. Figure 5 shows all the six patterns and their participant classes; 23 classes participate in defining these six patterns identified in the DTK library. The participant classes of the patterns are placed in separate subgraphs, which are shown by labeled (pattern name) rectangles in the class-model display window. Relationships between classes are shown using different line characteristics, inheritance is shown using solid black lines, attribute-based uses relationships (represented by pointers and references of user-defined class types) are shown using dotted red lines, and method-parameter-based uses relationships are shown using dotted blue lines.

The Class Adapter pattern discovered in DTK represents the solution for interfacing between the language interpreters and the DTK library. Classes MSDrvCalls, Parser, and MSParser are the participants of this pattern. The MSParser class, which implements an interpreter for Windows Metafiles, is the adapter class, the Parser class is the adaptee class, and the MSDrvCalls is the target class of the Adapter pattern solution described by the GoF. This pattern in DTK represents the solution designed for supporting new representation languages, which requires only developing new (adapter) interpreters that retarget the representation language primitives to the primitives in the DTK (adaptee).

Three different instances of Composite pattern are identified in the DTK library. The participants of the Composite pattern #1 (Figure 5) are Graphics, Container, Page, HeapPage, and PageElement. The component classes of this pattern are Graphics and all its descendant classes. The HeapPage is the composite class of the pattern. While Container and Page are just intermediate classes in the branch from the Graphics class to the HeapPage class, the PageElement class plays the role of an iterator for HeapPage. This pattern represents the solution of a page as a collection of Graphics type items.

Two other Composite pattern usages are also detected in the DTK library. Composite #2 represents a collection of line styles. The participant classes of this pattern are Resource, LineResource(Component), LineStyleResource, LineStyleTableResrc(Composite), and LineResrcElement(Iterator). Composite pattern #3 is similar to Composite #2, except that it represents a part-whole hierarchy of color objects.

The Flyweight pattern supports the sharing of repetitive or less-frequently changing information between object instances. Two separate uses of the Flyweight pattern were detected in the DTK library. Lines, Arcs, Circles, and similar geometric primitives can all be reduced to a collection of line segments. These line segments can have several characteristics (attribute bundle) such as Weight, Color, Style, and End-Types. The values for this bundle of line characteristics or attributes tend to repeat for a large number of line segments. Therefore, it is a better design from the performance viewpoint to split a traditional line segment class (which would keep the line coordinates and attribute information in one class) into two classes, one for the line coordinates (Line, Arc, Circle, and so on) and one for the line attributes (LineAttr). This solution represents the design of a Flyweight pattern. In DTK, the participants of the Flyweight pattern #1 are the classes Line, Arc, and Circle. These are the context-dependent (extrinsic) classes; the LineAttr (Flyweight) class is the context-independent (intrinsic) class, and therefore the shareable class. The participants of the Flyweight #4 are the Text (extrinsic) class and TextAttr (Flyweight and intrinsic) class.

The six identified patterns cover all the essential concepts designed and implemented in the DTK library. However, there are additional attributes and parameter-based uses relationships between classes that exist in the DTK library, but are not covered in the pattern classes and relationships. The presence of these attributes and parameter-based relationships is reflected by the use of additional class types as data-member or method-parameter declarations. Figure 6 shows the classes and uses relationships, which are not covered in any of the detected design patterns, along with the identified pattern solutions. The relationships in Figure 6 cover all of the major static interactions between classes of the DTK library.

The views represented in Figures 4, 5, and 6 illustrate the class-inheritance model, pattern solutions used, and static class interaction model of the DTK library. These views capture the essence of the DTK solution and provide complementary information that is valuable for understanding large object-oriented systems.

Conclusion

The case study of pattern-use identification in the DTK library demonstrates that automated tools such as DP++ can succinctly identify, capture, and represent the internal architecture, pattern solutions (concepts), and static interaction between classes from the project source code. The DTK study shows that the ability to visualize a system-level architecture as a collection of interacting patterns and relationships rather than as a nebulous collection of classes can significantly aid in the process of learning and understanding object-oriented software systems.

The approach to identifying design patterns in DP++ relies on reducing known patterns to the minimum necessary and identifiable structure(s) required by the pattern solutions. However, an approach based solely on pattern structures is not complete because pattern structures are not sufficiently unique. Several patterns tend to use similar basic structures. Thus, to reduce erroneous identifications (misclassification or completely missing the presence of patterns) the approach implemented by DP++ is in the process of being extended to use design heuristics and empirical data in resolving the presence of patterns and pattern-like solutions. The heuristics and empirical data are to be derived from design and implementation metrics, which evaluate the structure and functional characteristics of classes and relationships. A repository and mechanisms to specify thresholds for pattern identification are also in the process of being incorporated in DP++.

DDJ


Copyright © 1998, Dr. Dobb's Journal


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.