Code Generation Templates Using XML and XSL


January 2002

Cristian Georgescu

XML is indisputably the long-awaited lingua franca for defining and transmitting structured data. With a little help from XSL, it can also generate C++ code for you.


Introduction

Dealing with complexity is one of the biggest challenges of programming. There is a great need for tools that help us to better understand and manage software complexity. Such tools would not only help create better code, but would also increase software development productivity. Code generators are tools that generate complex code from a simpler model. Code generators are therefore an ideal candidate for helping cope with software complexity.

XML is an open standard of the W3C (World Wide Web Consortium) designed for describing structured data. XSL (eXtensible Stylesheet Language) is a companion language used to transform and format XML. While XML is a way of representing information, XSL is the language for manipulating information [1].

XML and XSL have emerged from the need to exchange and present data over the Internet. Another reason for the emphatic acceptance of XML is that this technology has an unprecedented capability to yield surprisingly elegant solutions for the most unexpected problems. And perhaps the best kept secret of today’s hype around XSL is that it gives you the power to create your own code generators. Using XML and XSL together to create custom code generation templates may be yet another killer app for these two technologies.

In this article, I will explore the possibility of creating custom code generators using XML and XSL by presenting a simple example. XML will contain a representation of the conceptual model of the problem, whereas XSL will do the work of generating the code, based on information extracted from the XML model.

I assume the reader already knows what XML is and how it is structured, as well as the basics of XSL. There are many books and sources that cover XML and XSL in more detail. For a detailed introduction to these topics, a good starting point is [2] and [3]. For in-depth coverage of XSL, see [4] and [5].

Code Generators

Coping with Complexity

A difficult problem will usually lead to a difficult solution and a complex program. Good design can lead to simpler and more understandable code, but there are limits to what good design can do. Mixing the information contained in a conceptual model with the artifacts of the language obscures the ideas behind the code, makes tracking the requirements into code more difficult, and affects the capacity to make easy changes to the implementation. One way to address this issue is to separate the conceptual model from the process of converting the model into code and to use a code-generating tool to create the final code.

A code generator has two inputs: the information model containing any data describing the conceptual model of the problem domain and the implementation logic that contains the processing instructions necessary to generate the code (see Figure 1). A meta-model that describes the structure of the information model is also necessary. The meta-model is a model’s model that contains metadata or information about information. A second type of code generator has a more elaborate architecture that involves the creation of an intermediate design model and possibly iterating through several design models before the final step of code generation (see Figure 2).

Having a separate information model allows requirements to be mapped directly to the elements of this model. A separate implementation logic, on the other hand, facilitates changes in the implementation. These changes can be either small coding style changes or more drastic ones like switching to a different implementation language. Any redundancy in the generated code will not affect the information model, but rather it will be dealt with by the implementation logic.

The Case against “Copy and Paste”

Everybody agrees that “copy and paste” is evil because it leads to code duplication, which makes the resulting code more difficult to maintain. Factoring out the redundancy into a separate function or class is the better approach. But a fact of life remains: “copy and paste” is not only a quick and dirty way to reuse code but, in some cases, also the only alternative.

When dealing with code reuse, not all languages are created equal. C++ has arguably the most powerful built-in language tools for code reuse: inheritance, templates, and macros. Yet there is intrinsic code redundancy required by C++ that may be difficult to avoid. For instance, the duplication of a method signature declared in the .h file and implemented in the .cpp file demands that changes to one parameter require changes to be made in two different places in code: the .h file and the .cpp file.

This is just the tip of the iceberg. A bigger source of redundancy is that during the translation of a conceptual model into code, one piece of otherwise indivisible information cannot always be mapped to only one language artifact. The same piece of information then has to be repeated in different places in the code. This redundancy makes the code more difficult to understand, as well as harder to change.

As an example, consider a simple Employee class with only a few properties. The conceptual model of the class is presented in Figure 3. Using a common language idiom, a property is implemented as a private field with associated public get and set methods. The resulting design model and code is also presented in Figure 3. In this approach, the property name and its type are the pieces of information that have to be duplicated in several places in code.
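
Figure 3 is not reproduced here. As a rough illustration of the idiom, a hand-written version of the class might look like the following sketch. It assumes the Name and Salary properties used in the example later in the article; the accessor names and the trailing-underscore field names are illustrative choices, not taken from the figure.

```cpp
#include <string>

// Hypothetical hand-written Employee class illustrating the property idiom.
// Each property name shows up three times (field, getter, setter), and each
// property type shows up three times as well (field type, getter return
// type, setter parameter type).
class Employee
{
public:
    // Property "Name" of type string
    const std::string& getName() const { return name_; }
    void setName(const std::string& value) { name_ = value; }

    // Property "Salary" of type double
    double getSalary() const { return salary_; }
    void setSalary(double value) { salary_ = value; }

private:
    std::string name_;     // backing field for Name
    double salary_ = 0.0;  // backing field for Salary
};
```

Renaming Salary, or changing its type, means editing the field, the getter, and the setter in step, which is the kind of duplication a code generator is meant to absorb.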

Other languages, like the new Microsoft C#, may have explicit support for properties, by introducing an additional language feature and thus making the implementation of properties less reliant on the vilified “copy and paste.” However, there is no language that can be infinitely extensible in order to incorporate all desirable features needed by a scrupulous no “copy and paste” policy.

Yet another source of information duplication appears when additional production-related issues are addressed by adding code to an initially simple class prototype. Real production code often requires additional error-handling code, unit tests, and extensive documentation. A simple class may need a wrapper to be used as a command-line tool or as a COM component. In the case of a COM wrapper around an existing C++ class, the class information will be duplicated in an IDL file, and COM-related support has to be added, usually by delegating the wrapper calls to a contained object. A GUI interface for the COM component will require some Visual Basic code to display and manipulate the class functionality. Persistence to a relational database will require SQL code to create underlying tables for storing the class objects, as well as SQL code to select, update, insert, and delete rows in these tables. The final code required by a simple class in the real world may contain thousands of lines distributed over many files containing code written not only in C++ but also in other languages like SQL, Visual Basic, and Java. The information contained in these files is largely redundant, stemming from maybe just a few properties.

Other Advantages

Having the code generated automatically makes the task of enforcing consistent coding conventions much easier. If you ask two people to write a simple Employee class, you will probably get two different versions of the code. Using a code generator to create Employee will ensure coding consistency. Any set of coding conventions, style, rules, and best practices can be uniformly enforced for all classes by incorporating them in the code generator.

The technique of restructuring code, known as refactoring, is part of every programmer’s tools of the trade, but it has always been a tedious and error-prone manual “cut and paste” activity. Once the implementation logic is moved out of the code and into the code generator, refactoring becomes simpler: low-level “cut and paste” is replaced by higher-level modifications of the code generation logic.

The only drawback of automatic code generation is the overhead incurred by the need to create the information model and to write the implementation logic. The danger lies in ending up with a hodgepodge of model metadata, code-generating templates, and auxiliary code files written in different languages. But this cost is far outweighed by the fact that people with different skills can work on the two models: business knowledge goes directly into the information model, whereas programming skills are required to create the implementation logic. This approach radically changes the software development process by separating the modeling activity from design and implementation. Collaboration is still required between the business expert and the programmer, but each can work independently, one on the business information and the other on the implementation.

XSL Code Generators

Why XML and XSL?

An important decision related to custom code generators is the format used for the representation of the information model and the implementation logic.

XML has established itself as a widely accepted industry standard: a universal data format used by applications that exchange data over the Internet in an application-independent and platform-neutral manner.

XML standardization, acceptance, and flexibility as a data format make it an obvious choice to store the information model. Therefore, a natural choice for the implementation logic is to write it as an XSL stylesheet that uses the information contained in the XML information model to generate the final code.

Alternatively, XSL can be used to indirectly generate a UML model rather than directly generate code. The MOF (Meta Object Facility) language defined by OMG can be used for describing UML-based design meta-models. Efforts to provide a standard way to represent UML models as XML are under way. XMI (XML Metadata Interchange Format) is an XML-based representation of UML. Generating a design meta-model from an XML information model can be done by using either XMI or MOF. Converting to an intermediate XMI model should be straightforward because XSL simplifies the XML to XML transformations. Such an approach to code generation, with an intermediate design model, is presented in Figure 2.

I will show how to build an XSL code generation template that can read XML files containing the information model and write out the resulting generated code. I will concentrate on a simple example that will give the reader a better understanding of how to use XML and XSL to write custom code generators.

One important advantage of using an XSL stylesheet for translating the information model into code is that XSL is a declarative language. XSL is declarative (as opposed to procedural) in the sense that it describes the results of the translation rather than the steps to perform it. This results in a higher level of abstraction. XSL also has some procedural control structures, but these features are somewhat limited when compared to the power offered by a real procedural language. Therefore, complex code generation logic is better addressed by a combination of XSL and a procedural language [6].

How It Works

Any XML document is structured like a tree with the nodes having a name, a text value, named attributes, and child sub-nodes. XML’s simple and generic design makes it possible that virtually any kind of information can be stored in an XML document. An XML document can be associated with a DTD (Document Type Definition) document that contains metadata describing the structure of the XML document. The XML document can be checked against the corresponding DTD file to ensure that its structure is valid.

An XSL stylesheet consists of a set of rules that determine how specific elements from the XML document should be processed. Each rule has a match criterion that specifies the pattern of elements for which the rule is activated. Each rule also has an associated transformation that acts on the matching elements. Rules are composed hierarchically because each rule can trigger other rules by using the <xsl:apply-templates> tag, which marks the hot spots where other rules can bind and become active.

The following XSL tags are the most frequently used, and their meaning is intuitive. <xsl:value-of> is used to select the value of an element from the XML document. <xsl:text> is used to specify custom text. <xsl:if> is the XSL analog of an if statement without an else branch, and <xsl:choose>, with its <xsl:when> and <xsl:otherwise> children, is the analog of the if-then-else and switch control structures. <xsl:for-each> iterates through a collection of XML elements. <xsl:include> simply includes the contents of one stylesheet into another.

An XSL processor is required to apply an XSL stylesheet to an XML document. The XSL processor reads both the XML document and the XSL stylesheet, and then applies a set of transformation rules from the stylesheet to the XML document in order to produce the desired output. In this article, I have used XT, a freely available XSLT processor developed by James Clark. For additional information on installing and using this program as well as other tools and information related to XML and XSL, see [7].

The code generator will contain the following files (see Figures 1 and 2):

  • The meta-model with the XML metadata schema described in a DTD file.
  • The information model stored in XML files.
  • The XSL code template files used to generate the source code.

The data representing the information model that is stored in the XML file has to follow the structure defined by the DTD meta-model file. The XSL template extracts information from the XML file and generates the source code.

The software development process tailored for the XML and XSL code generation techniques (see Figure 1) consists of several steps:

  1. The business analyst creates the conceptual model that specifies the structure of the data for the code generator in the form of XML metadata in a DTD file. The metadata includes the name of the elements that can be contained in the XML file and their possible attributes, as well as the relationship between these elements.

    For example, a first cut at the UML-like meta-model from Figure 1 might only be composed of class elements. Each class should have a required name attribute:

    <!ATTLIST class
      name NMTOKEN #REQUIRED>
    

    Each class element could optionally have a properties element (“zero or one” cardinality of an element is designated by ?):

    <!ELEMENT class (properties?)>
    

    The properties element is a collection of elements that could contain several property elements (“zero or many” cardinality is designated by the *):

    <!ELEMENT properties (property*)>
    

    Each property should have a required name attribute and a required type attribute:

    <!ATTLIST property
      name NMTOKEN #REQUIRED
      type CDATA #REQUIRED>
    

    These DTD statements specify a simplified meta-model that will be expanded to a more realistic example in the following section.

  2. The business expert stores data for the information model in an XML file. This data consists of specific elements with their attributes and their sub-elements. The structure of these elements has to follow the structure defined by the DTD file created in Step 1. For example, an Employee class that has a Name property of type string and a Salary property of type double is stored as the following XML file:

    <class name="Employee">
      <properties>
        <property name="Name"
                  type="string">
        </property>
        <property name="Salary"
                  type="double">
        </property>
      </properties>
    </class>
    
  3. The programmer writes the code generation logic in the form of XSL scripts. The XSL scripts containing the code templates extract the data from the information model and use it to generate source code.

    For example, the following XSL template operates on the XML elements named class by inserting the C++ keyword class, selecting the name attribute of the current element class, adding the {, activating the templates for the sub-sub-elements property contained in the sub-element properties, and finally inserting the terminating };:

    <xsl:template match="class">
     <xsl:text>class </xsl:text>
     <xsl:value-of select="@name"/>
     <xsl:text>
    {</xsl:text>
     <xsl:for-each
        select="properties/property">
      <xsl:apply-templates select="."/>
     </xsl:for-each>
    <xsl:text>
    };</xsl:text>
    </xsl:template>
    

The XSL template for the sub-elements property inserts some indenting white space, selects the attribute type of the current element property, and inserts a white space and the attribute name prefixed by _ and terminated by ;:

<xsl:template match="property">
 <xsl:text>
    </xsl:text>
 <xsl:value-of select="@type"/>
 <xsl:text> _</xsl:text>
 <xsl:value-of select="@name"/>
 <xsl:text>;</xsl:text>
</xsl:template>

The generated code will be:

class Employee
{
    string _Name;
    double _Salary;
};

While this is an over-simplified example, it should nevertheless give an idea of the steps involved in writing a code generator.
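
For comparison, the same two rules can be written procedurally, along the lines that note [6] suggests for more complicated generation logic. The sketch below is only an illustration: the Property and ClassModel structs are hypothetical in-memory stand-ins for the parsed XML model, and generateProperty and generateClass are made-up names that mirror the property and class templates above.

```cpp
#include <string>
#include <vector>

// Hypothetical in-memory stand-ins for the XML information model.
// A real tool would fill these in from a parsed XML document.
struct Property
{
    std::string name;  // value of the "name" attribute
    std::string type;  // value of the "type" attribute
};

struct ClassModel
{
    std::string name;                  // value of the class "name" attribute
    std::vector<Property> properties;  // the properties/property elements
};

// Mirrors the "property" template: a new line with indentation, the type,
// a space, the name prefixed by "_", and a terminating ";".
std::string generateProperty(const Property& p)
{
    return "\n    " + p.type + " _" + p.name + ";";
}

// Mirrors the "class" template: the keyword, the class name, the opening
// brace, one line per property, and the terminating "};".
std::string generateClass(const ClassModel& c)
{
    std::string out = "class " + c.name + "\n{";
    for (const Property& p : c.properties)
        out += generateProperty(p);
    out += "\n};";
    return out;
}
```

Calling generateClass on a model holding the Employee class with its Name and Salary properties yields the same text as the generated code shown above. The loop over c.properties plays the role of <xsl:for-each>, and the call to generateProperty plays the role of <xsl:apply-templates>.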

A Concrete Example

I will expand the previous example to build a non-trivial code generator. I will follow the same step-by-step approach as in the previous section.

The Meta-Model

The meta-model has to be developed by a business analyst, and specifies the structure of the model. Our sample meta-model consists of classes with properties and associated methods. Figure 4 shows this meta-model using UML. The resulting meta-model is defined in the DTD schema file, Model.dtd, in Listing 1.

Each class element has a name and belongs to a package that can be mapped to a C++ namespace. Each class has a set of dependencies and uses relationships that map to #include C++ statements. Each class has parent classes that map to C++ base classes. Each class has methods that map to C++ functions and properties that map to C++ data members with additional get and set functions.

Each method element has a list of exceptions, a return type, and a set of parameters. Each of the parameters has a name, a type, and a possible default value. Additionally, methods have a visibility attribute that can take values in the set: (public, protected, private) with a default value of public. Methods also have a modifier attribute that can be virtual or static and a const attribute that can be true or false. The following DTD snippet specifies the structure of the method element:

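<!-- modifier is optional: an enumerated attribute
     cannot declare "" as its default value -->
<!ATTLIST method
  name NMTOKEN #REQUIRED
  type CDATA #REQUIRED
  visibility (public | protected |
              private) "public"
  modifier (virtual | static) #IMPLIED
  const (true | false) "false">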

Each property element has a name and type, as well as additional attributes that specify if get and set functions are required and if there is a data member to store the value of the property. All classes, properties, methods, and parameters have an info field that is used for documentation and will map to C++ comments.
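
The listings are not reproduced here, so the following is only a sketch of the kind of C++ these mappings could produce, not the actual generated Employee.h. The namespace name hr, the constructor, the parameter name percent, the body of increaseSalary, and the choice to make SSN read-only are all assumptions made for illustration. The Object base class is inlined to keep the sketch self-contained, where generated code would use an #include.

```cpp
#include <string>

namespace hr  // package -> C++ namespace (the name "hr" is hypothetical)
{

// Root of the inheritance hierarchy. Generated code would pull this in with
// an #include statement derived from the class dependencies.
class Object
{
public:
    virtual ~Object() {}
};

// info field -> C++ comment: "An employee of the company."
class Employee : public Object  // parent class -> C++ base class
{
public:
    // Hypothetical constructor so that the read-only property can be set.
    Employee(const std::string& name, const std::string& ssn, double salary)
        : name_(name), ssn_(ssn), salary_(salary) {}

    // Property Name: get and set functions requested (read-write).
    const std::string& getName() const { return name_; }
    void setName(const std::string& value) { name_ = value; }

    // Property SSN: only a get function requested (read-only, assumed).
    const std::string& getSSN() const { return ssn_; }

    // Property Salary: read-write.
    double getSalary() const { return salary_; }
    void setSalary(double value) { salary_ = value; }

    // Method with visibility="public", modifier="virtual", const="false".
    // The body is an illustrative guess; a generator would emit a stub.
    virtual void increaseSalary(double percent)
    {
        salary_ += salary_ * percent / 100.0;
    }

private:
    // Data members that store the property values.
    std::string name_;
    std::string ssn_;
    double salary_;
};

}  // namespace hr
```

Each commented line corresponds to one attribute or element of the meta-model, which is what lets a template emit it mechanically.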

Designing the meta-model is perhaps the single most critical step of building the code generator because both the model and the code generator will depend on the meta-model. A carefully thought out meta-model will allow enough flexibility without sacrificing simplicity.

The Information Model

Based on the meta-model, a business expert builds the information model. The information model becomes a repository of business knowledge. In this case, the information model is stored in Employee.xml (Listing 2). Our example has an Employee that is an Object (no pun intended here) that has a Name, an SSN (Social Security Number), and a Salary. The Employee class has only one very appealing business method: increaseSalary.

The Object class is the root of our inheritance hierarchy, and it is introduced only as an example of handling inheritance relationships. For simplicity, the inheritance is the only class relationship considered here, and other relationships like association or containment are not handled by the XSL code-generating template. Their implementation would closely follow the XSL code that handles the inheritance relationship. The model for the Object class is defined in Object.xml (Listing 3).

The generation of Employee.h (Listing 4) that defines the Employee class, based on the XML model in Employee.xml and using the XSL translation logic from CppClass.xsl, can be done with the following command line:

C:\>xt Employee.xml CppClass.xsl Employee.h

In a similar way, I have generated Object.h, which contains the definition of the class Object (Listing 5).

The Implementation Logic

CppClass.xsl (Listing 6) contains part of the XSL code generating template. It mainly consists of the top-level template for the class element defined by <xsl:template match="class">. This template outputs boilerplate code and, in turn, calls other templates. Boilerplate code that is output as text to the generated file is usually surrounded by the <xsl:text> tag. The templates for handling the #includes and the class inheritance can also be found in Listing 6. Listing 7 shows the implementation of the CppMethod.xsl stylesheet, which handles the code generation for the C++ class methods.

The code generation always follows the same pattern: the stylesheet writes to the output the code for the current element and then iterates through the sub-elements and calls the corresponding templates associated with each specific sub-element. For instance, in this case, the class template generates the skeleton for the class code and then iterates through all method elements; for each method, it applies the method template. The <xsl:for-each> XSL control structure is used to perform this iteration:

<xsl:for-each
  select="methods/method">
  <xsl:apply-templates select="."/>
</xsl:for-each>

The method template, in turn, generates the method code and iterates through all the param elements by applying the param templates, and so on.

It is straightforward to implement simple language idioms via code generation. In my example, the class properties are translated into get and set functions and an associated data member. Enough flexibility is provided through the XML element attributes so that read-only, as well as read-write, properties can be generated. The translation of properties into code is handled by CppProperty.xsl (Listing 8).

Extending the Horizon

In an enterprise application, the same business object is distributed across different tiers and is often implemented in different languages: C++, Visual Basic, Java, and SQL. A code generator can use the same information model to automatically generate the corresponding pieces of code in different languages. The task of generating documentation can also be automated by a special XSL stylesheet that extracts only the specific documentation tags and formats them as, for example, an HTML document.

Up to this point, I have only generated C++ code. However, the information model for the Employee class (Listing 2) is an abstract model not necessarily bound to C++. Only minor modifications are required to the CppClass.xsl, CppMethod.xsl, and CppProperty.xsl templates in order to generate Java code or even Visual Basic code.

In order to prove the power of this approach, I have considered the generation of SQL code that can be used for persisting the C++ objects. The additional SqlTable.xsl and SqlProperty.xsl stylesheets were used to generate the Employees.sql script (all available for download at <www.cuj.com/code>). This SQL script creates an Employees database table that can be used to store the Employee objects.
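
The SQL stylesheets are not shown in this article. To illustrate the idea of reusing one model for a second target language, here is a procedural sketch in C++ rather than XSL. Everything in it is an assumption: the Property and ClassModel structs stand in for the parsed XML model, the mapping from model types to SQL column types is invented, and the table name is formed by simply appending "s" to the class name.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory stand-ins for the XML information model.
struct Property
{
    std::string name;
    std::string type;
};

struct ClassModel
{
    std::string name;
    std::vector<Property> properties;
};

// Assumed mapping from model types to SQL column types; the real
// stylesheets may use a different one.
std::string sqlType(const std::string& modelType)
{
    if (modelType == "double") return "DOUBLE PRECISION";
    if (modelType == "int")    return "INTEGER";
    return "VARCHAR(255)";  // "string" and any unknown type
}

// Emits a CREATE TABLE statement with one column per property.
std::string generateCreateTable(const ClassModel& c)
{
    std::string out = "CREATE TABLE " + c.name + "s (";
    for (std::size_t i = 0; i < c.properties.size(); ++i)
    {
        const Property& p = c.properties[i];
        out += (i == 0 ? "\n    " : ",\n    ");  // comma between columns
        out += p.name + " " + sqlType(p.type);
    }
    out += "\n);";
    return out;
}
```

Given an Employee model with Name, SSN, and Salary properties, generateCreateTable produces a statement that creates an Employees table with one column for each property. The model itself is untouched; only the generation logic differs from the C++ case.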

Conclusion

Code generators can help you cope with software development complexity by separating the conceptual model of the problem from its translation into code. Code generators can save time by generating large amounts of code from smaller information models. They also handle code duplication (“copy and paste”) and help with code refactoring (“cut and paste”) by factoring out the redundant information and by isolating changes to the information model. Increased code quality can be achieved by easily enforcing consistent coding standards during the code generation. Code generators will not replace the need for good design, but they can certainly be made to do a lot of the dirty work for you.

Using XSL to create your own code generators based on models stored in XML files offers the advantage of using XML as a standard and flexible data format together with XSL as a powerful processing language that was specially designed to operate on XML.

The increased leverage over code production and maintenance obtained by using a code generator does come at the expense of having to create a separate information model, as well as the effort to specify and customize the code generation logic. But the benefits far outweigh the additional overhead, especially in larger projects.

I encourage you to experiment with the code samples provided in this article and to modify them in order to fit specific needs. With minor changes to the XSL templates, code can be generated in Java instead of C++. See <www.cuj.com/code> for the complete source code.

Notes and References

[1] Technically, XSL can be further split into XSLT, which defines the transformations, and XSL Formatting Objects. In this article, I will consistently use XSL to refer to all of the technologies defined under the XSL umbrella.

[2] Elliotte Rusty Harold and W. Scott Means. XML in a Nutshell: A Desktop Quick Reference (O’Reilly & Associates, 2001).

[3] Mark Birbeck, et al. Professional XML (WROX Press, 2000).

[4] Michael Kay. XSLT Programmer’s Reference (WROX Press, 2001).

[5] Neil Bradley. The XSL Companion (Addison-Wesley, 2000).

[6] Another way to extend the present approach is to replace the XSL stylesheet with a C++ or Java program that parses an XML file and walks through the parsed tree-like representation and writes the generated code. This programmatic solution might be more scalable than a simple XSL stylesheet when the XSL code generator logic becomes too complicated.

[7] James Clark. XML resources at <www.jclark.com/xml/>. XT is a free implementation of XSL Transformations available at <www.jclark.com/xml/xt.html>.

[8] John Hubbard. “Building a Professional Software Toolkit,” C/C++ Users Journal, May 2001.

[9] Fred Pace. “Modeling Metadata for API Generation,” MSDN, May 1996.

[10] Fred Pace. “Generating Code Using Templates and Metadata,” MSDN, July 1997.

[11] Mark Pollack. “Code Generation Using Javadoc,” JavaWorld, August 2000.

[12] Chris Sells. “Code Generalization and Replication,” <www.develop.com/genx/code_gen_article.htm>.

Cristian Georgescu has more than a decade of experience in writing software for the financial industry, aerospace and defense, and energy management. He is currently developing infrastructure software for a major Wall Street financial company. His present professional focus is in enterprise distributed applications. He can be reached at [email protected].

