Dr. Dobb's | Generative and Transformational Techniques in Software Engineering

Generative and Transformational Techniques in Software Engineering

Generative and transformational techniques have evolved from being very specific code generators to full fledged generic tools

June 02, 2006
URL:http://drdobbs.com/architecture-and-design/generative-and-transformational-techniqu/188701148

James Litsios is CTO and head of development at Actant. He can be contacted at [email protected].

Generative and transformational techniques (GTT) have been around in the form of tools like Lex and Yacc for more than 30 years. These techniques are all about taking source- or model-level representation, and performing analysis, transformation, and potentially generating new sources or models.

Having now reached a level of maturity, GTT has become the "swiss army knife" of software development. More specifically GTT fills the need for programmable development tools that can deal with unique business or technical domain content. As such its use spreads from adding new features to mainstream languages, process sources in older languages or in new domain-specific languages (for which no compiler exists yet), to providing the glue to blend together sources and models defined at different levels, and effectively be a universal solution to model-driven development. What all these usages have in common is that they fulfill the need to have development systems and tools that work on making and managing transformations of source and models in a fully consistent manner.

To many developers, this is a new approach to software development because it gives up relying on a "single representation" for systems, such as source code. Instead, it focuses on describing systems with multiple representations and the transformations between these representations. For some industries, say game development, transformation-based methodology is now mainstream. Who would think of writing a game purely as source code? Instead, games are built from multiple inputs such as level layouts, character AI, physical and illumination models, then brought together with the help of proprietary and third-party tools.

Transformation-based approaches have not been mainstream because of the limited availability of off-the-shelf transformation tools (parser generators and compilers, for instance) and the difficulty of writing them. Now, new transformation tools are being introduced that are both powerful and flexible, making them real contenders as building blocks for any large development. These tools, their use, and some underlying theory were the subject of Generative and Transformational Techniques in Software Engineering (GTTSE2005).

Sessions covered many of the important usage trends of transformation technology, including:

Domain-specific languages.
Aspect-oriented programming.
The use of features, reflection, modeling in case tools.
The mapping and transformation of XML and databases systems.
Source-to-source maintenance work.

It is worth mentioning that it is hard to do justice to the quality of the work that was presented at GTTSE2005. The executive committee Ralf Lämmel (Microsoft), and Joãco Saraiva Braga and Joost Visser (both of the Univeristy do Minho, Braga, Portugal) did an excellent job in leading the selection of presentations.

Models, Code Generation, and Transformations

Jean Bezivin (INRIA, LINA, University of Nantes) introduced GTT, covering both its history and underlying theory. Of everything he said, I was most impressed when he welcomed us to world of software engineering where "everything is a model"! He reminded us of the '80s, when everyone was claiming that "everything is an object" and how later, help was needed from non-OO concepts like UML, use cases, and patterns. Alas, not everything could be an object. The solution to these "failures" of the object model is now to accept everything as a model where a model is understood as a representation of a system in a certain context. This may sound like escapism into grand ideas but it turns out to be quite workable. It addresses, for example, that many of these "non-object" concepts like use cases and patterns have often lacked much "rigueur" and were often presented as recipes from a cookbook. More importantly, it is now accepted that no single "type" of model is good enough, that many different types of models are necessary to represent different aspects of a same system, and that as much emphasis needs to be put on bridging these models together as focusing on each one.

Bezivin called the system/context pair defining a model "the technical space". He also managed to present 20 years of software engineering history in three hours! My best summary of this is that:

Standards and tools frameworks for model-driven engineering exist from all the big players and academia--Eclipse and Bezivin's Atlas system come to mind.
Models can be more or less formal (mathematical, for instance) and people have a hard time agreeing on how much so.
Models of models are called "meta-models" and are useful to bring in many models into the same framework and to manage them.
Models can be transformed from one to another and that you talk about vertical transformations when you map models between different abstraction layers (for example, requirements, design, PIM, PSM, Java)and horizontal when you stay with the abstraction level such as UML, OOD, refactoring or source-code transformation.

Transformations In Functional and Database Contexts

Two tutorials reflected the wide range of application of transformation techniques: Zhenjiang Hu (University of Tokyo) focused on deterministic transformations from one algorithmic representation of a functional program to another, and Jean-Luc Hainault (University of Namur) showed jhow transformation tie all the levels a database schema together and how to take advantage of this.

Zhenjiang described how to write both clear and efficient programs. He proposes to write clear programs not bothering about efficiency, then to transform them automatically to make them more efficient. This is not low-level compiler optimization but high-level transformations where lists or trees can be merged with the corresponding transformations to their corresponding functions. An existing way to perform this optimization is to expand function calls and propagate known facts (unfolding), while searching for expressions that can be converted back into function calls (folding). This so-called "fold/unfold" method is somewhat heuristic as there is no easy way to decide what can or cannot be folded. Zhenjiang argued that a better way to transform the code is through program calculation where no guessing is necessary--"you just need to solve equations!".

There is a catch. The programs must be written using homomorphisms (a restricted form of recursion) and promotions (transformation properties of the homomorphisms) need to be defined. He then used his method to easily transform an inefficient max function built by selecting the first result of a sort function into an efficient program. Going a bit deeper he showed how different loops can be merged together in a formal manner (loop fusion) and how parallel execution also fits into program calculation. If you have ever delved some depth into functional programming you will know how tricky it is to get anywhere near this type of result. These technics are definitely impressive and supported as part of Haskell tool he has written.

Jean-Luc Hainault's tutorial was more pragmatic. He has led a team that has developed a "universal" database schema model. He has taken many different existing schema models (for example, conceptual, logical, physical, product specific, SQL, COBOL, relational, UML, and so on) and brought them all into a single model. He can then apply 40 or so primitive operations which lets him convert pretty much any type of schema to another. This is really impressive, but gets even better because many of these transformations can be run backwards--allowing reengineering from a deeper schema level to a higher one. With the addition of a COBOL analysis tool he showed us how he could make sense out of 30-year old databases. He did mention that for these types of refactoring some human help is needed. The work is available as part of the DB-MAIN tool. Anthony Cleve, a member of Jean-Luc's team, later showed us how queries could be transformed to automatically reflect schema transformations, including both COBOL and SQL sources.

Reflections and Aspects

Shigeru Chiba (Tokyo Institute of Technology) focused his tutorial on mainstream transformation usages such with the use of reflection and aspects.

Reflection can be seen as dynamic transformation allowing the application to change its behavior during runtime by intercepting calls and potentially augmenting the data structures. He has written several reflective systems including and more recently openC++, openJava, and Javassist (part of the JBOSS project). He conceded that reflection was hard to use and explained how Aspect-Oriented Programming (AOP) is a safer way to change behaviors. AOP let an advice be introduced around a point cut. The advice can include code to be called before and after and new storage needs. The point cut identifies the parts of the code that will be affected by the advice. Shigeru then came back to the difficulties of using reflection and proposed the use of dynamic aspects to resolve these where new aspects can be brought in during execution. This is particularly interesting for systems that can have no downtime. He mentioned the current effort by many to separate the definition of the advice from the point cut, a concept that he has built into GluonJ, his latest system. Finally, he dismissed criticisms on the limited scope of AOP by pointing out that he preferred to push a working solution forwards than to refine a non-working one!

The Men from Austin, Texas

I have two interests in transformation systems:

To capture complex requirements and move these into code with minimal risk.
To find a way to semi-automatically refactor large C++ applications.

I had identified two tutorials, each addressing one of these concerns directly. And by coincidence, both came from Austin, Texas. Don Batory presented his work on Feature-Oriented Programming (FOP), and Ira D. Baxter on DMS system, a commercial transformation and refactoring system.

Batory's work with features is outstanding because it is both broad in scope but applicable to real-life problems. This makes it look very usable, although there's still room for growth. The principal idea is to build a system (of potentially many programs) by mixing features together. He describes a feature as something that adds characteristics; later bringing in a more formal functional framework where features have their algebra, can be encapsulated, and can be composed together. Once all features for a system have been described, optimization transforms them into something usable. He showed how features described in a domain-specific language (DSL) can be mapped to propositional formula (a logic system). System-wide constraints are added as equations and a logic solver or optimizer can then be used to find system implementation that matches the features. This equational approach also allows mismatches in features (for instance, trying to iterate backwards on a list) to be detected by the system as overconstraints. FOP is implemented in the AHEAD system developed by his group.

Baxter presented DMS, a system for practical software maintenance. He reminded us that all software development has a life cycle, and at some point maintenance costs explode. This is where his company saves the day with a powerful transformation system. Different transformations are used for different problems such as migrating to another language, or to another system, restructuring an application, optimizing it, or performing a documentation extraction. The list goes on because the transformations are fully programmable with a lexer, parser, rule-based, and procedural rewrite system, attribute grammar system, and pretty printer. An attribute grammar is what you could use to perform cross-procedure usage or dataflow analysis to understand how one part of your program is affected by another.

One example of DMS's use involved porting 6000 C++ components from a legacy system to CORBA/RT in 15 months. Interestingly, Ira emphasized that the solution process that he proposes is iterative and yet also fits into a waterfall type process: The transformations are run over and over until "they got it right". The twist is that the original source application can continue to evolve until the rewrite team is ready, at which point the transformation is applied and the whole team can migrates to the new source basis. Listening to him, a vision takes shape where source code can be like putty and continuously reshaped by transformation rules. Finally Ira mentioned that other commercial transformation systems (JANUS, Refine5) exist as well.

Hands-on Generative and Transformational Techniques

The use and building of transformation technology were addressed through two tutorials and technical presentations. Tom Mens (University of Mons-Hainaut) talked about graph transformation and Erik Meijer (Microsoft) presented a strong case for focusing on workable solutions.

Model management is typically done interactively, such as with a UML editor. We were lucky to have Tom Mens to talk about graph transformation (GT) in the context of model refactoring. Many models are represented as graphs and GT is all about changing the graph in order to change the model. He introduced the use of GT in the AGG and Fujaba GT tools (I've included a reference to AGG in Table 1). Mens explained how graph theory could be used to show the potential impact of a transformation but also to help identify transformation compatibility and incompatibilities to allow differential change management and collaborative work on models.

Table 1

Staying at a practical level, Erik Meijer focused on workable solutions. Meijer has contributed to the development of Haskell and is a member of Microsoft's Visual Basic team where he tackles "real problems". He presented two such problems--mapping objects to XML-XSD and mapping objects to relational DBs. He presented these in the context of transformations techniques. We learned that:

XSD XML schemas have many details and extra care must be taken to interpret them.
That to a database person two classes have the same types if they share the same sets of all possible values.
To beware of NULLABLE elements in comparisons.

And the list goes on and on. We also learned about Cw, a research language based on C# that integrates database like operations and enumerable types (streams). Later, he showed how much can be built with just a few type transformations such as renaming a field, inlining a type, or merging two types. Finally he ended his tutorial by making the case for giving up the use of languages with only static typing. He argued that dynamic types can be more expressive and solve many problems better. As a long time C++ developer, I will admit that I thought this was heresy but as I write this I admit that I am intrigued.

Table 2

The technical presentation demonstrated both commercial and academic systems. I did not manage to go to all the presentations (they ran in parallel and I was often on the phone with work) so in all fairness I include a reference to each one.

I will add that Mark van den Brand's ASF+SDF tool (Table 1) is the heavyweight reference among academic tools and that I have heard much good of the TXL system. I did see the MetaBorg presentation by Martin Bravenboer and it is extremely impressive, allowing a DSL to be integrated into an existing language framework (such as Java) with full type checking with a minimal amount of definitions.

Participant's Workshop

The participants workshop had many excellent presentations. I present here a succinct overview:

Many in the audience knew that SystemC was used to specify chip systems, Patrizia Scandurra showed us how it could be generated from a UML model producing a low cost, system on chip, design environment.
Regirio Paula presented modeling solutions to manage the system-level software of power grids.
If everything is a model then you need some serious tools to manage them: Artur Boronat presented a model management system (MOMENT) based on an equation solver (MAUDE) and integrated into Eclipse. We all dream of generating highly dynamic and efficient web pages from compact high-level descriptions with good separation of data and view.
Davide Di Ruscio presented such a system and also showed how he uses abstract state machines to make successive transformations from his initially high-level model to bring it down to an automated generation of J2EE components.
Aline Lzcia Baroni presented a system that extracts metrics from UML behavioral models.
Tero Hasu showed how he is making Python available on the Sybian OS by automatically wrapping the systems API into Python compatible stubs.
Holger Krahn showed how to use transformations to take prototypal java classes and automatically build templates from them to be then used in a production work.
Bram Adams showed us how he intercepts the intermediate language of GCC4.0 (GENERIC) to weave in aspects. GENERIC is common to the GNU compiler suit so this trick no only work across all the supported languages but also allows aspects written in one language to be woven into another language!
David Benevides showed how constraint programming can be used to automate the management of feature models.
Christof Mosler presented a telecommunication reengineering system (E-CARES) based on a graph rewriting system.
Vadim Zaytsev provide a somewhat humorous account of the difficulties of finding an official description of the C# grammar for use in the context of transformation work.
Nicolas Juillerat argued for the use of a Lisp-like intermediate language (FOOD) for automated refactoring.
Anthony Cleve addressed the problem of automating code conversion when changing a DB schema.
Finally, F. Javier Piriez Garcma addressed the issue of elaborating and managing transformation plans.

Conclusion

Generative and transformational techniques have evolved from being very specific code generators to full fledged generic tools, applicable to any software domain. Their theoretical grounding is strong, for example, in compiler development and functional programming. It is important to note that major software development platforms such as Microsoft's Visual Studio and Eclipse have integrated these techniques into their workflows.

Very powerful tools are available; some of the major ones were presented at GTTSE2005. What was made clear is that GTT can radically change the way development is done and therefore developers need to revisit their current understanding of not only development techniques but also development economics! Key to GTT is that they provide leverage: It is no longer the code that is static but the transformations that build the code, therefore the code can change completely with little effort.

GTTSE2005 was a resounding success and that it showed that generative and transformational techniques have achieved a maturity that let them justify a place in mainstream development. Don't miss GTTSE in 2007.