Developers are drawn to new programming techniques when those techniques make them more effective. Until now, modeling has attracted few developers, because many believe that models offer only an awkward form of documentation and are not effective aids in creating working software. In this article, I describe a modeling technique that uses free tools and some of the ideas from OMG's Model-Driven Architecture (MDA) process. This modeling technique is interesting precisely because it is effective in speeding development.
Code reuse is effective in speeding development. I've found that modeling can facilitate a great deal of reuse in the development of a data warehouse system. Using a Common Warehouse Metamodel (www.cwmforum.org/), short templates describing the required output, and freely available tools, I was able to autogenerate most of the components of the system. The autogenerated code included the database data definition language (DDL), the relational data access objects (DB DAO layer), the online analytical processing code (OLAP DAO layer), the extract/transform/load (ETL) code, and the XML configuration files for the OLAP reporting tool. The model was reused many times, and the templates were only a fraction of the size of the final output. I estimate that creating the model and taking advantage of reuse reduced the development time on the project by 30-50 percent.
The model can be reused even further: to provide impact analysis (determining all the source fields contributing to a particular report field, or finding all the report fields affected by a change to a source field), to create unit tests, to create system documentation, and to create XML schemas for verifying input file compliance.
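As a rough illustration of the impact analysis described above, the following sketch assumes that the field-to-field mappings have already been read out of the model into a plain dictionary. The field names and the `DERIVED_FROM` structure are hypothetical; they are not part of CWM or of the project described in this article, and the sketch assumes the mappings contain no cycles.

```python
from collections import defaultdict

# Hypothetical lineage read from a warehouse model:
# each derived field maps to the fields it is computed from.
DERIVED_FROM = {
    "report.total_sales": ["fact_sales.amount"],
    "fact_sales.amount": ["src_orders.qty", "src_orders.unit_price"],
    "report.customer_region": ["dim_customer.region"],
    "dim_customer.region": ["src_customers.zip"],
}

def source_fields(field, derived_from=DERIVED_FROM):
    """Return every source (leaf) field that contributes to `field`."""
    inputs = derived_from.get(field)
    if not inputs:
        return {field}  # nothing feeds this field, so it is a source
    found = set()
    for parent in inputs:
        found |= source_fields(parent, derived_from)
    return found

def affected_fields(source, derived_from=DERIVED_FROM):
    """Return every downstream field affected by a change to `source`."""
    # Invert the mapping: field -> fields computed from it.
    feeds = defaultdict(list)
    for target, inputs in derived_from.items():
        for name in inputs:
            feeds[name].append(target)
    affected, stack = set(), [source]
    while stack:
        for target in feeds[stack.pop()]:
            if target not in affected:
                affected.add(target)
                stack.append(target)
    return affected
```

Calling `source_fields("report.total_sales")` walks back to the two order fields, while `affected_fields("src_customers.zip")` walks forward to the dimension column and the report field built on it.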
The Technique
This technique is effective for any application that contains a repeated set of steps. There are five parts to this technique:
- Create a model of the application.
- Write a miniapplication that implements the first instance of the repeated set of steps.
- Pull apart the miniapplication into template files, replacing the named parts in the repeated set of steps with recognizable strings for substitution.
- Write code that can rebuild the miniapplication from the templates and the model.
- Generate all the code for the whole application.
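The five parts can be sketched in a few lines. This is a minimal sketch, assuming the model is nothing more than a list of dictionaries and the output is a one-line DDL statement; `MODEL`, `TEMPLATE`, `render`, and `generate` are hypothetical names chosen for illustration, and a real model would be read from a modeling tool rather than written inline.

```python
# Part 1: a (hypothetical) model with one entry per instance of the
# repeated set of steps. Keys are the placeholder strings.
MODEL = [
    {"TABLE_NAME": "customer", "KEY_COLUMN": "customer_id"},
    {"TABLE_NAME": "product", "KEY_COLUMN": "product_id"},
]

# Parts 2 and 3: the first instance was written by hand, then its named
# parts were replaced with recognizable strings to form a template.
TEMPLATE = "CREATE TABLE TABLE_NAME (KEY_COLUMN INTEGER PRIMARY KEY);\n"

def render(template, values):
    """Part 4: rebuild one instance from the template and a model entry."""
    text = template
    for placeholder, value in values.items():
        text = text.replace(placeholder, value)
    return text

def generate(template, model):
    """Part 5: generate the output for every entry in the model."""
    return "".join(render(template, entry) for entry in model)
```

Plain string replacement works only if no placeholder is a substring of another placeholder or of a substituted value, which is one reason to choose placeholder strings that are easy to recognize.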
For example, imagine that you want to use this technique to create a table showing the driving distance between pairs of major cities in the United States. The five parts in the technique would be:
- Create a simplified map of the U.S. The map would have two types of objects: major cities and roads. The cities would have names and coordinates. The roads would have distances and two end-point cities. Most of the cities would be directly connected to only a few other cities. The map would be built with the Common Warehouse Metamodel (CWM) so that it is accessible to programs.
- Pick any two cities from the map (perhaps Phoenix and Denver), and write a miniapplication that uses the map to determine the driving distance between the two cities. The algorithm must work for any two cities on the map. There are many potential algorithms you could use, some of which produce better results than others. Whichever algorithm you pick, it will be embedded in the miniapplication and cannot be found directly in the map.
- Pull apart the miniapplication into template files where the city names are replaced by recognizable strings for substitution. For example, you may replace "Phoenix" with "ORIGIN_CITY" and "Denver" with "DESTINATION_CITY".
- Write code that can rebuild your miniapplication from the map and the templates. This code-generating code uses the map to put the template files in the right places and to replace the recognizable strings with "Phoenix" and "Denver".
- Extend the code so that it operates on the full set of cities.
You have now completed the entire application.
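The city example might look like the following sketch. It makes several assumptions: the map is a plain list of roads rather than a CWM model, the distances are made-up round numbers and not real mileage, and Dijkstra's shortest-path algorithm stands in as one of the many algorithms the miniapplication could embed. All function and variable names are hypothetical.

```python
import heapq
from itertools import combinations

# Hypothetical simplified map; distances are made-up round numbers.
ROADS = [
    ("Phoenix", "Albuquerque", 400),
    ("Albuquerque", "Denver", 450),
    ("Phoenix", "Las Vegas", 300),
    ("Las Vegas", "Denver", 750),
]

def build_graph(roads):
    """Turn the road list into an adjacency map; roads run both ways."""
    graph = {}
    for a, b, dist in roads:
        graph.setdefault(a, []).append((b, dist))
        graph.setdefault(b, []).append((a, dist))
    return graph

def driving_distance(graph, origin, destination):
    """Dijkstra's algorithm: the logic embedded in the miniapplication."""
    best = {origin: 0}
    queue = [(0, origin)]
    while queue:
        dist, city = heapq.heappop(queue)
        if city == destination:
            return dist
        if dist > best.get(city, float("inf")):
            continue  # stale queue entry
        for neighbor, length in graph[city]:
            candidate = dist + length
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return None  # no route between the two cities

# Template pulled from the miniapplication: the city names became
# recognizable placeholder strings.
TEMPLATE = (
    'table["ORIGIN_CITY", "DESTINATION_CITY"] = '
    'driving_distance(graph, "ORIGIN_CITY", "DESTINATION_CITY")\n'
)

def generate_application(roads):
    """Emit one line of code per pair of cities on the map."""
    cities = sorted(build_graph(roads))
    return "".join(
        TEMPLATE.replace("ORIGIN_CITY", a).replace("DESTINATION_CITY", b)
        for a, b in combinations(cities, 2)
    )
```

The string returned by `generate_application(ROADS)` is source code. It can be written to a file, or run in a namespace that supplies `graph`, `driving_distance`, and an empty `table` dictionary to fill in the distance table.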
I also found the technique effective for configuring an OLAP reporting tool like Mondrian (mondrian.sourceforge.net/):
- Create a model of the data warehouse.
- Create Mondrian XML configuration files for part of the first dimension and part of the first fact table.
- Pull apart the configuration files into template files, replacing the table names and column names with easily recognizable strings for substitution.
- Write code that can rebuild the original XML configuration files using the model and the templates.
- Generate the XML configuration files for all the dimensions and fact tables.
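The Mondrian steps above might be sketched as follows. The warehouse model here is a hypothetical list of dictionaries, and the template only approximates the general shape of a dimension definition in a Mondrian schema; check the element and attribute names against the Mondrian documentation before relying on them. The table, column, and function names are invented for illustration.

```python
from xml.sax.saxutils import escape

# Hypothetical warehouse model: one entry per dimension.
WAREHOUSE_MODEL = [
    {
        "DIMENSION_NAME": "Customer",
        "FOREIGN_KEY": "customer_id",
        "PRIMARY_KEY": "customer_id",
        "TABLE_NAME": "dim_customer",
        "LEVEL_NAME": "Region",
        "LEVEL_COLUMN": "region",
    },
    {
        "DIMENSION_NAME": "Product",
        "FOREIGN_KEY": "product_id",
        "PRIMARY_KEY": "product_id",
        "TABLE_NAME": "dim_product",
        "LEVEL_NAME": "Category",
        "LEVEL_COLUMN": "category",
    },
]

# Template pulled from a hand-written first dimension (assumed layout);
# table and column names became recognizable placeholder strings.
DIMENSION_TEMPLATE = """\
<Dimension name="DIMENSION_NAME" foreignKey="FOREIGN_KEY">
  <Hierarchy hasAll="true" primaryKey="PRIMARY_KEY">
    <Table name="TABLE_NAME"/>
    <Level name="LEVEL_NAME" column="LEVEL_COLUMN"/>
  </Hierarchy>
</Dimension>
"""

def render_dimension(template, values):
    """Substitute model values, escaping them for use in XML attributes."""
    text = template
    for placeholder, value in values.items():
        text = text.replace(placeholder, escape(value, {'"': "&quot;"}))
    return text

def generate_dimensions(template, model):
    """Generate the XML fragment for every dimension in the model."""
    return "".join(render_dimension(template, entry) for entry in model)
```

Escaping the substituted values keeps the generated XML well formed even if a name in the model contains a character such as `&` or a double quote.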
The output from this technique can be any language or format.