Join the Evolution

Modern applications typically use a combination of object technologies such as J2EE or C#, and relational database technologies such as Oracle or MySQL. Because of this, developers and data professionals clearly need to work together, but to do so, they must overcome a significant cultural impedance mismatch. Modern software development processes—including the Rational Unified Process (RUP), Extreme Programming (XP), Scrum and the Dynamic System Development Method (DSDM)—are all evolutionary (iterative and incremental) in nature. These processes are most effectively followed by “generalizing specialists”—people who have one or more specialties, such as Java programming or project management, a general understanding of the entire software lifecycle, and, ideally, an understanding of the business domain, as well. On the other hand, most data-oriented techniques are serial in nature, relying on specialists performing relatively narrow tasks such as logical data modeling or physical data modeling. Therein lies the rub: The two groups must work together, but want to do so in different ways.

Data professionals need to adopt evolutionary techniques similar to those of developers—not the other way around. Craig Larman summarizes the research evidence, as well as the overwhelming support among IT thought leaders, in favor of evolutionary approaches in Agile and Iterative Development: A Manager’s Guide (Addison-Wesley, 2003). Unfortunately, the data community missed the object revolution of the 1990s, which meant they lost the opportunity to learn the evolutionary approaches to development that developers now take for granted. However, data professionals can adapt evolutionary approaches to all aspects of their work.


The Karate School’s Initial Domain Model
This is a slim conceptual domain model for a karate school, using UML notation. Note that it illustrates only the main business entities and the relationships among them.

Evolutionary Data Modeling
Last summer I wrote a series of columns (July through Sept. 2004) describing how to take an evolutionary approach to data modeling. In that series, I opined that the best method was to first create a slim conceptual domain model (see “The Karate School’s Initial Domain Model”) that depicts the main business entities and the relationships among them. The amount of detail shown in this example is all that’s needed at a project’s start; your goal is to identify the landscape, trusting that you can fill in the details as you go. Your conceptual model will naturally evolve as your understanding of the domain grows, but the level of detail will remain the same.

Taking an Agile Model Driven Development (AMDD) approach, you then use your conceptual model to guide your physical class and data modeling efforts during development iterations on a just-in-time (JIT) basis. An example of such a model, for the third iteration of the physical data model (PDM), is shown in “The Karate School’s PDM”. Notice how the model doesn’t show a detailed schema for the entire domain; instead, it’s comprised of just enough detail for the currently implemented requirements. To see a six-iteration sample of physical data modeling for the karate school example, complete with changing requirements, visit www.agiledata.org.


The Karate School’s PDM
Here’s a more detailed physical data model (PDM) for the karate school system, using UML, after three development iterations.

AMDD offers several advantages:

You minimize waste. A JIT “model storming” approach helps you avoid the wasted time and effort that serial techniques incur when requirements change. When you build a detailed model based on the initial requirements, you must then change your design when the requirements change—hence, waste. Investing significant time in up-front design is clearly a risky proposition, particularly when you realize that if you have the skills to do the detailed design up front, you also have the skills to do the same work JIT.

You avoid significant rework. By doing just enough modeling up front to develop the conceptual domain model, you’ll probably avoid any serious rework later in the project. Think back to any project you’ve been involved with. If you’d been able to get several key business stakeholders together in a single room, could you have created a slim, conceptual model that was sufficient to successfully drive your development efforts on that project? Could you have done so within a few hours or, at most, a few days? If you could have done it then, couldn’t you also do it on future projects?

You reduce the overall modeling effort. Why create both a logical data model (LDM) and an analysis class model that effectively cover identical ground when a shared conceptual model will do? We need to work together as a single team, not as two separate entities.

You simplify object/relational (O/R) mapping. O/R efforts are easiest when both your object and data schemas are based on a common source.
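To make that last advantage concrete, here’s a minimal sketch of what O/R mapping looks like when the object schema and the data schema derive from the same conceptual model: attributes map to columns one-for-one, so the mapping code is nearly mechanical. The Student class, table, and column names are illustrative assumptions, not taken from the article’s karate school models.

```python
import sqlite3
from dataclasses import dataclass

# Hypothetical class and table, both derived from one conceptual model,
# so attribute names and column names line up one-for-one.
@dataclass
class Student:
    student_id: int
    first_name: str

def save(conn: sqlite3.Connection, s: Student) -> None:
    # The mapping is trivial: each attribute feeds the matching column.
    conn.execute(
        "INSERT INTO Student (student_id, first_name) VALUES (?, ?)",
        (s.student_id, s.first_name),
    )

def load(conn: sqlite3.Connection, student_id: int) -> Student:
    row = conn.execute(
        "SELECT student_id, first_name FROM Student WHERE student_id = ?",
        (student_id,),
    ).fetchone()
    return Student(*row)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student (student_id INTEGER PRIMARY KEY, first_name TEXT)"
)
save(conn, Student(1, "Ayumi"))
print(load(conn, 1))  # → Student(student_id=1, first_name='Ayumi')
```

When the two schemas drift apart, every one of these functions grows translation logic—which is exactly the cost a shared conceptual model avoids.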

Don’t get me wrong—evolutionary data modeling isn’t easy. You must take legacy data constraints into account, and as we all know, legacy data sources are often nasty beasts that can maim an unwary software development project. Luckily, good data professionals understand the ins and outs of their organization’s data sources, and this expertise can be applied on a JIT basis as easily as it could on a serial basis.

Effective data professionals also apply intelligent data modeling conventions, just as Agile Modeling’s Apply Modeling Standards practice suggests. Note the use of the word intelligent. I recently ran into an organization that was still creating column names with a maximum length of 18 characters, because that’s what its mainframe DB2 databases supported. The organization would have been better served by applying full English names for columns in the databases that could handle them—the vast majority—and accepting hobbled usability only in those few mainframe databases still under the constraint.

Database Refactoring
It isn’t sufficient to take an evolutionary approach to data modeling; you must also adopt techniques that enable you to evolve your existing database schema. Just as developers have learned to refactor their object schemas, data professionals must learn to refactor their database schemas. In Refactoring (Addison-Wesley, 1999), Martin Fowler described refactoring as a disciplined way to incorporate small changes to your code to improve its design, making it easier to understand and to modify. Before adding a new feature, ask yourself if the current design is the best one possible to enable you to add that feature. If it is, then do so. If not, refactor your design so that it is, and then add the feature. In this way, you optimize your design, making it very easy to extend as needed.

Refactoring must retain the behavioral semantics of your code, at least from a black-box point of view. For example, say you want to rename the getPersons() operation to getPeople(). To implement this refactoring, you must change the operation definition, which is simple, and then change every single invocation of the operation throughout your application code—a task best done with good tools, and fortunately, modern IDEs all include refactoring support. A refactoring isn’t complete until your code runs again as before.
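The rename described above can be sketched as follows. The Registry class and its data are hypothetical stand-ins for application code; the point is that after the rename, only the name differs—a black-box caller sees identical behavior, which is what the closing regression check verifies.

```python
# Hypothetical application class used to illustrate the rename refactoring.
class Registry:
    def __init__(self):
        self._people = []

    def add(self, name: str) -> None:
        self._people.append(name)

    # Before the refactoring this operation was named getPersons().
    # The definition and every call site were renamed together;
    # the behavior is untouched.
    def getPeople(self) -> list:
        return list(self._people)

r = Registry()
r.add("Sensei")
# The refactoring is complete only when this still passes as before.
assert r.getPeople() == ["Sensei"]
print("behavior preserved")
```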

Similarly, a database refactoring is a simple change to a database schema that improves its design while retaining both its behavioral and informational semantics. Your database schema includes both structural aspects such as table and view definitions, and functional aspects such as stored procedures and triggers. Database refactorings are clearly more difficult to implement than code refactorings due to the prevalence of increased coupling. A simple schema change could affect a score of applications that access that portion of the schema. Clearly, you need to be careful.


By Any Other Name
Here, I renamed the FName column to FirstName within the Customer table via the Rename Column database refactoring.

Some database refactorings are very easy to implement. For example, to apply the Introduce Default Value database refactoring, simply apply the ALTER TABLE command to define a column’s default value. Naturally, you’d apply this refactoring only if there truly was a common default value applicable to all programs that access the column; otherwise, you could introduce errors in those programs. Similarly, to apply Introduce Index to improve access performance, you simply apply the SQL command CREATE INDEX.
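Both easy refactorings can be sketched against an in-memory SQLite database. Note that DDL syntax varies by vendor: SQLite accepts a DEFAULT clause only when a column is added, whereas other databases support ALTER TABLE ... ALTER COLUMN ... SET DEFAULT on an existing column. The Student table and belt column are illustrative assumptions, not from the article’s models.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student (student_id INTEGER PRIMARY KEY, name TEXT)"
)

# Introduce Default Value: programs that omit belt now get 'white'.
# (SQLite only allows DEFAULT on a newly added column; on Oracle or DB2
# you would alter the existing column instead.)
conn.execute("ALTER TABLE Student ADD COLUMN belt TEXT DEFAULT 'white'")

# Introduce Index: speed up lookups by name with a single command.
conn.execute("CREATE INDEX ix_student_name ON Student (name)")

conn.execute("INSERT INTO Student (student_id, name) VALUES (1, 'Kim')")
belt = conn.execute(
    "SELECT belt FROM Student WHERE student_id = 1"
).fetchone()[0]
print(belt)  # → white
```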

Other database refactorings—particularly those that modify the existing schema structure—can be more difficult to implement due to coupling with external programs that access the database. The secret? Run both schemas in parallel during a transition period long enough to enable the other project teams to update and deploy their applications. In Agile Database Techniques (Wiley, 2003), I originally called this the deprecation period, a common term in the Java community. For example, in “By Any Other Name,” you see how the Rename Column database refactoring is applied to rename Customer.FName as Customer.FirstName. During the transition period, both the old and the new schema are supported; a trigger keeps the two columns synchronized because we must assume that the external programs will update only one of the columns. This trigger and the original column would be removed after June 14, 2006, once you’ve refactored, tested and deployed all of the external programs that access the original column.
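The transition period described above can be sketched in SQLite, whose trigger DDL differs from production databases but shows the idea: the old and new columns coexist, and a pair of triggers keeps them synchronized because we must assume external programs will update only one of them. SQLite disables recursive trigger invocation by default, so the pair cannot loop. Table contents here are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (customer_id INTEGER PRIMARY KEY, FName TEXT);

-- Step 1: introduce the new column and backfill it from the old one.
ALTER TABLE Customer ADD COLUMN FirstName TEXT;
UPDATE Customer SET FirstName = FName;

-- Step 2: synchronization triggers for the transition period.
CREATE TRIGGER sync_old_to_new AFTER UPDATE OF FName ON Customer
BEGIN
  UPDATE Customer SET FirstName = NEW.FName
   WHERE customer_id = NEW.customer_id;
END;
CREATE TRIGGER sync_new_to_old AFTER UPDATE OF FirstName ON Customer
BEGIN
  UPDATE Customer SET FName = NEW.FirstName
   WHERE customer_id = NEW.customer_id;
END;
""")

conn.execute(
    "INSERT INTO Customer (customer_id, FName, FirstName) "
    "VALUES (1, 'Ann', 'Ann')"
)
# A legacy program still writes only the old column...
conn.execute("UPDATE Customer SET FName = 'Anne' WHERE customer_id = 1")

# ...yet a new program reading FirstName sees the change.
first = conn.execute(
    "SELECT FirstName FROM Customer WHERE customer_id = 1"
).fetchone()[0]
print(first)  # → Anne
```

Once every external program has been migrated to FirstName, dropping the trigger and the FName column completes the refactoring.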

To enable both code refactoring and database refactoring, you must:

  1. Have a regression test suite. To safely refactor something, you must be able to verify that you haven’t broken anything, and if you have, you must fix it or roll back the refactoring.
  2. Put your work under configuration management. Sometimes a refactoring proves to be a very bad idea. For example, renaming Customer.FName may prove to break 50 external programs, and the cost to update those programs may be too high.
  3. Have separate work areas. Developers must be able to safely test first before promoting a refactoring into their shared project integration, or even into a preproduction test environment.
  4. Have good tools. A primary challenge for successful database refactoring is a lack of good tools, as I discuss in “A New Vision for Vendors.”
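Item 1 is the linchpin, so here’s a minimal sketch of a database regression check run before and after a refactoring. The schema and expectations are illustrative assumptions; a real suite would use a tool such as DBUnit or utPLSQL, as discussed in “A New Vision for Vendors.”

```python
import sqlite3

def regression_suite(conn: sqlite3.Connection) -> None:
    # Black-box expectations that must hold before and after
    # any database refactoring.
    conn.execute(
        "INSERT INTO Student (student_id, first_name) VALUES (99, 'Lee')"
    )
    name = conn.execute(
        "SELECT first_name FROM Student WHERE student_id = 99"
    ).fetchone()[0]
    assert name == "Lee"
    conn.execute("DELETE FROM Student WHERE student_id = 99")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student (student_id INTEGER PRIMARY KEY, first_name TEXT)"
)

regression_suite(conn)  # green before the refactoring

# Apply a refactoring (Introduce Index), then verify nothing broke;
# if this failed, we would fix it or roll the refactoring back.
conn.execute("CREATE INDEX ix_student_first_name ON Student (first_name)")
regression_suite(conn)  # still green: safe to keep the change
print("regression suite passed")
```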

This won’t always be a problem, but it clearly is now. When it comes to database refactoring, it’s the cultural issues, not the technology, that will give you pause. The real challenge lies in traditional data developers’ reticence to adopt new techniques. Every data professional I’ve ever worked with has talked about the need to have high-quality database designs—yet in practice, they’ve never been able to achieve or maintain them. Theoretically, you might be able to get your database design right off the bat, but that rarely happens. Existing database schemas aren’t perfect (and therefore should be improved), and changing requirements demand that database schemas evolve over time. The programming community has experienced significant productivity gains via refactoring, and frankly, so can the data community.

Becoming Agile
Evolutionary database development is a good start, but you can take it one step further. To increase your agility, you should:

  • Enable developers, data professionals and business stakeholders to work side by side on a daily basis; if people are in separate groups or work areas, you’ve put your project at risk by erecting a barrier to communication.
  • Be willing to share your skills and learn new skills from others; as everyone becomes more effective in the process, they learn to work together more effectively and require less documentation.
  • Never work alone: It’s too easy to inject defects and deviate from the team vision. Instead, pair program and model with others.
  • Actively seek to reduce the feedback cycle: This will improve your ability to find defects and decrease the cost of fixing them. Remember, you should create small, just-in-time models and take a test-driven development approach to development.
  • Take advantage of enterprise assets and standards in a collaborative manner; data architects and data administrators must act as coaches and mentors instead of enterprise police.

New World Order
The software development landscape has changed, and data professionals must change with it. Although many traditionalists prefer to work in a serial manner, modern techniques have abandoned serial development in favor of an evolutionary approach. This will be a difficult transition for some, but it’s necessary if they’re to become effective IT professionals.

A New Vision for Vendors
Six tool categories agile database developers can’t live without.

To support agile approaches to database development, database tool vendors have their work cut out for them. We need tools that are easy to learn and work with, enabling us to make simple, incremental changes to our database schemas. These tools must be inexpensive enough so that they can be deployed on every development machine. The critical tool categories are:

  1. Databases. Duh! Database vendor licensing strategies need to reflect the fact that we must deploy database instances to every sandbox and have easy ways to promote changes between these instances.
  2. Database testing tools. There’s a wide selection of test data generation tools, which is the good news (www.aptest.com/resources.html#app-data). To actually run tests, tools such as DBUnit and utPLSQL are good starts, although we need comprehensive ways to fully test our database schema.
  3. Evolutionary Extract Transform Load (ETL) tools. ETL tools are critical aspects of your data warehousing strategy, but are rarely deployed to individual desktops so that developers can create scripts to improve the quality of the data that they work with.
  4. Database refactoring tools. We need database refactoring tools to match the plethora of available code refactoring tools. Five years ago, code refactoring was quite difficult, but thanks to the tools, it’s become as common as writing if statements.
  5. Database modeling tools that are integrated into development tools. Anyone working with relational technology should have access to a good data modeling tool—ideally, one with built-in database refactoring features. Data modeling plug-ins for Eclipse, such as DB Visual Architect from Visual Paradigm, are a step in the right direction.
  6. O/R mapping tools and persistence frameworks. Fortunately, Hibernate and many others listed at www.ambysoft.com/persistenceLayer.html are available. Unfortunately, these tools are often stand-alone products that don’t integrate well with other tools. O/R mapping features should be a common part of all modeling tools, and several have started moving in that direction.

—SWA



Senior Contributing Editor Scott W. Ambler is author of the Productivity Award–winning Agile Database Techniques (Wiley, 2003).

