Software Development
Traditionally, data professionals take a relatively serial approach to development: They start by creating a nearly complete domain data model, which they use to create an almost complete physical data model representing the database design. While there's ample opportunity to update the models as a project progresses, it's often a difficult and time-consuming task, because the database schema is usually set early in the project and subsequently remains sacrosanct. This is a convenient assumption for data professionals, as it streamlines their work, but it doesn't reflect the iterative and incremental processes commonly followed by developers. Minimally, data professionals should work in an evolutionary manner; better yet, they should take an agile approach that is both evolutionary and collaborative in nature.
In the first segments of this series, I revealed how to iteratively evolve a data model in the face of new and changed requirements. This month, I'll explore some of the philosophies behind the approach and discuss supporting techniques.
Initial Modeling?
High-Level Domain Model: After the fact, we remember to identify the basic business entities and relationships.
More importantly, note that the domain model's structure is slightly different from our physical database schema; in particular, the Person supertype appears in the domain model. Had I started with this domain model, I would have introduced a Person table in the first iteration. This wouldn't have made much of a difference during the first five iterations, but it would have helped immensely during iteration six, when the ability to place orders is added to the schema. "Iteration Six's Physical Data Model," based on last month's original schema, includes the ability for students to place orders. Unfortunately, this isn't the feature our users want: Nonstudents as well as students can place orders, so if, for example, Sarah's mother orders some sparring gear for her birthday, Sarah could find out and spoil the surprise. To avert this problem, we really need the schema depicted in "Iteration Six: The Refactored Physical Data Model." Not only was the Person table introduced, but we also had to rename the StudentPOID columns in several tables to PersonPOID. Keys are a primary source of coupling within relational schemas, so whenever a refactoring affects a key, there's a good chance that the change will spread across several tables.
Iteration Six's Physical Data Model: Based on the original schema, it includes the ability for students to place orders.
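To make the key-coupling point concrete, here's a minimal sketch of the supertype refactoring using Python's stdlib sqlite3 as a stand-in database (the table and column names follow the article; the Order table and its data are illustrative). It introduces the Person table, copies the student rows into it, and then propagates the key rename from StudentPOID to PersonPOID via the portable create-copy-drop technique:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Original design: Student holds the name directly; Order references StudentPOID.
cur.executescript("""
CREATE TABLE Student (StudentPOID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE "Order" (OrderPOID INTEGER PRIMARY KEY,
                      StudentPOID INTEGER REFERENCES Student);
INSERT INTO Student VALUES (1, 'Sarah');
INSERT INTO "Order" VALUES (100, 1);
""")

# Step 1: introduce the Person supertype table and migrate the student rows into it.
cur.executescript("""
CREATE TABLE Person (PersonPOID INTEGER PRIMARY KEY, Name TEXT);
INSERT INTO Person (PersonPOID, Name) SELECT StudentPOID, Name FROM Student;
""")

# Step 2: the key rename ripples into every table that used StudentPOID as a
# foreign key. Here we rebuild Order with the new column name (create-copy-drop).
cur.executescript("""
CREATE TABLE Order_new (OrderPOID INTEGER PRIMARY KEY,
                        PersonPOID INTEGER REFERENCES Person);
INSERT INTO Order_new SELECT OrderPOID, StudentPOID FROM "Order";
DROP TABLE "Order";
ALTER TABLE Order_new RENAME TO "Order";
""")
conn.commit()
```

Notice that the Person table itself was the easy part; most of the work is chasing the renamed key through the dependent tables, which is exactly why key-affecting refactorings tend to spread.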
Adopt New Data Techniques
So what techniques let you work in an evolutionary manner? First, the principles and practices of Agile Modeling (AM) can guide your data modeling efforts: Model in small increments, then prove that models work with actual software. Understand that there's far more to development than data modeling. Learn how to create multiple models and apply the right artifact. Model with others to share information and skills, and support stakeholder participation, because stakeholders have the best domain information.

Database refactoring is also critical: Applying small changes to your database schema improves its design without changing its semantics. Whenever you need to support a new requirement, first ask, "Is the existing design the best one to support this requirement?" If the answer is yes, add the new functionality. If not, refactor your design to make it the best possible, and then proceed. For example, in iteration five, we needed to add the ability to support multiple martial arts. Looking at the iteration-four data model, we discovered a one-to-many association between students and belts, but what we now needed was a many-to-many association. So we introduced the StudentBelt table via the Replace One-To-Many With Associative Table refactoring. In this way, the system always has the highest-quality design that supports the highest-priority requirements to date.
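The Replace One-To-Many With Associative Table refactoring can be sketched as follows, again with sqlite3 standing in for a real database and illustrative ranks and data. The existing pairs are migrated into the new StudentBelt table before the now-redundant foreign key column is dropped from Belt:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Iteration-four design: each belt row points at exactly one student (one-to-many).
cur.executescript("""
CREATE TABLE Student (StudentPOID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Belt (BeltPOID INTEGER PRIMARY KEY, Rank TEXT,
                   StudentPOID INTEGER REFERENCES Student);
INSERT INTO Student VALUES (1, 'Sarah');
INSERT INTO Belt VALUES (10, 'Yellow', 1);
""")

# Introduce the associative table, migrate the existing student/belt pairs,
# then rebuild Belt without the foreign key column (many-to-many from here on).
cur.executescript("""
CREATE TABLE StudentBelt (
  StudentPOID INTEGER REFERENCES Student,
  BeltPOID    INTEGER REFERENCES Belt,
  PRIMARY KEY (StudentPOID, BeltPOID)
);
INSERT INTO StudentBelt (StudentPOID, BeltPOID)
  SELECT StudentPOID, BeltPOID FROM Belt WHERE StudentPOID IS NOT NULL;
CREATE TABLE Belt_new (BeltPOID INTEGER PRIMARY KEY, Rank TEXT);
INSERT INTO Belt_new SELECT BeltPOID, Rank FROM Belt;
DROP TABLE Belt;
ALTER TABLE Belt_new RENAME TO Belt;
""")
conn.commit()
```

The semantics of the existing data are preserved (Sarah still holds her yellow belt), but the schema can now record belts in several martial arts per student.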
To safely refactor, you must be able to test your system to ensure you haven't broken anything. You need a full regression test suite for your system, including your database. Just as Java programmers use tools such as JUnit, data professionals must adopt tools such as DBUnit.
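DBUnit is a Java tool; as a minimal stand-in, here is what a database regression test looks like with Python's stdlib unittest and sqlite3 (the schema, seed data and helper are illustrative). The key practice, which DBUnit automates, is that each test starts from a freshly built database in a known state:

```python
import sqlite3
import unittest

def create_schema(conn):
    # Illustrative schema-under-test; in practice this runs your real DDL.
    conn.execute(
        "CREATE TABLE Student (StudentPOID INTEGER PRIMARY KEY, Name TEXT NOT NULL)"
    )

class StudentSchemaTest(unittest.TestCase):
    def setUp(self):
        # A fresh database seeded with known data for every test,
        # so tests can't interfere with one another.
        self.conn = sqlite3.connect(":memory:")
        create_schema(self.conn)
        self.conn.execute("INSERT INTO Student VALUES (1, 'Sarah')")

    def test_seed_data_is_present(self):
        rows = self.conn.execute("SELECT Name FROM Student").fetchall()
        self.assertEqual(rows, [("Sarah",)])

    def test_name_is_required(self):
        # The schema, not just the application, enforces this rule.
        with self.assertRaises(sqlite3.IntegrityError):
            self.conn.execute("INSERT INTO Student VALUES (2, NULL)")

# Run the suite; after a refactoring, a green run tells you the
# schema's semantics survived the change.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(StudentSchemaTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

With such a suite in place, a refactoring like the key rename above becomes a mechanical loop: change the schema, migrate the data, rerun the tests.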
To back out of database schema changes, you need a strong configuration management (CM) strategy for your data-oriented assets, including data-definition language code, which defines your database schema; data manipulation language code, which accesses the data within your database; reference data; test data; and testing code. Even if you're not agile, you'd still want to adopt effective regression testing and CM strategies.
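One common way to make schema changes version-controlled and reversible is to treat each change as a named migration with paired "up" and "down" scripts, recording what has been applied in the database itself. A minimal sketch, with illustrative migration names and DDL:

```python
import sqlite3

# Each migration pairs a forward ("up") change with its reverse ("down"),
# so any schema change can be backed out. Names and DDL are illustrative.
MIGRATIONS = [
    ("001_create_student",
     "CREATE TABLE Student (StudentPOID INTEGER PRIMARY KEY, Name TEXT)",
     "DROP TABLE Student"),
    ("002_create_order",
     'CREATE TABLE "Order" (OrderPOID INTEGER PRIMARY KEY, PersonPOID INTEGER)',
     'DROP TABLE "Order"'),
]

def migrate(conn):
    # Record applied migrations in the database, then apply any that are missing.
    conn.execute("CREATE TABLE IF NOT EXISTS SchemaVersion (Name TEXT PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT Name FROM SchemaVersion")}
    for name, up, _down in MIGRATIONS:
        if name not in applied:
            conn.execute(up)
            conn.execute("INSERT INTO SchemaVersion VALUES (?)", (name,))

def rollback(conn, name):
    # Back out a single named migration using its "down" script.
    for n, _up, down in MIGRATIONS:
        if n == name:
            conn.execute(down)
            conn.execute("DELETE FROM SchemaVersion WHERE Name = ?", (n,))

conn = sqlite3.connect(":memory:")
migrate(conn)
```

The MIGRATIONS list itself lives in version control alongside your DML, reference data, test data and test code, so the database's history is as traceable as your source code's.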
Yes, It's Hard; Get Over It
Iteration Six: The Refactored Physical Data Model: We introduce the Person table and rename the StudentPOID columns in several tables to PersonPOID.
First, take a realistic approach to deployment. People often lament that it isn't possible to deploy systems into production on a weekly basis, and for many organizations, that's true. But just because you release your system internally on a weekly basis doesn't mean that you show it to the rest of the world. I create builds in my own workspace several times a day, release into my team workspace at least daily, into the pre-production test environment at the end of each iteration, and into production somewhere between quarterly and yearly.
Second, get better at data migration. When you apply a structural database refactoring, you'll need to migrate the data from the original columns into the new ones. Data migration is hard, particularly in large databases. But refactoring source code safely was also hard just a few years ago; it's easy today because we've accepted the idea, discovered effective techniques and built tools. The same can happen for data migration.
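A typical small-scale example of such a migration is splitting one column into several, here Name into FirstName and LastName (the column names and data are illustrative; real names don't always split this cleanly, which is part of why data migration is hard). A sketch with sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Student (StudentPOID INTEGER PRIMARY KEY, Name TEXT);
INSERT INTO Student VALUES (1, 'Sarah Jones');
""")

# Add the new columns; keep the original Name column during a transition
# period so existing code keeps working while callers are updated.
cur.executescript("""
ALTER TABLE Student ADD COLUMN FirstName TEXT;
ALTER TABLE Student ADD COLUMN LastName TEXT;
""")

# Migrate the data row by row. Naive split on the first space: real
# migrations must handle middle names, single names and bad data.
for poid, name in cur.execute("SELECT StudentPOID, Name FROM Student").fetchall():
    first, _, last = name.partition(" ")
    cur.execute(
        "UPDATE Student SET FirstName = ?, LastName = ? WHERE StudentPOID = ?",
        (first, last, poid),
    )
conn.commit()
```

Once all code has moved to the new columns, a later migration drops Name, completing the refactoring.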
Third, find new ways to deal with fundamental data management issues. People often say that this stuff won't work in Fortune 100 firms because they need bureaucratic processes to ensure data consistency, to ensure standards are followed, to address data architecture issues and so on. Those issues are important, but you don't need bureaucracy. Data professionals can collaborate with development teams; they can mentor people in data management issues; and they can ensure that enterprise-scope issues are taken into account. The true challenge? For many data groups, this is a major departure from their existing rigid and specialized organizational structure.
A Call to Action
I have yet to hear a coherent reason why an organization can't take an agile approach to data modeling. It's time for the data community to step up and adopt agile ways of working.
Scott Ambler is author of the Productivity Award-winning Agile Database Techniques (Wiley, 2003).