Software Development
Traditionally, data professionals take a relatively serial approach to development: They start by creating a nearly complete domain data model, which they use to create an almost complete physical data model representing the database design. While there's ample opportunity to update the models as a project progresses, it's often a difficult and time-consuming task, because the database schema is usually set early in the project and subsequently remains sacrosanct. This is a convenient assumption for data professionals, as it streamlines their work, but it doesn't reflect the iterative and incremental processes commonly followed by developers. Minimally, data professionals should work in an evolutionary manner; better yet, they should take an agile approach that is both evolutionary and collaborative in nature.
In the first segments of this series, I revealed how to iteratively evolve a data model in the face of new and changed requirements. This month, I'll explore some of the philosophies behind the approach and discuss supporting techniques.
Initial Modeling?
High-Level Domain Model: After the fact, we remember to identify the basic business entities and relationships.
More importantly, note that the domain model's structure is slightly different from our physical database schema; in particular, the Person supertype appears in the domain model. Had I started with this domain model, I would have introduced a Person table in the first iteration. This wouldn't have made much of a difference during the first five iterations, but it would have helped immensely during iteration six, when the ability to place orders is added to the schema. "Iteration Six's Physical Data Model," based on last month's original schema, includes the ability for students to place orders. Unfortunately, this isn't the feature our users want: Nonstudents as well as students can place orders, so if, for example, Sarah's mother orders some sparring gear for her birthday, Sarah could find out and spoil the surprise. To avert this problem, we really need the schema depicted in "Iteration Six: The Refactored Physical Data Model." Not only was the Person table introduced, but we also had to rename the StudentPOID columns in several tables to PersonPOID. Keys are a primary source of coupling within relational schemas, so whenever a refactoring affects a key, there's a good chance that the change will spread across several tables.
Iteration Six's Physical Data Model: Based on the original schema, it includes the ability for students to place orders.
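To make the key-coupling point concrete, here's a minimal sketch of the supertype refactoring using Python's stdlib sqlite3 as a stand-in database (the table and column names follow the article; the Order table and its data are illustrative). It introduces the Person table, copies the student rows into it, and then propagates the key rename from StudentPOID to PersonPOID via the portable create-copy-drop technique:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Original design: Student holds the name directly; Order references StudentPOID.
cur.executescript("""
CREATE TABLE Student (StudentPOID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE "Order" (OrderPOID INTEGER PRIMARY KEY,
                      StudentPOID INTEGER REFERENCES Student);
INSERT INTO Student VALUES (1, 'Sarah');
INSERT INTO "Order" VALUES (100, 1);
""")

# Step 1: introduce the Person supertype table and migrate the student rows into it.
cur.executescript("""
CREATE TABLE Person (PersonPOID INTEGER PRIMARY KEY, Name TEXT);
INSERT INTO Person (PersonPOID, Name) SELECT StudentPOID, Name FROM Student;
""")

# Step 2: the key rename ripples into every table that used StudentPOID as a
# foreign key. Here we rebuild Order with the new column name (create-copy-drop).
cur.executescript("""
CREATE TABLE Order_new (OrderPOID INTEGER PRIMARY KEY,
                        PersonPOID INTEGER REFERENCES Person);
INSERT INTO Order_new SELECT OrderPOID, StudentPOID FROM "Order";
DROP TABLE "Order";
ALTER TABLE Order_new RENAME TO "Order";
""")
conn.commit()
```

Notice that the Person table itself was the easy part; most of the work is chasing the renamed key through the dependent tables, which is exactly why key-affecting refactorings tend to spread.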
Adopt New Data Techniques
So what techniques let you work in an evolutionary manner? First, the principles and practices of Agile Modeling (AM) can guide your data modeling efforts: Model in small increments, then prove that models work with actual software. Understand that there's far more to development than data modeling. Learn how to create multiple models and apply the right artifact. Model with others to share information and skills, and support stakeholder participation, because stakeholders have the best domain information.

Database refactoring is also critical: Applying small changes to your database schema improves its design without changing its semantics. Whenever you need to support a new requirement, first ask, "Is the existing design the best one to support this requirement?" If the answer is yes, add the new functionality. If not, refactor your design to make it the best possible, and then proceed. For example, in iteration five, we needed to add the ability to support multiple martial arts. Looking at the iteration-four data model, we discovered a one-to-many association between students and belts, but what we now needed was a many-to-many association. So we introduced the StudentBelt table via the Replace One-To-Many With Associative Table refactoring. In this way, the system always has the highest-quality design that supports the highest-priority requirements to date.
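The Replace One-To-Many With Associative Table refactoring can be sketched as follows, again with sqlite3 standing in for a real database and illustrative ranks and data. The existing pairs are migrated into the new StudentBelt table before the now-redundant foreign key column is dropped from Belt:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Iteration-four design: each belt row points at exactly one student (one-to-many).
cur.executescript("""
CREATE TABLE Student (StudentPOID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Belt (BeltPOID INTEGER PRIMARY KEY, Rank TEXT,
                   StudentPOID INTEGER REFERENCES Student);
INSERT INTO Student VALUES (1, 'Sarah');
INSERT INTO Belt VALUES (10, 'Yellow', 1);
""")

# Introduce the associative table, migrate the existing student/belt pairs,
# then rebuild Belt without the foreign key column (many-to-many from here on).
cur.executescript("""
CREATE TABLE StudentBelt (
  StudentPOID INTEGER REFERENCES Student,
  BeltPOID    INTEGER REFERENCES Belt,
  PRIMARY KEY (StudentPOID, BeltPOID)
);
INSERT INTO StudentBelt (StudentPOID, BeltPOID)
  SELECT StudentPOID, BeltPOID FROM Belt WHERE StudentPOID IS NOT NULL;
CREATE TABLE Belt_new (BeltPOID INTEGER PRIMARY KEY, Rank TEXT);
INSERT INTO Belt_new SELECT BeltPOID, Rank FROM Belt;
DROP TABLE Belt;
ALTER TABLE Belt_new RENAME TO Belt;
""")
conn.commit()
```

The semantics of the existing data are preserved (Sarah still holds her yellow belt), but the schema can now record belts in several martial arts per student.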
To safely refactor, you must be able to test your system to ensure you haven't broken anything. You need a full regression test suite for your system, including your database. Just as Java programmers use tools such as JUnit, data professionals must adopt tools such as DBUnit.
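DBUnit is a Java tool; as a minimal stand-in, here is what a database regression test looks like with Python's stdlib unittest and sqlite3 (the schema, seed data and helper are illustrative). The key practice, which DBUnit automates, is that each test starts from a freshly built database in a known state:

```python
import sqlite3
import unittest

def create_schema(conn):
    # Illustrative schema-under-test; in practice this runs your real DDL.
    conn.execute(
        "CREATE TABLE Student (StudentPOID INTEGER PRIMARY KEY, Name TEXT NOT NULL)"
    )

class StudentSchemaTest(unittest.TestCase):
    def setUp(self):
        # A fresh database seeded with known data for every test,
        # so tests can't interfere with one another.
        self.conn = sqlite3.connect(":memory:")
        create_schema(self.conn)
        self.conn.execute("INSERT INTO Student VALUES (1, 'Sarah')")

    def test_seed_data_is_present(self):
        rows = self.conn.execute("SELECT Name FROM Student").fetchall()
        self.assertEqual(rows, [("Sarah",)])

    def test_name_is_required(self):
        # The schema, not just the application, enforces this rule.
        with self.assertRaises(sqlite3.IntegrityError):
            self.conn.execute("INSERT INTO Student VALUES (2, NULL)")

# Run the suite; after a refactoring, a green run tells you the
# schema's semantics survived the change.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(StudentSchemaTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

With such a suite in place, a refactoring like the key rename above becomes a mechanical loop: change the schema, migrate the data, rerun the tests.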
To back out of database schema changes, you need a strong configuration management (CM) strategy for your data-oriented assets, including data-definition language code, which defines your database schema; data manipulation language code, which accesses the data within your database; reference data; test data; and testing code. Even if you're not agile, you'd still want to adopt effective regression testing and CM strategies.
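One common way to make schema changes version-controlled and reversible is to treat each change as a named migration with paired "up" and "down" scripts, recording what has been applied in the database itself. A minimal sketch, with illustrative migration names and DDL:

```python
import sqlite3

# Each migration pairs a forward ("up") change with its reverse ("down"),
# so any schema change can be backed out. Names and DDL are illustrative.
MIGRATIONS = [
    ("001_create_student",
     "CREATE TABLE Student (StudentPOID INTEGER PRIMARY KEY, Name TEXT)",
     "DROP TABLE Student"),
    ("002_create_order",
     'CREATE TABLE "Order" (OrderPOID INTEGER PRIMARY KEY, PersonPOID INTEGER)',
     'DROP TABLE "Order"'),
]

def migrate(conn):
    # Record applied migrations in the database, then apply any that are missing.
    conn.execute("CREATE TABLE IF NOT EXISTS SchemaVersion (Name TEXT PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT Name FROM SchemaVersion")}
    for name, up, _down in MIGRATIONS:
        if name not in applied:
            conn.execute(up)
            conn.execute("INSERT INTO SchemaVersion VALUES (?)", (name,))

def rollback(conn, name):
    # Back out a single named migration using its "down" script.
    for n, _up, down in MIGRATIONS:
        if n == name:
            conn.execute(down)
            conn.execute("DELETE FROM SchemaVersion WHERE Name = ?", (n,))

conn = sqlite3.connect(":memory:")
migrate(conn)
```

The MIGRATIONS list itself lives in version control alongside your DML, reference data, test data and test code, so the database's history is as traceable as your source code's.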
Yes, It's Hard; Get Over It
Iteration Six: The Refactored Physical Data Model: We introduce the Person table and rename the StudentPOID columns in several tables to PersonPOID.
First, take a realistic approach to deployment. People often lament that it isn't possible to deploy systems into production on a weekly basis, and for many organizations, that's true. But just because you release your system internally on a weekly basis doesn't mean that you show it to the rest of the world. I create builds in my own workspace several times a day, release into my team workspace at least daily, into the pre-production test environment at the end of each iteration, and into production somewhere between quarterly and yearly.
Second, get better at data migration. When you apply a structural database refactoring, you'll need to migrate the data from the original columns into the new ones. Data migration is hard, particularly in large databases. But refactoring source code safely was also hard just a few years ago; it's easy today because we've accepted the idea, discovered effective techniques and built tools. The same can happen for data migration.
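A typical small-scale example of such a migration is splitting one column into several, here Name into FirstName and LastName (the column names and data are illustrative; real names don't always split this cleanly, which is part of why data migration is hard). A sketch with sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Student (StudentPOID INTEGER PRIMARY KEY, Name TEXT);
INSERT INTO Student VALUES (1, 'Sarah Jones');
""")

# Add the new columns; keep the original Name column during a transition
# period so existing code keeps working while callers are updated.
cur.executescript("""
ALTER TABLE Student ADD COLUMN FirstName TEXT;
ALTER TABLE Student ADD COLUMN LastName TEXT;
""")

# Migrate the data row by row. Naive split on the first space: real
# migrations must handle middle names, single names and bad data.
for poid, name in cur.execute("SELECT StudentPOID, Name FROM Student").fetchall():
    first, _, last = name.partition(" ")
    cur.execute(
        "UPDATE Student SET FirstName = ?, LastName = ? WHERE StudentPOID = ?",
        (first, last, poid),
    )
conn.commit()
```

Once all code has moved to the new columns, a later migration drops Name, completing the refactoring.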
Third, find new ways to deal with fundamental data management issues. People often say that this stuff won't work in Fortune 100 firms because they need bureaucratic processes to ensure data consistency, to ensure standards are followed, to address data architecture issues and so on. Those issues are important, but you don't need bureaucracy. Data professionals can collaborate with development teams; they can mentor people in data management issues; and they can ensure that enterprise-scope issues are taken into account. The true challenge? For many data groups, this is a major departure from their existing rigid and specialized organizational structure.
A Call to Action
I have yet to hear a coherent reason why an organization can't take an agile approach to data modeling. It's time for the data community to step up and adopt agile ways of working.
Scott Ambler is author of the Productivity Award-winning Agile Database Techniques (Wiley, 2003).