Site Archive (Complete)
Database Blog: Assessing Production Data Quality
DATABASE
EXCEPTION::QUERY

A Blog About Database Products and Technology.

by Kevin Carlson
SELECT * FROM [Musings]

Database matters.

by Niklas Hemdal
August 08, 2007

Assessing Production Data Quality

In a recent Agile Newsletter, Scott Ambler highlighted the need to validate data quality via testing (Questioning Traditional Data Management). At a minimum, he proposed regression testing things like column domain value rules, column default value rules, value existence rules, row value rules, and size rules to help ensure data quality. Scott also points out that constraints setup to prohibit data quality errors are easily dropped or reworked.

Existing constraints are easily checked with commercial schema comparison tools and a reference database; however, the quality of the data in a production system is a different story. Keep in mind that data quality issues can incubate over time in the rush to quickly deploy new features when fundamental design and modeling practices are skirted, or (shudder) when a Production DBA relaxes constraints to merge in some new data.

I started out with the question of how can I quickly assess data quality in 300 of my production Microsoft SQL Server databases, at any time, using some of Scott’s regression testing criteria? I wanted these assessment tests to be close to the database itself, a simple table with a few stored procedures. I started out simple for this post by testing column default values. The first thing I needed was a Stored Procedure, ExecLiteral, that would automatically execute intermediate results similar to what sp_execresultset did in Microsoft SQL Server 2000. Next I needed to generate the reference xml data required to validate column default values and column value existence for future tests in other databases. The idea was a simple stored procedure that used the INFORMATION_SCHEMA.COLUMNS view and FOR XML EXPLICIT to generate the xml that I needed to validate the column defaults, and test the existing column data for value existence. In future posts, we may evolve this to include actual default value INSERT tests. The Stored Procedure that identifies missing defaults uses the new T-SQL EXCEPT operator to check for missing default schema values. Then ExecLiteral is invoked to test for the existence of a value in a column with a specified default.

Let’s see where this takes us. You can read the source listing in html, and text.

Posted at 01:25 AM  Permalink




 
INFO-LINK


Related Sites: DotNetJunkies, SD Expo, SqlJunkies