Database

Relational Databases 101

By William R. Vaughn with Peter Blackburn, October 20, 2006

Understanding what relational databases are all about is the first step to effectively using them.

Many of my readers come from backgrounds that don't include formal training on the best ways to design and create efficient, business-class relational databases. If you arrive here with Microsoft Access or FoxPro experience, you're at an advantage—you know that, for the most part, the process of creating a database is hidden from you by the application's IDE—you just use drag-and-drop or use wizards to build the databases and tables you want. That's not at all bad, but without an in-depth understanding of how to best create, tune, and protect a relational database, I suspect that the relational "normalcy," relational integrity constraints, performance, and scalability of the result might not be particularly stellar. And, more importantly, the data might not be particularly secure. By "normalcy," I mean how well the database conforms to the recognized standards of relational design where database designers attempt to "normalize" databases to at least the third level. If you're not sure how to do this, or even what this means, have no fear—I'll explain this later. The SQL gurus with whom I work (like Peter Blackburn, Kimberly Tripp, and a litany of others) are convinced that more problems can be solved by efficient database design than by the cleverest, best-written application front-end.

IMHO - It's not how fast you ask the question—it's how long it takes to find the answer that gates performance.

Getting Started with Solid Database Design

The Microsoft Books Online (BOL) documentation seems to fall a bit short in this important subject, so this chapter might be helpful for those who need a more complete understanding of how to create a best-practice relational database. The problem faced by any database designer is knowing what's going to be stored ahead of time—before the first table is created. That's always been (and always will be) a problem. As I've said before, a customer rarely knows what they want until they don't get it.

To get started on the right foot, I recommend a good course in relational theory like Extended Relational Analysis. This can do a world of good—but its depth is well beyond the scope of this book. In courses like this, you learn how to ask the right questions for each "entity" you expect to store in the database.

I also think that using a (big) whiteboard to lay out the database with your team (or customer) can help visualize the data. Getting everyone who is going to consume the data is essential. How I design a database for a single-focused project is very different than the way it's designed for projects where a small multitude of groups expect to consume the data. Admittedly, database development by committee is tough, and one should try to avoid those situations, but leaving town might not be an option.

Before we get into the academics of normalization, let's spend a few moments in quiet contemplation and focus on a few guiding principles. As you design your database, you should keep these basic tenants in mind:

Each table needs a unique identifier. That is, you need to choose one or more columns to permit SQL Server to find specific rows to return or update without including other irrelevant rows. In SQL Server, this typically means each table should have an "ID" column (typically cast as an Identity integer) that gives the row a unique (SQL Server–generated) value. You should define this column using the Primary Key (PK) constraint (as discussed in Chapter 2, "How Does SQL Server Work?"). For example Au_ID is the unique identifier for the Authors table in the sample Biblio database.
Each table should store only one kind of information and not repeat information in multiple columns. As I discuss next, this is where normalization comes in. SQL Server is tuned to work with small, tight rows that contain relatively few columns. If you find your table has more than a dozen columns, you're treading off the boards and into the swamp. Remember, the largest row you can define is only 8K (not counting BLOB columns).
Microsoft feels that you should avoid columns that can be set to NULL—I'm not so sure. That is, they feel that you should avoid columns where you might not have access to the data at all and cannot (logically) assign an arbitrary default. Each time you define a column as permitting NULL (making it "nullable"), SQL Server incurs extra overhead. It makes sense to move these columns to another table and cross-reference them to the table's PK as long as the database complexity does not get out of hand.
Each column needs to be defined both with the content in mind and with the constraints needed to keep it pure. It's not enough to type a column as integer and hope that the data entered therein is going to be pure. All columns, especially numeric columns, need to have (at least) CHECK constraints defined, if not TSQL rules. Columns should be defined to hold what's expected to be saved—and no more. Needlessly bloating data type capacity simply chews up memory to no good purpose. If you have columns that contain text, but that text is never expressed in Unicode, don't use a Unicode type. If you have a date column but don't need to store time with 3.33-millisecond accuracy, use smalldatetime instead of datetime. You get the idea. In this case, less is more—more space saved, more memory available to store other pertinent stuff. I'll show you how much each column costs in memory near the end of this chapter.

IMHO - Understand that all data is evil until proven innocent—it's not in the U.S. Constitution, but given the state of the current Congress, it just might get there.

Start thinking about a concurrency strategy from the beginning. Determine how data is to be shared (if at all). Consider that many "single-user" databases are doomed to failure once they are "converted" to multi-user. Think about the mechanism you're planning to use to determine if a row has changed once it's fetched. For example, you might (perhaps ought) to use a "Rowversion" or "TimeStamp" column to help track access—it can make Visual Studio's job a lot easier (and yours, too) when it comes time to write action commands to change the database.
Avoid BLOBs in the database. I have been suggesting this for over a decade, and those who have listened have been able to build a smaller, faster, and simpler database. If you have BLOBs, store them in files (or on RO media) and use the database to store the path.
Finally, and perhaps most importantly, for less-experienced developers, I think that you should strive to keep your database simple—simple to understand, support, and maintain. Excessive complexity is the bane of many a mature and amateur database designer.

1 2 3 4 5 6 7 8 9 10 Next

More Insights

INFO-LINK


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

Database

Relational Databases 101

Getting Started with Solid Database Design

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Database Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content

Database

Relational Databases 101

Getting Started with Solid Database Design

Related Reading

News

Commentary

Slideshow

Video

Most Popular

More Insights

White Papers

Reports

Webcasts

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Database Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content