Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Database

Relational Databases 101


Many of my readers come from backgrounds that don't include formal training on the best ways to design and create efficient, business-class relational databases. If you arrive here with Microsoft Access or FoxPro experience, you're at an advantage—you know that, for the most part, the process of creating a database is hidden from you by the application's IDE—you just use drag-and-drop or use wizards to build the databases and tables you want. That's not at all bad, but without an in-depth understanding of how to best create, tune, and protect a relational database, I suspect that the relational "normalcy," relational integrity constraints, performance, and scalability of the result might not be particularly stellar. And, more importantly, the data might not be particularly secure. By "normalcy," I mean how well the database conforms to the recognized standards of relational design where database designers attempt to "normalize" databases to at least the third level. If you're not sure how to do this, or even what this means, have no fear—I'll explain this later. The SQL gurus with whom I work (like Peter Blackburn, Kimberly Tripp, and a litany of others) are convinced that more problems can be solved by efficient database design than by the cleverest, best-written application front-end.


IMHO - It's not how fast you ask the question—it's how long it takes to find the answer that gates performance.


Getting Started with Solid Database Design

The Microsoft Books Online (BOL) documentation seems to fall a bit short in this important subject, so this chapter might be helpful for those who need a more complete understanding of how to create a best-practice relational database. The problem faced by any database designer is knowing what's going to be stored ahead of time—before the first table is created. That's always been (and always will be) a problem. As I've said before, a customer rarely knows what they want until they don't get it.

To get started on the right foot, I recommend a good course in relational theory like Extended Relational Analysis. This can do a world of good—but its depth is well beyond the scope of this book. In courses like this, you learn how to ask the right questions for each "entity" you expect to store in the database.

I also think that using a (big) whiteboard to lay out the database with your team (or customer) can help visualize the data. Getting everyone who is going to consume the data is essential. How I design a database for a single-focused project is very different than the way it's designed for projects where a small multitude of groups expect to consume the data. Admittedly, database development by committee is tough, and one should try to avoid those situations, but leaving town might not be an option.

Before we get into the academics of normalization, let's spend a few moments in quiet contemplation and focus on a few guiding principles. As you design your database, you should keep these basic tenants in mind:

  • Each table needs a unique identifier. That is, you need to choose one or more columns to permit SQL Server to find specific rows to return or update without including other irrelevant rows. In SQL Server, this typically means each table should have an "ID" column (typically cast as an Identity integer) that gives the row a unique (SQL Server–generated) value. You should define this column using the Primary Key (PK) constraint (as discussed in Chapter 2, "How Does SQL Server Work?"). For example Au_ID is the unique identifier for the Authors table in the sample Biblio database.
  • Each table should store only one kind of information and not repeat information in multiple columns. As I discuss next, this is where normalization comes in. SQL Server is tuned to work with small, tight rows that contain relatively few columns. If you find your table has more than a dozen columns, you're treading off the boards and into the swamp. Remember, the largest row you can define is only 8K (not counting BLOB columns).
  • Microsoft feels that you should avoid columns that can be set to NULL—I'm not so sure. That is, they feel that you should avoid columns where you might not have access to the data at all and cannot (logically) assign an arbitrary default. Each time you define a column as permitting NULL (making it "nullable"), SQL Server incurs extra overhead. It makes sense to move these columns to another table and cross-reference them to the table's PK as long as the database complexity does not get out of hand.
  • Each column needs to be defined both with the content in mind and with the constraints needed to keep it pure. It's not enough to type a column as integer and hope that the data entered therein is going to be pure. All columns, especially numeric columns, need to have (at least) CHECK constraints defined, if not TSQL rules. Columns should be defined to hold what's expected to be saved—and no more. Needlessly bloating data type capacity simply chews up memory to no good purpose. If you have columns that contain text, but that text is never expressed in Unicode, don't use a Unicode type. If you have a date column but don't need to store time with 3.33-millisecond accuracy, use smalldatetime instead of datetime. You get the idea. In this case, less is more—more space saved, more memory available to store other pertinent stuff. I'll show you how much each column costs in memory near the end of this chapter.

IMHO - Understand that all data is evil until proven innocent—it's not in the U.S. Constitution, but given the state of the current Congress, it just might get there.


  • Start thinking about a concurrency strategy from the beginning. Determine how data is to be shared (if at all). Consider that many "single-user" databases are doomed to failure once they are "converted" to multi-user. Think about the mechanism you're planning to use to determine if a row has changed once it's fetched. For example, you might (perhaps ought) to use a "Rowversion" or "TimeStamp" column to help track access—it can make Visual Studio's job a lot easier (and yours, too) when it comes time to write action commands to change the database.
  • Avoid BLOBs in the database. I have been suggesting this for over a decade, and those who have listened have been able to build a smaller, faster, and simpler database. If you have BLOBs, store them in files (or on RO media) and use the database to store the path.
  • Finally, and perhaps most importantly, for less-experienced developers, I think that you should strive to keep your database simple—simple to understand, support, and maintain. Excessive complexity is the bane of many a mature and amateur database designer.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.