Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Database

A Conversation with Jim Gray


DDJ: So let's go back. You are looking at hierarchical systems. You are also working on a team of developing relational databases.

JG: So I am working with a bunch of people. It is interesting. The main part of IBM and, in fact, the main part of the computer industry is going down the network hierarchical path. Their products, with millions of dollars in those days, would be considered billions today being made on these products. Here are a bunch of guys in research saying, "This is the wrong way to do things. The right way to do things is to take a relational approach". Of course the product guys are saying to us, "Why don't you figure out how to fix our products rather than come up with a brand new product. Why are you stabbing us in the back and saying that our products are no good".

DDJ: Can you explain why you use the term "network hierarchical"?

JG: If you pick up the standard textbooks in database systems, they will say that the worlds can be divided into three data models. One of the data models is the network data model. The network data model has the property that the database looks like a graph. The nodes of the graph are sets of records. Every record has a set of parents and it has sets of children.

DDJ: The word "Network", here, means it's modeled like a graph. Not necessarily as distributed.

JG: Exactly. That's right. A second data model is the hierarchical data model. That is a network data model where, in fact, every child has a unique parent. So, it is a tree. The relational data model doesn't fit the simple comparison like the other two. It says, "All data is represented in relations". So, certainly the sets are relations, but also the relationships between the sets are stored as relations. Therefore, there is only one kind of data. There isn't "edge" data and "node" data. There is just data. There is certain power that comes with being able to talk about the edges using the same language that you talk about the nodes. Chris Date, and others, have done a uniform language which can handle network data model, hierarchical data model, relational data model. Then, they go and do a set of applications in this uniform language. It routinely happens that the relational implementation is more simply described and the programs are simpler to write. If the relational system has reasonable performance, and a good optimizer, the execution speed of all three implementations are comparable.

That is more, or less, the breakthrough that the relational guys brought to the table. That is, they figured out how to compile from the relational model down to an execution environment that has comparable performance to the network and hierarchical data models. You ask the question, "What did I do"? Well, there were a bunch of people who were interested in trying to make the relational model perform well. This means that you take non procedural queries and you map them down to a record oriented storage retrieval engine that is pretty similar to the kind of plumbing that is used by the other data models. I was working in that plumbing group. There were several people who wanted to work on access paths, make B trees, making indices, doing other things. There were a bunch of issues that nobody wanted to work on, which were security, start up, shutdown, error recovery, and dealing with the operating system, in general. I love to program. I don't care what programs I write. I just like to write programs.

DDJ: You didn't mind getting your hands dirty.

JG: Oh, none of us minded getting our hands dirty. I was sort of an operating systems guy. I had worked in security. I had worked in this capability based operating system stuff. I knew how disks worked. I knew how operating systems worked. I ended up working mostly in the operating systems area. That meant that I got to do all of the stuff like -- Well, if the system crashes, what are we going to do? We are going to bring the database back right. I ended up. . . What if we run multiple processes against the same data? How do we organize the fact that you are updating our joint checking account and I am updating our joint checking account and the updates are supposed to happen in some particular order? Otherwise, things are going to be pretty unpleasant.

DDJ: These issues nobody wanted to deal with, although now they seem so critical to the performance.

JG: Everybody else wanted to work on relational databases. They wanted to make Ted Codd's ideas reality. Codd's database was a read-only database. Nobody ever updated it. He did not have update statements in his papers. The challenge was to make the queries run fast. As I said, I was working on stuff that, frankly, nobody else wanted to work on. The amusing thing about this is, you were right, now these are considered. . . Security is definitely a hot topic and has been for a long time. Preserving the data and making it come back after a failure is definitely something people want. Don't misunderstand, other people had, in building their database systems, had to have some plumber who worried about these issues. They had to have a plumber who worried about security. They had to have a plumber who worried about recovery and concurrency control. Perhaps the thing that is a little unusual about research groups that go at these problems is that their mind set is -- Let's understand this thing. Let's get a big complicated theory about it first before we start writing any code.

Therefore, I had been through this whole Ph.D. thing and had been programmed to be a mathematician and to come up with a theory first and then code second. I sat back and thought -- What is really going on here? We have got a bunch of programs. They are running concurrently. What property do we want? We've got this relational model which says, "We want to have a really simple model of how data is stored and how data is accessed. What is the really simple model of running programs in parallel?" We came up with the notion of a transaction and, as a programmer, you declare. . . You say -- Look the database is okay right now. I am beginning a step. I am beginning a transaction. I'm doing a bunch of things. I am getting to the end of my program and I say -- Okay, this is the end of the transaction. I want atomicity. I want all these things or none of these things to happen to the database. Subsequently, we have come up with the terminology of "ACID", which is -- Atomicity, all or nothing, consistency. The program starts out in a correct state, and ends in a correct state. In between, things may be slightly incorrect because when you are transferring money from one account to another, you have taken money out of here and had them put it in here. Therefore, things are not really consistent yet. Isolation is if two guys are running in parallel, it is as though you are running one transaction at a time. Durability is once the transaction is completed, its effect should survive any kind of failure. Those are the ACID properties that transaction systems try to implement. We developed a mathematical theory about this. I think a very elegant theory. It has been refined and developed over the time.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.