Dr. Dobb's is part of the Informa Tech Division of Informa PLC




A Conversation with Jim Gray


DDJ: You mentioned the words "records" and "fields". When did that come about? Was there a specific time when computer scientists started using these terms?

JG: Yes. I think there was this evolution of programming languages. Fortran was a huge success in the scientific community.

The success of Fortran gave rise to a group that said, "Gee, we need a language for business." A group started COBOL. I forget what COBOL stands for, Common Business Oriented Language, or something like that. There were a lot of ideas. Record keeping goes back thousands of years. The COBOL group developed terminology for records and fields. Those ideas then grew into a subgroup of the COBOL group called the Database Task Group, better known as DBTG. DBTG, in fact, developed not just a notion of record, but also the notion of record set and relationships among records. Charlie Bachman was the leader of that group and got an early Turing Award for his contribution to developing the ideas of records and record sets and navigating among them. Again, in the era of, say, 1964, implementations of DBTG were happening. The standard bumped along and I think it was the late '70s before the standard got approved. It was so controversial that there were many people who thought that DBTG was the wrong way of thinking about data.

DDJ: What was the alternative or some of the other options?

JG: Well, there were commercial alternatives. There was a product from IBM called IMS, which is still with us today. It is a hierarchical data model as opposed to a network data model. In the late '60s, the relational data model came on the scene. There was a fairly religious and polarized debate between the people who thought that the network data model was right and the people who thought the relational data model was right. They were, frankly, talking right past each other. They are both good data models.

DDJ: They both share common aspects, don't they?

JG: They both have records. They have relationships. The relational guys said, "Well, we want to have the meta-data and all the data inside the database. We want to have a common language for data definition and navigation. Simplicity is extremely important." The network guys said, "Performance is extremely important." In that era, performance was, in fact, king. They also made the observation that certain applications are very, very heterogeneous in their data structures. The ironic thing is that if you look at the internet, it is very much more the network data model. It is the DBTG data model, the procedural navigation through data -- go here, then go here, then go here, follow URL, follow URL. That is much more in the style of DBTG than the relational model, which tries to have a single statement that treats the database as a whole and says, "Here is the question I want to ask. Go off and execute it and come back with an answer."
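
To illustrate the contrast JG draws here, the following is a minimal Python sketch. It is not from the interview: the department and employee data and the `navigate` and `declare` functions are hypothetical names chosen for the example. The first half follows explicit links from record to record, roughly in the navigational style; the second half states the question once and leaves the access path to SQLite, standing in for a relational engine.

```python
import sqlite3

# Navigational style: records hold explicit links, and the program follows them.
# (Hypothetical data; plain dicts stand in for owner and member records.)
research = {"name": "Research", "members": []}
sales = {"name": "Sales", "members": []}
for emp_name, dept in [("Ada", research), ("Grace", research), ("Edgar", sales)]:
    dept["members"].append({"name": emp_name, "owner": dept})


def navigate(departments, dept_name):
    """Go here, then go here: locate the owner record, then walk its member set."""
    for dept in departments:
        if dept["name"] == dept_name:
            return [member["name"] for member in dept["members"]]
    return []


# Declarative style: one statement describes the answer; the engine picks the path.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE emp  (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
    INSERT INTO dept VALUES (1, 'Research'), (2, 'Sales');
    INSERT INTO emp  VALUES (1, 'Ada', 1), (2, 'Grace', 1), (3, 'Edgar', 2);
""")


def declare(conn, dept_name):
    """Ask the whole question in a single statement and collect the answer."""
    rows = conn.execute(
        "SELECT emp.name FROM emp JOIN dept ON emp.dept_id = dept.id "
        "WHERE dept.name = ? ORDER BY emp.id",
        (dept_name,),
    )
    return [name for (name,) in rows]


print(navigate([research, sales], "Research"))
print(declare(conn, "Research"))
```

Both calls return the same employees. The difference is who decides how to reach them: the program in the first case, the database in the second.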

DDJ: Why are these abstractions important? For example, ultimately the data is stored on the disk as sequential bytes. That is the database, really. At the application level, the programmer usually looks at it in terms of how the data is brought up, in a binary tree or in a form.

JG: Yes, or as a graphical display, or a tabular display.

DDJ: Why does the architecture or the abstraction of a database matter? Why does it matter that much, really?

JG: Well, I think it is striking that high-school kids can build web sites. What is going on there is that there are huge numbers of layers of abstraction that are making this possible. We have got, first, the hardware layer -- SCSI, instruction sets, and PCI, and so on. Then, we have an operating system layer, which introduces the notion of processes, address spaces, sockets, messages, and files. The next layer up is a notion of structured information. When you look at a file, you can certainly look at it as just a sequence of bytes. If you do, then you end up building, in your application, a great deal of mapping between the concepts that you are working with and the byte concepts. You can do that. There is no question about it. It is harder than, for example, having a tool that understands records and fields and, in fact, will allow you to find all the records with a certain attribute. It will give you a so-called associative access to the data. What a database is, in fact, is one of these pieces of middleware that gives you a slightly higher level of abstraction. We are still not anywhere near getting to the web site. There is a whole bunch of other software that is dealing with fields on the screen. In fact, HTML is a way of taking data and laying it out on the screen. XML is a way of saying here is a data structure and here is a style sheet and that is how you render it to the screen.
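
The gap between "a sequence of bytes" and "records and fields" can be sketched in a few lines of Python. This is an illustration only, not anything described in the interview: the fixed-width layout, the sample rows, and the `write_records`, `read_records`, and `find` helpers are all assumptions made for the example. The point is that `read_records` is the byte-to-concept mapping an application would otherwise write by hand, while `find` gives a small taste of associative access.

```python
import struct

# Hypothetical fixed-width layout: 4-byte id, 10-byte name, 8-byte city.
RECORD = struct.Struct("<I10s8s")


def write_records(rows):
    """Serialize rows into one flat sequence of bytes, as a file would hold them."""
    return b"".join(RECORD.pack(i, n.encode(), c.encode()) for i, n, c in rows)


def read_records(blob):
    """The mapping an application must hand-write when it sees only bytes."""
    for offset in range(0, len(blob), RECORD.size):
        ident, name, city = RECORD.unpack_from(blob, offset)
        yield {
            "id": ident,
            "name": name.rstrip(b"\0").decode(),  # strip the zero padding
            "city": city.rstrip(b"\0").decode(),
        }


def find(blob, **wanted):
    """Associative access: select records by field value, not by byte offset."""
    return [
        record
        for record in read_records(blob)
        if all(record[field] == value for field, value in wanted.items())
    ]


blob = write_records(
    [(1, "Ada", "London"), (2, "Grace", "New York"), (3, "Edgar", "London")]
)
print([record["name"] for record in find(blob, city="London")])
```

A real database adds indexing, typing, and concurrency on top of this, so the sketch shows only the shift in what the caller has to think about.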

The evolution of these concepts of "record" and "field" and the ability to index information is continuing to this day. It is those abstractions which allow a fairly unsophisticated user (high school students who have minimal education because they are fairly early in the education process) to have enormous power in terms of the applications they can build. This is mostly because they can build on top of these layers of abstraction. On the graphic side, you can take a JavaBean or an ActiveX control and say, "I want a data grid." You describe what the properties of that data grid are. Then, you take that data grid and you bind it to a file. That binding is, more or less, automagic because you take this name and you match it up with this name. The notion of having, in the database, fields with associated types and associated integrity constraints is one of the things that is enabling web sites and application generators more generally.
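
The name-matching binding JG mentions can be approximated with a short Python sketch. It does not reproduce JavaBeans or ActiveX; the `Customer` record type and the `bind` function are hypothetical, and a list of column names stands in for the data grid's properties. Under those assumptions, binding reduces to pairing each column name with the record field of the same name.

```python
from dataclasses import dataclass, fields


@dataclass
class Customer:
    """Hypothetical record type: each field has a name and a declared type."""
    id: int
    name: str
    city: str


def bind(grid_columns, records):
    """Fill a grid by matching each column name to the field of the same name."""
    if not records:
        return []
    available = {f.name for f in fields(records[0])}
    missing = [column for column in grid_columns if column not in available]
    if missing:
        # A column with no matching field cannot be bound.
        raise KeyError(f"no field named {missing}")
    return [[getattr(record, column) for column in grid_columns] for record in records]


customers = [Customer(1, "Ada", "London"), Customer(2, "Grace", "New York")]
for row in bind(["name", "city"], customers):
    print(row)
```

Because the record type carries its field names with it, the grid needs no hand-written mapping code; a misspelled column fails immediately instead of showing the wrong data.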

DDJ: Okay, we'll talk a little about the web. I would like to get your thoughts on the web as this large distributed database. Before we get to that, I would like to ask you about your personal contributions to database theory and database science in the '60s.

JG: It is an interesting story. I was very interested in computers, in general, and their ability to represent information and help us think about information and reasoning. It was clear, at that point, that we were many, many years away from being able to attack these problems at the level I wanted to attack them. I just had an awful lot to learn. I started learning about programming languages and operating systems. I worked briefly, for a year, at AT&T on building a simulation system. That is to say, a system in which you could describe a set of differential equations and it would go off and simulate those equations and give you outputs. I came back to Berkeley and did a doctoral dissertation in the area of programming language parsing.

About that time, I got very interested in capability-based operating systems, which today would be called object-oriented operating systems. The idea was that you could build a much better operating system if it was structured along a couple of organizing principles. One of the principles of capability-based systems is that everything is encapsulated. The way you get modularity and the way you get security is by only allowing the caller to come through certain interfaces to the person being called. These are enforced both by the development environment and the operating system.

DDJ: This is the first time, by the way, that I've heard of object-oriented abstractions or topics so early on.

JG: There was great enthusiasm. Simula 67 is where an awful lot of the object-oriented ideas come from. It was around in that era. There were doctoral dissertations being written by people trying to understand how type systems work and arguing for encapsulation and arguing for polymorphism. Many of those ideas were bubbling in the late '60s or early '70s.

