Dr. Dobb's | Objects and Databases: State of the Union 2006

Objects and Databases: State of the Union 2006

Difficulties and different strategies of dealing with persistence in object-oriented programming environments.

November 15, 2006
URL:http://drdobbs.com/database/objects-and-databases-state-of-the-union/194400088

William Cook, the moderator of this panel discussion, is an assistant professor of computer science at the University of Texas in Austin. He is also co-author of the Dr. Dobb's article Native Queries for Persistent Objects.

William Cook, Moderator: Welcome, I'm William Cook. I'm moderating the panel this morning. This is the panel on "Objects and Databases: The State of the Union in 2006". It's very important that you keep in mind that the focus of the panel is on objects and databases, how they interact in general. It's not specific to object databases or any particular biased approach. We want to consider the problem in general.

I got interested in this because I was out in an enterprise software company trying to build applications, so I had first hand experience of the problems of building data intensive applications. So I am very interested in this topic. The way I look at it is that there are two aspects:

How well the persistence idea is integrated with the programming language where you actually build your general-purpose solutions.
How efficient and how scalable it is.

There is a tension: the level of integration and performance seem to be at odds. At least, we haven't found the sweet spot yet.

The other thing I wanted to say, in terms of historical things, just to give us a little context: from the object database viewpoint, it is a little bit like the A.I. [Artificial Intelligence] winter. I don't know whether people are familiar with that, where there was a huge explosion of hype around A.I. and then there was this gigantic backlash when they didn't deliver on all the promises. A.I. is actually coming back now a little bit more -- it is recovering from that.

My hypothesis is that maybe object databases experienced an object database winter, and maybe the thaw is coming now. The other thing I am interested in is the whole object-relational mapping, which is the other great thing that we do -- trying to match objects with relational databases. We have been doing that for a long time and I am not satisfied. Maybe it's not as good as it could be; maybe it could be better. I wanted to challenge the panelists to speak to that as well. So those are just my viewpoints. I hope you all start having questions and write them down. And without any further ado I'd like the panelists to begin.

Bob Walker, Gemstone Systems: My name is Bob Walker, I'm from Gemstone Systems. I've been with Gemstone for about 10 years. I've been dealing with object-oriented systems for closer to 15 or so years, mostly business type systems. I was gonna start by making a joke but my dog ate my notes so I'm just gonna have to ad-lib a little bit here. A lot of programmers seem to think that object-orientated programming started with Java. And that the OR impedance mismatch is something new. Guess what? It's not! I'd like to take a quick walk down memory lane, if you don't mind.

In the late '80s and in the early '90s corporate systems were becoming more and more complicated. Software was starting to get very brittle, data was becoming very flawed, very inaccurate. A lot of redundancy in both the software and the data; I think we all pretty much know this litany. Enlightened information technologists had realized there is a serious problem. Waiting in the wings was object-orientated analysis, design and programming. What happened? Well, object-orientated programming came to the rescue by way of Visual Age Smalltalk, in many cases.

There are other Smalltalk dialects that had been in use at the time. I recall, in 1990-something, IBM putting the stamp of approval on Smalltalk for the corporate business world, and it was turning into quite a hot market. So Java isn't the first pass at OO that corporate Fortune 1000 companies have taken. Visual Age thus gave the risk-averse corporate IT staff permission to start doing OO. People were talking about it as the next Cobol. Typically when the proof of concept is done it's taken on as a simple application -- this has been true of Smalltalk, this is true of Java.

The reason I talk about this is that there is what I call a "complexity of progression." A first-pass application is never taking on the most complex problems that a corporation is trying to solve. We're dealing with relatively simple data. What happens in the second pass? We start dealing with relatively complex data. The impedance mismatch problem (I'm not going to try to define it here, I think everybody here knows what it is) starts to rear its ugly head. As people were running into that, using Smalltalk in the '90s, they started using something that we like to call Multiuser Smalltalk or Persistent Smalltalk, which is Gemstone Smalltalk. Fundamentally this is an extension to the Smalltalk language that allows you transparent persistence. It's very, very simple: You just simply take your object, bind it into a name space, commit it and it's there forever afterwards. The APIs are simple, programming is simple, you do it all in Smalltalk. The reason why I'm talking about this is because some time ago Java happened. When Java happened a lot of the organizations kind of put the brakes on where they were with OO. "What's this new language? What's going on in here? Maybe we shouldn't invest so much in where we are. Let's put on the brakes and see where this is going".
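As a rough illustration of the "bind it into a name space, commit it" style described above, here is a minimal sketch in Python rather than Gemstone Smalltalk. It uses the standard-library `shelve` module as a stand-in for a persistent name space. The `accounts` key, the attribute names and the temporary file location are invented for this example, and nothing here reflects Gemstone's actual API.

```python
import os
import shelve
import tempfile
from types import SimpleNamespace


def demo_transparent_persistence():
    # Hypothetical store location, created fresh for this example.
    path = os.path.join(tempfile.mkdtemp(), "objects")

    # First session: bind an ordinary object graph to a name and commit it.
    with shelve.open(path) as root:
        alice = SimpleNamespace(owner="alice", balance=100)
        root["accounts"] = {"alice": alice}  # bind into the name space
        root.sync()                          # "commit"

    # Later session: reopen the store; the objects are simply there again,
    # with no SQL and no hand-written mapping code in between.
    with shelve.open(path) as root:
        return root["accounts"]["alice"]


account = demo_transparent_persistence()
print(account.owner, account.balance)
```

The application code never mentions tables or columns; it only names a root and commits. A real multiuser system would add transactions, concurrency control and queries on top of this, none of which the sketch attempts.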

I think now, 10 years into the progression of Java, we are at a place where mission critical, time critical applications are being implemented and have been for the last couple of years. They've reached a complexity threshold where the OR impedance mismatch problems are finally starting to show up. A lot of people who walked in going "Oh Java! It's OO! It's new! We can do lots here!" didn't realize at first that this was going to happen.

Long story short: You had expressed some interest, Bill, in languages that are extended with persistence. We've been doing that at Gemstone for practically 20 years. We know that it's possible, we know that it works. It's very simple, you don't have to have a relational database, you can use a relational database if you want to. My challenge to the team here, and to the people in the audience, is to start thinking in terms of transparent persistence as a part of the language. We've been doing that. But I think that it can be improved across the board.

Derek Henninger, Progress Software: Hello, I'm Derek Henninger from Progress Software, and one of the things I noticed looking into the introduction of this was that the last objects and databases panel was quite a while ago. Someone looked at the one that occurred about 10 years ago. Everybody on the panel came from an object database vendor. Now I look up here and roughly half are talking about object relational mapping solutions, and I think that is one of the trends that clearly has manifested itself since the last time this occurred, and I think we'll continue to see that.

Progress is one of the vendors that has both an object database and an object relational mapping and caching solution. And I think we're going to continue to have progress and I think in the industry we'll continue to see the need for interoperability between the object world and the relational world as well as other database formats. People are building their new applications in object-orientated languages, yet there is still a wealth of data in other structures beyond object databases. That said, there are significant benefits that our customers, and other customers of people on this panel, get from utilizing the power of an object database and not having that impedance mismatch.

Nonetheless we see that interoperability is necessary. So even our customers who have standardized on ObjectStore for major parts of their applications still need to get data out of a relational database. They need that interoperability. So I'll give a kind of an anecdote in terms of the benefits of an ODB: A recent customer of ours, Starwood Hotels, had an application built on Oracle. For this very sophisticated query they were doing in order to measure room availability and stays and pricing and all that kind of thing in one fell swoop, they were getting hundreds of transactions per minute. As they moved to an object database, they were getting hundreds of transactions per second -- so roughly two orders of magnitude improvement.

For many of the applications out there I do believe that object representation can provide that kind of performance, particularly where performance and scalability and memory access is really critical.

Robert Greene, Versant: Hi, I'm Robert Greene, I'm from Versant Corporation. I've been working with Versant for about 10 years. I've also seen the ups and downs of the industry and I've experienced the winter, but I also feel that there's some potential for a comeback.

I think that looking at the State of the Union, if we think about where we've been, where we are now, where we're going to go, it helps to put things in a little bit of perspective. Looking back, we were very, very data-centric in our approach, focusing very much from a data perspective, driving that back up into sort of a structured programming paradigm and how it operates over that data. As things have evolved and we've begun to think more about objects and domain-driven designs, we're naturally now at the stage where we are trying to figure out how to take the evolution in the language layer and bring that down into what we're doing in the data tier.

So the things you've seen happen in the Java space in particular with object relational mapping are first efforts at dealing with that problem. The reality is, most of the data out there is in relational form and increasingly in XML form. So we need ways to deal with that, but at the same time we need to appreciate that evolutions in the language layer are as important as abstractions that we put over existing data infrastructure, so that ultimately we can create solutions that will solve the problems at hand in a more efficient, scalable way.

Erik Meijer, Microsoft: I'm Erik Meijer, I get paid by SQL Server and I work for Visual Basic and C#, so I experience objects and databases every day, and the impedance mismatch.

But what I really want to talk about with you today is car salespeople. I think you all love them, right? You go and buy a car and then they say: "You can get your Corolla and a navigation system or you can get a Prius and your fuel efficient engine." But of course what you want is to have a Sienna and a navigation system and a fuel efficient engine. And the problem we have currently is that we also have this kind of disjunctive normal form for data.

So it's like either you have objects and pointer based traversal or you have objects and foreign key/primary key based traversal.

So how can we get rid of this kind of disjunctive normal form in this ugly world? There are two ways to do that. One is to define a kind of an uber data model, which some people try to do, and then map everything into that uber data model. The other one is to try to unify: to look at what is common between all these data models, how we can abstract over each model and over each collection type etc. And this is where the future will be, so I will leave that as a cliffhanger for the next round to talk to you about that.

But before that I want to give this -- it's a memorial candle. This is for all the companies that are still looking at the uber model and I'd like to symbolically burn this candle for them as a memorial that they will go down in peace.

Christof Wittig, db4objects: I totally agree with you Erik -- no uber model. My name is Christof Wittig I'm the founding CEO of db4objects, the open source database company based in San Mateo. We started two years ago and we're certainly news in the space because we're the only object database company that ventured for the last 10 years. And there's reason to do that other than trying to get ourselves hurt.

In fact, more and more people -- and that's the power of open source and the collaborative model behind it -- more and more developers understand that relational database paradigms are not a solution for all persistence tasks in object-orientated environments. And they just don't want to believe what Microsoft or Oracle often tell them: Use ORMs and they will make the pain go away. So our 15,000 registered developers, growing by a thousand every month, understand there are spaces where different persistence strategies are needed, and one of the options is object databases.

Object relational mappers are a band-aid for a problem, they're not the solution. Ted Neward called it very nicely the "Vietnam of computer science": Object relational mappers are subject to diminishing marginal returns. You try to solve one problem and you get two new ones.

You see it in added complexity and deteriorated performance. He says there are basically three strategies for object-orientated persistence. Either embrace objects, object databases; abandon objects -- that's a good strategy -- or suffer from object relational mappers. Many developers who, for instance, build applications for cell phones, or customers of ours like Ricoh and Seagate in consumer electronics, understand that object relational mappers simply don't work on these devices. Market research shows that basically 50% of those developers write their own persistence solution -- and that's in 2006.

So I want to say there is a case to be made for object databases and whenever you hear someone saying "Object databases have no use to object orientated developers", you just can tell them "You simply don't know the space enough".

Patrick Linskey, BEA: Well, this should be interesting. I'm Patrick Linskey, I've worked for BEA for about a year; before that I worked for a company called Solarmetric, which makes an object relational mapping framework.

BEA acquired us a year ago and I'm the lead of the team responsible for that and also getting involved in the rest of what BEA does with data. So I actually, from an ORM standpoint, get to solve two problems in one little kind of minute we have for talking here.

The gentleman on my left mentioned that object relational mapping is a band-aid. I think that this is a very common perception. There's a strong desire to find this uber model, the one thing that can solve all of our persistence needs, and one way to do all that. And I think that the most important thing to remember about objects and databases is that the format you want to store your data in, and the format you want to use your data in, are almost never the same. So there's inherently almost always some sort of mapping that happens, even if it's just taking the bits in memory and writing them to be bits on disk. There's still a transformation onto disk that is happening when you're exporting, memory has different constraints, you're loading that data back in, etc.

So there's always some sort of a mapping happening, and the important thing is figuring out a solution that works for your business needs, your technical needs and that will work moving forward in the future. I think that the biggest thing there is the API that you use to interact with the data: The way that you access the data and the way you manage your transactions and the way that you manage your units of work and things like that. As for the mapping exercise, storing it into whatever format you might be using: ideally, if you can use a unified API for communicating with your data, or some small number of unified APIs that are interoperable, then how you map your data becomes a tuning exercise. Whether you're using an object database or a relational database or an LDAP store or a network database or a flat file or whatever, it becomes an exercise in politics and optimization, rather than a kind of upfront "let's start designing, now we must decide how we are going to store our database".
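To make the "unified API, storage as a tuning decision" idea concrete, here is a small sketch in Python. Everything in it is hypothetical: the `PersonStore` interface, the two backends and the `register_and_lookup` function are invented for illustration and are not taken from any product or standard mentioned on the panel.

```python
import json
import sqlite3
from typing import Optional, Protocol


class PersonStore(Protocol):
    """Hypothetical unified API: application code sees only these calls."""

    def save(self, name: str, age: int) -> None: ...

    def find_age(self, name: str) -> Optional[int]: ...


class RelationalStore:
    """Backend 1: rows in a relational table (in-memory SQLite)."""

    def __init__(self) -> None:
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE person (name TEXT PRIMARY KEY, age INTEGER)")

    def save(self, name: str, age: int) -> None:
        self._db.execute(
            "INSERT OR REPLACE INTO person VALUES (?, ?)", (name, age))

    def find_age(self, name: str) -> Optional[int]:
        row = self._db.execute(
            "SELECT age FROM person WHERE name = ?", (name,)).fetchone()
        return row[0] if row else None


class FlatFileStore:
    """Backend 2: a JSON document standing in for a flat file."""

    def __init__(self) -> None:
        self._text = "{}"

    def save(self, name: str, age: int) -> None:
        data = json.loads(self._text)
        data[name] = age
        self._text = json.dumps(data)

    def find_age(self, name: str) -> Optional[int]:
        return json.loads(self._text).get(name)


def register_and_lookup(store: PersonStore) -> Optional[int]:
    # Application logic: identical no matter which backend is plugged in.
    store.save("Ada", 36)
    return store.find_age("Ada")


for backend in (RelationalStore(), FlatFileStore()):
    print(type(backend).__name__, register_and_lookup(backend))
```

Swapping one backend for the other changes performance and operational characteristics but not a line of `register_and_lookup`, which is the sense in which the storage choice becomes a tuning exercise rather than an upfront design decision.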

Craig Russell, Sun Microsystems: Hi, I'm Craig Russell, I'm with Sun Microsystems. I came to Sun in 1999 in order to do object to data representation, and I don't use the words "object to relational data". From my perspective the important thing is how the programmers use the data and what they need. The less important thing is whether it's an object relational mapping or a flat file or an XML file. I like to concentrate on "How can we make the persistent data appear to the programmer ready-made for active use".

So unlike a used car salesman, which I really don't identify with at all, I'd like to use the analogy of "Trying to get a nice meal at a restaurant". The database is behind the doors in the kitchen and we really have no idea what goes on back there. The health department does, but certainly the diners don't ever see that. What they see is a presentation in front of them, and so I liken objects to the dishes, the flatware, the silverware, the goblets and so forth.

I really have to say that I think "impedance mismatch" is completely overrated. I'll give you an example of impedance mismatch that might work: You sit down at your table and you're given a goblet. That's what you have to eat with. Out comes the chicken and you try to eat the chicken with the goblet. It already has got some wine in it and you'd like some water in the goblet, and the peas and the carrots, they're all in the goblet.

That's a real impedance mismatch from my perspective! The way you like to eat your meal, it comes out of the kitchen, it's all delivered for you, everything is in its proper place. You've got the wine, you've got the water, the various things, you can eat it with a fork and a knife and it's all just really easy to use. Easy to use food, that's my objective in this talk here.

So you can think of an object database as the dishes prepared in the kitchen: It comes out ready for use. You look at it, you see it, you eat it. You can look at JDBC as the food coming out on a big platter and it's your job to serve yourself, family style basically. You take a little bit of this and a little bit of this and you put it on your plate. So you construct the objects that you're looking for. And you can talk about an object relational mapper as the staging area in between the kitchen and where it's actually going to be served. The waiters are doing a tableside presentation, they mix all the stuff together, put it on the plate and serve it for you. You never had to do any of the work, you saw it done. Maybe a look behind it, you saw it done, you know what is going on. You know the data coming out of the kitchen is not in the form that you're going to eat it.
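The difference between the "family style" and "tableside" approaches can be sketched in a few lines. This is an illustration in Python with the standard `sqlite3` module standing in for JDBC; the `Dish` class, the `dish` table and both function names are invented for the example, and the generic helper is a toy, not how any real object relational mapper is implemented.

```python
import sqlite3
from dataclasses import dataclass, fields


@dataclass
class Dish:
    name: str
    price: int


def make_kitchen() -> sqlite3.Connection:
    # The "kitchen": a relational table the diner never looks at directly.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE dish (name TEXT, price INTEGER)")
    db.executemany("INSERT INTO dish VALUES (?, ?)",
                   [("chicken", 18), ("peas", 4)])
    return db


def family_style(db: sqlite3.Connection) -> list:
    # JDBC-like: the caller pulls raw rows and builds each object by hand.
    dishes = []
    for row in db.execute("SELECT name, price FROM dish ORDER BY name"):
        dishes.append(Dish(name=row[0], price=row[1]))
    return dishes


def tableside(db: sqlite3.Connection, cls) -> list:
    # Mapper-like: a generic helper derives table and columns from the class
    # (assumes the table is named after the class, lower-cased).
    columns = [f.name for f in fields(cls)]
    sql = "SELECT {} FROM {} ORDER BY {}".format(
        ", ".join(columns), cls.__name__.lower(), columns[0])
    return [cls(*row) for row in db.execute(sql)]


db = make_kitchen()
print(family_style(db))
print(tableside(db, Dish))
```

Both functions return the same objects. The difference is who does the plating: in the first the application writes the row-to-object code for every class, in the second a reusable layer does it once.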

I'll just leave you with that in saying that there's a real separation in my mind between what is in the kitchen and what shows up on your plate. My focus is what is on the plate.

William Cook, Moderator: All right, thank you. If anybody wants to paint the picture of where we are going to be in 10 years, I'm going to give you a minute to do that.

Bob Walker, Gemstone Systems: What we found, since we're already doing transparent access to objects, is that you don't have to do anything other than say "Waiter, I want an egg" and the egg is there. I think that problem has been solved in terms of OR mapping or impedance mismatch. I like your analogy, I think that is very good. As for the next step, from what I'm hearing from programmers, I hear things like "I just want my objects. I just want them here and I want them now and I don't want to have to mess with it, I just want them there, I don't want to have to deal with the database, I want all that stuff taken care of for me."

I think the next step, and I think we're seeing this at Gemstone, is what basically is a distributed in-memory live object cache that has transactional attributes but doesn't deal with disk-based storage whatsoever. I think, five years from now, we are going to have memory cheap enough and fast enough that there won't be any disks. There will simply be in-memory objects, ubiquitous throughout the enterprise, a robust sea of objects always available for the programmer. They don't have to mess with OR mapping, they don't have to mess with object databases, they don't have to mess with SQL, the objects are just there ready for the taking and for the use.

William Cook, Moderator: Great, so we have a convention. If you agree with that, you can just say "Gush!", meaning "I completely agree with everything and I don't need to say it again". So if anybody wants to agree with that they can, and anybody can give a different view.

Derek Henninger, Progress Software: So I'll "gush" -- but I think: fine, there are these distributed object caches in memory, great, but each object application, even two applications of the same domain, often have different object structures. There's going to be transformation required between them. I completely agree there are significant transactional characteristics that are going to be associated with that, that need to be managed, and I think it's our responsibility to satisfy that and the needs of the programmer and to make that simple and easy to use.

Beyond that, I still think that the relational model and other data models are going to exist for 10 years. We've been saying that some of these mainframe databases are going to die and they still sure as heck haven't. That kind of mapping and transformation into the in-memory object representation is going to need to occur, and you're going to need to do it in a way that does not impact those legacy applications that nobody wants to touch. Those Cobol programmers that we talked about so affectionately earlier, those guys don't want to touch their applications. You do something that affects their application logic, and keep in mind those old Cobol applications make a lot of assumptions about what the transactional characteristics are of the data they are managing. So there's a lot more complexity to it and I think the mapping is a key part of that.

Robert Greene, Versant: I've been stuck with that recurring dream myself but I would tend here to side with Erik. You're not going to get away from the realities of having to worry about the transactional nature of your logic and also the raw scale. When you start talking about dealing with information which is in the terabyte and even approaching the petabyte range, you're going to be hard pressed to blindly assume that there are just objects that are available. You're going to have to have new constructs that are either built into the mapping and transformation layers or natively in the language, that can help you as a programmer to deal with those things in as transparent a way as possible, but I think it'll never be completely transparent -- at least not in my lifetime.

Erik Meijer, Microsoft: I think it's a delusion to ever think that all your data will be available as objects. Over time there will be more and more data models and forms of data: MP3s, audio, video, whatever. And you want to deal with all these forms of data, you don't want to get stuck. So the only solution is to look for something that very much rises over all data models. And there is a solution for that, it's called "Monads". This was invented by the mathematicians in the 1920s.
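For readers who have not met the term, the following is a very small sketch, in Python, of how a monad-style `bind` (flat-map) operation lets one query be written once and run over differently shaped data sources. It shows only the sequence case and is not a reconstruction of any Microsoft design; the names `bind`, `big_orders` and `from_text`, and the sample data, are made up for this illustration.

```python
from typing import Callable, Iterable, Iterator, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def bind(source: Iterable[A], f: Callable[[A], Iterable[B]]) -> Iterator[B]:
    """Flat-map: apply f to every element and concatenate the results."""
    for item in source:
        yield from f(item)


def big_orders(customers: Iterable[dict]) -> list:
    # The query is written once, in terms of bind only.
    return list(bind(
        customers,
        lambda c: [(c["name"], amount)
                   for amount in c["orders"] if amount > 50]))


# Source 1: objects already in memory.
in_memory = [{"name": "ann", "orders": [20, 70]},
             {"name": "bob", "orders": []}]


# Source 2: the same data arriving as text, parsed lazily line by line.
def from_text(text: str) -> Iterator[dict]:
    for line in text.splitlines():
        name, _, raw = line.partition(":")
        yield {"name": name,
               "orders": [int(x) for x in raw.split(",") if x]}


print(big_orders(in_memory))
print(big_orders(from_text("ann:20,70\nbob:")))
```

The query function neither knows nor cares whether its input is a list, a generator over a file, or something else that can be iterated; that indifference to the underlying data model is the property being pointed at.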

But if we look 10 years from now: I'm going to make a little digression here, because I was judging the student competition and none of the students told me "My work is impractical and maybe in a hundred years there will be a use". But if you look at this theory of Monads that was invented by mathematicians in 1920 or something: They had no clue what this was useful for. So my belief is that in 10 years' time we will be in a crisis, because the current generation of students are only into practical stuff and they're not generating the theory that is necessary to produce the kind of next generation technology. It's like oil, we're running out of mathematics quickly. This is my sober view of 10 years from now -- it's sad.

Christof Wittig, db4objects: My prediction of what is going to happen in 10 years is a sort of cultural war over who owns the data. It will be decided one way or the other -- I don't want to make a prediction in which way. Actually, the fact is that in enterprise applications the data belongs to the DBAs and not to the developers. I think that it's time that developers reclaim their territory. That's why we have so much traction in the embedded space, which is also dubbed zero administration -- lack of DBAs. But we should think a little further and take ownership of the data persistence. And that's where I disagree with Craig. It does make a difference whether your meal comes in 10 minutes or 10 hours -- a developer should care about this. Even if the API is the same -- he has a menu and a bill to pay in the end and a meal in between -- it does make a difference whether it takes 10 minutes or 10 hours.

Also the quality: Can I properly persist inheritance, can I really properly persist the power of object-orientated languages? The fact is that the kitchens that we have to date don't cater to those needs. So we have to reclaim those territories and put ourselves in charge.

What works well in the embedded space, I think, will very well be picked up with service-orientated architecture in the enterprise space, where the application and the database form a contained silo and no MIS consultant will ever touch the database but only use the predefined service gateways, which are owned by the developers themselves.

Patrick Linskey, BEA: I guess to that I'll say "Gush". One of the things that Erik said is really interesting. I think you'll see a lot of different types of data models coming out. I think the next one we're going to see in the next 5 or 10 years is this concept that we were talking about earlier, of data grids and data in the network, like data on the cloud rather than data on the little disk.

All these different new modes of data storage and data durability are going to end up demanding different APIs and different ways of interacting with them. That is definitely going to make things more interesting. But I think that also 10 years from now we're going to use a whole set of new languages. No one is going to use Java, no one is going to use C#, everyone is going to use Dflat and Cava or whatever. We always do this: We throw away all the knowledge we learned and start from scratch again. And I think 10 years from now we'll do that again. We'll have the same discussions about what's the right way to do it, what's the right way for the IDs and what's the best storage. We'll probably disregard the past. That's not what I hope happens but it's what I think will happen.

Craig Russell, Sun Microsystems: I'll just say that I really "gush" about the fact that the big crisis coming is about who owns the data. Is it going to be the people who produce the content or is it going to be the people who distribute the content? With digital rights management, I think we haven't seen the third act or the fourth act play yet, and that's going to be somewhat critical. What I do know is that with the data in the corporations, you know who owns it, and among the people who pay the piper, which is the corporations, there'll be a significant number who demand very fast access to the data and who hire programmers who can deliver that fast access.

I'm sorry, because in my analogy I didn't carefully enough recognize the staff and the people in the kitchen in serving this meal. I do recognize that a good presentation is a well orchestrated operation involving a whole series of people from the back end, from the people who actually order the meals, order the raw materials and prepare the meals, to the people in the back office, accounting-wise, who magically transport your plastic and deliver you the key to the exit door, because you don't leave without paying.

So there are a lot of roles involved in delivering persistent data. I was describing what my focus was: To deliver the ultimate dining experience. I recognize that there are some on the panel here who are more involved in the back office kinds of operations.

My vision for the future is ubiquitous data, but the data is distributed and varied, and the ownership varies with the data. The point is that not all accesses are in this particular style. The needs of the cell phone or the PDA are very different -- I completely agree with that -- and I don't want to eat my chicken cordon bleu with a cell phone.

William Cook, Moderator: I've been a programmer for a long time and I'm not sure I trust myself with the data -- so it's a little scary.

We have a couple of questions [from the audience] here.

Basically there's a question: "What is happening with ODJ?" I just want to elaborate on this a little more. This is the latest in the long line of standards for accessing relational databases, starting with ODBC, and we've got more acronyms and standards and proposals in that space than in any other space I know. There's something clearly going on here, we're having a new proposal every year or two, right? How many of these: JDO, EJB-this, EJB-that, the .NET versions of this stuff. What's going on here? Is this ever going to end? That's roughly the question I have, so maybe the people here involved in that can answer it.

Robert Greene, Versant: I just have one question about your statement that you're not comfortable with the data: are you comfortable with your models? Because I think that goes back to what Christof was talking about, taking control again of the models and thinking from a domain-driven perspective and not having to worry about what's going on with the data. How it's stored is part of that model, but: Be comfortable with your model.

William Cook, Moderator: That sounds good to me.

Patrick Linskey, BEA: Regarding the question from the audience, I think that there are a lot of acronyms out there, and in defence of the persistence world: The web services world has even more.

William Cook, Moderator: They're not replacing their previous ones they are adding more of them. But in the relational world they are always replacing the previous one.

Patrick Linskey, BEA: That's true, we do like to reinvent the wheel in the OR mapping world. I really hope that the standards in the OR mapping world settle down over time, because I think that the key thing is that a good OR mapping standard, a good object database API and a good LDAP API for storing objects in an LDAP server all end up looking a lot the same. You can do conversions between the different APIs with an easy sed script. So it's in the best interest of everybody, except for the vendors, for there to be strong and widely adopted standards, in whatever form they get delivered, for how you access data as objects. Because this is a problem that a lot of different people solve, each saying "Here's my way of accessing data as objects", and the users of all these products are the ones that benefit from having a common API on top of them. Having extensions to APIs is fine, but having some basic subset is really something that, any vendor that says something is not deliverable is trying to lock you into their product -- and that is not nice.

Christof Wittig, db4objects: I wanted to speak to one of the initiatives of William Cook himself, you may not know that he is the author of the safe queries concept, among others.

We have implemented that in our database as native queries, and basically it makes the programming language, be it Java or .NET, itself the query language. It's totally native, it's type safe, it's object oriented, and it's fully optimized. That is one way to reclaim the territory: we don't speak the other language of the DBA guys. SQL was written for end users, not for programmers. We use our language because we're better in that. No one can be perfect in two languages at the same time -- no matter how good you are. So one standard is: Let's take our language standards and just use them, for querying for instance and for persistence tasks. So: no new standard please.
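The native query idea, where the query is an ordinary expression in the host language rather than a string in a second language, can be sketched as follows. This is an illustration in Python, not Java or .NET, so it does not show the static type checking mentioned above, and it does not reproduce db4objects' actual API. `ToyObjectStore`, `Student` and the sample data are invented, and the "engine" here simply scans every object where a real one would analyse and optimize the predicate.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Student:
    name: str
    age: int


class ToyObjectStore:
    """Hypothetical store whose query language is the host language."""

    def __init__(self) -> None:
        self._objects: List[Student] = []

    def store(self, obj: Student) -> None:
        self._objects.append(obj)

    def query(self, predicate: Callable[[Student], bool]) -> List[Student]:
        # A real engine would translate the predicate into index lookups;
        # this sketch just evaluates it against every stored object.
        return [obj for obj in self._objects if predicate(obj)]


db = ToyObjectStore()
db.store(Student("Alice", 19))
db.store(Student("Bob", 23))
db.store(Student("Anna", 31))

# The query is plain code: refactoring tools, editors and tests all see it.
young_a_names = db.query(lambda s: s.age < 20 and s.name.startswith("A"))
print(young_a_names)
```

Because the predicate refers to `age` and `name` as real attributes rather than as text inside a query string, renaming a field or mistyping it is caught by the same tooling that checks the rest of the program.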

Bob Walker, Gemstone Systems: I want to second what Christof just said. I think having to deal with two different languages is part of the core of the difficulty that we run into in mapping objects into relational data. Our effort at Gemstone has been to try and keep it in a single language. So your queries are Java, your queries are Smalltalk. There's been an effort, ODMG 3.0 with the definition of OQL, to define a standard object query language. This is something that would behove all of us as vendors to take a look at and participate in.