Dr. Dobb's is part of the Informa Tech Division of Informa PLC

The Proper Care and Feeding of Object Databases In Embedded Systems


Truly intelligent embedded systems must not only think, they must also know, remember, and - thereby - learn. The thinking engine is the processor, aided in its learning by that part which knows and remembers: persistent storage.

Whenever one considers the persistent storage of complex data, one assumes that the infrastructure for that storage is some form of relational database. This, at least, is true of desktop and enterprise systems. And, given that the fruits of technology employed on 'high-end' systems tend to tumble down into the embedded world, the conclusion that any persistence mechanism for complex data on an embedded system must also be supported by a relational database seems inescapable.

However, object-oriented languages have not only established themselves as the choice for enterprise and desktop applications, they have also made significant inroads in handheld and embedded applications. Given the widespread use of object-oriented languages, perhaps it's time for embedded developers to consider a persistent storage infrastructure more in harmony with the application language: the object database management system (ODBMS).

This suggestion might at first appear to be contrary to the received wisdom of best practices for embedded software development. Processing power and memory space are often limited in an embedded system, and an object database would seem to be hostile to such environments. After all, a frequent complaint about object-oriented systems - particularly those that are built atop virtual machines - is that they consume too much processor time and memory. Wouldn't the same hold true for an ODBMS?

However, the ongoing use of object-oriented languages and systems has considerably improved the technology behind object-based systems. While the speed gap between a bytecode-executed application and a native-code application will never completely close, it is narrowing. Similarly, improvements in garbage collection algorithms (as well as programmer understanding of the garbage collection process) ameliorate -- to some degree -- memory concerns.

There are significant benefits to be had from choosing an object database over a relational alternative, benefits this article hopes to spotlight. To illustrate the points we'll be making, we have chosen the open-source object database db4o as our archetype. Db4o can be downloaded from www.db4objects.com, and is available in Java and C# versions. (Our examples will use Java.)

Performance and Memory Footprint
Two major concerns were alluded to at the outset of this article: the processor horsepower and memory real estate needed to support an ODBMS might make such a database engine unusable on an embedded system.

Not necessarily.

Db4o's execution time competes well with a relational database "back-end". In a recent paper (Comparing the Performance of Object Databases and ORMs) published by the Department of Computer Science at the University of Pretoria, db4o was tested against a popular open-source object/relational database, and found to be the faster of the two -- over 40% faster in some cases. The authors (Pieter van Zyl et al.) concluded that the object/relational database was superior only in "isolated cases." Db4o's memory use is also favorable; the db4o library itself occupies about 400K. When executing, the library requires about a megabyte total to support the engine and its activities. In addition, db4o's API provides methods for tuning memory resource consumption.

In less visible, but still practical terms, db4o -- and any ODBMS, for that matter -- requires no SQL interpreter or execution engine. Meanwhile, if you choose a relational database as the back-end storage for your application, you will almost certainly be writing some SQL code. Your application will have to "step into" SQL to handle the database logic; and the SQL you will write is typically encoded in strings. The strings must be parsed and executed by some kind of SQL engine.

There are two immediate results to this. First, processor cycles will be consumed by the interpreter at runtime. SQL code cannot be executed directly; it must be parsed and interpreted. Second (and less obvious) is the fact that parsing at runtime allows syntactic errors to slip past the compiler. Something as simple as a misspelled word in an SQL statement could find its way into the executable. The subsequent failure would require debugging time, re-compilation time, and so on.

A less immediate result is the overhead injected into the application by the fact that the inclusion of an RDBMS into an object-oriented application places two paradigms under one roof. Data flowing between these two paradigms must be translated from one to the other as it crosses the invisible boundary between. That translation requires code, and that code eats memory and processor cycles.

This overhead becomes apparent if we compare the code required to read an object from a relational database, to similar code required for the object database db4o. Let's assume that we have a reasonably simple class called Datapoint, and we want to read objects of that class from the database. In fact, we want to read ALL of the objects of that class from the database.

// Represents a datapoint
public class Datapoint {
    public int deviceID;
    public int sensorID;
    public java.sql.Timestamp readingTime;
    public float value;

    // ... datapoint's methods ...
}
Listing 1. The Datapoint class.

In Listing 1 above, we have omitted Datapoint's methods, and defined Datapoint's members as public to keep things simple.

Reading the objects from a relational database would look something like the code in Listing 2 below. In this code, we will assume that the developer has chosen to store a single class per table. (In this instance, all Datapoint objects are stored in table DATAPOINT.) We will also assume that an open connection has been made, and is represented by the connection object.
...
Statement queryStatement = connection.createStatement();
ResultSet rset = queryStatement.executeQuery(
    "SELECT DEVICEID, SENSORID, TIMESTAMP, VALUE FROM DATAPOINT");

// Copy each row, column by column, into a freshly created object.
while (rset.next())
{
    Datapoint tDatapoint = new Datapoint();
    tDatapoint.deviceID = rset.getInt("DEVICEID");
    tDatapoint.sensorID = rset.getInt("SENSORID");
    tDatapoint.readingTime = rset.getTimestamp("TIMESTAMP");
    tDatapoint.value = rset.getFloat("VALUE");

    // ... do something with tDatapoint ...
}

Listing 2. Fetching Datapoint objects from a relational database.

This code fetches an object from the database by actually fetching a row from a table. Then an "empty" object is instantiated, and the fields are copied from the row into the object's members. Once that's done, the tDatapoint object can be manipulated by the application.

Equivalent code in db4o appears in Listing 3, below. Here, the object db is a handle to the database (an ObjectContainer, in db4o parlance), and corresponds to the JDBC connection object in Listing 2.

Datapoint tempDatapoint = new Datapoint();
ObjectSet rset = db.get(tempDatapoint);
while (rset.hasNext())
{
    Datapoint tDatapoint = (Datapoint) rset.next();

    // ... do something with tDatapoint ...
}

Listing 3. Fetching Datapoint objects from an object database.

In Listing 3, we've employed db4o's simplest query technique: query by example (QBE). QBE uses a 'template' object to determine which objects are retrieved by the query. Since we have created an empty Datapoint object to use as the template, this has the effect of fetching all Datapoint objects from the database.
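
To make the template idea concrete, here is a minimal, self-contained sketch of QBE-style matching over an in-memory list. It is not db4o's implementation and uses none of db4o's API; the class QbeSketch, its methods, and the trimmed-down Datapoint are hypothetical names introduced purely for illustration. The sketch assumes the usual QBE convention that a template field left at its default value (zero, null, false) places no constraint on the result.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Illustrative only: a tiny in-memory model of query-by-example matching.
final class QbeSketch {

    // Hypothetical stand-in for the article's Datapoint (timestamp omitted for brevity).
    static class Datapoint {
        public int deviceID;
        public int sensorID;
        public float value;

        Datapoint() { }

        Datapoint(int deviceID, int sensorID, float value) {
            this.deviceID = deviceID;
            this.sensorID = sensorID;
            this.value = value;
        }
    }

    // Assumption: a template field left at its default places no constraint.
    static boolean isDefault(Object v) {
        if (v == null) return true;
        if (v instanceof Number) return ((Number) v).doubleValue() == 0.0;
        if (v instanceof Boolean) return !((Boolean) v).booleanValue();
        if (v instanceof Character) return ((Character) v).charValue() == 0;
        return false;
    }

    // True when the candidate is of the template's class and agrees with it
    // on every non-default field declared directly on that class.
    static boolean matches(Object template, Object candidate) {
        if (!template.getClass().isInstance(candidate)) return false;
        try {
            for (Field f : template.getClass().getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) continue;
                f.setAccessible(true);
                Object wanted = f.get(template);
                if (isDefault(wanted)) continue;          // unconstrained field
                if (!Objects.equals(wanted, f.get(candidate))) return false;
            }
            return true;
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns every stored object that matches the template.
    static <T> List<T> queryByExample(List<?> store, T template) {
        List<T> hits = new ArrayList<>();
        for (Object o : store) {
            if (matches(template, o)) {
                @SuppressWarnings("unchecked")
                T hit = (T) o;
                hits.add(hit);
            }
        }
        return hits;
    }
}
```

With this sketch, an empty Datapoint template matches every stored Datapoint, while a template whose deviceID is set to 1 matches only the datapoints from device 1. One consequence of the convention is that a template cannot ask for "deviceID equals zero", which is one reason an object database typically offers richer query mechanisms as well.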

Notice that, with the object database, no query string is needed; the object is simply called forth out of the ObjectSet iterator. More importantly, the object is fetched "wholesale" -- fully instantiated, and fields populated -- so the additional code in Listing 2 that must copy data from the rset object to the tDatapoint object is unnecessary. Hence, the corresponding object database code is shorter than the relational database code.

And the object database's advantage shown above holds regardless of the complexity of the object. If the object fetched were the root of a complex collection -- a binary tree, say -- and we had set the "fetch depth" (referred to as the "activation depth" in db4o-speak) to account for the tallest possible tree in our database, a single call would have fetched the entire tree.

BinTreeRoot tempBTRoot = new BinTreeRoot();
ObjectSet rset = db.get(tempBTRoot);
while (rset.hasNext())
{
    BinTreeRoot BTRoot = (BinTreeRoot) rset.next();

    // ... do something with BTRoot ...
}
Listing 4. Fetching a binary tree from an object database.

The snippet in Listing 4 above fetches all the binary tree root objects (members of the BinTreeRoot class) from the database. Assuming we have set the activation depth appropriately, it also fetches the entire tree (all the nodes that each BinTreeRoot object references).

Imagine, now, trying to do that with a relational database. Fetching and instantiating a binary tree would have required iterating through a series of SELECT statements -- each pulling in a single node of the tree -- guided by some form of tree traversal algorithm, and converting the data retrieved from the ResultSet into the binary tree object members. The tree would have to be "wired together" explicitly in the application code.
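
The "wiring together" step can be sketched as follows. This is an illustration under stated assumptions, not code from any particular application: it supposes a hypothetical NODE table with one row per tree node (an id, a value, and nullable left and right child ids), and it supposes the rows have already been fetched and copied out of the ResultSet into NodeRow objects. The names TreeWiringSketch, NodeRow, Node, and rebuild are invented for the example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: rebuilding an object tree from flat relational rows.
final class TreeWiringSketch {

    // Hypothetical row layout: one row per node, children referenced by id.
    static final class NodeRow {
        final int id;
        final int value;
        final Integer leftId;   // null when there is no left child
        final Integer rightId;  // null when there is no right child

        NodeRow(int id, int value, Integer leftId, Integer rightId) {
            this.id = id;
            this.value = value;
            this.leftId = leftId;
            this.rightId = rightId;
        }
    }

    // The in-memory tree node the application actually wants to work with.
    static final class Node {
        int value;
        Node left;
        Node right;
    }

    // Pass 1 creates one object per row; pass 2 turns ids into references.
    static Node rebuild(List<NodeRow> rows, int rootId) {
        Map<Integer, Node> byId = new HashMap<>();
        for (NodeRow r : rows) {
            Node n = new Node();
            n.value = r.value;
            byId.put(r.id, n);
        }
        for (NodeRow r : rows) {
            Node n = byId.get(r.id);
            n.left = (r.leftId == null) ? null : byId.get(r.leftId);
            n.right = (r.rightId == null) ? null : byId.get(r.rightId);
        }
        return byId.get(rootId);
    }

    // Counts the nodes reachable from n; handy for checking the rebuild.
    static int size(Node n) {
        return (n == null) ? 0 : 1 + size(n.left) + size(n.right);
    }
}
```

Even in this simplified form, none of the code does anything the application cares about; it exists only to translate between rows and objects. The queries themselves, and the error handling around them, would come on top of it.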

Code of equal complexity would have to be constructed to store the tree into the database. Code would have to traverse the tree's member objects. Meanwhile, given that the root of the tree is in the object BTRoot, the equivalent code in db4o (for storing the entire tree) would be:

db.set(BTRoot);

Listing 5. Writing a complex object in a db4o database.

The additional code required to fetch and store objects in a relational database arises from the oft-cited "impedance mismatch" between the relational and object paradigms. This additional code is absent from the db4o applications.

Intelligent API
Of course, a database library cannot simply boast a small memory footprint. The library's API must be well-chosen and efficiently implemented, so that functionality is not sacrificed on the altar of memory economy. The API should be "as simple as possible, but no simpler", to borrow a famous adage.

Db4o's API is surprisingly compact. In many cases, only a handful of methods are needed to perform the majority of database operations. Once a db4o ObjectContainer is opened, the following methods handle the functions of adding, updating, deleting, and searching:

1) set(object) adds a new object to the database, or updates an existing object.
2) delete(object) deletes an existing object from the database.
3) get(templateObject) fetches objects from the database.

The above methods assume that the application is using db4o's QBE querying mechanism. db4o has two other query techniques, each suited to different circumstances. We won't go into the details of those other mechanisms here, but suffice it to say that they cover even the most complex querying requirements.

Reduction of Complexity
The use of an ODBMS like db4o reduces the complexity of the final application in other, less obvious ways. For example, the db4o database library is housed in the same process space as the application. This allows the library to manipulate application objects directly, and eliminates any marshalling code required to pass data to a database engine executing in another process. (It also eliminates the memory space and processor cycles that would otherwise be consumed by inter-process communications used to 'connect' the application and the database engine.)

Admittedly, this is neither a requirement nor a mandatory characteristic of an ODBMS; many object database systems (even db4o, in fact) can operate in client/server fashion, dividing application logic from database logic. Nor is a single code-space architecture impossible for an RDBMS-based system. However, as a relational database uses a decidedly different representation of data than the form that data takes in the application's objects, a sort of bicameral structure is natural, with the SQL engine on the other side of an imaginary divide between it and the application. Many relational database systems separate the application and database into different processes.

Another easily-overlooked benefit is the fact that db4o keeps its database in a single file. In the world of enterprise applications, where disk storage is measured in hundreds of gigabytes and files are numbered in the thousands, a one-file database is no advantage at all. But, it's a different story on an embedded system with limited filesystem resources.

A single-file database reduces "clutter" on the destination device. In addition, the database is more easily installed, backed up, or copied, because it's all in one place. Put simply, there are fewer pieces to keep track of. By contrast, some relational database systems create a subdirectory for each database, and store individual tables in separate files. This is another effect of the relational paradigm; each table stores well-defined rows, so it makes sense to separate tables in the filesystem.

Zero Administration
The actual computer driving many embedded systems is hidden from the user. There may be no mouse, a keypad instead of a keyboard, and LEDs instead of an SVGA display. User interaction with the system is strictly limited to the system's function. Consequently, the system must operate with zero administration.

You might say that we would prefer the database in an embedded application to simply "be there, and do what it's told."

From a developer's perspective, we would rather not have to write any code that involves "describing" the structure of our data for the benefit of the database. For a relational database -- or, more likely, an object-relational database -- such coding would take the form of a "schema file" that expresses the structure of our data in some formal language (sometimes a proprietary language, sometimes XML).

This schema file would be read by an interpreter to create the data definition language (DDL) code that builds the database to begin with. The interpreter might also create the interface code that reads and writes database objects. (Such code would be the equivalent of what we did by hand in Listing 2, earlier.)

We would prefer -- again, from a developer's perspective -- to simply put an object in the database without having to tell the database what the object looks like. With an ODBMS like db4o, we don't have to build any schema files, because -- in a real sense -- we already have. The class definitions in the source code are the database schema.

To put an object in the database ... you just put the object in the database. As a result, we don't have to resort to anything like SQL's DDL (data definition language) to define the architecture of persistent storage. There is no "initialization" code that we need to write that constructs tables, defines columns, supports relationships, and so on.

Change Tolerance
Closely tied to the concept of zero administration is "change tolerance." A "change tolerant" database is one that easily manages alterations in the structure of the persistent data it stores. For example, if we modify an application so that an additional class is made persistent, we would prefer that the database need no alterations to accommodate the change.

An ODBMS, such as db4o, will accept a new class of objects easily ... transparently, in fact. Suppose an embedded application using db4o has stored only objects of class A in the database. For whatever reason, a time arrives at which the application must begin storing objects of class B in the database. What changes have to be made to the database? None; the application simply begins storing B objects, and the db4o database engine takes care of all the behind-the-scenes work.

Contrast this with an RDBMS as the back-end database. A change in the kinds of objects stored in the database would likely necessitate the creation of a new table (that, in turn, means that DDL code must be written and executed to create the new table).

What about a change in an existing class structure? Suppose, for example, that a later version of a given application modifies a class by adding a new data member. Objects instantiated from the 'new' class will possess an additional data element, as compared to those objects of the same class already in the database.

An RDBMS-based application will have to either create a new table, and translate the old into the new, or modify the existing table, and fill the new fields (of the 'old' objects) with default values. In either case, the application's developers must construct code -- both SQL and application code -- to manage the upgrade.

With an ODBMS like db4o, changing an object's structure requires little or no database-specific code. 'Old' and 'new' objects of the same class can coexist in the same database. When an 'old' object is fetched, db4o instantiates the object into the 'new' class and fills in the missing fields with default values (zeros for numeric, byte, and char data; empty arrays for arrays; and nulls for everything else).
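
The effect can be pictured with a small, self-contained sketch. It is a conceptual model only and makes no claim about how db4o stores or instantiates objects: the 'old' stored object is modeled as a map from field name to value, and the hypothetical class Reading stands in for a 'new' class that has gained two fields since that object was stored. All names here (EvolutionSketch, Reading, materialize) are invented for the example, and array fields are left out.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;

// Illustrative only: loading an 'old' stored record into a 'new' class layout.
final class EvolutionSketch {

    // Hypothetical 'new' version of a persistent class.
    static class Reading {
        public int sensorID;
        public float value;
        public String units;     // added in the new version
        public int calibration;  // added in the new version
    }

    // Copies every stored field that still exists on the class. Fields the old
    // record never had keep Java's defaults (0 for numbers, null for references).
    static Reading materialize(Map<String, Object> stored) {
        Reading r = new Reading();
        try {
            for (Field f : Reading.class.getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) continue;
                if (!stored.containsKey(f.getName())) continue;  // missing in the old version
                f.setAccessible(true);
                f.set(r, stored.get(f.getName()));
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return r;
    }
}
```

An old record holding only sensorID and value comes back as a Reading whose units field is null and whose calibration field is 0. The application can then assign real values and store the object again, at which point it is a 'new' object in every respect.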

Writing such an object back to the database causes that object to become a 'new' version. Over time, then, old objects are silently transformed to new objects. Hence, the database can keep pace with evolving object structures invisibly and -- because the database need not be reconstructed wholesale in response to a class structure change -- upgrading deployed systems is easier.

For more complicated object evolutions, db4o provides callback methods that allow application code to intercept objects of specified classes on their way to and from the database. So, the callback method can identify old objects, and populate the new data members with values other than the default. In addition, because the alteration is made in a callback, it is isolated from the remainder of the application code, yielding a more readable (and maintainable) application.

Efficient Use of Persistent Storage
If data in the database is going to see a lot of turnover, the database must manage deleted space reclamation. Historically, this has been an area of weakness for object databases, given that a single database may store objects of different size and structure. Meanwhile, a relational database system has the advantage that every row in a table is comprised of the same kinds of columns. (Sometimes, a table's rows might even be of fixed length.)

Db4o is closing the gap on this advantage that an RDBMS has over an ODBMS. Currently, db4o does reuse deleted object space, but a 4-byte 'leak' occurs each time an object is deleted. Db4o's developers are currently working on an upcoming version that should eliminate this leak.

In addition, db4o provides (separate from the database library) the source for a defragmentation class. This source can be woven into your code so that you can defragment the database at a time when doing so does not affect the embedded application's activities.

Other Considerations
Other aspects of db4o make it worth consideration in an embedded application. For example, db4o provides built-in synchronization (referred to as "replication" in db4o documentation). This feature is EXTREMELY useful for embedded applications running on 'remote' devices, whose data must be periodically exchanged with some central database. In fact, for such applications, this capability simply MUST be present ... unless the developer wants to copy the database over wholesale, and resolve differences on the destination.

You enable replication on a db4o database when you create the database. Objects added to the database are given UUIDs, and each time a persistent object is modified, a transaction number is associated with the object. Suppose now that a separate database is created (with replication enabled), and objects from the first are replicated into the second.

When the objects are moved into the new database, their UUIDs and transaction numbers follow them. Later, when the databases are synchronized (the objects in the second database are reconciled to those in the first), db4o's replication code can determine -- via the UUIDs and transaction numbers -- which objects have been modified, and which have not.

For each modified object, db4o calls a conflict resolution callback routine in your code, which examines the two objects in question and determines which object is the 'winner'. The process is conceptually straightforward, and having the bulk of the work done for you by the database engine makes implementing synchronization quick and easy.
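
The bookkeeping described above can be modeled in a few lines. The sketch below is a simplified illustration and does not use db4o's replication API: each 'database' is just a map keyed by UUID, the transaction number is modeled as a plain version counter, and lastSync is assumed to be the highest version both sides had seen at the previous synchronization. The names ReplicationSketch, Versioned, and synchronize are hypothetical. As a further simplification, the resolver callback is consulted only when both copies of an object have changed since the last synchronization.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.function.BinaryOperator;

// Illustrative only: UUID-and-version reconciliation with a conflict callback.
final class ReplicationSketch {

    // A stored object reduced to its identity, its version, and some payload.
    static final class Versioned {
        final UUID id;
        final long version;
        final String payload;

        Versioned(UUID id, long version, String payload) {
            this.id = id;
            this.version = version;
            this.payload = payload;
        }
    }

    // Brings both maps to the same state and returns the number of conflicts
    // that had to be handed to the resolver.
    static int synchronize(Map<UUID, Versioned> first,
                           Map<UUID, Versioned> second,
                           long lastSync,
                           BinaryOperator<Versioned> resolver) {
        Set<UUID> ids = new HashSet<>(first.keySet());
        ids.addAll(second.keySet());
        int conflicts = 0;
        for (UUID id : ids) {
            Versioned a = first.get(id);
            Versioned b = second.get(id);
            Versioned winner;
            if (a == null) {
                winner = b;                       // exists only in the second database
            } else if (b == null) {
                winner = a;                       // exists only in the first database
            } else {
                boolean aChanged = a.version > lastSync;
                boolean bChanged = b.version > lastSync;
                if (aChanged && bChanged) {
                    conflicts++;
                    winner = resolver.apply(a, b); // application decides the 'winner'
                } else {
                    winner = bChanged ? b : a;     // at most one side changed
                }
            }
            first.put(id, winner);
            second.put(id, winner);
        }
        return conflicts;
    }
}
```

A resolver as simple as "the copy with the higher version wins" is enough for many remote-device scenarios; the point is that the application supplies only the decision, not the change tracking.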

Rick Grehan is a QA Engineer for Compuware's NuMega Labs in Merrimack, NH. He has been programming for nearly 30 years, and has written software in languages ranging from Forth to Fortran, 8-bit BASIC to Java, and 6502 assembly language to PHP. He is also a freelance writer. His articles have appeared in BYTE Magazine, JavaPro, Linux Journal, The Microprocessor Report, Embedded Systems Journal, and others. He has also co-authored three books; one on RPCs, another on embedded systems, and a third on object databases in Java. He can be contacted at [email protected].
