January 03, 2007
Local Device Search

The need for device-level search and how to integrate a data manager into a device application.

Malcolm Colton, Hitachi America
Malcolm Colton talks about the need for device-level search, with examples of search-enabled device applications and how to integrate a data manager into a device application.
Search, especially in small-footprint embedded devices for the consumer market, is nontrivial to implement, particularly when it must cover more than alphanumeric data. Savvy embedded developers are starting to take advantage of a new breed of database management systems designed specifically for devices.

They fit in a small footprint at run time and offer not just a high level query language, but also shared access to data, and guaranteed database consistency even after unexpected power outages.

Such features are becoming important in many embedded consumer apps because a number of converging trends act together to constantly multiply the amount of content.

Exploding mobile content
In the movies space, the advent of cheap digital film cameras and editing equipment has led to an explosion in amateur video. YouTube is the definitive example. Following the general decline in people's attention spans, a "typical" video program is rapidly shifting from the studio-generated 25- or 50-minute program to the amateur-produced 2- or 3-minute clip.

Instead of content coming only from a handful of studios, videos are being produced by tens of thousands of individuals and small groups. The result: millions of pieces of video content are being created and distributed every year.

Similar trends in low cost development and distribution are driving an explosion in still photography and musical content. Even the volume of commercially-produced broadcast content is exploding. Satellite and cable packages now may contain hundreds of channels. IPTV promises to increase this even further.

Why should an embedded developer care about this? Surely Google has the search problem under control? Well, yes, but only if you are searching content on the Web, and only if you have an Internet connection, which can't always be assumed.

Increasingly, content is being downloaded to handheld portable devices. These devices are now storing so much data that finding the content that the user wants is becoming a challenge. Here's a subversive thought: Was the success of the iPod Shuffle based on the fact that actually finding music on an MP3 player is a bit of a chore, so it is easier to just randomize it?

Integrating the Data
In one sense the MP3 player presents a simple search challenge: the task is to search a single collection of data to find the target. Looming on the horizon like a threatening storm cloud is the inconvenient fact that devices increasingly store many kinds of information, and the real value comes from integrating all that data.

Take the case of a mobile device carried by a field support engineer. It needs to integrate customer data, product data, service data, location data, and inventory data to be able to answer an obvious question like this one: "Find me a nearby customer with an open service order on a product that I am certified to maintain and where I already have the likely parts in my truck."

Moore's Law to the Rescue
Of course, the increases in device storage come along with increases in RAM capacity and processing power, driven by the so-far inexorable Moore's law, popularly summarized as a doubling in compute power roughly every 18 months. This increased power can be used to drive increasingly sophisticated search, and users of both consumer and commercial devices are going to want it. The question is how best to provide it.

The obvious way is just to code search into the device application. But this is not as simple a solution as it seems, nor is it necessarily the best use of scarce development resources. This article takes the position that the optimal way to provide search is to embed into the application a COTS (Commercial Off The Shelf) relational database management system (RDBMS) that is optimized for the device environment.

The reason is straightforward: searching across multiple sets of shared, updateable data is hard to do well. Training in device software development is not necessarily a good background for writing a database manager. While it is reasonably easy to write a search solution for a given set of requirements, it is very hard to write an efficient, compact, general-purpose data manager.

Even if a very talented development team could accomplish this task, why would a manager want to spend the team's resources on invisible infrastructure instead of focusing on adding value within the team's core competency? Just as embedded developers in almost all cases use COTS operating systems rather than writing their own, so the time has come to use COTS data management rather than writing code for this function.

Desirable Database Features: Choosing a DBMS
There are three fundamental kinds of data manager available to the device developer: data management libraries, object database management systems (ODBMS), and relational database management systems (RDBMS). It is important to choose the right tool for the task at hand.

A data management library is useful for the storage and management of simple data sets. An application that saves the preferences of several users could benefit from this approach. An example could be the code that manages seat and mirror positioning in a passenger vehicle, retaining the settings for several drivers. The application is simple, the data is simple and the application can be written faster using a simple data management library.

An ODBMS is designed to provide more or less transparent persistence for application objects. An ODBMS can make object-oriented programming in a language like Java much simpler because it takes care of moving objects between persistent storage and RAM. While some ODBMS provide limited search capability, this is not their strength. Objects are normally retrieved because the application knows which objects it wants.

An RDBMS, on the other hand, is designed for content-based search. RDBMS are based on SQL (Structured Query Language), a set-oriented language that provides the ability to retrieve a record based on the value of any of its fields. This makes an RDBMS a natural choice for a device search application.

The Enterprise Database Grows Down
Historically, RDBMS have been the data manager of choice for enterprise applications, and they were designed for the enterprise data center environment. They demand large powerful machines, and frequent attention from database administrators.

Fortunately, a new class of self-managing RDBMS is appearing on the market with a much smaller footprint than their enterprise-class predecessors. Focusing on the subset of SQL most suited to device applications, and sometimes offering advanced search for device datatypes like text and spatial, an embedded RDBMS will fit into a megabyte of RAM or even less at run time.

For the first time, it is possible to think of embedding a relational database management system into a device application, and there are compelling time-to-market considerations that encourage embedded software developers to do just that.

Let's look at what a relational database management system has to offer.

Content-Based Search
The SQL language provides a simple search interface to data. In SQL, the application finds data by means of its content, not its location. An RDBMS stores data in tables made up of rows and columns. Rows are retrieved because the content of one or more columns matches the values in the query.

For example, information about music albums could be stored in a table with one column each for the album name, artist, label, and release year.

In SQL, you would create this table with a statement like this:

CREATE TABLE Albums (
        Album_name             VARCHAR(254),
        Album_artist             VARCHAR(254),
        Album_label              VARCHAR(254),
        Album_year               SMALLINT)

You could find all albums by a given band using a query like this:

SELECT Album_name
FROM Albums
WHERE Album_artist = 'Jefferson Airplane'

To make this query execute quickly, you could create an index on the Album_artist field:

CREATE INDEX Album_artist on Albums(Album_artist)

Once the index is created, the RDBMS maintains it and will use it automatically to speed searches on that field.
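
As a rough illustration, the statements above can be run from Python's standard sqlite3 module. SQLite is used here only as a convenient stand-in for an embedded RDBMS, not as the product the article has in mind; the index name idx_album_artist and the sample rows are assumptions made for this sketch.

```python
import sqlite3

# SQLite (in memory) stands in for an embedded RDBMS; the rows are sample data.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Albums (
        Album_name   VARCHAR(254),
        Album_artist VARCHAR(254),
        Album_label  VARCHAR(254),
        Album_year   SMALLINT)
""")
conn.executemany(
    "INSERT INTO Albums VALUES (?, ?, ?, ?)",
    [
        ("Surrealistic Pillow", "Jefferson Airplane", "RCA Victor", 1967),
        ("Volunteers", "Jefferson Airplane", "RCA Victor", 1969),
        ("Abbey Road", "The Beatles", "Apple", 1969),
    ],
)
# idx_album_artist is a name chosen for this sketch.
conn.execute("CREATE INDEX idx_album_artist ON Albums (Album_artist)")

query = "SELECT Album_name FROM Albums WHERE Album_artist = ?"
albums = sorted(row[0] for row in conn.execute(query, ("Jefferson Airplane",)))

# Ask the engine how it intends to run the query; the last column of each
# plan row is a textual description of one step.
plan = [row[-1] for row in
        conn.execute("EXPLAIN QUERY PLAN " + query, ("Jefferson Airplane",))]

print(albums)
print(plan)
```

The application never names the index in its query. The query planner decides on its own to use it, which is the point of letting the RDBMS maintain the index.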

Integrating Data
An RDBMS enables you to integrate data from many tables using a join. A join combines rows from two or more tables by matching column values. This is useful for a couple of reasons.

The first is that it lets you cross-reference data stored in different tables, perhaps by different applications. The second is that you can reduce storage requirements by storing each piece of data only once and then cross-referencing it where it is needed.

In the example table above, the artist name is stored many times, once for each of the artist's albums. We can eliminate this redundancy by separating the artists out into a table of their own. We allocate each artist an arbitrary id so that we can perform the cross-reference.

To do this, you would execute SQL like this:

CREATE TABLE Albums (
        Album_name VARCHAR(254),
        Album_artist SMALLINT,
        Album_label VARCHAR(254),
        Album_year SMALLINT)

CREATE TABLE Artists (
        Artist_name VARCHAR(254),
        Artist_id SMALLINT)

To find all the albums by a given artist, you join the tables together, using the artist_id:

SELECT Album_name
FROM Albums, Artists
WHERE Albums.Album_artist = Artists.Artist_id
AND Artist_name = 'Jefferson Airplane'

This powerful technique enables an application to cross-reference any data stored in a database, merely by specifying that field contents should match.
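
A minimal sketch of the join, again using Python's standard sqlite3 module as a stand-in for an embedded RDBMS. The sample rows, the ORDER BY clause, and the helper name albums_by are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Artist names are stored once; albums refer to them by an arbitrary id.
conn.executescript("""
    CREATE TABLE Artists (
        Artist_name VARCHAR(254),
        Artist_id   SMALLINT);
    CREATE TABLE Albums (
        Album_name   VARCHAR(254),
        Album_artist SMALLINT,
        Album_label  VARCHAR(254),
        Album_year   SMALLINT);
    INSERT INTO Artists VALUES ('Jefferson Airplane', 1);
    INSERT INTO Artists VALUES ('The Beatles', 2);
    INSERT INTO Albums VALUES ('Surrealistic Pillow', 1, 'RCA Victor', 1967);
    INSERT INTO Albums VALUES ('Volunteers', 1, 'RCA Victor', 1969);
    INSERT INTO Albums VALUES ('Abbey Road', 2, 'Apple', 1969);
""")


def albums_by(artist_name):
    """Return album names for one artist, oldest first (hypothetical helper)."""
    rows = conn.execute(
        """
        SELECT Album_name
        FROM Albums, Artists
        WHERE Albums.Album_artist = Artists.Artist_id
          AND Artist_name = ?
        ORDER BY Album_year
        """,
        (artist_name,),
    )
    return [row[0] for row in rows]


print(albums_by("Jefferson Airplane"))
```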

Transactions
Central to maintaining data integrity is the idea of transactions. In an RDBMS, a transaction is a collection of statements that either execute completely or not at all. The classical transaction is a transfer between a checking and savings account. Both the debit and the credit should happen, or neither should happen. There should never be a condition when only one or the other has taken place.

An RDBMS provides simple semantics to signal the beginning of a transaction and to either commit the set of actions or roll them back. The RDBMS makes four guarantees about transactions; these are known as the ACID properties: a transaction is Atomic, Consistent, Isolated and Durable.

Atomic means that the transaction succeeds or fails as a unit. Consistent refers to the fact that a transaction may not violate database integrity rules: if the checking account debit would result in a negative balance, and negative balances are illegal, then the transaction will not take place. Transactions are Isolated so that other applications cannot get an inconsistent view of the data by seeing partial results midway through transaction execution, and they are Durable because, once committed, their effects survive power failure and reboot.

Even in a single-user, or single application environment, transactions are useful in protecting database integrity from errors caused by such things as an unexpected loss of power during a sequence of actions or a media error. But they are essential in the more complex environment of many modern devices in which the data is shared by many applications.
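
The checking and savings transfer can be sketched as follows, using Python's standard sqlite3 module as a stand-in for an embedded RDBMS. The Accounts table, its CHECK rule against negative balances, and the helper names transfer and balances are all assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint is the integrity rule: balances may never go negative.
conn.execute("""
    CREATE TABLE Accounts (
        Name    VARCHAR(32) PRIMARY KEY,
        Balance INTEGER CHECK (Balance >= 0))
""")
conn.executemany("INSERT INTO Accounts VALUES (?, ?)",
                 [("checking", 100), ("savings", 50)])
conn.commit()


def transfer(amount, source, target):
    """Move amount between accounts; True if committed, False if rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE Accounts SET Balance = Balance + ? WHERE Name = ?",
                (amount, target))
            conn.execute(
                "UPDATE Accounts SET Balance = Balance - ? WHERE Name = ?",
                (amount, source))
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    return dict(conn.execute("SELECT Name, Balance FROM Accounts"))


print(transfer(30, "checking", "savings"), balances())
print(transfer(500, "checking", "savings"), balances())
```

In the second call the credit to savings succeeds but the debit would overdraw checking, so the whole transaction is rolled back and neither balance changes.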

Time to Market
Maintaining data structures that reliably support efficient and controlled access to shared data is a complicated business. Database management systems are very sophisticated pieces of software built by engineers who specialize in this arcane branch of computing science.

Now that small footprint, self-managing RDBMS are available, it makes a lot more sense to embed an RDBMS in an application than to build data management logic from scratch. Once developers are relieved of the need to attend to the fine points of data management, they can focus on delivering the features that win customers. The result is a more robust, richer application delivered to market faster. And in today's competitive markets there are few second prizes. Getting to market fast is often the key to commercial success.

Embedded Application Optimizations
The enterprise RDBMS was created to run back office business, and so it is heavily focused on support for alphanumeric data. But embedded applications often have the need to deal with text search and spatial search. Some modern embedded RDBMS provide extensions to support these datatypes, bringing to them the same high level query interface that standard SQL provides for alphanumerics.

Each of these datatypes requires a new kind of index. Alphanumerics are scalar data: they can be distributed along a line. The B-tree indexing used by enterprise RDBMS is in effect a way to do a binary search along this line. Spatial data is 2- or 3-dimensional and cannot be efficiently searched using B-trees. Thus, a DBMS that relies on B-trees alone cannot efficiently answer a simple question like "Find me the points of interest that lie within this circle." The spatial search engines that power sites like Yahoo Maps use special-purpose search algorithms, not enterprise RDBMS.

Some embedded RDBMS now provide Quad Tree indexing that allows efficient search of spatial data. Because of the ability of an RDBMS to integrate data from multiple tables, direct support for spatial data within an RDBMS enables an application to treat geography as just another source of information that can be joined to other data within the database. It becomes easy to ask questions like, "Show me the names of people who have sent me a text message recently and who are in this shopping mall."
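
To show why a quadtree suits the "points within this circle" question, here is a minimal point-quadtree sketch in Python. It is not how any particular embedded RDBMS implements its index; the class, its method names, the node capacity, and the sample points of interest are all assumptions made for illustration, and every point is assumed to lie inside the root square.

```python
from dataclasses import dataclass, field


@dataclass
class QuadTree:
    """Point quadtree covering the square from (x, y) to (x+size, y+size)."""
    x: float
    y: float
    size: float
    capacity: int = 4
    points: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def insert(self, name, px, py):
        if self.children:
            self._child_for(px, py).insert(name, px, py)
            return
        self.points.append((name, px, py))
        # Split an overfull leaf; the size guard stops endless splitting
        # when many points share the same coordinates.
        if len(self.points) > self.capacity and self.size > 1e-6:
            self._split()

    def _split(self):
        half = self.size / 2
        self.children = [
            QuadTree(self.x + dx * half, self.y + dy * half, half, self.capacity)
            for dy in (0, 1) for dx in (0, 1)
        ]
        pending, self.points = self.points, []
        for name, px, py in pending:
            self._child_for(px, py).insert(name, px, py)

    def _child_for(self, px, py):
        half = self.size / 2
        col = 1 if px >= self.x + half else 0
        row = 1 if py >= self.y + half else 0
        return self.children[row * 2 + col]

    def within_circle(self, cx, cy, radius):
        # Prune: if the closest point of this square to the circle's centre
        # is outside the circle, nothing stored below can match.
        nearest_x = min(max(cx, self.x), self.x + self.size)
        nearest_y = min(max(cy, self.y), self.y + self.size)
        if (nearest_x - cx) ** 2 + (nearest_y - cy) ** 2 > radius ** 2:
            return []
        hits = [name for name, px, py in self.points
                if (px - cx) ** 2 + (py - cy) ** 2 <= radius ** 2]
        for child in self.children:
            hits.extend(child.within_circle(cx, cy, radius))
        return hits


places = [("cafe", 10, 10), ("museum", 12, 14), ("park", 80, 80),
          ("station", 50, 50), ("library", 15, 9), ("cinema", 90, 20)]
tree = QuadTree(0, 0, 100)
for place in places:
    tree.insert(*place)

nearby = sorted(tree.within_circle(12, 12, 5))
print(nearby)
```

The pruning step is what a one-dimensional index cannot offer: whole quadrants are discarded without examining the points inside them.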

With more mobile devices becoming location aware, either through GPS or some other location technology, new opportunities are arising to deliver services that leverage knowledge of the location of the device and its surrounding environment. These applications may even be able to operate when the device is disconnected from the network, supporting emergency services, or field workers in distant locations.

An unusual application of spatial search is to use it to locate media content. With devices like MP3 players and Personal Video Recorders (PVRs) storing thousands of items of content, the classical folder-based interface breaks down.

Users need a more intuitive way to find desirable content. One key is quantitative tagging. Content is tagged (by the user, or the content provider, or a community of users) using a number of quantitative metrics.

In the case of movies, this could be the complexity of the plot, how scary the movie is, what ages it is suitable for, and so on. In effect, this scatters media across a multi-dimensional space that can be searched using spatial queries, enabling the user to find suitable content without knowing the name or location of the media.
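
A hedged sketch of the idea, using Python's standard sqlite3 module as a stand-in for an embedded RDBMS: each item carries numeric tags, and a query asks for items within a given distance of a target point in tag space. The Movies table, the tag scales, the titles, and the helper name similar are invented for this example. Plain SQLite scans every row here, whereas an engine with a spatial index could avoid the scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Plot and Scary are hypothetical 0-10 tags; Min_age is the youngest suitable age.
conn.execute("""
    CREATE TABLE Movies (
        Title   VARCHAR(254),
        Plot    SMALLINT,
        Scary   SMALLINT,
        Min_age SMALLINT)
""")
conn.executemany("INSERT INTO Movies VALUES (?, ?, ?, ?)", [
    ("Clip A", 2, 1, 0),
    ("Clip B", 8, 9, 16),
    ("Clip C", 3, 2, 8),
    ("Clip D", 7, 2, 12),
])


def similar(plot, scary, viewer_age, radius):
    """Titles within radius of (plot, scary) in tag space, suitable for viewer_age."""
    rows = conn.execute(
        """
        SELECT Title,
               (Plot - :plot) * (Plot - :plot)
             + (Scary - :scary) * (Scary - :scary) AS dist2
        FROM Movies
        WHERE Min_age <= :age
          AND (Plot - :plot) * (Plot - :plot)
            + (Scary - :scary) * (Scary - :scary) <= :r2
        ORDER BY dist2, Title
        """,
        {"plot": plot, "scary": scary, "age": viewer_age, "r2": radius * radius},
    )
    return [title for title, _ in rows]


print(similar(plot=3, scary=1, viewer_age=10, radius=2))
```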

Malcolm Colton is Vice President, Sales and Marketing/Deputy General Manager in the Embedded Business Group at Hitachi America, Ltd.
