January 03, 2007
Local Device Search
The need for device-level search and how to integrate a data manager into a device application.
Malcolm Colton, Hitachi America
Malcolm Colton talks about the need for device-level search, with examples of search-enabled device applications and how to integrate a data manager into a device application.
Search is non-trivial to implement in small-footprint embedded devices for the consumer market, especially when it is applied to more than just alphanumeric data. Savvy embedded developers are starting to take advantage of a new breed of database management systems designed specifically for devices.
They fit in a small footprint at run time and offer not just a high-level query language, but also shared access to data and guaranteed database consistency, even after unexpected power outages. Such features are becoming important in many embedded consumer applications because a number of converging trends are constantly multiplying the amount of content.
Exploding mobile content
Instead of content coming only from a handful of studios, videos are being produced by tens of thousands of individuals and small groups. The result: millions of pieces of video content are being created and distributed every year. Similar trends in low-cost development and distribution are driving an explosion in still photography and musical content. Even the volume of commercially produced broadcast content is exploding. Satellite and cable packages now may contain hundreds of channels. IPTV promises to increase this even further.
Why should an embedded developer care about this? Surely Google has the search problem under control? Well, yes, but only if you are searching content on the Web, and only if you have an Internet connection, which can't always be assumed. Increasingly, content is being downloaded to handheld portable devices. These devices are now storing so much data that finding the content that the user wants is becoming a challenge.
Here's a subversive thought: Was the success of the iPod Shuffle based on the fact that actually finding music on an MP3 player is a bit of a chore, so it is easier to just randomize it?
Integrating the Data
Take the case of a mobile device carried by a field support engineer. It needs to integrate customer data, product data, service data, location data, and inventory data to be able to answer an obvious question like this one: "Find me a nearby customer with an open service order on a product that I am certified to maintain and where I already have the likely parts in my truck."
Moore's Law to the Rescue
The obvious way to provide search is just to code it into the device application. But this is not as simple a solution as it seems, nor is it necessarily the best use of scarce development resources. This article takes the position that the optimal way to provide search is to embed into the application a COTS (Commercial Off The Shelf) relational database management system (RDBMS) that is optimized for the device environment.
The reason is straightforward: searching across multiple sets of shared, updateable data is hard to do well. Training in device software development is not necessarily a good background for writing a database manager. While it is reasonably easy to write a search solution for a given set of requirements, it is very hard to write an efficient, compact, general-purpose data manager.
Even if a very talented development team could accomplish this task, why would a manager want to spend the team's resources on invisible infrastructure instead of focusing on adding value within the team's core competency? Just as embedded developers in almost all cases use COTS operating systems rather than writing their own, so the time has come to use COTS data management rather than writing code for this function.
Desirable Database Features:
Choosing a DBMS
A data management library is useful for the storage and management of simple data sets. An application that saves the preferences of several users could benefit from this approach. An example could be the code that manages seat and mirror positioning in a passenger vehicle, retaining the settings for several drivers. The application is simple, the data is simple, and the application can be written faster using a simple data management library.
An ODBMS (object database management system) is designed to provide more or less transparent persistence for application objects. An ODBMS can make object-oriented programming in a language like Java much simpler because it takes care of moving objects between persistent storage and RAM. While some ODBMS provide limited search capability, this is not their strength. Objects are normally retrieved because the application knows which objects it wants.
An RDBMS, on the other hand, is designed for content-based search. RDBMS are based on SQL (Structured Query Language), a set-oriented language that provides the ability to retrieve a record based on the value of any of its fields. This makes an RDBMS the natural choice for a device search application.
The Enterprise Database Grows Down
Fortunately, a new class of self-managing RDBMS is appearing on the market with a much smaller footprint than their enterprise-class predecessors. Focusing on the subset of SQL most suited to device applications, and sometimes offering advanced search for device datatypes like text and spatial data, an embedded RDBMS will fit into a megabyte of RAM or even less at run time.
For the first time, it is possible to think of embedding a relational database management system into a device application, and there are compelling time-to-market considerations that encourage embedded software developers to do just that. Let's look at what a relational database management system has to offer.
Content-Based Search
For example, information about music albums could be stored in a table like this.
In SQL, you would create this table with a statement like this:
CREATE TABLE Albums (
You could find all albums by a given band using a query like this:
SELECT Album_name
To make this query execute quickly, you could create an index on the Album_artist field:
CREATE INDEX Album_artist on Albums(Album_artist)
Once the index is created, the RDBMS maintains it and will use it automatically to speed searches on that field.
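As a rough illustration of content-based search, the sketch below runs this kind of SQL from Python, using the SQLite engine in Python's standard library as a stand-in for an embedded RDBMS. The names Albums, Album_name and Album_artist come from the article; the Release_year column, the sample rows and the albums_by_artist helper are assumptions made up for the example.

```python
import sqlite3

# SQLite (bundled with Python) stands in for an embedded RDBMS here.
# Release_year and the sample rows are hypothetical illustration data.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Albums (
           Album_name   TEXT NOT NULL,
           Album_artist TEXT NOT NULL,
           Release_year INTEGER
       )"""
)
conn.executemany(
    "INSERT INTO Albums VALUES (?, ?, ?)",
    [
        ("First Light", "The Examples", 2001),
        ("Second Wind", "The Examples", 2004),
        ("Solo Flight", "A. N. Other", 2003),
    ],
)

# Index on the searched column, mirroring the CREATE INDEX statement above.
conn.execute("CREATE INDEX Album_artist ON Albums(Album_artist)")


def albums_by_artist(artist):
    """Return the names of all albums by the given artist, sorted by name."""
    rows = conn.execute(
        "SELECT Album_name FROM Albums "
        "WHERE Album_artist = ? ORDER BY Album_name",
        (artist,),
    )
    return [name for (name,) in rows]


print(albums_by_artist("The Examples"))
```

The application only states which field value it wants; whether the engine scans the table or uses the index is left to the database.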
Integrating Data
One benefit is that an application can cross-reference data stored in different tables, perhaps by different applications. A second is that you can reduce storage requirements by storing each piece of data only once and then cross-referencing it where it is needed. In the example table above, the artist name is stored many times, once for each of the artist's albums. We can eliminate this redundancy by separating the artists out into a table of their own. We allocate each artist an arbitrary id so that we can perform the cross-reference:
To do this, you would execute SQL like this:
CREATE TABLE Albums (
CREATE TABLE Artists (
To find all the albums by a given artist, you join the tables together, using the artist_id:
SELECT Album_name
This powerful technique enables an application to cross-reference any data stored in a database, merely by specifying that field contents should match.
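A minimal sketch of such a join, again using SQLite from Python's standard library as a stand-in engine. The table names and the artist_id and Album_name fields follow the article; the Artist_name column, the sample rows and the helper function are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Each artist is stored once and identified by an arbitrary id.
    CREATE TABLE Artists (
        artist_id   INTEGER PRIMARY KEY,
        Artist_name TEXT NOT NULL          -- hypothetical column name
    );
    -- Albums refer to their artist by id instead of repeating the name.
    CREATE TABLE Albums (
        Album_name TEXT NOT NULL,
        artist_id  INTEGER NOT NULL REFERENCES Artists(artist_id)
    );
    INSERT INTO Artists VALUES (1, 'The Examples'), (2, 'A. N. Other');
    INSERT INTO Albums VALUES
        ('First Light', 1), ('Second Wind', 1), ('Solo Flight', 2);
    """
)


def albums_by_artist_name(name):
    """Join Albums to Artists on artist_id and filter by artist name."""
    rows = conn.execute(
        """SELECT Albums.Album_name
             FROM Albums
             JOIN Artists ON Albums.artist_id = Artists.artist_id
            WHERE Artists.Artist_name = ?
            ORDER BY Albums.Album_name""",
        (name,),
    )
    return [album for (album,) in rows]


print(albums_by_artist_name("The Examples"))
```

Renaming an artist now means updating a single row in Artists, and every album picks up the change through the join.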
Transactions
An RDBMS provides simple semantics to signal the beginning of a transaction and to either commit the set of actions or roll them back. The RDBMS makes four guarantees about transactions; these are known as the ACID properties: a transaction is Atomic, Consistent, Isolated and Durable.
Atomic means that the transaction succeeds or fails as a unit. Consistent refers to the fact that a transaction may not violate database integrity rules: if the checking account debit would result in a negative balance, and negative balances are illegal, then the transaction will not take place. Transactions are Isolated so that other applications cannot get an inconsistent view of the data by seeing partial results midway through transaction execution, and they are Durable because their results survive power failure and reboot.
Even in a single-user or single-application environment, transactions are useful in protecting database integrity from errors caused by such things as a media error or an unexpected loss of power during a sequence of actions. But they are essential in the more complex environment of many modern devices, in which the data is shared by many applications.
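The checking account case can be sketched as follows, with SQLite from Python's standard library standing in for an embedded RDBMS. The Accounts table, its CHECK rule and the transfer helper are assumptions for illustration. The credit is deliberately applied before the debit so that a failed debit has something to undo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Accounts (
           name    TEXT PRIMARY KEY,
           balance INTEGER NOT NULL CHECK (balance >= 0)  -- integrity rule
       )"""
)
conn.executemany(
    "INSERT INTO Accounts VALUES (?, ?)",
    [("checking", 100), ("savings", 50)],
)
conn.commit()


def transfer(src, dst, amount):
    """Move amount from src to dst as one transaction; True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE Accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE Accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit would break the CHECK rule, so the credit is undone too.
        return False


def balances():
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM Accounts"))


print(transfer("checking", "savings", 500), balances())  # rejected transfer
print(transfer("checking", "savings", 30), balances())   # accepted transfer
```

The rejected transfer leaves both balances untouched even though its first update had already run, which is the atomicity guarantee described above.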
Time to Market
Now that small-footprint, self-managing RDBMS are available, it makes a lot more sense to embed an RDBMS in an application than to build data management logic from scratch. Once developers are relieved of the need to attend to the fine points of data management, they can focus on delivering the features that win customers. The result is a more robust, richer application delivered to market faster. And in today's competitive markets there are few second prizes. Getting to market fast is often the key to commercial success.
Embedded Application Optimizations
Each of these datatypes requires a new kind of index. Alphanumerics are scalar data: they can be distributed along a line. The B-tree indexing used by enterprise RDBMS is in effect a way to do a binary search along this line. Spatial data is 2- or 3-dimensional and cannot be efficiently searched using B-trees. Thus, a DBMS that relies on B-trees alone cannot efficiently answer a simple question like "Find me the points of interest that lie within this circle." The spatial search engines that power sites like Yahoo Maps use special-purpose search algorithms, not enterprise RDBMS.
Some embedded RDBMS now provide Quad Tree indexing that allows efficient search of spatial data. Because an RDBMS can integrate data from multiple tables, direct support for spatial data within an RDBMS enables an application to treat geography as just another source of information that can be joined to other data within the database. It becomes easy to ask questions like, "Show me the names of people who have sent me a text message recently and who are in this shopping mall."
With more mobile devices becoming location aware, either through GPS or some other location technology, new opportunities are arising to deliver services that leverage knowledge of the location of the device and its surrounding environment. These applications may even be able to operate when the device is disconnected from the network, supporting emergency services or field workers in distant locations.
An unusual application of spatial search is to use it to locate media content. With devices like MP3 players and Personal Video Recorders (PVRs) storing thousands of items of content, the classical folder-based interface breaks down. Users need a more intuitive way to find desirable content. One key is quantitative tagging. Content is tagged (by the user, the content provider, or a community of users) using a number of quantitative metrics.
In the case of movies, this could be the complexity of the plot, how scary the movie is, what ages it is suitable for, and so on. In effect, this scatters media across a multi-dimensional space that can be searched using spatial queries, enabling the user to find suitable content without knowing the name or location of the media.
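To make the idea concrete, here is a small in-memory point quadtree with a "within this circle" query, applied to movies tagged on two hypothetical 0 to 10 scales (plot complexity and scariness). It is only a sketch of the general technique, not the Quad Tree index of any particular embedded RDBMS; the class, its parameters and the movie titles are all invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class QuadTree:
    """Point quadtree over the square [x0, x0+size) x [y0, y0+size)."""
    x0: float
    y0: float
    size: float
    capacity: int = 4
    points: list = field(default_factory=list)    # (x, y, label) tuples
    children: list = field(default_factory=list)  # four quadrants after a split

    MIN_SIZE = 1e-3  # stop splitting below this size (guards duplicate points)

    def insert(self, x, y, label):
        """Store a labelled point; return False if it lies outside this node."""
        inside_x = self.x0 <= x < self.x0 + self.size
        inside_y = self.y0 <= y < self.y0 + self.size
        if not (inside_x and inside_y):
            return False
        if not self.children:
            if len(self.points) < self.capacity or self.size <= self.MIN_SIZE:
                self.points.append((x, y, label))
                return True
            self._split()
        return self._insert_into_child(x, y, label)

    def _insert_into_child(self, x, y, label):
        return any(child.insert(x, y, label) for child in self.children)

    def _split(self):
        """Create four half-size quadrants and push stored points down."""
        half = self.size / 2
        self.children = [
            QuadTree(self.x0 + dx, self.y0 + dy, half, self.capacity)
            for dx in (0, half) for dy in (0, half)
        ]
        old, self.points = self.points, []
        for x, y, label in old:
            self._insert_into_child(x, y, label)

    def within_circle(self, cx, cy, r):
        """Return labels of all points within distance r of (cx, cy)."""
        # Prune: skip this node if the circle cannot reach its square.
        nearest_x = min(max(cx, self.x0), self.x0 + self.size)
        nearest_y = min(max(cy, self.y0), self.y0 + self.size)
        if (nearest_x - cx) ** 2 + (nearest_y - cy) ** 2 > r * r:
            return []
        found = [label for x, y, label in self.points
                 if (x - cx) ** 2 + (y - cy) ** 2 <= r * r]
        for child in self.children:
            found.extend(child.within_circle(cx, cy, r))
        return found


# Hypothetical movies tagged as (plot complexity, scariness).
movies = QuadTree(0, 0, 16)
for x, y, title in [
    (2, 1, "Gentle Comedy"), (4, 3, "Light Mystery"), (8, 6, "Twisty Thriller"),
    (3, 9, "Haunted House"), (9, 2, "Mind Bender"), (2, 8, "Slasher"),
]:
    movies.insert(x, y, title)

# "Something simple and not very scary": a circle around (3, 2).
print(sorted(movies.within_circle(3, 2, 2.5)))
```

The pruning test in within_circle is what a B-tree on a single field cannot offer: whole regions of the tag space are discarded without looking at the items inside them.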
Malcolm Colton is Vice President, Sales and Marketing/Deputy General Manager in the Embedded Business Group at Hitachi America, Ltd.