September 10, 2009
Introducing HadoopDB

A hybrid of DBMS and MapReduce technologies that targets analytical workloads

Computer scientists at Yale University have created HadoopDB, an open-source system for managing large amounts of data. It combines DBMS and MapReduce technologies, targets analytical workloads, and is designed to run on a shared-nothing cluster of commodity machines or in the cloud.

"In essence, HadoopDB is a hybrid of MapReduce and parallel DBMS technologies," said Daniel Abadi, assistant professor of computer science at Yale and one of the system designers. "It's designed to take the best features of both worlds. We get the performance of parallel database systems with the scalability and ease of use of MapReduce."

Yale graduate students and co-creators Azza Abouzeid and Kamil Bajda-Pawlikowski presented the new system in detail at the Very Large Data Bases (VLDB) conference in Lyon, France, on August 27. At the conference, the team demonstrated the system's performance on a range of representative queries over both structured and unstructured data, and evaluated HadoopDB in terms of run-time performance, loading time, fault tolerance, and scalability. HadoopDB will also be presented at HadoopWorld:NYC in New York in October.

Traditional approaches to managing data at this scale typically fall into one of two categories. The first is parallel database management systems (DBMS), which are good at working with structured data, such as tables with trillions of rows. The second is the approach taken by MapReduce, the software framework Google uses to process data gathered from the Web, which gives the user more control over how the data is retrieved.

HadoopDB reduces the time it takes to perform some typical tasks from days to hours, making more complicated analysis possible, Abadi said: the kind that could be used to find patterns in the stock market, earthquakes, consumer behavior, and even disease outbreaks. "People have all this data, but they're not using it in the most efficient or useful way."
