February 01, 2008
MapReduce II
David J. DeWitt and Michael Stonebraker are at it again. There was a lot of buzz on the Internet after their previous post. (Here is what I had to say about it.)
The first point in their new post tries to counter the claim that MapReduce is not a database and therefore shouldn't be judged as one. They claim that it isn't a matter of apples and oranges, but rather:
We are judging two approaches to analyzing massive amounts of information, even for less structured information.
The problem is that from there they go on to define a problem in database terms and then show that MapReduce is not as good as a database at solving it. Well, duh.
The fact that isolated queries may run better in a pre-indexed database should come as no great surprise. As I noted in my previous post on the subject, MapReduce can be used to create the appropriate index, or to partition the data into smaller chunks that make it easier to answer the kind of queries David and Michael mention.
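To make that concrete, here is a minimal, single-process sketch of a MapReduce-style pass that builds an inverted index (word to record ids). It assumes records are simple `(record_id, text)` pairs; the function names are hypothetical, and a real framework such as Hadoop would run the map, shuffle and reduce steps in parallel across many machines rather than in one loop.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape for this sketch: (record_id, text)
Record = Tuple[int, str]


def map_phase(records: Iterable[Record]) -> Iterable[Tuple[str, int]]:
    """Emit a (word, record_id) pair for every word in every record."""
    for record_id, text in records:
        for word in text.lower().split():
            yield word, record_id


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Collapse each key's values into a sorted, de-duplicated posting list."""
    return {word: sorted(set(ids)) for word, ids in groups.items()}


def build_index(records: Iterable[Record]) -> Dict[str, List[int]]:
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    docs = [(1, "MapReduce builds indexes"), (2, "databases use indexes")]
    print(build_index(docs)["indexes"])  # [1, 2]
```

Once the index exists, a query for a word becomes a single lookup instead of a full scan, which is exactly the advantage the pre-indexed database enjoys in their comparison.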
As Mark Chu-Carroll explains, MapReduce and databases don't solve the same kinds of problems.
Also, what happens when the database is constantly updated? I don't care how scientifically accurate the measurements are that say databases scale like nothing else. I am more comfortable with the empirical experience of companies like Amazon, Digg, Google, and eBay, who found that they had to shard their data to support their scalability needs rather than use distributed transactions or distributed databases.
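The basic idea behind sharding can be sketched as routing each key to one of several independent stores by hashing it, so that a single write only ever touches one shard and needs no distributed transaction. This is a minimal sketch under stated assumptions: `ShardedStore` is a hypothetical name, the shard count is fixed, and in-memory dicts stand in for separate databases. It is not how any of the companies above actually implement it.

```python
import hashlib
from typing import Dict, List, Optional


class ShardedStore:
    """Hypothetical key-value store split across independent shards."""

    def __init__(self, num_shards: int) -> None:
        if num_shards < 1:
            raise ValueError("num_shards must be positive")
        # Each dict stands in for a separate, independently scaled database.
        self.shards: List[Dict[str, str]] = [{} for _ in range(num_shards)]

    def shard_for(self, key: str) -> int:
        # Use a stable digest; Python's built-in hash() of str is randomized per process.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key: str, value: str) -> None:
        # A write touches exactly one shard, so no cross-shard transaction is needed.
        self.shards[self.shard_for(key)][key] = value

    def get(self, key: str) -> Optional[str]:
        # Reads go straight to the one shard that owns the key.
        return self.shards[self.shard_for(key)].get(key)


store = ShardedStore(4)
store.put("user:42", "Alice")
print(store.get("user:42"))  # Alice
```

The trade-off is that queries spanning many keys have to fan out to every shard, and changing `num_shards` with plain modulo hashing remaps most keys; production systems usually use consistent hashing or a lookup directory to soften that.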
Posted by Arnon Rotem-Gal-Oz at 03:55 PM