RavenDB & CouchDB - Map and Reduce

Published on 2010-6-6


One of the recurring features present in the popular document databases is the use of map-reduce functions as the primary way to create views on the stored data.

Map Reduce

At this point I could go into a long description of what map/reduce actually is, but that kind of thing is only a quick web search away.

The short of it is that you map some data from each document into a structure to be queried on, and then run (and re-run) a reduce function over the mapped data in order to aggregate it by some key.

Now, these map functions can get quite complicated, but the concept remains the same from the most basic versions up to the more complicated reports on the data.
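To make that concrete, here is a rough sketch of the idea in plain JavaScript, not tied to either database. The `mapReduce` helper, the `posts` sample data and the choice to group on `category` are all made up for illustration; neither RavenDB nor CouchDB works like this internally.

```javascript
// Minimal in-memory map/reduce sketch (hypothetical helper, not a database API).
// mapFn turns one document into a { key, value } pair.
// reduceFn collapses all the values collected for one key into a single value.
function mapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  for (const doc of docs) {
    const { key, value } = mapFn(doc);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  const result = {};
  for (const [key, values] of groups) {
    result[key] = reduceFn(values);
  }
  return result;
}

// Hypothetical sample data: two posts in 'tech', one in 'life'.
const posts = [
  { category: 'tech', comments: ['a', 'b'] },
  { category: 'tech', comments: ['c'] },
  { category: 'life', comments: [] }
];

const commentsPerCategory = mapReduce(
  posts,
  doc => ({ key: doc.category, value: doc.comments.length }),
  values => values.reduce((a, b) => a + b, 0)
);

console.log(commentsPerCategory); // { tech: 3, life: 0 }
```

The map step only looks at one document at a time, which is what lets a database run it incrementally as documents change.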

Let’s look at the standard example of getting the number of comments across all the blog entries by a certain author.

Here is the structure of our example document:

    {
        author: 'robashton',
        category: 'tech',
        content: 'blah blah blah',
        comments: [
            { author: 'anon', content: 'blah' },
            { author: 'anon', content: 'more blah' }
        ]
    }

The important data here is ‘author’ and the length of the comments array, so naturally we would map these in our map function.

In RavenDB

    from doc in docs
    select new
    {
         author = doc.author,
         count = doc.comments.Length
    }

In CouchDB

    function(doc) {
      emit(doc.author, doc.comments.length);
    }

There isn’t much to say about these: the RavenDB map function just returns the mapped data, and the CouchDB function emits the mapped field(s) as a key alongside the value(s) associated with that key.

The reduction will therefore take place on ‘author’ (the key), and we would sum all the comment counts (the value) for that author in the reduce function.

In RavenDB

    from result in results
    group result by result.author into g
    select new
    {
         author = g.Key,
         count = g.Sum(x => x.count)
    }

In CouchDB

    function (keys, values, rereduce) {
       // a sum of sums is still a sum, so both passes can share this logic
       return sum(values);
    }

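The structures of these two functions immediately strike us as being very different, and that raises some questions.

The reduce function can be called more than once for a set of documents (this holds true for both Raven and Couch; that’s the whole point of map/reduce). In CouchDB an extra parameter called “rereduce” is present, which specifies whether this is the first pass or a subsequent pass. If the result shapes differ between the map function and the reduce function, a check is required so that different logic can be performed on each pass. In the example above both are plain numbers, so no check is needed.

Notice also where the grouping happens. The RavenDB reduce contains an explicit group-by on ‘author’, while the CouchDB reduce never mentions the key at all, because the grouping is done on whatever key the map function emitted.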
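As an illustration of when that check matters, here is a sketch of a CouchDB-style reduce that counts mapped rows per key rather than summing them. On the first pass the values are raw map output, and on later passes they are earlier counts, so the two passes need different logic. The function name `countRows` and the batches fed to it are made up for this example; in a real database CouchDB decides how rows are batched.

```javascript
// CouchDB-style reduce that counts mapped rows.
// First pass: values are raw map outputs, so count them.
// Rereduce pass: values are counts from earlier passes, so add them up.
function countRows(keys, values, rereduce) {
  if (rereduce) {
    return values.reduce((total, n) => total + n, 0);
  }
  return values.length;
}

// Hypothetical simulation of two first-pass batches followed by a rereduce.
// On the first pass, keys is a list of [key, docId] pairs; on rereduce it is null.
const batchA = countRows([['anon', 'doc1'], ['anon', 'doc2']], [2, 5], false);
const batchB = countRows([['anon', 'doc3']], [1], false);
const rowCount = countRows(null, [batchA, batchB], true);

console.log(rowCount); // 3
```

Without the `rereduce` branch, the final pass would return 2 (the number of batches) instead of 3 (the number of rows).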

For those of you who skipped that big body of text, the important thing to take away is that in RavenDB the responsibility of deciding what to group the documents on falls to the Reduce function, and in CouchDB the responsibility falls to the Map function.
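To see what the RavenDB way of thinking looks like outside of LINQ, here is a loose JavaScript sketch of a reduce that does its own grouping. It is only an approximation of the idea: `reduceByAuthor` and the sample rows are hypothetical, and this is not RavenDB code. The point is that the rows going in and the rows coming out have the same shape, so the function can be run again over its own partial results.

```javascript
// Raven-style reduce sketched in plain JavaScript: the reduce itself decides
// to group by author, and returns rows shaped like the rows it received.
function reduceByAuthor(rows) {
  const totals = new Map();
  for (const { author, count } of rows) {
    totals.set(author, (totals.get(author) || 0) + count);
  }
  return [...totals].map(([author, count]) => ({ author, count }));
}

// Hypothetical mapped rows, reduced in two separate batches.
const firstBatch = reduceByAuthor([
  { author: 'robashton', count: 2 },
  { author: 'anon', count: 1 }
]);
const secondBatch = reduceByAuthor([{ author: 'robashton', count: 3 }]);

// Reducing the partial results again yields the final totals.
const combined = reduceByAuthor([...firstBatch, ...secondBatch]);

console.log(combined);
// [ { author: 'robashton', count: 5 }, { author: 'anon', count: 1 } ]
```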

So, these are two rather different ways of thinking about MapReduce and this is definitely something to be aware of when trying to jump between the two.

This was quite a long entry with a really short summary, so in the next entry, I’ll be listing and explaining some of the actual functionality differences between CouchDB and RavenDB.
