RavenDB & CouchDB - Basic Queries

Published on 2010-6-2

Once you have a number of documents in the database, you soon want to do more complex operations than simply retrieving a list of them.

Consider the following, rather over-used, example document:

    {
        title: "Another blog entry",
        content: 'blah blah blah',
        category: 'code',
        author: 'robashton'
    }

Our example query would be to get all of the documents from the database that were written by a particular author AND in a certain category.

Obviously, querying for all the entries written by a single author, or all the entries in a certain category, would be fairly expected queries too.

Indexes in RavenDB

In order to perform any queries whatsoever in RavenDB, we first need to create an index.

    from doc in docs
    select new {
         doc.author,
         doc.category
    };

This is effectively a map function written as a LINQ query which returns a single value, an object that is a map of the values to be indexed.
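To make the idea concrete, here is a rough JavaScript simulation of what that map does conceptually: each document is projected down to just the fields worth searching on. This is not RavenDB code. The names `mapEntry` and `buildIndex` and the second sample document are made up for illustration, and a real index is stored and queried very differently from a plain array.

```javascript
// Hypothetical stand-in for the LINQ map: project a document
// down to only the fields that should be searchable.
function mapEntry(doc) {
  return { author: doc.author, category: doc.category };
}

// Hypothetical helper: run the map over every document, remembering
// the position of the document each index entry came from.
function buildIndex(docs) {
  return docs.map((doc, position) => ({ position, fields: mapEntry(doc) }));
}

const docs = [
  { title: "Another blog entry", content: "blah blah blah", category: "code", author: "robashton" },
  { title: "Yet another entry", content: "more blah", category: "tech", author: "robashton" },
];

const index = buildIndex(docs);
console.log(index);
```

The point to notice is that the title and content never make it into the index entries; only author and category can be searched on.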

Get all the documents by author and category

indexes/entriesByAuthorAndCategory?query=category:tech AND author:robashton

Get all the documents by category

indexes/entriesByAuthorAndCategory?query=category:tech

Get all the documents by author

indexes/entriesByAuthorAndCategory?query=author:robashton

Those queries will return a list of whole documents which match the queries passed in.
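The query strings above contain spaces and colons, so they need encoding before they go over HTTP. Here is a small sketch of how a client might build them. `ravenQueryUrl` is a hypothetical helper of my own, not part of any RavenDB client, and it assumes the field values contain no characters that are special to the query syntax.

```javascript
// Hypothetical helper: build the relative URL for an index query.
// terms is an object of field -> value pairs, combined with AND.
// Assumption: values need no escaping for the query syntax itself.
function ravenQueryUrl(indexName, terms) {
  const query = Object.entries(terms)
    .map(([field, value]) => `${field}:${value}`)
    .join(" AND ");
  // encodeURIComponent takes care of the spaces and colons.
  return `indexes/${indexName}?query=${encodeURIComponent(query)}`;
}

const url = ravenQueryUrl("entriesByAuthorAndCategory", {
  category: "tech",
  author: "robashton",
});
console.log(url);
```

Passing only `{ category: "tech" }` or only `{ author: "robashton" }` gives the two single-field queries against the same index.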

Indexes in CouchDB

The same goes for CouchDB, only map functions in CouchDB are written in JavaScript, and each result they produce has two parts: a key and a value.

    function(doc) {
      emit([doc.category, doc.author], doc);
    }

Return values are specified by calling emit, and emit can be called more than once for each document, so a single map function can create multiple keys for each document. The first parameter to emit is the “key” to be searched on, and the second parameter is the data associated with that key (in this case, the document).
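As a sketch of the "emit more than once" point, here is a map that produces two rows per document, run through a tiny stand-in for the view engine. In real CouchDB the map takes only `doc` and emit is supplied by the view server; passing it in as a second argument is a simplification for this example. `runMap`, the tagged keys and the sample documents are all made up for illustration.

```javascript
// Tiny stand-in for CouchDB's view engine: collects whatever the map emits.
function runMap(mapFn, docs) {
  const rows = [];
  const emit = (key, value) => rows.push({ key, value });
  docs.forEach((doc) => mapFn(doc, emit));
  return rows;
}

// Illustrative map: two emit calls, so two rows per document.
// The first key element is a tag saying which field the row is keyed on.
function mapTagged(doc, emit) {
  emit(["category", doc.category], doc.title);
  emit(["author", doc.author], doc.title);
}

const docs = [
  { title: "Another blog entry", category: "code", author: "robashton" },
  { title: "Yet another entry", category: "tech", author: "robashton" },
];

const rows = runMap(mapTagged, docs);
console.log(rows);
```

Two documents go in and four rows come out, one per emit call.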

Get all the documents by author and category

blogs/_view/byAuthorAndCategory?key=["tech","robashton"]

Get all the documents by category

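blogs/_view/byAuthorAndCategory?startkey=["tech"]&endkey=["tech",{}]

The endkey of ["tech",{}] closes the range, because an object sorts after any string in CouchDB's key ordering; startkey on its own would return every row from "tech" onwards.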

Get all the documents by author

Ah. This is suddenly a bit more complicated. I’ve not actually managed to come up with a convenient solution. As far as I can understand from the docs, if you want to query specific fields within the key, you have to submit a POST request containing a JSON document with the fields you wish to search.

So it’s either that or create specific indexes for the queries you wish to perform. Performance-wise this is probably optimal but I don’t actually know for sure.
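The "specific index per query" route might look something like this for the author case: a second view keyed on author alone. The view name `byAuthor` is my own invention, and the emit function and filter below are stand-ins so the sketch runs outside CouchDB.

```javascript
const rows = [];

// Stand-in for the emit that CouchDB's view server normally provides.
function emit(key, value) {
  rows.push({ key, value });
}

// Hypothetical second view: the key is just the author.
function byAuthor(doc) {
  emit(doc.author, doc);
}

const docs = [
  { title: "Another blog entry", category: "code", author: "robashton" },
  { title: "Yet another entry", category: "tech", author: "robashton" },
  { title: "A guest post", category: "tech", author: "someoneelse" },
];
docs.forEach(byAuthor);

// Rough equivalent of asking the view for the rows with key "robashton".
const matches = rows.filter((row) => row.key === "robashton");
console.log(matches.map((row) => row.value.title));
```

Against a real database, the query would then be along the lines of blogs/_view/byAuthor?key="robashton".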

Paging in RavenDB

Paging in RavenDB is as simple as appending start and pageSize parameters to the query string:

indexes/entriesByAuthorAndCategory?query=category:tech&start=10&pageSize=10

This will perform the query across the entire index and only retrieve the documents requested, which is a trivially cheap operation.
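In client code you usually think in page numbers rather than offsets, so here is a sketch of turning one into the other. `pagedQueryUrl` is a hypothetical helper, and it assumes pages are numbered from zero.

```javascript
// Hypothetical helper: build a paged index query URL.
// Assumption: pageNumber is zero-based, so page 0 starts at document 0.
function pagedQueryUrl(indexName, query, pageNumber, pageSize) {
  const start = pageNumber * pageSize;
  return (
    `indexes/${indexName}?query=${encodeURIComponent(query)}` +
    `&start=${start}&pageSize=${pageSize}`
  );
}

// Second page of ten: skips the first ten results.
const secondPage = pagedQueryUrl("entriesByAuthorAndCategory", "category:tech", 1, 10);
console.log(secondPage);
```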

Paging in CouchDB

In CouchDB, a similar query string can be used, with “skip” and “limit” parameters, but these are considered expensive and instead to perform paging you should:

Summary

This really is just a whistle-stop tour of some basic functionality in these two systems, although it does highlight some fairly major differences between them.

Next up, some more advanced functionality will be covered, going over the differences between writing reduce functions in the two.

Peter Curd


Excellent summary! From this I can see that RavenDB makes more sense to me and supports notation I can understand. Great introduction for document database newbies like me :)

What are the practical implications of having to index before you can select from a field? Does index maintenance become costly?

robashton


Does index maintenance become more costly?

In RavenDB, indexing is strictly a background process, the idea being that they'll eventually become consistent (providing you aren't hammering updates in constantly). This is why it's great for low-write, high-read scenarios, because reads just look at an already computed index and are incredibly cheap.

I wouldn't say it becomes *costly*, because of it being a background task, but sure - the more indexes you add, the more effort the server will have to make to keep them up to date as data is inserted/modified. That's when you look at scaling out by adding more servers and perhaps splitting your data/indexes out over those servers and using sharding strategies or whatever makes sense.

Simone Chiaretta


Indexes here are not like indexes in relational databases: they look to me a lot like stored procedures.

What happens if I want to do the same query without having the index? Will it just scan the whole document library looking for objects with the given properties?

Wouldn't it be possible to have the server itself understand the query performed and create the indexes automatically?

robashton


Very astute, indexes in these systems can be considered more like materialized views or indeed stored procedures.

Put simply, you don't perform the query without having the index - Couch does support temporary indexes but they're *slow* and the point of being up front about your querying needs is to make the act of reading data from the database *cheap*.

Sure it would be possible to get the server to understand the query, and create the index and then query against that index - but that would be expensive and defeat the point of moving to a system like this :)

Simone Chiaretta


By "create the index automatically" I meant creating the index the first time the query runs, and then using it in subsequent queries as if it had been defined up front. So, after a warm-up period, the system would be as fast as with the manually created indexes.

robashton


Neither of the systems supports this. It would be a bit hard to do it in a sensible way, too.

Simone Chiaretta


Nothing is too hard for Ayende :)

robashton


I'm sure the discussion would be welcome on the mailing list! http://groups.google.com/group/ravendb/topics

Simone Chiaretta


Moved the conversation over there
