RavenDB - The HiLo what how and why

Published on 2010-5-16

One of the issues I touched on when covering basic interaction with RavenDB was the awkwardness of having to call SaveChanges in order to get the ids of entities that had been saved across the unit of work. This problem is not new to the document db space, nor to any system where the domain has been mapped to an id-based data store (ORMs/RDBMS/etc).

I was going to cook a home brew solution specifically for my use within my projects and blog about it in order that other people could use it, but after posting my intentions in the RavenDB mailing list to create something like this, Oren suggested that making it the default behaviour and moving id generation to the Store would be a welcome move.

After posting on Twitter about this now being default, I got asked quite a few questions on what HiLo was, what the advantages were, and why it was a good thing that in the .NET client for RavenDB this was now going to be the default.

The gist

- Waiting until SaveChanges to get ids for saved entities makes writing logic against those entities troublesome
- Calling SaveChanges every time a new entity is created makes transactions troublesome
- Calling SaveChanges to get the entity id means a call across the wire just to get an entity id, which is expensive
- Simply assigning a Guid to the Id makes accessing documents via REST an unpleasant experience
- You can’t just assign a random integer, because you’d just get collisions as other clients did the same and tried to save their entities
- HiLo provides a method of creating *incremental* integer based ids for entities in a fashion that is safe in concurrent environments

The algorithm

The basic premise is that the server still controls the id generation, but effectively hands out a range of ids to each client. The client can then hand those ids out to objects as they are created, and when it runs out of ids, it simply requests more.

Obviously, requesting a heap of ids all at once would be expensive, so the idea is that the server provides a single id, a “Hi” value, which controls the creation of the range on the client (the client provides the “Lo” value).

There are a number of ways this can be implemented, but the one I chose was probably the simplest, and credit goes to Tuna Toksoz for the blog entry which gave me the means to implement it myself.

- The data store needs only store the latest “Hi” value, which starts at 1 and increases by 1 every time a new “Hi” value is requested by a client
- The clients all use the same number for a “Capacity”, that is, the range of numbers that each “Hi” value represents (for example, 1000)
- Each client requests a “Hi” value and resets its “Lo” value to 0
- Every time a new Id is requested from the generator, the Id is generated by combining the Hi and Lo numbers together:

    (currentHi - 1) * capacity + (++currentLo)

- When currentLo reaches capacity, a new Hi is requested and the cycle starts over again
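
The steps above can be sketched in a few lines. This is a minimal illustration in Python rather than the .NET client's own code: `InMemoryHiServer` and `HiLoGenerator` are hypothetical names, and the "server" is just an in-memory counter standing in for the data store. The sketch requests the next Hi lazily, on the first id generated after the range is exhausted, which yields the same ids as requesting it the moment currentLo reaches capacity.

```python
class InMemoryHiServer:
    """Stand-in for the data store: it remembers only the latest Hi value."""

    def __init__(self):
        self.latest_hi = 0

    def next_hi(self):
        # The first Hi handed out is 1, then 2, 3, ...
        self.latest_hi += 1
        return self.latest_hi


class HiLoGenerator:
    """Client-side generator (hypothetical name, not the RavenDB class)."""

    def __init__(self, server, capacity):
        self.server = server
        self.capacity = capacity
        self.current_hi = server.next_hi()
        self.current_lo = 0

    def next_id(self):
        if self.current_lo >= self.capacity:
            # Range used up: ask for a new Hi and start the cycle again.
            self.current_hi = self.server.next_hi()
            self.current_lo = 0
        self.current_lo += 1
        return (self.current_hi - 1) * self.capacity + self.current_lo


server = InMemoryHiServer()
generator = HiLoGenerator(server, capacity=3)
ids = [generator.next_id() for _ in range(6)]
print(ids)  # [1, 2, 3, 4, 5, 6]
```

Six ids cost only two Hi requests here; with a capacity of 1000 the ratio would be one request per thousand ids.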

In the actual implementation, there is some locking going on around this algorithm in order to make the client generator available across threads (web requests) and avoid having to create a new generator per session (defeating the point of having one if you only create a single object in a session).
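
To give a feel for what that locking looks like, here is a hedged Python sketch of one generator shared by several threads. It is not the code in the .NET client: `ThreadSafeHiLo` is a made-up name, the Hi source is a local counter instead of a server call, and a single coarse lock around the whole operation is simply the most obvious way to do it.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from itertools import count


class ThreadSafeHiLo:
    """One generator shared by many threads (illustrative only)."""

    def __init__(self, request_hi, capacity):
        self._request_hi = request_hi  # callable returning the next Hi value
        self._capacity = capacity
        self._lock = threading.Lock()
        self._hi = request_hi()
        self._lo = 0

    def next_id(self):
        # The Hi refresh and the Lo increment must happen as one atomic step,
        # otherwise two threads could be handed the same id.
        with self._lock:
            if self._lo >= self._capacity:
                self._hi = self._request_hi()
                self._lo = 0
            self._lo += 1
            return (self._hi - 1) * self._capacity + self._lo


hi_source = count(1)  # stand-in for the server-side Hi counter
generator = ThreadSafeHiLo(lambda: next(hi_source), capacity=1000)

# Eight worker threads (think: concurrent web requests) share one generator.
with ThreadPoolExecutor(max_workers=8) as pool:
    ids = list(pool.map(lambda _: generator.next_id(), range(5000)))

print(len(ids), len(set(ids)))
```

Because the generator lives longer than any one session, a session that creates a single object consumes one id from the current range rather than a whole Hi value.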

Let’s look at a sample run through, with a small capacity of “3”, to keep the sample small!

Description	currentLoBefore	currentHi	Created Id	currentLoAfter
Hi Request	0	1	1	1
	1	1	2	2
	2	1	3	3 (capacity)
Hi Request	0	2	4	1
	1	2	5	2
	2	2	6	3 (capacity)

As we can see, if all the clients are using the same capacity, and they are given different “Hi” values, then they can’t generate duplicate keys, but by and large they’ll be sequential in nature.
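
A quick way to convince yourself of this is to interleave two clients against one Hi counter. The snippet below is a sketch in Python with invented names (`hilo_ids`, `client_a`, `client_b`), using the same small capacity of 3 as the table above.

```python
from itertools import count


def hilo_ids(request_hi, capacity):
    """Yield ids forever, requesting a fresh Hi whenever a range is used up."""
    while True:
        hi = request_hi()
        for lo in range(1, capacity + 1):
            yield (hi - 1) * capacity + lo


shared_hi = count(1)  # stand-in for the Hi value kept by the server


def request_hi():
    return next(shared_hi)


# Two clients, same capacity, same server-side counter.
client_a = hilo_ids(request_hi, capacity=3)
client_b = hilo_ids(request_hi, capacity=3)

a_ids, b_ids = [], []
for _ in range(5):
    a_ids.append(next(client_a))
    b_ids.append(next(client_b))

print(a_ids)  # [1, 2, 3, 7, 8]
print(b_ids)  # [4, 5, 6, 10, 11]
```

Each client's ids are sequential within a range and jump only when a new Hi is fetched, and the two sets never overlap.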

The implementation in RavenDB

In RavenDB, the default function configured against the DocumentConvention is now HiLo, which means if a new document is saved against the session with its Id set to NULL, it will have an Id generated on the spot which contains the name of the document and the incremented Id. Obviously this can be overridden by changing the convention to leave the created id at some default value of your application’s choosing.

My original implementation was a bit poor, generating quite a bit of noise in the document database (it was inserting documents to get the ids), and the incremented Ids were being shared amongst objects – which meant if you created say, blogentry/1, saving a new user would mean having newuser/2.

Oren changed this to store a single object directly in RavenDB for the generator, and to create a generator per type, which means a lot less noise and more sensible ids being generated for each document.
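
The per-type idea can be illustrated with a small sketch. This is Python, not the RavenDB client: the key format (lower-cased type name, a slash, then the number) is an assumption modelled on the blogentry/1 example above, and `hi_counters`, `hilo_ids` and `generate_key` are hypothetical names.

```python
from collections import defaultdict
from itertools import count

CAPACITY = 1000  # assumed value; every client must agree on it

# Stand-in for the server: one independent Hi counter per document type.
hi_counters = defaultdict(lambda: count(1))


def hilo_ids(type_name):
    """Endless HiLo id stream for a single document type."""
    while True:
        hi = next(hi_counters[type_name])
        for lo in range(1, CAPACITY + 1):
            yield (hi - 1) * CAPACITY + lo


generators = {}  # one client-side generator per type, created on demand


def generate_key(entity):
    # Assumed key convention: "<lower-cased type name>/<number>".
    type_name = type(entity).__name__.lower()
    if type_name not in generators:
        generators[type_name] = hilo_ids(type_name)
    return f"{type_name}/{next(generators[type_name])}"


class BlogEntry:
    pass


class User:
    pass


keys = [generate_key(BlogEntry()), generate_key(User()), generate_key(BlogEntry())]
print(keys)  # ['blogentry/1', 'user/1', 'blogentry/2']
```

With a single shared generator the second key would have been user/2, which is exactly the oddity the per-type change removes.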

What it means

What this essentially means is that if you’re using RavenDB out of the box without changing any of the conventions, documents will have a generated Id as soon as Store is called for that document. SaveChanges therefore does not have to be called until the very end of the Unit of Work, so all changes can be efficiently batched in a single request. As a result, applications should be easier to write and performance should be easier to maintain.
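
The shape of that workflow can be shown with a toy model. To be clear, this is not the RavenDB client API: `ToySession` is an invented in-memory class written in Python, its `store` and `save_changes` methods only mimic the roles of Store and SaveChanges, and a plain per-type counter stands in for the HiLo generator (which would occasionally need a round trip of its own to fetch a new Hi).

```python
from collections import defaultdict
from itertools import count


class ToySession:
    """In-memory model of a unit of work (hypothetical, not RavenDB's session)."""

    def __init__(self, generate_key):
        self._generate_key = generate_key
        self._pending = []
        self.round_trips = 0

    def store(self, kind, document):
        # The id is assigned on the spot, with no call to the server.
        if document.get("Id") is None:
            document["Id"] = self._generate_key(kind)
        self._pending.append(document)

    def save_changes(self):
        # Everything stored so far goes out as one batch.
        batch, self._pending = self._pending, []
        self.round_trips += 1
        return batch


counters = defaultdict(lambda: count(1))  # placeholder for per-type HiLo
session = ToySession(lambda kind: f"{kind}/{next(counters[kind])}")

entry = {"Id": None, "Title": "HiLo"}
session.store("blogentry", entry)

# The new entry's id can be referenced before anything has been saved.
comment = {"Id": None, "EntryId": entry["Id"], "Text": "Nice post"}
session.store("comment", comment)

round_trips_before_save = session.round_trips
batch = session.save_changes()
print(comment["EntryId"], round_trips_before_save, session.round_trips)
```

The comment can point at the entry's id straight away, and both documents travel to the server together in one request.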

This is a .NET client specific feature and nothing was changed in the database itself to make this work.

What this does mean is that if clients on different platforms are going to connect to the same RavenDB instance and create documents, and you’re using the default HiLo implementation, then those other platforms will need a similar algorithm using the same capacity, otherwise they can end up generating the same ids. This is not necessarily a downside, but it is worth noting if you are planning this sort of setup.
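
The arithmetic behind the "same capacity" requirement is easy to check. The capacities and Hi values below are made-up numbers chosen for illustration, and `id_range` is a hypothetical helper, written in Python.

```python
def id_range(hi, capacity):
    """Ids covered by one Hi value: (hi - 1) * capacity + 1 up to hi * capacity."""
    start = (hi - 1) * capacity + 1
    return range(start, start + capacity)


# Hypothetical scenario: one client assumes a capacity of 1000, another 500.
first_client = id_range(hi=2, capacity=1000)   # ids 1001..2000
second_client = id_range(hi=3, capacity=500)   # ids 1001..1500
overlap = set(first_client) & set(second_client)
print(len(overlap))  # 500

# With the same capacity, different Hi values can never overlap.
same_capacity_overlap = set(id_range(2, 1000)) & set(id_range(3, 1000))
print(len(same_capacity_overlap))  # 0
```

Although the server handed out two different Hi values, the mismatched capacities map them onto the same block of ids, so the second client to save would collide with the first.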

What I learned

While I might contribute the odd bug fix to open source projects now and then, the idea of going in and changing the fundamental way the .NET RavenDB client worked was a bit daunting, not from a technical perspective but from a taste perspective, as I wasn’t sure how Oren wanted things done. As he later said, he’d prefer that code which then has to change be submitted than that no code be submitted at all. I’d like to pass that on to anybody who wants to contribute to this project: if you’ve got a good idea then hit the mailing list, suggest it and maybe implement it. There is nothing to be lost if it’s something people want to use.

In the end, my implementation is barely visible in there, but I'm still pleased that this is in there, it makes *my* life easier :)

Rob Ashton
