RavenDB-The Image Gallery Project (XV) - Improving Tag Search with AutoComplete

Published on 2010-10-24

In the last entry we set up a basic search-as-you-type search, but the user experience still left a lot to be desired. What we really need is a list of tags starting with the current text, and, for an added degree of complexity, it would be nice if those tags were sorted by how many times they have been used, so that the most common ones appear first.

This calls for a Map/Reduce Index

The great thing about indexes is that they don’t have to just map data into index entries; they can also group the mapped data by a field or a collection of fields, and optionally perform some calculations at the same time. You don’t have to understand how this works in order to use it, so I won’t go into too much detail on that front.

What we need:

It is quite clear that the ‘tag’ is the field we’ll group by, and every time we come across a tag we need to add ‘1’ to a total.

Solving this with an ordinary LINQ query

Assuming we had a collection of ImageDocuments in a list called images like so:

List<ImageDocument> images = GetAllTheImagesFromSomewhere();

And we wanted to query this collection to ask it “What are the unique tags in this collection, and how many times do they appear?” – a simple LINQ query might look like this:

        List<ImageDocument> images = GetAllTheImagesFromSomewhere();
 
        var allTags = from image in images
                        from tag in image.Tags
                        group tag by tag.Name into g
                        select new
                        {
                            Name = g.Key,
                            Count = g.Count()
                        };

Now imagine that instead of an in-memory collection we were querying against a DataContext in Linq2Sql or Entity Framework, or whatever data wrapper you use. Performing this kind of query against an actual database every time the user changes the content of the text box would leave most DBAs in tears.

But the knowledge of how we build this query maps almost directly across into RavenDB.

Our View

We’ll build a generic view that we can use for purposes like this, and it will be very basic like so:

    public class ImageTagCollectionView
    {
        public IEnumerable<ImageTagCollectionItem> Items { get; private set; }
 
        public ImageTagCollectionView(IEnumerable<ImageTagCollectionItem> items)
        {
            this.Items = items;
        }
    }
    public class ImageTagCollectionItem
    {
        public string Name { get; private set; }
        public int Count { get; set; }
 
        public ImageTagCollectionItem(string name, int count)
        {
            this.Name = name;
            this.Count = count;
        }
    }

Our map can therefore look something like this:

                Map = docs => from doc in docs
                              from tag in doc.Tags
                              select new
                              {
                                  tag.Name,
                                  Count = 1
                              },

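We’re just doing a SelectMany to get all of the tags in the system, just like we did in the original LINQ query.

The reduce is almost *exactly* the same as the grouping in the original LINQ query; the only difference is that it sums the Count field rather than counting the items in the group: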

                Reduce = results => from result in results
                                    group result by result.Name into g
                                    select new
                                    {
                                        Name = g.Key,
                                        Count = g.Sum(x => x.Count)
                                    }

 

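Note: The objects produced by the map must have exactly the same shape (the same field names and types) as the objects produced by the reduce, because the reduce may be run again over its own output.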
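To see why the shapes have to match, here is a minimal in-memory sketch using plain LINQ to Objects rather than RavenDB. The TagCount class, the MapReduceSketch class and the sample tags are all made up for illustration, and the batching is only my assumption about how partial results get combined. It reduces two batches separately and then reduces the partial results again:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MapReduceSketch
{
    // Hypothetical stand-in for one index entry; this is not a RavenDB type.
    public sealed class TagCount
    {
        public string Name { get; set; }
        public int Count { get; set; }
    }

    // "Map": one entry per tag occurrence, each carrying Count = 1.
    public static IEnumerable<TagCount> Map(IEnumerable<string[]> documents)
    {
        return from tags in documents
               from tag in tags
               select new TagCount { Name = tag, Count = 1 };
    }

    // "Reduce": group by name and sum the counts.
    // Input and output are both TagCount, so the output can be fed back in.
    public static IEnumerable<TagCount> Reduce(IEnumerable<TagCount> entries)
    {
        return from entry in entries
               group entry by entry.Name into g
               select new TagCount { Name = g.Key, Count = g.Sum(x => x.Count) };
    }

    public static void Main()
    {
        // Two made-up batches of documents; each document is just its list of tag names.
        var batch1 = new[] { new[] { "cat", "dog" }, new[] { "cat" } };
        var batch2 = new[] { new[] { "cat", "bird" } };

        // Reduce each batch on its own, then reduce the combined partial results.
        var partials = Reduce(Map(batch1)).Concat(Reduce(Map(batch2)));

        foreach (var item in Reduce(partials).OrderBy(x => x.Name))
        {
            Console.WriteLine(item.Name + ": " + item.Count);
        }
        // Prints: bird: 1, cat: 3, dog: 1 (one per line)
    }
}
```

This is also why the reduce sums Count instead of calling g.Count(): counting the two partial "cat" entries in the second pass would give 2 rather than 3.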

Putting all of this together into an AbstractIndexCreationTask, we get:

    public class ImageTags_GroupByTagName : AbstractIndexCreationTask<ImageDocument, ImageTagCollectionItem>
    {
        public ImageTags_GroupByTagName()
        {
            Map = docs => from doc in docs
                          from tag in doc.Tags
                          select new
                          {
                              tag.Name,
                              Count = 1
                          };
            Reduce = results => from result in results
                                group result by result.Name into g
                                select new
                                {
                                    Name = g.Key,
                                    Count = g.Sum(x => x.Count)
                                };
            SortOptions.Add(
                x => x.Count, Raven.Database.Indexing.SortOptions.Int);
        }
    }

That simple – we use the ImageTagCollectionItem class we created before as the second generic argument to our AbstractIndexCreationTask and use that to perform our reduce/grouping statement. The essence of this is exactly the same query as we would use in an ordinary LINQ job.

As an extra, we also tell RavenDB that Count should be treated as an integer when ordering by it. This is needed for any indexed field that is to be sorted numerically; otherwise, as far as I know, the values are compared as text.
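A quick way to see why the sort type matters is to sort the same counts once as text and once as numbers. This is plain C# with no RavenDB involved, and the SortSketch class and the sample counts are made up for illustration:

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class SortSketch
{
    // Sort descending after converting each count to text, comparing character by character.
    public static string[] SortAsText(int[] counts)
    {
        return counts
            .Select(c => c.ToString(CultureInfo.InvariantCulture))
            .OrderByDescending(s => s, StringComparer.Ordinal)
            .ToArray();
    }

    // Sort descending by numeric value.
    public static int[] SortAsNumbers(int[] counts)
    {
        return counts.OrderByDescending(c => c).ToArray();
    }

    public static void Main()
    {
        var counts = new[] { 9, 10, 2, 100 };

        Console.WriteLine(string.Join(", ", SortAsText(counts)));    // 9, 2, 100, 10
        Console.WriteLine(string.Join(", ", SortAsNumbers(counts))); // 100, 10, 9, 2
    }
}
```

With a text comparison, a tag used 9 times would be listed ahead of one used 100 times, which is not what we want for a "most popular first" list.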

The View Factory

To keep things simple, the input for my view factory is going to consist of just the text we want to find matching tags for:

    public class ImageTagCollectionInputModel
    {
        public string SearchText { get; set; }
    }

I’ll write some tests for this in the usual manner. Again, the full set-up is a bit too wordy to paste into the blog post, so it can be found on GitHub, but the tests themselves look like this:

        [Test]
        [TestCase("So", 4)]
        [TestCase("SomeTag3", 1)]
        [TestCase("SomeO", 2)]
        [TestCase("Ano", 1)]
        public void WhenLoadIsInvokedWithSearchText_ExpectedNumberOfResultsAreReturned(string searchTerm, int expectedCount)
        {
            PopulateData();
            var results = ViewFactory.Load(new ImageTagCollectionInputModel() { SearchText = searchTerm });
            Assert.AreEqual(expectedCount, results.Items.Count());
        }
 
        [Test]
        [TestCase("SomeTag1", 1)]
        [TestCase("SomeTag3", 1)]
        [TestCase("SomeOtherTag1", 2)]
        [TestCase("SomeOtherTag2", 2)]
        [TestCase("AnotherTagEntirely", 1)]
        public void WhenLoadIsInvokedWithNoSearchText_ModelContainsItemsWithCorrectInstanceCounts(string searchTerm, int expectedCount)
        {
            PopulateData();
            var results = ViewFactory.Load(new ImageTagCollectionInputModel());
            var specificResult = results.Items.Where(x => x.Name == searchTerm).FirstOrDefault();
            Assert.AreEqual(expectedCount, specificResult.Count);
        }

Not the most efficient tests, and I might revisit that later in the series as a topic in its own right.

The implementation of the view factory looks like the following:

 
        public ImageTagCollectionView Load(ImageTagCollectionInputModel input)
        {
            IQueryable<ImageTagCollectionItem> query =
                this.documentSession.Query<ImageTagCollectionItem, ImageTags_GroupByTagName>();
 
            // Filter first, so that the ordering and the limit apply to the matching tags only
            if (!string.IsNullOrEmpty(input.SearchText))
            {
                query = query.Where(x => x.Name.StartsWith(input.SearchText));
            }
 
            // Most used tags first, capped at 25 suggestions
            var results = query
                .OrderByDescending(x => x.Count)
                .Take(25)
                .ToArray();
            return new ImageTagCollectionView(results);
        }

Note: We’re not querying ImageDocument. Although it was used to create the map/reduce index, it isn’t what the index contains. Instead we query with ImageTagCollectionItem, the same type we used to define the Reduce function, because it has the same fields as the index entries and will therefore result in the correct query.
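For anyone who wants to check the filter, order and limit logic without a running RavenDB instance, here is a rough in-memory sketch of the same steps. The tag totals are taken from the expected counts in the tests above, and the TagQuerySketch class and its Search method are hypothetical names. The prefix match here is case-sensitive, which may not be how the real index behaves:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TagQuerySketch
{
    // Tag totals taken from the expected counts in the tests; they stand in for the index contents.
    public static readonly (string Name, int Count)[] Index =
    {
        ("SomeTag1", 1),
        ("SomeTag3", 1),
        ("SomeOtherTag1", 2),
        ("SomeOtherTag2", 2),
        ("AnotherTagEntirely", 1),
    };

    public static (string Name, int Count)[] Search(string searchText, int limit = 25)
    {
        IEnumerable<(string Name, int Count)> query = Index;

        // Only filter when there is something to filter by.
        if (!string.IsNullOrEmpty(searchText))
        {
            query = query.Where(x => x.Name.StartsWith(searchText, StringComparison.Ordinal));
        }

        // Most used tags first, capped at the limit.
        return query.OrderByDescending(x => x.Count).Take(limit).ToArray();
    }

    public static void Main()
    {
        foreach (var prefix in new[] { "So", "SomeO", "Ano" })
        {
            Console.WriteLine(prefix + " -> " + Search(prefix).Length);
        }
        // Prints: So -> 4, SomeO -> 2, Ano -> 1 (one per line), matching the test cases
    }
}
```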

Implementing this in Web

As before, I’m just going to expose this via a JSON service directly like so:

        public ActionResult _GetTags(ImageTagCollectionInputModel input)
        {
            var model = viewRepository.Load<ImageTagCollectionInputModel, ImageTagCollectionView>(input);
            return Json(model, JsonRequestBehavior.AllowGet);
        }

Setting this up in my already created textbox using jQuery Autocomplete, I get an awesome user experience like this:

[Image: the tag textbox with its autocomplete suggestions]

The great thing about doing this is…

We can run this query over and over again, because it’s looking up a pre-computed index and that’s a cheap operation. We’re getting really good performance out of RavenDB without really having to learn anything too different from what we already know from doing LINQ in the past.

2020 © Rob Ashton. ALL Rights Reserved.