Static analysis of Simple.Data code to generate databases

Published on 2011-4-4

The dynamic keyword in C# 4 has already been put to some good use, and has attracted a few detractors (probably because dynamic is to C# as generic is to Java), but the fact that the dynamic keyword compiles down to late-bound calls against a plain System.Object does present some interesting possibilities compared with its purer counterparts.

I had this thought at the weekend whilst doing something completely unrelated, and a brief Google suggested nobody else has bothered doing it with Mark Rendle's Simple.Data yet, so here I am with a proof of concept scribbled down after a couple of hours' work this evening.

Given the following code:

I want to end up with a database table that looks like:

Looking at it, this is actually quite a simple problem, and we have two possible solutions to follow if, as users, we are to do this without writing any further code on top of it:

Create/modify the database as the code is executed
Analyse the compiled IL and figure it out from there

Both have their pros and cons (and some cons are definitely shared, hah), but from an accessibility point of view being able to do either of these would be pretty "cool".

I'm going for option 2, because I haven't done any IL in a while and want to remind folks that I'm not just a JS monkey, but still care about those .NET leanings too ;-)

Looking at the above code, as a user we can work out that the various columns and tables exist, and their types, so this should mean we can do the same programmatically against the compiled IL.

Let's look at the compiled IL for the first method, as dumped out with Mono.Cecil in my Immediate window:

Okay, that's quite daunting, but breaking it down we can easily understand what is going on (I haven't read any docs, I just read the IL and figured it out, so I could be wrong :)

First up, we look at a compiler-generated static field, and if it's present we skip forward by about 30 instructions.
If it's not present, we load up the name of our method call (FindByUsername) and do some reflection to get information about that method call.
We then do the same with the property access (Users).
Arriving at the point we would have skipped ahead to if those values had been present, we realise they are cached information about the calls to the "Object", only loaded once (sensible, as reflection is expensive, yeah?).
At this point, we can safely load the arguments onto the stack and make a call via callvirt to the cached reflection information on the DB object.
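
To make those steps a bit more concrete, here is a hand-written approximation of the pattern in plain C#. It is a sketch of the general shape of what the compiler emits for a dynamic call (cached call sites created through the C# runtime binder), not the dumped IL itself and not the exact generated code. FakeDb, FakeUsers and CallSiteSketch are hypothetical stand-ins so that the example runs without Simple.Data.

```csharp
using System;
using System.Runtime.CompilerServices;
using Microsoft.CSharp.RuntimeBinder;

// Hypothetical stand-ins for the real database object.
public class FakeUsers
{
    public string FindByUsername(string name) { return "found:" + name; }
}

public class FakeDb
{
    public FakeUsers Users { get; } = new FakeUsers();
}

public static class CallSiteSketch
{
    // The "cached fields": one call site per dynamic operation, created once.
    private static CallSite<Func<CallSite, object, string, object>> findSite;
    private static CallSite<Func<CallSite, object, object>> usersSite;

    // Roughly what "db.Users.FindByUsername(username)" looks like when db is dynamic.
    public static object Find(object db, string username)
    {
        if (findSite == null)
        {
            // The member name is passed as a string literal (an ldstr in the IL).
            findSite = CallSite<Func<CallSite, object, string, object>>.Create(
                Binder.InvokeMember(CSharpBinderFlags.None, "FindByUsername", null,
                    typeof(CallSiteSketch),
                    new[]
                    {
                        CSharpArgumentInfo.Create(CSharpArgumentInfoFlags.None, null),
                        CSharpArgumentInfo.Create(CSharpArgumentInfoFlags.UseCompileTimeType, null)
                    }));
        }

        if (usersSite == null)
        {
            usersSite = CallSite<Func<CallSite, object, object>>.Create(
                Binder.GetMember(CSharpBinderFlags.None, "Users",
                    typeof(CallSiteSketch),
                    new[] { CSharpArgumentInfo.Create(CSharpArgumentInfoFlags.None, null) }));
        }

        // Target is a delegate field; calling it is the callvirt to Invoke.
        object users = usersSite.Target(usersSite, db);
        return findSite.Target(findSite, users, username);
    }
}
```

The interesting part for static analysis is that the member names ("Users", "FindByUsername") survive compilation as string literals next to the cached fields.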

This is nice and simple. The only information we haven't got for sure is that those dynamic calls are actually being made to a Simple.Data object, because it's just a System.Object once compiled. I figure it might be possible to trace through the code to find the point at which that object was actually created via the Open call, but that's way beyond the scope of this blog post.

As for analysing this, we have Mono.Cecil so may as well write a feature test to try our initial play out.

I'm not going to be clever about this, as it's just a play-about, so let's dive in and see what information we can find in the assembly - to do this we enumerate the types and pass them into some type of scanner.

We then have a look at all the methods on that type (duh)
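
The original listings for this step aren't reproduced here, so the following is only a minimal sketch of the walk over types and methods with Mono.Cecil. AssemblyWalker and MethodsWithBodies are names made up for illustration; the Cecil calls themselves (AssemblyDefinition.ReadAssembly, ModuleDefinition.GetTypes, TypeDefinition.Methods, MethodDefinition.HasBody) are the library's own.

```csharp
using System.Collections.Generic;
using Mono.Cecil;

public static class AssemblyWalker
{
    // Yields every method that has an IL body in the assembly at the given path.
    public static IEnumerable<MethodDefinition> MethodsWithBodies(string assemblyPath)
    {
        var assembly = AssemblyDefinition.ReadAssembly(assemblyPath);

        // GetTypes() also returns nested types, unlike the Types collection.
        foreach (var type in assembly.MainModule.GetTypes())
        {
            foreach (var method in type.Methods)
            {
                // Abstract and extern methods have no instructions to scan.
                if (method.HasBody)
                    yield return method;
            }
        }
    }
}
```

Each MethodDefinition returned this way exposes method.Body.Instructions, which is what the rest of the scanning works on.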

The important information is found in the method call, and the important stuff we want to look for in a method is (for now):

Are there any dynamic method calls made?
Are there any references to cached fields (CallSites)?

With this in mind, I can think about how to identify these things

To find out whether we have any dynamic method calls (returning references to those instructions), we just look for any callvirt to an Invoke method. This is hardly fail-safe, but it'll easily do for this test.
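
A sketch of that check with Mono.Cecil might look like the following. DynamicCallFinder and FindInvokeCalls are hypothetical names, and the test is deliberately as loose as described: any callvirt whose target method is called Invoke counts, so ordinary delegate invocations will match too.

```csharp
using System.Collections.Generic;
using System.Linq;
using Mono.Cecil;
using Mono.Cecil.Cil;

public static class DynamicCallFinder
{
    // Returns every callvirt instruction whose target method is named "Invoke".
    public static List<Instruction> FindInvokeCalls(MethodDefinition method)
    {
        if (!method.HasBody)
            return new List<Instruction>();

        return method.Body.Instructions
            .Where(i => i.OpCode.Code == Code.Callvirt)
            .Where(i => i.Operand is MethodReference target && target.Name == "Invoke")
            .ToList();
    }
}
```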

For cached references to reflected data, we again just look for the loading of a static field followed by the subsequent "goto" (a conditional branch), and check that the type of the field is a CallSite.
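
As a rough illustration (the names CallSiteFinder and FindCallSiteLoads are invented here), the "field load, then branch, then check the type" test can be written against Cecil's instruction list like this. Matching on a type name that starts with "CallSite" is an assumption that is good enough for a play-about, not a robust check.

```csharp
using System.Collections.Generic;
using Mono.Cecil;
using Mono.Cecil.Cil;

public static class CallSiteFinder
{
    // Returns each ldsfld of a CallSite-typed field that is immediately
    // followed by a conditional branch (the "is it cached yet?" guard).
    public static List<Instruction> FindCallSiteLoads(MethodDefinition method)
    {
        var found = new List<Instruction>();
        if (!method.HasBody)
            return found;

        foreach (var instruction in method.Body.Instructions)
        {
            if (instruction.OpCode.Code != Code.Ldsfld)
                continue;

            var field = instruction.Operand as FieldReference;
            if (field == null || !field.FieldType.Name.StartsWith("CallSite"))
                continue;

            var next = instruction.Next;
            if (next != null && next.OpCode.FlowControl == FlowControl.Cond_Branch)
                found.Add(instruction);
        }

        return found;
    }
}
```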

I can use these methods to get me information about what is going on here, and just check we're in a method that actually does something similar to what we're interested in.

The references to those fields will yield interesting information about the table/column we are dealing with in Simple.Data, that is, the names of those objects.

I find this by going to that instruction and looking for the inevitable ldstr, which loads the name of the method call/property access onto the stack before the reflection call is made.
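
One way to sketch that lookup: start from the instruction that loads the cached field and walk forward until the first ldstr, giving up once the field has been stored. MemberNameFinder and FindMemberName are hypothetical names, and the assumption that the first string literal after the guard is the member name is exactly that, an assumption about how the compiler lays things out.

```csharp
using Mono.Cecil.Cil;

public static class MemberNameFinder
{
    // Walks forward from the ldsfld guard and returns the first string literal,
    // or null if the call site is stored (stsfld) before any ldstr is seen.
    public static string FindMemberName(Instruction callSiteLoad)
    {
        for (var current = callSiteLoad.Next; current != null; current = current.Next)
        {
            if (current.OpCode.Code == Code.Ldstr)
                return (string)current.Operand;

            // Once the site has been written back, the setup code is over.
            if (current.OpCode.Code == Code.Stsfld)
                break;
        }

        return null;
    }
}
```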

So far so good. Now I just need the type of the argument passed into the call, and I get that by looking at the arguments being loaded for the actual method call.

Can you say hacky? I just look at the previous instruction and if it's a ldstr I know the argument is a string :)
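
In Cecil terms the hack is about this small. ArgumentTypeGuesser and GuessArgumentType are invented names, and the sketch only recognises a single string literal argument, returning null for everything else.

```csharp
using System;
using Mono.Cecil.Cil;

public static class ArgumentTypeGuesser
{
    // Looks at the instruction just before the Invoke call: if it is a ldstr,
    // the (last) argument is a string literal. Anything else is "unknown" (null).
    public static Type GuessArgumentType(Instruction invokeCall)
    {
        var previous = invokeCall.Previous;
        if (previous != null && previous.OpCode.Code == Code.Ldstr)
            return typeof(string);

        return null;
    }
}
```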

All that is left is the putting together of this information into the model we're building.

This gives me an in-memory model of the database, with the name of the table and the column we've found. Creating a DB creation script from this is a trivial task left to the imagination of the reader (my SQL is awful, man!).
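
For what it's worth, a minimal sketch of that last step could look like this. TableModel, ColumnModel and ScriptWriter are hypothetical types, not the ones in the repository, and the T-SQL flavoured output, the NVARCHAR(255) mapping and the "strip the FindBy prefix to get the column name" rule are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory model: one table with named, typed columns.
public class ColumnModel
{
    public string Name;
    public Type Type;
}

public class TableModel
{
    public string Name;
    public List<ColumnModel> Columns = new List<ColumnModel>();
}

public static class ScriptWriter
{
    // "FindByUsername" -> "Username"; other method names are returned unchanged.
    public static string ColumnNameFrom(string methodName)
    {
        const string prefix = "FindBy";
        return methodName.StartsWith(prefix) ? methodName.Substring(prefix.Length) : methodName;
    }

    public static string CreateTable(TableModel table)
    {
        var columns = table.Columns.Select(c => "    [" + c.Name + "] " + SqlTypeFor(c.Type));
        return "CREATE TABLE [" + table.Name + "] (\n" + string.Join(",\n", columns) + "\n);";
    }

    private static string SqlTypeFor(Type type)
    {
        if (type == typeof(string)) return "NVARCHAR(255)";
        if (type == typeof(int)) return "INT";
        throw new NotSupportedException("No SQL mapping for " + type);
    }
}
```

Feeding it a table called Users with a single string column derived from FindByUsername produces a one-column CREATE TABLE statement.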

This is where I stopped, as I don't have much time to go further tonight. If anybody wants to fork the repository and carry on where I left off, it can be found here: https://github.com/robashton/Simple.Data.Generation

Clearly the rest of the work would take the following path if it were to be continued:

Check for all the other types of 'const' to be passed into Simple.Data method calls
Check for arguments/local variables being passed into the Simple.Data method calls
Allow for multiple arguments to Simple.Data method calls
Deal with types of Simple.Data method call other than FindBy
Deal with dynamic operations being passed to other dynamic operations (Simple.Data does this)

Is this actually a good idea? Possibly? Possibly not? I haven't read about how the compiler implements dynamic behind the scenes (literally, not at all) and don't know how much is left up to the compiler when choosing how to do it (looking at those cached fields...), and this particular script makes quite a lot of assumptions about this.

As an example of what implementing the dynamic keyword on top of a statically typed language and runtime brings to us, though, it's quite powerful, and it would be interesting to see it pushed further.

Thoughts?


Rob Ashton