Static analysis of Simple.Data code to generate databases

Published on 2011-4-4

The dynamic keyword in C#4 has been put to some good use already - and has attracted a few detractors (probably because dynamic is to C# as generic is to Java), but the fact that the dynamic keyword compiles down to simple reflection over System.Object does present some interesting possibilities over its more pure counterparts.

I had this thought at the weekend whilst doing something completely unrelated, and a brief Google suggested nobody else has bothered doing it with Mark Rendle's Simple.Data yet, so here I am with a proof of concept scribbled down after a couple of hours work this evening.

Given the following code:

I want to end up with a database table that looks like:

Looking at it, this is actually quite a simple problem, and we have two possible solutions to follow if a as a user, we are to do this without writing any further code on top of this.

Both have their merits and cons (and some cons are definitely shared hah), but from an accessibility point of view being able to do either of these would be pretty "cool"

I'm going for option 2, because I haven't done any IL in a while and want to remind folks that I'm not just a JS monkey, but still care about those .NET leanings too ;-)

Looking at the above code, as a user we can work out that the various columns and tables exist, and their types, so this should mean we can do the same programnatically against the compiled IL.

Let's look at the compiled IL for the first method as dumped out with Mono.Cecil in my immediate Window:

Okay, that's quite daunting, but breaking it down we can easily understand what is going on (I haven't read any docs, I just read the IL and figured it out, so I could be wrong :)

This is nice and simple, the only information we haven't got for sure is that those dynamic calls are actually being made to a SimpleData object because it's just a System.Object once compiled. I figure it might be possible to trace through the code to find at what point that object was actually created via the Open call, but that's way beyond the scope of this blog post.

As for analysing this, we have Mono.Cecil so may as well write a feature test to try our initial play out.

I'm not going to be clever about this, as it's just a play-about, so let's dive in and see what information we can find in the assembly - to do this we enumerate the types and pass them into some type of scanner.

We then have a look at all the methods on that type (duh)

The important information is found in the method call, and the important stuff we want to look for in a method is (for now):

With this in mind, I can think about how to identify these things

Looking at whether we have any method calls (returning references to those instructions - we just look for any call virts to an Invoke method (This is hardly fail-safe, but it'll easily do for that test)

Looking at any cached references to reflected data, again we just look for a loading of a field, the subsequent "goto", and check the type of the field (Callsite)

I can use these methods to get me information about what is going on here, and just check we're in a method that actually does something similar to what we're interested in.

The references to those fields will yield in interesting information about the table/column we are dealing with in Simple.Data, that is - the names of those objects.

I find this by going to that instruction and looking for the inevitable call to Ldstr, loading the name of the method call/property access onto the stack before making the reflection call.

So far so good, now I just need the type of the argument passed into the call, and I achieve that by looking at the arguments being loaded into the actual method call

Can you say hacky? I just look at the previous instruction and if it's a ldstr I know the argument is a string :)

All that is left is the putting together of this information into the model we're building.

This gives me an in memory model of the database, with the name of the table and the column we've found - creating a DB creation script from this is a trivial task left to the imagination by the reader (My Sql is awful man!)

This is where I stopped as I don't have much time to go further tonight, if anybody wants to fork the repository and carry on where I left off, it can be found here: https://github.com/robashton/Simple.Data.Generation

Clearly the rest of the work takes the following path if it was to be continued:

Is this actually a good idea? Possibly? Possibly not? I haven't read about the implementation of dynamic behind the scenes by the compiler (literally, not at all) - and don't know how much is left up the compiler when choosing how to do it (Looking at those cached fields...), and this particular script makes quite a lot of assumptions about this.

As an example of what implementing the dynamic key word on top of a statically typed language and runtime brings to us though, it's quite powerful - and it would be interesting to see it pushed further.

Thoughts?

This used to ask if you wanted to hire me

But chances are I'm not available, as I'm busy shipping stuff.

I am available for conference speaking, I like talking and will do so for just T&E (and perhaps a bottle of wine or two).

I have done soft keynotes, and usually entertain/rile people in equal proportion, but prefer to talk tech as that's what I do

Email me :)

blog comments powered by Disqus

fschwiet


Hmm that's a cool idea. It seems its going to get a lot tougher though when you need to generate a database update script for an existing application, even when the original database schema was created by the same tool.

robashton


That's just the standard "compare and contrast" these two database schemas right? :-)

[ICR]


Interesting. I tried using Cecil for reflection for a similar project of code generation based on property access on a dynamic object after I discovered you can't use dynamic in expression trees, but couldn't get it to work. Maybe I'll have a peek at your code at some point to see if that can set me back on the right track.

tobi


This is a disgusting and unneeded hack. It adds no value. It is just a toy made for the sake of doing new and interesting stuff, not for adding productivity. What about generating a database schema from fractals? Sounds fun.

robashton


"It is just a toy made for the sake of doing new and interesting stuff"No shit, it's called a blog.

tobi


Yeah I was being mean. Sry about that. I still stand by my opinion though.

tobi


Sometimes I think being able to act anonymously on the internet is a bad trait. It certainly was in this case. To finally add some value: I think this approach does not add anything because there is no difference between 'db.Users.FindByID(1)' and 'db.GetTable("Users").AddIdentity("ID")'. Both are string based. However the dynamic approach adds complexity to no end and requires the programmer to have a bigger working set of concepts because it is not self documenting how it works.

robashton


We're talking about the viability of Simple.Data or we talking about the blog post?The blog post I can agree with on the 'would I do this to completion' scale, it was clearly a bit of fun to see how far something can be pushed, with a nice write-up so others may benefit from the learnings. I can actually think of some good uses of static analysis that don't involve database generation. The key word is probably analysis.On Simple.Data I cannot agree, I've seen a lot of people picking it up lately and being rather happy with how it functions, as a way of just "getting stuff done" in a legible and non-ceremonial way its value has already been demonstrated. It might not be the tool for you, and it's certainly not the tool for me in a lot of the applications I develop, but it is a valuable tool for a lot of other people in the .NET space.

tobi


I did not talk about Simple.Data which I did not notice before today but I can see clear value in Simple.Data for certain users. I would actually recommend it to beginners. How much easier could it be to start writing data access in .net?

rock-meister


Another (related) area where static analysis would be really beneficial is to check the validity of the sql that is generated. Since the table names and column names are strings, I see value in doing a static analysis (post compile) to report errors in the sql.Thoughts?rock-meister