Evented Github Adventure - Crossing the streams to gain real insights

Published on 2013-5-13

Carrying on in the EventStore series...

It's not enough to just create a stream of 'events' from correlated actions over time, although that is super cool. What we want to do is use that stream to tell us something interesting about the activity of developers on Github.

Like, given that we have this information about paranoid pushes, how do they stack up as a percentage of pushes overall when broken down by language?

Problem: This information is in completely different streams.

Not to worry, this is where the Event Store's ability to re-partition and consume streams in a variety of ways comes to the rescue once more. Want to consume the events from two different streams? Not a problem:

fromStreams([ "github", "paranoidpushes" ])
   .when({

   })

First off, let's keep it simple and just find out what percentage of all the pushes in the Github stream are paranoid pushes.

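fromStreams([ "github", "paranoidpushes" ])
   .when({
     // Start with both counters at zero
     "$init": function() {
       return { paranoid: 0, total: 0 }
     },
     // Emitted into the "paranoidpushes" stream
     "ParanoidPush": function(state, ev) {
       state.paranoid++
       return state
     },
     // Every push in the "github" stream
     "PushEvent": function(state, ev) {
       state.total++
       return state
     }
   })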

So yeah, er, that was stupidly easy. I didn't even have to think about that one: we just get all the events from those streams as a unified collection and can generate stats from them.
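To make the "unified collection" idea concrete, here is a minimal sketch in plain JavaScript that folds a merged list of events through the same kind of handlers. The `runProjection` helper, the `eventType` field on the sample objects, and the sample events themselves are all assumptions made up for illustration. This is not the Event Store API, just an approximation of what the projection is doing conceptually.

```javascript
// Hypothetical local stand-in for the projection's fold (NOT the Event Store API).
function runProjection(handlers, events) {
  var state = handlers["$init"]();
  events.forEach(function (ev) {
    var handler = handlers[ev.eventType];
    if (!handler) return; // event types without a handler are ignored
    var next = handler(state, ev);
    // A handler may mutate state in place or return a replacement
    if (next !== undefined) state = next;
  });
  return state;
}

var handlers = {
  "$init": function () {
    return { paranoid: 0, total: 0 };
  },
  "ParanoidPush": function (state, ev) {
    state.paranoid++;
    return state;
  },
  "PushEvent": function (state, ev) {
    state.total++;
    return state;
  }
};

// Made-up events, as if both streams had been merged into one sequence
var sampleEvents = [
  { eventType: "PushEvent" },
  { eventType: "PushEvent" },
  { eventType: "ParanoidPush" },
  { eventType: "WatchEvent" },
  { eventType: "PushEvent" }
];

var result = runProjection(handlers, sampleEvents);
console.log(JSON.stringify(result)); // {"paranoid":1,"total":3}
```

The handlers neither know nor care which stream an event came from, which is why combining streams costs nothing extra in the projection itself.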

What does the result look like?

{"paranoid":181423,"total": 2272796}

So, in fact around 8% of all pushes to Github happen within 2 minutes of the previous one - that's actually quite high - and my guess is that these are either people rectifying mistakes in previous commits (actually, there's an idea for another projection) or people who are new to git.
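For the record, the arithmetic behind that figure is just the ratio of the two counters. A quick sketch using the numbers from the result above (the `percentage` helper is a made-up name for illustration):

```javascript
// Counters copied from the projection result shown above
var counts = { paranoid: 181423, total: 2272796 };

// Hypothetical helper: share of `part` in `whole`, as a percentage
function percentage(part, whole) {
  if (whole === 0) return 0; // avoid dividing by zero on an empty stream
  return (part * 100) / whole;
}

var pct = percentage(counts.paranoid, counts.total);
console.log(pct.toFixed(2) + "%"); // 7.98%
```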

Let's generate a result object by language and see what we get

function getPerLanguageState(state, ev) {
  var language = getLanguageFromEvent(ev)
  var langState = state[language]
  if(!langState) {
    langState = { paranoid: 0, total: 0 }
    state[language] = langState
  }
  return langState
}

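function getLanguageFromEvent(ev) {
  // Events that carry the repo directly on the body
  if(ev.body.repo)
    return ev.body.repo.language
  // Events that wrap an original event in 'first'
  if(ev.body.first)
    return ev.body.first.body.repo.language
  // Otherwise falls through and returns undefined
}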

fromStreams([ "github", "paranoidpushes" ])
  .when({
   "$init": function() {
     return {}
   },
   "ParanoidPush": function(state, ev) {
     var langState = getPerLanguageState(state, ev)
     langState.paranoid++
     return state
   },
   "PushEvent": function(state, ev) {
     var langState = getPerLanguageState(state, ev)
     langState.total++
     return state
   }
  })

There is a lot to take in here, but we can see

Anyway, the results of this?

And scaled
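Scaling here presumably means turning each language's raw counters into a percentage so that languages with very different push volumes can be compared. A minimal sketch of that step, assuming the state has the shape produced by the projection above. The `scaleByLanguage` name and the sample numbers are invented for illustration and are not real results.

```javascript
// Hypothetical helper: convert per-language counters into percentages,
// sorted from most to least paranoid.
function scaleByLanguage(state) {
  return Object.keys(state)
    .filter(function (language) {
      return state[language].total > 0; // skip languages with no pushes
    })
    .map(function (language) {
      var s = state[language];
      return { language: language, percent: (s.paranoid * 100) / s.total };
    })
    .sort(function (a, b) {
      return b.percent - a.percent;
    });
}

// Made-up state with placeholder language names (NOT the real data)
var sampleState = {
  "LangA": { paranoid: 5, total: 100 },
  "LangB": { paranoid: 16, total: 100 },
  "LangC": { paranoid: 0, total: 0 }
};

var scaled = scaleByLanguage(sampleState);
scaled.forEach(function (row) {
  console.log(row.language + ": " + row.percent.toFixed(1) + "%");
});
```

Filtering out languages with a zero total avoids a division by zero. It is a design choice for this sketch rather than something the original results necessarily did.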

So the old school Java and C++ developers can't get enough of that push-based action, with nearly 16% of their pushes happening within a couple of minutes of their previous push.

Matlab too (presumably they're not doing CI to Heroku); maybe they're just worried about their university computer sessions crashing or something.

Not terribly interesting results, but a good example of when it makes sense to combine two streams from the EventStore together.

2015 © Rob Ashton. ALL Rights Reserved.