Evented Github Adventure - Sentiment Analysis of Github Commits

Published on 2013-5-15

Carrying on in the EventStore series...

Okay, back to more practical things now that we've covered how easy temporal queries are with the event store.

Ever wondered how happy developers working in different languages are? Well, let's find out.

First off, I downloaded lists of positive and negative sentiment words from the internet. Here are the references to the studies that provided these word lists:

   Minqing Hu and Bing Liu. "Mining and Summarizing Customer Reviews."
       Proceedings of the ACM SIGKDD International Conference on Knowledge
       Discovery and Data Mining (KDD-2004), Aug 22-25, 2004, Seattle,
       Washington, USA.
   Bing Liu, Minqing Hu and Junsheng Cheng. "Opinion Observer: Analyzing
       and Comparing Opinions on the Web." Proceedings of the 14th
       International World Wide Web conference (WWW-2005), May 10-14,
       2005, Chiba, Japan.

So, how do we use this? I pasted the word lists into a file in vim and ran a macro over them to convert them into two arrays like so:

var happyWords = [ "yay", "funsome", "winsome" ]
var sadWords = [ "boo", "crap", "lame" ]
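If you'd rather not reach for a vim macro, a few lines of Node.js can do the same conversion. This is a minimal sketch, not what I actually ran: `wordListToArray` is a hypothetical helper, and it assumes the lexicon files have one word per line with comment lines starting with `;` (check your copy of the files before relying on that).

```javascript
// Hypothetical helper: turn a plain-text word list (one word per line) into a
// JavaScript array declaration, as the vim macro did.
// Assumption: comment lines begin with ";" and blank lines should be skipped.
function wordListToArray(name, text) {
  var words = text
    .split(/\r?\n/)                                   // handle Unix and Windows line endings
    .map(function (line) { return line.trim() })
    .filter(function (line) { return line.length > 0 && line[0] !== ";" })
  // JSON.stringify quotes and escapes each word for us
  return "var " + name + " = " + JSON.stringify(words)
}

// Usage with an inline sample; in practice you would read the file with
// require("fs").readFileSync(path, "utf8")
var sample = "; opinion lexicon header\n\nyay\nfunsome\nwinsome\n"
console.log(wordListToArray("happyWords", sample))
// prints: var happyWords = ["yay","funsome","winsome"]
```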

There are actually about 5000 words in total, but essentially what I'm going to do is partition by language and keep a count of commits, happy-word matches and sad-word matches for each language.

Now, real sentiment analysis is a little more complicated than simply looking for words, but we'll be happy with this for now. Let's have a look at the projection:

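function collectHappinessIndexOfCommit(commit, state) {
   // Lower-case once so matching is case-insensitive (the word lists are lower case)
   var message = (commit.message || "").toLowerCase()
   for(var i = 0; i < happyWords.length; i++) {
       // Substring match: "lame" also matches "blame", and each word counts at most once per commit
       if(message.indexOf(happyWords[i]) >= 0)
          state.happycount++
   }
   for(var j = 0; j < sadWords.length; j++) {
       if(message.indexOf(sadWords[j]) >= 0)
          state.sadcount++
   }
   state.commits++
}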

fromStreams(['github-commits'])
  .partitionBy(function(ev) {
    // One partition (and so one state object) per repository language
    if(ev.body.repo)
      return ev.body.repo.language
  })
  .when({
    "$init": function() {
      return { 
         commits: 0, sadcount: 0, happycount: 0
      }
    },
    "Commit": function(state, ev) {
       collectHappinessIndexOfCommit(ev.body.commit, state)
       return state
    }
  })
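Because the projection only runs inside EventStore, it can be handy to try the counting logic locally first. Below is a rough, hypothetical stand-in for the `partitionBy`/`when` behaviour in plain Node.js. `runPartitionedProjection`, the event shape (`eventType`, `body`) and the sample commits are all assumptions for illustration, and this only approximates how the real projection runtime manages state.

```javascript
// A minimal, hypothetical stand-in for EventStore's partitionBy/when runtime,
// so the counting logic can be tried locally without a server.
// Assumption: each event looks like { eventType: "Commit", body: { repo, commit } }.
var happyWords = ["yay", "funsome", "winsome"]
var sadWords = ["boo", "crap", "lame"]

function collectHappinessIndexOfCommit(commit, state) {
  var message = (commit.message || "").toLowerCase()
  happyWords.forEach(function (w) { if (message.indexOf(w) >= 0) state.happycount++ })
  sadWords.forEach(function (w) { if (message.indexOf(w) >= 0) state.sadcount++ })
  state.commits++
}

function runPartitionedProjection(events, partitionBy, handlers) {
  var states = {}  // one state object per partition key
  events.forEach(function (ev) {
    var key = partitionBy(ev)
    var handler = handlers[ev.eventType]
    // Skip events with no partition key or no handler for their type
    if (key === undefined || key === null || !handler) return
    if (!(key in states)) states[key] = handlers.$init()
    var result = handler(states[key], ev)
    if (result !== undefined) states[key] = result  // handlers may mutate or return state
  })
  return states
}

// Made-up sample events, not real GitHub data
var events = [
  { eventType: "Commit", body: { repo: { language: "Delphi" }, commit: { message: "Yay, tests pass" } } },
  { eventType: "Commit", body: { repo: { language: "Puppet" }, commit: { message: "Remove crap config" } } },
  { eventType: "Commit", body: { repo: { language: "Puppet" }, commit: { message: "Boo, this is lame" } } },
  { eventType: "Commit", body: { commit: { message: "no repo info" } } }
]

var states = runPartitionedProjection(events, function (ev) {
  if (ev.body.repo) return ev.body.repo.language
}, {
  $init: function () { return { commits: 0, sadcount: 0, happycount: 0 } },
  Commit: function (state, ev) { collectHappinessIndexOfCommit(ev.body.commit, state); return state }
})

console.log(JSON.stringify(states))
// prints: {"Delphi":{"commits":1,"sadcount":0,"happycount":1},"Puppet":{"commits":2,"sadcount":3,"happycount":0}}
```

Events with no repo (and so no language) are simply skipped in this sketch; how EventStore itself treats an undefined partition key is worth checking against its documentation.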

I guess I'll define my "happiness index" per language as

var index = state.happycount / state.sadcount

Or something similar (that's not the point of this post; if you want to change it, modify the JS on this page). Let's have a look at the chart of happiness across languages:
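To turn the per-language states into something chartable, you need to compute the index for each partition and sort. Here's a sketch assuming a `states` object keyed by language (the shape each partition's state takes in the projection); `happinessIndex`, `rankLanguages` and the sample numbers are made up for illustration. Languages with no sad-word matches would divide by zero, so this version leaves them out.

```javascript
// Sketch: turn per-language projection states into a sorted happiness ranking.
// Assumption: states maps language -> { commits, happycount, sadcount }.
function happinessIndex(state) {
  // happycount / sadcount, returning null instead of dividing by zero
  return state.sadcount === 0 ? null : state.happycount / state.sadcount
}

function rankLanguages(states) {
  return Object.keys(states)
    .map(function (lang) { return { language: lang, index: happinessIndex(states[lang]) } })
    .filter(function (row) { return row.index !== null })  // drop languages with no sad matches
    .sort(function (a, b) { return b.index - a.index })    // happiest first
}

// Illustrative numbers only, not real results
var sampleStates = {
  LangA: { commits: 10, happycount: 6, sadcount: 3 },
  LangB: { commits: 10, happycount: 2, sadcount: 4 },
  LangC: { commits: 10, happycount: 1, sadcount: 0 }
}
console.log(JSON.stringify(rankLanguages(sampleStates)))
// prints: [{"language":"LangA","index":2},{"language":"LangB","index":0.5}]
```

Dropping zero-sadcount languages is one choice; adding one to both counts, `(happycount + 1) / (sadcount + 1)`, would keep them in the ranking instead.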

Wow, look at those guys writing Delphi! Presumably they've got the best work/life balance ever known, or they know something the rest of us don't. The folk doing Puppet? I guess when your job is automating the crap that nobody else wants to touch, you're going to be pretty miserable most of the time ;-)

Actually, most of the "old school" languages hang around to the right and the "new school" ones to the left. Is this an indication that unhappy people jump ship sooner than others?

Note: The differences are actually hilariously small, and although there is a huge amount of data, they are likely not statistically significant. This is just a bit of fun.

2020 © Rob Ashton. ALL Rights Reserved.