Open Thoughts

Google Summer of Code 2014

Posted by Cheng Soon Ong on June 3, 2014

GSoC 2014 is between 19 May and 18 August this year. The students should now be just sinking their teeth into the code, and hopefully having a lot of fun while gaining invaluable experience. This amazing program is in its 10th year now, and it is worth repeating how it benefits everyone:

  • students - You learn how to write code in a team, and work on projects that are long term. Suddenly, all the software engineering lectures make sense! Having GSoC in your CV really differentiates you from all the other job candidates out there. Best of all, you actually have something to show your future employer that cannot be made up.

  • mentors - You get help for your favourite feature in a project that you care about. For many, it is a good introduction to project management and supervision.

  • organisation - You recruit new users and, if you are lucky, new core contributors. GSoC experience also tends to push projects to be more beginner friendly, and to make it easier for new developers to get involved.

I was curious about how many machine learning projects were in GSoC this year and wrote a small ipython notebook to try to find out.

Looking at the organisations with the most students, I noticed that the Technical University Vienna has come together and joined as a mentoring organisation. This is an interesting development, as it allows different smaller projects (the titles seem disparate) to come together and benefit from a more sustainable open source project.

On to machine learning... Using a bunch of heuristics, I tried to identify machine learning projects from the organisation name and project titles. I found more than 20 projects with variations of "learn" in them. This obviously misses out projects from R some of which are clearly machine learning related, but I could not find a rule to capture them. I am pretty sure I am missing others too. I played around with some topic modelling, but this is hampered by the fact that I could not figure out a way to scrape the project descriptions from the dynamically generated list of project titles on the GSoC page.

Please update the source with your suggestions!


No one has posted any comments yet. Perhaps you'd like to be the first?

Leave a comment

You must be logged in to post comments.