A problem with reproducible research

Posted by Cheng Soon Ong on May 3, 2011

One weird side effect of open source software and reproducible research is that they make it much more challenging to set meaningful computational exercises for teaching.

I'm organising a course this semester that looks at various applications of matrix factorization. The students solve various matrix problems throughout the semester, and apply them to problems such as compression, collaborative filtering, role-based access control and inpainting. The various solutions to the applications are ranked, and students are graded based on their rank in class for this part of the course. At the end of the semester, there is a small project where the students have to do something novel, and write up a short paper about it. We thought about trying to encourage open source submissions to the exercises and projects, but quickly realized that it would raise the bar for the students.

If all students submitted open solutions to their exercises, then it would quickly become a plagiarism checking nightmare for the teaching assistants, since students submitting later would be able to copy earlier solutions. However, requiring each exercise submission to be different from previous ones is also somewhat unfair, as it quickly becomes quite difficult to find new ways to solve an exercise. Just to put things in perspective, exercises are simple things like using singular value decomposition to perform image compression. On the other hand, making solutions public has all the benefits that we know and love from open source software. More importantly, in a classroom environment we want to encourage the students to learn from each other's solutions and to discuss problems amongst themselves.
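To give a sense of how small such an exercise is, here is a minimal sketch of image compression by truncated singular value decomposition. It is not the course's actual exercise or any student's solution: the function name `compress_svd`, the chosen rank and the random array standing in for a grayscale image are all assumptions made for illustration.

```python
import numpy as np


def compress_svd(image, rank):
    """Return a rank-`rank` approximation of a 2-D grayscale image.

    `compress_svd` is a hypothetical helper written for this sketch.
    """
    # Thin SVD: image = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(image, full_matrices=False)
    # Keep only the leading `rank` singular values and vectors.
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]


# Random data standing in for a real grayscale image (an assumption).
rng = np.random.default_rng(0)
image = rng.random((64, 48))

approx = compress_svd(image, rank=10)

# Storing the truncated factors takes rank * (rows + cols + 1) numbers
# instead of rows * cols for the full image.
rows, cols = image.shape
stored_numbers = 10 * (rows + cols + 1)
full_numbers = rows * cols
```

With so little room for variation, it is easy to see why two honest submissions could end up looking almost identical.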

Fine, we thought: "we can make the solutions open after the exercise deadline". This somewhat defeats the idea of encouraging students to discuss and solve problems together. Since the lectures have moved on to different material by then, the students are less motivated to work on a previous exercise. More subtly, it would make the final project much more challenging. If everything were secret, then all the students would have to do for the final project is whip together some "baseline" methods using their exercise submissions, and develop a "novel" method that beats their baseline. Given the short six-week time frame for the project, we do not expect significant novelty, just something that was not presented in the lectures. However, if all student exercise solutions were open, the level of novelty required would quickly rise, as the students would now have a baseline of all submitted exercise solutions.

Even if we could figure out a way to time it such that the solutions could not be copied by other submissions, there is still an effect on the following year's course. Since the previous year's solutions would all be available, the new batch of students would need to be "different" from all previous iterations of the course. Of course, some "leaks" happen already, since students get solutions from their seniors, and there are already plenty of publicly available open source solutions out there.

In essence, what we need are courses that are unique each year (in each university), and still have "easy" enough exercises.

I'm ashamed to admit that in the end, in the face of these challenges, we decided to keep all submissions secret, and did not push the open source idea for this course.
