Open Thoughts

June 2008 archive

Interlinking with the R Machine Learning Community

June 24, 2008

When it comes to scientific computing, one of the best organized and most experienced open source communities is the R community. Already a long time ago, they managed to develop a free alternative to S, and nowadays the R community offers a wide variety of well categorized packages. We are proud to announce that, with the help of Torsten Hothorn, Kurt Hornik and Achim Zeileis, we are now automagically listing packages from the CRAN machine learning section.

10 from 133

June 17, 2008

There was a paper at the beginning of this year by Budden et al. (2008a) who looked at double blind review, and claimed that double blind review increases the proportion of accepted papers with female first authors. Soon after, Webb et al. responded that the trend actually holds for other (non double blind) journals too. Recently, Budden et al. (2008b) reanalysed the data and rebutted the rebuttal.

The blog article at Sandwalk looks at this issue in more detail.

But here at mloss, we have no review process (yet), and there is no bias against women. Or is there? Out of the 133 author names listed, my guess is that 10 belong to women. Mind you, my ability to judge whether a name belongs to a man or a woman is not 100% accurate, but I think the estimate is pretty good. Where are all the women who write mloss? The fact remains that less than 10% of the authors of projects that appear on mloss are women. What do you think? Why is this the case?


  • Budden, A., Tregenza, T., Aarssen, L., Koricheva, J., Leimu, R. and Lortie, C. (2008a) Women, Science and Writing. Trends in Ecology and Evolution, 23(1), 4-6.
  • Budden, A.E., Lortie, C.J., Tregenza, T., Aarssen, L., Koricheva, J. and Leimu, R. (2008b) Response to Webb et al.: Double-blind review: accept with minor revisions. Trends in Ecology and Evolution.
  • Webb, T.J., O'Hara, B. and Freckleton, R.P. (2008) Does double-blind review benefit female authors? Trends in Ecology and Evolution.

Data repositories

June 12, 2008

I read an interesting blog post at Science in the Open about the problems the author has with institutional repositories.

"But the key thing is that all of this should be done automatically and must not require intervention by the author. Nothing drives me up the wall more than having to put the same set of data into two subtly different systems more than once."

I think this is not limited to institutional repositories; it is true for repositories in general. While web forms are nice, it is extremely irritating for a researcher to upload data and fill out the same information by hand more than once. The question is: how do we automate the distribution of data and metadata once it has been entered manually somewhere?
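To make the idea concrete: if repositories exposed deposit APIs, one could enter the metadata once and script the distribution. This is only a sketch of the workflow under that assumption; the endpoint URLs, field names, and function names below are entirely hypothetical, not a real protocol.

```python
import json
import urllib.request

# Entered once, by hand -- everything else should then be automatic.
metadata = {
    "title": "My shiny dataset",
    "author": "A. Researcher",
    "license": "CC-BY",
    "data_url": "http://example.org/data.tar.gz",
}

# Hypothetical deposit endpoints of different repositories.
endpoints = [
    "http://repo-a.example.org/api/deposit",
    "http://repo-b.example.org/api/deposit",
]

def build_deposit(metadata):
    """Serialise the metadata record once, in a machine-readable form."""
    return json.dumps(metadata, sort_keys=True).encode("utf-8")

def announce_everywhere(metadata, endpoints):
    """POST the same record to every repository (no retries, no auth)."""
    payload = build_deposit(metadata)
    for url in endpoints:
        req = urllib.request.Request(
            url, data=payload,
            headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req)
```

The point is the shape of the solution, not the details: the record is built exactly once, and every repository receives the same bytes.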

Maybe this is all a pipe dream, but would it not be possible to have some way of reconstructing metadata from the way the data is used and accessed (say, based on web links)? Of course, if we are trawling the web and slurping up data, how do we know what is open access and what is not? One of the comments on the blog post above mentioned RoMEO, which is a list of open access journals. Would this also work for open data? From the same people (EPrints, which incidentally powers the PASCAL network's eprints), we get some examples of how other repositories can be built, for example data repositories.

NIPS*08 Deadline Fever

June 7, 2008

Does this picture look familiar to you?

[Image: NIPS server load]

Well, it is over now, but why do people always wait until the last minute (and I am obviously no exception here)? Which reminds me: for the planned NIPS workshop, wouldn't it be a good idea to use this site as the submission system? That is, instead of receiving emails from people, ask contributors to announce their project here, including a reasonable description and setting a nips08 tag. And hey, who knows whether the site is capable of dealing with that load :-)

Style checking in python

June 5, 2008

Python is an interpreted language, and hence some bugs only get caught at run time. I recently had a discussion about how irritating it is that programs crash due to errors which could easily be caught at "compile time". My view is that compilation should be transparent to the programmer, and one should still be able to catch all these silly errors while coding. Of course, there is already a lot of work in programming languages on this. Paradoxically, most of the concepts were developed for compiled languages.
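As a concrete illustration of the kind of bug I mean (a made-up example with a deliberate typo), the following function is accepted happily by Python, and the error only surfaces when the broken branch actually executes:

```python
def report(values):
    if not values:
        # deliberate typo: 'lne' instead of 'len' -- Python accepts this
        # definition without complaint; the bug only surfaces when this
        # branch is actually executed
        print("no values (count = %d)" % lne(values))
    else:
        print("mean = %f" % (sum(values) / len(values)))

report([1, 2, 3])    # works: the buggy branch never runs

try:
    report([])       # now the branch runs and the typo blows up
except NameError as e:
    print("caught at run time:", e)
```

A statically checked language would reject `lne` before the program ever ran; in Python it can lurk in a rarely taken branch until production.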

From wikipedia:

"In computer programming, lint was the name originally given to a particular program that flagged suspicious and non-portable constructs (likely to be bugs) in C language source code. The term is now applied generically to tools that flag suspicious usage in software written in any computer language. The term lint-like behavior is sometimes applied to the process of flagging suspicious language usage. lint-like tools generally perform static analysis of source code."
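As a toy sketch of what such static analysis can look like in Python itself (a deliberately crude illustration, not how real lint tools work), the standard `ast` module lets you flag names that are read but never defined, without ever executing the code:

```python
import ast
import builtins

def find_suspicious_names(source):
    """Crude lint-like check: report names that are loaded but never
    assigned, imported, or defined anywhere in the module. Real tools
    also track scoping and definition order; this sketch ignores both."""
    tree = ast.parse(source)
    defined = set(dir(builtins))
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            else:
                used.add(node.id)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                defined.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defined.add(node.name)
            if isinstance(node, ast.FunctionDef):
                for arg in node.args.args:
                    defined.add(arg.arg)
    return sorted(used - defined)

print(find_suspicious_names("x = 1\nprint(x + y)\n"))  # ['y']
```

The checker never runs the analysed code; it only inspects the parse tree, which is exactly the "static analysis" the quote above refers to.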

For Python, I found three projects which seemed well supported:

Does anyone know of other style checkers? Are there any user experiences out there?