<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>The mloss.org community blog</title><link>http://mloss.org/community</link><description>Some thoughts about machine learning open source software</description><language>en</language><lastBuildDate>Wed, 03 Sep 2008 17:25:43 -0000</lastBuildDate><item><title>New JMLR-MLOSS publication and progress updates for September 2008</title><link>http://mloss.org/community/blog/2008/sep/03/new-jmlr-mloss-publication-and-progress-updates-fo/</link><description>&lt;p&gt;Again almost two months have passed since the last progress report. Well as Cheng already &lt;a href="/community/blog/2008/aug/14/walking-the-walk/"&gt;posted&lt;/a&gt;, we finally took the time and made a slightly polished version of the &lt;a href="http://mloss.org"&gt;mloss.org&lt;/a&gt; source code &lt;a href="/software/view/132/"&gt;available&lt;/a&gt;. 
&lt;/p&gt;
&lt;p&gt;And the usual statistics follow: &lt;a href="http://mloss.org"&gt;mloss.org&lt;/a&gt; now has 235 registered users and 129 software projects.
&lt;/p&gt;
&lt;p&gt;Finally, the mloss project &lt;a href="http://jmlr.csail.mit.edu/papers/v9/fan08a.html" title="Liblinear"&gt;liblinear&lt;/a&gt; - a library to train linear SVMs in very little time - got accepted in JMLR, and we again &lt;a href="/software/view/61/"&gt;highlight&lt;/a&gt; the software, interlinking it with the JMLR publication.
&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Soeren Sonnenburg</dc:creator><pubDate>Wed, 03 Sep 2008 17:25:43 -0000</pubDate><guid>http://mloss.org/community/blog/2008/sep/03/new-jmlr-mloss-publication-and-progress-updates-fo/</guid></item><item><title>Software Freedom Law Center on GPL compliance</title><link>http://mloss.org/community/blog/2008/aug/22/software-freedom-law-center-on-gpl-compliance/</link><description>&lt;p&gt;The &lt;a href="http://www.softwarefreedom.org"&gt;Software Freedom Law Center&lt;/a&gt; has posted a &lt;a href="http://www.softwarefreedom.org/resources/2008/compliance-guide.html"&gt;guide&lt;/a&gt; on how to ensure that you do not violate the GNU General Public License when using GPL'd software in your project. Ars Technica also has a few &lt;a href="http://arstechnica.com/news.ars/post/20080822-new-guide-from-slfc-not-violating-the-gpl-for-dummies.html"&gt;comments&lt;/a&gt;.
&lt;/p&gt;
&lt;p&gt;The guide might also come in very handy if your legal department is eager to learn more about the implications of using open source software.
&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Mikio Braun</dc:creator><pubDate>Fri, 22 Aug 2008 16:47:14 -0000</pubDate><guid>http://mloss.org/community/blog/2008/aug/22/software-freedom-law-center-on-gpl-compliance/</guid></item><item><title>Wuala, social online storage</title><link>http://mloss.org/community/blog/2008/aug/15/wuala-social-online-storage/</link><description>&lt;p&gt;There was a small party last night to celebrate the beta launch of &lt;a href="http://wua.la"&gt;Wuala&lt;/a&gt;, the latest in a long line of &lt;a href="http://en.wikipedia.org/wiki/Online_storage"&gt;online storage&lt;/a&gt; services. The idea of online storage is compelling: no need to synchronise all your different computers, somebody else takes care of your backup, and it is easy to share data with others. However, the reality of the situation is that there is no free lunch, and for most people, the cost of online storage is prohibitive. There are several free services (for example the list &lt;a href="http://websearch.about.com/od/web20/a/online-storage.htm"&gt;here&lt;/a&gt;), but in general, you cannot just upload everything to the cloud and throw away your hard drive.
&lt;/p&gt;
&lt;p&gt;Wuala lets you store anything -- photos, videos, your latest paper -- for free, with no bandwidth or file size limits. What's the catch? You have to contribute storage, megabyte for megabyte, to the service. You get 1GB free to start with, but for any extra space that you need, you have to plug in your own hard drive and offer it for them to add to the cloud. So, basically you convert your hard drive from a private one-person device to a shared device with bits of data from everyone. Like &lt;a href="http://en.wikipedia.org/wiki/Google_File_System"&gt;GFS&lt;/a&gt;, it creates redundant copies of data and distributes them on commodity hardware, and in the case of Wuala, the commodity hardware is your hard drive and the data bus is the internet. When a user transfers data to and from Wuala, they push and pull P2P style from all the different hard drives of its members.
&lt;/p&gt;
&lt;p&gt;There are two ways to access Wuala: via a web browser and via an application that runs on your computer. The Linux version of the application effectively requires root access to your box, since it calls for an fstab entry. For Linux users in academic environments with centralized admins, this makes life difficult. The web browser interface uses &lt;a href="http://www.java.com/en/"&gt;Java&lt;/a&gt;. Their website was a bit slow this morning when I tried it, so be patient with them.
&lt;/p&gt;
&lt;p&gt;Personally, for storage and backup, I think there are better ways to do it (e.g. buying an external hard drive, cloning my current laptop drive and leaving the external disk with a good friend that I meet regularly). However, if you are sharing data among collaborators, this seems like a wonderful thing to have. Each member of the team contributes some amount of disk space and bandwidth, and Voilà!
&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Cheng Soon Ong</dc:creator><pubDate>Fri, 15 Aug 2008 10:09:59 -0000</pubDate><guid>http://mloss.org/community/blog/2008/aug/15/wuala-social-online-storage/</guid></item><item><title>Walking the walk</title><link>http://mloss.org/community/blog/2008/aug/14/walking-the-walk/</link><description>&lt;p&gt;We have made the source of mloss.org available at:
   &lt;a href="http://mloss.org/software/view/132/"&gt;http://mloss.org/software/view/132/&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This site is based on &lt;a href="http://www.djangoproject.com"&gt;Django&lt;/a&gt;, and we have borrowed several components from other open source projects. We hope that by making the source of this site open, we can benefit other communities who also want to build a similar type of site. If you do build a site which lists open source software, and you have some projects which could be of interest to the machine learning community, please let us know. We would love to be able to regularly (automatically) update our site from external sources like what we are currently doing with CRAN (see the earlier &lt;a href="http://mloss.org/community/blog/2008/jun/24/interlinking-with-the-r-machine-learning-community/"&gt;blog&lt;/a&gt;).
&lt;/p&gt;
&lt;p&gt;Also, some personal communication from a disgruntled new user convinced us that we should have our &lt;a href="http://mloss.org/community/forum/"&gt;forum&lt;/a&gt; more clearly located. So, now we have added a new tab to our navigation bar. Hopefully we will have a more lively forum now that it is not "hidden away".
&lt;/p&gt;
&lt;p&gt;Finally, one plea to those budding Python programmers out there who believe in the cause: &lt;strong&gt;please join the team&lt;/strong&gt;.
&lt;/p&gt;
&lt;p&gt;To those wondering where the headline comes from: &lt;a href="http://www.wsu.edu/~brians/errors/walk.html"&gt;http://www.wsu.edu/~brians/errors/walk.html&lt;/a&gt;
&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Cheng Soon Ong</dc:creator><pubDate>Thu, 14 Aug 2008 14:26:57 -0000</pubDate><guid>http://mloss.org/community/blog/2008/aug/14/walking-the-walk/</guid></item><item><title>Interoperability and the Curse of Polyglotism</title><link>http://mloss.org/community/blog/2008/aug/12/interoperability-and-the-curse-of-polyglotism/</link><description>&lt;p&gt;It seems that this homepage is steadily growing. We already have a
   large number of registered projects covering many different
   applications and machine learning methods. Time to think where we're
   heading with all of this.
&lt;/p&gt;
&lt;p&gt;I think one of the first goals of this whole endeavor is that you can easily find software for
   methods published elsewhere. Irrespective of whether you're interested
   in comparing your own method against some method, or if you actually
   want to apply the method to some real data, being able to find and
   download the software is a huge improvement with respect to having to
   re-implement the method based on the paper.
&lt;/p&gt;
&lt;p&gt;However, I think that ultimately it would be great if some form of
   interoperability evolved between different software packages which address the
   same problem. This holds in particular in a field like machine
   learning, where the number of (abstract) problems is relatively small
   and many competing methods exist for a given problem (like, for
   example, two-class classification on vectorial data). Being able
   to replace one of these methods easily with another one would be very
   useful.
&lt;/p&gt;
&lt;p&gt;The way to achieve this is, as everywhere else in the industry, to
   develop standards. Actually, there are many different levels at which such
   standards could be defined, ranging from web-services, over binary
   APIs to data file formats.
&lt;/p&gt;
&lt;p&gt;&lt;a href="http://mloss.org/community/blog/2008/may/24/some-thoughts-on-machine-learning-toolboxes/"&gt;A few weeks ago&lt;/a&gt;, I advocated the use of modern scripting languages
   like Python or Ruby to develop new machine learning toolboxes, but
   with respect to interoperability, this "polyglotism" actually creates
   some new problems. Back in the "old days", when people were mostly
   using compiled languages, making your software usable for others was a
   matter of creating a library which could then be linked against new
   programs. Differences in calling conventions aside, this approach was
   relatively flexible: for example, you could use a Fortran library in C
   or a C library in C++.
&lt;/p&gt;
&lt;p&gt;But if you write a library in a scripting language like Python, you can use that
   library only in Python. You cannot link your C file against the Python
   module, or import the module in another language like Ruby. If you
   want to re-use a Python library in another language, you have to
   invest in some more infrastructure.
&lt;/p&gt;
&lt;p&gt;The hard way would be to set up a language-agnostic interface to your
   Python code, for example by creating a web-service or by using a
   protocol like CORBA.
&lt;/p&gt;
&lt;p&gt;The low cost version would be to settle on a common data format. Then,
   you can in principle combine methods from different environments by
   storing intermediate results in files. It won't be fast, but it will
   work.
&lt;/p&gt;
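&lt;p&gt;As a rough illustration of this file-based approach, here is a minimal Python sketch in which one "tool" writes its intermediate results to a file and another reads them back. It uses a stripped-down, numeric-only subset of the ARFF layout used by Weka. The function names (write_arff, read_arff, roundtrip_demo) are made up for this example and are not part of any existing library, and a real ARFF reader would also have to handle nominal attributes, quoting and sparse data.
&lt;/p&gt;
&lt;pre&gt;
```python
import os
import tempfile


def write_arff(path, relation, attributes, rows):
    """Write rows of numbers to a minimal ARFF-style file.

    Simplified sketch: NUMERIC attributes only, no quoting, no sparse data.
    """
    with open(path, "w") as handle:
        handle.write("@RELATION %s\n\n" % relation)
        for name in attributes:
            handle.write("@ATTRIBUTE %s NUMERIC\n" % name)
        handle.write("\n@DATA\n")
        for row in rows:
            handle.write(",".join(repr(float(v)) for v in row) + "\n")


def read_arff(path):
    """Read back a file produced by write_arff."""
    attributes, rows, in_data = [], [], False
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("%"):
                continue  # skip blank lines and ARFF comments
            if in_data:
                rows.append([float(v) for v in line.split(",")])
            elif line.upper().startswith("@ATTRIBUTE"):
                attributes.append(line.split()[1])
            elif line.upper().startswith("@DATA"):
                in_data = True
    return attributes, rows


def roundtrip_demo():
    """Hand intermediate results from a producer to a consumer via a file."""
    features = [[0.5, 1.0], [2.0, -3.25]]
    path = os.path.join(tempfile.mkdtemp(), "intermediate.arff")
    write_arff(path, "demo", ["x1", "x2"], features)  # producer side
    return read_arff(path)  # consumer side (could be another language)
```
&lt;/pre&gt;
&lt;p&gt;In practice the consumer side would be written in whatever language the second tool uses; the only thing the two sides need to share is the file layout.
&lt;/p&gt;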
&lt;p&gt;To support this approach, we started a &lt;a href="http://mloss.org/community/standards/13/"&gt;discussion&lt;/a&gt; some
   time ago, in which we settled on the &lt;a href="http://weka.sourceforge.net/wekadoc/index.php/en:ARFF_%283.5.6%29"&gt;ARFF&lt;/a&gt; format as a possible
   starting point. Furthermore, we have &lt;a href="https://ml01.zrz.tu-berlin.de/trac/dataformat"&gt;started&lt;/a&gt; to write and/or compile
   code for reading and writing ARFF files for a large number of
   programming languages, so that you do not have to implement the file
   format yourself.
&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Mikio Braun</dc:creator><pubDate>Tue, 12 Aug 2008 16:01:41 -0000</pubDate><guid>http://mloss.org/community/blog/2008/aug/12/interoperability-and-the-curse-of-polyglotism/</guid></item><item><title>Django 1.0</title><link>http://mloss.org/community/blog/2008/aug/06/django-10/</link><description>&lt;p&gt;The framework that mloss.org is based on, &lt;a href="http://www.djangoproject.com"&gt;django&lt;/a&gt;, is now approaching &lt;a href="http://www.djangoproject.com/weblog/2008/jul/29/updates/"&gt;version 1.0&lt;/a&gt;. So far, we have been using the SVN version of django.
&lt;/p&gt;
&lt;p&gt;So, of course we are planning to move to django version 1.0 when it becomes available, and depending on how much time we have, maybe even track the betas. To all those silent users out there, please let us know if you find anything strange or wrong with mloss.org.
&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Cheng Soon Ong</dc:creator><pubDate>Wed, 06 Aug 2008 10:57:00 -0000</pubDate><guid>http://mloss.org/community/blog/2008/aug/06/django-10/</guid></item><item><title>Final Call for Comments: MLOSS NIPS*08 Workshop </title><link>http://mloss.org/community/blog/2008/jul/11/final-call-for-comments-mloss-nips08-workshop/</link><description>&lt;p&gt;This is a final call for comments regarding our &lt;a href="/workshop/nips08/"&gt;NIPS'08 MLOSS Workshop&lt;/a&gt; proposal, which we will be sending to the NIPS workshop organizers next Thursday (July 17).
&lt;/p&gt;
&lt;p&gt;As mentioned before, we managed to secure a number of high profile invited speakers, such as John W. Eaton, the author of &lt;a href="http://www.octave.org"&gt;octave&lt;/a&gt;, and John D. Hunter, the author of &lt;a href="http://matplotlib.sf.net"&gt;matplotlib&lt;/a&gt;.
&lt;/p&gt;
&lt;p&gt;Apart from this, we decided to hold one discussion in the morning and one in the afternoon, on the following topics:
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What is a good mloss project?&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
 &lt;li&gt;
     Review criteria for JMLR mloss
 &lt;/li&gt;

 &lt;li&gt;
     Interoperable software
 &lt;/li&gt;

 &lt;li&gt;
     Test suites
 &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Reproducible research&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
 &lt;li&gt;
     data exchange standards
 &lt;/li&gt;

 &lt;li&gt;
     Should datasets be open too? How should access to datasets be provided?
 &lt;/li&gt;

 &lt;li&gt;
     Reproducible research, the next level after UCI datasets
 &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Finally, we invite authors of mloss software to present their projects. This time submission will be done in a radically new way. To submit:
&lt;/p&gt;
&lt;ul&gt;
 &lt;li&gt;
     Tag your mloss.org project with the tag nips2008
 &lt;/li&gt;

 &lt;li&gt;
     Ensure that you have a good description (limited to 500 words)
 &lt;/li&gt;

 &lt;li&gt;
     Any bells and whistles can be put on your own project page, and of course provide this link on mloss.org
 &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We very much invite feedback and are looking for active co-organizers too!
&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Soeren Sonnenburg</dc:creator><pubDate>Fri, 11 Jul 2008 15:25:38 -0000</pubDate><guid>http://mloss.org/community/blog/2008/jul/11/final-call-for-comments-mloss-nips08-workshop/</guid></item><item><title>New JMLR-MLOSS publication and progress updates for July 2008</title><link>http://mloss.org/community/blog/2008/jul/07/new-jmlr-mloss-publication-and-progress-updates-fo/</link><description>&lt;p&gt;Almost two months have passed since the last progress report. The biggest news is the recent &lt;a href="/community/blog/2008/jun/24/interlinking-with-the-r-machine-learning-community/"&gt;pulling of R machine learning packages&lt;/a&gt;. This led to 35 additional projects, and we are now at 120 projects and 224 registered users at &lt;a href="http://mloss.org"&gt;mloss.org&lt;/a&gt;.
&lt;/p&gt;
&lt;p&gt;We also made a lot of progress regarding the upcoming &lt;a href="/workshop/nips08/"&gt;NIPS'08 MLOSS Workshop&lt;/a&gt; proposal and managed to secure a number of high profile invited speakers, such as John W. Eaton, the author of &lt;a href="http://www.octave.org"&gt;octave&lt;/a&gt;, and John D. Hunter, the author of &lt;a href="http://matplotlib.sf.net"&gt;matplotlib&lt;/a&gt;, as well as the program committee. If you have suggestions, let us know! Otherwise, we will submit the proposal in the next few weeks. Although we planned to have mloss.org t-shirts at &lt;a href="http://icml2008.cs.helsinki.fi/"&gt;ICML'08&lt;/a&gt;, it remained unclear whether a table would be reserved for us to distribute them. We therefore decided to postpone the t-shirts once more, to &lt;a href="http://nips.cc"&gt;NIPS'08&lt;/a&gt;. After all, it makes a lot of sense to distribute them there in case the workshop gets accepted :-).
&lt;/p&gt;
&lt;p&gt;Finally, the &lt;a href="http://jmlr.csail.mit.edu/papers/v9/igel08a.html" title="SHARK"&gt;SHARK C++ Machine Learning Library&lt;/a&gt; got accepted in JMLR. We again &lt;a href="/software/view/70/"&gt;highlight&lt;/a&gt; the software, interlinking it with the JMLR publication. Note that SHARK, in contrast to &lt;a href="/software/view/65/"&gt;LWPR&lt;/a&gt;, is the first full-fledged toolbox - implementing more than just a single algorithm - that got accepted.
&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Soeren Sonnenburg</dc:creator><pubDate>Mon, 07 Jul 2008 05:59:21 -0000</pubDate><guid>http://mloss.org/community/blog/2008/jul/07/new-jmlr-mloss-publication-and-progress-updates-fo/</guid></item><item><title>Interlinking with the R Machine Learning Community</title><link>http://mloss.org/community/blog/2008/jun/24/interlinking-with-the-r-machine-learning-community/</link><description>&lt;p&gt;When it comes to scientific computing, one of the best organized and most experienced open source communities is the &lt;a href="http://www.r-project.org" title="R"&gt;R&lt;/a&gt; community. A long time ago, they already managed to develop a free alternative to S. Nowadays the R community offers a wide variety of well categorized packages. We are proud to announce that, with the help of Torsten Hothorn, Kurt Hornik and Achim Zeileis, we are now automagically listing packages from the &lt;a href="http://cran.r-project.org/web/views/MachineLearning.html" title="R-cran"&gt;R-cran&lt;/a&gt; machine learning section.
&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Soeren Sonnenburg</dc:creator><pubDate>Tue, 24 Jun 2008 23:30:48 -0000</pubDate><guid>http://mloss.org/community/blog/2008/jun/24/interlinking-with-the-r-machine-learning-community/</guid></item><item><title>10 from 133</title><link>http://mloss.org/community/blog/2008/jun/17/10-from-133/</link><description>&lt;p&gt;There was a paper at the beginning of this year by &lt;a href="http://www.sciencedirect.com/science?_ob=ArticleURL&amp;amp;_udi=B6VJ1-4R05HXW-2&amp;amp;_user=10&amp;amp;_rdoc=1&amp;amp;_fmt=&amp;amp;_orig=search&amp;amp;_sort=d&amp;amp;view=c&amp;amp;_acct=C000050221&amp;amp;_version=1&amp;amp;_urlVersion=0&amp;amp;_userid=10&amp;amp;md5=815de80415439b9d78428fe08565fcde"&gt;Budden et al. A&lt;/a&gt;
   who looked at double blind reviews, and claimed that double blind review increases the proportion of accepted papers with female first authors. Soon after, &lt;a href="http://www.sciencedirect.com/science?_ob=ArticleURL&amp;amp;_udi=B6VJ1-4SD36MP-2&amp;amp;_user=10&amp;amp;_rdoc=1&amp;amp;_fmt=&amp;amp;_orig=search&amp;amp;_sort=d&amp;amp;view=c&amp;amp;_acct=C000050221&amp;amp;_version=1&amp;amp;_urlVersion=0&amp;amp;_userid=10&amp;amp;md5=2cf38de9bc7f55b6c8dac983f5e9839b"&gt;Webb et al.&lt;/a&gt;
   responded that actually the trend is true for other (non double blind) journals too. Recently, &lt;a href="http://www.sciencedirect.com/science?_ob=ArticleURL&amp;amp;_udi=B6VJ1-4SD36MP-1&amp;amp;_user=10&amp;amp;_rdoc=1&amp;amp;_fmt=&amp;amp;_orig=search&amp;amp;_sort=d&amp;amp;view=c&amp;amp;_acct=C000050221&amp;amp;_version=1&amp;amp;_urlVersion=0&amp;amp;_userid=10&amp;amp;md5=472015bc151115ceb81decea90f4455f"&gt;Budden et al. B&lt;/a&gt;
   reanalysed the data, and rebutted the rebuttal.
&lt;/p&gt;
&lt;p&gt;The blog article at &lt;a href="http://sandwalk.blogspot.com/2008/06/bias-against-female-first-author-papers.html"&gt;Sandwalk&lt;/a&gt; looks at this issue in more detail.
&lt;/p&gt;
&lt;p&gt;But, here at mloss, we have no review process (yet), and there is no bias against women. Or is there? 
   Out of the 133 author names listed on &lt;a href="http://mloss.org/software/author/"&gt;mloss.org&lt;/a&gt; my guess is there are 10 women. Mind you, my ability to judge whether a name is from a guy or a girl is not 100% correct, but I think the estimate is pretty good.
   Where are all the women who write mloss? The fact remains that less than 10% of the authors of projects that appear on mloss are women. What do you think? Why is this the case?
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;References&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
 &lt;li&gt;
     Budden, A., Tregenza, T., Aarssen, L., Koricheva, J., Leimu, R. and Lortie, C. (2008A) Women, Science and Writing. Trends in Ecology &amp;amp; Evolution, 23(1), 4-6.
 &lt;/li&gt;

 &lt;li&gt;
     Budden, A.E., Lortie, C.J., Tregenza, T., Aarssen, L., Koricheva, J., and Leimu, R. (2008B) Response to Webb et al.: Double-blind review: accept with minor revisions. Trends in Ecology &amp;amp; Evolution.
 &lt;/li&gt;

 &lt;li&gt;
     Webb, T. J., O'Hara, B. and Freckleton, R. P. (2008) Does double-blind review benefit female authors? Trends in Ecology &amp;amp; Evolution.
 &lt;/li&gt;
&lt;/ul&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Cheng Soon Ong</dc:creator><pubDate>Tue, 17 Jun 2008 11:04:55 -0000</pubDate><guid>http://mloss.org/community/blog/2008/jun/17/10-from-133/</guid></item></channel></rss>