Open Thoughts

What is a file?

Posted by Cheng Soon Ong on December 5, 2011

Two bits of news appeared recently:

  • The distribution of file sizes on the internet suggests that the human brain limits the amount of data we produce. The article observes, however, that "it'll be interesting to see how machine intelligence might change this equation. It may be that machines can be designed to distort our relationship with information. If so, then a careful measure of file size distribution could reveal the first signs that intelligent machines are among us!"

  • Paul Allen's institute has been publishing its data in an open fashion. Ironically, the article reporting this is behind a paywall, although the Allen Institute for Brain Science itself has a data portal.

I wondered about the distribution of data that is clearly machine generated and, in some sense, most easily digested by machines as well. It turns out to be quite difficult to find out how big the files are. For the brain atlas, the amount of data (>1 petabyte of image data alone) is more than can easily be transferred across the internet. Most human users of this data would use some sort of web-based visualization, and hence the meaning of the word "file" isn't so obvious. In fact, there has been a recent trend to "hide" the concept of a file. One example is iPhones and iPads, where you do not have access to the file system and hence do not really know whether you are transferring parts of a file or streaming bytes. Another example is Google's AppEngine, where users access data through a database. A third example is Amazon's Silk browser, which "renders" a web page more efficiently by using Amazon's infrastructure rather than your local client.

If we take the extreme view that some sort of machine learning algorithm filters the world's data for our consumption, then all the world's data is effectively in one "file", and we are just looking at parts of it. From this point of view, the idea of using file sizes to reveal machine intelligence is not going to work. In fact, thinking about file sizes in the first place is just plain misleading.

