Open Thoughts


Predict Elections with Twitter

October 12, 2012

Under the rather self-deprecating title "I Wanted to Predict Elections with Twitter and all I got was this Lousy Paper", Daniel Gayo-Avello takes us on a tour of how hard it is to do reproducible research, and how often authors take shortcuts. From the abstract:

"Predicting X from Twitter is a popular fad within the Twitter research subculture. It seems both appealing and relatively easy. Among such kind of studies, electoral prediction is maybe the most attractive, and at this moment there is a growing body of literature on such a topic. This is not only an interesting research problem but, above all, it is extremely difficult. However, most of the authors seem to be more interested in claiming positive results than in providing sound and reproducible methods."

The paper is an interesting survey of work that uses Twitter data to predict elections.

He lists some flaws in current research on electoral predictions, but they are generally applicable to any machine learning paper (my comments in brackets):

  1. It's not prediction at all! I have not found a single paper predicting a future result. (Neither bootstrap nor cross-validation is prediction.)
  2. Chance is not a valid baseline...
  3. There is not a commonly accepted way of "counting votes" in Twitter
  4. There is not a commonly accepted way of interpreting reality! (In supervised learning, we tend to ignore the fact that there is no ground truth in reality.)
  5. Sentiment analysis is applied as a black-box... (As machine learning algorithms get more complex, more people will tend to use machine learning software as a black box.)
  6. All the tweets are assumed to be trustworthy. (I don't know if anybody is doing adversarial election prediction)
  7. Demographics are neglected. (The biased sample problem)
  8. Self-selection bias.
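
To make my comment on the first point concrete, here is a minimal sketch of a time-based holdout, as opposed to cross-validation, which shuffles past and future together. It is my own illustration, not a method from the paper. The function name `temporal_split`, the `(day, text, label)` record layout, and the records themselves are all made-up placeholders.

```python
from datetime import date

def temporal_split(records, cutoff):
    """Split (day, text, label) records into those strictly before `cutoff`
    and those on or after it, so that a model is fit only on the past."""
    train = [r for r in records if r[0] < cutoff]
    test = [r for r in records if r[0] >= cutoff]
    return train, test

# Hypothetical placeholder records, not real tweets or real election data.
records = [
    (date(2012, 9, 1), "placeholder tweet 1", 0),
    (date(2012, 9, 15), "placeholder tweet 2", 1),
    (date(2012, 10, 5), "placeholder tweet 3", 1),
    (date(2012, 10, 20), "placeholder tweet 4", 0),
]

train, test = temporal_split(records, cutoff=date(2012, 10, 1))

# Every training record predates every test record.
print(len(train), len(test))
```

Even this only simulates prediction after the fact. A real prediction would have to be published before the result is known.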

The window is closing on those who want to predict the upcoming US elections from X.