sparkcrowd 0.1.5

Enrique G. Rodrigo, Juan A. Aledo, Jose A. Gamez — Wed, 13 Dec 2017 13:13:35 -0000

Using crowdsourcing to label data for machine learning introduces several complications: annotators may misunderstand the problem, lack the required expertise, answer at random, or even try to degrade the results deliberately. To learn from these labels in Big Data contexts, practitioners need to account for the quality of the annotators labelling the data, as this is crucial when annotations are scarce. This package implements several methods for dealing with these situations using Apache Spark, to ease the transition to large-scale problems.
