Joint Generative-Discriminative Aggregation Model for Multi-Option Crowd Labels

Abstract

Although several crowdsourcing aggregation models have been introduced to aggregate noisy crowd labels, these models mostly take single-option (i.e., discrete) crowd labels as input variables and are not compatible with multi-option (i.e., non-deterministic) crowd data. In this paper, we propose a novel joint generative-discriminative aggregation model that efficiently handles both single-option and multi-option crowd labels. Taking the confidence of workers for each option as the input data, we first introduce a new discriminative aggregation model, called Constrained Weighted Majority Voting (CWMVL1), which improves on the performance of the majority voting method. CWMVL1 considers flexible reliability parameters for crowd workers, employs an L1-norm loss function to deal with noisy crowd data, and includes optimization constraints so that its outputs are probabilistic. We prove that our objective is convex and derive an efficient optimization algorithm. Moreover, we integrate the discriminative CWMVL1 model with a generative model, resulting in a powerful joint aggregation model. The combination of these sub-models is obtained in a probabilistic framework rather than heuristically. For our joint model, we derive an efficient optimization algorithm that alternates between updating the parameters and estimating the potential true labels. Experimental results indicate that the proposed aggregation models achieve superior or competitive results compared with state-of-the-art models on single-option and multi-option crowd datasets, while having faster convergence rates and more reliable predictions.
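
To make the input format concrete, the following minimal sketch shows plain weighted voting over multi-option crowd labels, where each worker supplies a confidence value per option. It is not the CWMVL1 algorithm from the paper: the function name `weighted_vote`, the example confidences, and the hand-picked reliability weights are all assumptions for illustration. CWMVL1 learns its reliability parameters by minimizing a constrained L1-norm objective, which is not reproduced here.

```python
def weighted_vote(confidences, weights):
    """Aggregate per-option worker confidences into one probability vector.

    confidences[w][k]: worker w's confidence for option k (hypothetical data).
    weights[w]: non-negative reliability weight for worker w. These are set
    by hand here; the paper's model learns them, which this sketch does not do.
    """
    num_options = len(confidences[0])
    scores = [0.0] * num_options
    for row, weight in zip(confidences, weights):
        for k, confidence in enumerate(row):
            scores[k] += weight * confidence
    total = sum(scores)
    # Normalize so the aggregated output is a probability distribution.
    return [s / total for s in scores]


# Three workers labeling one item with three options.
# Worker 0 gives a single-option label; workers 1 and 2 spread their confidence.
confidences = [
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.2, 0.8],
]

uniform = weighted_vote(confidences, [1.0, 1.0, 1.0])   # soft majority voting
weighted = weighted_vote(confidences, [0.2, 0.2, 2.0])  # worker 2 trusted most

predicted_uniform = max(range(len(uniform)), key=uniform.__getitem__)
predicted_weighted = max(range(len(weighted)), key=weighted.__getitem__)
```

With uniform weights the aggregate favors option 0, while trusting worker 2 more shifts the prediction to option 2. This shows why the choice of reliability weights matters for the aggregated label.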

Publication
ACM International Conference on Web Search and Data Mining (WSDM)
Date