A game theory approach for estimating reliability of crowdsourced relevance assessments

Moshfeghi, Yashar and Huertas-Rosero, Alvaro Francisco (2022) A game theory approach for estimating reliability of crowdsourced relevance assessments. ACM Transactions on Information Systems, 40 (3). pp. 1-29. ISSN 1046-8188 (https://doi.org/10.1145/3480965)

Accepted Author Manuscript
License: Strathprints license 1.0


In this article, we propose an approach to improve quality in crowdsourcing (CS) tasks using Task Completion Time (TCT) as a source of information about the reliability of workers in a game-theoretical competitive scenario. Our approach is based on the hypothesis that some workers are more risk-inclined and tend to gamble with their use of time when made to compete with other workers. This hypothesis is supported by our previous simulation study. We test our approach with 35 topics from experiments on the TREC-8 collection, assessed as relevant or non-relevant by crowdsourced workers in both a competitive (referred to as "Game") and a non-competitive (referred to as "Base") scenario. We find that competition changes the distributions of TCT, making them sensitive to the quality (i.e., wrong or right) and outcome (i.e., relevant or non-relevant) of the assessments. We also test an optimal function of TCT as weights in a weighted majority voting scheme. From probabilistic considerations, we derive a theoretical upper bound for the weighted majority performance of cohorts of 2, 3, 4, and 5 workers, which we use as a criterion to evaluate the performance of our weighting scheme. We find that our approach performs well, significantly closing the gap between the accuracy of the obtained relevance judgements and the upper bound. Since our approach takes advantage of TCT, which is available in any CS task, we believe it is cost-effective and can therefore be applied for quality assurance in crowdsourcing for micro-tasks.
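
To make the weighted majority voting idea concrete, the following is a minimal sketch in Python of how TCT-derived weights could be combined with binary relevance labels from a small cohort of workers. It is an illustration only: the function names, the tie-breaking rule, and in particular the placeholder weight function `math.log1p` are assumptions made for this example and are not the optimal function of TCT derived in the article.

```python
import math
from typing import Callable, Sequence, Tuple

# One assessment: (label, tct_seconds), where label True means "relevant".
Assessment = Tuple[bool, float]


def weighted_majority_vote(
    assessments: Sequence[Assessment],
    weight_fn: Callable[[float], float],
) -> bool:
    """Aggregate binary relevance labels using weights computed from TCT.

    Each worker contributes +w for "relevant" and -w for "non-relevant",
    where w = weight_fn(tct). A total of exactly zero is resolved as
    non-relevant (an arbitrary tie-breaking choice made for this sketch).
    """
    score = 0.0
    for label, tct in assessments:
        w = weight_fn(tct)
        score += w if label else -w
    return score > 0


# Hypothetical cohort of three workers: two fast "relevant" votes and
# one slow "non-relevant" vote.
cohort = [(True, 5.0), (False, 60.0), (True, 4.0)]

# Plain (unweighted) majority voting: every worker has weight 1.
plain = weighted_majority_vote(cohort, lambda tct: 1.0)

# Placeholder weighting (assumption): slower assessments count for more.
weighted = weighted_majority_vote(cohort, math.log1p)
```

With this hypothetical cohort, plain majority voting returns relevant, while the placeholder weighting lets the single slow assessment outweigh the two fast ones. Whether longer or shorter completion times should earn more weight is precisely what a fitted function of TCT would have to determine from data, so `weight_fn` is left as a parameter.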