Human-supervised clustering of multidimensional data using crowdsourcing.

Keywords: crowdsourcing; data clustering; games; human-computing

Journal

Royal Society open science
ISSN: 2054-5703
Abbreviated title: R Soc Open Sci
Country: England
NLM ID: 101647528

Publication information

Publication date:
May 2022
History:
received: 15 07 2021
accepted: 29 04 2022
entrez: 27 5 2022
pubmed: 28 5 2022
medline: 28 5 2022
Status: epublish

Abstract

Clustering is a central task in many data analysis applications. However, there is no universally accepted metric for deciding whether clusters are present. Ultimately, we have to resort to a consensus among experts. The problem is amplified with high-dimensional datasets, where classical distances become uninformative and humans' ability to fully apprehend the distribution of the data is challenged. In this paper, we design a mobile human-computing game as a tool to query human perception for the multidimensional data clustering problem. We propose two clustering algorithms that rely partially or entirely on aggregated human answers and report the results of two experiments conducted on synthetic and real-world datasets. We show that our methods perform on par with or better than the most popular automated clustering algorithms. Our results suggest that hybrid systems leveraging annotations of partial datasets collected through crowdsourcing platforms can be an efficient strategy to capture collective wisdom for solving abstract computational problems.
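The abstract does not say how human answers are aggregated. As a hypothetical illustration only, and not the authors' algorithm, one common way to combine partial groupings from many players is a co-association (consensus) matrix: for each pair of points, count the fraction of answers that showed both points and put them in the same group, then merge pairs whose score exceeds a threshold. The function names `co_association` and `consensus_clusters`, the answer format, and the 0.5 threshold are all assumptions made for this sketch.

```python
from itertools import combinations


def co_association(n_points, annotations):
    """Fraction of answers that put each pair of points in the same group.

    Assumed format: `annotations` is a list of player answers; each answer is
    a list of groups, and each group is a list of point indices. An answer may
    cover only a subset of the points (a partial dataset).
    """
    together = [[0] * n_points for _ in range(n_points)]
    seen = [[0] * n_points for _ in range(n_points)]
    for answer in annotations:
        # Map every point shown in this answer to the index of its group.
        label = {p: g for g, group in enumerate(answer) for p in group}
        for i, j in combinations(sorted(label), 2):
            seen[i][j] += 1
            seen[j][i] += 1
            if label[i] == label[j]:
                together[i][j] += 1
                together[j][i] += 1
    # Pairs never shown together in any answer get a score of 0.
    return [
        [together[i][j] / seen[i][j] if seen[i][j] else 0.0 for j in range(n_points)]
        for i in range(n_points)
    ]


def consensus_clusters(n_points, annotations, threshold=0.5):
    """Merge points whose agreement score is strictly above `threshold`."""
    scores = co_association(n_points, annotations)
    parent = list(range(n_points))  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n_points), 2):
        if scores[i][j] > threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for p in range(n_points):
        clusters.setdefault(find(p), []).append(p)
    return sorted(clusters.values())


# Three hypothetical players, each grouping a different subset of six points.
answers = [
    [[0, 1, 2], [3, 4]],
    [[1, 2], [3, 4, 5]],
    [[0, 2], [4, 5], [1]],
]
print(consensus_clusters(6, answers))  # [[0, 1, 2], [3, 4, 5]]
```

Points 0 and 1 agree in only half of the answers that showed them together, so they are not merged directly, but both are strongly linked to point 2 and end up in the same cluster through it. A real system would also need to weight players by reliability and choose the threshold from data, which this sketch does not attempt.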

Identifiers

pubmed: 35620007
doi: 10.1098/rsos.211189
pii: rsos211189
pmc: PMC9128850

Databanks

figshare
10.6084/m9.figshare.c.5994902
Dryad
10.5061/dryad.qv9s4mwh4

Publication types

Journal Article

Languages

eng

Pagination

211189

Copyright information

© 2022 The Authors.

References

Proc Natl Acad Sci U S A. 2013 Dec 24;110(52):20935-40
pubmed: 24277835
PLoS One. 2013 Jul 31;8(7):e69958
pubmed: 23936126
Proc Natl Acad Sci U S A. 2006 Jun 6;103(23):8577-82
pubmed: 16723398
Front Artif Intell. 2021 Sep 29;4:667963
pubmed: 34661095
Phys Rev E Stat Nonlin Soft Matter Phys. 2007 Feb;75(2 Pt 2):027105
pubmed: 17358454
Nat Biotechnol. 2018 Oct;36(9):820-828
pubmed: 30125267
PLoS One. 2019 Jan 15;14(1):e0210236
pubmed: 30645617
Phys Rev E Stat Nonlin Soft Matter Phys. 2009 Nov;80(5 Pt 2):056117
pubmed: 20365053
Nat Biotechnol. 2020 Oct;38(10):1124-1126
pubmed: 32973359
R Soc Open Sci. 2022 May 24;9(5):211189
pubmed: 35620007
Eur J Immunol. 2019 Oct;49(10):1457-1973
pubmed: 31633216
Perspect Psychol Sci. 2011 Jan;6(1):3-5
pubmed: 26162106
Nature. 2010 Aug 5;466(7307):756-60
pubmed: 20686574
PLoS One. 2012;7(3):e31362
pubmed: 22412834

Authors

Alexander Butyaev (A)

School of Computer Science, McGill University, Montréal, Canada.

Chrisostomos Drogaris (C)

School of Computer Science, McGill University, Montréal, Canada.

Olivier Tremblay-Savard (O)

Department of Computer Science, University of Manitoba, Winnipeg, Canada.

Jérôme Waldispühl (J)

School of Computer Science, McGill University, Montréal, Canada.

MeSH classifications