Assessing the difficulty of annotating medical data in crowdworking with help of experiments.
Journal
PloS one
ISSN: 1932-6203
Journal abbreviation: PLoS One
Country: United States
NLM ID: 101285081
Publication information
Publication date: 2021
History:
received: 2020-09-28
accepted: 2021-07-02
entrez: 2021-07-29
pubmed: 2021-07-30
medline: 2021-11-04
Status:
epublish
Abstract
BACKGROUND
As healthcare-related data proliferate, there is a need to annotate them expertly for the purposes of personalized medicine. Crowdworking is an alternative to expensive expert labour. Annotation corresponds to diagnosis, so comparing unlabeled records to labeled ones seems more appropriate for crowdworkers without medical expertise. We modeled the comparison of a record to two other records as a triplet annotation task, and we conducted an experiment to investigate to what extent sensor-measured stress, task duration, annotator-stated uncertainty and agreement among the annotators could predict annotation correctness.
MATERIALS AND METHODS
We conducted an annotation experiment on health data from a population-based study. The triplet annotation task was to decide whether an individual was more similar to a healthy individual or to one with a given disorder. We used hepatic steatosis as the example disorder and described the individuals with 10 pre-selected characteristics related to this disorder. We recorded task duration, electro-dermal activity as a stress indicator, and uncertainty as stated by the experiment participants (n = 29 non-experts and three experts) for 30 triplets. We built an Artificial Similarity-Based Annotator (ASBA) and compared its correctness and uncertainty to those of the experiment participants.
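For illustration, the sketch below shows one way such a similarity-based triplet annotator could be implemented: the target record is assigned the label of the nearer of the two reference records. The Euclidean distance, the z-scoring of the 10 characteristics, and the margin-based uncertainty score are assumptions made for this sketch, not details taken from the study; ASBA's actual design may differ.

import numpy as np

def annotate_triplet(target, healthy_ref, disorder_ref):
    """Label `target` with the class of the nearer reference and return an uncertainty score."""
    X = np.vstack([target, healthy_ref, disorder_ref]).astype(float)
    # z-score each characteristic within the triplet so no single scale dominates
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    t, h, d = X
    dist_healthy = np.linalg.norm(t - h)
    dist_disorder = np.linalg.norm(t - d)
    label = "healthy" if dist_healthy <= dist_disorder else "disorder"
    # Uncertainty approaches 1 when both references are equally far away
    # and 0 when one reference is clearly nearer.
    margin = abs(dist_healthy - dist_disorder)
    uncertainty = 1.0 - margin / (dist_healthy + dist_disorder + 1e-12)
    return label, uncertainty

# Example call with 10 made-up characteristic values per record
rng = np.random.default_rng(0)
target, healthy_ref, disorder_ref = rng.normal(size=(3, 10))
print(annotate_triplet(target, healthy_ref, disorder_ref))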
RESULTS
We found no correlation between correctness and any of stated uncertainty, stress or task duration. Annotator agreement was not predictive either. Notably, for some tasks, annotators agreed unanimously on an incorrect annotation. When controlling for Triplet ID, we identified significant correlations, indicating that correctness, stress levels and annotation duration depend on the task itself. Average correctness among the experiment participants was slightly lower than that achieved by ASBA. Triplet annotation turned out to be similarly difficult for experts and non-experts.
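As an illustration of what controlling for Triplet ID can look like in practice, the sketch below aggregates annotations per triplet and correlates triplet-level means. The data frame, its column names and the use of a Pearson correlation are hypothetical placeholders; the study's exact statistical procedure is not reproduced here.

import pandas as pd
from scipy.stats import pearsonr

# One row per (participant, triplet); all values below are made up for illustration.
annotations = pd.DataFrame({
    "triplet_id": [1, 1, 2, 2, 3, 3],
    "correct":    [1, 1, 0, 0, 1, 0],                # 1 = annotation matched the ground truth
    "stress":     [0.2, 0.3, 0.8, 0.7, 0.4, 0.5],    # e.g. normalized electro-dermal activity
    "duration_s": [12.0, 15.0, 30.0, 28.0, 18.0, 22.0],
})

# Averaging within each triplet removes between-participant variation and
# exposes how strongly the outcomes vary with the task itself.
per_triplet = annotations.groupby("triplet_id")[["correct", "stress", "duration_s"]].mean()
r, p = pearsonr(per_triplet["stress"], per_triplet["correct"])
print(f"triplet-level correlation of stress with correctness: r={r:.2f}, p={p:.3f}")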
CONCLUSION
Our lab experiment indicates that the task of triplet annotation must be prepared cautiously if delegated to crowdworkers. Neither certainty nor agreement among annotators should be assumed to imply correct annotation, because annotators may misjudge difficult tasks as easy and agree on incorrect annotations. Further research is needed to improve visualizations for complex tasks and to judiciously decide how much information to provide. Out-of-the-lab experiments in a crowdworker setting are needed to identify appropriate designs for human annotation tasks and to assess under what circumstances non-human annotation should be preferred.
Identifiers
pubmed: 34324540
doi: 10.1371/journal.pone.0254764
pii: PONE-D-20-29653
pmc: PMC8321104
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
e0254764
Conflict of interest statement
The authors have declared that no competing interests exist.