On the impact of Citizen Science-derived data quality on deep learning based classification in marine images.
Animals
Aquatic Organisms
Arthropods / anatomy & histology
Citizen Science / methods
Cnidaria / anatomy & histology
Deep Learning
Echinodermata / anatomy & histology
Humans
Image Processing, Computer-Assisted / statistics & numerical data
Imaging, Three-Dimensional
Marine Biology / instrumentation
Mollusca / anatomy & histology
Porifera / anatomy & histology
Journal
PLoS One
ISSN: 1932-6203
Abbreviated title: PLoS One
Country: United States
NLM ID: 101285081
Publication information
Publication date: 2019
History:
received: 2018-10-19
accepted: 2019-05-25
entrez: 2019-06-13
pubmed: 2019-06-13
medline: 2020-02-15
Status: epublish
Abstract
The evaluation of large amounts of digital image data is of growing importance for biology, including for the exploration and monitoring of marine habitats. However, only a tiny percentage of the collected image data is evaluated by marine biologists, who manually interpret and annotate the image contents, a slow and laborious process. To overcome this bottleneck in image annotation, two strategies are increasingly proposed: "citizen science" and "machine learning". In this study, we investigated how the combination of citizen science, to detect objects, and machine learning, to classify megafauna, could be used to automate the annotation of underwater images. For this purpose, multiple large data sets of citizen science annotations were simulated by modifying "gold standard" annotations made by an experienced marine biologist, introducing different degrees of the common errors and inaccuracies observed in citizen science data. The parameters of the simulation were determined on the basis of two citizen science experiments. The simulation allowed us to analyze the relationship between the outcome of a citizen science study and the quality of the classifications of a deep learning megafauna classifier. The results show great potential for combining citizen science with machine learning, provided that the participants are informed precisely about the annotation protocol. Inaccuracies in the position of the annotation had the most substantial influence on the classification accuracy, whereas the size of the marking and false positive detections had a smaller influence.
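The simulation idea described in the abstract (perturbing gold standard annotations in position and size, and adding false positive detections) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the study: the `Annotation` type, the function name `simulate_citizen_annotations`, the circular marking shape, the Gaussian noise model, and the parameters `pos_sigma`, `size_sigma` and `fp_rate` are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical circular marking: centre, radius (pixels) and class label."""
    x: float
    y: float
    radius: float
    label: str


def simulate_citizen_annotations(gold, image_size, pos_sigma, size_sigma,
                                 fp_rate, rng):
    """Return a noisy copy of `gold` imitating citizen science errors.

    Assumed noise model (not the study's):
      pos_sigma  - std. dev. of Gaussian position jitter, in pixels
      size_sigma - std. dev. of relative Gaussian error on the radius
      fp_rate    - false positives added per gold annotation
    """
    width, height = image_size
    noisy = []
    for a in gold:
        # Position error: jitter the centre, then clamp to the image bounds.
        x = min(max(a.x + rng.gauss(0.0, pos_sigma), 0.0), width)
        y = min(max(a.y + rng.gauss(0.0, pos_sigma), 0.0), height)
        # Size error: scale the radius, keeping it at least one pixel.
        radius = max(a.radius * (1.0 + rng.gauss(0.0, size_sigma)), 1.0)
        noisy.append(Annotation(x, y, radius, a.label))

    # False positives: markings at random positions with the mean gold radius.
    n_fp = round(fp_rate * len(gold))
    if n_fp > 0:
        mean_radius = sum(a.radius for a in gold) / len(gold)
        for _ in range(n_fp):
            noisy.append(Annotation(rng.uniform(0.0, width),
                                    rng.uniform(0.0, height),
                                    mean_radius, "false_positive"))
    return noisy


if __name__ == "__main__":
    gold = [Annotation(120.0, 80.0, 15.0, "sponge"),
            Annotation(300.0, 210.0, 22.0, "sea star")]
    noisy = simulate_citizen_annotations(gold, (640.0, 480.0), pos_sigma=5.0,
                                         size_sigma=0.2, fp_rate=0.5,
                                         rng=random.Random(42))
    for a in noisy:
        print(a)
```

Varying one parameter at a time while holding the others at zero would, under these assumptions, give data sets that isolate the effect of each error type on a downstream classifier.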
Identifiers
pubmed: 31188894
doi: 10.1371/journal.pone.0218086
pii: PONE-D-18-30321
pmc: PMC6561570
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
e0218086
Conflict of interest statement
The GPU donation from NVIDIA Corporation does not introduce a competing interest. This does not alter our adherence to PLOS ONE policies on sharing data and materials.