Generating segmentation masks of herbarium specimens and a data set for training segmentation models using deep learning.

Keywords: U-Net; deep learning; digitized herbarium specimens; ferns; machine learning; semantic segmentation

Journal

Applications in plant sciences
ISSN: 2168-0450
Abbreviated title: Appl Plant Sci
Country: United States
NLM ID: 101590473

Publication information

Publication date:
Jun 2020
History:
received: 03 10 2019
accepted: 03 02 2020
entrez: 7 7 2020
pubmed: 7 7 2020
medline: 7 7 2020
Status: epublish

Abstract

Digitized images of herbarium specimens are highly diverse with many potential sources of visual noise and bias. The systematic removal of noise and minimization of bias must be achieved in order to generate biological insights based on the plants rather than the digitization and mounting practices involved. Here, we develop a workflow and data set of high-resolution image masks to segment plant tissues in herbarium specimen images and remove background pixels using deep learning. We generated 400 curated, high-resolution masks of ferns using a combination of automatic and manual tools for image manipulation. We used those images to train a U-Net-style deep learning model for image segmentation, achieving a final Sørensen-Dice coefficient of 0.96. The resulting model can automatically, efficiently, and accurately segment massive data sets of digitized herbarium specimens, particularly for ferns. The application of deep learning in herbarium sciences requires transparent and systematic protocols for generating training data so that these labor-intensive resources can be generalized to other deep learning applications. Segmentation ground-truth masks are hard-won data, and we share these data and the model openly in the hopes of furthering model training and transfer learning opportunities for broader herbarium applications.
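The abstract reports model quality as a Sørensen-Dice coefficient and describes using masks to remove background pixels. As a minimal sketch of those two ideas (not the authors' code), the NumPy functions below compute the Dice coefficient, 2|A ∩ B| / (|A| + |B|), between a predicted and a ground-truth binary mask, and blank out the background of an image given a mask. The function names, the `fill` parameter, and the convention that two empty masks score 1.0 are assumptions made for illustration.

```python
import numpy as np


def dice_coefficient(pred, truth):
    """Sørensen-Dice coefficient between two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    intersection = np.logical_and(pred, truth).sum()
    return float(2.0 * intersection / total)


def remove_background(image, mask, fill=255):
    """Keep pixels where mask is true; set every other pixel to `fill`.

    `image` may be H x W (grayscale) or H x W x C (color); `mask` is H x W.
    """
    image = np.asarray(image)
    mask = np.asarray(mask, dtype=bool)
    out = np.full_like(image, fill)
    out[mask] = image[mask]
    return out


if __name__ == "__main__":
    # Toy 2 x 3 masks: 1 = plant tissue, 0 = background (hypothetical data).
    truth = [[1, 1, 0], [0, 1, 0]]
    pred = [[1, 0, 0], [0, 1, 1]]
    print(round(dice_coefficient(pred, truth), 4))  # 0.6667
```

A coefficient of 1.0 means the predicted and ground-truth masks overlap exactly, and 0.0 means they share no foreground pixels, so the reported 0.96 indicates close agreement with the curated masks.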

Identifiers

pubmed: 32626607
doi: 10.1002/aps3.11352
pii: APS311352
pmc: PMC7328659

Publication types

Journal Article

Languages

eng

Pagination

e11352

Copyright information

© 2020 The Authors. Applications in Plant Sciences is published by Wiley Periodicals, LLC on behalf of the Botanical Society of America.

Authors

Alexander E White (AE)

Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, D.C., USA.
Department of Botany, National Museum of Natural History, Smithsonian Institution, Washington, D.C., USA.

Rebecca B Dikow (RB)

Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, D.C., USA.

Makinnon Baugh (M)

Department of Plant and Wildlife Sciences, Brigham Young University, Provo, Utah, USA.

Abigail Jenkins (A)

Department of Plant and Wildlife Sciences, Brigham Young University, Provo, Utah, USA.

Paul B Frandsen (PB)

Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, D.C., USA.
Department of Plant and Wildlife Sciences, Brigham Young University, Provo, Utah, USA.

MeSH classifications