Dataset search in biodiversity research: Do metadata in data repositories reflect scholarly information needs?


Journal

PloS one
ISSN: 1932-6203
Abbreviated title: PLoS One
Country: United States
NLM ID: 101285081

Publication information

Publication date:
2021
History:
received: 2019-07-05
accepted: 2021-01-13
entrez: 2021-03-24
pubmed: 2021-03-25
medline: 2021-08-27
Status: epublish

Abstract

The increasing amount of publicly available research data makes it possible to link and integrate data in order to generate and test novel hypotheses, to repeat experiments, or to compare recent data with data collected at a different time or place. However, recent studies have shown that retrieving relevant data for reuse is a time-consuming task in daily research practice. In this study, we explore what hampers dataset retrieval in biodiversity research, a field that produces a large amount of heterogeneous data. In particular, we focus on scholarly search interests and on metadata, the primary source of data in a dataset retrieval system. We show that existing metadata poorly reflect information needs and are therefore the biggest obstacle to retrieving relevant data. Our findings indicate that, for data seekers in the biodiversity domain, the important information categories are environments, materials and chemicals, species, biological and chemical processes, locations, data parameters, and data types. These interests are well covered by the metadata elements of domain-specific standards. However, instead of using these standards, large data repositories tend to use metadata standards with domain-independent fields that cover search interests only to some extent. A second problem is the use of arbitrary keywords in descriptive fields such as title, description, or subject. Keywords support scholars in a full-text search only if the provided terms syntactically match the terms in a user query or if their semantic relationship to those terms is known.
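The last point of the abstract, that keywords only help a full-text search when the terms match the query syntactically or when their semantic relationship to the query terms is known, can be illustrated with a toy sketch. The dataset records, their keywords, and the `SYNONYMS` table below are hypothetical; the table stands in for a controlled vocabulary or ontology. This is not the retrieval system studied in the paper, only a minimal comparison of exact keyword matching with matching through a known term relationship.

```python
"""Toy illustration: exact keyword matching vs. matching with a known
semantic relationship (a hand-written, hypothetical synonym table)."""

# Hypothetical dataset records; titles and keywords are invented for illustration.
DATASETS = {
    "ds1": {"title": "Soil fauna survey", "keywords": {"collembola", "soil", "grassland"}},
    "ds2": {"title": "Springtail abundance", "keywords": {"springtails", "meadow"}},
}

# Hypothetical synonym table standing in for a vocabulary or ontology.
SYNONYMS = {
    "collembola": {"springtails"},
    "grassland": {"meadow"},
}


def expand(term: str) -> set[str]:
    """Return the term plus any terms known to be related to it."""
    related = set(SYNONYMS.get(term, set()))
    # Also follow the relation in reverse so the lookup is symmetric.
    related |= {key for key, values in SYNONYMS.items() if term in values}
    return {term} | related


def search(query: list[str], use_semantics: bool) -> list[str]:
    """Return ids of datasets whose keywords match every query term."""
    hits = []
    for ds_id, record in DATASETS.items():
        keywords = record["keywords"]
        # A term matches if it (or, optionally, a related term) is a keyword.
        if all((expand(t) if use_semantics else {t}) & keywords for t in query):
            hits.append(ds_id)
    return hits


if __name__ == "__main__":
    print(search(["collembola"], use_semantics=False))  # ['ds1']
    print(search(["collembola"], use_semantics=True))   # ['ds1', 'ds2']
```

With exact matching, a query for "collembola" misses the dataset annotated only with "springtails"; once the relationship between the two terms is known, both datasets are found.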

Identifiers

pubmed: 33760822
doi: 10.1371/journal.pone.0246099
pii: PONE-D-19-18843
pmc: PMC7990268

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

e0246099

Conflict of interest statement

The authors have declared that no competing interests exist.

Authors

Felicitas Löffler

Heinz Nixdorf Chair for Distributed Information Systems, Department of Mathematics and Computer Science, Friedrich Schiller University Jena, Jena, Germany.

Valentin Wesp

Heinz Nixdorf Chair for Distributed Information Systems, Department of Mathematics and Computer Science, Friedrich Schiller University Jena, Jena, Germany.

Birgitta König-Ries

Heinz Nixdorf Chair for Distributed Information Systems, Department of Mathematics and Computer Science, Friedrich Schiller University Jena, Jena, Germany.
Michael-Stifel-Center for Data-Driven and Simulation Science, Jena, Germany.
German Center for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Germany.

Friederike Klan

Michael-Stifel-Center for Data-Driven and Simulation Science, Jena, Germany.
Citizen Science Group, DLR-Institute of Data Science, German Aerospace Center, Jena, Germany.

Similar articles


Insect diversity estimation in polarimetric lidar.

Dolores Bernenko, Meng Li, Hampus Månefjord et al.
