Dataset search in biodiversity research: Do metadata in data repositories reflect scholarly information needs?
Journal
PloS one
ISSN: 1932-6203
Abbreviated title: PLoS One
Country: United States
NLM ID: 101285081
Publication information
Publication date: 2021
History:
received: 2019-07-05
accepted: 2021-01-13
entrez: 2021-03-24
pubmed: 2021-03-25
medline: 2021-08-27
Status:
epublish
Abstract
The increasing amount of publicly available research data provides the opportunity to link and integrate data in order to create and prove novel hypotheses, to repeat experiments or to compare recent data to data collected at a different time or place. However, recent studies have shown that retrieving relevant data for data reuse is a time-consuming task in daily research practice. In this study, we explore what hampers dataset retrieval in biodiversity research, a field that produces a large amount of heterogeneous data. In particular, we focus on scholarly search interests and metadata, the primary source of data in a dataset retrieval system. We show that existing metadata currently poorly reflect information needs and therefore are the biggest obstacle in retrieving relevant data. Our findings indicate that, for data seekers in the biodiversity domain, environments, materials and chemicals, species, biological and chemical processes, locations, data parameters and data types are important information categories. These interests are well covered in metadata elements of domain-specific standards. However, instead of utilizing these standards, large data repositories tend to use metadata standards with domain-independent metadata fields that cover search interests only to some extent. A second problem is the arbitrary keywords utilized in descriptive fields such as title, description or subject. Keywords support scholars in a full text search only if the provided terms syntactically match or their semantic relationship to terms used in a user query is known.
Identifiers
pubmed: 33760822
doi: 10.1371/journal.pone.0246099
pii: PONE-D-19-18843
pmc: PMC7990268
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
e0246099
Conflict of interest statement
The authors have declared that no competing interests exist.