From Planning Stage Towards FAIR Data: A Practical Metadatasheet For Biomedical Scientists.
Journal
Scientific data
ISSN: 2052-4463
Abbreviated title: Sci Data
Country: England
NLM ID: 101640192
Publication information
Publication date:
22 May 2024
History:
received: 7 December 2023
accepted: 8 May 2024
medline: 23 May 2024
pubmed: 23 May 2024
entrez: 22 May 2024
Status:
epublish
Abstract
Datasets consist of measurement data and metadata. Metadata provides context, essential for understanding and (re-)using data. Various metadata standards exist for different methods, systems and contexts. However, relevant information resides at differing stages across the data-lifecycle. Often, this information is defined and standardized only at publication stage, which can lead to data loss and workload increase. In this study, we developed Metadatasheet, a metadata standard based on interviews with members of two biomedical consortia and systematic screening of data repositories. It aligns with the data-lifecycle allowing synchronous metadata recording within Microsoft Excel, a widespread data recording software. Additionally, we provide an implementation, the Metadata Workbook, that offers user-friendly features like automation, dynamic adaption, metadata integrity checks, and export options for various metadata standards. By design and due to its extensive documentation, the proposed metadata standard simplifies recording and structuring of metadata for biomedical scientists, promoting practicality and convenience in data management. This framework can accelerate scientific progress by enhancing collaboration and knowledge transfer throughout the intermediate steps of data creation.
Identifiers
pubmed: 38778016
doi: 10.1038/s41597-024-03349-2
pii: 10.1038/s41597-024-03349-2
Publication types
Journal Article
Dataset
Languages
eng
Citation subsets
IM
Pagination
524
Grants
Agency: Deutsche Forschungsgemeinschaft (German Research Foundation)
ID: 390685813
Agency: Deutsche Forschungsgemeinschaft (German Research Foundation)
ID: 432325352
Agency: Deutsche Forschungsgemeinschaft (German Research Foundation)
ID: 450149205
Agency: Deutsche Forschungsgemeinschaft (German Research Foundation)
ID: 458597554
Agency: Deutsche Forschungsgemeinschaft (German Research Foundation)
ID: FE 1159/6-1
Agency: Deutsche Forschungsgemeinschaft (German Research Foundation)
ID: FE 1159/5-1
Agency: Deutsche Forschungsgemeinschaft (German Research Foundation)
ID: E 1159/2-1
Copyright information
© 2024. The Author(s).
References
Morillo, F., Bordons, M. & Gómez, I. Interdisciplinarity in science: A tentative typology of disciplines and research areas. Journal of the American Society for Information Science and Technology 54, 1237–1249, https://doi.org/10.1002/asi.10326 (2003).
doi: 10.1002/asi.10326
Cioffi, M., Goldman, J. & Marchese, S. Harvard biomedical research data lifecycle. Zenodo https://doi.org/10.5281/zenodo.8076168 (2023).
Habermann, T. Metadata life cycles, use cases and hierarchies. Geosciences 8, https://doi.org/10.3390/geosciences8050179 (2018).
Stevens, I. et al. Ten simple rules for annotating sequencing experiments. PLOS Computational Biology 16, 1–7, https://doi.org/10.1371/journal.pcbi.1008260 (2020).
doi: 10.1371/journal.pcbi.1008260
Shaw, F. et al. COPO: a metadata platform for brokering FAIR data in the life sciences. F1000Research 9, 495, https://doi.org/10.12688/f1000research.23889.1 (2020).
doi: 10.12688/f1000research.23889.1
Ulrich, H. et al. Understanding the nature of metadata: Systematic review. J Med Internet Res 24, e25440, https://doi.org/10.2196/25440 (2022).
doi: 10.2196/25440
pubmed: 35014967
pmcid: 8790684
Wilkinson, M. D. et al. Comment: The FAIR guiding principles for scientific data management and stewardship. Scientific Data 3, https://doi.org/10.1038/sdata.2016.18 (2016).
Wolstencroft, K. et al. Rightfield: Embedding ontology annotation in spreadsheets. Bioinformatics 27, 2021–2022, https://doi.org/10.1093/bioinformatics/btr312 (2011).
doi: 10.1093/bioinformatics/btr312
pubmed: 21622664
Leipzig, J., Nüst, D., Hoyt, C. T., Ram, K. & Greenberg, J. The role of metadata in reproducible computational research. Patterns 2, https://doi.org/10.1016/j.patter.2021.100322 (2021).
Researchspace. https://www.researchspace.com/ . Accessed: 12th March 2024 (2024).
Revvity signals notebook eln. https://revvitysignals.com/products/research/signals-notebook-eln . Accessed: 12th March 2024 (2024).
Kowalczyk, S. T. Before the repository: Defining the preservation threats to research data in the lab. In Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL '15, 215–222, https://doi.org/10.1145/2756406.2756909 (Association for Computing Machinery, New York, NY, USA, 2015).
Rocca-Serra, P. et al. ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level. Bioinformatics 26, 2354–2356, https://doi.org/10.1093/bioinformatics/btq415 (2010).
doi: 10.1093/bioinformatics/btq415
pubmed: 20679334
pmcid: 2935443
Lin, D. et al. The TRUST principles for digital repositories. Scientific Data 7, 144, https://doi.org/10.1038/s41597-020-0486-7 (2020).
doi: 10.1038/s41597-020-0486-7
pubmed: 32409645
pmcid: 7224370
Barrett, T. et al. NCBI GEO: archive for functional genomics data sets—update. Nucleic Acids Research 41, D991–D995, https://doi.org/10.1093/nar/gks1193 (2012).
doi: 10.1093/nar/gks1193
pubmed: 23193258
pmcid: 3531084
Vizcaíno, J. A. et al. 2016 update of the PRIDE database and its related tools. Nucleic Acids Research 44, D447–D456, https://doi.org/10.1093/nar/gkv1145 (2015).
doi: 10.1093/nar/gkv1145
Malik-Sheriff, R. S. et al. BioModels—15 years of sharing computational models in life science. Nucleic Acids Research 48, D407–D415, https://doi.org/10.1093/nar/gkz1055 (2019).
doi: 10.1093/nar/gkz1055
pmcid: 7145643
Glont, M. et al. BioModels: expanding horizons to include more modelling approaches and formats. Nucleic Acids Research 46, D1248–D1253, https://doi.org/10.1093/nar/gkx1023 (2017).
doi: 10.1093/nar/gkx1023
pmcid: 5753244
Consortium, T. G. O. et al. The Gene Ontology knowledgebase in 2023. Genetics 224, iyad031, https://doi.org/10.1093/genetics/iyad031 (2023).
doi: 10.1093/genetics/iyad031
Percie du Sert, N. et al. The ARRIVE guidelines 2.0: Updated guidelines for reporting animal research. PLOS Biology 18, 1–12, https://doi.org/10.1371/journal.pbio.3000410 (2020).
doi: 10.1371/journal.pbio.3000410
Novère, N. L. et al. Minimum information requested in the annotation of biochemical models (MIRIAM). Nature Biotechnology 23, 1509–1515, https://doi.org/10.1038/nbt1156 (2005).
doi: 10.1038/nbt1156
pubmed: 16333295
Gil Press. Cleaning big data: Most time-consuming, least enjoyable data science task, survey says. https://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/?sh=27709ef76f63 . Accessed: 2024-4-3 (2016).
Hughes, L. D. et al. Addressing barriers in FAIR data practices for biomedical data. Scientific Data 10, 98, https://doi.org/10.1038/s41597-023-01969-8 (2023).
doi: 10.1038/s41597-023-01969-8
pubmed: 36823198
pmcid: 9950056
The metabolomics workbench, https://www.metabolomicsworkbench.org/ .
EMBL. Ontology lookup service, https://www.ebi.ac.uk/ols4 .
Xiang, Z., Mungall, C. J., Ruttenberg, A. & He, Y. O. Ontobee: A linked data server and browser for ontology terms. In International Conference on Biomedical Ontology (2011).
Huber, W. et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nature Methods 12, 115–121, https://doi.org/10.1038/nmeth.3252 (2015).
doi: 10.1038/nmeth.3252
pubmed: 25633503
pmcid: 4509590
Gentleman, R. C. et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biology 5, R80, https://doi.org/10.1186/gb-2004-5-10-r80 (2004).
doi: 10.1186/gb-2004-5-10-r80
pubmed: 15461798
pmcid: 545600
Hunt, A. & Thomas, D. The pragmatic programmer: From journeyman to master. (Addison Wesley, Boston, MA, 1999).
Morgan, M., Obenchain, V., Hester, J. & Pages, H. Summarizedexperiment: Summarizedexperiment container. Bioconductor (2003).
Mass, E. et al. Developmental programming of kupffer cells by maternal obesity causes fatty liver disease in the offspring. Research Square Platform LLC https://doi.org/10.21203/rs.3.rs-3242837/v1 (2023).
Davis, S. & Meltzer, P. S. GEOquery: a bridge between the Gene Expression Omnibus (GEO) and BioConductor. Bioinformatics 23, 1846–1847, https://doi.org/10.1093/bioinformatics/btm254 (2007).
doi: 10.1093/bioinformatics/btm254
pubmed: 17496320
Zhu, Y., Davis, S., Stephens, R., Meltzer, P. S. & Chen, Y. GEOmetadb: powerful alternative search engine for the Gene Expression Omnibus. Bioinformatics 24, 2798–2800, https://doi.org/10.1093/bioinformatics/btn520 (2008).
doi: 10.1093/bioinformatics/btn520
pubmed: 18842599
pmcid: 2639278
National Center for Biotechnology Information (US). Entrez programming utilities help. Internet. Accessed on 02.04.2024 (2010).
SciBite, CENtree, https://scibite.com/platform/centree-ontology-management-platform/
Ravagli, C., Pognan, F. & Marc, P. Ontobrowser: a collaborative tool for curation of ontologies by subject matter experts. Bioinformatics 33, 148–149, https://doi.org/10.1093/bioinformatics/btw579 (2016).
doi: 10.1093/bioinformatics/btw579
pubmed: 27605099
pmcid: 5408772
Sasse, J., Darms, J. & Fluck, J. Semantic metadata annotation services in the biomedical domain—a literature review. Applied Sciences (Switzerland) 12, https://doi.org/10.3390/app12020796 (2022).
Tedersoo, L. et al. Data sharing practices and data availability upon request differ across scientific disciplines. Scientific Data 8, 192, https://doi.org/10.1038/s41597-021-00981-0 (2021).
doi: 10.1038/s41597-021-00981-0
pubmed: 34315906
pmcid: 8381906
Menzel, J. & Weil, P. Metadata capture in an electronic notebook: How to make it as simple as possible? Metadatenerfassung in einem elektronischen laborbuch: Wie macht man es so einfach wie möglich? GMS Medizinische Informatik, Biometrie Epidemiologie 5, 11, https://doi.org/10.3205/mibe000162 (2015).
Musen, M. A. The protégé project: A look back and a look forward. AI Matters 1, 4–12, https://doi.org/10.1145/2757001.2757003 (2015).
doi: 10.1145/2757001.2757003
pubmed: 27239556
pmcid: 4883684
Seep, L. METADATASHEET - Showcases, Zenodo, https://doi.org/10.5281/zenodo.10278069 (2023).