COPO - Managing sample metadata for biodiversity: considerations from the Darwin Tree of Life project.

Darwin Tree of Life; LIMS; biodiversity; data management; metadata; metadata standards; samples; sharing; standards; taxonomic domains

Journal

Wellcome open research
ISSN: 2398-502X
Abbreviated title: Wellcome Open Res
Country: England
NLM ID: 101696457

Publication information

Publication date:
2022
History:
accepted: 26 07 2023
medline: 2 8 2024
pubmed: 2 8 2024
entrez: 2 8 2024
Status: epublish

Abstract

Large-scale reference genome sequencing projects for all of biodiversity are underway, and common standards have been in place for some years to enable the understanding and sharing of sequence data. However, the metadata that describe the collection, processing and management of samples, and that link to the associated sequencing and genome data, are not yet adequately developed and standardised for these projects. At the time of writing, the Darwin Tree of Life (DToL) Project is over two years into its ten-year ambition to sequence all described eukaryotic species in Britain and Ireland. We have sought consensus from a wide range of scientists across taxonomic domains to determine the minimal set of metadata that we collectively deem critically important to accompany each sequenced specimen. These metadata are made available throughout the subsequent laboratory processes and, once collected, need to be adequately managed to fulfil the requirements of good data management practice. Because of the size and scale of the management required, software tools are needed. These tools need to implement rigorous development pathways and change management procedures to ensure that effective research data management of key project and sample metadata is maintained. Tracking of sample properties through the sequencing process is handled by Lab Information Management Systems (LIMS), so publication of the sequenced data is achieved via technical integration of LIMS and data management tools. Discussions with community members on how metadata standards need to be managed within large-scale programmes are a priority in the planning process. Here we report on the standards we developed with respect to a robust and reusable mechanism of metadata collection, in the hope that other projects, forthcoming or underway, will adopt these practices for metadata.

Identifiers

pubmed: 39091415
doi: 10.12688/wellcomeopenres.18499.2
pmc: PMC11292180

Publication types

Journal Article

Languages

eng

Pagination

279

Copyright information

Copyright: © 2023 Shaw F et al.

Conflict of interest statement

No competing interests were disclosed.

Authors

Felix Shaw (F)

Earlham Institute, Norwich, Norfolk, NR4 7UH, UK.

Alice Minotto (A)

Earlham Institute, Norwich, Norfolk, NR4 7UH, UK.

Seanna McTaggart (S)

Earlham Institute, Norwich, Norfolk, NR4 7UH, UK.

Aaliyah Providence (A)

Earlham Institute, Norwich, Norfolk, NR4 7UH, UK.

Peter Harrison (P)

EMBL European Bioinformatics Institute, Hinxton, Cambridgeshire, CB10 1SD, UK.

Joana Paupério (J)

EMBL European Bioinformatics Institute, Hinxton, Cambridgeshire, CB10 1SD, UK.

Jeena Rajan (J)

EMBL European Bioinformatics Institute, Hinxton, Cambridgeshire, CB10 1SD, UK.

Josephine Burgin (J)

EMBL European Bioinformatics Institute, Hinxton, Cambridgeshire, CB10 1SD, UK.

Guy Cochrane (G)

EMBL European Bioinformatics Institute, Hinxton, Cambridgeshire, CB10 1SD, UK.

Estelle Kilias (E)

Department of Zoology, University of Oxford, Oxford, Oxfordshire, OX1 2JD, UK.

Mara K N Lawniczak (MKN)

Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, CB10 1RQ, UK.

Robert Davey (R)

Earlham Institute, Norwich, Norfolk, NR4 7UH, UK.

MeSH classifications