Matrix and analysis metadata standards (MAMS) to facilitate harmonization and reproducibility of single-cell data.


Journal

bioRxiv : the preprint server for biology
Abbreviated title: bioRxiv
Country: United States
NLM ID: 101680187

Publication information

Publication date:
07 Mar 2023
History:
pubmed: 23 Mar 2023
medline: 23 Mar 2023
entrez: 22 Mar 2023
Status: epublish

Abstract

A large number of genomic and imaging datasets are being produced by consortia that seek to characterize healthy and diseased tissues at single-cell resolution. While much effort has been devoted to capturing information about biospecimens and experimental procedures, metadata standards that describe data matrices and the analysis workflows that produced them are relatively lacking. Detailed metadata schemas for data analysis are needed to facilitate sharing and interoperability across groups and to promote data provenance for reproducibility. To address this need, we developed the Matrix and Analysis Metadata Standards (MAMS) to serve as a resource for data coordinating centers and tool developers. We first curated several simple and complex "use cases" to characterize the types of feature-observation matrices (FOMs), annotations, and analysis metadata produced in different workflows. Based on these use cases, metadata fields were defined to describe the data contained within each matrix, including fields related to processing, modality, and subsets. Suggested terms were created for the majority of fields to aid in harmonization of metadata terms across groups. Additional provenance metadata fields were also defined to describe the software and workflows that produced each FOM. Finally, we developed a simple list-like schema that can be used to store MAMS information and can be implemented in multiple formats. Overall, MAMS can be used as a guide to harmonize analysis-related metadata, which will ultimately facilitate integration of datasets across tools and consortia. MAMS specifications, use cases, and examples can be found at https://github.com/single-cell-mams/mams/.
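
As a rough illustration of the "list-like schema" idea described in the abstract, the following Python sketch stores one metadata record per feature-observation matrix (FOM) in a plain list and round-trips it through JSON. Every field name here (`id`, `modality`, `processing`, `obs_subset`, `feature_subset`, `parent_id`, `provenance`) and every value is a hypothetical placeholder chosen for this example, not taken from the MAMS specification; the repository linked above is the reference for the actual fields and suggested terms.

```python
import json

# Hypothetical FOM records. Field names and values are illustrative
# assumptions only and are NOT the official MAMS field definitions.
foms = [
    {
        "id": "fom1",
        "modality": "rna",
        "processing": "counts",
        "obs_subset": "full",
        "feature_subset": "full",
        "parent_id": None,  # raw matrix: no upstream FOM
        "provenance": {"tool": "example-quantifier", "version": "0.0.0"},
    },
    {
        "id": "fom2",
        "modality": "rna",
        "processing": "lognormalized",
        "obs_subset": "filtered",
        "feature_subset": "variable",
        "parent_id": "fom1",  # derived from fom1
        "provenance": {"tool": "example-normalizer", "version": "0.0.0"},
    },
]


def lineage(fom_id, records):
    """Return the chain of FOM ids from the root matrix down to fom_id."""
    by_id = {record["id"]: record for record in records}
    chain = []
    current = fom_id
    while current is not None:
        chain.append(current)
        current = by_id[current]["parent_id"]
    return chain[::-1]


# A list of flat records serializes directly to JSON, one of several
# formats such a schema could be stored in.
serialized = json.dumps(foms, indent=2)
restored = json.loads(serialized)

print(lineage("fom2", restored))
```

Keeping each record flat and linking matrices through a parent identifier is one simple way to make provenance traceable; it is a design assumption of this sketch rather than a statement about how MAMS itself links matrices.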

Identifiers

pubmed: 36945543
doi: 10.1101/2023.03.06.531314
pmc: PMC10028847

Publication types

Preprint

Languages

eng

Conflict of interest statement

The authors declare that they have no competing interests.

Authors

Yichen Wang (Y)

Department of Medicine, Boston University School of Medicine, Boston, MA, USA.

Irzam Sarfraz (I)

Department of Medicine, Boston University School of Medicine, Boston, MA, USA.

Wei Kheng Teh (WK)

European Bioinformatics Institute, European Molecular Biology Laboratory, Hinxton, Cambridgeshire, UK.

Artem Sokolov (A)

Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, USA.

Brian R Herb (BR)

Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA.

Heather H Creasy (HH)

Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA.

Isaac Virshup (I)

Department of Computational Health, Helmholtz Munich, Oberschleißheim, Germany.

Ruben Dries (R)

Department of Medicine, Boston University School of Medicine, Boston, MA, USA.

Kylee Degatano (K)

Data Sciences Platform, Broad Institute, Cambridge, MA, USA.

Anup Mahurkar (A)

Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA.

Daniel J Schnell (DJ)

Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA.

Pedro Madrigal (P)

European Bioinformatics Institute, European Molecular Biology Laboratory, Hinxton, Cambridgeshire, UK.

Jason Hilton (J)

Department of Genetics, Stanford University, Stanford, CA, USA.

Nils Gehlenborg (N)

Biomedical Informatics, Harvard Medical School, Boston, MA, USA.

Timothy Tickle (T)

Data Sciences Platform, Broad Institute, Cambridge, MA, USA.

Joshua D Campbell (JD)

Department of Medicine, Boston University School of Medicine, Boston, MA, USA.
