MAMS: matrix and analysis metadata standards to facilitate harmonization and reproducibility of single-cell data.


Journal

Genome biology
ISSN: 1474-760X
Abbreviated title: Genome Biol
Country: England
NLM ID: 100960660

Publication information

Publication date:
01 Aug 2024
History:
received: 06 03 2023
accepted: 24 07 2024
medline: 2 8 2024
pubmed: 2 8 2024
entrez: 1 8 2024
Status: epublish

Abstract

Many datasets are being produced by consortia that seek to characterize healthy and diseased tissues at single-cell resolution. While biospecimen and experimental information is often captured, detailed metadata standards related to data matrices and analysis workflows are currently lacking. To address this, we develop the matrix and analysis metadata standards (MAMS) to serve as a resource for data centers, repositories, and tool developers. We define metadata fields for matrices and parameters commonly utilized in analytical workflows and develop the rmams package to extract MAMS from single-cell objects. Overall, MAMS promotes the harmonization, integration, and reproducibility of single-cell data across platforms.

Identifiers

pubmed: 39090672
doi: 10.1186/s13059-024-03349-w
pii: 10.1186/s13059-024-03349-w

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

205

Grants

Agency: Cancer Moonshot
ID: 1U2CCA233238-01
Agency: Wellcome Trust
ID: 108437/Z/15/Z
Country: United Kingdom
Agency: NHLBI NIH HHS
ID: U24HL148865
Country: United States

Copyright information

© 2024. The Author(s).

Authors

Irzam Sarfraz (I)

Department of Medicine, Boston University School of Medicine, Boston, MA, USA.

Yichen Wang (Y)

Department of Medicine, Boston University School of Medicine, Boston, MA, USA.

Amulya Shastry (A)

Department of Medicine, Boston University School of Medicine, Boston, MA, USA.

Wei Kheng Teh (WK)

European Bioinformatics Institute, European Molecular Biology Laboratory, Hinxton, Cambridgeshire, UK.

Artem Sokolov (A)

Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, USA.

Brian R Herb (BR)

Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA.

Heather H Creasy (HH)

Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA.

Isaac Virshup (I)

Department of Computational Health, Helmholtz Munich, Oberschleißheim, Germany.

Ruben Dries (R)

Department of Medicine, Boston University School of Medicine, Boston, MA, USA.

Kylee Degatano (K)

Data Sciences Platform, Broad Institute, Cambridge, MA, USA.

Anup Mahurkar (A)

Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA.

Daniel J Schnell (DJ)

Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA.

Pedro Madrigal (P)

European Bioinformatics Institute, European Molecular Biology Laboratory, Hinxton, Cambridgeshire, UK.

Jason Hilton (J)

Department of Genetics, Stanford University, Stanford, CA, USA.

Nils Gehlenborg (N)

Biomedical Informatics, Harvard Medical School, Boston, MA, USA.

Timothy Tickle (T)

Data Sciences Platform, Broad Institute, Cambridge, MA, USA.

Joshua D Campbell (JD)

Department of Medicine, Boston University School of Medicine, Boston, MA, USA. camp@bu.edu.
