MAMS: matrix and analysis metadata standards to facilitate harmonization and reproducibility of single-cell data.
Journal
Genome biology
ISSN: 1474-760X
Abbreviated title: Genome Biol
Country: England
NLM ID: 100960660
Publication information
Publication date:
01 Aug 2024
History:
received: 06 Mar 2023
accepted: 24 Jul 2024
medline: 02 Aug 2024
pubmed: 02 Aug 2024
entrez: 01 Aug 2024
Status:
epublish
Abstract
Many datasets are being produced by consortia that seek to characterize healthy and diseased tissues at single-cell resolution. While biospecimen and experimental information is often captured, detailed metadata standards for data matrices and analysis workflows are currently lacking. To address this, we develop the matrix and analysis metadata standards (MAMS) to serve as a resource for data centers, repositories, and tool developers. We define metadata fields for matrices and parameters commonly used in analytical workflows and develop the rmams package to extract MAMS from single-cell objects. Overall, MAMS promotes the harmonization, integration, and reproducibility of single-cell data across platforms.
Identifiers
pubmed: 39090672
doi: 10.1186/s13059-024-03349-w
pii: 10.1186/s13059-024-03349-w
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
205
Grants
Agency: Cancer Moonshot
ID: 1U2CCA233238-01
Agency: Wellcome Trust
ID: 108437/Z/15/Z
Country: United Kingdom
Agency: NHLBI NIH HHS
ID: U24HL148865
Country: United States
Copyright information
© 2024. The Author(s).