Toward universal cell embeddings: integrating single-cell RNA-seq datasets across species with SATURN.
Journal
Nature Methods
ISSN: 1548-7105
Abbreviated title: Nat Methods
Country: United States
NLM ID: 101215604
Publication information
Publication date: 16 Feb 2024
History:
received: 3 Feb 2023
accepted: 22 Jan 2024
medline: 17 Feb 2024
pubmed: 17 Feb 2024
entrez: 17 Feb 2024
Status: ahead of print
Abstract
Analysis of single-cell datasets generated from diverse organisms offers unprecedented opportunities to unravel fundamental evolutionary processes of conservation and diversification of cell types. However, interspecies genomic differences limit the joint analysis of cross-species datasets to homologous genes. Here we present SATURN, a deep learning method for learning universal cell embeddings that encodes genes' biological properties using protein language models. By coupling protein embeddings from language models with RNA expression, SATURN integrates datasets profiled from different species regardless of their genomic similarity. SATURN can detect functionally related genes coexpressed across species, redefining differential expression for cross-species analysis. Applying SATURN to whole-organism atlases from three species and to frog and zebrafish embryogenesis datasets, we show that SATURN can effectively transfer annotations across species, even when the species are evolutionarily remote. We also demonstrate that SATURN can be used to find potentially divergent gene functions between glaucoma-associated genes in humans and four other species.
Identifiers
pubmed: 38366243
doi: 10.1038/s41592-024-02191-z
pii: 10.1038/s41592-024-02191-z
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Grants
Agency: United States Department of Defense | Defense Advanced Research Projects Agency (DARPA)
ID: HR00112190039
Copyright information
© 2024. The Author(s).