Toward universal cell embeddings: integrating single-cell RNA-seq datasets across species with SATURN.


Journal

Nature Methods
ISSN: 1548-7105
Abbreviated title: Nat Methods
Country: United States
NLM ID: 101215604

Publication information

Publication date:
16 Feb 2024
History:
received: 03 Feb 2023
accepted: 22 Jan 2024
medline: 17 Feb 2024
pubmed: 17 Feb 2024
entrez: 17 Feb 2024
Status: aheadofprint

Abstract

Analysis of single-cell datasets generated from diverse organisms offers unprecedented opportunities to unravel fundamental evolutionary processes of conservation and diversification of cell types. However, interspecies genomic differences limit the joint analysis of cross-species datasets to homologous genes. Here we present SATURN, a deep learning method for learning universal cell embeddings that encodes genes' biological properties using protein language models. By coupling protein embeddings from language models with RNA expression, SATURN integrates datasets profiled from different species regardless of their genomic similarity. SATURN can detect functionally related genes coexpressed across species, redefining differential expression for cross-species analysis. Applying SATURN to three species whole-organism atlases and frog and zebrafish embryogenesis datasets, we show that SATURN can effectively transfer annotations across species, even when they are evolutionarily remote. We also demonstrate that SATURN can be used to find potentially divergent gene functions between glaucoma-associated genes in humans and four other species.
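The coupling of protein embeddings with RNA expression described in the abstract can be illustrated with a minimal sketch. This is not SATURN's implementation: the function names `shared_feature_weights` and `project_cells`, the softmax weighting, the fixed centroids, and the random arrays standing in for protein language-model embeddings are all assumptions made for illustration. The sketch only shows why species with different gene sets can end up in one feature space without relying on homolog tables.

```python
import numpy as np


def shared_feature_weights(protein_emb, centroids, temperature=1.0):
    """Soft-assign each gene to shared features by protein-embedding similarity.

    protein_emb: (n_genes, dim) array, one embedding per gene's protein.
    centroids:   (n_features, dim) array of shared reference points.
    Returns an (n_genes, n_features) matrix whose rows sum to 1.
    """
    # Squared Euclidean distance from every gene to every centroid.
    diff = protein_emb[:, None, :] - centroids[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    # Softmax over negative distances; subtract the row max for stability.
    logits = -sq_dist / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)


def project_cells(expression, weights):
    """Map a (cells x genes) count matrix to (cells x shared features)."""
    return np.log1p(expression) @ weights


# Hypothetical toy data: random values stand in for real embeddings and counts.
rng = np.random.default_rng(0)
dim, n_features = 8, 4
centroids = rng.normal(size=(n_features, dim))

# Two species with different numbers of genes and no homolog mapping.
emb_a = rng.normal(size=(30, dim))
emb_b = rng.normal(size=(45, dim))
expr_a = rng.poisson(2.0, size=(5, 30))
expr_b = rng.poisson(2.0, size=(7, 45))

weights_a = shared_feature_weights(emb_a, centroids)
weights_b = shared_feature_weights(emb_b, centroids)
cells_a = project_cells(expr_a, weights_a)
cells_b = project_cells(expr_b, weights_b)

# Both species now share the same feature axis and can be compared directly.
print(cells_a.shape, cells_b.shape)
```

In this toy setup the gene-to-feature weights depend only on protein embeddings, so any species with protein sequences could be projected the same way; a real method would learn these weights and the downstream cell embedding jointly.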

Identifiers

pubmed: 38366243
doi: 10.1038/s41592-024-02191-z
pii: 10.1038/s41592-024-02191-z

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Grants

Agency: United States Department of Defense | Defense Advanced Research Projects Agency (DARPA)
ID: HR00112190039

Copyright information

© 2024. The Author(s).

References

Regev, A. et al. The Human Cell Atlas. eLife 6, e27041 (2017).
doi: 10.7554/eLife.27041 pubmed: 29206104 pmcid: 5762154
Tabula Sapiens Consortium. The Tabula Sapiens: a multiple-organ, single-cell transcriptomic atlas of humans. Science 376, eabl4896 (2022).
doi: 10.1126/science.abl4896
Tabula Muris Consortium. Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. Nature 562, 367–372 (2018).
doi: 10.1038/s41586-018-0590-4
Li, H. et al. Fly Cell Atlas: a single-nucleus transcriptomic atlas of the adult fruit fly. Science 375, eabk2432 (2022).
doi: 10.1126/science.abk2432 pubmed: 35239393 pmcid: 8944923
Lu, T.-C. et al. Aging Fly Cell Atlas identifies exhaustive aging features at cellular resolution. Science 380, eadg0934 (2023).
Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 16, 1289–1296 (2019).
doi: 10.1038/s41592-019-0619-0 pubmed: 31740819 pmcid: 6884693
Hie, B., Bryson, B. & Berger, B. Efficient integration of heterogeneous single-cell transcriptomes using Scanorama. Nat. Biotechnol. 37, 685–691 (2019).
doi: 10.1038/s41587-019-0113-3 pubmed: 31061482 pmcid: 6551256
Lopez, R., Regier, J., Cole, M. B., Jordan, M. I. & Yosef, N. Deep generative modeling for single-cell transcriptomics. Nat. Methods 15, 1053–1058 (2018).
doi: 10.1038/s41592-018-0229-2 pubmed: 30504886 pmcid: 6289068
Amodio, M. et al. Exploring single-cell data with deep multitasking neural networks. Nat. Methods 16, 1139–1145 (2019).
doi: 10.1038/s41592-019-0576-7 pubmed: 31591579 pmcid: 10164410
Brbić, M. et al. MARS: discovering novel cell types across heterogeneous single-cell experiments. Nat. Methods 17, 1200–1206 (2020).
doi: 10.1038/s41592-020-00979-3 pubmed: 33077966
Tarashansky, A. J. et al. Mapping single-cell atlases throughout metazoa unravels cell type evolution. eLife 10, e66747 (2021).
doi: 10.7554/eLife.66747 pubmed: 33944782 pmcid: 8139856
Rives, A. et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl Acad. Sci. USA 118, e2016239118 (2021).
doi: 10.1073/pnas.2016239118 pubmed: 33876751 pmcid: 8053943
Elnaggar, A. et al. ProtTrans: Toward understanding the language of life through self-supervised learning. IEEE Trans. Pattern Anal. Mach. Intell. 44, 7112–7127 (2022).
doi: 10.1109/TPAMI.2021.3095381 pubmed: 34232869
Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).
Kilinc, M., Jia, K. & Jernigan, R. L. Improved global protein homolog detection with major gains in function identification. Proc. Natl Acad. Sci. USA 120, e2211823120 (2023).
The Tabula Microcebus Consortium et al. Tabula Microcebus: a transcriptomic cell atlas of mouse lemur, an emerging primate model organism. Preprint at BioRxiv https://doi.org/10.1101/2021.12.12.469460 (2021).
Briggs, J. A. et al. The dynamics of gene expression in vertebrate embryogenesis at single-cell resolution. Science 360, eaar5780 (2018).
doi: 10.1126/science.aar5780 pubmed: 29700227 pmcid: 6038144
van Zyl, T. et al. Cell atlas of aqueous humor outflow pathways in eyes of humans and four model species provides insight into glaucoma pathogenesis. Proc. Natl Acad. Sci. USA 117, 10339–10349 (2020).
doi: 10.1073/pnas.2001250117 pubmed: 32341164 pmcid: 7229661
Uhlén, M. et al. Tissue-based map of the human proteome. Science 347, 1260419 (2015).
doi: 10.1126/science.1260419 pubmed: 25613900
The Human Protein Atlas. https://www.proteinatlas.org/
Weisel, N. M. et al. Surface phenotypes of naive and memory B cells in mouse and human tissues. Nat. Immunol. 23, 135–145 (2022).
doi: 10.1038/s41590-021-01078-x pubmed: 34937918
Sprague, J. et al. The zebrafish information network (ZFIN): the zebrafish model organism database. Nucleic Acids Res. 31, 241–243 (2003).
doi: 10.1093/nar/gkg027 pubmed: 12519991 pmcid: 165474
Bradford, Y. M. et al. Zebrafish information network, the knowledgebase for Danio rerio research. Genetics 220, iyac016 (2022).
doi: 10.1093/genetics/iyac016 pubmed: 35166825 pmcid: 8982015
Cancelas, J. A. & Williams, D. A. Rho GTPases in hematopoietic stem cell functions. Curr. Opin. Hematol. 16, 249–254 (2009).
doi: 10.1097/MOH.0b013e32832c4b80 pubmed: 19417647 pmcid: 3908896
Montoro, D. T. et al. A revised airway epithelial hierarchy includes CFTR-expressing ionocytes. Nature 560, 319–324 (2018).
doi: 10.1038/s41586-018-0393-7 pubmed: 30069044 pmcid: 6295155
Deprez, M. et al. A single-cell atlas of the human healthy airways. Am. J. Respir. Crit. Care Med. 202, 1636–1645 (2020).
doi: 10.1164/rccm.201911-2199OC pubmed: 32726565
Kolosov, D., Bui, P., Chasiotis, H. & Kelly, S. P. Claudins in teleost fishes. Tissue Barriers 1, e25391 (2013).
doi: 10.4161/tisb.25391 pubmed: 24665402 pmcid: 3875606
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
doi: 10.1016/S0022-2836(05)80360-2 pubmed: 2231712
Ashburner, M. et al. Gene Ontology: tool for the unification of biology. Nat. Genet. 25, 25–29 (2000).
doi: 10.1038/75556 pubmed: 10802651
Song, Y., Miao, Z., Brazma, A. & Papatheodorou, I. Benchmarking strategies for cross-species integration of single-cell RNA sequencing data. Nat. Commun. 14, 6495 (2023).
Yates, A. et al. The Ensembl REST API: Ensembl data for any language. Bioinformatics 31, 143–145 (2015).
doi: 10.1093/bioinformatics/btu613 pubmed: 25236461
Luecken, M. D. et al. Benchmarking atlas-level data integration in single-cell genomics. Nat. Methods 19, 41–50 (2022).
doi: 10.1038/s41592-021-01336-8 pubmed: 34949812
McInnes, L., Healy, J. & Melville, J. UMAP: uniform manifold approximation and projection for dimension reduction. J. Open Source Softw. 3, 861 (2018).
doi: 10.21105/joss.00861
Bai, Y. et al. During glaucoma, alpha2-macroglobulin accumulates in aqueous humor and binds to nerve growth factor, neutralizing neuroprotection. Invest. Ophthalmol. Vis. Sci. 52, 5260–5265 (2011).
doi: 10.1167/iovs.10-6691 pubmed: 21642630
Stoeckius, M. et al. Simultaneous epitope and transcriptome measurement in single cells. Nat. Methods 14, 865–868 (2017).
doi: 10.1038/nmeth.4380 pubmed: 28759029 pmcid: 5669064
Xia, C., Fan, J., Emanuel, G., Hao, J. & Zhuang, X. Spatial transcriptome profiling by MERFISH reveals subcellular RNA compartmentalization and cell cycle-dependent gene expression. Proc. Natl Acad. Sci. USA 116, 19490–19499 (2019).
doi: 10.1073/pnas.1912459116 pubmed: 31501331 pmcid: 6765259
Liao, W.-W. et al. A draft human pangenome reference. Nature 617, 312–324 (2023).
doi: 10.1038/s41586-023-05896-x pubmed: 37165242 pmcid: 10172123
Jones, M. G., Rosen, Y. & Yosef, N. Interactive, integrated analysis of single-cell transcriptomic and phylogenetic data with PhyloVision. Cell Rep. Methods 2, 100200 (2022).
doi: 10.1016/j.crmeth.2022.100200 pubmed: 35497495 pmcid: 9046453
Saelens, W., Cannoodt, R., Todorov, H. & Saeys, Y. A comparison of single-cell trajectory inference methods. Nat. Biotechnol. 37, 547–554 (2019).
doi: 10.1038/s41587-019-0071-9 pubmed: 30936559
Lloyd, S. Least squares quantization in PCM. IEEE Trans. Inform. Theory 28, 129–137 (1982).
doi: 10.1109/TIT.1982.1056489
Ba, J. L., Kiros, J. R. & Hinton, G. E. Layer normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016).
Traag, V. A., Waltman, L. & Van Eck, N. J. From Louvain to Leiden: guaranteeing well-connected communities. Sci. Rep. 9, 5233 (2019).
doi: 10.1038/s41598-019-41695-z
Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).
doi: 10.1186/s13059-017-1382-0 pubmed: 29409532 pmcid: 5802054
Rosen, Y. et al. Towards universal cell embeddings: integrating single-cell RNA-seq datasets across species with SATURN. Preprint at BioRxiv https://doi.org/10.1101/2023.02.03.526939 (2023).
Stelzer, G. et al. The GeneCards suite: from gene data mining to disease genome sequence analyses. Curr. Protoc. Bioinformatics 54, 1.30.1–1.30.33 (2016).
doi: 10.1002/cpbi.5 pubmed: 27322403
Safran, M. et al. The GeneCards suite. in Practical Guide to Life Science Databases 27–56 (Springer, 2021).

Authors

Yanay Rosen (Y)

Department of Computer Science, Stanford University, Stanford, CA, USA.

Maria Brbić (M)

School of Computer and Communication Sciences, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland.

Yusuf Roohani (Y)

Department of Biomedical Data Science, Stanford University, Stanford, CA, USA.

Kyle Swanson (K)

Department of Computer Science, Stanford University, Stanford, CA, USA.

Ziang Li (Z)

Department of Computer Science and Technology, Tsinghua University, Beijing, China.

Jure Leskovec (J)

Department of Computer Science, Stanford University, Stanford, CA, USA. jure@cs.stanford.edu.

MeSH classifications