Mapping single-cell data to reference atlases by transfer learning.


Journal

Nature biotechnology
ISSN: 1546-1696
Abbreviated title: Nat Biotechnol
Country: United States
NLM ID: 9604648

Publication information

Publication date:
01 2022
History:
received: 30 07 2020
accepted: 28 06 2021
pubmed: 1 9 2021
medline: 1 2 2022
entrez: 31 8 2021
Status: ppublish

Abstract

Large single-cell atlases are now routinely generated to serve as references for analysis of smaller-scale studies. Yet learning from reference data is complicated by batch effects between datasets, limited availability of computational resources and sharing restrictions on raw data. Here we introduce a deep learning strategy for mapping query datasets on top of a reference called single-cell architectural surgery (scArches). scArches uses transfer learning and parameter optimization to enable efficient, decentralized, iterative reference building and contextualization of new datasets with existing references without sharing raw data. Using examples from mouse brain, pancreas, immune and whole-organism atlases, we show that scArches preserves biological state information while removing batch effects, despite using four orders of magnitude fewer parameters than de novo integration. scArches generalizes to multimodal reference mapping, allowing imputation of missing modalities. Finally, scArches retains coronavirus disease 2019 (COVID-19) disease variation when mapping to a healthy reference, enabling the discovery of disease-specific cell states. scArches will facilitate collaborative projects by enabling iterative construction, updating, sharing and efficient use of reference atlases.
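The abstract does not describe the mechanics of the mapping step. As a toy illustration only, the sketch below shows the general transfer-learning pattern of keeping pretrained reference weights frozen and fitting a small number of new parameters on query data. It is not the scArches implementation or API: the linear model, the per-batch offset `d`, the function names and all numbers are assumptions chosen for illustration.

```python
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def predict(W, b, d, x):
    """Frozen linear map W x + b plus a trainable query-specific offset d."""
    return [p + bi + di for p, bi, di in zip(matvec(W, x), b, d)]

def mse(W, b, d, xs, ys):
    """Mean over samples of the summed squared error."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += sum((p - t) ** 2 for p, t in zip(predict(W, b, d, x), y))
    return total / len(xs)

def fit_offset(W, b, xs, ys, lr=0.1, epochs=200):
    """Gradient descent on d only; W and b are read but never updated."""
    d = [0.0] * len(b)
    for _ in range(epochs):
        grad = [0.0] * len(d)
        for x, y in zip(xs, ys):
            pred = predict(W, b, d, x)
            for k in range(len(d)):
                grad[k] += 2.0 * (pred[k] - y[k]) / len(xs)
        d = [di - lr * g for di, g in zip(d, grad)]
    return d

# Hypothetical "reference" parameters, treated as already trained.
W = [[0.5, -1.0, 2.0], [1.5, 0.3, -0.7]]
b = [0.1, -0.2]
W_before = [row[:] for row in W]

# Synthetic "query" data: reference outputs shifted by a batch-like offset.
rng = random.Random(0)
true_shift = [0.5, -1.0]
xs = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(50)]
ys = [[v + s for v, s in zip(predict(W, b, [0.0, 0.0], x), true_shift)]
      for x in xs]

loss_before = mse(W, b, [0.0, 0.0], xs, ys)
d = fit_offset(W, b, xs, ys)
loss_after = mse(W, b, d, xs, ys)

n_frozen = sum(len(row) for row in W) + len(b)  # reference parameters
n_trainable = len(d)                            # newly added parameters
print(n_frozen, n_trainable, loss_before > loss_after)
```

In this toy setting only the two offset values are optimized while the eight reference parameters stay untouched, which is the sense in which such fine-tuning can need far fewer trainable parameters than retraining a model from scratch.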

Identifiers

pubmed: 34462589
doi: 10.1038/s41587-021-01001-7
pii: 10.1038/s41587-021-01001-7
pmc: PMC8763644

Publication types

Journal Article
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

121-130

Grants

Agency: NHGRI NIH HHS
ID: T32 HG000047
Country: United States
Agency: NIAID NIH HHS
ID: U19 AI135964
Country: United States
Agency: Deutsche Forschungsgemeinschaft (German Research Foundation)
ID: ZT-I-0007

Copyright information

© 2021. The Author(s).

Authors

Mohammad Lotfollahi (M)

Helmholtz Center Munich-German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany.
School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany.

Mohsen Naghipourfar (M)

Helmholtz Center Munich-German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany.

Malte D Luecken (MD)

Helmholtz Center Munich-German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany.

Matin Khajavi (M)

Helmholtz Center Munich-German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany.

Maren Büttner (M)

Helmholtz Center Munich-German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany.

Marco Wagenstetter (M)

Helmholtz Center Munich-German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany.

Žiga Avsec (Ž)

Department of Computer Science, Technical University of Munich, Munich, Germany.

Adam Gayoso (A)

Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA.

Nir Yosef (N)

Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA.
Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA, USA.
Chan Zuckerberg Biohub, San Francisco, CA, USA.
Ragon Institute of MGH, MIT and Harvard, Cambridge, MA, USA.

Marta Interlandi (M)

Institute of Medical Informatics, University of Münster, Münster, Germany.

Sergei Rybakov (S)

Helmholtz Center Munich-German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany.
Department of Mathematics, Technical University of Munich, Munich, Germany.

Alexander V Misharin (AV)

Division of Pulmonary and Critical Care Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA.

Fabian J Theis (FJ)

Helmholtz Center Munich-German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany. fabian.theis@helmholtz-muenchen.de.
School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany. fabian.theis@helmholtz-muenchen.de.
Department of Mathematics, Technical University of Munich, Munich, Germany. fabian.theis@helmholtz-muenchen.de.
