Mapping single-cell data to reference atlases by transfer learning.
Journal
Nature biotechnology
ISSN: 1546-1696
Titre abrégé: Nat Biotechnol
Pays: United States
ID NLM: 9604648
Informations de publication
Date de publication:
01 2022
01 2022
Historique:
received:
30
07
2020
accepted:
28
06
2021
pubmed:
1
9
2021
medline:
1
2
2022
entrez:
31
8
2021
Statut:
ppublish
Résumé
Large single-cell atlases are now routinely generated to serve as references for analysis of smaller-scale studies. Yet learning from reference data is complicated by batch effects between datasets, limited availability of computational resources and sharing restrictions on raw data. Here we introduce a deep learning strategy for mapping query datasets on top of a reference called single-cell architectural surgery (scArches). scArches uses transfer learning and parameter optimization to enable efficient, decentralized, iterative reference building and contextualization of new datasets with existing references without sharing raw data. Using examples from mouse brain, pancreas, immune and whole-organism atlases, we show that scArches preserves biological state information while removing batch effects, despite using four orders of magnitude fewer parameters than de novo integration. scArches generalizes to multimodal reference mapping, allowing imputation of missing modalities. Finally, scArches retains coronavirus disease 2019 (COVID-19) disease variation when mapping to a healthy reference, enabling the discovery of disease-specific cell states. scArches will facilitate collaborative projects by enabling iterative construction, updating, sharing and efficient use of reference atlases.
Identifiants
pubmed: 34462589
doi: 10.1038/s41587-021-01001-7
pii: 10.1038/s41587-021-01001-7
pmc: PMC8763644
doi:
Types de publication
Journal Article
Research Support, Non-U.S. Gov't
Langues
eng
Sous-ensembles de citation
IM
Pagination
121-130Subventions
Organisme : NHGRI NIH HHS
ID : T32 HG000047
Pays : United States
Organisme : NIAID NIH HHS
ID : U19 AI135964
Pays : United States
Organisme : Deutsche Forschungsgemeinschaft (German Research Foundation)
ID : ZT-I-0007
Informations de copyright
© 2021. The Author(s).
Références
Schaum, N., Karkanias, J., Neff, N. & Pisco, A. Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. Nature 562, 367–372 (2018).
pmcid: 6642641
doi: 10.1038/s41586-018-0590-4
Han, X. et al. Mapping the mouse cell atlas by Microwell-seq. Cell 172, 1091–1107 (2018).
pubmed: 29474909
doi: 10.1016/j.cell.2018.02.001
The Tabula Muris Consortium et al. A single cell transcriptomic atlas characterizes aging tissues in the mouse. Preprint at bioRxiv https://doi.org/10.1101/661728 (2020).
Han, X. et al. Construction of a human cell landscape at single-cell level. Nature 581, 303–309 (2020).
pubmed: 32214235
doi: 10.1038/s41586-020-2157-4
10x Genomics. 10x Datasets Single Cell Gene Expression, Official 10x Genomics Support. https://www.10xgenomics.com/resources/datasets/
Regev, A. et al. Science forum: the human cell atlas. eLife 6, e27041 (2017).
pubmed: 29206104
pmcid: 5762154
doi: 10.7554/eLife.27041
Luecken, M. D. et al. Benchmarking atlas-level data integration in single-cell genomics. Preprint at bioRxiv https://doi.org/10.1101/2020.05.22.111161 (2020).
Zheng, H. et al. Cross-domain fault diagnosis using knowledge transfer strategy: a review. IEEE Access 7, 129260–129290 (2019).
doi: 10.1109/ACCESS.2019.2939876
Ruder, S., Peters, M. E., Swayamdipta, S. & Wolf, T. Transfer learning in natural language processing. in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics 15–18 (ACL, 2019).
Yang, L., Hanneke, S. & Carbonell, J. A theory of transfer learning with applications to active learning. Mach. Learn. 90, 161–189 (2013).
doi: 10.1007/s10994-012-5310-y
Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. in Proceedings of the 25th International Conference on Neural Information Processing Systems 1097–1105 (NIPS, 2012).
Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. Preprint at https://arxiv.org/abs/1810.04805v2 (2018).
Hsu, Y.-C., Lv, Z. & Kira, Z. Learning to cluster in order to transfer across domains and tasks. Preprint at https://arxiv.org/abs/1711.10125 (2017).
Shin, H.-C. et al. Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Trans. Med. Imaging 35, 1285–1298 (2016).
pubmed: 26886976
doi: 10.1109/TMI.2016.2528162
Dahl, G. E., Yu, D., Deng, L. & Acero, A. Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Trans. Audio Speech Lang. Process. 20, 30–42 (2011).
doi: 10.1109/TASL.2011.2134090
Ker, J., Wang, L., Rao, J. & Lim, T. Deep learning applications in medical image analysis. IEEE Access 6, 9375–9389 (2017).
doi: 10.1109/ACCESS.2017.2788044
Avsec, Ž. et al. The Kipoi repository accelerates community exchange and reuse of predictive models for genomics. Nat. Biotechnol. 37, 592–600 (2019).
pubmed: 31138913
pmcid: 6777348
doi: 10.1038/s41587-019-0140-0
Gayoso, A. et al. scvi-tools: a library for deep probabilistic analysis of single-cell omics data. Preprint at bioRxiv https://doi.org/10.1101/2021.04.28.441833 (2021).
Wang, J. et al. Data denoising with transfer learning in single-cell transcriptomics. Nat. Methods 16, 875–878 (2019).
pubmed: 31471617
pmcid: 7781045
doi: 10.1038/s41592-019-0537-1
Stein-O’Brien, G. L. et al. Decomposing cell identity for transfer learning across cellular measurements, platforms, tissues, and species. Cell Syst. 8, 395–411 (2019).
pubmed: 31121116
pmcid: 6588402
doi: 10.1016/j.cels.2019.04.004
Lieberman, Y., Rokach, L. & Shay, T. CaSTLe—classification of single cells by transfer learning: harnessing the power of publicly available single cell RNA sequencing experiments to annotate new experiments. PLoS ONE 13, e0205499 (2018).
pubmed: 30304022
pmcid: 6179251
doi: 10.1371/journal.pone.0205499
Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902 (2019).
pubmed: 31178118
pmcid: 6687398
doi: 10.1016/j.cell.2019.05.031
Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573–3587 (2020).
doi: 10.1016/j.cell.2021.04.048
Wang, X., Huang, T.-K. & Schneider, J. Active transfer learning under model shift. in Proceedings of the 31st International Conference on Machine Learning 1305–1313 (PMLR, 2014).
Arjovsky, M., Bottou, L., Gulrajani, I. & Lopez-Paz, D. Invariant risk minimization. Preprint at https://arxiv.org/abs/1907.02893 (2019).
Lotfollahi, M., Wolf, F. A. & Theis, F. J. scGen predicts single-cell perturbation responses. Nat. Methods 16, 715–721 (2019).
pubmed: 31363220
doi: 10.1038/s41592-019-0494-8
Lopez, R., Regier, J., Cole, M. B., Jordan, M. I. & Yosef, N. Deep generative modeling for single-cell transcriptomics. Nat. Methods 15, 1053–1058 (2018).
pubmed: 30504886
pmcid: 6289068
doi: 10.1038/s41592-018-0229-2
Litvinukova, M. et al. Cells and gene expression programs in the adult human heart. Preprint at bioRxiv https://doi.org/10.1101/2020.04.03.024075 (2020).
Lopez, R., Regier, J., Jordan, M. I. & Yosef, N. Information constraints on auto-encoding variational Bayes. in Advances in Neural Information Processing Systems 6114–6125 (NIPS, 2018).
Lotfollahi, M., Naghipourfar, M., Theis, F. J. & Wolf, F. A. Conditional out-of-distribution generation for unpaired data using transfer VAE. Bioinformatics 36, i610–i617 (2020).
pubmed: 33381839
doi: 10.1093/bioinformatics/btaa800
Xu, C. et al. Probabilistic harmonization and annotation of single-cell transcriptomics data with deep generative models. Mol. Syst. Biol. 17, e9620 (2021).
pubmed: 33491336
pmcid: 7829634
doi: 10.15252/msb.20209620
Gayoso, A. et al. Joint probabilistic modeling of single-cell multi-omic data with totalVI. Nat. Methods 18, 272–282 (2021).
pubmed: 33589839
pmcid: 7954949
doi: 10.1038/s41592-020-01050-x
Saunders, A. et al. Molecular diversity and specializations among the cells of the adult mouse brain. Cell 174, 1015–1030 (2018).
pubmed: 30096299
pmcid: 6447408
doi: 10.1016/j.cell.2018.07.028
Rosenberg, A. B. et al. Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding. Science 360, 176–182 (2018).
pubmed: 29545511
pmcid: 7643870
doi: 10.1126/science.aam8999
Zeisel, A. et al. Molecular architecture of the mouse nervous system. Cell 174, 999–1014 (2018).
pubmed: 30096314
pmcid: 6086934
doi: 10.1016/j.cell.2018.06.021
Oetjen, K. A. et al. Human bone marrow assessment by single-cell RNA sequencing, mass cytometry, and flow cytometry. JCI Insight 3, e124928 (2018).
pmcid: 6328018
doi: 10.1172/jci.insight.124928
Freytag, S., Tian, L., Lönnstedt, I., Ng, M. & Bahlo, M. Comparison of clustering tools in R for medium-sized 10x Genomics single-cell RNA-sequencing data. F1000Res. 7, 1297 (2018).
pubmed: 30228881
pmcid: 6124389
doi: 10.12688/f1000research.15809.1
Sun, Z. et al. A Bayesian mixture model for clustering droplet-based single-cell transcriptomic data from population studies. Nat. Commun. 10, 1649 (2019).
pubmed: 30967541
pmcid: 6456731
doi: 10.1038/s41467-019-09639-3
10x Genomics. 10x Datasets Single Cell Gene Expression, Official 10x Genomics Support https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.0/pbmc_10k_v3
Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 16, 1289–1296 (2019).
pubmed: 31740819
pmcid: 6884693
doi: 10.1038/s41592-019-0619-0
Welch, J. D. et al. Single-cell multi-omic integration compares and contrasts features of brain cell identity. Cell 177, 1873–1887 (2019).
pubmed: 31178122
pmcid: 6716797
doi: 10.1016/j.cell.2019.05.006
Hie, B., Bryson, B. & Berger, B. Efficient integration of heterogeneous single-cell transcriptomes using Scanorama. Nat. Biotechnol. 37, 685–691 (2019).
pubmed: 31061482
pmcid: 6551256
doi: 10.1038/s41587-019-0113-3
Haghverdi, L., Lun, A. T., Morgan, M. D. & Marioni, J. C. Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nat. Biotechnol. 36, 421–427 (2018).
pubmed: 29608177
pmcid: 6152897
doi: 10.1038/nbt.4091
Barkas, N. et al. Joint analysis of heterogeneous single-cell RNA-seq dataset collections. Nat. Methods 16, 695–698 (2019).
pubmed: 31308548
pmcid: 6684315
doi: 10.1038/s41592-019-0466-z
Bastidas-Ponce, A. et al. Comprehensive single cell mRNA profiling reveals a detailed roadmap for pancreatic endocrinogenesis. Development 146, dev173849 (2019).
pubmed: 31160421
doi: 10.1242/dev.173849
Bergen, V., Lange, M., Peidli, S., Wolf, F. A. & Theis, F. J. Generalizing RNA velocity to transient cell states through dynamical modeling. Nat. Biotechnol. 38, 1408–1414 (2020).
Abdelall, T. et al. A comparison of automatic cell identification methods for single-cell RNA sequencing data. Genome Biol. 20, 194 (2019).
doi: 10.1186/s13059-019-1795-z
Stuart, T. et al. Comprehensive integration of single cell data. Cell 177, 1888–1902 (2019).
pubmed: 31178118
pmcid: 6687398
doi: 10.1016/j.cell.2019.05.031
Zhou, Z., Ye, C., Wang, J. & Zhang, N. R. Surface protein imputation from single cell transcriptomes by deep neural networks. Nat. Commun. 11, 651 (2020).
pubmed: 32005835
pmcid: 6994606
doi: 10.1038/s41467-020-14391-0
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Stoeckius, M. et al. Simultaneous epitope and transcriptome measurement in single cells. Nat. Methods 14, 865–868 (2017).
pubmed: 28759029
pmcid: 5669064
doi: 10.1038/nmeth.4380
Travaglini, K. J. et al. A molecular cell atlas of the human lung from single cell RNA sequencing. Nature 587, 619–625 (2020).
pubmed: 33208946
pmcid: 7704697
doi: 10.1038/s41586-020-2922-4
Reyfman, P. A. et al. Single-cell transcriptomic analysis of human lung provides insights into the pathobiology of pulmonary fibrosis. Am. J. Respir. Crit. Care Med. 199, 1517–1536 (2019).
pubmed: 30554520
pmcid: 6580683
doi: 10.1164/rccm.201712-2410OC
Madissoon, E. et al. scRNA-seq assessment of the human lung, spleen, and esophagus tissue stability after cold preservation. Genome Biol. 21, 1 (2020).
doi: 10.1186/s13059-019-1906-x
Liao, M. et al. Single-cell landscape of bronchoalveolar immune cells in patients with COVID-19. Nat. Med. 26, 842–844 (2020).
pubmed: 32398875
doi: 10.1038/s41591-020-0901-9
Wolf, F. A. et al. PAGA: graph abstraction reconciles clustering with trajectory inference through a topology preserving map of single cells. Genome Biol. 20, 59 (2019).
pubmed: 30890159
pmcid: 6425583
doi: 10.1186/s13059-019-1663-x
Grant, R. A. et al. Circuits between infected macrophages and T cells in SARS-CoV-2 pneumonia. Nature 590, 635–641 (2021).
pubmed: 33429418
pmcid: 7987233
doi: 10.1038/s41586-020-03148-w
Muus, C. et al. Integrated analyses of single-cell atlases reveal age, gender, and smoking status associations with cell type-specific expression of mediators of SARS-CoV-2 viral entry and highlights inflammatory programs in putative target cells. Preprint at bioRxiv https://doi.org/10.1101/2020.04.19.049254 (2020).
Andrews, T. S. & Hemberg, M. False signals induced by single-cell imputation. F1000Res. 7, 1740 (2019).
pmcid: 6415334
doi: 10.12688/f1000research.16613.2
Schulte-Schrepping, J. et al. Severe COVID-19 is marked by a dysregulated myeloid cell compartment. Cell 182, 1419–1440 (2020).
pubmed: 32810438
pmcid: 7405822
doi: 10.1016/j.cell.2020.08.001
Wen, W. et al. Immune cell profiling of COVID-19 patients in the recovery stage by single-cell sequencing. Cell Discov. 6, 31 (2020).
pubmed: 32377375
pmcid: 7197635
doi: 10.1038/s41421-020-0168-9
Wilk, A. J. et al. A single-cell atlas of the peripheral immune response in patients with severe COVID-19. Nat. Med. 26, 1070–1076 (2020).
pubmed: 32514174
pmcid: 7382903
doi: 10.1038/s41591-020-0944-y
Lotfollahi, M. et al. Compositional perturbation autoencoder for single-cell response modeling. Preprint at bioRxiv https://doi.org/10.1101/2021.04.14.439903 (2021).
Lotfollahi, M., Dony, L., Agarwala, H. & Theis, F. Out-of-distribution prediction with disentangled representations for single-cell RNA sequencing data. in ICML 2020 Workshop on Computational Biology 37 (ICML, 2020).
Kelsey, G., Stegle, O. & Reik, W. Single-cell epigenomics: recording the past and predicting the future. Science 358, 69–75 (2017).
pubmed: 28983045
doi: 10.1126/science.aan6826
Qiu, X. et al. Reversed graph embedding resolves complex single-cell trajectories. Nat. Methods 14, 979–982 (2017).
pubmed: 28825705
pmcid: 5764547
doi: 10.1038/nmeth.4402
Mirza, M. & Osindero, S. Conditional generative adversarial nets. Preprint at https://arxiv.org/abs/1411.1784 (2014).
Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. Preprint at http://arxiv.org/abs/1312.6114 (2013).
Doersch, C. Tutorial on variational autoencoders. Preprint at https://arxiv.org/abs/1606.05908 (2016).
Sohn, K., Lee, H. & Yan, X. Learning structured output representation using deep conditional generative models. in Advances in Neural Information Processing Systems (eds. Cortes, C. et al.) 28, 3483–3491 (Curran Associates, 2015).
Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B. & Smola, A. A kernel two-sample test. J. Mach. Learn. Res. 13, 723–773 (2012).
Saelens, W., Cannoodt, R., Todorov, H. & Saeys, Y. A comparison of single-cell trajectory inference methods. Nat. Biotechnol. 37, 547–554 (2019).
pubmed: 30936559
doi: 10.1038/s41587-019-0071-9
Lun, A. T. L., Bach, K. & Marioni, J. C. Pooling across cells to normalize single-cell RNA sequencing data with many zero counts. Genome Biol. 17, 75 (2016).
pubmed: 27122128
doi: 10.1186/s13059-016-0947-7
Baron, M. et al. A single-cell transcriptomic map of the human and mouse pancreas reveals inter- and intra-cell population structure. Cell Syst. 3, 346–360 (2016).
pubmed: 27667365
pmcid: 5228327
doi: 10.1016/j.cels.2016.08.011
Muraro, M. J. et al. A single-cell transcriptome atlas of the human pancreas. Cell Syst. 3, 385–394 (2016).
pubmed: 27693023
pmcid: 5092539
doi: 10.1016/j.cels.2016.09.002
Segerstolpe, Å. et al. Single-cell transcriptome profiling of human pancreatic islets in health and type 2 diabetes. Cell Metab. 24, 593–607 (2016).
pubmed: 27667667
pmcid: 5069352
doi: 10.1016/j.cmet.2016.08.020
Lawlor, N. et al. Single-cell transcriptomes identify human islet cell signatures and reveal cell-type-specific expression changes in type 2 diabetes. Genome Res. 27, 208–222 (2016).
pubmed: 27864352
doi: 10.1101/gr.212720.116
Grün, D. et al. De novo prediction of stem cell identity using single-cell transcriptome data. Cell Stem Cell 19, 266–277 (2016).
pubmed: 27345837
pmcid: 4985539
doi: 10.1016/j.stem.2016.05.010
Villani, A.-C. et al. Single-cell RNA-seq reveals new types of human blood dendritic cells, monocytes, and progenitors. Science 356, eaah4573 (2017).
pubmed: 28428369
pmcid: 5775029
doi: 10.1126/science.aah4573
Wolf, F. A., Angerer, P. & Theis, F. J. Scanpy: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).
pubmed: 29409532
pmcid: 5802054
doi: 10.1186/s13059-017-1382-0
10x Genomics. 10k PBMCs from a Healthy Donor, Gene Expression and Cell Surface Protein https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.0/pbmc_10k_protein_v3 (2018).
10x Genomics. 5k Peripheral Blood Mononuclear Cells (PBMCs) from a Healthy Donor with Cell Surface Proteins (v3 Chemistry) https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.2/5k_pbmc_protein_v3? (2019).
10x Genomics. 10k PBMCs from a Healthy Donor (v3 Chemistry) https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.0/pbmc_10k_v3?
Mould, K. J. et al. Airspace macrophages and monocytes exist in transcriptionally distinct subsets in healthy adults. Am. J. Respir. Crit. Care Med. 203, 946–956 (2020).
doi: 10.1164/rccm.202005-1989OC