A multicenter study benchmarking single-cell RNA sequencing technologies using reference samples.
Journal
Nature biotechnology
ISSN: 1546-1696
Abbreviated title: Nat Biotechnol
Country: United States
NLM ID: 9604648
Publication information
Publication date:
09 2021
History:
received: 10 05 2019
accepted: 22 10 2020
pubmed: 23 12 2020
medline: 23 9 2021
entrez: 22 12 2020
Status:
ppublish
Abstract
Comparing diverse single-cell RNA sequencing (scRNA-seq) datasets generated by different technologies and in different laboratories remains a major challenge. Here we address the need for guidance in choosing algorithms leading to accurate biological interpretations of varied data types acquired with different platforms. Using two well-characterized cellular reference samples (breast cancer cells and B cells), captured either separately or in mixtures, we compared different scRNA-seq platforms and several preprocessing, normalization and batch-effect correction methods at multiple centers. Although preprocessing and normalization contributed to variability in gene detection and cell classification, batch-effect correction was by far the most important factor in correctly classifying the cells. Moreover, scRNA-seq dataset characteristics (for example, sample and cellular heterogeneity and platform used) were critical in determining the optimal bioinformatic method. However, reproducibility across centers and platforms was high when appropriate bioinformatic methods were applied. Our findings offer practical guidance for optimizing platform and software selection when designing an scRNA-seq study.
Identifiers
pubmed: 33349700
doi: 10.1038/s41587-020-00748-9
pii: 10.1038/s41587-020-00748-9
Publication types
Journal Article
Multicenter Study
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
1103-1114
Grants
Agency: NIH HHS
ID: S10 OD019960
Country: United States
Agency: U.S. Department of Health & Human Services | NIH | NIH Office of the Director (OD)
ID: S10OD019960
Copyright information
© 2020. The Author(s), under exclusive licence to Springer Nature America, Inc.