A multicenter study benchmarking single-cell RNA sequencing technologies using reference samples.


Journal

Nature biotechnology
ISSN: 1546-1696
Abbreviated title: Nat Biotechnol
Country: United States
NLM ID: 9604648

Publication information

Publication date:
09 2021
History:
received: 10 05 2019
accepted: 22 10 2020
pubmed: 23 12 2020
medline: 23 9 2021
entrez: 22 12 2020
Status: ppublish

Abstract

Comparing diverse single-cell RNA sequencing (scRNA-seq) datasets generated by different technologies and in different laboratories remains a major challenge. Here we address the need for guidance in choosing algorithms leading to accurate biological interpretations of varied data types acquired with different platforms. Using two well-characterized cellular reference samples (breast cancer cells and B cells), captured either separately or in mixtures, we compared different scRNA-seq platforms and several preprocessing, normalization and batch-effect correction methods at multiple centers. Although preprocessing and normalization contributed to variability in gene detection and cell classification, batch-effect correction was by far the most important factor in correctly classifying the cells. Moreover, scRNA-seq dataset characteristics (for example, sample and cellular heterogeneity and platform used) were critical in determining the optimal bioinformatic method. However, reproducibility across centers and platforms was high when appropriate bioinformatic methods were applied. Our findings offer practical guidance for optimizing platform and software selection when designing an scRNA-seq study.

Identifiers

pubmed: 33349700
doi: 10.1038/s41587-020-00748-9
pii: 10.1038/s41587-020-00748-9

Publication types

Journal Article
Multicenter Study
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

1103-1114

Grants

Agency: NIH HHS
ID: S10 OD019960
Country: United States
Agency: U.S. Department of Health & Human Services | NIH | NIH Office of the Director (OD)
ID: S10OD019960

Copyright information

© 2020. The Author(s), under exclusive licence to Springer Nature America, Inc.

Authors

Wanqiu Chen (W)

Center for Genomics, School of Medicine, Loma Linda University, Loma Linda, CA, USA.

Yongmei Zhao (Y)

CCR-SF Bioinformatics Group, Advanced Biomedical and Computational Sciences, Biomedical Informatics and Data Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA.
Sequencing Facility, Frederick National Laboratory for Cancer Research, Frederick, MD, USA.

Xin Chen (X)

Center for Genomics, School of Medicine, Loma Linda University, Loma Linda, CA, USA.
Department of Basic Sciences, School of Medicine, Loma Linda University, Loma Linda, CA, USA.

Zhaowei Yang (Z)

Center for Genomics, School of Medicine, Loma Linda University, Loma Linda, CA, USA.
Department of Allergy and Clinical Immunology, State Key Laboratory of Respiratory Disease, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, People's Republic of China.

Xiaojiang Xu (X)

Integrative Bioinformatics Support Group, National Institute of Environmental Health Sciences, Research Triangle Park, NC, USA.

Yingtao Bi (Y)

Abbvie Cambridge Research Center, Cambridge, MA, USA.

Vicky Chen (V)

CCR-SF Bioinformatics Group, Advanced Biomedical and Computational Sciences, Biomedical Informatics and Data Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA.
Sequencing Facility, Frederick National Laboratory for Cancer Research, Frederick, MD, USA.

Jing Li (J)

Department of Basic Sciences, School of Medicine, Loma Linda University, Loma Linda, CA, USA.
Department of Allergy and Clinical Immunology, State Key Laboratory of Respiratory Disease, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, People's Republic of China.

Hannah Choi (H)

Center for Genomics, School of Medicine, Loma Linda University, Loma Linda, CA, USA.

Ben Ernest (B)

Digicon Corporation, McLean, VA, USA.

Bao Tran (B)

Sequencing Facility, Frederick National Laboratory for Cancer Research, Frederick, MD, USA.

Monika Mehta (M)

Sequencing Facility, Frederick National Laboratory for Cancer Research, Frederick, MD, USA.

Parimal Kumar (P)

Sequencing Facility, Frederick National Laboratory for Cancer Research, Frederick, MD, USA.

Andrew Farmer (A)

Takara Bio USA, Inc., Mountain View, CA, USA.

Alain Mir (A)

Takara Bio USA, Inc., Mountain View, CA, USA.

Urvashi Ann Mehra (UA)

Digicon Corporation, McLean, VA, USA.

Jian-Liang Li (JL)

Integrative Bioinformatics Support Group, National Institute of Environmental Health Sciences, Research Triangle Park, NC, USA.

Malcolm Moos (M)

Center for Biologics Evaluation and Research & Division of Cellular and Gene Therapies, U.S. Food and Drug Administration, Silver Spring, MD, USA.

Wenming Xiao (W)

The Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, MD, USA. wenming.xiao@fda.hhs.gov.

Charles Wang (C)

Center for Genomics, School of Medicine, Loma Linda University, Loma Linda, CA, USA. oxwang@gmail.com.
Department of Basic Sciences, School of Medicine, Loma Linda University, Loma Linda, CA, USA. oxwang@gmail.com.
