A benchmark of batch-effect correction methods for single-cell RNA sequencing data.
Batch correction
Batch effect
Differential gene expression
Integration
Single-cell RNA-seq
Journal
Genome Biology
ISSN: 1474-760X
Abbreviated title: Genome Biol
Country: England
NLM ID: 100960660
Publication information
Publication date:
16 01 2020
History:
received: 20 05 2019
accepted: 03 10 2019
entrez: 18 1 2020
pubmed: 18 1 2020
medline: 10 6 2020
Status:
epublish
Abstract
Abstract sections
BACKGROUND
Large-scale single-cell transcriptomic datasets generated using different technologies contain batch-specific systematic variations that present a challenge to batch-effect removal and data integration. With continued growth expected in scRNA-seq data, achieving effective batch integration with available computational resources is crucial. Here, we perform an in-depth benchmark study on available batch correction methods to determine the most suitable method for batch-effect removal.
RESULTS
We compare 14 methods in terms of computational runtime, the ability to handle large datasets, and batch-effect correction efficacy while preserving cell type purity. Five scenarios are designed for the study: identical cell types with different technologies, non-identical cell types, multiple batches, big data, and simulated data. Performance is evaluated using four benchmarking metrics: kBET, LISI, ASW, and ARI. We also investigate the use of batch-corrected data to study differential gene expression.
CONCLUSION
Based on our results, Harmony, LIGER, and Seurat 3 are the recommended methods for batch integration. Due to its significantly shorter runtime, Harmony is recommended as the first method to try, with the other methods as viable alternatives.
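As an illustration of one of the four metrics named in the abstract, the following is a minimal sketch of the adjusted Rand index (ARI), which scores the agreement between two labelings of the same cells, for example cluster assignments against known cell types. The function name, the variable names, and the toy labels are assumptions made for this example; this is not the benchmark's own code, and the study's exact evaluation procedure is not described in this record.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same items")

    n_pairs = comb(len(labels_true), 2)
    if n_pairs == 0:
        # Fewer than two items: the labelings trivially agree.
        return 1.0

    # Contingency counts: items sharing a (true label, predicted label) pair.
    cell_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of item pairs grouped together under each view.
    together_both = sum(comb(c, 2) for c in cell_counts.values())
    together_true = sum(comb(c, 2) for c in true_counts.values())
    together_pred = sum(comb(c, 2) for c in pred_counts.values())

    # Agreement expected by chance, and the maximum attainable agreement.
    expected = together_true * together_pred / n_pairs
    maximum = (together_true + together_pred) / 2

    if maximum == expected:
        # Degenerate case, e.g. both labelings put every item in one group.
        return 1.0
    return (together_both - expected) / (maximum - expected)


# Toy labels (assumed): clusters match cell types up to renaming.
cell_types = ["T", "T", "B", "B"]
clusters = [1, 1, 0, 0]
ari_same = adjusted_rand_index(cell_types, clusters)

# Toy labels (assumed): clusters split every cell type in half.
ari_split = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

A value of 1 means the two labelings agree exactly up to renaming of the groups, and values near 0 mean agreement no better than chance. In a batch-correction setting, a high ARI against cell type labels and a low ARI against batch labels would be the desirable pattern.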
Identifiers
pubmed: 31948481
doi: 10.1186/s13059-019-1850-9
pii: 10.1186/s13059-019-1850-9
pmc: PMC6964114
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
12