A benchmark of batch-effect correction methods for single-cell RNA sequencing data.

Keywords: Batch correction; Batch effect; Differential gene expression; Integration; Single-cell RNA-seq

Journal

Genome biology
ISSN: 1474-760X
Abbreviated title: Genome Biol
Country: England
NLM ID: 100960660

Publication information

Publication date:
16 Jan 2020
History:
received: 20 May 2019
accepted: 03 Oct 2019
entrez: 18 Jan 2020
pubmed: 18 Jan 2020
medline: 10 Jun 2020
Status: epublish

Abstract

Large-scale single-cell transcriptomic datasets generated using different technologies contain batch-specific systematic variations that present a challenge to batch-effect removal and data integration. With continued growth expected in scRNA-seq data, achieving effective batch integration with available computational resources is crucial. Here, we perform an in-depth benchmark study of available batch correction methods to determine the most suitable method for batch-effect removal. We compare 14 methods in terms of computational runtime, ability to handle large datasets, and batch-effect correction efficacy while preserving cell type purity. Five scenarios are designed for the study: identical cell types with different technologies, non-identical cell types, multiple batches, big data, and simulated data. Performance is evaluated using four benchmarking metrics: kBET (k-nearest-neighbor batch-effect test), LISI (local inverse Simpson's index), ASW (average silhouette width), and ARI (adjusted Rand index). We also investigate the use of batch-corrected data to study differential gene expression. Based on our results, Harmony, LIGER, and Seurat 3 are the recommended methods for batch integration. Because of its significantly shorter runtime, Harmony is recommended as the first method to try, with the other methods as viable alternatives.
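Two of the metrics named above, ARI and ASW, have standard textbook definitions that can be sketched in a few lines. The block below is a minimal, dependency-free illustration of those generic definitions (Hubert-Arabie adjusted Rand index; mean silhouette width with Euclidean distance). It is not the authors' benchmarking code, and the function names, toy coordinates, and labels are hypothetical. In a batch-correction setting, one would typically expect silhouette width on cell-type labels to stay high after correction, silhouette width on batch labels to fall toward zero or below when batches mix well, and ARI (between a clustering of the corrected data and known cell types) to stay close to 1.

```python
from collections import Counter, defaultdict
from math import comb, dist


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same cells.

    1.0 means identical partitions (up to renaming); random labelings
    score close to 0 on average.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label vectors must have equal length")
    n = len(labels_true)
    if n < 2:
        raise ValueError("need at least two cells")
    # Pair counts from the contingency table (n_ij), its row sums (a_i) and column sums (b_j).
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Only happens when both labelings are trivial and identical
        # (a single cluster each, or all singletons).
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def mean_silhouette_width(points, labels):
    """Mean silhouette width over all points, using Euclidean distance."""
    groups = set(labels)
    if len(groups) < 2:
        raise ValueError("silhouette needs at least two distinct labels")
    scores = []
    for i, (p, li) in enumerate(zip(points, labels)):
        # Sum and count of distances from point i to every other point, per label.
        totals, counts = defaultdict(float), defaultdict(int)
        for j, (q, lj) in enumerate(zip(points, labels)):
            if i != j:
                totals[lj] += dist(p, q)
                counts[lj] += 1
        if counts[li] == 0:
            scores.append(0.0)  # singleton cluster: s(i) = 0 by convention
            continue
        a = totals[li] / counts[li]  # mean distance within own label
        b = min(totals[g] / counts[g] for g in groups if g != li)  # nearest other label
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy embedding: two cell types, each split across two batches.
coords = [(0, 0), (0, 1), (10, 0), (10, 1)]
cell_type = ["T", "T", "B", "B"]
batch = ["b1", "b2", "b1", "b2"]
print(mean_silhouette_width(coords, cell_type))  # high: cell types are separated
print(mean_silhouette_width(coords, batch))      # negative: batches are mixed
print(adjusted_rand_index(cell_type, [1, 1, 0, 0]))  # 1.0: same partition, new names
```

The pure-Python loops are quadratic in the number of cells and are meant only to make the definitions concrete; real benchmarks on large datasets would use vectorized library implementations.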


Identifiers

pubmed: 31948481
doi: 10.1186/s13059-019-1850-9
pii: 10.1186/s13059-019-1850-9
pmc: PMC6964114

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

12


Authors

Hoa Thi Nhu Tran (HTN)

Singapore Immunology Network (SIgN), Agency for Science, Technology and Research (A*STAR), 8A Biomedical Grove, Immunos Building, Level 3, Singapore, 138648, Singapore.

Kok Siong Ang (KS)

Singapore Immunology Network (SIgN), Agency for Science, Technology and Research (A*STAR), 8A Biomedical Grove, Immunos Building, Level 3, Singapore, 138648, Singapore.

Marion Chevrier (M)

Singapore Immunology Network (SIgN), Agency for Science, Technology and Research (A*STAR), 8A Biomedical Grove, Immunos Building, Level 3, Singapore, 138648, Singapore.

Xiaomeng Zhang (X)

Singapore Immunology Network (SIgN), Agency for Science, Technology and Research (A*STAR), 8A Biomedical Grove, Immunos Building, Level 3, Singapore, 138648, Singapore.

Nicole Yee Shin Lee (NYS)

Singapore Immunology Network (SIgN), Agency for Science, Technology and Research (A*STAR), 8A Biomedical Grove, Immunos Building, Level 3, Singapore, 138648, Singapore.

Michelle Goh (M)

Singapore Immunology Network (SIgN), Agency for Science, Technology and Research (A*STAR), 8A Biomedical Grove, Immunos Building, Level 3, Singapore, 138648, Singapore.

Jinmiao Chen (J)

Singapore Immunology Network (SIgN), Agency for Science, Technology and Research (A*STAR), 8A Biomedical Grove, Immunos Building, Level 3, Singapore, 138648, Singapore. chen_jinmiao@immunol.a-star.edu.sg.
