Journal
NAR genomics and bioinformatics
ISSN: 2631-9268
Abbreviated title: NAR Genom Bioinform
Country: England
NLM ID: 101756213
Publication information
Publication date: Sep 2020
History:
received: 2020 Apr 19
revised: 2020 Aug 9
accepted: 2020 Sep 17
entrez: 2020 Oct 5
pubmed: 2020 Oct 6
medline: 2020 Oct 6
Status: ppublish
Abstract
The benefit of integrating batches of genomic data to increase statistical power is often hindered by batch effects, or unwanted variation in data caused by differences in technical factors across batches. It is therefore critical to effectively address batch effects in genomic data to overcome these challenges. Many existing methods for batch effect adjustment assume the data follow a continuous, bell-shaped Gaussian distribution. However, in RNA-seq studies the data are typically skewed, over-dispersed counts, so this assumption is not appropriate and may lead to erroneous results. Negative binomial regression models have been used previously to better capture the properties of counts. We developed a batch correction method, ComBat-seq, using a negative binomial regression model that retains the integer nature of count data in RNA-seq studies, making the batch-adjusted data compatible with common differential expression software packages that require integer counts. We show in realistic simulations that data adjusted with ComBat-seq yield better statistical power and control of false positives in differential expression than data adjusted by the other available methods. We further demonstrate in a real data example that ComBat-seq successfully removes batch effects and recovers the biological signal in the data.
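The abstract says ComBat-seq models counts with a negative binomial regression and returns integer counts, but it does not show how an adjusted value can stay an integer. One way to illustrate this is quantile matching. This sketch assumes that per-gene batch-specific and batch-free negative binomial parameters (mean `mu`, dispersion `phi`, with variance `mu + phi * mu**2`) have already been estimated. It computes a count's cumulative probability under its batch's distribution, then reads off the integer with the same cumulative probability under the batch-free distribution. The Python sketch below shows only that mapping step. The function names and parameter values are hypothetical, and the sketch omits the regression fit and any parameter shrinkage, so it should not be read as the ComBat-seq implementation.

```python
import math


def nb_logpmf(k: int, mu: float, phi: float) -> float:
    """Log-probability of count k under a negative binomial with mean mu and
    dispersion phi, so that variance = mu + phi * mu**2 (size r = 1/phi)."""
    r = 1.0 / phi
    p = r / (r + mu)  # success probability in the (r, p) parameterisation
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log(1.0 - p))


def nb_cdf(k: int, mu: float, phi: float) -> float:
    """P(Y <= k), summing the pmf from 0 upwards."""
    total = 0.0
    for i in range(k + 1):
        total += math.exp(nb_logpmf(i, mu, phi))
    return total


def nb_quantile(q: float, mu: float, phi: float) -> int:
    """Smallest integer k with P(Y <= k) >= q (the inverse CDF)."""
    q = min(q, 1.0 - 1e-9)  # guard against floating-point round-off near 1
    k, total = 0, 0.0
    while True:
        total += math.exp(nb_logpmf(k, mu, phi))
        if total >= q - 1e-12:
            return k
        k += 1


def adjust_count(y: int, mu_batch: float, phi_batch: float,
                 mu_ref: float, phi_ref: float) -> int:
    """Hypothetical helper: map count y from its batch-specific NB
    distribution to the count at the same cumulative probability under a
    batch-free reference NB distribution. The result is still an integer."""
    q = nb_cdf(y, mu_batch, phi_batch)
    return nb_quantile(q, mu_ref, phi_ref)


if __name__ == "__main__":
    counts = [0, 3, 12, 25, 40]
    # Hypothetical per-gene estimates: this batch doubles the gene's mean.
    adjusted = [adjust_count(y, 20.0, 0.1, 10.0, 0.1) for y in counts]
    print(adjusted)
```

For a batch whose mean is inflated relative to the reference, the mapped counts are no larger than the originals and remain non-negative integers. This integer output is what allows count-based differential expression tools to consume the adjusted matrix directly.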
Identifiers
pubmed: 33015620
doi: 10.1093/nargab/lqaa078
pii: lqaa078
pmc: PMC7518324
Publication types
Journal Article
Languages
eng
Pagination
lqaa078
Grants
Agency: NIGMS NIH HHS
ID: R01 GM083084
Country: United States
Agency: NIGMS NIH HHS
ID: R01 GM127430
Country: United States
Agency: NCI NIH HHS
ID: U01 CA220413
Country: United States
Copyright information
© The Author(s) 2019. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
References
Bioinformatics. 2012 Mar 15;28(6):882-3
pubmed: 22257669
Biostatistics. 2007 Jan;8(1):118-27
pubmed: 16632515
Nucleic Acids Res. 2012 May;40(10):4288-97
pubmed: 22287627
Genome Biol. 2014;15(12):550
pubmed: 25516281
Nucleic Acids Res. 2014 Dec 1;42(21):
pubmed: 25294822
Bioinformatics. 2016 Dec 15;32(24):3836-3838
pubmed: 27540268
Bioinformatics. 2010 Jan 1;26(1):139-40
pubmed: 19910308
Genome Biol. 2014 Feb 03;15(2):R29
pubmed: 24485249
BMC Cancer. 2019 Sep 5;19(1):881
pubmed: 31488082
Bioinformatics. 2015 Sep 1;31(17):2778-84
pubmed: 25926345
Nat Biotechnol. 2014 Sep;32(9):896-902
pubmed: 25150836
BMC Bioinformatics. 2018 Jul 13;19(1):262
pubmed: 30001694
Genome Biol. 2010;11(3):R25
pubmed: 20196867
Genome Med. 2017 Apr 26;9(1):40
pubmed: 28446242
Nat Rev Genet. 2010 Oct;11(10):733-9
pubmed: 20838408