A parallelized strategy for epistasis analysis based on Empirical Bayesian Elastic Net models.
Journal
Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944
Publication information
Publication date:
01 06 2020
History:
received: 19 09 2019
revised: 05 03 2020
accepted: 26 03 2020
pubmed: 1 4 2020
medline: 29 12 2020
entrez: 1 4 2020
Status:
ppublish
Abstract
Epistasis reflects the distortion of a particular trait or phenotype resulting from the combinatorial effect of two or more genes or genetic variants. Epistasis is an important genetic foundation underlying quantitative traits in many organisms as well as in complex human diseases. However, there are two major barriers to identifying epistasis in large genomic datasets. The first is that, given the high dimensionality of a genomic dataset, epistasis analysis leads to over-fitting of an over-saturated model. The problem of identifying epistasis therefore demands efficient statistical methods. The second barrier is the intensive computing time that epistasis analysis requires, even when the appropriate model and data are specified. In this study, we combine statistical and computational techniques to scale up epistasis analysis using Empirical Bayesian Elastic Net (EBEN) models. Specifically, we first apply a matrix manipulation strategy to pre-compute the correlation matrix and pre-filter candidates, narrowing the search space for epistasis analysis. We then develop a parallelized approach to further accelerate the modeling process. Our experiments on synthetic and empirical genomic data demonstrate that our parallelized methods offer a speed-up of tens of fold over the classical EBEN method, which runs sequentially. We applied our parallelized approach to a yeast dataset and were able to identify both main and epistatic effects of genetic variants associated with traits such as fitness. The software is available at github.com/shilab/parEBEN.
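The pre-filtering idea described in the abstract can be illustrated with a minimal Python sketch. It is not the parEBEN implementation: the function names `interaction_correlations` and `prefilter_pairs`, the Pearson correlation criterion, the threshold value, and the use of a thread pool are all assumptions chosen for illustration. The sketch computes, for each variant, the correlation between the phenotype and every pairwise product term in one matrix operation, distributes the variants across workers, and keeps only the pairs whose absolute correlation passes the threshold.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def interaction_correlations(X, y, i):
    """Pearson correlation of y with each product x_i * x_j for j > i."""
    Z = X[:, i + 1:] * X[:, [i]]      # all interaction columns for variant i
    Zc = Z - Z.mean(axis=0)           # center the interaction terms
    yc = y - y.mean()                 # center the phenotype
    denom = np.sqrt((Zc ** 2).sum(axis=0) * (yc ** 2).sum())
    with np.errstate(divide="ignore", invalid="ignore"):
        # Constant columns have zero variance; report correlation 0 for them.
        return np.where(denom > 0, Zc.T @ yc / denom, 0.0)


def prefilter_pairs(X, y, threshold=0.3, workers=4):
    """Return (i, j, r) for variant pairs with |r| >= threshold (hypothetical rule)."""
    p = X.shape[1]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        rows = list(pool.map(lambda i: interaction_correlations(X, y, i),
                             range(p - 1)))
    kept = []
    for i, r in enumerate(rows):
        for offset in np.flatnonzero(np.abs(r) >= threshold):
            kept.append((i, i + 1 + int(offset), float(r[offset])))
    return kept


# Synthetic example: the phenotype depends only on the product of variants 0 and 1.
rng = np.random.default_rng(0)
X = rng.choice([-1.0, 0.0, 1.0], size=(200, 6))
y = X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=200)
kept = prefilter_pairs(X, y, threshold=0.5)
print(kept)
```

Under these assumptions, only the pairs that survive the filter would be passed on to the more expensive model fitting, which is where the reduction in search space comes from.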
Identifiers
pubmed: 32227194
pii: 5813727
doi: 10.1093/bioinformatics/btaa216
pmc: PMC7320619
Publication types
Journal Article
Research Support, N.I.H., Extramural
Languages
eng
Citation subsets
IM
Pagination
3803-3810
Grants
Agency: NHGRI NIH HHS
ID: R15 HG009565
Country: United States
Copyright information
© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.