Fast and accurate exhaustive higher-order epistasis search with BitEpi.


Journal

Scientific reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288

Publication information

Publication date:
05 08 2021
History:
received: 19 03 2021
accepted: 20 07 2021
entrez: 06 08 2021
pubmed: 07 08 2021
medline: 07 08 2021
Status: epublish

Abstract

Complex genetic diseases may be modulated by a large number of epistatic interactions affecting a polygenic phenotype. Identifying these interactions is difficult because of the computational complexity, especially for higher-order interactions involving more than two genomic variants. In this paper, we present BitEpi, a fast and accurate method to test all possible combinations of up to four bi-allelic variants (single nucleotide variants, or SNVs). BitEpi introduces a novel bitwise algorithm that is 1.7 and 56 times faster than established software for 3-SNV and 4-SNV searches, respectively. The novel entropy statistic used in BitEpi is 44% more accurate at identifying interacting SNVs and incorporates p-value-based significance testing. We demonstrate BitEpi on real-world data comprising 4900 samples and 87,000 SNPs. We also present EpiExplorer, which visualizes the potentially large number of individual and interacting SNVs in an interactive Cytoscape graph. EpiExplorer uses various visual elements to facilitate the discovery of true biological events in a complex polygenic environment.
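The abstract names BitEpi's bitwise algorithm and entropy statistic without describing them. As a rough illustration of the general idea only (this is not BitEpi's actual encoding or its exact statistic, which the paper defines), the sketch below assumes each SNV's genotypes are coded 0/1/2 per sample and stored as one bitset per genotype value. Counting cases and controls for any genotype combination then reduces to a bitwise AND followed by a population count, and the conditional entropy H(phenotype | genotype combination) scores how informative the combination is. The function names `encode`, `contingency` and `conditional_entropy` are hypothetical.

```python
from itertools import product
from math import log2


def encode(genotypes):
    """Hypothetical encoding: one bitset (Python int) per genotype value 0, 1, 2.
    Bit i is set in masks[g] if sample i carries genotype g."""
    masks = [0, 0, 0]
    for i, g in enumerate(genotypes):
        masks[g] |= 1 << i
    return masks


def popcount(x):
    """Number of set bits (portable alternative to int.bit_count)."""
    return bin(x).count("1")


def contingency(snvs, case_mask, n_samples):
    """Count (cases, controls) for every genotype combination of the given SNVs.
    k SNVs give 3**k combinations, e.g. 81 for a 4-SNV test."""
    all_mask = (1 << n_samples) - 1
    control_mask = all_mask & ~case_mask
    encoded = [encode(g) for g in snvs]
    table = {}
    for combo in product(range(3), repeat=len(snvs)):
        m = all_mask
        for masks, g in zip(encoded, combo):
            m &= masks[g]  # keep only samples matching this genotype
        table[combo] = (popcount(m & case_mask), popcount(m & control_mask))
    return table


def conditional_entropy(table):
    """H(phenotype | genotype combination) in bits; lower means more informative."""
    n = sum(cases + controls for cases, controls in table.values())
    h = 0.0
    for cases, controls in table.values():
        cell = cases + controls
        for c in (cases, controls):
            if c:
                h -= (c / n) * log2(c / cell)
    return h
```

For example, with 8 samples where two SNVs interact in an XOR-like pattern (`snv_a = [0, 0, 1, 1, 0, 0, 1, 1]`, `snv_b = [0, 1, 0, 1, 0, 1, 0, 1]`, cases at samples 1, 2, 5 and 6, so `case_mask = 102`), each SNV alone gives 1.0 bit of conditional entropy while the pair gives 0.0. Only an exhaustive search over combinations would detect this kind of signal, because neither SNV is informative on its own.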

Identifiers

pubmed: 34354094
doi: 10.1038/s41598-021-94959-y
pii: 10.1038/s41598-021-94959-y
pmc: PMC8342486

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

15923

Copyright information

© 2021. The Author(s).

Authors

Arash Bayat (A)

Transformations Bioinformatics, Health and Biosecurity, Commonwealth Scientific and Industrial Research Organisation (CSIRO), North Ryde, NSW, 2113, Australia.
The Kinghorn Cancer Centre, Darlinghurst, NSW, 2010, Australia.

Brendan Hosking (B)

Transformations Bioinformatics, Health and Biosecurity, Commonwealth Scientific and Industrial Research Organisation (CSIRO), North Ryde, NSW, 2113, Australia.

Yatish Jain (Y)

Transformations Bioinformatics, Health and Biosecurity, Commonwealth Scientific and Industrial Research Organisation (CSIRO), North Ryde, NSW, 2113, Australia.
Department of Biomedical Sciences, Macquarie University, Macquarie Park, NSW, 2113, Australia.

Cameron Hosking (C)

Transformations Bioinformatics, Health and Biosecurity, Commonwealth Scientific and Industrial Research Organisation (CSIRO), North Ryde, NSW, 2113, Australia.

Milindi Kodikara (M)

Transformations Bioinformatics, Health and Biosecurity, Commonwealth Scientific and Industrial Research Organisation (CSIRO), North Ryde, NSW, 2113, Australia.

Daniel Reti (D)

Transformations Bioinformatics, Health and Biosecurity, Commonwealth Scientific and Industrial Research Organisation (CSIRO), North Ryde, NSW, 2113, Australia.
Applied BioSciences, Faculty of Science and Engineering, Macquarie University, Macquarie Park, NSW, 2113, Australia.

Natalie A Twine (NA)

Transformations Bioinformatics, Health and Biosecurity, Commonwealth Scientific and Industrial Research Organisation (CSIRO), North Ryde, NSW, 2113, Australia.
Applied BioSciences, Faculty of Science and Engineering, Macquarie University, Macquarie Park, NSW, 2113, Australia.

Denis C Bauer (DC)

Transformations Bioinformatics, Health and Biosecurity, Commonwealth Scientific and Industrial Research Organisation (CSIRO), North Ryde, NSW, 2113, Australia. Denis.Bauer@CSIRO.au.
Department of Biomedical Sciences, Macquarie University, Macquarie Park, NSW, 2113, Australia.
Applied BioSciences, Faculty of Science and Engineering, Macquarie University, Macquarie Park, NSW, 2113, Australia.

MeSH classifications