Next-Gen GWAS: full 2D epistatic interaction maps retrieve part of missing heritability and improve phenotypic prediction.
Journal
Genome biology
ISSN: 1474-760X
Abbreviated title: Genome Biol
Country: England
NLM ID: 100960660
Publication information
Publication date: 25 Mar 2024
History:
received: 13 Jul 2023
accepted: 19 Feb 2024
medline: 25 Mar 2024
pubmed: 25 Mar 2024
entrez: 25 Mar 2024
Status: epublish
Abstract
The problem of missing heritability requires considering genetic interactions among different loci, known as epistasis. Current GWAS statistical models require years to assess the entire combinatorial epistatic space for a single phenotype. We propose Next-Gen GWAS (NGG), which evaluates over 60 billion first-order combinatorial interactions between single nucleotide polymorphisms within hours. We apply NGG to Arabidopsis thaliana, providing two-dimensional epistatic maps at gene resolution. We demonstrate on several phenotypes that a large proportion of the missing heritability can be retrieved, that it indeed lies in epistatic interactions, and that it can be used to improve phenotype prediction.
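To give a sense of the scale quoted above, the sketch below is our own illustration, not the NGG code. It counts unordered SNP pairs, C(n, 2) = n(n − 1)/2, and back-solves how many SNPs would be needed to reach 60 billion first-order interactions. The abstract does not state the SNP count, so the resulting figure (about 346,000) is only an arithmetic implication. All function names are hypothetical. `interaction_term` shows the element-wise product commonly used to encode a pairwise epistatic term in regression models; NGG's actual formulation may differ.

```python
import math


def n_pairwise(n_snps: int) -> int:
    """Number of unordered SNP pairs: C(n, 2) = n(n - 1) / 2."""
    return n_snps * (n_snps - 1) // 2


def min_snps_for_pairs(n_pairs: int) -> int:
    """Smallest n such that C(n, 2) >= n_pairs (hypothetical helper)."""
    # Solve n(n - 1)/2 >= p, i.e. n >= (1 + sqrt(1 + 8p)) / 2,
    # using an integer square root to avoid floating-point error.
    n = (1 + math.isqrt(1 + 8 * n_pairs)) // 2
    while n_pairwise(n) < n_pairs:
        n += 1
    return n


def interaction_term(g1, g2):
    """Element-wise (Hadamard) product of two numerically coded genotype
    vectors, a common encoding of a pairwise epistatic term (assumption)."""
    return [a * b for a, b in zip(g1, g2)]


if __name__ == "__main__":
    target = 60_000_000_000  # "over 60 billion" interactions from the abstract
    n = min_snps_for_pairs(target)
    print(n, n_pairwise(n))
```

Because the number of pairs grows quadratically in the number of SNPs, doubling the marker set roughly quadruples the number of interaction tests. This is why an exhaustive pairwise scan quickly becomes expensive.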
Identifiers
pubmed: 38523316
doi: 10.1186/s13059-024-03202-0
pii: 10.1186/s13059-024-03202-0
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
76
Copyright information
© 2024. The Author(s).
References
Fuchsberger C, Flannick J, Teslovich TM, Mahajan A, Agarwala V, Gaulton KJ, et al. The genetic architecture of type 2 diabetes. Nature. 2016;536:41–7.
doi: 10.1038/nature18642
pubmed: 27398621
pmcid: 5034897
Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007;447:661–78.
doi: 10.1038/nature05911
MacArthur J, Bowler E, Cerezo M, Gil L, Hall P, Hastings E, et al. The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog). Nucleic Acids Res. 2017;45:D896–901.
doi: 10.1093/nar/gkw1133
pubmed: 27899670
Atwell S, Huang YS, Vilhjálmsson BJ, Willems G, Horton M, Li Y, et al. Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature. 2010;465:627–31.
doi: 10.1038/nature08800
pubmed: 20336072
pmcid: 3023908
Tian D, Wang P, Tang B, Teng X, Li C, Liu X, et al. GWAS Atlas: a curated resource of genome-wide variant-trait associations in plants and animals. Nucleic Acids Res. 2019;48:D927–32.
doi: 10.1093/nar/gkz828
pmcid: 6943065
Visscher PM, Wray NR, Zhang Q, Sklar P, McCarthy MI, Brown MA, et al. 10 Years of GWAS Discovery: Biology, Function, and Translation. Am J Hum Genet. 2017;101:5–22.
doi: 10.1016/j.ajhg.2017.06.005
pubmed: 28686856
pmcid: 5501872
Chatelain C, Durand G, Thuillier V, Augé F. Performance of epistasis detection methods in semi-simulated GWAS. BMC Bioinformatics. 2018;19:231.
doi: 10.1186/s12859-018-2229-8
pubmed: 29914375
pmcid: 6006572
Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, et al. Finding the missing heritability of complex diseases. Nature. 2009;461:747–53.
doi: 10.1038/nature08494
pubmed: 19812666
pmcid: 2831613
Phillips PC. Epistasis–the essential role of gene interactions in the structure and evolution of genetic systems. Nat Rev Genet. 2008;9:855–67.
doi: 10.1038/nrg2452
pubmed: 18852697
pmcid: 2689140
Hind J, Lisboa P, Hussain AJ, Al-Jumeily D. A Novel Approach to Detecting Epistasis using Random Sampling Regularisation. IEEE/ACM Trans Comput Biol Bioinform. 2020;17:1535–45.
pubmed: 31634840
Cordell HJ. Detecting gene-gene interactions that underlie human diseases. Nat Rev Genet. 2009;10:392–404.
doi: 10.1038/nrg2579
pubmed: 19434077
pmcid: 2872761
Niel C, Sinoquet C, Dina C, Rocheleau G. A survey about methods dedicated to epistasis detection. Front Genet. 2015;6:285.
doi: 10.3389/fgene.2015.00285
pubmed: 26442103
pmcid: 4564769
Slim L, Chatelain C, Azencott C-A, Vert J-P. Novel methods for epistasis detection in genome-wide association studies. PLoS One. 2020;15:e0242927.
doi: 10.1371/journal.pone.0242927
pubmed: 33253293
pmcid: 7703915
Snaebjarnarson AS, Helgadottir A, Arnadottir GA, Ivarsdottir EV, Thorleifsson G, Ferkingstad E, et al. Complex effects of sequence variants on lipid levels and coronary artery disease. Cell. 2023;186:4085–99.e15.
doi: 10.1016/j.cell.2023.08.012
pubmed: 37714134
Koo CL, Liew MJ, Mohamad MS, Salleh AHM, Deris S, Ibrahim Z, et al. Software for detecting gene-gene interactions in genome wide association studies. Biotechnol Bioprocess Eng. 2015;20:662–76.
doi: 10.1007/s12257-015-0064-6
Candès EJ, Romberg JK, Tao T. Stable signal recovery from incomplete and inaccurate measurements. Commun Pure Appl Math. 2006;59:1207–23.
doi: 10.1002/cpa.20124
LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521:436–44.
doi: 10.1038/nature14539
pubmed: 26017442
Zuk O, Hechter E, Sunyaev SR, Lander ES. The mystery of missing heritability: Genetic interactions create phantom heritability. Proc Natl Acad Sci U S A. 2012;109:1193–8.
doi: 10.1073/pnas.1119675109
pubmed: 22223662
pmcid: 3268279
Slyusar VI. A family of face products of matrices and its properties. Cybern Syst Anal. 1999;35:379–84.
doi: 10.1007/BF02733426
Martini JWR, Crossa J, Toledo FH, Cuevas J. On Hadamard and Kronecker products in covariance structures for genotype × environment interaction. Plant Genome. 2020;13:e20033.
doi: 10.1002/tpg2.20033
pubmed: 33217210
1001 Genomes Consortium. 1,135 Genomes Reveal the Global Pattern of Polymorphism in Arabidopsis thaliana. Cell. 2016;166:481–91.
doi: 10.1016/j.cell.2016.05.063
Marchini J, Donnelly P, Cardon LR. Genome-wide strategies for detecting multiple loci that influence complex diseases. Nat Genet. 2005;37:413–7.
doi: 10.1038/ng1537
pubmed: 15793588
Korte A, Vilhjálmsson BJ, Segura V, Platt A, Long Q, Nordborg M. A mixed-model approach for genome-wide association studies of correlated traits in structured populations. Nat Genet. 2012;44:1066–71.
doi: 10.1038/ng.2376
pubmed: 22902788
pmcid: 3432668
Grant MR, Godiard L, Straube E, Ashfield T, Lewald J, Sattler A, et al. Structure of the Arabidopsis RPM1 gene enabling dual specificity disease resistance. Science. 1995;269:843–6.
doi: 10.1126/science.7638602
pubmed: 7638602
Campos ACAL, van Dijk WFA, Ramakrishna P, Giles T, Korte P, Douglas A, et al. 1,135 ionomes reveals the global pattern of leaf and seed mineral nutrient and trace element diversity in Arabidopsis thaliana. Plant J. 2021.
doi: 10.1111/tpj.15177
Michaels SD, Amasino RM. FLOWERING LOCUS C encodes a novel MADS domain protein that acts as a repressor of flowering. Plant Cell. 1999;11:949–56. Available from: http://www.plantcell.org/content/11/5/949.short
Sheldon CC, Burn JE, Perez PP, Metzger J, Edwards JA, Peacock WJ, et al. The FLF MADS box gene: a repressor of flowering in Arabidopsis regulated by vernalization and methylation. Plant Cell. 1999;11:445–58.
doi: 10.1105/tpc.11.3.445
pubmed: 10072403
pmcid: 144185
Segura V, Vilhjálmsson BJ, Platt A, Korte A, Seren Ü, Long Q, et al. An efficient multi-locus mixed-model approach for genome-wide association studies in structured populations. Nat Genet. 2012;44:825–30.
doi: 10.1038/ng.2314
pubmed: 22706313
pmcid: 3386481
John M, Ankenbrand MJ, Artmann C, Freudenthal JA, Korte A, Grimm DG. Efficient Permutation-based Genome-wide Association Studies for Normal and Skewed Phenotypic Distributions. bioRxiv. 2022:2022.04.05.487185. Available from: https://www.biorxiv.org/content/10.1101/2022.04.05.487185 [Cited 2022 Jul 13].
doi: 10.1101/2022.04.05.487185
Verzelen N. Minimax risks for sparse regressions: Ultra-high dimensional phenomenons. Electron J Stat. 2012;6:38–90.
Park SH. Collinearity and Optimal Restrictions on Regression Parameters for Estimating Responses. Technometrics. 1981;23:289–95.
doi: 10.2307/1267793
Carré C, Carluer JB, Mas A, Krouk G. Next Gen GWAS. Zenodo. 2024.
doi: 10.5281/zenodo.10656895