A guide for kernel generalized regression methods for genomic-enabled prediction.


Journal

Heredity
ISSN: 1365-2540
Abbreviated title: Heredity (Edinb)
Country: England
NLM ID: 0373007

Publication information

Publication date:
04 2021
History:
received: 24 10 2020
accepted: 24 01 2021
revised: 23 01 2021
pubmed: 3 3 2021
medline: 26 10 2021
entrez: 2 3 2021
Status: ppublish

Abstract

The primary objective of this paper is to provide a guide on implementing Bayesian generalized kernel regression methods for genomic prediction in the statistical software R. Such methods are efficient at capturing complex non-linear patterns that conventional linear regression models cannot capture. They are also powerful for leveraging environmental covariates, for example in genotype × environment (G×E) prediction, among other applications. In this study we describe how to build seven kernels: linear, polynomial, sigmoid, Gaussian, exponential, arc-cosine 1 and arc-cosine L. Additionally, we give illustrative examples of implementing exact kernel methods for genomic prediction under single-environment, multi-environment and multi-trait frameworks, as well as of implementing sparse kernel methods under a multi-environment framework. These examples are followed by a discussion of the strengths and limitations of kernel methods and, subsequently, by conclusions about the main contributions of this paper.

Identifiers

pubmed: 33649571
doi: 10.1038/s41437-021-00412-1
pii: 10.1038/s41437-021-00412-1
pmc: PMC8115678

Publication types

Journal Article; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, Non-P.H.S.; Review

Languages

eng

Citation subsets

IM

Pagination

577-596

Authors

Abelardo Montesinos-López (A)

Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, 44430, Guadalajara, Jalisco, México.

Osval Antonio Montesinos-López (OA)

Facultad de Telemática, Universidad de Colima, 28040, Colima, México. oamontes2@hotmail.com.

José Cricelio Montesinos-López (JC)

Departamento de Estadística, Centro de Investigación en Matemáticas, 36023, Guanajuato, México.

Carlos Alberto Flores-Cortes (CA)

Facultad de Telemática, Universidad de Colima, 28040, Colima, México.

Roberto de la Rosa (R)

Colegio de Postgraduados (CP), Campus Tabasco, Producción Agroalimentaria en el Trópico, H. Cárdenas, Tabasco, México.

José Crossa (J)

Colegio de Postgraduados, Campus Montecillos, CP 56230, Montecillos, Edo. de México, México. j.crossa@cgiar.org.
Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Km 45, CP 52640, Carretera Mexico-Veracruz, México. j.crossa@cgiar.org.
