Investigating the Performance of Frequentist and Bayesian Techniques in Genomic Evaluation.

Bayesian; Bias; Frequentist method; Practical significance; Prediction accuracy; Statistical significance

Journal

Biochemical genetics
ISSN: 1573-4927
Abbreviated title: Biochem Genet
Country: United States
NLM ID: 0126611

Publication information

Publication date:
01 Jul 2024
History:
received: 20 02 2024
accepted: 16 05 2024
medline: 2 7 2024
pubmed: 2 7 2024
entrez: 1 7 2024
Status: aheadofprint

Abstract

The genomic evaluation process relies on the assumption of linkage disequilibrium between dense genome-wide single-nucleotide polymorphism (SNP) markers and quantitative trait loci (QTL). This study evaluated four frequentist methods (Ridge Regression, Least Absolute Shrinkage and Selection Operator (LASSO), Elastic Net, and Genomic Best Linear Unbiased Prediction (GBLUP)) and five Bayesian methods (Bayes Ridge Regression (BRR), Bayes A, Bayesian LASSO, Bayes C, and Bayes B) for genomic selection using simulated data. Pairwise differences in prediction accuracy were assessed for statistical significance (p-values from the t test and the Mann-Whitney U test) and for practical significance (Cohen's d effect size). The data were simulated under two scenarios that differed in marker density (4000 and 8000 SNPs across the whole genome). The simulated genome comprised four chromosomes of 1 Morgan each, carrying 100 randomly distributed QTL and evenly distributed SNPs at two densities (1000 and 2000 per chromosome), with a heritability of 0.4. For the frequentist methods other than GBLUP, the regularization parameter λ was chosen by five-fold cross-validation. In both scenarios, Ridge Regression and GBLUP achieved the highest prediction accuracy among the frequentist methods, while Ridge Regression showed the lowest bias and GBLUP the highest. Among the Bayesian methods, Bayes B showed the highest prediction accuracy and BRR the lowest. Bayesian LASSO registered the lowest bias in both scenarios; the highest bias was shown by BRR in the first scenario and by Bayes B in the second. Across all methods and both scenarios, Bayes B showed the highest accuracy, and LASSO and Elastic Net the lowest.
As expected, the greatest similarity in performance was observed between GBLUP and BRR (
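
The pairwise comparison described in the abstract (a test statistic for statistical significance alongside Cohen's d for practical significance) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the function names and the accuracy values are hypothetical, the pooled-standard-deviation form of Cohen's d and the equal-variance Student t statistic are assumed, and p-values are left out because they require a reference distribution.

```python
from math import sqrt
from statistics import mean, variance


def pooled_variance(a, b):
    """Pooled sample variance of two independent samples."""
    na, nb = len(a), len(b)
    return ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)


def cohens_d(a, b):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    return (mean(a) - mean(b)) / sqrt(pooled_variance(a, b))


def student_t(a, b):
    """Two-sample Student t statistic, assuming equal variances."""
    na, nb = len(a), len(b)
    return (mean(a) - mean(b)) / sqrt(pooled_variance(a, b) * (1 / na + 1 / nb))


def mann_whitney_u(a, b):
    """U statistic for sample a: pairs with a_i > b_j count 1, ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


# Hypothetical prediction accuracies from five replicates of two methods.
acc_method_1 = [0.70, 0.72, 0.71, 0.69, 0.73]
acc_method_2 = [0.66, 0.68, 0.67, 0.65, 0.69]

print("Cohen's d:", round(cohens_d(acc_method_1, acc_method_2), 3))
print("t statistic:", round(student_t(acc_method_1, acc_method_2), 3))
print("Mann-Whitney U:", mann_whitney_u(acc_method_1, acc_method_2))
```

Reporting the effect size next to the test statistic matters because, with many replicates, a very small difference in accuracy can be statistically significant without being practically meaningful.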

Identifiers

pubmed: 38951354
doi: 10.1007/s10528-024-10842-1
pii: 10.1007/s10528-024-10842-1

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Copyright information

© 2024. The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.

Authors

Hamid Sahebalam (H)

Department of Animal Science, Faculty of Animal and Aquatic Science, Sari Agricultural Sciences and Natural Resources University, Sari, Iran. hamid.sahebalam@yahoo.com.

Mohsen Gholizadeh (M)

Department of Animal Science, Faculty of Animal and Aquatic Science, Sari Agricultural Sciences and Natural Resources University, Sari, Iran.

Hasan Hafezian (H)

Department of Animal Science, Faculty of Animal and Aquatic Science, Sari Agricultural Sciences and Natural Resources University, Sari, Iran.

MeSH classifications