Investigating the Performance of Frequentist and Bayesian Techniques in Genomic Evaluation.

Keywords: Bayesian; Bias; Frequentist method; Practical significance; Prediction accuracy; Statistical significance

Journal

Biochemical genetics

ISSN: 1573-4927

Abbreviated title: Biochem Genet

Country: United States

NLM ID: 0126611

Publication information

Publication date:
01 Jul 2024

History:

received: 20 Feb 2024

accepted: 16 May 2024

medline: 2 Jul 2024

pubmed: 2 Jul 2024

entrez: 1 Jul 2024

Status: ahead of print

Abstract

Genomic evaluation relies on the assumption of linkage disequilibrium between dense genome-wide single-nucleotide polymorphism (SNP) markers and quantitative trait loci (QTL). This study evaluated four frequentist methods (Ridge Regression, Least Absolute Shrinkage and Selection Operator (LASSO), Elastic Net, and Genomic Best Linear Unbiased Prediction (GBLUP)) and five Bayesian methods (Bayes Ridge Regression (BRR), Bayes A, Bayesian LASSO, Bayes C, and Bayes B) for genomic selection using simulated data. Pairwise differences in prediction accuracy were assessed for statistical significance (p-values from the t test and the Mann-Whitney U test) and for practical significance (Cohen's d effect size). For this purpose, data were simulated under two scenarios that differed in marker density (4000 and 8000 SNPs across the whole genome). The simulated genome comprised four chromosomes of 1 Morgan each, carrying 100 randomly distributed QTL and evenly distributed SNPs at two densities (1000 and 2000 per chromosome), with a heritability of 0.4. For the frequentist methods other than GBLUP, the regularization parameter λ was chosen by five-fold cross-validation. In both scenarios, Ridge Regression and GBLUP achieved the highest prediction accuracy among the frequentist methods; Ridge Regression showed the lowest bias and GBLUP the highest. Among the Bayesian methods, Bayes B showed the highest prediction accuracy and BRR the lowest. Bayesian LASSO had the lowest bias in both scenarios, whereas the highest bias was shown by BRR in the first scenario and by Bayes B in the second. Across all methods and both scenarios, Bayes B showed the highest accuracy, and LASSO and Elastic Net the lowest.
As expected, the greatest similarity in performance was observed between GBLUP and BRR (
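The pairwise comparison described in the abstract (t test, Mann-Whitney U test, and Cohen's d) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the function names and the accuracy values are hypothetical, the t statistic assumes equal variances, and p-values are left out because they require the reference distributions. By Cohen's (1988) convention, d values near 0.2, 0.5, and 0.8 are read as small, medium, and large effects.

```python
import math
import statistics


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    n_a, n_b = len(a), len(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd


def student_t(a, b):
    """Two-sample Student t statistic (equal variances assumed)."""
    n_a, n_b = len(a), len(b)
    # t equals the mean difference divided by pooled_sd * sqrt(1/n_a + 1/n_b).
    return cohens_d(a, b) / math.sqrt(1 / n_a + 1 / n_b)


def mann_whitney_u(a, b):
    """Mann-Whitney U statistic for sample a; each tie counts as one half."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


# Hypothetical per-replicate prediction accuracies for two methods
# (illustrative values only, not results from the study).
acc_method_1 = [0.71, 0.69, 0.73, 0.70, 0.72]
acc_method_2 = [0.66, 0.68, 0.65, 0.69, 0.67]

d = cohens_d(acc_method_1, acc_method_2)
t = student_t(acc_method_1, acc_method_2)
u = mann_whitney_u(acc_method_1, acc_method_2)
print(f"Cohen's d = {d:.2f}, t = {t:.2f}, U = {u:.1f}")
```

Reporting d alongside the test statistics separates the question of whether a difference is detectable from the question of whether it is large enough to matter, which is the distinction between statistical and practical significance that the abstract draws.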

Identifiers

DOI: 10.1007/s10528-024-10842-1 PMID: 38951354

pii: 10.1007/s10528-024-10842-1

Publication types

Journal Article

Languages

eng

Authors

Hamid Sahebalam (H)

Mohsen Gholizadeh (M)

Hasan Hafezian (H)