Statistical Approach for Improving Genomic Prediction Accuracy through Efficient Diagnostic Measure of Influential Observation.
Journal
Scientific reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288
Publication information
Publication date:
21 05 2020
History:
received: 16 10 2019
accepted: 28 04 2020
entrez: 23 05 2020
pubmed: 23 05 2020
medline: 15 01 2021
Status:
epublish
Abstract
The predictive performance of genomic prediction methods is expected to suffer in the presence of outliers. In agricultural science, outliers may arise from incorrect data imputation, from outlying responses, or in series of trials conducted over time or across locations. Although several statistical procedures for identifying outliers already exist in the literature, identifying true outliers remains a challenge, especially in high-dimensional genomic data. Here we propose an efficient approach for detecting outliers in high-dimensional genomic data. The approach uses p-value combination methods to produce a single p-value for detecting outliers. Its robustness was tested on simulated data using evaluation measures such as precision and recall. On real data, detecting outliers with the proposed approach and handling them accordingly significantly improved genomic prediction performance.
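As a rough illustration of the idea in the abstract, the sketch below combines several per-observation p-values into a single p-value, flags observations whose combined p-value falls below a threshold, and scores the flags with precision and recall. The abstract does not say which diagnostics or which combination rule the authors use, so everything here is an assumption made for illustration: Fisher's and Stouffer's methods stand in for the combination step, the input p-values are assumed to be independent and strictly between 0 and 1, and the function names and the 0.05 threshold are hypothetical.

```python
import math
from statistics import NormalDist


def fisher_combine(pvalues):
    """Fisher's method: -2 * sum(log p) follows chi-square with 2k df under H0.

    Assumes independent p-values strictly inside (0, 1).
    """
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # test statistic divided by 2
    # Closed-form chi-square survival function for even degrees of freedom (2k):
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


def stouffer_combine(pvalues):
    """Stouffer's method: average the z-scores, then map back to a p-value."""
    nd = NormalDist()
    z = sum(nd.inv_cdf(1.0 - p) for p in pvalues) / math.sqrt(len(pvalues))
    return 1.0 - nd.cdf(z)


def flag_outliers(pvalue_rows, alpha=0.05, combine=fisher_combine):
    """Return indices of observations whose combined p-value is below alpha.

    pvalue_rows[i] holds the p-values that several (hypothetical) diagnostic
    measures assign to observation i.
    """
    return [i for i, row in enumerate(pvalue_rows) if combine(row) < alpha]


def precision_recall(flagged, true_outliers):
    """Precision and recall of the flagged set against the known outliers."""
    flagged, true_outliers = set(flagged), set(true_outliers)
    true_positives = len(flagged & true_outliers)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(true_outliers) if true_outliers else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy input: observation 0 looks unusual under every diagnostic.
    rows = [[0.01, 0.02, 0.03], [0.40, 0.60, 0.70]]
    flagged = flag_outliers(rows)
    print(flagged, precision_recall(flagged, true_outliers=[0]))
```

In a simulation study the set of true outliers is known because the outliers are planted, which is what makes precision and recall computable. With real data only the downstream change in prediction accuracy can be checked.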
Identifiers
pubmed: 32439883
doi: 10.1038/s41598-020-65323-3
pii: 10.1038/s41598-020-65323-3
pmc: PMC7242349
Chemical substances
Genetic Markers
0
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
8408
References
Hayes, B. & Goddard, M. Prediction of total genetic value using genome-wide dense marker maps. Genetics 157, 1819–1829 (2001).
pubmed: 11290733
pmcid: 1461589
Jannink, J.-L., Lorenz, A. J. & Iwata, H. Genomic selection in plant breeding: from theory to practice. Briefings in functional genomics 9, 166–177 (2010).
doi: 10.1093/bfgp/elq001
Zhao, Y., Mette, M. F. & Reif, J. C. Genomic selection in hybrid breeding. Plant Breeding 134, 1–10 (2015).
doi: 10.1111/pbr.12231
Hayes, B. J., Bowman, P. J., Chamberlain, A. & Goddard, M. Invited review: Genomic selection in dairy cattle: Progress and challenges. Journal of dairy science 92, 433–443 (2009).
doi: 10.3168/jds.2008-1646
Daetwyler, H. D., Swan, A. A., van der Werf, J. H. & Hayes, B. J. Accuracy of pedigree and genomic predictions of carcass and novel meat quality traits in multi-breed sheep data assessed by cross-validation. Genetics Selection Evolution 44, 33 (2012).
doi: 10.1186/1297-9686-44-33
Daetwyler, H., Kemper, K., Van der Werf, J. & Hayes, B. Components of the accuracy of genomic prediction in a multi-breed sheep population. Journal of animal science 90, 3375–3384 (2012).
doi: 10.2527/jas.2011-4557
Wang, C. et al. Accuracy of genomic prediction using an evenly spaced, low-density single nucleotide polymorphism panel in broiler chickens. Poultry science 92, 1712–1723 (2013).
doi: 10.3382/ps.2012-02941
Atkinson, A. Plots, Transformations, and Regression: An Introduction to Graphical Methods of Diagnostic Regression Analysis. Oxford Statistical Science Series (Oxford University Press, Oxford, 1985).
Belsley, D. A., Kuh, E. & Welsch, R. Identifying influential data and sources of collinearity. Regression Diagnostics (1980).
Cook, R. D. Detection of influential observation in linear regression. Technometrics 19, 15–18 (1977).
Cook, R. D. Influential observations in linear regression. Journal of the American Statistical Association 74, 169–174 (1979).
doi: 10.1080/01621459.1979.10481634
Peña, D. A new statistic for influence in linear regression. Technometrics 47, 1–12 (2005).
doi: 10.1198/004017004000000662
Geweke, J. Bayesian treatment of the independent Student‐t linear model. Journal of applied econometrics 8, S19–S40 (1993).
doi: 10.1002/jae.3950080504
Jylänki, P., Vanhatalo, J. & Vehtari, A. Robust Gaussian process regression with a Student-t likelihood. Journal of Machine Learning Research 12, 3227–3257 (2011).
Lange, K. L., Little, R. J. & Taylor, J. M. Robust statistical modeling using the t distribution. Journal of the American Statistical Association 84, 881–896 (1989).
Lourenço, V. M. & Pires, A. M. M-regression, false discovery rates and outlier detection with application to genetic association studies. Computational Statistics & Data Analysis 78, 33–42 (2014).
doi: 10.1016/j.csda.2014.03.019
Rajaratnam, B., Roberts, S., Sparks, D. & Yu, H. Influence Diagnostics for High-Dimensional Lasso Regression. Journal of Computational and Graphical Statistics, 1–14 (2019).
Edgington, E. S. An additive method for combining probability values from independent experiments. The Journal of Psychology 80, 351–363 (1972).
doi: 10.1080/00223980.1972.9924813
Sutton, A. J., Abrams, K. R., Jones, D. R., Sheldon, T. A. & Song, F. Methods for meta-analysis in medical research. Vol. 348 (Wiley Chichester, 2000).
Won, S., Morris, N., Lu, Q. & Elston, R. C. Choosing an optimal method to combine P‐values. Statistics in medicine 28, 1537–1553 (2009).
doi: 10.1002/sim.3569
Tibshirani, R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58, 267–288 (1996).
Efron, B., Hastie, T., Johnstone, I. & Tibshirani, R. Least angle regression. The Annals of statistics 32, 407–499 (2004).
doi: 10.1214/009053604000000067
Usai, M. G., Goddard, M. E. & Hayes, B. J. LASSO with cross-validation for genomic selection. Genetics research 91, 427–436 (2009).
doi: 10.1017/S0016672309990334
Crossa, J. et al. Prediction of genetic values of quantitative traits in plant breeding using pedigree and molecular markers. Genetics 186, 713–724 (2010).
doi: 10.1534/genetics.110.118521
Cuevas, J. et al. Genomic prediction of genotype× environment interaction kernel regression models. The Plant Genome 9 (2016).
Poland, J. et al. Genomic selection in wheat breeding using genotyping-by-sequencing. The Plant Genome 5, 103–113 (2012).
doi: 10.3835/plantgenome2012.06.0006
Yandell, B. S. et al. R/qtlbim: QTL with Bayesian interval mapping in experimental crosses. Bioinformatics 23, 641–643 (2007).
doi: 10.1093/bioinformatics/btm011
Yi, N. et al. An efficient Bayesian model selection approach for interacting quantitative trait loci models with many effects. Genetics 176, 1865–1877 (2007).
doi: 10.1534/genetics.107.071365
Yi, N. & Banerjee, S. Hierarchical generalized linear models for multiple quantitative trait locus mapping. Genetics 181, 1101–1113 (2009).
doi: 10.1534/genetics.108.099556
Piao, Z. et al. Bayesian dissection for genetic architecture of traits associated with nitrogen utilization efficiency in rice. African Journal of Biotechnology 8 (2009).
Hwang, C.-L. & Yoon, K. In Multiple attribute decision making 58–191 (Springer, 1981).
Assari, A. & Assari, E. Role of public participation in sustainability of historical city: usage of TOPSIS method. Indian Journal of Science and Technology 5, 2289–2294 (2012).
Henderson, C. R. Estimation of changes in herd environment. Journal of Dairy Science 32, 706–715 (1949).
Endelman, J. B. & Jannink, J.-L. Shrinkage estimation of the realized relationship matrix. G3: Genes, Genomes, Genetics 2, 1405–1413 (2012).
doi: 10.1534/g3.112.004259
Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. Journal of statistical software 33, 1 (2010).
doi: 10.18637/jss.v033.i01
Endelman, J. B. Ridge regression and other kernels for genomic selection with R package rrBLUP. The Plant Genome 4, 250–255 (2011).
doi: 10.3835/plantgenome2011.08.0024
Taylor, J. & Taylor, M. J. hett: Heteroscedastic t-Regression. R package version 0.3-2. https://CRAN.R-project.org/package=hett (2018).
Tanaka, E. Simple robust genomic prediction and outlier detection for a multi-environmental field trial. arXiv preprint arXiv:1807.07268 (2018).
Fisher, R. (Edinburgh, 1932).
Mudholkar, G. & George, E. In Symposium on optimizing methods in statistics. 345–366 (Academic Press New York).
Stouffer, S., Suchman, E., Devinney, L., Star, S. & Williams, R. (Princeton: Princeton University Press).