Development of a genomic tool for breed assignment by comparison of different classification models: Application to three local cattle breeds.

Keywords: SNP panel; breed assignment; classification; informative SNPs; local breeds; partial least squares

Journal

Journal of animal breeding and genetics = Zeitschrift fur Tierzuchtung und Zuchtungsbiologie
ISSN: 1439-0388
Abbreviated title: J Anim Breed Genet
Country: Germany
NLM ID: 100955807

Publication information

Publication date:
Jan 2022
History:
received: 11 03 2021
revised: 06 08 2021
accepted: 08 08 2021
pubmed: 25 8 2021
medline: 17 12 2021
entrez: 24 8 2021
Status: ppublish

Abstract

Assignment of individual cattle to a specific breed often cannot rely on pedigree information. This is especially the case for local breeds, for which genomic assignment tools are required so that individuals of unknown origin can be included in their herd books. A breed assignment model can be built in two stages: (a) the selection of breed-informative markers and (b) the assignment of individuals to a breed with a classification method. However, the performance of combinations of methods used in these two stages has rarely been studied. In this study, combinations of 16 different SNP panels with four classification methods were developed on 562 reference genotypes from 12 cattle breeds. Based on their performance, the best models were validated on three local breeds of interest. In cross-validation, 14 models had a global accuracy higher than 90%, with a maximum of 98.22%. In validation, the best models used 7,153 or 2,005 SNPs selected by partial least squares-discriminant analysis (PLS-DA) and assigned individuals to breeds with nearest shrunken centroids. The average validation sensitivities of the two best models for the three local breeds of interest were 98.33% and 97.5%. Moreover, the results reported in this study suggest that further studies should consider the PLS-DA method when selecting breed-informative SNPs.
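
To make stage (b) more concrete, the following is a minimal sketch of a nearest shrunken centroids classifier, written from the textbook formulation in Hastie, Tibshirani and Friedman's The Elements of Statistical Learning and not taken from the study's own code. It assumes genotypes coded as allele counts (0, 1, 2); the toy animals, the breed labels "A", "B", "C", the threshold `delta = 1.0` and the function names `fit_nsc`, `predict_nsc` and `sensitivity` are all hypothetical. The PLS-DA marker selection of stage (a) is not reproduced here; the list of SNPs that survive shrinkage is only a by-product of the classifier.

```python
import math
from statistics import median


def fit_nsc(X, y, delta):
    """Fit nearest shrunken centroids with shrinkage threshold `delta`."""
    n, p = len(X), len(X[0])
    breeds = sorted(set(y))
    rows = {b: [i for i, lab in enumerate(y) if lab == b] for b in breeds}
    overall = [sum(x[j] for x in X) / n for j in range(p)]
    cent = {b: [sum(X[i][j] for i in rows[b]) / len(rows[b]) for j in range(p)]
            for b in breeds}
    # Pooled within-breed standard deviation of each SNP.
    s = [math.sqrt(sum((X[i][j] - cent[b][j]) ** 2
                       for b in breeds for i in rows[b]) / (n - len(breeds)))
         for j in range(p)]
    s0 = median(s)                 # offset guarding against tiny variances
    scale = [sj + s0 for sj in s]  # assumed > 0 (true for the toy data below)
    shrunk, kept = {}, set()
    for b in breeds:
        m = math.sqrt(1 / len(rows[b]) - 1 / n)
        shrunk[b] = []
        for j in range(p):
            d = (cent[b][j] - overall[j]) / (m * scale[j])
            d_soft = math.copysign(max(abs(d) - delta, 0.0), d)  # soft threshold
            if d_soft != 0.0:
                kept.add(j)        # SNP still separates at least one breed
            shrunk[b].append(overall[j] + m * scale[j] * d_soft)
    priors = {b: len(rows[b]) / n for b in breeds}
    return {"centroids": shrunk, "scale": scale, "priors": priors,
            "kept": sorted(kept)}


def predict_nsc(model, x):
    """Assign one genotype to the breed with the smallest discriminant score."""
    def score(b):
        dist = sum(((xj - cj) / sj) ** 2
                   for xj, cj, sj in zip(x, model["centroids"][b], model["scale"]))
        return dist - 2 * math.log(model["priors"][b])
    return min(model["centroids"], key=score)


def sensitivity(y_true, y_pred, breed):
    """Share of animals of `breed` that were assigned to `breed`."""
    hits = [pred == breed for true, pred in zip(y_true, y_pred) if true == breed]
    return sum(hits) / len(hits)


# Hypothetical reference genotypes: 3 breeds x 4 animals x 5 SNPs.
X_ref = [
    [2, 0, 0, 1, 1], [2, 0, 1, 0, 1], [1, 0, 0, 1, 2], [2, 1, 0, 2, 1],  # breed A
    [0, 2, 0, 1, 1], [0, 2, 1, 2, 1], [1, 2, 0, 0, 1], [0, 1, 0, 1, 2],  # breed B
    [0, 0, 2, 1, 1], [0, 1, 2, 0, 1], [1, 0, 2, 1, 2], [0, 0, 1, 2, 1],  # breed C
]
y_ref = ["A"] * 4 + ["B"] * 4 + ["C"] * 4

# Hypothetical validation animals that were not used for fitting.
X_val = [[2, 0, 0, 0, 2], [2, 1, 0, 1, 1],
         [0, 2, 0, 2, 1], [1, 2, 0, 1, 1],
         [0, 0, 2, 0, 1], [0, 1, 2, 2, 2]]
y_val = ["A", "A", "B", "B", "C", "C"]

model = fit_nsc(X_ref, y_ref, delta=1.0)
y_pred = [predict_nsc(model, x) for x in X_val]
accuracy = sum(t == p for t, p in zip(y_val, y_pred)) / len(y_val)

print("SNPs retained after shrinkage:", model["kept"])
print("Global accuracy:", accuracy)
for breed in ("A", "B", "C"):
    print("Sensitivity for breed", breed, "=", sensitivity(y_val, y_pred, breed))
```

In this toy setting the last two SNPs have identical mean genotypes in every breed, so shrinkage pulls their centroids onto the overall mean and they no longer influence the assignment. The per-breed sensitivity and global accuracy printed at the end correspond to the kinds of metrics quoted in the abstract, computed here on made-up data only.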

Identifiers

pubmed: 34427366
doi: 10.1111/jbg.12643

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

40-61

Grants

Agency: Service Public de Wallonie
Agency: Terra Teaching and Research Centre
Agency: Administration of Technical Agricultural Services
Agency: INTERREG V France-Wallonie-Vlaanderen
ID: 2.3.284
Agency: Fonds De La Recherche Scientifique - FNRS
ID: T.0095.19
Agency: Fonds De La Recherche Scientifique - FNRS
ID: J.0174.18

Copyright information

© 2021 Wiley-VCH GmbH.

Authors

Hélène Wilmot (H)

National Fund for Scientific Research (F.R.S.-FNRS), Brussels, Belgium.
TERRA Teaching and Research Centre, Gembloux Agro-Bio Tech, University of Liège, Gembloux, Belgium.

Jeanne Bormann (J)

Administration of Technical Agricultural Services (ASTA), Luxembourg, Grand Duchy of Luxembourg.

Hélène Soyeurt (H)

TERRA Teaching and Research Centre, Gembloux Agro-Bio Tech, University of Liège, Gembloux, Belgium.

Xavier Hubin (X)

Walloon Breeders Association, Ciney, Belgium.

Géry Glorieux (G)

Walloon Breeders Association, Ciney, Belgium.

Patrick Mayeres (P)

Walloon Breeders Association, Ciney, Belgium.

Carlo Bertozzi (C)

Walloon Breeders Association, Ciney, Belgium.

Nicolas Gengler (N)

TERRA Teaching and Research Centre, Gembloux Agro-Bio Tech, University of Liège, Gembloux, Belgium.
