The use of a genomic relationship matrix for breed assignment of cattle breeds: comparison and combination with a machine learning method.
breed assignment
genomic relationship matrix
local breeds
machine learning
single nucleotide polymorphism
support vector machine
Journal
Journal of animal science
ISSN: 1525-3163
Abbreviated title: J Anim Sci
Country: United States
NLM ID: 8003002
Publication information
Publication date:
03 Jan 2023
History:
received: 20 Feb 2023
accepted: 20 May 2023
medline: 19 Jun 2023
pubmed: 24 May 2023
entrez: 23 May 2023
Status:
ppublish
Abstract
To develop a breed assignment model, three main steps are generally followed: 1) the selection of breed-informative single nucleotide polymorphisms (SNP); 2) the training of a model, based on a reference population, that classifies animals according to their breed of origin; and 3) the validation of the developed model on external animals, i.e., animals that were not used in the previous steps. However, there is no consensus in the literature about which methodology to follow for the first step, nor about the number of SNP to select. This can raise many questions when developing the model and lead to the use of sophisticated methodologies for selecting SNP (e.g., iterative algorithms, partitions of SNP, or combinations of several methods). Therefore, it may be of interest to avoid the first step by using all the available SNP. For this purpose, we propose the use of a genomic relationship matrix (GRM), combined or not with a machine learning method, for breed assignment. We compared it with a previously developed model based on selected informative SNP. Four methodologies were investigated: 1) the PLS_NSC methodology: selection of SNP based on a partial least squares-discriminant analysis (PLS-DA) and breed assignment by classification based on the nearest shrunken centroids (NSC) method; 2) breed assignment based on the highest mean relatedness of an animal to the reference populations of each breed (referred to as mean_GRM); 3) breed assignment based on the highest SD of the relatedness of an animal to the reference populations of each breed (referred to as SD_GRM); and 4) the GRM_SVM methodology: the use of the means and SD of the relatedness defined in the mean_GRM and SD_GRM methodologies, combined with the linear support vector machine (SVM), a machine learning method used for classification.
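As an illustration of the GRM-based methodologies, the following is a minimal sketch, not the authors' script. It assumes a VanRaden-style GRM computed from 0/1/2 allele counts with allele frequencies estimated from all animals pooled; the abstract does not state which GRM formula was used. The function names, the toy genotypes, and the breed labels "A" and "B" are hypothetical.

```python
import numpy as np

def genomic_relationship_matrix(genotypes):
    """GRM from an (animals x SNP) matrix of 0/1/2 allele counts.

    Assumption: VanRaden-style scaling, G = ZZ' / (2 * sum(p * (1 - p))),
    with allele frequencies p estimated from all animals in the matrix.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    p = genotypes.mean(axis=0) / 2.0      # allele frequency per SNP
    z = genotypes - 2.0 * p               # centre each SNP column
    return z @ z.T / (2.0 * np.sum(p * (1.0 - p)))

def relatedness_features(grm, animal, reference):
    """Mean and SD of one animal's relatedness to each breed's reference animals.

    `reference` maps a breed label to the GRM row indices of its reference animals.
    """
    return {
        breed: (grm[animal, rows].mean(), grm[animal, rows].std(ddof=1))
        for breed, rows in reference.items()
    }

def assign_mean_grm(features):
    """mean_GRM rule: pick the breed with the highest mean relatedness."""
    return max(features, key=lambda breed: features[breed][0])

# Hypothetical toy data: three reference animals per breed and one animal to assign.
genotypes = [
    [2, 2, 2, 0, 0, 0],  # breed A reference
    [2, 2, 1, 0, 0, 1],  # breed A reference
    [2, 1, 2, 0, 1, 0],  # breed A reference
    [0, 0, 0, 2, 2, 2],  # breed B reference
    [0, 0, 1, 2, 2, 1],  # breed B reference
    [0, 1, 0, 2, 1, 2],  # breed B reference
    [2, 2, 1, 0, 1, 0],  # animal to assign
]
reference = {"A": [0, 1, 2], "B": [3, 4, 5]}

grm = genomic_relationship_matrix(genotypes)
features = relatedness_features(grm, animal=6, reference=reference)
print(assign_mean_grm(features))  # A
```

The SD_GRM rule would select on the second element of each feature pair instead of the first. For a GRM_SVM-style model, the per-breed means and SD could be concatenated into one feature vector per animal and passed to a linear SVM classifier; that step is omitted here.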
Regarding mean global accuracies, results showed that the use of mean_GRM or GRM_SVM did not differ significantly (Bonferroni-corrected P > 0.0083) from the model based on a reduced SNP panel (PLS_NSC). Moreover, the mean_GRM and GRM_SVM methodologies were more efficient than PLS_NSC as they were faster to compute. Therefore, it is possible to bypass the selection of SNP and, by using a GRM, to develop an efficient breed assignment model. For routine use, we recommend GRM_SVM over mean_GRM as it gave a slightly higher global accuracy, which can help endangered breeds to be maintained. The script to execute the different methodologies can be accessed at: https://github.com/hwilmot675/Breed_assignment.
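The 0.0083 threshold is consistent with a Bonferroni correction over the six pairwise comparisons among the four methodologies. The short check below assumes a nominal significance level of 0.05 and pairwise testing; neither is stated in the abstract.

```python
from itertools import combinations

# The four methodologies named in the abstract.
methods = ["PLS_NSC", "mean_GRM", "SD_GRM", "GRM_SVM"]

# Assumption: every pair of methodologies is compared once.
pairs = list(combinations(methods, 2))

alpha = 0.05                    # assumed nominal significance level
threshold = alpha / len(pairs)  # Bonferroni-corrected threshold

print(len(pairs), round(threshold, 4))  # 6 0.0083
```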
Other abstracts
Type: plain-language-summary (eng)
Breed assignment models generally rely on three main steps: 1) selection of markers that distinguish the breeds under study; 2) development of a classification model that assigns each animal to its breed of origin; and 3) validation of the developed model with new animals, to verify that the model is not overfitted. The first step often raises questions about which methodology to use to select the best markers and about how many markers to select. It can therefore be useful to avoid this first step and to use a methodology that performs similarly without the need for single nucleotide polymorphism (SNP) selection. In this study, we developed different methodologies based on the genomic relationship matrix (GRM), combined or not with a machine learning method, to assign animals to their breed of origin. The results showed that the model based on a GRM combined with a machine learning method achieved a percentage of correct assignment equivalent to that of a previously developed model relying on SNP selection, while being substantially faster to compute. It is therefore possible to assign animals to their breed by using a GRM and to bypass the first step of SNP selection.
Identifiers
pubmed: 37220912
pii: 7176407
doi: 10.1093/jas/skad172
pmc: PMC10276639
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Grants
Agency: Fonds De La Recherche Scientifique - FNRS
Agency: Wallonia-Brussels Federation
Copyright information
© The Author(s) 2023. Published by Oxford University Press on behalf of the American Society of Animal Science.