Comparison of machine learning methods for genomic prediction of selected Arabidopsis thaliana traits.


Journal

PloS one
ISSN: 1932-6203
Abbreviated title: PLoS One
Country: United States
NLM ID: 101285081

Publication information

Publication date:
2024
History:
received: 2023-11-29
accepted: 2024-08-04
medline: 2024-08-29
pubmed: 2024-08-29
entrez: 2024-08-28
Status: epublish

Abstract

We present a comparison of machine learning methods for the prediction of four quantitative traits in Arabidopsis thaliana. High prediction accuracies were achieved on individuals grown under standardized laboratory conditions from the 1001 Arabidopsis Genomes Project. An existing body of evidence suggests that linear models may be impeded by their inability to make use of non-additive effects to explain phenotypic variation at the population level. The results presented here use a nested cross-validation approach to confirm that some machine learning methods have the ability to statistically outperform linear prediction models, with the optimal model dependent on availability of training data and genetic architecture of the trait in question. Linear models were competitive in their performance as per previous work, though the neural network class of predictors was observed to be the most accurate and robust for traits with high heritability. The extent to which non-linear models exploit interaction effects will require further investigation of the causal pathways that lie behind their predictions. Future work utilizing more traits and larger sample sizes, combined with an improved understanding of their respective genetic architectures, may lead to improvements in prediction accuracy.
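
The nested cross-validation idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes a single ridge-regression (linear) predictor, synthetic 0/1 genotype data standing in for real markers, Pearson correlation as the accuracy measure, and hypothetical helper names (`ridge_fit`, `nested_cv`, and so on). The inner loop picks the penalty using only the outer-training data, and the outer loop scores the refitted model on samples that played no part in that choice.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression on centred data; returns (weights, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w

def ridge_predict(model, X):
    w, b = model
    return X @ w + b

def pearson_r(a, b):
    """Pearson correlation between predictions and observations (assumed metric)."""
    return float(np.corrcoef(a, b)[0, 1])

def k_folds(n, k, rng):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    return np.array_split(rng.permutation(n), k)

def nested_cv(X, y, lambdas, outer_k=5, inner_k=3, seed=0):
    rng = np.random.default_rng(seed)
    outer_scores, chosen = [], []
    for test_idx in k_folds(len(y), outer_k, rng):
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        X_tr, y_tr = X[train_idx], y[train_idx]
        # Inner loop: choose lambda using the outer-training data only.
        inner = k_folds(len(y_tr), inner_k, rng)
        mean_r = []
        for lam in lambdas:
            rs = []
            for val_idx in inner:
                fit_idx = np.setdiff1d(np.arange(len(y_tr)), val_idx)
                model = ridge_fit(X_tr[fit_idx], y_tr[fit_idx], lam)
                preds = ridge_predict(model, X_tr[val_idx])
                rs.append(pearson_r(preds, y_tr[val_idx]))
            mean_r.append(np.mean(rs))
        best = lambdas[int(np.argmax(mean_r))]
        # Outer loop: refit with the chosen lambda, score on the held-out fold.
        model = ridge_fit(X_tr, y_tr, best)
        outer_scores.append(pearson_r(ridge_predict(model, X[test_idx]), y[test_idx]))
        chosen.append(best)
    return outer_scores, chosen

# Synthetic stand-in for genotype data (an assumption, not the study's data).
rng = np.random.default_rng(42)
n, p = 200, 50
X = rng.integers(0, 2, size=(n, p)).astype(float)   # 0/1 marker coding
effects = rng.normal(size=p)                         # purely additive effects
y = X @ effects + rng.normal(scale=2.0, size=n)      # trait = signal + noise

lambdas = [0.1, 1.0, 10.0, 100.0]
scores, chosen = nested_cv(X, y, lambdas)
print("outer-fold Pearson r:", np.round(scores, 3))
print("selected lambda per fold:", chosen)
```

Comparing model classes in this framework would mean running the same outer folds for each candidate (for example a neural network alongside the linear model) and testing the difference in outer-fold scores; that step is left out here to keep the sketch short.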

Identifiers

pubmed: 39196916
doi: 10.1371/journal.pone.0308962
pii: PONE-D-23-39918

Publication types

Journal Article; Comparative Study

Languages

eng

Citation subsets

IM

Pagination

e0308962

Copyright information

Copyright: © 2024 Kelly, McLaughlin. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Conflict of interest statement

The authors have declared that no competing interests exist.

Authors

Ciaran Michael Kelly (CM)

Smurfit Institute of Genetics, Trinity College Dublin, Dublin, Ireland.

Russell Lewis McLaughlin (RL)

Smurfit Institute of Genetics, Trinity College Dublin, Dublin, Ireland.
