Exploring the predictive capability of advanced machine learning in identifying severe disease phenotype in Salmonella enterica.
Disease phenotypes
Disease severity
Food safety
Machine learning
Predictive modeling
Salmonella
Journal
Food research international (Ottawa, Ont.)
ISSN: 1873-7145
Abbreviated title: Food Res Int
Country: Canada
NLM ID: 9210143
Publication information
Publication date: 2022-01
History:
received: 2021-09-17
revised: 2021-11-12
accepted: 2021-11-17
entrez: 2022-01-04
pubmed: 2022-01-05
medline: 2022-01-28
Status: ppublish
Abstract
The past few years have seen a significant increase in the availability of whole genome sequencing (WGS) information, allowing for its incorporation in predictive modeling for foodborne pathogens to account for inter- and intra-species differences in their virulence. However, this is hindered by the inability of traditional statistical methods to analyze such large amounts of data compared to the number of observations/isolates. In this study, we have explored the applicability of machine learning (ML) models to predict the disease outcome, while identifying features that exert a significant effect on the prediction. This study was conducted on Salmonella enterica, a major foodborne pathogen with considerable inter- and intra-serovar variation. WGS data from isolates obtained from various sources (i.e., human, chicken, and swine) were used as input in four machine learning models (logistic regression with ridge, random forest, support vector machine, and AdaBoost) to classify isolates based on disease severity (extraintestinal vs. gastrointestinal) in the host. The predictive performances of all models were tested with and without Elastic Net regularization to combat dimensionality issues. The Elastic Net-regularized logistic regression model showed the best area under the receiver operating characteristic curve (AUC-ROC; 0.86) and outcome prediction accuracy (0.76). Additionally, genes coding for transcriptional regulation, acidic, oxidative, and anaerobic stress response, and antibiotic resistance were found to be significant predictors of disease severity. These genes, which were significantly associated with each outcome, could possibly be input in amended, gene-expression-specific predictive models to estimate virulence pattern-specific effect of Salmonella and other foodborne pathogens on human health.
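The abstract reports two evaluation metrics, AUC-ROC and accuracy, for a binary classifier. As a rough illustration of how such metrics can be computed from predicted scores, here is a minimal standard-library Python sketch. It is not the authors' code: the function names, the 0.5 decision threshold, the label encoding (1 = extraintestinal, 0 = gastrointestinal), and the toy labels and scores are all assumptions made for this example and are unrelated to the study's data.

```python
def roc_auc(labels, scores):
    """AUC-ROC via the pairwise (Mann-Whitney) formulation.

    Equals the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples classified correctly at the given threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


# Hypothetical toy data (assumed encoding: 1 = extraintestinal,
# 0 = gastrointestinal); scores stand in for predicted probabilities.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]

toy_auc = roc_auc(toy_labels, toy_scores)
toy_acc = accuracy(toy_labels, toy_scores)
print(f"AUC-ROC: {toy_auc:.3f}, accuracy: {toy_acc:.3f}")
```

AUC-ROC is threshold-independent and measures ranking quality, while accuracy depends on the chosen cutoff, which is one reason the two numbers for the same model can differ.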
Identifiers
pubmed: 34980422
pii: S0963-9969(21)00717-1
doi: 10.1016/j.foodres.2021.110817
Publication types
Journal Article
Research Support, U.S. Gov't, Non-P.H.S.
Languages
eng
Citation subsets
IM
Pagination
110817
Grants
Agency: FDA HHS
ID: U01 FD001418
Country: United States
Copyright information
Copyright © 2021 Elsevier Ltd. All rights reserved.