Using machine learning to predict COVID-19 infection and severity risk among 4,510 aged adults: a UK Biobank cohort study.
Keywords: COVID-19; SARS-CoV-2; antibodies; cohort study; epidemiology; host response; linear discriminant analysis; machine learning; non-parametric; serology
Journal: medRxiv: the preprint server for health sciences
Abbreviated title: medRxiv
Country: United States
NLM ID: 101767986
Publication information
Publication date: 05 Jan 2021
History:
pubmed: 25 Jun 2020
medline: 25 Jun 2020
entrez: 25 Jun 2020
Status: epublish
Abstract
BACKGROUND
Many risk factors have emerged for novel 2019 coronavirus disease (COVID-19). Little is known about how these factors collectively predict the risk of COVID-19 infection, or the risk of severe infection (i.e., hospitalization).
METHODS
Among older adults (69.3 ± 8.6 years) in UK Biobank, COVID-19 data were downloaded for 4,510 participants with 7,539 test cases. We also downloaded baseline data collected 10-14 years earlier. These included demographics, biochemistry, body mass, and other factors, as well as antibody titers for 20 infectious diseases ranging from common to rare. Permutation-based linear discriminant analysis was used to predict COVID-19 risk and hospitalization risk. Probability and threshold metrics included receiver operating characteristic (ROC) curves, from which we derived the area under the curve (AUC), specificity, sensitivity, and quadratic mean.
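The abstract names the method but not its implementation. The pure-Python sketch below shows one way the pieces could fit together. It fits a two-class Fisher linear discriminant and runs a permutation test that compares the observed AUC with AUCs from refits on shuffled labels. It then reports sensitivity, specificity, and quadratic mean at a cut-off. The sketch rests on several assumptions, none taken from the authors' code: the function names, the small ridge term, the permutation count, the in-sample AUC, and reading "quadratic mean" as the root mean square of sensitivity and specificity.

```python
import math
import random


def auc_mann_whitney(scores, labels):
    """AUC as P(random positive scores above random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def solve(a, b):
    """Solve the small dense system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x


def fisher_lda(X, y, ridge=1e-6):
    """Two-class Fisher LDA weights w = Sw^-1 (mu1 - mu0); ridge is an assumed stabilizer."""
    k = len(X[0])
    groups = {g: [x for x, lab in zip(X, y) if lab == g] for g in (0, 1)}
    mu = {g: [sum(r[j] for r in rows) / len(rows) for j in range(k)]
          for g, rows in groups.items()}
    sw = [[ridge if i == j else 0.0 for j in range(k)] for i in range(k)]
    for g, rows in groups.items():  # pooled within-class scatter matrix
        for x in rows:
            d = [x[j] - mu[g][j] for j in range(k)]
            for i in range(k):
                for j in range(k):
                    sw[i][j] += d[i] * d[j]
    return solve(sw, [mu[1][j] - mu[0][j] for j in range(k)])


def project(X, w):
    """Discriminant score for each row: the dot product with w."""
    return [sum(wi * xi for wi, xi in zip(w, x)) for x in X]


def permutation_pvalue(X, y, n_perm=200, seed=0):
    """Compare the observed in-sample AUC with AUCs from refits on shuffled labels."""
    rng = random.Random(seed)
    observed = auc_mann_whitney(project(X, fisher_lda(X, y)), y)
    hits = 0
    for _ in range(n_perm):
        shuffled = y[:]
        rng.shuffle(shuffled)
        if auc_mann_whitney(project(X, fisher_lda(X, shuffled)), shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


def threshold_metrics(scores, labels, cut):
    """Sensitivity, specificity, and their quadratic mean (assumed RMS) at a cut-off."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cut)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cut)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cut)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cut)
    sens, spec = tp / (tp + fn), tn / (tn + fp)
    return sens, spec, math.sqrt((sens ** 2 + spec ** 2) / 2)


if __name__ == "__main__":
    # Hypothetical synthetic data (two features, cases shifted upward); not study data.
    rng = random.Random(42)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20)]
    X += [[rng.gauss(1, 1), rng.gauss(0.5, 1)] for _ in range(20)]
    y = [0] * 20 + [1] * 20
    auc, p = permutation_pvalue(X, y, n_perm=200)
    scores = project(X, fisher_lda(X, y))
    cut = sorted(scores)[len(scores) // 2]  # median score as an arbitrary cut-off
    print(f"AUC={auc:.3f}, permutation p={p:.3f}", threshold_metrics(scores, y, cut))
```

Because the same refit-and-score procedure is applied to every shuffled label set, optimism from in-sample fitting affects the observed and null AUCs alike. A real analysis would still need held-out data to report a trustworthy AUC.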
RESULTS
The "best-fit" model for predicting COVID-19 risk achieved excellent discrimination (AUC = 0.969, 95% CI = 0.934-1.000). Its predictors included age, immune markers, lipids, and serology titers to common pathogens such as human cytomegalovirus. The "best-fit" hospitalization model was more modest (AUC = 0.803, 95% CI = 0.663-0.943) and included only serology titers.
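The abstract reports 95% CIs for both AUCs but does not say how they were computed. One common option is the Hanley and McNeil (1982) normal approximation, sketched below. Treating it as the authors' method is an assumption, and the case counts in the usage note are hypothetical rather than the study's.

```python
import math


def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.96):
    """Approximate CI for an AUC from the Hanley & McNeil (1982) standard error."""
    q1 = auc / (2 - auc)           # P(two random positives both outrank one negative)
    q2 = 2 * auc ** 2 / (1 + auc)  # P(one positive outranks two random negatives)
    var = (auc * (1 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    se = math.sqrt(var)
    # Clip to the valid AUC range [0, 1]; an upper limit of 1.000 can arise this way.
    return max(0.0, auc - z * se), min(1.0, auc + z * se)
```

Take, for example, `hanley_mcneil_ci(0.969, 20, 20)`, with 20 hypothetical positives and 20 hypothetical negatives. Its upper limit clips to 1.0, which shows how a reported upper bound of 1.000 can arise. The authors may have used a different method, such as bootstrapping.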
CONCLUSIONS
Accurate risk profiles can be created using standard self-report and biomedical data collected in public health and medical settings. It is also worth investigating further whether prior host immunity predicts current host immunity to COVID-19.
Identifiers
pubmed: 32577673
doi: 10.1101/2020.06.09.20127092
pmc: PMC7302228
Publication types
Preprint
Languages
eng
Grants
Agency: NIA NIH HHS
ID: K99 AG047282
Country: United States
Agency: NIA NIH HHS
ID: R00 AG047282
Country: United States
Comments and corrections
Type: UpdateIn
Competing interests statement
The authors declare that they have no competing interests.