Forecasting dominance of SARS-CoV-2 lineages by anomaly detection using deep AutoEncoders.
SARS-CoV-2 (sub)lineages
anomaly detection
deep learning
genomic surveillance
spike protein sequences
Journal
Briefings in bioinformatics
ISSN: 1477-4054
Abbreviated title: Brief Bioinform
Country: England
NLM ID: 100912837
Publication information
Publication date:
23 Sep 2024
History:
received: 25 Jul 2024
revised: 10 Sep 2024
accepted: 08 Oct 2024
medline: 24 Oct 2024
pubmed: 24 Oct 2024
entrez: 24 Oct 2024
Status:
ppublish
Abstract
The COVID-19 pandemic has been marked by the successive emergence of new SARS-CoV-2 variants, lineages, and sublineages that outcompete earlier strains, largely owing to factors such as increased transmissibility and immune escape. We propose DeepAutoCoV, an unsupervised deep learning anomaly detection system, to predict future dominant lineages (FDLs). We define FDLs as viral (sub)lineages that will constitute >10% of all the viral sequences added in a given week to GISAID, a public database for sharing viral genetic sequences. DeepAutoCoV is trained and validated on global and country-specific data sets assembled from over 16 million Spike protein sequences sampled over a period of ~4 years. DeepAutoCoV successfully flags FDLs at very low frequencies (0.01%-3%), with median lead times of 4-17 weeks, and predicts FDLs between ~5 and ~25 times better than a baseline approach. For example, the B.1.617.2 vaccine reference strain was flagged as an FDL when its frequency was only 0.01%, more than a year before it was considered for an updated COVID-19 vaccine. Furthermore, DeepAutoCoV outputs interpretable results by pinpointing specific mutations potentially linked to increased fitness, and it may provide significant insights for optimizing 'pre-emptive' public health intervention strategies.
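The FDL definition in the abstract (a lineage exceeding 10% of the sequences added in a given week) can be illustrated with a minimal Python sketch. This is not the authors' code and it does not implement the DeepAutoCoV autoencoder. The function names, the `(week, lineage)` record layout, and the toy data are assumptions made purely for illustration; only the >10% threshold comes from the abstract.

```python
from collections import Counter, defaultdict

# ">10%" threshold taken from the abstract's FDL definition.
FDL_THRESHOLD = 0.10


def weekly_frequencies(records):
    """Return {week: {lineage: fraction of that week's sequences}}.

    `records` is an iterable of (week, lineage) pairs, one per submitted
    sequence. This layout is hypothetical, not GISAID's actual schema.
    """
    counts = defaultdict(Counter)
    for week, lineage in records:
        counts[week][lineage] += 1
    return {
        week: {lin: n / sum(c.values()) for lin, n in c.items()}
        for week, c in counts.items()
    }


def first_dominant_week(records, threshold=FDL_THRESHOLD):
    """Map each lineage to the first week its weekly share exceeds `threshold`."""
    freqs = weekly_frequencies(records)
    first = {}
    for week in sorted(freqs):
        for lineage, share in freqs[week].items():
            if share > threshold and lineage not in first:
                first[lineage] = week
    return first


# Toy data (invented): lineage "B" grows from 5% in week 1 to 20% in week 2.
toy = [(1, "A")] * 95 + [(1, "B")] * 5 + [(2, "A")] * 80 + [(2, "B")] * 20
dominant = first_dominant_week(toy)
```

With this toy input, "A" is already above the threshold in week 1 and "B" first crosses it in week 2. A lead time in the abstract's sense would then be the gap between the week a lineage is flagged and the week returned here.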
Identifiers
pubmed: 39446192
pii: 7833672
doi: 10.1093/bib/bbae535
Chemical substances
Spike Glycoprotein, Coronavirus
0
spike protein, SARS-CoV-2
0
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Grants
Agency: NIH HHS
Country: United States
Agency: NIAID NIH HHS
ID: R01 AI170187
Country: United States
Copyright information
© The Author(s) 2024. Published by Oxford University Press.