Forecasting dominance of SARS-CoV-2 lineages by anomaly detection using deep AutoEncoders.


Journal

Briefings in bioinformatics
ISSN: 1477-4054
Abbreviated title: Brief Bioinform
Country: England
NLM ID: 100912837

Publication information

Publication date:
23 Sep 2024
History:
received: 25 07 2024
revised: 10 09 2024
accepted: 08 10 2024
medline: 24 10 2024
pubmed: 24 10 2024
entrez: 24 10 2024
Status: ppublish

Abstract

The COVID-19 pandemic is marked by the successive emergence of new SARS-CoV-2 variants, lineages, and sublineages that outcompete earlier strains, largely due to factors such as increased transmissibility and immune escape. We propose DeepAutoCoV, an unsupervised deep learning anomaly detection system, to predict future dominant lineages (FDLs). We define FDLs as viral (sub)lineages that will constitute >10% of all the viral sequences added to GISAID, a public database supporting viral genetic sequence sharing, in a given week. DeepAutoCoV is trained and validated by assembling global and country-specific data sets from over 16 million Spike protein sequences sampled over a period of ~4 years. DeepAutoCoV successfully flags FDLs at very low frequencies (0.01%-3%), with median lead times of 4-17 weeks, and predicts FDLs between ~5 and ~25 times better than a baseline approach. For example, the B.1.617.2 vaccine reference strain was flagged as FDL when its frequency was only 0.01%, more than a year before it was considered for an updated COVID-19 vaccine. Furthermore, DeepAutoCoV outputs interpretable results by pinpointing specific mutations potentially linked to increased fitness and may provide significant insights for the optimization of public health 'pre-emptive' intervention strategies.
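The FDL definition in the abstract (a lineage exceeding 10% of the sequences submitted in a given week) can be illustrated with a minimal Python sketch. This is not DeepAutoCoV or its autoencoder; it only shows how the dominance threshold and a lead time could be computed from weekly submission counts. The function names, the `(week, lineage)` record layout, and the toy lineages "A" and "B" are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Share of weekly submissions above which a lineage counts as dominant,
# following the >10% definition stated in the abstract.
FDL_THRESHOLD = 0.10


def weekly_frequencies(records):
    """Map each week to {lineage: share of that week's sequences}.

    `records` is an iterable of (week, lineage) pairs, one per sequence
    (a hypothetical layout, not the GISAID export format).
    """
    counts = defaultdict(Counter)
    for week, lineage in records:
        counts[week][lineage] += 1
    return {
        week: {lin: n / sum(c.values()) for lin, n in c.items()}
        for week, c in counts.items()
    }


def dominant_lineages(records, threshold=FDL_THRESHOLD):
    """Return {lineage: first week in which its share exceeded the threshold}."""
    first_week = {}
    freqs = weekly_frequencies(records)
    for week in sorted(freqs):
        for lineage, share in freqs[week].items():
            if share > threshold and lineage not in first_week:
                first_week[lineage] = week
    return first_week


def lead_time_weeks(flag_week, dominance_week):
    """Weeks between an early flag and the week dominance was reached."""
    return dominance_week - flag_week


# Toy data: hypothetical lineage "B" grows from 1% to 20% over three weeks.
toy_records = (
    [(1, "A")] * 99 + [(1, "B")] * 1
    + [(2, "A")] * 95 + [(2, "B")] * 5
    + [(3, "A")] * 80 + [(3, "B")] * 20
)

dominance = dominant_lineages(toy_records)
```

In this toy data, a detector that flagged "B" in week 1, when it made up 1% of submissions, would have a lead time of `lead_time_weeks(1, dominance["B"])` weeks over the threshold-based definition.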

Identifiers

pubmed: 39446192
pii: 7833672
doi: 10.1093/bib/bbae535

Chemical substances

Spike Glycoprotein, Coronavirus 0
spike protein, SARS-CoV-2 0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Grants

Agency: NIH HHS
Country: United States
Agency: NIAID NIH HHS
ID: R01 AI170187
Country: United States

Copyright information

© The Author(s) 2024. Published by Oxford University Press.

Authors

Simone Rancati (S)

Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Via Adolfo Ferrata 5, Pavia, 27100, Italy.

Giovanna Nicora (G)

Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Via Adolfo Ferrata 5, Pavia, 27100, Italy.

Mattia Prosperi (M)

Department of Epidemiology, College of Public Health and Health Professions, University of Florida, 2004 Mowry Road, Gainesville, FL 32610, United States.
Emerging Pathogens Institute, University of Florida, 2055 Mowry Road, Gainesville, FL 32610, United States.

Riccardo Bellazzi (R)

Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Via Adolfo Ferrata 5, Pavia, 27100, Italy.

Marco Salemi (M)

Emerging Pathogens Institute, University of Florida, 2055 Mowry Road, Gainesville, FL 32610, United States.
Department of Pathology, Immunology and Laboratory Medicine, College of Medicine, University of Florida, 1600 SW Archer Road, Gainesville, FL 32610, United States.

Simone Marini (S)

Department of Epidemiology, College of Public Health and Health Professions, University of Florida, 2004 Mowry Road, Gainesville, FL 32610, United States.
Emerging Pathogens Institute, University of Florida, 2055 Mowry Road, Gainesville, FL 32610, United States.
