A machine learning methodology for real-time forecasting of the 2019-2020 COVID-19 outbreak using Internet searches, news alerts, and estimates from mechanistic models.


Journal

ArXiv
ISSN: 2331-8422
Titre abrégé: ArXiv
Pays: United States
ID NLM: 101759493

Informations de publication

Date de publication:
08 Apr 2020
Historique:
pubmed: 19 6 2020
medline: 19 6 2020
entrez: 19 6 2020
Statut: epublish

Résumé

We present a timely and novel methodology that combines disease estimates from mechanistic models with digital traces, via interpretable machine-learning methodologies, to reliably forecast COVID-19 activity in Chinese provinces in real-time. Specifically, our method is able to produce stable and accurate forecasts 2 days ahead of current time, and uses as inputs (a) official health reports from Chinese Center Disease for Control and Prevention (China CDC), (b) COVID-19-related internet search activity from Baidu, (c) news media activity reported by Media Cloud, and (d) daily forecasts of COVID-19 activity from GLEAM, an agent-based mechanistic model. Our machine-learning methodology uses a clustering technique that enables the exploitation of geo-spatial synchronicities of COVID-19 activity across Chinese provinces, and a data augmentation technique to deal with the small number of historical disease activity observations, characteristic of emerging outbreaks. Our model's predictive power outperforms a collection of baseline models in 27 out of the 32 Chinese provinces, and could be easily extended to other geographies currently affected by the COVID-19 outbreak to help decision makers.

Identifiants

pubmed: 32550248
pii: 2004.04019
pmc: PMC7280902
pii:

Types de publication

News Preprint

Langues

eng

Subventions

Organisme : NIGMS NIH HHS
ID : R01 GM130668
Pays : United States

Auteurs

Classifications MeSH