Real-Time Forecasting of the COVID-19 Outbreak in Chinese Provinces: Machine Learning Approach Using Novel Digital Data and Estimates From Mechanistic Models.

COVID-19; coronavirus; digital data; digital epidemiology; emerging outbreak; forecasting; hybrid model; hybrid simulation; machine learning; machine learning in public health; mechanistic model; modeling; modeling disease outbreaks; precision public health; simulation

Journal

Journal of Medical Internet Research
ISSN: 1438-8871
Abbreviated title: J Med Internet Res
Country: Canada
NLM ID: 100959882

Publication information

Publication date:
17 08 2020
History:
received: 14 05 2020
accepted: 24 07 2020
revised: 24 07 2020
pubmed: 31 7 2020
medline: 28 8 2020
entrez: 31 7 2020
Status: epublish

Abstract

BACKGROUND
The inherent difficulty of identifying and monitoring emerging outbreaks caused by novel pathogens can lead to their rapid spread; and if left unchecked, they may become major public health threats to the planet. The ongoing coronavirus disease (COVID-19) outbreak, which has infected over 2,300,000 individuals and caused over 150,000 deaths, is an example of one of these catastrophic events.
OBJECTIVE
We present a timely and novel methodology that combines disease estimates from mechanistic models and digital traces, via interpretable machine learning methodologies, to reliably forecast COVID-19 activity in Chinese provinces in real time.
METHODS
Our method uses the following as inputs: (a) official health reports, (b) COVID-19-related internet search activity, (c) news media activity, and (d) daily forecasts of COVID-19 activity from a metapopulation mechanistic model. Our machine learning methodology uses a clustering technique that enables the exploitation of geospatial synchronicities of COVID-19 activity across Chinese provinces and a data augmentation technique to deal with the small number of historical disease observations characteristic of emerging outbreaks.
RESULTS
Our model is able to produce stable and accurate forecasts 2 days ahead of the current time and outperforms a collection of baseline models in 27 out of 32 Chinese provinces.
CONCLUSIONS
Our methodology could be easily extended to other geographies currently affected by COVID-19 to aid decision makers with monitoring and possibly prevention.
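
The Methods paragraph names two ideas, grouping provinces whose COVID-19 activity moves together and augmenting a small set of historical observations, without giving their details. The Python sketch below illustrates both ideas under loud assumptions: greedy correlation-threshold grouping and bootstrap resampling with Gaussian noise are stand-ins chosen for brevity, not the authors' method, and every function name, parameter value, and data point is hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cluster_by_correlation(series, threshold=0.9):
    """Greedy grouping (an assumption, not the paper's algorithm): a region joins
    the first cluster whose seed series it correlates with at or above
    `threshold`; otherwise it seeds a new cluster."""
    clusters = []  # each cluster is a list of region names; the first is the seed
    for name in sorted(series):
        for members in clusters:
            if pearson(series[name], series[members[0]]) >= threshold:
                members.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def augment(rows, n_copies, noise_sd, rng):
    """Bootstrap augmentation (an assumption): resample (features, target) rows
    with replacement and jitter each feature with Gaussian noise. The original
    rows are kept at the front of the returned list."""
    out = list(rows)
    for _ in range(n_copies):
        features, target = rng.choice(rows)
        noisy = [f + rng.gauss(0.0, noise_sd) for f in features]
        out.append((noisy, target))
    return out


# Made-up daily case counts for three hypothetical regions (not real data).
activity = {
    "region_a": [1, 2, 4, 8, 16, 32],
    "region_b": [2, 4, 8, 16, 32, 64],
    "region_c": [30, 25, 20, 12, 6, 2],
}
clusters = cluster_by_correlation(activity, threshold=0.9)

# Pool training rows from the first cluster: the features are two consecutive
# days, and the target is the count two days after the later feature day.
rows = []
for name in clusters[0]:
    s = activity[name]
    for t in range(1, len(s) - 2):
        rows.append(([s[t - 1], s[t]], s[t + 2]))

augmented = augment(rows, n_copies=20, noise_sd=0.5, rng=random.Random(0))
```

Pooling rows across a cluster lets regions with similar trajectories share training data, and the augmented rows could then be passed to any regression model. In a real setting the feature vector would also carry the search, news, and mechanistic-model inputs listed in the Methods.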

Identifiers

pubmed: 32730217
pii: v22i8e20285
doi: 10.2196/20285
pmc: PMC7459435

Publication types

Journal Article
Research Support, N.I.H., Extramural

Languages

eng

Citation subsets

IM

Pagination

e20285

Grants

Agency: NIGMS NIH HHS
ID: R01 GM130668
Country: United States

Comments and corrections

Type: ErratumIn

Copyright information

©Canelle Poirier, Dianbo Liu, Leonardo Clemente, Xiyu Ding, Matteo Chinazzi, Jessica Davis, Alessandro Vespignani, Mauricio Santillana. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 17.08.2020.

Authors

Dianbo Liu (D)

Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States.
Department of Pediatrics, Harvard Medical School, Boston, MA, United States.

Leonardo Clemente (L)

Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States.
Department of Pediatrics, Harvard Medical School, Boston, MA, United States.
Tecnologico de Monterrey, Monterrey, Mexico.

Canelle Poirier (C)

Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States.
Department of Pediatrics, Harvard Medical School, Boston, MA, United States.

Xiyu Ding (X)

Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States.
Harvard TH Chan School of Public Health, Boston, MA, United States.

Matteo Chinazzi (M)

Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University, Boston, MA, United States.

Jessica Davis (J)

Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University, Boston, MA, United States.

Alessandro Vespignani (A)

Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University, Boston, MA, United States.
ISI Foundation, Turin, Italy.

Mauricio Santillana (M)

Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States.
Department of Pediatrics, Harvard Medical School, Boston, MA, United States.
Harvard TH Chan School of Public Health, Boston, MA, United States.
