Utilizing a novel high-resolution malaria dataset for climate-informed predictions with a deep learning transformer model.
Journal
Scientific Reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288
Publication information
Publication date:
28 Dec 2023
History:
received: 25 Jul 2023
accepted: 15 Dec 2023
medline: 29 Dec 2023
pubmed: 29 Dec 2023
entrez: 28 Dec 2023
Status:
epublish
Abstract
Climatic factors influence malaria transmission through their effects on the Anopheles vector and the Plasmodium parasite. Modelling and understanding the complex effects of climate on malaria incidence can enable important early warning capabilities. Deep learning is proving valuable across many fields; however, epidemiological forecasting is still in its infancy, and few applied deep learning studies of malaria in southern Africa leverage quality datasets. Using a novel high-resolution malaria incidence dataset containing 23 years of daily data from 1998 to 2021, a statistical model and an XGBoost machine learning model were compared with a deep learning Transformer model by assessing the accuracy of their numerical predictions. A novel loss function, used to account for the variable nature of the data, yielded a performance gain of around 20% over the standard mean squared error (MSE) loss. When numerical predictions were converted to alert thresholds to mimic use in a real-world setting, the Transformer's performance of 80% according to the area under the receiver operating characteristic curve (AUROC) was 20-40% higher than that of the statistical and XGBoost models, and it had the highest overall accuracy, 98%. The Transformer performed consistently, with accuracy increasing as more climate variables were used, indicating further potential for this prediction framework to predict malaria incidence at a daily level using climate data for southern Africa.
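The abstract describes converting numerical predictions into alerts and scoring them with AUROC and overall accuracy. A minimal Python sketch of that kind of evaluation follows. This record does not state how the paper defines its alert threshold, so the "mean plus two standard deviations" rule, the function names, and the toy numbers below are all assumptions made for illustration, not the authors' method or data.

```python
from statistics import mean, stdev


def alert_threshold(history, k=2.0):
    """Hypothetical alert rule (an assumption): mean + k * sample SD of past daily cases."""
    return mean(history) + k * stdev(history)


def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen alert day receives a
    higher score than a randomly chosen non-alert day (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("need at least one alert day and one non-alert day")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, predicted_labels):
    """Fraction of days on which the predicted alert status matches the observed one."""
    hits = sum(a == b for a, b in zip(labels, predicted_labels))
    return hits / len(labels)


# Toy data, invented for illustration only.
history = [2, 3, 1, 4, 2, 3, 2, 3]           # past daily case counts
observed = [1, 2, 6, 3, 8, 2]                # observed cases on evaluation days
predicted = [1.5, 2.5, 5.0, 4.0, 7.0, 5.5]   # a model's numerical predictions

threshold = alert_threshold(history)
labels = [x > threshold for x in observed]             # observed alert days
predicted_labels = [x > threshold for x in predicted]  # predicted alert days

print(f"threshold = {threshold:.2f}")
print(f"AUROC     = {auroc(labels, predicted):.3f}")
print(f"accuracy  = {accuracy(labels, predicted_labels):.3f}")
```

AUROC is computed from the raw numerical predictions, so it measures how well the model ranks alert days above non-alert days regardless of the threshold, while accuracy depends on the thresholded labels. When alert days are rare, accuracy can be high even for a weak model, which is one reason the two metrics are usually reported together.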
Identifiers
pubmed: 38155182
doi: 10.1038/s41598-023-50176-3
pii: 10.1038/s41598-023-50176-3
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
23091
Copyright information
© 2023. The Author(s).