A hybrid machine learning/deep learning COVID-19 severity predictive model from CT images and clinical data.
Journal
Scientific reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288
Publication information
Publication date:
14 03 2022
14 03 2022
History:
received: 25 05 2021
accepted: 22 02 2022
entrez: 15 3 2022
pubmed: 16 3 2022
medline: 5 4 2022
Status:
epublish
Abstract
COVID-19 clinical presentation and prognosis are highly variable, ranging from asymptomatic and paucisymptomatic cases to acute respiratory distress syndrome and multi-organ involvement. We developed a hybrid machine learning/deep learning model to classify patients into two outcome categories, non-ICU and ICU (intensive care admission or death), using data from 558 patients admitted to a hospital in northern Italy between February and May 2020. A fully 3D patient-level CNN classifier on baseline CT images is used as a feature extractor. The extracted features, together with laboratory and clinical data, are passed to a Boruta feature-selection algorithm that uses SHAP game-theoretic values. A classifier is then built on the reduced feature space with the CatBoost gradient boosting algorithm, reaching a probabilistic AUC of 0.949 on the holdout test set. The model aims to provide clinical decision support to medical doctors by giving the probability of belonging to each outcome class together with a case-based SHAP interpretation of feature importance.
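The abstract reports performance as an AUC computed from predicted class probabilities. As a minimal sketch of what that metric measures, and not the authors' evaluation code, the Python below computes AUC as the fraction of (positive, negative) pairs in which the positive case receives the higher score, counting ties as half. The function name `roc_auc` and the six example labels and probabilities are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical holdout set: 1 = ICU outcome, 0 = non-ICU outcome.
y_true = [0, 0, 1, 1, 0, 1]
# Hypothetical predicted probabilities of the ICU class.
y_prob = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]

auc = roc_auc(y_true, y_prob)
print(round(auc, 3))
```

The pairwise form is quadratic in the number of cases, which is acceptable for a holdout set of a few hundred patients; a rank-based implementation would be the usual choice for larger data.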
Identifiers
pubmed: 35288579
doi: 10.1038/s41598-022-07890-1
pii: 10.1038/s41598-022-07890-1
pmc: PMC8919158
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
4329
Copyright information
© 2022. The Author(s).
References
Struyf, T. et al. Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19 disease. In Cochrane Database of Systematic Reviews (2020).
Gupta, A. et al. Extrapulmonary manifestations of COVID-19. Nat. Med. 26, 1017–1032 (2020).
pubmed: 32651579
doi: 10.1038/s41591-020-0968-3
Li, H. et al. SARS-CoV-2 and viral sepsis: Observations and hypotheses. Lancet 395, 1517–1520 (2020).
pubmed: 32311318
pmcid: 7164875
doi: 10.1016/S0140-6736(20)30920-X
Wiersinga, W. J., Rhodes, A., Cheng, A. C., Peacock, S. J. & Prescott, H. C. Pathophysiology, transmission, diagnosis, and treatment of coronavirus disease 2019 (COVID-19): A review. JAMA 324, 782–793 (2020).
pubmed: 32648899
doi: 10.1001/jama.2020.12839
Tayarani-N, M.-H. Applications of artificial intelligence in battling against Covid-19: A literature review. Chaos Solitons Fractals 142, 110338 (2021).
pubmed: 33041533
doi: 10.1016/j.chaos.2020.110338
Born, J. et al. On the role of artificial intelligence in medical imaging of COVID-19. Patterns 2, 100330 (2021).
pubmed: 34405156
pmcid: 8361688
doi: 10.1016/j.patter.2021.100330
Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V. & Gulin, A. CatBoost: Unbiased boosting with categorical features. Adv. Neural. Inf. Process. Syst. 31, 6638–6648 (2018).
Dorogush, A. V., Ershov, V. & Gulin, A. CatBoost: Gradient boosting with categorical features support. arXiv preprint arXiv:1810.11363. Workshop on ML Systems at NIPS 2017 (2018).
Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, 4765–4774 (2017).
Lundberg, S. M. et al. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nat. Biomed. Eng. 2, 749–760 (2018).
doi: 10.1038/s41551-018-0304-0
Bottino, F. et al. COVID mortality prediction with machine learning methods: A systematic review and critical appraisal. J. Pers. Med. 11, 893 (2021).
pubmed: 34575670
pmcid: 8467935
doi: 10.3390/jpm11090893
Kulkarni, A. R. et al. Deep learning model to predict the need for mechanical ventilation using chest X-ray images in hospitalised patients with COVID-19. BMJ Innov. 7 (2021).
Feng, Y.-Z. et al. Severity assessment and progression prediction of COVID-19 patients based on the LesionEncoder framework and chest CT. Information 12, 471 (2021).
doi: 10.3390/info12110471
Xiao, L.-S. et al. Development and validation of a deep learning-based model using computed tomography imaging for predicting disease severity of coronavirus disease 2019. Front. Bioeng. Biotechnol. 8, 898 (2020).
pubmed: 32850746
pmcid: 7411489
doi: 10.3389/fbioe.2020.00898
Wang, S. et al. A deep learning radiomics model to identify poor outcome in COVID-19 patients with underlying health conditions: A multicenter study. IEEE J. Biomed. Health Inform. 25, 2353–2362 (2021).
pubmed: 33905341
doi: 10.1109/JBHI.2021.3076086
Zhang, K. et al. Clinically applicable AI system for accurate diagnosis, quantitative measurements, and prognosis of COVID-19 pneumonia using computed tomography. Cell 181, 1423–1433 (2020).
pubmed: 32416069
pmcid: 7196900
doi: 10.1016/j.cell.2020.04.045
Chassagnon, G. et al. AI-driven quantification, staging and outcome prediction of COVID-19 pneumonia. Med. Image Anal. 67, 101860 (2021).
pubmed: 33171345
doi: 10.1016/j.media.2020.101860
Chao, H. et al. Integrative analysis for COVID-19 patient outcome prediction. Med. Image Anal. 67, 101844 (2020).
pubmed: 33091743
pmcid: 7553063
doi: 10.1016/j.media.2020.101844
Wu, Q. et al. Radiomics analysis of computed tomography helps predict poor prognostic outcome in COVID-19. Theranostics 10, 7231 (2020).
pubmed: 32641989
pmcid: 7330838
doi: 10.7150/thno.46428
Ning, W. et al. Open resource of clinical data from patients with pneumonia for the prediction of COVID-19 outcomes via deep learning. Nat. Biomed. Eng. 4, 1197–1207 (2020).
pubmed: 33208927
pmcid: 7723858
doi: 10.1038/s41551-020-00633-5
Lassau, N. et al. Integrating deep learning CT-scan model, biological and clinical variables to predict severity of COVID-19 patients. Nat. Commun. 12, 1–11 (2021).
doi: 10.1038/s41467-020-20657-4
Jiao, Z. et al. Prognostication of patients with COVID-19 using artificial intelligence based on chest x-rays and clinical data: A retrospective study. Lancet Digit. Health 3, e286–e294 (2021).
pubmed: 33773969
pmcid: 7990487
doi: 10.1016/S2589-7500(21)00039-X
Wang, R. et al. Artificial intelligence for prediction of COVID-19 progression using CT imaging and clinical data. Eur. Radiol. 32, 205–212 (2022).
doi: 10.1007/s00330-021-08049-8
Shamout, F. E. et al. An artificial intelligence system for predicting the deterioration of COVID-19 patients in the emergency department. NPJ Digit. Med. 4, 1–11 (2021).
doi: 10.1038/s41746-021-00453-0
Kwon, Y. J. et al. Combining initial radiographs and clinical variables improves deep learning prognostication in patients with COVID-19 from the emergency department. Radiol. Artif. Intell. 3, e200098 (2020).
pubmed: 33928257
pmcid: 7754832
doi: 10.1148/ryai.2020200098
Ho, T. T. et al. Deep learning models for predicting severe progression in COVID-19-infected patients: Retrospective study. JMIR Med. Inform. 9, e24973 (2021).
pubmed: 33455900
pmcid: 7850779
doi: 10.2196/24973
Xu, M. et al. Accurately differentiating COVID-19, other viral infection, and healthy individuals using multimodal features via late fusion learning. J. Med. Internet Res. 23, e25535 (2021).
pubmed: 33404516
pmcid: 7790733
doi: 10.2196/25535
Fang, C. et al. Deep learning for predicting COVID-19 malignant progression. Med. Image Anal. 72, 102096 (2021).
pubmed: 34051438
pmcid: 8112895
doi: 10.1016/j.media.2021.102096
Soda, P. et al. AIforCOVID: Predicting the clinical outcomes in patients with COVID-19 applying AI to chest-X-rays. An Italian multicentre study. Med. Image Anal. 74, 102216 (2021).
pubmed: 34492574
pmcid: 8401374
doi: 10.1016/j.media.2021.102216
Aloisio, E. et al. A comprehensive appraisal of laboratory biochemistry tests as major predictors of COVID-19 severity. Arch. Pathol. Lab. Med. 144, 1457–1464 (2020).
pubmed: 32649222
doi: 10.5858/arpa.2020-0389-SA
Chen, X.-Y., Huang, M.-Y., Xiao, Z.-W., Yang, S. & Chen, X.-Q. Lactate dehydrogenase elevations is associated with severity of COVID-19: A meta-analysis. Crit. Care 24, 1–3 (2020).
doi: 10.1186/s13054-020-03161-5
Lippi, G. & Favaloro, E. J. D-dimer is associated with severity of coronavirus disease 2019: A pooled analysis. Thromb. Haemost. 120, 876 (2020).
pubmed: 32246450
pmcid: 7295300
doi: 10.1055/s-0040-1709650
McElvaney, O. J. et al. Characterization of the inflammatory response to severe COVID-19 illness. Am. J. Respir. Crit. Care Med. 202, 812–821 (2020).
pubmed: 32584597
pmcid: 7491404
doi: 10.1164/rccm.202005-1583OC
Rodriguez-Morales, A. J. et al. Clinical, laboratory and imaging features of COVID-19: A systematic review and meta-analysis. Travel Med. Infect. Dis. 34, 101623 (2020).
pubmed: 32179124
pmcid: 7102608
doi: 10.1016/j.tmaid.2020.101623
Guan, W.-J. et al. Clinical characteristics of coronavirus disease 2019 in China. N. Engl. J. Med. 382, 1708–1720 (2020).
pubmed: 32109013
doi: 10.1056/NEJMoa2002032
Keany, E. et al. Ekeany/Boruta-Shap: BorutaShap, https://doi.org/10.5281/zenodo.4247618 (2020).
Sluimer, I., Prokop, M. & Van Ginneken, B. Toward automated segmentation of the pathological lung in CT. IEEE Trans. Med. Imaging 24, 1025–1038 (2005).
pubmed: 16092334
doi: 10.1109/TMI.2005.851757
Liauchuk, V. & Kovalev, V. ImageCLEF 2017: Supervoxels and co-occurrence for tuberculosis CT image classification. In CLEF2017 Working Notes, CEUR Workshop Proceedings (CEUR-WS.org, http://ceur-ws.org, Dublin, Ireland, 2017).
Sharp, G. C. et al. Plastimatch: An open source software suite for radiotherapy image processing. In Proceedings of the XVIth International Conference on the use of Computers in Radiotherapy (ICCR), Amsterdam, Netherlands (2010).
ImageCLEFmed Tuberculosis (2020). Accessed: 2020-12-19.
Wu, Y. & He, K. Group normalization. In Proceedings of the European Conference on Computer Vision (ECCV), 3–19 (2018).
Ba, J. L., Kiros, J. R. & Hinton, G. E. Layer normalization. arXiv preprint arXiv:1607.06450 (2016).
Pérez-García, F., Sparks, R. & Ourselin, S. TorchIO: A Python library for efficient loading, preprocessing, augmentation and patch-based sampling of medical images in deep learning. arXiv preprint arXiv:2003.04696 (2020).
Lin, W. et al. Convolutional neural networks-based MRI image analysis for the Alzheimer’s disease prediction from mild cognitive impairment. Front. Neurosci. 12, 777 (2018).
pubmed: 30455622
pmcid: 6231297
doi: 10.3389/fnins.2018.00777
Kursa, M. B., Jankowski, A. & Rudnicki, W. R. Boruta—A system for feature selection. Fund. Inform. 101, 271–285 (2010).
Lundberg, S. M., Erion, G. G. & Lee, S.-I. Consistent individualized feature attribution for tree ensembles. arXiv preprint arXiv:1802.03888 (2018).
Hooker, G. & Mentch, L. Please stop permuting features: An explanation and alternatives. arXiv preprint arXiv:1905.03151 (2019).
Akiba, T., Sano, S., Yanase, T., Ohta, T. & Koyama, M. Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2623–2631 (2019).
Paszke, A. et al. Pytorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, 8026–8037 (2019).
Force, A. D. T. et al. Acute respiratory distress syndrome. JAMA 307, 2526–2533 (2012).
Altaf, T., Anwar, S. M., Gul, N., Majeed, M. N. & Majid, M. Multi-class Alzheimer’s disease classification using image and clinical features. Biomed. Signal Process. Control 43, 64–74 (2018).
doi: 10.1016/j.bspc.2018.02.019
Tunali, I. et al. Novel clinical and radiomic predictors of rapid disease progression phenotypes among lung cancer patients treated with immunotherapy: An early report. Lung Cancer 129, 75–79 (2019).
pubmed: 30797495
doi: 10.1016/j.lungcan.2019.01.010
LeCun, Y. The Unreasonable Effectiveness of Deep Learning. http://videolectures.net/sahd2014_lecun_deep_learning/ (2014). UCL-Duke Workshop on Sensing and Analysis of High-Dimensional Data.
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778 (2016).
Chui, M. et al. Notes from the AI frontier: Insights from hundreds of use cases (McKinsey Global Institute, 2018).
Chen, T. & Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794 (2016).
Ke, G. et al. LightGBM: A highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems, 3146–3154 (2017).
Bansal, S. Historical Data Science Trends on Kaggle. https://www.kaggle.com/shivamb/data-science-trends-on-kaggle (2019).
Shwartz-Ziv, R. & Armon, A. Tabular Data: Deep Learning Is Not All You Need. arXiv preprint arXiv:2106.03253 (2021).
Popov, S., Morozov, S. & Babenko, A. Neural oblivious decision ensembles for deep learning on tabular data. arXiv preprint arXiv:1909.06312 (2019).
Arık, S. O. & Pfister, T. TabNet: Attentive interpretable tabular learning. arXiv preprint (2020).
Meng, L. et al. A deep learning prognosis model help alert for COVID-19 patients at high-risk of death: A multi-center study. IEEE J. Biomed. Health Inform. 24, 3576–3584 (2020).
pubmed: 33108303
doi: 10.1109/JBHI.2020.3034296
Liu, M., Zhang, J., Adeli, E. & Shen, D. Joint classification and regression via deep multi-task multi-channel learning for Alzheimer’s disease diagnosis. IEEE Trans. Biomed. Eng. 66, 1195–1206 (2018).
pubmed: 30222548
pmcid: 6764421
doi: 10.1109/TBME.2018.2869989
Spasov, S. et al. A parameter-efficient deep learning approach to predict conversion from mild cognitive impairment to Alzheimer’s disease. Neuroimage 189, 276–287 (2019).
pubmed: 30654174
doi: 10.1016/j.neuroimage.2019.01.031
Gessert, N., Nielsen, M., Shaikh, M., Werner, R. & Schlaefer, A. Skin lesion classification using ensembles of multi-resolution EfficientNets with meta data. MethodsX 7, 100864 (2020).
pubmed: 32292713
pmcid: 7150512
doi: 10.1016/j.mex.2020.100864
Pang, L., Wang, J., Zhao, L., Wang, C. & Zhan, H. A novel protein subcellular localization method with CNN-XGBoost model for Alzheimer’s disease. Front. Genet. 9, 751 (2019).
pubmed: 30713552
pmcid: 6345701
doi: 10.3389/fgene.2018.00751
Ren, X., Guo, H., Li, S., Wang, S. & Li, J. A novel image classification method with CNN-XGBoost model. In International Workshop on Digital Watermarking, 378–390 (Springer, 2017).
Carvalho, E. D., Carvalho, E. D., de Carvalho Filho, A. O., de Araújo, F. H. D. & Rabêlo, R. d. A. L. Diagnosis of COVID-19 in CT image using CNN and XGBoost. In 2020 IEEE Symposium on Computers and Communications (ISCC), 1–6 (IEEE, 2020).
Hancock, J. T. & Khoshgoftaar, T. M. CatBoost for big data: An interdisciplinary review. J. Big Data 7, 1–45 (2020).
doi: 10.1186/s40537-020-00369-8
Bostrom, N. & Yudkowsky, E. The ethics of artificial intelligence. In The Cambridge Handbook of Artificial Intelligence, vol. 1, 316–334 (2014).
European Union. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation). Off. J. L119(59), 1–88 (2016).