Performance evaluation of deep neural ensembles toward malaria parasite detection in thin-blood smear images.

Blood smear; Computer-aided diagnosis; Convolutional neural networks; Deep learning; Ensemble; Machine learning; Malaria; Red blood cells; Screening; Statistical analysis

Journal

PeerJ
ISSN: 2167-8359
Abbreviated title: PeerJ
Country: United States
NLM ID: 101603425

Publication information

Publication date:
2019
History:
received: 05 02 2019
accepted: 14 04 2019
entrez: 11 6 2019
pubmed: 11 6 2019
medline: 11 6 2019
Status: epublish

Abstract

BACKGROUND
Malaria is a life-threatening disease caused by
INTRODUCTION
State-of-the-art computer-aided diagnostic tools rely on data-driven deep learning algorithms such as convolutional neural networks (CNNs), which have become the architecture of choice for image recognition tasks. However, CNNs suffer from high variance and may overfit because of their sensitivity to fluctuations in the training data.
OBJECTIVE
The primary aim of this study is to reduce model variance and to improve robustness and generalization by constructing model ensembles for detecting parasitized cells in thin-blood smear images.
METHODS
We evaluate the performance of custom and pretrained CNNs and construct an optimal model ensemble for classifying parasitized and normal cells in thin-blood smear images. Cross-validation is performed at the patient level to prevent data leakage into the validation set and to reduce generalization error. The models are evaluated on the following performance metrics: (a) accuracy; (b) area under the receiver operating characteristic (ROC) curve (AUC); (c) mean squared error (MSE); (d) precision; (e) F-score; and (f) Matthews correlation coefficient (MCC).
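
The patient-level split and the listed metrics can be illustrated with a minimal, dependency-free Python sketch. This is not the authors' code: the function names (`patient_level_folds`, `roc_auc`, `binary_metrics`), the greedy fold-balancing rule, and the 0.5 decision threshold are assumptions made for illustration. Labels are assumed to be 1 for parasitized and 0 for uninfected, and `y_prob` is assumed to hold predicted probabilities of the parasitized class.

```python
import math
from collections import defaultdict


def patient_level_folds(patient_ids, k):
    """Assign sample indices to k folds so that all samples from one
    patient land in the same fold (hypothetical helper, greedy balancing)."""
    by_patient = defaultdict(list)
    for index, pid in enumerate(patient_ids):
        by_patient[pid].append(index)
    folds = [[] for _ in range(k)]
    # Place patients with the most samples first, each into the smallest fold.
    for pid in sorted(by_patient, key=lambda p: (-len(by_patient[p]), str(p))):
        smallest = min(range(k), key=lambda f: len(folds[f]))
        folds[smallest].extend(by_patient[pid])
    return folds


def roc_auc(y_true, y_prob):
    """AUC as the fraction of (positive, negative) pairs ranked correctly;
    ties count as half. Quadratic in the number of samples."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def binary_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, AUC, MSE, precision, F-score and MCC for binary labels."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    n = len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_prob)) / n
    return {
        "accuracy": (tp + tn) / n,
        "auc": roc_auc(y_true, y_prob),
        "mse": mse,
        "precision": precision,
        "f_score": f_score,
        "mcc": mcc,
    }
```

Each fold returned by `patient_level_folds` can serve once as the validation set while the remaining folds form the training set, so no patient contributes cells to both sides of a split.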
RESULTS
The ensemble model constructed with VGG-19 and SqueezeNet outperformed the state of the art on several performance metrics in classifying parasitized and uninfected cells, which can aid improved disease screening.
CONCLUSIONS
Ensemble learning reduces model variance by optimally combining the predictions of multiple models, and it decreases sensitivity to the specifics of the training data and to the choice of training algorithm. The performance of the model ensemble simulates real-world conditions, with reduced variance and overfitting and improved generalization.
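
One common way to combine model predictions is to average the per-model probabilities, optionally with weights. The Python sketch below shows that idea only. The abstract does not state which combination rule the authors used, so the function name `average_ensemble`, the weighting scheme, and the example probabilities are assumptions for illustration.

```python
def average_ensemble(model_probs, weights=None):
    """Weighted average of per-model probabilities (hypothetical helper).

    model_probs: one list of predicted probabilities per model, all of
    equal length and aligned by sample. weights: optional non-negative
    weights, one per model; equal weights are used when omitted.
    """
    n_models = len(model_probs)
    if weights is None:
        weights = [1.0] * n_models
    total = sum(weights)
    n_samples = len(model_probs[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, model_probs)) / total
        for i in range(n_samples)
    ]


# Illustrative (made-up) probabilities from two models for three cells.
model_a = [0.9, 0.2, 0.6]
model_b = [0.7, 0.4, 0.3]
ensemble = average_ensemble([model_a, model_b])
labels = [1 if p >= 0.5 else 0 for p in ensemble]
```

Averaging tends to dampen the errors that individual models make independently, which is the intuition behind the variance reduction described above.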

Identifiers

pubmed: 31179181
doi: 10.7717/peerj.6977
pii: 6977
pmc: PMC6544011

Publication types

Journal Article

Languages

eng

Pagination

e6977

Conflict of interest statement

The authors declare there are no competing interests.

Authors

Sivaramakrishnan Rajaraman (S)

Communications Engineering Branch, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States of America.

Stefan Jaeger (S)

Communications Engineering Branch, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States of America.

Sameer K Antani (SK)

Communications Engineering Branch, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States of America.

MeSH classifications