Comparison of machine learning techniques to handle imbalanced COVID-19 CBC datasets.

Covid; Data mining; Hemogram; Imbalanced datasets; Machine learning

Journal

PeerJ. Computer science
ISSN: 2376-5992
Abbreviated title: PeerJ Comput Sci
Country: United States
NLM ID: 101660598

Publication information

Publication date:
2021
History:
received: 22 04 2021
accepted: 20 07 2021
entrez: 30 8 2021
pubmed: 31 8 2021
medline: 31 8 2021
Status: epublish

Abstract

The coronavirus pandemic caused by the novel SARS-CoV-2 has significantly impacted human health and the economy, especially in countries with limited financial resources for medical testing and treatment, such as Brazil, the third most affected country. In this scenario, machine learning techniques have been heavily employed to analyze different types of medical data and to aid decision making, offering a low-cost alternative. Given the urgency of fighting the pandemic, a large number of studies apply machine learning approaches to clinical data, including complete blood count (CBC) tests, which are among the most widely available medical tests. In this work, we review the machine learning classifiers most commonly employed for CBC data, together with popular sampling methods for dealing with class imbalance. Additionally, we describe and critically analyze three publicly available Brazilian COVID-19 CBC datasets and evaluate the performance of eight classifiers and five sampling techniques on the selected datasets. Our work provides an overview of which classifiers and sampling methods give the best results for different relevant metrics and discusses their impact on future analyses. The metrics and algorithms are introduced in a way that aids newcomers to the field. Finally, the overview presented here can significantly benefit the comparison of results from new ML algorithms.
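As an illustration of the kind of sampling method and imbalance-aware metric the abstract refers to, the following is a minimal sketch of random oversampling and balanced accuracy in plain Python. It is not taken from the paper: the function names `random_oversample` and `balanced_accuracy`, the toy feature values, and the 8-to-2 class split are all assumptions made for this example, and the abstract does not specify which five sampling techniques or which metrics the authors used.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of the smaller classes
    until every class is as large as the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw with replacement only the number of extra samples needed.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def accuracy(y_true, y_pred):
    """Fraction of correct predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so each class counts equally."""
    recalls = []
    for c in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical toy data: one made-up feature per sample,
# 8 negative (0) and 2 positive (1) cases.
X = [[4.1], [4.3], [4.0], [4.4], [4.2], [4.5], [4.6], [4.8], [2.9], [3.1]]
y = [0] * 8 + [1] * 2

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))  # both classes now have 8 samples

# A classifier that always predicts the majority class looks good on
# plain accuracy but not on balanced accuracy.
always_negative = [0] * len(y)
print(accuracy(y, always_negative))           # 0.8
print(balanced_accuracy(y, always_negative))  # 0.5
```

In practice, oversampling should be applied only to the training split so that duplicated samples do not leak into the evaluation data.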

Identifiers

pubmed: 34458574
doi: 10.7717/peerj-cs.670
pii: cs-670
pmc: PMC8372002

Publication types

Journal Article

Languages

eng

Pagination

e670

Copyright information

© 2021 Dorn et al.

Conflict of interest statement

The authors declare that they have no competing interests.

Authors

Marcio Dorn (M)

Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, RS, Brazil.
Center of Biotechnology, Federal University of Rio Grande do Sul, Porto Alegre, RS, Brazil.
Forensic Science, National Institute of Science and Technology, Porto Alegre, RS, Brazil.

Bruno Iochins Grisci (BI)

Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, RS, Brazil.

Pedro Henrique Narloch (PH)

Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, RS, Brazil.

Bruno César Feltes (BC)

Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, RS, Brazil.
Department of Genetics, Federal University of Rio Grande do Sul, Porto Alegre, RS, Brazil.

Eduardo Avila (E)

Forensic Science, National Institute of Science and Technology, Porto Alegre, RS, Brazil.
School of Health and Life Sciences, Pontifical Catholic University of Rio Grande do Sul, Porto Alegre, RS, Brazil.

Alessandro Kahmann (A)

Institute of Mathematics, Statistics and Physics, Federal University of Rio Grande, Rio Grande, RS, Brazil.

Clarice Sampaio Alho (CS)

Forensic Science, National Institute of Science and Technology, Porto Alegre, RS, Brazil.
School of Health and Life Sciences, Pontifical Catholic University of Rio Grande do Sul, Porto Alegre, RS, Brazil.

MeSH classifications