A quantum-based oversampling method for classification of highly imbalanced and overlapped data.
Classification
class imbalance
class overlapping
oversampling
quantum potential energy
Journal
Experimental biology and medicine (Maywood, N.J.)
ISSN: 1535-3699
Abbreviated title: Exp Biol Med (Maywood)
Country: England
NLM ID: 100973463
Publication information
Publication date:
28 Jan 2024
History:
medline: 28 Jan 2024
pubmed: 28 Jan 2024
entrez: 28 Jan 2024
Status:
aheadofprint
Abstract
Data imbalance is a challenging problem in classification tasks, and when it is combined with class overlap, classification performance deteriorates further. However, existing studies have rarely addressed both issues simultaneously. In this article, we propose a novel quantum-based oversampling method (QOSM) that tackles data imbalance and class overlap together, thereby improving classification performance. QOSM uses quantum potential theory to calculate the potential energy of each sample and selects the sample with the lowest potential as the center of each cover generated by a constructive covering algorithm. This approach optimizes cover center selection and better captures the distribution of the original samples, particularly in overlapping regions. In addition, oversampling is performed on the samples of the minority class covers to reduce the imbalance ratio (IR). We evaluated QOSM with three traditional classifiers (support vector machine [SVM], k-nearest neighbors [KNN], and naive Bayes [NB]) on 10 publicly available KEEL data sets characterized by high IRs and varying degrees of overlap. Experimental results demonstrate that QOSM significantly improves classification accuracy compared with approaches that do not address class imbalance and overlap. Moreover, QOSM consistently outperforms the existing oversampling methods tested. Because it is compatible with different classifiers, QOSM shows promising potential for improving the classification of highly imbalanced and overlapped data.
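As a rough illustration of the three steps the abstract names (quantum potential, lowest-potential cover centers, oversampling inside minority covers), the Python sketch below relies on assumptions that the abstract does not state. The potential follows the Horn-Gottlieb quantum clustering form with a hypothetical width `sigma` and is computed over all samples. The covers are simplified Euclidean balls bounded by the nearest sample of another class, not the full constructive covering algorithm. Synthetic samples are drawn by linear interpolation between a minority cover's center and one of its members. The function names (`quantum_potential`, `build_covers`, `oversample_minority`) are hypothetical, and this is not the authors' implementation.

```python
import math
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def dist(a, b):
    return math.sqrt(sq_dist(a, b))


def imbalance_ratio(y):
    """IR = size of the largest class / size of the smallest class."""
    counts = Counter(y)
    return max(counts.values()) / min(counts.values())


def quantum_potential(X, sigma=1.0):
    """Horn-Gottlieb quantum potential of each sample, up to an additive constant.

    V(x) = E - d/2 + sum_i |x - x_i|^2 exp(-|x - x_i|^2 / (2 sigma^2)) / (2 sigma^2 psi(x)),
    with psi(x) = sum_i exp(-|x - x_i|^2 / (2 sigma^2)). The constant E - d/2
    is dropped because only the ranking of samples is used below.
    """
    V = []
    for x in X:
        num = den = 0.0
        for xi in X:
            d2 = sq_dist(x, xi)
            w = math.exp(-d2 / (2 * sigma ** 2))
            num += d2 * w
            den += w
        V.append(num / (2 * sigma ** 2 * den))
    return V


def build_covers(X, y, sigma=1.0):
    """Simplified covering (assumption): repeatedly center a ball on the
    lowest-potential uncovered sample and absorb the uncovered same-class
    samples closer than the nearest sample of any other class."""
    V = quantum_potential(X, sigma)
    uncovered = set(range(len(X)))
    covers = []  # (center_index, radius, member_indices)
    while uncovered:
        c = min(uncovered, key=lambda i: V[i])
        d_other = min((dist(X[c], X[j]) for j in range(len(X)) if y[j] != y[c]),
                      default=math.inf)
        # The center is always included so the loop terminates.
        members = [j for j in sorted(uncovered)
                   if y[j] == y[c] and (j == c or dist(X[c], X[j]) < d_other)]
        radius = max(dist(X[c], X[j]) for j in members)
        covers.append((c, radius, members))
        uncovered.difference_update(members)
    return covers


def oversample_minority(X, y, covers, minority, n_new, seed=0):
    """Interpolate between a minority cover's center and a random member,
    cycling through the minority covers (generation rule is an assumption)."""
    rng = random.Random(seed)
    minority_covers = [cv for cv in covers if y[cv[0]] == minority]
    synthetic = []
    for k in range(n_new):
        c, _, members = minority_covers[k % len(minority_covers)]
        j = rng.choice(members)
        t = rng.random()
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(X[c], X[j])))
    return synthetic


# Toy data: 5 majority (class 0) samples, 2 minority (class 1) samples.
X = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (0.05, 0.05), (3, 3), (3.1, 3)]
y = [0, 0, 0, 0, 0, 1, 1]
covers = build_covers(X, y)
counts = Counter(y)
new_points = oversample_minority(X, y, covers, minority=1, n_new=counts[0] - counts[1])
print(imbalance_ratio(y), imbalance_ratio(y + [1] * len(new_points)))  # 2.5 1.0
```

In this toy example the majority class has 5 samples and the minority class 2 (IR = 2.5); adding 3 synthetic minority points brings the IR to 1.0. Interpolating between a cover's center and one of its members keeps each synthetic point within the cover's radius, which in this sketch never exceeds the distance from the center to the nearest sample of another class.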
Identifiers
pubmed: 38281087
doi: 10.1177/15353702231220665
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
15353702231220665
Conflict of interest statement
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.